« Last post by Stefan on March 22, 2015, 08:30:53 AM »
Microsoft Corp is making its biggest push into the heavily pirated Chinese consumer computing market this summer by offering free upgrades to Windows 10 to all Windows users, regardless of whether they are running genuine copies of the software.
The move is an unprecedented attempt by Microsoft to get legitimate versions of its software onto machines of the hundreds of millions of Windows users in China. Recent studies show that three-quarters of all PC software is not properly licensed there.
Unlike the competition, Intel’s shader hardware has a full set of registers dedicated to each hardware thread. The red and green teams (AMD and NVIDIA) each lose thread occupancy when a shader has a lot of register pressure, but not the blue team: they just exploit their ridiculous process advantage, pack the little suckers in, and stop worrying about it. Our shader has quite a bit of register pressure, but that doesn’t hurt Intel’s concurrency one bit. Their enormous register file functions as a big on-chip buffer.
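To make the occupancy argument concrete, here is a rough sketch of the two register allocation styles described above. Every number in it (register file size, wave width, wave and thread counts) is a made-up placeholder chosen for round arithmetic, not a figure for any real GPU, and the function names are hypothetical.

```python
def dynamic_occupancy(regs_per_thread, regfile_regs=65536,
                      threads_per_wave=32, max_waves=64):
    """Resident waves when registers are carved out of one shared pool.

    All default values are illustrative assumptions, not vendor specs.
    """
    regs_per_wave = regs_per_thread * threads_per_wave
    # The more registers each thread needs, the fewer waves fit in the pool.
    return min(max_waves, regfile_regs // regs_per_wave)


def fixed_occupancy(regs_per_thread, regs_reserved_per_thread=128,
                    hw_threads=7):
    """Resident hardware threads when each one owns a fixed register set.

    Occupancy does not depend on register use as long as the shader fits.
    """
    if regs_per_thread > regs_reserved_per_thread:
        raise ValueError("shader does not fit in the reserved registers")
    return hw_threads


# Compare both models as the shader's register demand grows.
for regs in (32, 64, 128):
    print(regs, dynamic_occupancy(regs), fixed_occupancy(regs))
```

Under these assumed numbers the shared-pool model drops to a quarter of its peak occupancy as the register count climbs from 32 to 128, while the fixed-partition model stays flat. That is the effect being described, not a measurement of any particular chip.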
Even though it is possible to implement geometry shaders efficiently, the fact that two of the three vendors don’t do it that way means that the GS is not a practical choice for production use. It should be avoided wherever possible.
The GS is flawed in that it injects a serialized, high-bandwidth operation into an already serialized part of the pipeline. It requires a lot of per-thread storage, and it is clearly a very unnatural fit for wide SIMD machines. However, this little exercise has made me wonder whether it can be redeemed by spreading a single instance across multiple warps/wavefronts, squeezing ILP out of a DLP architecture. Perhaps I’ll try to write a compute shader that does this.
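As a back-of-the-envelope illustration of the per-thread storage problem, the sketch below estimates how much output space has to be reserved if one GS instance runs per SIMD lane. The shader shape (64 output vertices of 16 floats each) and the lane counts are hypothetical values picked for the example, and the one-instance-per-lane mapping is an assumption.

```python
BYTES_PER_SCALAR = 4  # 32-bit floats


def gs_output_bytes(max_vertices, scalars_per_vertex, simd_width):
    """Worst-case output storage for one GS instance and for a full SIMD group.

    Assumes one GS instance per SIMD lane, each reserving its declared maximum.
    """
    per_instance = max_vertices * scalars_per_vertex * BYTES_PER_SCALAR
    per_group = per_instance * simd_width
    return per_instance, per_group


# Hypothetical shader: up to 64 output vertices, 16 floats per vertex.
for width in (32, 64):
    per_instance, per_group = gs_output_bytes(64, 16, width)
    print(f"SIMD{width}: {per_instance} B per instance, {per_group} B per group")
```

With these assumed numbers a single instance reserves 4 KB, so a 32-wide group needs 128 KB and a 64-wide group 256 KB of worst-case output space. That gives a feel for why splitting one instance across lanes looks attractive.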