Why Larrabee Stumbled
In December 2009, Intel announced that the release of its “Larrabee” GPU would be indefinitely delayed. In a very interesting article, Andrew Richards, CEO of Codeplay, explains why Intel's counterattack against NVIDIA and AMD failed.
He starts by explaining that Intel had no chance of catching up with NVIDIA or AMD using a similar GPU architecture:
What to do? Well, the solution for Intel was to put everything in software. (…) All they had to do was develop a very high-quality x86-based data-parallel multi-core CPU/GPU.
The big problem is that game graphics isn't really “data parallel”, as Richards explains:
Every pixel can be shaded at the same time, making it possible to implement very parallel, therefore very fast, data-parallel processors. So, it is now widely understood that games and graphics exhibit the holy grail of performance: data parallelism. (…) Game graphics isn’t really data parallel at all. Not in the strict sense. The type of parallelism in game graphics is much more complex: it consists of several different types of parallelism, the most common being “pipeline parallelism”, which is why people talk about the “graphics pipeline”. What actually happens in game graphics is that there is special hardware, which is different for different GPUs, which makes sure that only pixels that don’t overlap get given to the data-parallel shader cores.
On Larrabee, much of this work has to be done in software, by developers who are not specialists in multithreaded programming, on hardware that requires a lot of bandwidth to guarantee cache coherency between up to 16 or 32 cores. With 48 cores, the 1024-bit-wide on-chip bus would saturate: Larrabee is not a scalable architecture.
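To make the point about overlapping pixels more concrete, here is a minimal toy sketch in Python of the kind of bookkeeping that has to happen somewhere, in hardware or in software. It is only an illustration under simple assumptions, not Larrabee's renderer nor any real GPU's scheduler: fragments are plain (x, y, color) tuples, and the function name batch_fragments is made up for this example. Fragments that land on different pixels can be shaded in parallel, while fragments that land on the same pixel must keep their submission order, so they are pushed into successive batches.

```python
from collections import defaultdict

def batch_fragments(fragments):
    """Split fragments into ordered batches with no pixel overlap inside a batch.

    fragments: list of (x, y, color) tuples in submission order (hypothetical format).
    Fragments within one batch touch distinct pixels, so they could be shaded
    in parallel; batches themselves must be processed one after another.
    """
    # For each pixel, the index of the first batch that does not yet contain it.
    next_batch = defaultdict(int)
    batches = []
    for frag in fragments:
        x, y, _color = frag
        i = next_batch[(x, y)]
        if i == len(batches):
            batches.append([])  # this pixel overlaps every existing batch
        batches[i].append(frag)
        next_batch[(x, y)] = i + 1
    return batches

if __name__ == "__main__":
    frags = [
        (0, 0, "red"),
        (1, 0, "green"),
        (0, 0, "blue"),   # overlaps the first fragment, must wait
        (2, 0, "white"),
        (0, 0, "black"),  # overlaps again, must wait for "blue"
    ]
    for n, batch in enumerate(batch_fragments(frags)):
        print(n, batch)
```

Even in this toy version, the parallel part (shading one batch) is trivial compared with the sequential bookkeeping around it, which is the part that fixed-function hardware normally hides from the programmer.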
The biggest shock for Larrabee was the discovery that it takes longer to implement something in software on a multi-core CPU than custom hardware. Larrabee should be able to have the latest DirectX features working before the competition, but in reality they were last. The first was AMD, who actually implemented more DirectX features in hardware than NVIDIA. So, we have the opposite of conventional wisdom: custom-hardware gives better time-to-market than software-on-a-multicore-CPU.
Andrew Richards ends his article by arguing that the best architecture will be the one that requires the least power for the task, as a new risk draws closer: “dark silicon” (chips with so many transistors that they cannot all be powered at once). He concludes:
Intel will, of course, have to produce a GPU. But for the meantime, a Larrabee without graphics, but with data-parallel cores, will be a formidable HPC (High Performance Computing) accelerator.
This opinion seems to be shared by Intel, so Larrabee might come back from the dead one day…
- Andrew Richards, “Why Intel Larrabee Really Stumbled: Developer Analysis”, Bright Side of News, May 27, 2010
- Bill Kircos, “An Update On Our Graphics-related Programs”, Technology@Intel, May 25, 2010