NVIDIA GF100 Architecture Details

2010/01/18 JeGX

[youtube j9F3W-v6PNI]

GF100 tessellation demo

After the first global overview in September 2009, NVIDIA has released further details on its Fermi GF100 architecture. Here is a summary of the GF100 architecture's features, written as equations:

The CUDA core is the primary working unit of the GF100 (each CUDA core is fully IEEE 754-2008 compliant) – GF100 = 512 CUDA cores
Streaming Multiprocessor (SM) = 32 CUDA cores – GF100 = 16 SM
4 SFUs per SM (SFU – Special Function Unit – executes transcendental instructions such as sine, cosine, …) – GF100 = 64 SFUs
Graphics Processing Cluster (GPC) = 4 SM – GF100 = 4 GPC
1 Raster Engine per GPC (raster engine = rasterization, z-culling). A raster engine processes 8 pixels per clock – GF100 = 32 pixels per clock
1 PolyMorph Engine per SM (PolyMorph Engine: execution unit that handles geometry for GF100: vertex fetch, tessellation, viewport transform, attribute setup, and stream output) – GF100 = 16 PolyMorph Engines
4 Texture Units per SM – GF100 = 64 Texture Units
6 partitions of 8 ROPs (ROPs perform blending or AA) – GF100 = 48 ROPs
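
As a quick sanity check, the chip-wide totals can be derived from the per-SM, per-GPC and per-partition ratios quoted in the list. The short Python sketch below does only that arithmetic; the constant names and the `gf100_totals` helper are made up for this illustration and do not come from NVIDIA.

```python
# Ratios quoted in the list above for a full GF100 chip.
# All names here are hypothetical, chosen only for this example.
GPC_COUNT = 4                 # Graphics Processing Clusters per chip
SM_PER_GPC = 4                # Streaming Multiprocessors per GPC
CUDA_CORES_PER_SM = 32
SFU_PER_SM = 4                # Special Function Units per SM
POLYMORPH_ENGINES_PER_SM = 1
TEXTURE_UNITS_PER_SM = 4
RASTER_ENGINES_PER_GPC = 1
PIXELS_PER_CLOCK_PER_RASTER_ENGINE = 8
ROP_PARTITIONS = 6
ROPS_PER_PARTITION = 8


def gf100_totals():
    """Return chip-wide unit counts computed from the ratios above."""
    sm_count = GPC_COUNT * SM_PER_GPC
    raster_engines = GPC_COUNT * RASTER_ENGINES_PER_GPC
    return {
        "sm": sm_count,
        "cuda_cores": sm_count * CUDA_CORES_PER_SM,
        "sfu": sm_count * SFU_PER_SM,
        "polymorph_engines": sm_count * POLYMORPH_ENGINES_PER_SM,
        "texture_units": sm_count * TEXTURE_UNITS_PER_SM,
        "raster_pixels_per_clock": raster_engines * PIXELS_PER_CLOCK_PER_RASTER_ENGINE,
        "rops": ROP_PARTITIONS * ROPS_PER_PARTITION,
    }


if __name__ == "__main__":
    # Print one "name = value" line per unit type.
    for name, value in gf100_totals().items():
        print(f"{name} = {value}")
```

Everything except the ROPs scales with the SM or GPC count. The ROPs are grouped in their own 6 partitions, which is why their total (6 x 8) is not a multiple of the 16 SMs.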