
Here is a small demo that shows how NVIDIA GPUs render fragments by visualizing some NVIDIA-specific GLSL built-in variables:
– gl_ThreadInWarpNV
– gl_WarpIDNV
– gl_SMIDNV
These variables are available in a GLSL shader if the GL_NV_shader_thread_group extension is supported and enabled.
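If the shader also has to compile on drivers that lack the extension, one possible pattern is to enable it rather than require it, and to test the GL_NV_shader_thread_group preprocessor macro. The sketch below is only an illustration and is not part of the demo; the magenta fallback color is an arbitrary choice.

```glsl
#version 430
// 'enable' (unlike 'require') only warns when the extension is missing,
// so the shader can still compile and take the fallback path.
#extension GL_NV_shader_thread_group : enable

out vec4 FragColor;

void main()
{
#ifdef GL_NV_shader_thread_group
    // Extension available: shade by the thread's index inside its warp.
    float lane = float(gl_ThreadInWarpNV) / float(gl_WarpSizeNV - 1u);
    FragColor = vec4(0.0, lane, 0.0, 1.0);
#else
    // Fallback (arbitrary choice): flat magenta marks "extension not available".
    FragColor = vec4(1.0, 0.0, 1.0, 1.0);
#endif
}
```

The demo shader further down uses `require` instead, which makes compilation fail outright when the extension is missing.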
In short, on an NVIDIA GPU, a fragment is rendered by a thread, which executes the fragment shader for that particular fragment. Threads are grouped in warps of 32 threads each. Warps are themselves grouped in SMs (Streaming Multiprocessors). On the GTX 1080, each SM can hold up to 64 warps, or 2048 threads.
The GPU of the GeForce GTX 1080 (a GP104) contains 20 SMs, which gives us 1280 warps or 40960 threads. 40960 is the maximum number of threads in flight at any instant.
Seen from the GPU cores, the GTX 1080 has 2560 cores (128 per SM), and each core can have up to 16 threads in flight (2048 threads per SM / 128 cores per SM), which also leads to 2560*16 = 40960 threads. These 16 threads are resident on the SM at the same time rather than executed in the same clock cycle.

For a GTX 1080, the following two relations therefore give the same number of threads:
– Threads per GPU core (16) * number of GPU cores (2560) = 40960 threads
or
– Number of SMs (20) * number of threads per SM (2048) = 40960 threads
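The second relation can also be checked on the GPU itself, because the extension exposes its three factors as built-in uniforms (gl_SMCountNV, gl_WarpsPerSMNV and gl_WarpSizeNV). The fragment shader below is a hypothetical sketch, not part of the demo: expected_threads is an assumed uniform that the application would set to 40960, and the quad turns green when the product of the built-ins matches it.

```glsl
#version 430
#extension GL_NV_shader_thread_group : require

// Hypothetical uniform set by the application, e.g. 40960 for a GTX 1080.
uniform uint expected_threads;

out vec4 FragColor;

void main()
{
    // SMs * warps per SM * threads per warp.
    // With the GTX 1080 figures quoted above: 20 * 64 * 32 = 40960.
    uint max_threads = gl_SMCountNV * gl_WarpsPerSMNV * gl_WarpSizeNV;

    // Green if the driver-reported product matches the expected value, red otherwise.
    FragColor = (max_threads == expected_threads)
        ? vec4(0.0, 1.0, 0.0, 1.0)
        : vec4(1.0, 0.0, 0.0, 1.0);
}
```

On the application side, the uniform can be set with glUniform1ui. The values the driver actually reports depend on the GPU, so a red quad on other hardware only means that the product differs from the number passed in.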
Each executing thread knows which warp it belongs to thanks to gl_WarpIDNV, and which SM that warp runs on thanks to gl_SMIDNV. Within a warp, a thread is identified by its ID: gl_ThreadInWarpNV.
The following screenshot shows two simple quads. The top quad is colored by gl_WarpIDNV. The bottom quad is colored by gl_SMIDNV. In both cases, the color (black to red) is related to the ID number: when ID=0, the color is black, and the red intensity increases with the ID up to full red for ID=max_value. On both quads, I also added in green (black to green) the threads of the first warp (gl_WarpIDNV=0). As you can see, the threads of the first warp do not all work in the same zone but are spread over the whole quad.
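Taken together, the three IDs locate a thread in the SM / warp / thread hierarchy. As an illustration (this is not part of the demo), the following fragment shader flattens them into a single slot index and displays it as a gray level. It assumes that gl_WarpIDNV stays below gl_WarpsPerSMNV and gl_ThreadInWarpNV below gl_WarpSizeNV.

```glsl
#version 430
#extension GL_NV_shader_thread_group : require

out vec4 FragColor;

void main()
{
    // Total number of thread slots: SMs * warps per SM * threads per warp.
    uint slot_count = gl_SMCountNV * gl_WarpsPerSMNV * gl_WarpSizeNV;

    // Flatten (SM, warp, thread-in-warp) into one index in [0, slot_count - 1].
    uint slot = (gl_SMIDNV * gl_WarpsPerSMNV + gl_WarpIDNV) * gl_WarpSizeNV
              + gl_ThreadInWarpNV;

    // Black for slot 0, white for the last slot.
    float gray = float(slot) / float(slot_count - 1u);
    FragColor = vec4(gray, gray, gray, 1.0);
}
```

Because the SM ID is the most significant part of the index, fragments processed by the same SM end up with similar gray levels.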

Here is the vertex shader:
#version 150
in vec4 gxl3d_Position;
uniform mat4 gxl3d_ModelViewProjectionMatrix;
void main()
{
    gl_Position = gxl3d_ModelViewProjectionMatrix * gxl3d_Position;
}
and the fragment shader:
#version 430
#extension GL_NV_shader_thread_group : require
uniform int stg_mode; // 0 (by WarpID) or 1 (by SMID)
out vec4 FragColor;
void main()
{
    float r = 0.0;
    float g = 0.0;
    float b = 0.0;
    if (gl_WarpIDNV == 0)
    {
        // Threads of warp 0: black to green, by thread index within the warp.
        g = float(gl_ThreadInWarpNV) / float(gl_WarpSizeNV-1);
    }
    else
    {
        // All other threads: black to red, by warp ID or by SM ID.
        if (stg_mode == 0)
            r = float(gl_WarpIDNV) / float(gl_WarpsPerSMNV-1);
        else
            r = float(gl_SMIDNV) / float(gl_SMCountNV-1);
    }
    FragColor = vec4(r, g, b, 1.0);
}
References
- GL_NV_shader_thread_group extension specification
- Life of a triangle – NVIDIA’s logical pipeline
- How many concurrent threads are running on my GeForce GTX 1080 Ti?