OpenGL 4.2 Atomic Counter Demo: Rendering Order of Fragments








1 – Atomic Counters: Overview

A few weeks ago I found this video showing the order in which fragments are rendered:






Pretty cool. But how does it work? Answer: thanks to the use of atomic counters.

Atomic counters are a new feature of OpenGL 4.2 and are available in all shader stages: vertex, fragment (pixel), geometry, and tessellation control and evaluation. The complete specification is available here: GL_ARB_shader_atomic_counters.

OpenGL logo



In a word, an atomic counter is a memory buffer (actually a buffer object that stores one or more unsigned integers) that a shader can increment (GLSL built-in function atomicCounterIncrement()) or decrement (GLSL built-in function atomicCounterDecrement()).

GPU Caps Viewer, Atomic Counter, OpenGL 4.2
Information about atomic counters in GPU Caps Viewer



One simple but very interesting application of atomic counters is visualizing the rendering order of fragments. In the pixel shader, an atomic counter is incremented each time the shader is invoked. If you then convert the counter value to a color, you can quite easily visualize the order in which fragments are rendered. Nice!

Once the buffer object of the atomic counter is initialized and bound (see OpenGL code snippets at the end of the article), we can write to the atomic counter with the GLSL function atomicCounterIncrement().

The following shader shows where the first ten fragments are rendered:

#version 420 compatibility
layout(binding = 0, offset = 0) uniform atomic_uint ac;
void main(void)
{
  // One atomic increment per fragment shader invocation.
  uint counter = atomicCounterIncrement(ac);
  if (counter < 10)
    gl_FragColor = vec4(1, 0, 0, 1); // first ten fragments: red
  else
    gl_FragColor = vec4(0, 0, 0, 1); // everything else: black
}



The following images show these first ten fragments:

GeeXLab, Atomic Counter demo on GeForce GTX 460
The ten first fragments

GeeXLab, Atomic Counter demo on GeForce GTX 460
Zoom in on the ten first fragments



Now let’s look at the rendering order of all pixels of a quad (made up of two triangles) with the following shader:

#version 420 compatibility
layout(binding = 0, offset = 0) uniform atomic_uint ac;
void main(void)
{
  uint counter = atomicCounterIncrement(ac);
  // counter/255 is an integer division: it quantizes [0, 65535] into 1/255 steps.
  float r = (counter / 255) / 255.f;
  gl_FragColor = vec4(r, 0, 0, 1);
}

On screen, the quad is 256×256 pixels (65,536 fragments to render). The first fragments are rendered in black, the last ones in red. Here is the rendering order on a GeForce GTX 460:

GeeXLab, Atomic Counter demo on GeForce GTX 460
Atomic counter, pixels rendering order on a GeForce GTX 460



And now the rendering order of a Radeon HD 6970:

GeeXLab, Atomic Counter demo on Radeon HD 6970
Atomic counter, pixels rendering order on a Radeon HD 6970



Or do you prefer the rendering order of a Radeon HD 7770:

GeeXLab, Atomic Counter demo on Radeon HD 7770
Atomic counter, pixels rendering order on a Radeon HD 7770



As you can see, the rasterization patterns are quite different. The GeForce draws many small areas and fills both triangles from top to bottom, while the Radeon draws larger zones and fills one triangle from top to bottom and the other from bottom to top. The HD 6000 GPU seems to draw fragments in groups of 32×32 pixels (the size of the squares visible in the screenshot). The Radeon HD 7000's pattern looks closer to that of the GeForce GTX 400.



2 – Atomic Counters: the Demo


The demo for GeeXLab is available in the GLSL_Atomic_Counter/ folder of GeeXLab code sample pack:
Download GeeXLab Code Samples Pack Version 2009.10.02
This demo requires GeeXLab 0.3.3+.


How to run the demo: start GeeXLab and drop the demo file (DEMO_Atomic_Counter.xml) in GeeXLab. That’s all.

GeeXLab, OpenGL 4.2 demo: atomic counter, live coding



3 – Atomic Counters: OpenGL Details

For OpenGL developers, here are some pieces of code that show how to initialize and use atomic counters. First, the OpenGL code to create an atomic counter buffer object:

GLuint ac_buffer = 0;
glGenBuffers(1, &ac_buffer);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, ac_buffer);
// Allocate storage for one GLuint; the content is still undefined at this point.
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);



Now, how to bind the atomic counter buffer object to the binding point declared in the shader (layout(binding = 0)):

glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, ac_buffer);



And finally, how to reset the value of the atomic counter:

glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, ac_buffer);
GLuint* ptr = (GLuint*)glMapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint),
                                        GL_MAP_WRITE_BIT |
                                        GL_MAP_INVALIDATE_BUFFER_BIT |
                                        GL_MAP_UNSYNCHRONIZED_BIT);
ptr[0] = value; // typically value = 0 to restart the counter each frame
glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);

12 thoughts on “OpenGL 4.2 Atomic Counter Demo: Rendering Order of Fragments”

  1. Groovounet

    @fellix

    The froggy fragsniffer works on AMD hardware as well, and there is no OpenGL driver issue there.

    However, using the froggy fragsniffer requires tuning even on NVIDIA, and it will either produce no visible result or reboot your PC.
    It also requires WinXP, because WinVista+ forbids the OpenGL driver from rendering directly to the framebuffer.

    On Radeon there is no reboot, but we can’t see the expected result from Fragsniffer because the AMD driver renders to a texture even on WinXP and then blits to the framebuffer.

  2. Corwin

    What a mess! I don’t get why it works like that. I always thought that the framebuffer fills pixel by pixel, line by line, like the good old ZX Spectrum screen.

  3. Robert

    What a timing…
    I just finished my own ‘rasterization order visualization’ demo based on atomic counters:
    http://renderingpipeline.com/2012/03/gpu-rasterizer-pattern/

    But instead of coloring the fragments based on the counter value, I first stored them in an integer render target so that in a second pass I can watch the rasterization in ‘slow motion’ (videos of a quad and a killeroo are online).

    This kind of demo is really the ‘hello world’ of atomic counters (or even of OpenGL 4.2?) 😉

  4. Groovounet

    This would be very inefficient for the multiple caches a GPU contains. A more efficient and classic pattern is this one: http://en.wikipedia.org/wiki/Z-order_curve

    We can see that most of the Radeons’ processing order follows the Z-order, because atomic operations and fragment execution are synchronous.

    However, on GeForce they are asynchronous, so we can only see the high-level patterns of fragment execution, but I am pretty convinced that the underlying execution of the fragments follows a sort of Z-order approach.

  5. jK

    nice images :)
    The chess pattern on the Radeon HD 6970 is interesting.
    I never would have assumed that they’d change such fundamental things between the HD 6xxx & HD 7xxx series.

  6. Corwin

    It seems my 560 Ti has a problem with this demo: I get only around 30 fps and the demo is stuttering badly. I use the latest WHQL drivers.

  7. mbentrup

    Hmm, couldn’t you use this to measure vertex caching by incrementing a per-vertex counter in the VS?

    That way you should be able to see how often a vertex shader runs for each vertex.

  8. Robert

    mbentrup: You can’t do that. While you can visualize the run order of the vertex shader and how it’s interleaved with the fragment shader (see my post about rasterization order and specifically this video: http://www.youtube.com/watch?v=5vEAO6icz7M ), vertex caching (the post-transform cache) must be switched off when you use atomic counters. The cache assumes that the vertex shader is deterministic, which it was until OpenGL 4.2 introduced atomic counters – such a cache is not itself part of the OpenGL specs, and caching a VS that is non-deterministic would produce non-spec-conform output. The shader compiler has to check for such side effects and switch the cache off in those cases… (similarly, the early z-test gets switched off as soon as the FS modifies the Z value)

  9. mbentrup

    The OpenGL 4.2 spec explicitly states in the chapter about Memory Access Ordering that the number of Vertex shader invocations for a single vertex is implementation-defined, so there’s no reason to turn the Vertex cache off.

  10. Robert

    mbentrup: you’re right. Sadly, there is no guarantee that the behaviour with the counters is the same as without them. But worth a shot.
