Programmable Blending on Mobile and Desktop GPUs (OpenGL)






In the latest iteration of iOS (iOS 6), Apple has exposed a new OpenGL ES extension: GL_APPLE_shader_framebuffer_fetch. Even though it is limited to mobile platforms, this extension is interesting because it brings programmable blending. On all current OpenGL implementations, blending is configurable (glBlendFunc) but not programmable.

In short, GL_APPLE_shader_framebuffer_fetch allows a fragment shader to read the value currently stored in the framebuffer for the pixel being shaded. The shader can then combine this value with the color it has computed for the fragment (this is the programmable blending) and write the result back to the framebuffer.
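To make the idea concrete, here is a minimal sketch of classic alpha blending done in the shader instead of by the blending unit. It uses the gl_LastFragData built-in that the extension adds. The uniform name u_color is a hypothetical placeholder for wherever the application's source color comes from, and the sketch assumes fixed-function blending has been turned off with glDisable(GL_BLEND) so the result is not blended a second time.

```glsl
#extension GL_APPLE_shader_framebuffer_fetch : require
precision mediump float;

// Assumption: hypothetical uniform holding the source color
// (non-premultiplied alpha), set by the application.
uniform vec4 u_color;

void main()
{
  // Color already stored in the framebuffer for this pixel.
  vec4 dst = gl_LastFragData[0];

  // Same equation the blending unit applies for
  // glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA):
  //   result = src * src.a + dst * (1 - src.a)
  gl_FragColor = u_color * u_color.a + dst * (1.0 - u_color.a);
}
```

Nothing here goes beyond what glBlendFunc can already do; the point is only that the blend equation is now ordinary shader code and can be replaced by anything else.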


Update: NVIDIA already exposes the same functionality on its Tegra platform through GL_NV_shader_framebuffer_fetch. According to the spec, GL_NV_shader_framebuffer_fetch has existed since 2006. More info can be found HERE.


Here is the overview of GL_APPLE_shader_framebuffer_fetch from the specification:

Conventional OpenGL blending provides a configurable series of operations
that can be used to combine the output values from a fragment shader with
the values already in the framebuffer. While these operations are
suitable for basic image compositing, other compositing operations or
operations that treat fragment output as something other than a color
(normals, for instance) may not be expressible without multiple passes or
render-to-texture operations.

This extension provides a mechanism whereby a fragment shader may read
existing framebuffer data as input. This can be used to implement
compositing operations that would have been inconvenient or impossible with
fixed-function blending. It can also be used to apply a function to the
framebuffer color, by writing a shader which uses the existing framebuffer
color as its only input.
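
The last case mentioned in the overview, a shader whose only input is the existing framebuffer color, could look roughly like the following sketch. It converts whatever is already in the framebuffer to grayscale in place. The luma weights are the common Rec. 601 values and are my choice for the illustration, not something the specification prescribes.

```glsl
#extension GL_APPLE_shader_framebuffer_fetch : require
precision mediump float;

void main()
{
  // The existing framebuffer color is the only input of this shader.
  vec4 dst = gl_LastFragData[0];

  // Weighted sum of the channels (Rec. 601 luma weights, chosen for illustration).
  float luma = dot(dst.rgb, vec3(0.299, 0.587, 0.114));

  // Write the gray value back and leave alpha untouched.
  gl_FragColor = vec4(vec3(luma), dst.a);
}
```

Drawing any geometry that covers the region of interest (a full-screen quad, for instance) with this shader applies the function to those pixels, without a separate render-to-texture pass.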



GL_APPLE_shader_framebuffer_fetch introduces a new built-in variable in GLSL: gl_LastFragData, which is actually an array:

#extension GL_APPLE_shader_framebuffer_fetch : enable
vec4 gl_LastFragData[gl_MaxDrawBuffers];



For example, additive blending can be achieved with something like this:

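#extension GL_APPLE_shader_framebuffer_fetch : require
precision mediump float; // OpenGL ES fragment shaders need a default float precision

// Placeholder: stands in for whatever computes the fragment's own color.
vec3 get_some_kool_color() { return vec3(0.2, 0.1, 0.0); }

void main()
{
  vec3 c = get_some_kool_color();
  // Additive blend: new color plus the color already in the framebuffer.
  gl_FragColor.rgb = c + gl_LastFragData[0].rgb;
  gl_FragColor.a = 1.0;
}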
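
Additive blending is still something glBlendFunc(GL_ONE, GL_ONE) can do. As a sketch of a blend mode that the fixed-function stage cannot express in a single pass, here is an "overlay" blend, where the formula depends on the destination value of each channel. The uniform u_color and the helper overlay() are hypothetical names introduced for this illustration.

```glsl
#extension GL_APPLE_shader_framebuffer_fetch : require
precision mediump float;

// Assumption: hypothetical uniform holding the source color, set by the application.
uniform vec4 u_color;

// Overlay blend, computed per channel:
//   dst <  0.5 : 2 * src * dst                  (multiply branch)
//   dst >= 0.5 : 1 - 2 * (1 - src) * (1 - dst)  (screen branch)
vec3 overlay(vec3 dst, vec3 src)
{
  vec3 dark  = 2.0 * src * dst;
  vec3 light = 1.0 - 2.0 * (1.0 - src) * (1.0 - dst);
  // step(0.5, dst) is 0.0 where dst < 0.5 and 1.0 elsewhere,
  // so mix() selects the matching branch for each channel.
  return mix(dark, light, step(0.5, dst));
}

void main()
{
  vec3 dst = gl_LastFragData[0].rgb;
  vec3 blended = overlay(dst, u_color.rgb);

  // Source alpha fades between the untouched framebuffer color and the blended one.
  gl_FragColor = vec4(mix(dst, blended, u_color.a), 1.0);
}
```

Because the branch is selected by the destination color itself, this cannot be written as a fixed pair of source and destination factors, which is exactly the kind of operation the extension is meant for.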



Now the important question: will we see framebuffer fetch on current desktop graphics hardware in the near future? The very short answer is: no. And the short explanation is: the design of the graphics hardware on mobile platforms makes it possible to access the content of the framebuffer from the fragment shader, while this is not possible on current desktop GPUs because it would involve significant architectural changes. Maybe in a few years (NVIDIA Maxwell or AMD Pirate Islands GPUs?) …

For the curious reader, here’s a more detailed explanation that @grahamsellers gave me about the difference between mobile and desktop GPUs:


The reason that this works (or is possible) on the mobile cores is that they are tile based deferred renderers and shade all the geometry that touches a tile in huge batches. When the GPU is about to shade a tile, the contents of the tile are brought into on chip registers (not even memory, real registers) and stored there for the life of the tile. As registers, the GPU has extremely low latency access to them and so can provide their current values to the shader core.

On a desktop core, although rendering is still tile-based, it is not deferred and we have more of an immediate mode architecture where primitives are rendered as they are received by the core and are not batched up. The tile based blending hardware (the render backend) still keeps data on-chip, but using more of a traditional cache architecture. The shader core has no direct access to the contents of the cache. Also, because geometry is not sorted prior to blending, there is dedicated hardware in the backend to re-sort all the fragments to keep blending order correct. Primitives may be shaded out of order, so while the shader is running the most recent data isn’t even in the cache yet, which means having access to it really wouldn’t help.









