[Test] OpenGL Geometry Instancing: GeForce GTX 480 vs Radeon HD 5870


3 – Geometry Instancing Techniques



Here is the description of each geometry instancing technique I used:

  • F2 key: simple instancing: there is one source for geometry (a mesh) and this geometry is rendered once per instance. The transformation matrix of each instance is calculated on the CPU. OpenGL rendering uses the glDrawElements() function. The number of draw calls is equal to the number of instances. This is the simplest and least efficient geometry instancing technique…

  • F3 key: slow pseudo-instancing: there is one source for geometry (a mesh) and this geometry is rendered once per instance. Now the transformation matrix is computed on the GPU and per-instance data is passed via uniform variables (a vec4 for the position and a vec4 for the orientation). OpenGL rendering uses the glDrawElements() function. The number of draw calls is equal to the number of instances. This technique is faster than F2 (simple instancing).

  • F4 key: pseudo-instancing: there is one source for geometry (a mesh) and this geometry is rendered once per instance. The transformation matrix is computed on the GPU and per-instance data is passed via persistent vertex attributes. Persistent vertex attributes are, for example, the normal, the texture coordinates or the color (set with glNormal(), glMultiTexCoord() and glColor() respectively). NVIDIA described this technique in the following whitepaper: GLSL Pseudo-Instancing. OpenGL rendering uses the glDrawElements() function. The number of draw calls is equal to the number of instances. Pseudo-instancing is extremely efficient on NVIDIA hardware. See results below…

  • F5 key: geometry instancing: this is real hardware instancing (HW GI). There is one source for geometry (a mesh) and rendering is done in batches of 64 instances per draw call. On NVIDIA hardware, 400 instances can actually be rendered with one draw call, but that does not work on ATI because of the limit on the number of vertex uniforms. 64 instances per batch work fine on both ATI and NVIDIA. The transformation matrix is computed on the GPU and per-batch data is passed via uniform arrays: there is one uniform array of vec4 for positions and another vec4 array for rotations. OpenGL rendering uses the glDrawElementsInstancedARB() function. The GL_ARB_draw_instanced extension is required. HW GI drastically reduces the number of draw calls: for the 20,000-asteroid belt, we have 20000/64 = 312.5, rounded up to 313 draw calls instead of 20,000.

  • F6 key: geometry instancing with uniform buffer: this is still real hardware instancing (HW GI). There is one source for geometry (a mesh) and rendering is done in batches of… 1000 instances per draw call. A 1000-instance batch works fine on the GTX 480 and the HD 5870. The transformation matrix is computed on the GPU and per-batch data is passed via a big buffer of uniforms: a uniform buffer object or UBO. This technique requires support for the GL_ARB_uniform_buffer_object extension. UBO allows a huge reduction in the number of draw calls: with 20,000 instances and 1000 instances per draw call, the complete asteroid belt requires only 20 draw calls! Like the previous GI technique, OpenGL rendering uses the glDrawElementsInstancedARB() function.
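
To make the draw-call arithmetic above concrete, here is a minimal C++ sketch of how the instances could be split into batches. It is not the code of the test application: the names Batch, draw_call_count(), make_batches() and vec4_uniforms_per_batch() are hypothetical helpers introduced for illustration, and no OpenGL call is actually made (the comment only shows where glDrawElementsInstancedARB() would go).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one instanced draw call covering
// the instances [first, first + count).
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Number of draw calls needed for `instances` objects when each call
// renders at most `batch_size` of them (integer division rounded up).
std::size_t draw_call_count(std::size_t instances, std::size_t batch_size) {
    assert(batch_size > 0);
    return (instances + batch_size - 1) / batch_size;
}

// Split the instances into consecutive batches; the last one may be smaller.
std::vector<Batch> make_batches(std::size_t instances, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<Batch> batches;
    for (std::size_t first = 0; first < instances; first += batch_size) {
        const std::size_t count = std::min(batch_size, instances - first);
        // In a real renderer (assumption): upload the per-batch positions and
        // rotations here, then issue one call such as
        // glDrawElementsInstancedARB(mode, index_count, type, indices, count).
        batches.push_back({first, count});
    }
    return batches;
}

// vec4 uniforms consumed by one batch in the F5 layout described above:
// one vec4 for the position plus one vec4 for the rotation per instance.
std::size_t vec4_uniforms_per_batch(std::size_t batch_size) {
    return 2 * batch_size;
}
```

With these helpers, draw_call_count(20000, 1) gives the 20,000 calls of F2 to F4, draw_call_count(20000, 64) gives the 313 calls of F5 (the last batch holds only 32 asteroids) and draw_call_count(20000, 1000) gives the 20 calls of F6. vec4_uniforms_per_batch() shows why the batch size is tied to the vertex uniform limit in F5: a 64-instance batch needs 128 vec4 uniforms, while a 400-instance batch needs 800.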








Geeks3D.com
