OpenCL 1,000,000 particles demo
OpenCL 1,000,000 particles demo (with sine)
- GPU Computing: GeForce and Radeon OpenCL Test (Part 1)
- GPU Computing: GeForce and Radeon OpenCL Test (Part 2)
- GPU Computing: GeForce and Radeon OpenCL Test (Part 3)
Third OpenCL Test: 1,000,000 Particles
This demo is a direct implementation of the 1,000,000 particles demo you can find HERE. There is a batch file (Start_OpenCL_Particles.bat) in the GPU Caps folder that lets you run either the normal version of the OpenCL kernel or the sine version. You can also change the number of particles (for example, /cl_particles_gpu=2000000).
This demo uses an interesting feature of OpenCL: GL interoperability, or GL Interop. GL Interop is the ability to share data between OpenCL and OpenGL: OpenCL can directly update OpenGL buffers without going through the host app. By default GL Interop is disabled in GPU Caps. To enable it, use this command line parameter: /cl_demo_gl_interop_enabled.
When GL Interop is enabled, particle positions are computed by an OpenCL kernel and OpenCL directly updates an OpenGL VBO with the new positions. This OpenGL VBO is used to render the particles.
When GL Interop is disabled, particle positions are still computed by an OpenCL kernel, but this time all positions are copied (first copy) from OpenCL into a buffer in the host app's memory space. This buffer is then used to update (second copy) an OpenGL VBO for particle rendering. GL Interop avoids both extra copies.
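To get a feel for what those two copies cost, here is a small back-of-the-envelope sketch in C. It is not code from GPU Caps: the function names are hypothetical, and it assumes each particle position is stored as 4 floats (x, y, z, w), which the demo may or may not do.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one particle position = 4 floats (x, y, z, w). */
enum { FLOATS_PER_PARTICLE = 4 };

/* Size in bytes of the whole position buffer, i.e. the cost of one full copy. */
size_t position_buffer_bytes(size_t particle_count)
{
    size_t bytes_per_particle = FLOATS_PER_PARTICLE * sizeof(float);

    /* Guard against size_t overflow for absurdly large particle counts. */
    assert(particle_count <= (size_t)-1 / bytes_per_particle);
    return particle_count * bytes_per_particle;
}

/* Bytes copied per frame: two full copies (OpenCL -> host, host -> VBO)
   when GL Interop is disabled, none when it is enabled. */
size_t bytes_copied_per_frame(size_t particle_count, int gl_interop_enabled)
{
    size_t copies = gl_interop_enabled ? 0 : 2;
    return copies * position_buffer_bytes(particle_count);
}
```

Under these assumptions (and a 4-byte float), 1,000,000 particles give 16,000,000 bytes per copy, so 32,000,000 bytes are moved every frame when GL Interop is disabled and none when it is enabled.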
All tests have been done with GPU Caps 1.8.2 PRO (GPU Caps Viewer 1.8.2 is also fine but there is no benchmarking support) with the following system:
– Windows Vista SP2 32-bit
– system memory: 2GB 1333 DDR3
– CPU: Intel Core 2 Extreme CPU X9650 @ 3.00GHz
– NVIDIA driver: R195.62
– AMD driver: Catalyst 9.12 hotfix
1,000,000 particles – GL Interop disabled
1,000,000 particles – GL Interop enabled
Currently Radeons don’t support GL Interop due to the lack of the cl_khr_gl_sharing extension on AMD’s platform (that’s why GL Interop is disabled by default in GPU Caps). See HERE for more details.
But on the NVIDIA platform this extension is supported, and the performance boost is impressive: from 85 FPS to around 200 FPS for the GTX 280! Same thing for the GTS 250. And the GTS 250 is faster than the Radeon HD 5870. Is it still a matter of optimization in my OpenCL code for the HD 5000 series, or a bug in AMD’s OpenCL implementation?
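A host app can decide whether to take the GL Interop path by looking for cl_khr_gl_sharing in the device's extension string. The sketch below is one possible way to do that check in C, not the code used in GPU Caps. The helper name has_extension is hypothetical, and the string is assumed to come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS, which returns a space-separated list of extension names.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in `ext_list`,
   0 otherwise. A plain strstr() alone would also match longer names that
   merely start with or contain `name`. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    assert(ext_list != NULL && name != NULL);
    len = strlen(name);
    if (len == 0)
        return 0;

    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must begin at the start of the list or after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and must end at the end of the list or before a space. */
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

If has_extension(extensions, "cl_khr_gl_sharing") returns 0, the app can fall back to the two-copy path through host memory.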
Fourth OpenCL Test: 4D Quaternion Julia
OpenCL 4D Quaternion Julia demo
This demo is a direct port of the Ray Traced Quaternion Julia Set sample. The OpenCL kernel is available in the media folder of the GPU Caps directory. The Quaternion Julia Set demo has the same GL Interop option as the 1M particles demo.
Ray Traced Quaternion Julia Set – GL interop disabled
I’m sorry but I can’t find the graph for GL Interop enabled and I don’t remember if there was a particular problem…
These results reflect the current state of NVIDIA and AMD OpenCL drivers. So I guess that all these results will vary with new ForceWare and Catalyst drivers (and my changes in my OpenCL code 😉 ).
With the first OpenCL drivers (R190.89 or R195.39), GeForce cards required some optimizations in the OpenCL code (especially an explicit work group size and native_ functions) to reveal their potential, while Radeon cards were not much affected by these optimizations.
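For readers who have not played with an explicit work group size: in OpenCL 1.0, when you pass a local work size to clEnqueueNDRangeKernel, the global work size must be a multiple of it. A common approach is to round the global size up and let the kernel skip the extra work items. The helper below is a minimal sketch of that rounding in C. The function name and the value 256 are hypothetical and are not taken from the GPU Caps code.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds `work_items` up to the next multiple of `local_size` so the result
   can be used as a global work size together with an explicit local size.
   The kernel then needs a bounds check (global id >= work_items: return). */
size_t round_up_global_size(size_t work_items, size_t local_size)
{
    size_t groups;

    assert(local_size > 0);
    groups = (work_items + local_size - 1) / local_size; /* ceiling division */
    return groups * local_size;
}
```

With 1,000,000 particles and a local size of 256, this gives a global size of 1,000,192, so the last work group contains 192 idle work items.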
Now with the latest R195.62, NVIDIA has improved its OpenCL implementation and things work much better.
But there are still some problems, like a GTS 250 faster than a GTX 280 in some tests, or no GL Interop support on the AMD platform.
OpenCL is not yet mature and that’s totally understandable: the first OpenCL implementations emerged only a few months ago…
I’ll run another benchmark session when OpenCL drivers are more stable and, above all, when AMD properly supports OpenCL (GL Interop, no need to install the ATI Stream SDK).
And with NVIDIA’s GT100 that should be available soon, new OpenCL tests will be very interesting. So stay tuned!
Don’t hesitate to post your remarks or interpretations of these results.