[TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 4 and Conclusion)

2010/01/19 JeGX

OpenCL 1,000,000 particles demo

OpenCL 1,000,000 particles demo (with sine)

Third OpenCL Test: 1,000,000 Particles

This demo is a direct implementation of the 1,000,000 particles demo you can find HERE. A batch file (Start_OpenCL_Particles.bat) in the GPU Caps folder lets you run either the normal version of the OpenCL kernel or the sine version. You can also change the number of particles (for example, /cl_particles_gpu=2000000).

This demo uses an interesting feature of OpenCL: GL interoperability, or GL Interop. GL Interop is the ability to share data between OpenCL and OpenGL: OpenCL can directly update OpenGL buffers without going through the host app. By default, GL Interop is disabled in GPU Caps. To enable it, use this command line parameter: /cl_demo_gl_interop_enabled.

When GL Interop is enabled, particle positions are computed by an OpenCL kernel and OpenCL automatically updates an OpenGL VBO with the new positions. This OpenGL VBO is then used to render the particles.

When GL Interop is disabled, particle positions are still computed by an OpenCL kernel, but this time all positions are first copied from OpenCL into a buffer in the host app's memory space (first copy). This buffer is then used to update an OpenGL VBO for particle rendering (second copy). GL Interop avoids both extra copies.
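To get a feel for the cost of these two copies, here is a rough back-of-the-envelope sketch in Python. It assumes each particle position is stored as a float4 (16 bytes). That layout is an illustrative assumption, not a detail taken from the demo's kernel, and the function name is hypothetical.

```python
# Rough estimate of the data copied per frame when GL Interop is disabled.
# ASSUMPTION: each particle position is a float4 (4 x 32-bit floats = 16 bytes).
BYTES_PER_PARTICLE = 4 * 4


def copy_traffic_bytes(num_particles, copies_per_frame=2):
    """Bytes moved per frame: OpenCL -> host buffer, then host buffer -> VBO."""
    return num_particles * BYTES_PER_PARTICLE * copies_per_frame


per_frame = copy_traffic_bytes(1_000_000)
print(per_frame / 1e6, "MB copied per frame")  # prints: 32.0 MB copied per frame
```

Under the same assumption, enabling GL Interop removes this traffic entirely, since the positions stay in GPU memory.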

All tests were run with GPU Caps 1.8.2 PRO (GPU Caps Viewer 1.8.2 also works but has no benchmarking support) on the following system:
– Windows Vista SP2 32-bit
– system memory: 2GB 1333 DDR3
– CPU: Intel Core 2 Extreme CPU X9650 @ 3.00GHz
– NVIDIA driver: R195.62
– AMD driver: Catalyst 9.12 hotfix

1,000,000 particles – GL Interop disabled

1,000,000 particles – GL Interop enabled

Currently, Radeons don’t support GL Interop because AMD’s platform lacks the cl_khr_gl_sharing extension (that’s why GL Interop is disabled by default in GPU Caps). See HERE for more details.

On the NVIDIA platform, however, this extension is supported and the performance boost is impressive: from 85 FPS to around 200 FPS for the GTX 280! The same goes for the GTS 250. And the GTS 250 is faster than the Radeon HD 5870. Is this still a matter of optimizing my OpenCL code for the HD 5000 series, or a bug in AMD’s OpenCL implementation?
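To put the GTX 280 numbers in perspective, FPS can be converted to per-frame time. The short Python sketch below uses only the two figures quoted above (85 FPS and roughly 200 FPS); the helper names are hypothetical.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def speedup(fps_before, fps_after):
    """How many times faster the second configuration runs."""
    return fps_after / fps_before


without_interop = 85.0   # GTX 280, GL Interop disabled
with_interop = 200.0     # GTX 280, GL Interop enabled (approximate)

saved = frame_time_ms(without_interop) - frame_time_ms(with_interop)
print(f"speedup: {speedup(without_interop, with_interop):.2f}x")   # speedup: 2.35x
print(f"time saved per frame: {saved:.2f} ms")                     # time saved per frame: 6.76 ms
```

In other words, removing the two copies saves close to 7 ms out of a frame of about 12 ms on this card.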

Fourth OpenCL Test: 4D Quaternion Julia

OpenCL 4D Quaternion Julia demo

This demo is a direct port of the Ray Traced Quaternion Julia Set sample. The OpenCL kernel is available in the media folder of the GPU Caps directory. The Quaternion Julia Set demo has the same GL Interop option as the 1M particles demo.

Ray Traced Quaternion Julia Set – GL interop disabled

I’m sorry but I can’t find the graph for GL Interop enabled and I don’t remember if there was a particular problem…

Conclusion

These results reflect the current state of the NVIDIA and AMD OpenCL drivers, so I expect them to change with new ForceWare and Catalyst drivers (and with changes to my OpenCL code 😉 ).

With the first OpenCL drivers (R190.89 or R195.39), GeForce cards required some optimizations in the OpenCL code to reveal their potential, while Radeon cards were not much affected by these optimizations (especially explicit work group size and native_ functions).
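As a small illustration of what an explicit work group size implies: in OpenCL 1.x, when a local work size is passed to clEnqueueNDRangeKernel, the global work size must be a multiple of it. A common approach is to round the global size up and let the kernel skip the padding work-items. The sketch below shows only that host-side arithmetic in Python; the work group size of 256 is a hypothetical value, not necessarily the one used in GPU Caps.

```python
def round_up_global_size(num_items, local_size):
    """Smallest multiple of local_size that is >= num_items."""
    groups = (num_items + local_size - 1) // local_size
    return groups * local_size


LOCAL_SIZE = 256  # hypothetical explicit work group size
global_size = round_up_global_size(1_000_000, LOCAL_SIZE)

# 3907 work groups of 256 work-items cover the 1,000,000 particles,
# leaving 192 padding work-items that the kernel would have to ignore.
print(global_size, global_size // LOCAL_SIZE)  # prints: 1000192 3907
```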

Now, with the latest R195.62, NVIDIA has improved its OpenCL implementation and things work much better.

But there are still some problems, such as a GTS 250 being faster than a GTX 280 in some tests, or the lack of GL Interop support on the AMD platform.

OpenCL is not yet mature, which is totally understandable: the first OpenCL implementations emerged only a few months ago…

I’ll run another benchmark session when the OpenCL drivers are more stable and, above all, when AMD properly supports OpenCL (GL Interop, no need to install the ATI Stream SDK).

And with NVIDIA’s GT100 that should be available soon, new OpenCL tests will be very interesting. So stay tuned!

Don’t hesitate to post your remarks or interpretations of these results.

12 thoughts on “[TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 4 and Conclusion)”

Robin 2010/01/19 at 14:30

I bought a GT240 yesterday for game dev. Nvidia is taking too long for me.

Mohamad 2010/01/19 at 19:27

This shows nVidia is much more mature than ATI when it comes to GPGPU. Fermi is not here yet and GT200 is comparable to Radeon 5000.

During my tests, ATI’s driver had a lot of issues. Now I am using a GTX 280 and I am happy. No bugs in the driver, and I can use Linux much more safely and bug free.

Grey 2010/08/17 at 15:43

Has this benchmark been run again with the new SDK, and are the results available? I’m searching for the best-fitting graphics card for my master’s thesis with OpenCL. It should not be too expensive. I thought an HD5870 would be a good choice (~370€), but now I’m not sure any more.

Grey 2010/08/18 at 09:59

Ah, I see the page was updated. Thank you. Can anyone run this benchmark with an Nvidia 480GTX? It would be nice to know the result for this card, too.

JeGX (Post Author) 2010/08/18 at 10:16

I have the scores for a GTX 460 + Julia 4D:

– GL interop enabled
— NVIDIA GeForce GTX 460 (R258.96): 116 FPS

– GL interop disabled
— NVIDIA GeForce GTX 460 (R258.96): 103 FPS

Grey 2010/08/18 at 13:28

So this is already much faster than the 5870. I need it to implement some sort of ray tracing. Looks like the Nvidia cards are much faster in OpenCL than the ATI cards…

jinglin.zhang 2010/10/27 at 18:51

Can you share the code for test 2, which I am interested in? I am a beginner with OpenCL and I use the GTX250 and the NVIDIA SDK. Thanks for your contribution and sharing.

Simon 2010/11/04 at 07:20

Why are most OpenCL benchmarks 3D visualisations with results in frames per second? I thought that’s what graphics cards were already designed to do through other 3D SDKs like OpenGL and DirectX? The real benefit of OpenCL is its ability to support computational processing for certain highly parallel applications. I would like it if someone could design some benchmarks that achieve something other than 3D visualisation…
