GPU RayCaster Demo Using NVIDIA CUDA

Demo with post-processing effect

Demo without post-processing effect

MX^ADD (I hope I got the nickname right), a game developer, has released a raycaster demo that uses GPU acceleration via NVIDIA CUDA. CUDA is used to build the KD-Tree that serves as the main partitioning structure for the ray casting algorithm. This demo requires a CUDA-capable graphics card with compute capability 1.1 or better. The compute capability describes the features supported by a piece of CUDA hardware:

GPU                       Multiprocessors   Compute Capability
GeForce GTX 280           30                1.3
GeForce GTX 260           24                1.3
GeForce 9800 GX2          2×16              1.1
GeForce 9800 GTX          16                1.1
GeForce 8800 GTX/Ultra    16                1.0
GeForce 8800 GT           14                1.1
GeForce 8800 GTS          12                1.0
GeForce 9600 GSO          12                1.1
GeForce 9600 GT           8                 1.1
GeForce 8400 GS/GT        2                 1.1

An up-to-date table can be found HERE.

From this table you can see that the old but powerful GeForce 8800 GTX only has compute capability 1.0, so if you have such a card, you can say bye-bye to the demo. Even a weak GeForce 8400 GS can run the demo (in slide show mode, but it can run it!). At first I didn’t understand why NVIDIA had limited the compute capability of the GeForce 8800 GTX, and I didn’t think it was a hardware problem… Actually, it is a hardware problem (see Update 2 at the end of the post).
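To find out where your own card stands, you can ask the CUDA runtime directly. What follows is a minimal sketch and is not taken from the demo's source: it lists every CUDA device with its multiprocessor count and compute capability (the same two columns as the table above) and reports whether it reaches the 1.1 level the demo asks for. The helper name meetsCapability is my own invention for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the demo): true if compute capability
// major.minor is at least reqMajor.reqMinor.
static bool meetsCapability(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA-capable device found.\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices we cannot query
        }
        // The demo is stated to need compute capability 1.1 or better.
        bool ok = meetsCapability(prop.major, prop.minor, 1, 1);
        std::printf("%s: %d multiprocessors, compute capability %d.%d -> %s\n",
                    prop.name, prop.multiProcessorCount,
                    prop.major, prop.minor,
                    ok ? "meets 1.1" : "below 1.1");
    }
    return 0;
}
```

Built with nvcc and run on a machine with a CUDA driver, this should print one line per GPU. A GeForce 8800 GTX would be expected to show up as 1.0 and therefore fall below the requirement.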

This demo also requires the latest DirectX runtime (Nov 2008 – d3dx9_40.dll). However, I didn’t manage to launch the demo. I emailed the author to find out more about the problem (see the reply in Update 1).

Anyway, the full source code is provided, so even if you’re unlucky like me, you can still dive into the source code to learn how to use CUDA…

Update 1
I just received an email from the author. The demo requires the Microsoft Visual C++ 2008 Redistributable Package.
Once that is installed, you have to modify the registry with the file Data/register.reg and create the folder d:/MXRayCaster/. Once done, you can launch the demo… And don’t press the P key…

Update 2: GeForce 8800 GTX and Compute Capability 1.0
The author just emailed me some additional information about the compute capability of the GeForce 8800 GTX:

The GeForce 8800 GTX has compute capability 1.0, and it is a hardware limitation because the 8800 GTX’s memory controller is unable to do atomic operations across blocks. The 8800 GTX was made on the older G80 chip; all other GPUs from the 8*00 series were made on G92 (editor’s note: according to the CUDA documentation, the main feature of compute capability 1.1 is the support for atomic functions operating on 32-bit words in global memory). The G80 chip was built before NVIDIA pushed CUDA 1.0 to the public, and since CUDA 1.0 had no atomic operations at all, the 8800 GTX was fully CUDA compatible. The next generation of CUDA added atomic operations, and suddenly the 8800 GTX became a bit outdated. All in all, the 8800 GTX is capable of raycasting a KD/BIH tree; it is just UNABLE to construct one, since the index into the leaf indices table must be incremented atomically across the constructing blocks (sure, you could limit construction to just one block, but then the tree builder would chew through the data in something like 2 minutes instead of a few seconds).
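To make the atomic increment more concrete, here is a small sketch of the general pattern the author describes. It is not the demo's actual tree builder: the kernel name reserveLeafSlots, the array names, and the made-up leaf sizes are all assumptions for illustration. Each thread plays the role of one leaf and reserves a range in a shared table by atomically advancing a single counter in global memory, which is exactly the kind of operation that needs compute capability 1.1 or better. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i reserves leafSizes[i] consecutive entries
// in a shared table by atomically advancing the global counter *nextFree.
__global__ void reserveLeafSlots(const unsigned int* leafSizes,
                                 unsigned int* leafOffsets,
                                 unsigned int* nextFree,
                                 int leafCount) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < leafCount) {
        // atomicAdd returns the previous counter value, so every leaf gets
        // a distinct range even when threads from different blocks collide.
        leafOffsets[i] = atomicAdd(nextFree, leafSizes[i]);
    }
}

// Host-side reference: total number of table entries the leaves need.
static unsigned int totalEntries(const unsigned int* sizes, int n) {
    unsigned int total = 0;
    for (int i = 0; i < n; ++i) total += sizes[i];
    return total;
}

int main() {
    const int leafCount = 1024;
    const int threads = 128;
    const int blocks = (leafCount + threads - 1) / threads;  // 8 blocks

    unsigned int hSizes[leafCount];
    for (int i = 0; i < leafCount; ++i) hSizes[i] = 1 + i % 4;  // made-up sizes

    unsigned int *dSizes, *dOffsets, *dNextFree;
    cudaMalloc(&dSizes, sizeof(hSizes));
    cudaMalloc(&dOffsets, sizeof(hSizes));
    cudaMalloc(&dNextFree, sizeof(unsigned int));
    cudaMemcpy(dSizes, hSizes, sizeof(hSizes), cudaMemcpyHostToDevice);
    cudaMemset(dNextFree, 0, sizeof(unsigned int));

    reserveLeafSlots<<<blocks, threads>>>(dSizes, dOffsets, dNextFree, leafCount);

    // The blocking copy waits for the kernel to finish before reading.
    unsigned int reserved = 0;
    cudaMemcpy(&reserved, dNextFree, sizeof(reserved), cudaMemcpyDeviceToHost);
    unsigned int expected = totalEntries(hSizes, leafCount);
    std::printf("reserved %u entries, expected %u\n", reserved, expected);

    cudaFree(dSizes);
    cudaFree(dOffsets);
    cudaFree(dNextFree);
    return reserved == expected ? 0 : 1;
}
```

Without a global atomicAdd, the only safe way to share that counter is to keep all the work inside a single block, which is the slow fallback the author mentions for the 8800 GTX.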

