DirectX 11 includes several versions of compute shaders (CS): 5.0 (for DX11 hardware), 4.1 (for DX10.1 hardware) and 4.0 (for DX10 hardware). CS 4.x has some limitations, such as a maximum of 768 threads per group (threads per block in CUDA terminology) and 16 KB of thread group shared memory (shared memory per block in CUDA). If you have a GeForce 8 or later, you can use GPU Caps Viewer (CUDA panel) to get an overview of these values.
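To make these limits concrete, here is a minimal sketch of a helper that checks whether a thread group configuration fits a given compute shader profile. The CS 4.x numbers are the ones quoted above; the CS 5.0 numbers (1024 threads, 32 KB) are my assumption based on the Direct3D 11 documentation, and the names `CsLimits`, `kCs4x`, `kCs50` and `fitsProfile` are hypothetical, not part of any DirectX API.

```cpp
#include <cstdint>

// Hypothetical helper type: per-profile limits of a compute shader thread group.
struct CsLimits {
    std::uint32_t maxThreadsPerGroup;  // X * Y * Z of the thread group
    std::uint32_t maxSharedBytes;      // thread group shared memory, in bytes
};

// CS 4.x limits as quoted in the text: 768 threads, 16 KB shared memory.
constexpr CsLimits kCs4x{768, 16 * 1024};
// CS 5.0 limits: ASSUMED values (1024 threads, 32 KB), not stated in the text.
constexpr CsLimits kCs50{1024, 32 * 1024};

// Returns true if a group of x*y*z threads using sharedBytes of shared
// memory stays within the limits of the given profile.
inline bool fitsProfile(const CsLimits& lim,
                        std::uint32_t x, std::uint32_t y, std::uint32_t z,
                        std::uint32_t sharedBytes) {
    if (x == 0 || y == 0 || z == 0) return false;  // empty groups are invalid
    // Use 64-bit arithmetic so the product cannot overflow.
    const std::uint64_t total =
        static_cast<std::uint64_t>(x) * y * z;
    return total <= lim.maxThreadsPerGroup &&
           sharedBytes <= lim.maxSharedBytes;
}
```

For example, a 16x16x1 group (256 threads) with 4 KB of shared memory fits both profiles, while a 32x32x1 group (1024 threads) exceeds the 768-thread limit of CS 4.x and would only fit under the assumed CS 5.0 limits.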
According to X-bit labs, CS 5.0 will also offer better interaction with the graphics pipeline (e.g., it can output to textures). If I’m not wrong, CUDA and OpenCL already offer this kind of interaction between the compute API and the rendering API (for example, using PBOs with CUDA/OpenGL).
Anyway, compute APIs such as CUDA, OpenCL, ATI Stream and DirectX 11 Compute Shader will be in the spotlight in the coming months.