Particle systems are easy to parallelize because they consist of many independent objects that can be grouped in any way and processed in any order.
In this demo, Mike Yi and Quentin Froemke, two Intel engineers, show how to get the most out of your multi-core CPU with multi-threaded code that drives complex particle movement based on aerodynamic calculations (lift and drag). The demo uses Intel's n-way threading framework, Threading Building Blocks (TBB), along with some SIMD optimizations.
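To give a rough idea of why this kind of workload splits so well across cores, here is a minimal sketch. It is not the demo's code: it uses plain `std::thread` instead of TBB so that it compiles on its own, it models only gravity and a simple quadratic drag (no lift, no SIMD), and the `Particle` layout, the constants, and the function names are all made up for illustration. With TBB, a `tbb::parallel_for` over a `tbb::blocked_range` would take the place of the manual chunking.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical particle layout, not taken from the demo's source.
struct Particle {
    Vec3 position;
    Vec3 velocity;
    float mass;
};

// Illustrative constants (assumptions, not the demo's values).
constexpr float kAirDensity = 1.2f;   // kg/m^3
constexpr float kDragCoeff  = 0.5f;   // dimensionless
constexpr float kArea       = 0.01f;  // m^2
constexpr float kGravity    = 9.81f;  // m/s^2

// Advance particles in [begin, end) by one time step dt.
// Each particle only reads and writes its own state, so ranges
// handled by different threads never touch the same data.
void update_range(std::vector<Particle>& ps,
                  std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        Particle& p = ps[i];
        const Vec3 v = p.velocity;
        const float speed = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        // Quadratic drag: F = -0.5 * rho * Cd * A * |v| * v, divided by mass.
        const float k = 0.5f * kAirDensity * kDragCoeff * kArea * speed / p.mass;
        p.velocity.x += (-k * v.x) * dt;
        p.velocity.y += (-k * v.y - kGravity) * dt;
        p.velocity.z += (-k * v.z) * dt;
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
        p.position.z += p.velocity.z * dt;
    }
}

// Split the particle array into contiguous chunks, one per thread.
void update_parallel(std::vector<Particle>& ps, float dt, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    const std::size_t n = ps.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no particles left for this thread
        workers.emplace_back([&ps, begin, end, dt] {
            update_range(ps, begin, end, dt);
        });
    }
    for (auto& w : workers) w.join();
}
```

Calling `update_parallel(particles, 0.016f, 4)` once per frame would update the whole set on four threads. Spawning threads every frame is wasteful, which is one reason a task scheduler such as TBB, with its reusable worker pool, is a better fit in practice. On some toolchains this needs to be built with `-pthread`.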
As you can see in the screenshot, all four cores of my Intel X9650 are in use and the average frame rate is around 35 FPS.
The rendering is done with Direct3D 10.
You can download the binaries and source code HERE.