CPU PhysX: x87, SSE and PhysX SDK 3.0


Intel 8087 math coprocessor

PhysX is again under attack (see HERE or HERE for some examples). This time, the attack comes from the realworldtech.com website.

The article in question, PhysX87: Software Deficiency, reveals that most of the instructions in the PhysX CPU code path use the x87 floating-point instruction set (fp87). Many years ago, x87 instructions were executed by a separate coprocessor, the 8087, designed to help the main processor (the 8086) with math operations. Today, a modern physics engine should use the SSE / SSE2 (Streaming SIMD Extensions) instruction sets for its math. And according to the article's author, SSE code would make PhysX 2X to 4X faster overall than the original fp87 code.

On this Wikipedia page about SSE2, we can find:

The FPU (x87) instructions usually store intermediate results with 80 bits of precision. When legacy FPU software algorithms are ported to SSE2, certain combinations of math operations or input datasets can result in measurable numerical deviation: this is of critical importance to scientific computations, if the calculation results must be compared against results generated from a different machine architecture.

Simply recompiling PhysX with the SSE2 instruction set might lead to different, possibly incorrect, calculation results; maybe this is part of the answer…

In this article, Bryan Del Rizzo (senior PR manager at NVIDIA) said that the PhysX 3.x SDK, the next major branch of the PhysX SDK, will enable SSE code by default. What's more, the forthcoming PhysX 3.x SDK will also introduce automatic multi-threading support for multi-core CPUs.

The new SDK will automatically take advantage of however many cores are available, or the number of cores set by the developer, and will also provide the option of a “thread pool” from which “the physics simulation can draw resources that run across all cores.”

I can’t wait to test the PhysX 3.0 SDK and of course to update FluidMark as well.

I ran some PhysX tests with FluidMark 1.2.0 while varying the number of particles. I used 3 emitters in order to fully load all X9650 cores. Here are the results:

Test bed:
– CPU: Quad core X9650 @ 3GHz (default clock)
– Windows 7 64-bit
– GPU 1: GTX 480
– GPU 2: GT 240 (dedicated PhysX card)

Common FluidMark settings: 3 emitters, PhysX multi-core ON, Async mode ON, 60sec, 1024×768 windowed.

120,000 particles
– GTX 480: PhysX: 175 (29 SPS) – GraphX: 347 (57 FPS)
– GT 240: PhysX: 86 (14 SPS) – GraphX: 539 (88 FPS)
– CPU: PhysX: 22 (3 SPS) – GraphX: 580 (95 FPS)

10,000 particles
– GTX 480: PhysX: 1312 (215 SPS) – GraphX: 2699 (443 FPS)
– GT 240: PhysX: 935 (153 SPS) – GraphX: 2953 (484 FPS)
– CPU: PhysX: 476 (77 SPS) – GraphX: 2749 (450 FPS)

5,000 particles
– GTX 480: PhysX: 1873 (307 SPS) – GraphX: 3391 (556 FPS)
– GT 240: PhysX: 1607 (263 SPS) – GraphX: 3620 (593 FPS)
– CPU: PhysX: 1311 (214 SPS) – GraphX: 3237 (530 FPS)

With many particles (120k), CPU PhysX is far behind GPU PhysX and SSE code won't change anything. But it's another story with few particles (5k): 214 SPS for the CPU and 263 SPS for the GT 240. And if SSE really improves the speed, maybe a CPU could outperform a GeForce in PhysX simulations…

Let’s wait for PhysX 3.0 SDK and FluidMark update 😉

We have to keep in mind that CPU PhysX is very important because many kinds of simulations, such as rigid body collisions, are not GPU-accelerated. Rigid body collisions are processed only by the CPU, so a speedup in math functions would certainly be appreciated.

GeeXLab – PhysX rigid body collisions demo – I will release the demo shortly 😉