CPU PhysX: x87, SSE and PhysX SDK 3.0


Intel 8087 math coprocessor

PhysX is under attack again (see HERE or HERE for some examples). This time, the criticism comes from the realworldtech.com website.

The article in question, PhysX87: Software Deficiency, reveals that most of the floating-point instructions in the PhysX CPU code path use the x87 instruction set (fp87). The x87 instructions date back to the 8087, a math coprocessor designed many years ago to help the main processor (the 8086) with floating-point math. Today, a modern physics engine should use the SSE / SSE2 (Streaming SIMD Extensions) instruction sets for its math. According to the article’s author, SSE code could run 2X to 4X faster than the current x87 PhysX code.

The Wikipedia page about SSE2 states:

The FPU (x87) instructions usually store intermediate results with 80 bits of precision. When legacy FPU software algorithms are ported to SSE2, certain combinations of math operations or input datasets can result in measurable numerical deviation: this is of critical importance to scientific computations, if the calculation results must be compared against results generated from a different machine architecture.

Simply recompiling PhysX with the SSE2 instruction set might therefore produce slightly different, or even incorrect, results. Maybe this is part of the answer…
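To see how such a deviation can arise, here is a minimal Python sketch (not PhysX code) that emulates the two behaviours: scalar SSE rounds every intermediate result to a 32-bit float, while x87 keeps intermediates in wider registers and only rounds when the value is stored. Python has no 80-bit type, so binary64 stands in for the x87 registers here; that substitution and the helper names are assumptions of the sketch. Real x87 results also depend on the FPU precision-control setting and on when the compiler spills registers to memory.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sse_style(a: float, b: float) -> float:
    """(a + b) - a with every intermediate rounded to float32, like scalar SSE."""
    t = to_f32(to_f32(a) + to_f32(b))   # 1e8 + 1 rounds back to 1e8 in float32
    return to_f32(t - to_f32(a))

def x87_style(a: float, b: float) -> float:
    """(a + b) - a with a wider intermediate (binary64 as a stand-in for x87)."""
    t = to_f32(a) + to_f32(b)           # kept at higher precision, not rounded
    return to_f32(t - to_f32(a))        # only the final result is stored as float32

print(sse_style(1e8, 1.0))  # 0.0
print(x87_style(1e8, 1.0))  # 1.0
```

The same source expression gives 0.0 or 1.0 depending only on how wide the intermediates are, which is the kind of "measurable numerical deviation" the Wikipedia quote warns about.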

Bryan Del Rizzo (NVIDIA senior PR manager) said that the PhysX 3.x SDK, the next major branch of the PhysX SDK, will enable SSE code by default. What’s more, PhysX 3.x will also introduce automatic multi-threading support for multi-core CPUs.

The new SDK will automatically take advantage of however many cores are available, or the number of cores set by the developer, and will also provide the option of a “thread pool” from which “the physics simulation can draw resources that run across all cores.”
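The thread-pool idea can be sketched in a few lines. This is a generic illustration, not the PhysX 3.x API: `make_pool`, `simulate_step` and the Euler-integration kernel are hypothetical names. The pool uses the thread count chosen by the developer or, by default, one worker per logical core, and each simulation step is split into disjoint particle slices so workers never write to the same element. Note that CPython’s GIL prevents pure-Python math from actually running in parallel; in native code like PhysX, the same structure would spread the work across all cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def integrate_slice(positions, velocities, start, end, dt):
    """Explicit Euler update for particles [start, end) (hypothetical kernel)."""
    for i in range(start, end):
        positions[i] += velocities[i] * dt

def make_pool(num_threads=None):
    """Use the developer-supplied thread count, else one worker per logical core."""
    n = num_threads or os.cpu_count() or 1
    return ThreadPoolExecutor(max_workers=n), n

def simulate_step(pool, n_workers, positions, velocities, dt):
    """Split the particle array into disjoint slices and run them on the pool."""
    n = len(positions)
    chunk = max(1, -(-n // n_workers))  # ceiling division
    futures = [
        pool.submit(integrate_slice, positions, velocities, s, min(s + chunk, n), dt)
        for s in range(0, n, chunk)
    ]
    for f in futures:
        f.result()  # wait for every slice; re-raises any worker exception

pool, n_workers = make_pool()  # or make_pool(2) to pin the thread count
positions, velocities = [0.0] * 8, [1.0] * 8
simulate_step(pool, n_workers, positions, velocities, dt=0.5)
pool.shutdown()
print(positions)
```

Reusing one pool across steps avoids creating threads every frame, which is presumably the point of letting the simulation "draw resources" from a shared pool.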

I can’t wait to test the PhysX 3.0 SDK and of course to update FluidMark as well.

I ran some PhysX CPU tests with FluidMark 1.2.0, varying the number of particles. I used 3 emitters to fully load all the X9650 cores. Here are the results:

Test bed:
– CPU: Quad core X9650 @ 3GHz (default clock)
– Windows 7 64-bit
– GPU 1: GTX 480
– GPU 2: GT 240 (dedicated PhysX card)

Common FluidMark settings: 3 emitters, PhysX multi-core ON, Async mode ON, 60sec, 1024×768 windowed.

120,000 particles
– GTX 480: PhysX: 175 (29 SPS) – GraphX: 347 (57 FPS)
– GT 240: PhysX: 86 (14 SPS) – GraphX: 539 (88 FPS)
– CPU: PhysX: 22 (3 SPS) – GraphX: 580 (95 FPS)

10,000 particles
– GTX 480: PhysX: 1312 (215 SPS) – GraphX: 2699 (443 FPS)
– GT 240: PhysX: 935 (153 SPS) – GraphX: 2953 (484 FPS)
– CPU: PhysX: 476 (77 SPS) – GraphX: 2749 (450 FPS)

5,000 particles
– GTX 480: PhysX: 1873 (307 SPS) – GraphX: 3391 (556 FPS)
– GT 240: PhysX: 1607 (263 SPS) – GraphX: 3620 (593 FPS)
– CPU: PhysX: 1311 (214 SPS) – GraphX: 3237 (530 FPS)

With many particles (120k), CPU PhysX is far behind GPU PhysX, and SSE code won’t change that. But it’s another story with few particles (5k): 214 SPS for the CPU versus 263 SPS for the GT 240. If SSE really improves the speed, a CPU could perhaps beat a GeForce in PhysX simulations…
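The comparison above is simple arithmetic on the measured SPS numbers. The sketch below makes an optimistic assumption: that the article’s 2X to 4X figure would apply to the whole PhysX step. In practice only the x87 math would likely get faster, so treat the output as an upper bound, not a prediction. The function names are hypothetical.

```python
# Measured PhysX steps per second (SPS) from the FluidMark 1.2.0 runs above.
RESULTS = {
    120_000: {"GTX 480": 29, "GT 240": 14, "CPU": 3},
    10_000: {"GTX 480": 215, "GT 240": 153, "CPU": 77},
    5_000: {"GTX 480": 307, "GT 240": 263, "CPU": 214},
}

def projected_cpu_sps(particles, speedup):
    """CPU SPS if the entire simulation step ran `speedup` times faster (assumption)."""
    return RESULTS[particles]["CPU"] * speedup

def gpus_beaten(particles, speedup):
    """GPUs whose measured SPS the projected CPU figure would exceed."""
    cpu = projected_cpu_sps(particles, speedup)
    return [gpu for gpu in ("GTX 480", "GT 240") if cpu > RESULTS[particles][gpu]]

for speedup in (2, 4):
    for particles in sorted(RESULTS, reverse=True):
        print(f"{speedup}X, {particles:>7,} particles: "
              f"CPU ~{projected_cpu_sps(particles, speedup)} SPS, "
              f"beats: {gpus_beaten(particles, speedup) or 'none'}")
```

Under that assumption, the CPU would pass both GPUs at 5k particles even with a 2X gain, while at 120k particles even 4X (12 SPS) would still trail the GT 240 (14 SPS).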

Let’s wait for PhysX 3.0 SDK and FluidMark update 😉

Keep in mind that CPU PhysX matters a lot, because many kinds of simulation, such as rigid body collisions, are not GPU accelerated. Rigid body collisions are processed only by the CPU, so any speedup in the math code would be very welcome.
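Collision detection is dominated by short, repetitive floating-point math. As an illustration only (a generic sphere-vs-sphere test, not PhysX’s actual collision code), the sketch below stores bodies as structure-of-arrays. Each iteration performs the same few multiplies and adds, which is the kind of loop that packed SSE can evaluate on four floats per instruction, whereas x87 code handles one value at a time. The function name and data layout are assumptions.

```python
import math

def sphere_contacts(ax, ay, az, ar, bx, by, bz, br):
    """Narrow-phase sphere-vs-sphere test over structure-of-arrays inputs.

    Pair i is sphere A (ax[i], ay[i], az[i], radius ar[i]) against
    sphere B (bx[i], by[i], bz[i], radius br[i]).
    Returns a list of (pair_index, penetration_depth) for overlapping pairs.
    """
    contacts = []
    for i in range(len(ax)):
        dx, dy, dz = bx[i] - ax[i], by[i] - ay[i], bz[i] - az[i]
        dist_sq = dx * dx + dy * dy + dz * dz
        radius_sum = ar[i] + br[i]
        # Compare squared values so the common "no contact" case needs no sqrt.
        if dist_sq < radius_sum * radius_sum:
            contacts.append((i, radius_sum - math.sqrt(dist_sq)))
    return contacts

# Pair 0 overlaps by 0.5; pair 1 is too far apart.
print(sphere_contacts([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0],
                      [1.5, 3.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```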

GeeXLab – PhysX rigid body collisions demo – I will release the demo shortly 😉

7 thoughts on “CPU PhysX: x87, SSE and PhysX SDK 3.0”

  1. Pingback: [Test] Simple x87 vs SSE2 Performance Test With Matrix Multiplication - 3D Tech News, Pixel Hacking, Data Visualization and 3D Programming - Geeks3D.com

  2. Psolord

    Rigid body demo looks super cool. Waiting!

    Hope they get their act together with PhysX SDK 3.0.

    They must come to understand that they are only damaging PhysX by not allowing its proper use on non-NVIDIA systems. Not everyone wants or likes NVIDIA anyway.

    I’ve seen a 4.6X performance increase on my Core i7 going from single-threaded to multi-threaded PhysX in FluidMark 1.2.0.

    Now if you can get 4X from multi-threading alone and another 2X from SSE, you instantly have a factor of 8X in PhysX games. That means a game that runs at 10 fps on a Radeon would instantly jump to 80 fps! WTF?

  3. Leith Bade

    Remember that CUDA has bigger issues with FP errors than SSE…

  4. DrBalthar

    Simple answer is don’t use PhysX there are alternatives out there. PhysX offers nothing that the alternatives do not have!

  5. Pingback: FluidMark 1.2.0: CPU PhysX avec et sans Multi-Threading | JeGX's Infamous Lab

  6. Pingback: [GPU Tool] FluidMark 1.2.2 Updated With PhysX SDK - 3D Tech News, Pixel Hacking, Data Visualization and 3D Programming - Geeks3D.com
