Introduction to GPU Computing
NVIDIA R195 display driver includes new extensions to its OpenCL 1.0 conformant driver
OpenCL 1.0 Extensions: NVIDIA is the only vendor supporting OpenCL features beyond the minimum conformance level. New extensions released by NVIDIA include support for double precision, OpenGL interoperability and the new OpenCL Installable Client Driver (ICD). These new features supplement existing NVIDIA-only support for 2D images, 32-bit atomics and byte-addressable stores.
NVIDIA Tesla 20-series based on Fermi, the new generation CUDA processor architecture
NVIDIA Tesla C2050 GPU Computing Processor
NVIDIA NV100 (or Fermi) is less powerful than GeForce GTX 285?
– C2050 will deliver 520 GFLOPS in IEEE 754-2008 double precision and 1.040 TFLOPS in single precision.
– C2070 fares a bit better: 630 GFLOPS in double precision and 1.26 TFLOPS in single precision.
– The single-precision numbers are the inconvenient reason why NVIDIA didn’t mention single-precision performance on any of the Tesla launch slides.
– EVGA’s factory overclocked GT200-based GTX 285 FTW board, packs 1.063 single-precision TFLOPS.
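To put the quoted figures side by side, here is a minimal sketch in Python. It uses only the peak numbers listed above; the dictionary layout and the helper names `dp_to_sp_ratio` and `single_vs` are illustrative choices, not anything from the linked article.

```python
# Peak throughput figures as quoted in the list above, in GFLOPS.
# No double-precision figure is quoted for the GTX 285, so it is left as None.
boards = {
    "Tesla C2050": {"single": 1040.0, "double": 520.0},
    "Tesla C2070": {"single": 1260.0, "double": 630.0},
    "EVGA GTX 285 FTW": {"single": 1063.0, "double": None},
}


def dp_to_sp_ratio(name):
    """Double-precision peak as a fraction of single-precision peak."""
    board = boards[name]
    if board["double"] is None:
        raise ValueError(f"no double-precision figure quoted for {name}")
    return board["double"] / board["single"]


def single_vs(name, baseline):
    """Single-precision peak of `name` relative to `baseline`."""
    return boards[name]["single"] / boards[baseline]["single"]


if __name__ == "__main__":
    for name in ("Tesla C2050", "Tesla C2070"):
        print(f"{name}: DP/SP = {dp_to_sp_ratio(name):.2f}, "
              f"SP vs GTX 285 FTW = {single_vs(name, 'EVGA GTX 285 FTW'):.3f}")
```

On these numbers both Tesla boards run double precision at half their single-precision rate, and the C2050's single-precision peak lands slightly below the overclocked GTX 285's, which is the point the headline is making. These are quoted peak figures, not measured performance.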
OpenCL sort-of rutt-etra
Swimming in OpenCL
Processing audio data to produce a fancy visualization of its spectral content.
Time to process an audio test file on an 8-core Mac Pro:
– Initial Approach (Matlab): 492 seconds
– NSOperationQueue: 115 seconds
– OpenCL Attempt 1 (scalar code): 30.87 seconds
– OpenCL Attempt 2 (vectorized code): 14.1 seconds
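The timings above are easier to compare as speedups. A minimal sketch in Python, using only the four reported times; the `speedup` helper and the dictionary keys are illustrative names.

```python
# Reported wall-clock times for the same audio test file, in seconds.
timings = {
    "Matlab": 492.0,
    "NSOperationQueue": 115.0,
    "OpenCL scalar": 30.87,
    "OpenCL vectorized": 14.1,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline`."""
    return timings[baseline] / timings[candidate]


if __name__ == "__main__":
    for name in timings:
        print(f"{name:>18}: {speedup('Matlab', name):6.2f}x vs Matlab")
    print(f"vectorized vs scalar OpenCL: "
          f"{speedup('OpenCL scalar', 'OpenCL vectorized'):.2f}x")
```

Relative to the Matlab baseline this works out to roughly 4.3x for NSOperationQueue, 16x for scalar OpenCL and 35x for vectorized OpenCL, with vectorization alone contributing about 2.2x over the scalar kernel.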
OpenCL sort-of particles