CUDA vs Quad Core Performance Test

Started by Stefan, December 01, 2010, 12:29:48 AM


Quote: ...matrix multiplication program that can multiply two 1,536x1,536 dimension matrices using a single Nvidia GPU, a single thread on your primary CPU, and 4 threads using OpenMP
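
For readers who want a picture of the CPU side of such a test, here is a minimal sketch of a naive square matrix multiply whose outer loop is shared across OpenMP threads. It is not the code of the quoted program: the function name `matmul_cpu` and its parameters are made up for illustration, and the row-major layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the quoted program): naive C = A * B for
 * square n x n matrices stored in row-major order. When the file is built
 * with OpenMP enabled (e.g. -fopenmp), the rows are divided among
 * `threads` threads; otherwise the loop simply runs serially. */
void matmul_cpu(const float *a, const float *b, float *c, int n, int threads)
{
    assert(n > 0 && threads > 0);
    /* The guard keeps non-OpenMP builds free of unknown-pragma warnings. */
#ifdef _OPENMP
#pragma omp parallel for num_threads(threads)
#endif
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[(size_t)row * n + k] * b[(size_t)k * n + col];
            }
            c[(size_t)row * n + col] = sum;
        }
    }
}
```

Calling it with `threads` set to 1 and then to 4 would correspond to the two CPU cases described in the quote. Each output row is written by exactly one thread, so no locking is needed.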

******Matrix Multiplication Performance Analysis CUDA program*******
based on Nvidia reference program with OpenMP for CPU multithreading

Select which GPU to run the test on. Enter 1 for the first GPU, etc.
Select the number of threads for the CPU test.
Select the block multiple for the matrix size. (version one is 96)
(64 for 1024x1024, 96 for 1536x1536, 128 for 2048x2048, etc.)
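
The three pairs listed in the prompt all differ by a factor of 16, which suggests the matrix dimension is the block multiple times a 16-wide tile. A small sketch of that relationship, assuming the factor really is a fixed tile width (the name `TILE_WIDTH` and the function name are illustrative, not taken from the program):

```c
#include <assert.h>

/* Assumed tile width, inferred from the prompt's examples:
 * 64 -> 1024, 96 -> 1536, 128 -> 2048 (each a factor of 16). */
#define TILE_WIDTH 16

/* Hypothetical helper: matrix dimension for a given block multiple. */
int matrix_dim_from_block_multiple(int block_multiple)
{
    assert(block_multiple > 0);
    return block_multiple * TILE_WIDTH;
}
```

With this assumption, entering 96 gives the 1536x1536 case that the results in this post refer to.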

device name: GeForce GTX 465    <----- creating CUDA context on this device
device sharedMemPerBlock: 49152
device totalGlobalMem: 1041694720
device regsPerBlock: 32768
device warpSize: 32
device memPitch: 2147483647
device maxThreadsPerBlock: 1024
device maxThreadsDim[0]: 1024
device maxThreadsDim[1]: 1024
device maxThreadsDim[2]: 64
device maxGridSize[0]: 65535
device maxGridSize[1]: 65535
device maxGridSize[2]: 1
device totalConstMem: 65536
device major: 2
device minor: 0
device clockRate: 810000
device textureAlignment: 512
device deviceOverlap: 1
device multiProcessorCount: 11
Total CUDA cores: 352
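
The core total is not a field reported by the device itself; it appears to be derived from the multiprocessor count and the compute capability (11 multiprocessors x 32 cores each for compute capability 2.0 gives 352). A hedged sketch of that arithmetic, with hypothetical function names and a lookup limited to the compute capabilities that existed around the time of this post:

```c
#include <assert.h>

/* Hypothetical lookup: CUDA cores per multiprocessor by compute capability.
 * Covers only 1.x (8), 2.0 (32) and 2.1 (48); returns -1 for anything else. */
int cores_per_multiprocessor(int major, int minor)
{
    if (major == 1) return 8;
    if (major == 2 && minor == 0) return 32;
    if (major == 2 && minor == 1) return 48;
    return -1; /* unknown to this sketch */
}

/* Total cores = multiprocessor count * cores per multiprocessor. */
int total_cuda_cores(int multiprocessor_count, int major, int minor)
{
    assert(multiprocessor_count > 0);
    int per_sm = cores_per_multiprocessor(major, minor);
    return per_sm < 0 ? -1 : multiprocessor_count * per_sm;
}
```

For the GTX 465 listed above (multiProcessorCount 11, major 2, minor 0), `total_cuda_cores(11, 2, 0)` gives 352, matching the reported total.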

Processing time for GPU: 16 (ms)
Processing time for CPU 1 thread: 3703 (ms)
Processing time for CPU 4 threads: 984 (ms)
CPU multithread speedup: 3.763211, efficiency: 94.080284
CPU to GPU time ratio (CUDA Speedup): 61.500000
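
The last two lines can be reproduced from the three timings. The sketch below assumes the usual definitions (speedup = baseline time / improved time, efficiency = speedup / thread count, as a percentage); the function names are illustrative and not taken from the program.

```c
#include <assert.h>

/* Speedup of an improved run over a baseline run (both in milliseconds). */
double speedup(double baseline_ms, double improved_ms)
{
    assert(improved_ms > 0.0);
    return baseline_ms / improved_ms;
}

/* Parallel efficiency in percent: speedup divided by the thread count. */
double parallel_efficiency_percent(double single_ms, double multi_ms, int threads)
{
    assert(threads > 0);
    return 100.0 * speedup(single_ms, multi_ms) / threads;
}
```

`speedup(3703, 984)` is about 3.7632 and `parallel_efficiency_percent(3703, 984, 4)` is about 94.08, which agrees with the output above. `speedup(984, 16)` is 61.5, so the reported CUDA speedup is measured against the 4-thread CPU time; against the single-thread time the ratio would be 3703 / 16, roughly 231.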