GpuTest 0.4.0 for Windows, OSX and Linux
...matrix multiplication program that can multiply two 1,536x1,536 matrices using a single Nvidia GPU, a single thread on your primary CPU, and 4 CPU threads using OpenMP
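The CPU half of such a test could look roughly like the sketch below. It is an illustration, not the program's actual source: the function name `matmul_cpu`, the row-major layout, and the `threads` parameter are assumptions. Calling it with `threads = 1` stands in for the single-thread baseline, and `threads = 4` for the OpenMP run.

```c
#include <stddef.h>

/* Hypothetical sketch: c = a * b for n x n row-major float matrices.
   `threads` must be >= 1. If the compiler is not run with OpenMP
   enabled (e.g. -fopenmp), the pragma is ignored and the loop runs
   on one thread, which still gives the correct result. */
void matmul_cpu(const float *a, const float *b, float *c, int n, int threads)
{
    /* Each thread takes a share of the output rows; rows are
       independent, so no synchronization is needed inside the loop. */
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            }
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

Parallelizing only the outer loop keeps the sketch short and gives each thread whole rows of the result to compute.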
******Matrix Multiplication Performance Analysis CUDA program*******
based on Nvidia reference program with OpenMP for CPU multithreading
Select which GPU to run the test on. Enter 1 for the first GPU, etc.
Select the number of threads for the CPU test.
Select the block multiple for the matrix size. (version one is 96)
(64 for 1024x1024, 96 for 1536x1536, 128 for 2048x2048, etc.)

device name: GeForce GTX 465 <----- creating CUDA context on this device
device sharedMemPerBlock: 49152
device totalGlobalMem: 1041694720
device regsPerBlock: 32768
device warpSize: 32
device memPitch: 2147483647
device maxThreadsPerBlock: 1024
device maxThreadsDim[0]: 1024
device maxThreadsDim[1]: 1024
device maxThreadsDim[2]: 64
device maxGridSize[0]: 65535
device maxGridSize[1]: 65535
device maxGridSize[2]: 1
device totalConstMem: 65536
device major: 2
device minor: 0
device clockRate: 810000
device textureAlignment: 512
device deviceOverlap: 1
device multiProcessorCount: 11
Total CUDA cores: 352

Processing time for GPU: 16 (ms)
Processing time for CPU 1 thread: 3703 (ms)
Processing time for CPU 4 threads: 984 (ms)
CPU multithread speedup: 3.763211, efficiency: 94.080284
CPU to GPU time ratio (CUDA Speedup): 61.500000
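The derived figures in the output can be reproduced from the raw timings and device properties. The sketch below is a reconstruction inferred from the printed numbers, not the program's own code, and all four function names are hypothetical. Three things are inferred rather than stated: the matrix dimension is the block multiple times 16 (which matches the 64/96/128 examples), the core count is 32 per multiprocessor (true of compute capability 2.0 and consistent with 11 x 32 = 352), and the "CUDA Speedup" of 61.5 compares the GPU against the 4-thread CPU time (984 / 16), not the single-thread time.

```c
/* Hypothetical helpers that reproduce the derived values in the output. */

/* Speedup of a faster run relative to a slower baseline (both in ms). */
double speedup(double baseline_ms, double faster_ms)
{
    return baseline_ms / faster_ms;
}

/* Parallel efficiency in percent: speedup divided by thread count. */
double efficiency_pct(double serial_ms, double parallel_ms, int threads)
{
    return 100.0 * speedup(serial_ms, parallel_ms) / threads;
}

/* Matrix dimension from the block multiple, assuming a factor of 16
   (inferred from 64 -> 1024, 96 -> 1536, 128 -> 2048). */
int matrix_dim(int block_multiple)
{
    return block_multiple * 16;
}

/* Total CUDA cores, assuming 32 cores per multiprocessor
   (compute capability 2.0). */
int cuda_cores(int multiprocessor_count)
{
    return multiprocessor_count * 32;
}
```

With the timings above, `speedup(3703.0, 984.0)` gives about 3.763211, `efficiency_pct(3703.0, 984.0, 4)` gives about 94.080284, and `speedup(984.0, 16.0)` gives 61.5. Measured against the single-thread time instead, the GPU ratio would be 3703 / 16, roughly 231.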