I tested the samples from ATI Stream SDK 2.2 (OpenCL 1.1) with an NVIDIA GTX 465 and Forceware 258.19.
You know a sample is incompatible if GPU usage = 0%.
Constant Bandwidth - 99% GPU usage
AccessType : single(static index)
VectorElements : 4
Bandwidth : 1331.02 GB/s
AccessType : single(dynamic index)
VectorElements : 4
Bandwidth : 847.839 GB/s
AccessType : linear
VectorElements : 4
Bandwidth : 26.2818 GB/s
AccessType : random
VectorElements : 4
Bandwidth : 19.087 GB/s
LDS bandwidth - 53% GPU usage
AccessType : single
VectorElements : 1
Bandwidth : 820.885 GB/s
AccessType : linear
VectorElements : 1
Bandwidth : 824.71 GB/s
PCIE bandwidth
Host to device : 1.8002 GB/s - 85% GPU usage
Device to host : 2.34058 GB/s - 70% GPU usage
Memory bandwidth - 53-78% GPU usage
-----------------------------------------
Copy 1D FastPath : 72.8925 GB/s
-----------------------------------------
Copy 1D CompletePath : 72.3641 GB/s
-----------------------------------------
Copy 2D 32-bit (64x2) : 67.5548 GB/s
Copy 2D 128-bit (64x2) : 83.6624 GB/s
-----------------------------------------
Copy 2D 32-bit (64x4) : 70.8011 GB/s
Copy 2D 128-bit (64x4) : 81.9781 GB/s
-----------------------------------------
Copy 2D 32-bit (8x8) : 37.8293 GB/s
Copy 2D 128-bit (8x8) : 81.5374 GB/s
-----------------------------------------
Copy 2D 32-bit (256x1) : 72.0669 GB/s
Copy 2D 128-bit (256x1) : 82.2084 GB/s
-----------------------------------------
Copy 2D 32-bit (32x2) : 46.1947 GB/s
Copy 2D 128-bit (32x2) : 82.5491 GB/s
-----------------------------------------
Copy 2D 32-bit (64x1) : 47.8214 GB/s
Copy 2D 128-bit (64x1) : 81.668 GB/s
-----------------------------------------
Copy 2D 32-bit (16x16) : 67.2074 GB/s
Copy 2D 128-bit (16x16) : 81.3934 GB/s
-----------------------------------------
Copy 2D 32-bit (16x4) : 42.3963 GB/s
Copy 2D 128-bit (16x4) : 82.2428 GB/s
-----------------------------------------
Copy 2D 32-bit (1x64) : 7.94972 GB/s
Copy 2D 128-bit (1x64) : 35.0866 GB/s
-----------------------------------------
Copy 1D 128-bit : 235.827 GB/s
-----------------------------------------
NoCoal Copy 1D 32-bit : 99.3111 GB/s
-----------------------------------------
Split Copy 1D 32-bit : 32.9356 GB/s
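To compare the 32-bit and 128-bit Copy 2D figures above shape by shape, something like the following rough sketch could be used. It is not part of the SDK samples; the `RESULTS` text is pasted from the run above, and the `speedups` helper and the regular expression are my own assumptions about how to parse it.

```python
import re

# Copy 2D results pasted from the Memory bandwidth run above (GB/s).
RESULTS = """\
Copy 2D 32-bit (64x2) : 67.5548 GB/s
Copy 2D 128-bit (64x2) : 83.6624 GB/s
Copy 2D 32-bit (64x4) : 70.8011 GB/s
Copy 2D 128-bit (64x4) : 81.9781 GB/s
Copy 2D 32-bit (8x8) : 37.8293 GB/s
Copy 2D 128-bit (8x8) : 81.5374 GB/s
Copy 2D 32-bit (256x1) : 72.0669 GB/s
Copy 2D 128-bit (256x1) : 82.2084 GB/s
Copy 2D 32-bit (32x2) : 46.1947 GB/s
Copy 2D 128-bit (32x2) : 82.5491 GB/s
Copy 2D 32-bit (64x1) : 47.8214 GB/s
Copy 2D 128-bit (64x1) : 81.668 GB/s
Copy 2D 32-bit (16x16) : 67.2074 GB/s
Copy 2D 128-bit (16x16) : 81.3934 GB/s
Copy 2D 32-bit (16x4) : 42.3963 GB/s
Copy 2D 128-bit (16x4) : 82.2428 GB/s
Copy 2D 32-bit (1x64) : 7.94972 GB/s
Copy 2D 128-bit (1x64) : 35.0866 GB/s
"""

# Hypothetical pattern matching the "Copy 2D <bits>-bit (<shape>) : <value> GB/s" lines.
LINE = re.compile(r"Copy 2D (\d+)-bit \((\d+x\d+)\) : ([\d.]+) GB/s")


def speedups(text):
    """Return {shape: 128-bit bandwidth / 32-bit bandwidth} for each shape."""
    table = {}
    for bits, shape, gbps in LINE.findall(text):
        table.setdefault(shape, {})[int(bits)] = float(gbps)
    return {shape: bw[128] / bw[32] for shape, bw in table.items()}


if __name__ == "__main__":
    # Print shapes from smallest to largest 128-bit advantage.
    for shape, ratio in sorted(speedups(RESULTS).items(), key=lambda kv: kv[1]):
        print(f"{shape:>6}: {ratio:.2f}x")
```

In this run the 128-bit copy is faster for every shape; the gap is smallest for 256x1 and largest for 1x64, where it is roughly 4.4x.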
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 11
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 810Mhz
Address bits: 32
Max memory allocation: 260423680
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 180224
Global memory size: 1041694720
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0000000002C78F20
Name: GeForce GTX 465
Vendor: NVIDIA Corporation
Driver version: 258.19
Profile: FULL_PROFILE
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Error : atomics mismatch!
Error : Bytes mismatch!
Error : d3d10Sharing mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
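The raw byte counts in the device listing above are easier to read in binary units. A minimal sketch, assuming the values are copied by hand from that listing (the `human` helper is hypothetical, not something the SDK provides):

```python
# Byte counts copied from the device listing above.
DEVICE_BYTES = {
    "Global memory size": 1041694720,
    "Max memory allocation": 260423680,
    "Cache size": 180224,
    "Constant buffer size": 65536,
    "Local memory size": 49152,
}
COMPUTE_UNITS = 11  # "Max compute units" from the same listing


def human(n):
    """Format a byte count in binary units (MiB or KiB), one decimal place."""
    for unit, size in (("MiB", 1 << 20), ("KiB", 1 << 10)):
        if n >= size:
            return f"{n / size:.1f} {unit}"
    return f"{n} B"


if __name__ == "__main__":
    for name, n in DEVICE_BYTES.items():
        print(f"{name}: {human(n)}")
    # Cache size split evenly across the reported compute units.
    per_cu = DEVICE_BYTES["Cache size"] // COMPUTE_UNITS
    print(f"Cache per compute unit: {human(per_cu)}")
```

Two things fall out of the arithmetic: the maximum single allocation is exactly one quarter of the global memory size, and the reported cache size divides evenly into 16 KiB per compute unit.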
