Author Topic: ATI Stream 2.2 samples tested with Fermi  (Read 2529 times)

Stefan

ATI Stream 2.2 samples tested with Fermi
« on: August 12, 2010, 05:22:28 PM »
I tested the samples from ATI Stream SDK 2.2 (OpenCL 1.1) on an NVIDIA GTX 465 with ForceWare 258.19.
You know a sample is incompatible if GPU usage = 0%  :P

Constant Bandwidth - 99% GPU usage
AccessType   : single(static index)
VectorElements   : 4
Bandwidth   : 1331.02 GB/s

AccessType   : single(dynamic index)
VectorElements   : 4
Bandwidth   : 847.839 GB/s

AccessType   : linear
VectorElements   : 4
Bandwidth   : 26.2818 GB/s

AccessType   : random
VectorElements   : 4
Bandwidth   : 19.087 GB/s

LDS bandwidth - 53% GPU usage
AccessType   : single
VectorElements   : 1
Bandwidth   : 820.885 GB/s

AccessType   : linear
VectorElements   : 1
Bandwidth   : 824.71 GB/s

PCIE bandwidth
Host to device : 1.8002 GB/s - 85% GPU usage
Device to host : 2.34058 GB/s - 70% GPU usage

Memory bandwidth - 53-78% GPU usage
-----------------------------------------
Copy 1D FastPath   : 72.8925 GB/s
-----------------------------------------
Copy 1D CompletePath   : 72.3641 GB/s
-----------------------------------------
Copy 2D 32-bit (64x2)   : 67.5548 GB/s
Copy 2D 128-bit (64x2)   : 83.6624 GB/s
-----------------------------------------
Copy 2D 32-bit (64x4)   : 70.8011 GB/s
Copy 2D 128-bit (64x4)   : 81.9781 GB/s
-----------------------------------------
Copy 2D 32-bit (8x8)   : 37.8293 GB/s
Copy 2D 128-bit (8x8)   : 81.5374 GB/s
-----------------------------------------
Copy 2D 32-bit (256x1)   : 72.0669 GB/s
Copy 2D 128-bit (256x1)   : 82.2084 GB/s
-----------------------------------------
Copy 2D 32-bit (32x2)   : 46.1947 GB/s
Copy 2D 128-bit (32x2)   : 82.5491 GB/s
-----------------------------------------
Copy 2D 32-bit (64x1)   : 47.8214 GB/s
Copy 2D 128-bit (64x1)   : 81.668 GB/s
-----------------------------------------
Copy 2D 32-bit (16x16)   : 67.2074 GB/s
Copy 2D 128-bit (16x16)   : 81.3934 GB/s
-----------------------------------------
Copy 2D 32-bit (16x4)   : 42.3963 GB/s
Copy 2D 128-bit (16x4)   : 82.2428 GB/s
-----------------------------------------
Copy 2D 32-bit (1x64)   : 7.94972 GB/s
Copy 2D 128-bit (1x64)   : 35.0866 GB/s
-----------------------------------------
Copy 1D 128-bit    : 235.827 GB/s
-----------------------------------------
NoCoal Copy 1D 32-bit    : 99.3111 GB/s
-----------------------------------------
Split Copy 1D 32-bit    : 32.9356 GB/s
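
To make the 32-bit vs 128-bit gap in the Copy 2D figures easier to read, here is a small Python sketch that parses those lines and prints the 128-bit / 32-bit ratio per work-group shape. The numbers are copied from the output above; the names RESULTS, LINE and speedups are only made up for this illustration and are not part of the SDK sample.

```python
import re

# Figures copied from the "Memory bandwidth" output above (GB/s).
RESULTS = """\
Copy 2D 32-bit (64x2)   : 67.5548 GB/s
Copy 2D 128-bit (64x2)   : 83.6624 GB/s
Copy 2D 32-bit (64x4)   : 70.8011 GB/s
Copy 2D 128-bit (64x4)   : 81.9781 GB/s
Copy 2D 32-bit (8x8)   : 37.8293 GB/s
Copy 2D 128-bit (8x8)   : 81.5374 GB/s
Copy 2D 32-bit (256x1)   : 72.0669 GB/s
Copy 2D 128-bit (256x1)   : 82.2084 GB/s
Copy 2D 32-bit (32x2)   : 46.1947 GB/s
Copy 2D 128-bit (32x2)   : 82.5491 GB/s
Copy 2D 32-bit (64x1)   : 47.8214 GB/s
Copy 2D 128-bit (64x1)   : 81.668 GB/s
Copy 2D 32-bit (16x16)   : 67.2074 GB/s
Copy 2D 128-bit (16x16)   : 81.3934 GB/s
Copy 2D 32-bit (16x4)   : 42.3963 GB/s
Copy 2D 128-bit (16x4)   : 82.2428 GB/s
Copy 2D 32-bit (1x64)   : 7.94972 GB/s
Copy 2D 128-bit (1x64)   : 35.0866 GB/s
"""

# Matches e.g. "Copy 2D 32-bit (64x2)   : 67.5548 GB/s"
LINE = re.compile(r"Copy 2D (\d+)-bit \((\d+x\d+)\)\s*:\s*([\d.]+) GB/s")


def speedups(text):
    """Return {shape: bandwidth_128bit / bandwidth_32bit} per work-group shape."""
    table = {}
    for bits, shape, gbps in LINE.findall(text):
        table.setdefault(shape, {})[int(bits)] = float(gbps)
    return {
        shape: row[128] / row[32]
        for shape, row in table.items()
        if 32 in row and 128 in row
    }


if __name__ == "__main__":
    # Print shapes from smallest to largest 128-bit advantage.
    for shape, ratio in sorted(speedups(RESULTS).items(), key=lambda kv: kv[1]):
        print(f"{shape:>6}: {ratio:.2f}x")
```

On these figures the 128-bit copy is ahead for every shape, with the smallest gap at 256x1 and the largest at 1x64.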

Quote from: CLINFO
Number of platforms:             1
  Platform Profile:             FULL_PROFILE
  Platform Version:             OpenCL 1.1 CUDA 3.2.1
  Platform Name:                NVIDIA CUDA
  Platform Vendor:             NVIDIA Corporation
  Platform Extensions:          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll


  Platform Name:                NVIDIA CUDA
Number of devices:             1
  Device Type:                CL_DEVICE_TYPE_GPU
  Device ID:                4318
  Max compute units:             11
  Max work items dimensions:          3
    Max work items[0]:             1024
    Max work items[1]:             1024
    Max work items[2]:             64
  Max work group size:             1024
  Preferred vector width char:          1
  Preferred vector width short:          1
  Preferred vector width int:          1
  Preferred vector width long:          1
  Preferred vector width float:          1
  Preferred vector width double:       1
  Max clock frequency:             810Mhz
  Address bits:                32
  Max memory allocation:          260423680
  Image support:             Yes
  Max number of images read arguments:    128
  Max number of images write arguments:    8
  Max image 2D width:          8192
  Max image 2D height:          8192
  Max image 3D width:          2048
  Max image 3D height:    2048
  Max image 3D depth:          2048
  Max samplers within kernel:       16
  Max size of kernel argument:          4352
  Alignment (bits) of base address:       4096
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                Yes
    Quiet NaNs:                Yes
    Round to nearest even:          Yes
    Round to zero:             Yes
    Round to +ve and infinity:          Yes
    IEEE754-2008 fused multiply-add:       Yes
  Cache type:                Read/Write
  Cache line size:             128
  Cache size:                180224
  Global memory size:             1041694720
  Constant buffer size:             65536
  Max number of constant args:          9
  Local memory type:             Scratchpad
  Local memory size:             49152
  Profiling timer resolution:          1000
  Device endianess:             Little
  Available:                Yes
  Compiler available:             Yes
  Execution capabilities:            
    Execute OpenCL kernels:          Yes
    Execute native function:          No
  Queue properties:            
    Out-of-Order:             Yes
    Profiling :                Yes
  Platform ID:                0000000002C78F20
  Name:                   GeForce GTX 465
  Vendor:                NVIDIA Corporation
  Driver version:             258.19
  Profile:                FULL_PROFILE
  Version:                OpenCL 1.1 CUDA
  Extensions:                cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
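
The raw byte counts in the CLINFO quote are easier to compare after conversion. The following Python sketch only does arithmetic on values copied from the quote; the constant and function names are made up for this illustration.

```python
# Values copied from the CLINFO quote above (bytes, except COMPUTE_UNITS).
GLOBAL_MEM = 1041694720    # Global memory size
MAX_ALLOC = 260423680      # Max memory allocation
CACHE_SIZE = 180224        # Cache size
LOCAL_MEM = 49152          # Local memory size
CONSTANT_BUF = 65536       # Constant buffer size
COMPUTE_UNITS = 11         # Max compute units


def kib(n):
    """Convert bytes to KiB."""
    return n / 1024


def mib(n):
    """Convert bytes to MiB."""
    return n / (1024 * 1024)


summary = {
    "global_mem_mib": mib(GLOBAL_MEM),
    "max_alloc_fraction": MAX_ALLOC / GLOBAL_MEM,
    "cache_kib_per_cu": kib(CACHE_SIZE) / COMPUTE_UNITS,
    "local_mem_kib": kib(LOCAL_MEM),
    "constant_buf_kib": kib(CONSTANT_BUF),
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

The maximum single allocation works out to exactly one quarter of global memory, and the reported cache size divides evenly into 16 KiB per compute unit next to 48 KiB of local memory.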


Error : atomics mismatch!
Error : Bytes mismatch!
Error : d3d10Sharing mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
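
The mismatch lines do not say what was compared, so the following is only a sketch for inspecting the two extension strings from the CLINFO quote, not a reconstruction of the sample's check. It lists the extensions reported by the device but not by the platform; the names PLATFORM_EXT, DEVICE_EXT and only_on_device are made up for this illustration.

```python
# Extension strings copied from the CLINFO quote above.
PLATFORM_EXT = (
    "cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing "
    "cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing "
    "cl_nv_d3d11_sharing cl_nv_compiler_options "
    "cl_nv_device_attribute_query cl_nv_pragma_unroll"
)
DEVICE_EXT = (
    PLATFORM_EXT + " "
    "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_fp64"
)


def only_on_device(platform, device):
    """Return device extensions missing from the platform list, in order."""
    platform_set = set(platform.split())
    return [ext for ext in device.split() if ext not in platform_set]


if __name__ == "__main__":
    for ext in only_on_device(PLATFORM_EXT, DEVICE_EXT):
        print(ext)
    # Neither list contains an extension with "printf" in its name.
    print(any("printf" in ext for ext in DEVICE_EXT.split()))
```

The device adds the four int32 atomics extensions and cl_khr_fp64 on top of the platform list, and no printf extension appears in either string.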