Topic: Intel OpenCL SDK tested on NVIDIA and ATI platform


Stefan

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2795
    • View Profile
Intel OpenCL SDK tested on NVIDIA and ATI platform
« on: November 16, 2010, 06:54:48 PM »
Intel OpenCL SDK tested on desktop with NVIDIA GPU and Core2Quad 9450



NVIDIA's OpenCL Device Query fails to recognise the Intel platform.



GPU Caps Viewer recognises both the NVIDIA and the Intel platform, but gets confused and keeps using the NVIDIA platform even though Intel is selected.





AMD's Stream 2.2 CLINFO recognises both the NVIDIA and the Intel platform, but prints some errors for each of them:

Code:
Number of platforms: 2
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.0 CUDA 3.2.1
  Platform Name: NVIDIA CUDA
  Platform Vendor: NVIDIA Corporation
  Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 WINDOWS
  Platform Name: Intel OpenCL
  Platform Vendor: Intel Corporation
  Platform Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_khr_icd


  Platform Name: NVIDIA CUDA
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4318
  Max compute units: 11
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 64
  Max work group size: 1024
  Preferred vector width char: 1
  Preferred vector width short: 1
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Max clock frequency: 810Mhz
  Address bits: 17240136165097504
  Max memory allocation: 260423680
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 4096
  Max image 2D height: 32768
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4352
  Alignment (bits) of base address: 4096
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 128
  Cache size: 180224
  Global memory size: 1041694720
  Constant buffer size: 65536
  Max number of constant args: 9
  Local memory type: Scratchpad
  Local memory size: 49152
  Profiling timer resolution: 1000
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 02551550
  Name: GeForce GTX 465
  Vendor: NVIDIA Corporation
  Driver version: 261.00
  Profile: FULL_PROFILE
  Version: OpenCL 1.0 CUDA
  Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64


Error : atomics mismatch!
Error : Bytes mismatch!
Error : d3d10Sharing mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
  Platform Name: Intel OpenCL
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 32902
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 2
  Max clock frequency: 3200Mhz
  Address bits: 17240136165097504
  Max memory allocation: 536838144
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 128
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 128
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 6291456
  Global memory size: 2147352576
  Constant buffer size: 131072
  Max number of constant args: 128
  Local memory type: Global
  Local memory size: 32768
  Profiling timer resolution: 279
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 02E11220
  Name: GenuineIntel
  Vendor: Intel Corporation
  Driver version: 1.1
  Profile: FULL_PROFILE
  Version: OpenCL 1.1
  Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store


Error : atomics mismatch!
Error : Bytes mismatch!
Error : d3d10Sharing mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!
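A dump like the one above is long, so here is a minimal Python sketch that boils CLINFO-style text down to "which platform exposes which device". The function name `summarize_clinfo` and the trimmed `SAMPLE` excerpt are my own illustration, not part of any SDK; the parser only assumes the line layout seen in the output above ("Platform Name:" followed by "Number of devices:", then "Device Type:" and "Name:" per device).

```python
def summarize_clinfo(text):
    """Return {platform name: [(device type, device name), ...]}.

    Assumes the CLINFO layout shown in this thread: a device section
    starts with "Platform Name: X" directly followed by
    "Number of devices: N"; each device starts with "Device Type:"
    and reports its own "Name:" further down.
    """
    summary = {}
    last_platform_name = None  # most recent "Platform Name" line
    current = None             # platform whose devices are being read
    dev_type = None            # device type waiting for its "Name" line
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Platform Name":
            last_platform_name = value
        elif key == "Number of devices" and last_platform_name is not None:
            current = last_platform_name
            summary[current] = []
        elif key == "Device Type" and current is not None:
            dev_type = value
        elif key == "Name" and dev_type is not None:
            summary[current].append((dev_type, value))
            dev_type = None
    return summary


# Trimmed excerpt of the desktop output above (most fields left out).
SAMPLE = """\
Number of platforms: 2
  Platform Name: NVIDIA CUDA
  Platform Name: Intel OpenCL

  Platform Name: NVIDIA CUDA
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Max compute units: 11
  Name: GeForce GTX 465
Error : atomics mismatch!
  Platform Name: Intel OpenCL
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_CPU
  Max compute units: 4
  Name: GenuineIntel
"""

if __name__ == "__main__":
    for platform, devices in summarize_clinfo(SAMPLE).items():
        for dev_type, name in devices:
            print(f"{platform}: {dev_type} ({name})")
```

The "Error : ... mismatch!" lines are simply skipped by this sketch; it does not try to explain why CLINFO prints them.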
« Last Edit: November 16, 2010, 07:17:07 PM by Stefan »

Stefan

Re: Intel OpenCL SDK tested on NVIDIA and ATI platform
« Reply #1 on: November 16, 2010, 07:21:59 PM »
Intel OpenCL SDK tested on notebook with ATI GPU and Core i3 330m



Same as above: NVIDIA's OpenCL Device Query fails to recognise the Intel platform.



GPU Caps Viewer recognises both the ATI and the Intel platform, but gets confused and keeps using the ATI platform even though Intel is selected.
You can tell because both the GPU and the CPU demos still run, while the Intel platform exposes only a CPU device.





AMD's Stream 2.2 CLINFO recognises the ATI platform (2 devices) and the Intel platform (1 device), and this time both pass:

Code:
Number of platforms: 2
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Platform Name: ATI Stream
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_khr_d3d10_sharing
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 WINDOWS
  Platform Name: Intel OpenCL
  Platform Vendor: Intel Corporation
  Platform Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_khr_icd


  Platform Name: ATI Stream
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 2128Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: No
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 1073741824
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Profiling timer resolution: 481
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 00EDD40C
  Name: Intel(R) Core(TM) i3 CPU       M 330  @ 2.13GHz
  Vendor: GenuineIntel
  Driver version: 2.0
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf cl_khr_d3d10_sharing
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Max compute units: 2
  Max work items dimensions: 3
    Max work items[0]: 128
    Max work items[1]: 128
    Max work items[2]: 128
  Max work group size: 128
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 750Mhz
  Address bits: 32
  Max memory allocation: 134217728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 00EDD40C
  Name: Cedar
  Vendor: Advanced Micro Devices, Inc.
  Driver version: CAL 1.4.879
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_khr_d3d10_sharing


Passed!
  Platform Name: Intel OpenCL
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 32902
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 2
  Max clock frequency: 2133Mhz
  Address bits: 32
  Max memory allocation: 536838144
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 128
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 128
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 262144
  Global memory size: 2147352576
  Constant buffer size: 131072
  Max number of constant args: 128
  Local memory type: Global
  Local memory size: 32768
  Profiling timer resolution: 481
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 03B96D88
  Name: GenuineIntel
  Vendor: Intel Corporation
  Driver version: 1.1
  Profile: FULL_PROFILE
  Version: OpenCL 1.1
  Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store


Passed!
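Both CPU entries above describe the same Core i3, once through ATI Stream and once through the Intel SDK, so it is worth diffing their extension lists. A minimal Python sketch follows; the two strings are copied from the "Extensions:" lines of the output above, and the helper name `diff_extensions` is just my choice for illustration.

```python
# "Extensions:" line of the ATI Stream CPU device (copied from the output above).
ATI_CPU_EXT = (
    "cl_amd_fp64 cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics "
    "cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store "
    "cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query "
    "cl_amd_printf cl_khr_d3d10_sharing"
)

# "Extensions:" line of the Intel CPU device (copied from the output above).
INTEL_CPU_EXT = (
    "cl_khr_fp64 cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics "
    "cl_khr_local_int32_extended_atomics cl_khr_gl_sharing "
    "cl_khr_byte_addressable_store"
)


def diff_extensions(a, b):
    """Split two space-separated extension strings into
    (only in a, only in b, in both), each as a sorted list."""
    set_a, set_b = set(a.split()), set(b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a), sorted(set_a & set_b)


if __name__ == "__main__":
    only_ati, only_intel, common = diff_extensions(ATI_CPU_EXT, INTEL_CPU_EXT)
    print("Only ATI Stream:", ", ".join(only_ati))
    print("Only Intel:     ", ", ".join(only_intel))
    print("Shared:         ", len(common), "extensions")
```

Going by these two lists, double precision shows up as cl_amd_fp64 on the ATI Stream side and as cl_khr_fp64 on the Intel side, which also fits the "Preferred vector width double" values of 0 and 2.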

Stefan

Re: Intel OpenCL SDK tested on NVIDIA and ATI platform
« Reply #2 on: November 16, 2010, 07:50:09 PM »
OpenCL demos actually running on INTEL platform



I managed to run the demos on the Intel platform by disabling the GPU vendor's entry in the registry (no reboot required).
To disable a platform, set its value to dword:00000001; dword:00000000 keeps it enabled.



Here are the locations on 64-bit Windows (first the NVIDIA desktop, then the ATI notebook):

Quote
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors]
"nvcuda.dll"=dword:00000000
"intelocl.dll"=dword:00000000

Quote
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors]
"atiocl.dll"=dword:00000000
"atiocl64.dll"=dword:00000000
"intelocl.dll"=dword:00000000
« Last Edit: November 16, 2010, 08:32:52 PM by Stefan »
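If you switch platforms often, the .reg text can be generated instead of edited by hand. The following is a minimal Python sketch that only builds the text and does not touch the registry itself. The key path and DLL names are the ones quoted above; the function name `make_reg` and its `disabled` parameter are hypothetical names of mine.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"
# Key path as quoted in the post above (64-bit Windows).
VENDORS_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors"


def make_reg(vendors, disabled=()):
    """Build .reg text for the given vendor DLLs.

    vendors  -- DLL names in the order they should be written
    disabled -- DLL names that get dword:00000001 (platform disabled);
                all others get dword:00000000 (platform enabled)
    """
    disabled = set(disabled)
    unknown = disabled - set(vendors)
    if unknown:
        raise ValueError(f"not in vendor list: {sorted(unknown)}")
    lines = [REG_HEADER, "", f"[{VENDORS_KEY}]"]
    for dll in vendors:
        flag = 1 if dll in disabled else 0
        lines.append(f'"{dll}"=dword:{flag:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Desktop case: hide NVIDIA so that only the Intel platform is left.
    print(make_reg(["nvcuda.dll", "intelocl.dll"], disabled=["nvcuda.dll"]))
```

Save the printed text as a .reg file and import it as usual; calling `make_reg` again without `disabled` gives the all-zero version to switch everything back on.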

Stefan

Re: Intel OpenCL SDK tested on NVIDIA and ATI platform
« Reply #3 on: November 16, 2010, 08:13:49 PM »
Benchmark: ATI Stream CPU mode vs. INTEL



Configuration: Intel Core i3 330m / CEDAR 5470, window size 600 x 1 pixels to get higher framerates

Demo        ATI  Intel
Julia         4      5
Particles    21     36
Deformer     29     27
PostFX       79     60


As you can see, the results are inconsistent: Intel is ahead in Julia and Particles, while ATI is ahead in Deformer and PostFX.
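To put a number on each demo, here is a minimal Python sketch that computes the Intel-to-ATI ratio from the table above. It assumes the values are framerates, so a ratio above 1 means Intel is faster; the names `RESULTS` and `intel_vs_ati` are mine.

```python
# (ATI Stream CPU mode, Intel) results per demo, copied from the table above.
RESULTS = {
    "Julia": (4, 5),
    "Particles": (21, 36),
    "Deformer": (29, 27),
    "PostFX": (79, 60),
}


def intel_vs_ati(results):
    """Return {demo: Intel result divided by ATI result}."""
    return {demo: intel / ati for demo, (ati, intel) in results.items()}


if __name__ == "__main__":
    for demo, ratio in intel_vs_ati(RESULTS).items():
        faster = "Intel" if ratio > 1 else "ATI"
        print(f"{demo:<10} {ratio:5.2f}  ({faster} faster)")
```

With only four single-run data points this is just a rough comparison, not a proper benchmark.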