ATI OpenCL Optimization Case Study: Diagonal Sparse Matrix Vector Multiplication

Stefan

This article discusses performance optimizations for AMD GPUs and CPUs, using a simple yet widely used, computationally intensive kernel as a case study: Diagonal Sparse Matrix Vector Multiplication. We look at several topics that come up during OpenCL™ performance optimization and apply them to our case study:

   1. Translating C code to OpenCL™
   2. Choosing data structures for dense, aligned memory accesses
   3. Using local, on-chip memory
   4. Vectorizing the computation for higher efficiency
   5. Using OpenCL™ images to improve effective memory bandwidth
   6. Parallelism for multicore processors

At the end of our journey, we'll have a high-performance kernel for both the AMD Radeon™ HD 5870 GPU and the AMD Phenom™ II X4 965 CPU.
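
For readers who have not met the kernel before, here is a minimal plain-C sketch of a diagonal sparse matrix vector multiply, the kind of starting point that step 1 would translate to OpenCL™. It is not taken from the article: the function name `dia_spmv`, the parameter names, and the storage layout (each stored diagonal padded to length `n` and indexed by row) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * y = A * x for an n-by-n matrix stored by diagonals.
 *
 * Assumed (hypothetical) layout:
 *   offsets[d]       column offset of diagonal d (0 = main, -1 = below, +1 = above)
 *   vals[d * n + r]  entry of diagonal d in row r, i.e. A[r][r + offsets[d]];
 *                    slots that fall outside the matrix are padding
 */
void dia_spmv(size_t n, size_t ndiags, const int *offsets,
              const float *vals, const float *x, float *y)
{
    assert(n > 0 && ndiags > 0);

    for (size_t row = 0; row < n; ++row) {
        float acc = 0.0f;
        for (size_t d = 0; d < ndiags; ++d) {
            long col = (long)row + offsets[d];
            /* Skip padded slots whose column lies outside the matrix. */
            if (col >= 0 && col < (long)n)
                acc += vals[d * n + row] * x[col];
        }
        y[row] = acc;
    }
}
```

In this sketch each output row is independent of the others, so the outer loop is the natural candidate for one OpenCL™ work-item per row. Storing each diagonal contiguously by row also means neighbouring rows read neighbouring elements of `vals`, which is the kind of dense, aligned access pattern that topic 2 refers to.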