Programming a Matrix Multiplication for GPUs with CUDA

CUDA makes it possible to program the GPU in C. This article walks through the steps needed to code a matrix multiplication routine in CUDA:

  • allocate memory on the GPU with cudaMalloc, or with cudaMallocPitch for pitched allocations whose rows are padded for alignment
  • copy the input matrices to the GPU with cudaMemcpy2D
  • choose the kernel domain (grid and block dimensions), write the kernel, and launch it
  • copy the result back from the GPU to the host with cudaMemcpy2D
  • free the GPU memory with cudaFree
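
The five steps above can be sketched as follows. This is a minimal illustration, not the article's own code: the names matMulKernel and matMulOnGpu are hypothetical, the matrices are assumed to be row-major arrays of float with positive dimensions, and the 16x16 thread block is an arbitrary choice. Each thread computes one element of the product, which is the simplest mapping rather than a tuned one.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical kernel: one thread computes one element of C = A * B.
// A is M x K, B is K x N, C is M x N. Pitches are row strides in BYTES,
// as returned by cudaMallocPitch.
__global__ void matMulKernel(const float* A, size_t pitchA,
                             const float* B, size_t pitchB,
                             float* C, size_t pitchC,
                             int M, int N, int K)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;  // threads outside the matrix do nothing

    // Pitched rows are addressed through a byte pointer.
    const float* aRow = (const float*)((const char*)A + row * pitchA);
    float sum = 0.0f;
    for (int k = 0; k < K; ++k) {
        const float* bRow = (const float*)((const char*)B + k * pitchB);
        sum += aRow[k] * bRow[col];
    }
    float* cRow = (float*)((char*)C + row * pitchC);
    cRow[col] = sum;
}

// Hypothetical host wrapper. hA, hB and hC are densely packed row-major
// host arrays (assumption). Returns the first CUDA error encountered.
cudaError_t matMulOnGpu(const float* hA, const float* hB, float* hC,
                        int M, int N, int K)
{
    float *dA = NULL, *dB = NULL, *dC = NULL;
    size_t pitchA = 0, pitchB = 0, pitchC = 0;

    // 1. Allocate pitched device memory (width in bytes, height in rows).
    cudaError_t err = cudaMallocPitch((void**)&dA, &pitchA, K * sizeof(float), M);
    if (err == cudaSuccess)
        err = cudaMallocPitch((void**)&dB, &pitchB, N * sizeof(float), K);
    if (err == cudaSuccess)
        err = cudaMallocPitch((void**)&dC, &pitchC, N * sizeof(float), M);

    // 2. Copy the inputs to the GPU; the host pitch is the packed row size.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dA, pitchA, hA, K * sizeof(float),
                           K * sizeof(float), M, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dB, pitchB, hB, N * sizeof(float),
                           N * sizeof(float), K, cudaMemcpyHostToDevice);

    // 3. Choose the kernel domain and launch: enough blocks to cover C.
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        matMulKernel<<<grid, block>>>(dA, pitchA, dB, pitchB, dC, pitchC, M, N, K);
        err = cudaGetLastError();  // reports launch configuration errors
    }

    // 4. Copy the result back; this copy waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(hC, N * sizeof(float), dC, pitchC,
                           N * sizeof(float), M, cudaMemcpyDeviceToHost);

    // 5. Free device memory (cudaFree on a null pointer is a no-op).
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling matMulOnGpu(A, B, C, M, N, K) from host code and comparing the return value against cudaSuccess is enough to use the sketch. The file would be compiled with nvcc. A faster version would typically stage tiles of A and B in shared memory, which is outside the scope of this outline.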

