CUDA lets you program the GPU in C. This article walks through the steps needed to write a matrix multiplication routine in CUDA:
- allocate memory on the GPU with cudaMalloc, or with cudaMallocPitch, which pads each row so that rows of a 2D array are properly aligned
- move data to the GPU with cudaMemcpy2D
- choose the kernel's execution domain (grid and block dimensions), write the kernel, and launch it
- move results back from the GPU to the host with cudaMemcpy2D
- free resources with cudaFree
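
To make these steps concrete, here is a minimal sketch of the whole workflow for square, row-major n x n matrices. It is not an optimized implementation: the kernel is the naive one-thread-per-output-element version, and the names `matMulKernel`, `matMul`, and `CHECK`, the 16 x 16 block size, and the exit-on-error policy are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s (line %d)\n",              \
                    cudaGetErrorString(err_), __LINE__);               \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Naive kernel: each thread computes one element of C = A * B.
// Pitches are in bytes, as returned by cudaMallocPitch.
__global__ void matMulKernel(const float *A, size_t pitchA,
                             const float *B, size_t pitchB,
                             float *C, size_t pitchC, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;  // threads outside the matrix do nothing

    const float *rowA = (const float *)((const char *)A + row * pitchA);
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) {
        const float *rowB = (const float *)((const char *)B + k * pitchB);
        sum += rowA[k] * rowB[col];
    }
    float *rowC = (float *)((char *)C + row * pitchC);
    rowC[col] = sum;
}

// Host wrapper covering the steps listed above (illustrative name).
void matMul(const float *hA, const float *hB, float *hC, int n)
{
    float *dA, *dB, *dC;
    size_t pitchA, pitchB, pitchC;
    size_t rowBytes = n * sizeof(float);  // host rows are tightly packed

    // 1. Allocate pitched (row-aligned) device memory.
    CHECK(cudaMallocPitch((void **)&dA, &pitchA, rowBytes, n));
    CHECK(cudaMallocPitch((void **)&dB, &pitchB, rowBytes, n));
    CHECK(cudaMallocPitch((void **)&dC, &pitchC, rowBytes, n));

    // 2. Copy the inputs to the GPU.
    CHECK(cudaMemcpy2D(dA, pitchA, hA, rowBytes, rowBytes, n,
                       cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy2D(dB, pitchB, hB, rowBytes, rowBytes, n,
                       cudaMemcpyHostToDevice));

    // 3. Choose the execution domain and launch the kernel.
    dim3 block(16, 16);  // assumed block size
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dA, pitchA, dB, pitchB, dC, pitchC, n);
    CHECK(cudaGetLastError());

    // 4. Copy the result back to the host (waits for the kernel to finish).
    CHECK(cudaMemcpy2D(hC, rowBytes, dC, pitchC, rowBytes, n,
                       cudaMemcpyDeviceToHost));

    // 5. Free device resources.
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
}

int main(void)
{
    const int n = 4;
    float A[n * n], B[n * n], C[n * n];
    for (int i = 0; i < n * n; ++i) {
        A[i] = (float)i;
        B[i] = (i / n == i % n) ? 1.0f : 0.0f;  // B is the identity matrix
    }

    matMul(A, B, C, n);

    // Multiplying by the identity should reproduce A.
    for (int i = 0; i < n * n; ++i) {
        if (C[i] != A[i]) {
            printf("mismatch at element %d\n", i);
            return 1;
        }
    }
    printf("result matches\n");
    return 0;
}
```

Assuming the file is saved as `matmul.cu`, it can be compiled with `nvcc matmul.cu -o matmul`. Because cudaMallocPitch may pad each row, the kernel computes row addresses from the byte pitch rather than from `n * sizeof(float)`, and cudaMemcpy2D is given both the device pitch and the tightly packed host row size.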