We've just released the CUDA C Programming Best Practices Guide. It is designed to help developers who program for the CUDA architecture in C with CUDA extensions implement high-performance parallel algorithms and understand best practices for GPU computing. The guide includes chapters on the following topics, among others:
* Introduction to Parallel Computing with CUDA
* Performance Metrics
* Memory Optimizations
* Execution Configuration Optimizations
* Instruction Optimizations
* Control Flow
* Debugging
* Numerical Accuracy and Precision
* Performance Optimization Strategies
The guide will be included with the 2.3 toolkit, but we decided to release it now because it's definitely worthwhile reading for any CUDA C developer: it collects a lot of internal wisdom on proven optimization strategies, for example. Feel free to post any questions or comments in this thread.