ljbade
NVIDIA CUDA Toolkit 4.0 RC
« on: April 07, 2011, 02:47:57 AM »
NVIDIA have announced the public availability of the CUDA Toolkit 4.0 RC, which was previously available only to registered developers.
http://developer.nvidia.com/cuda-toolkit-40

Quote
Release Highlights
Easier Application Porting

    * Share GPUs across multiple threads
    * Use all GPUs in the system concurrently from a single host thread
    * No-copy pinning of system memory, a faster alternative to cudaMallocHost()
    * C++ new/delete and support for virtual functions
    * Support for inline PTX assembly
    * Thrust library of templated performance primitives such as sort, reduce, etc.
    * NVIDIA Performance Primitives (NPP) library for image/video processing
    * Layered Textures for working with same size/format textures at larger sizes and higher performance
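
To give a feel for the "no-copy pinning" item above, here is a rough sketch of how it might look with the runtime API. It page-locks an ordinary malloc() buffer in place with cudaHostRegister() instead of allocating a fresh pinned buffer with cudaMallocHost() and copying into it. The kernel `scale`, the helper `run_pinned_demo` and the `CHECK` macro are names made up for this example, and it assumes a reasonably recent toolkit where `cudaHostRegisterDefault` is defined.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative kernel: multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the number of elements that differ from the expected result.
int run_pinned_demo(int n) {
    size_t bytes = (size_t)n * sizeof(float);

    // Ordinary pageable allocation, e.g. a buffer owned by existing code.
    float *host = (float *)malloc(bytes);
    if (!host) return -1;
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    // Page-lock the existing buffer in place; no second buffer, no extra copy.
    CHECK(cudaHostRegister(host, bytes, cudaHostRegisterDefault));

    float *dev = NULL;
    CHECK(cudaMalloc(&dev, bytes));
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Asynchronous transfers are only truly asynchronous from pinned memory.
    CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream));
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, stream>>>(dev, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f * (float)i) ++mismatches;

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev));
    CHECK(cudaHostUnregister(host));  // unpin before handing back to free()
    free(host);
    return mismatches;
}

int main() {
    int mismatches = run_pinned_demo(1 << 20);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Something like `nvcc pinned.cu -o pinned` should build it (the file name is arbitrary). The practical appeal is that buffers owned by existing host code can be pinned where they already live, which fits the "easier porting" theme of this list.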

Faster Multi-GPU Programming

    * Unified Virtual Addressing
    * GPUDirect v2.0 support for Peer-to-Peer Communication
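
As an illustration of the two multi-GPU items above, the sketch below drives two GPUs from a single host thread, copies a buffer directly from one device to the other with cudaMemcpyPeer(), and then reads it back with cudaMemcpyDefault, which relies on Unified Virtual Addressing to work out the copy direction from the pointers. It assumes a 64-bit system with two UVA-capable GPUs as devices 0 and 1; `run_peer_demo` and `CHECK` are names invented for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Returns 0 if the copy was verified, 1 if skipped (fewer than two GPUs),
// 2 if the data read back from device 1 does not match.
int run_peer_demo(int n) {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("Fewer than two GPUs found; skipping peer-to-peer demo.\n");
        return 1;
    }

    size_t bytes = (size_t)n * sizeof(int);
    int *in = (int *)malloc(bytes);
    int *out = (int *)malloc(bytes);
    if (!in || !out) { free(in); free(out); return 2; }
    for (int i = 0; i < n; ++i) { in[i] = i; out[i] = -1; }

    // One host thread switches between devices with cudaSetDevice().
    int *src = NULL, *dst = NULL;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemcpy(src, in, bytes, cudaMemcpyHostToDevice));

    // Enable direct access from device 0 to device 1 where the hardware
    // allows it. cudaMemcpyPeer() still works without it, just more slowly.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (canAccess) CHECK(cudaDeviceEnablePeerAccess(1, 0));

    // Device-to-device copy: destination on device 1, source on device 0.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));

    // With UVA the runtime infers the direction from the pointer values.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(out, dst, bytes, cudaMemcpyDefault));

    int result = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] != in[i]) { result = 2; break; }

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    free(in);
    free(out);
    return result;
}

int main() {
    int result = run_peer_demo(1 << 20);
    printf("result: %d\n", result);
    return result == 2 ? 1 : 0;
}
```

On a single-GPU machine the program just reports that it skipped the test. Whether the peer copy actually takes the direct path depends on the hardware and PCIe topology, which is what cudaDeviceCanAccessPeer() reports.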

New & Improved Developer Tools

    * Automated Performance Analysis in Visual Profiler
    * C++ debugging in cuda-gdb
    * GPU binary disassembler for Fermi architecture (cuobjdump)