Topic: NVIDIA CUDA Toolkit 9.0.176 final release

Stefan

NVIDIA CUDA Toolkit 9.0.176 final release
« on: September 26, 2017, 05:40:39 AM »
Download here: https://developer.nvidia.com/cuda-downloads

Quote
New Features
General CUDA
  • CUDA 9 now supports Multi-Process Service (MPS) on Volta GPUs. For information about enhancements to Volta MPS, see Multi-Process Service (http://docs.nvidia.com/deploy/mps/) in the NVIDIA GPU Deployment and Management Documentation.
  • A code sample for the new CUDA C++ warp matrix functions has been added.
CUDA Tools
  • CUDA Compilers. Support for Microsoft Visual Studio 2017 Update 1 and later is beta. Only the RTM version (vc15.0) is fully supported.
  • CUDA Compilers. An attempt to define a __global__ function in a friend declaration now generates an NVCC diagnostic. Previously, compilation would fail at the host compilation step.
Unsupported Features
General CUDA
  • CUDA library. The built-in functions __float2half_rn() and __half2float() have been removed. Use equivalent functionality in the updated fp16 header file from the CUDA toolkit.
  • CUDA library. cuBLAS GemmEx routines, namely cublas<t>gemm extensions for mixed precision, are supported only on GPUs based on the Maxwell or later architectures. These routines are not supported on GPUs based on the Kepler architecture, namely Tesla K40 or K80.
  • The environment variable for disabling unified memory, CUDA_DISABLE_UNIFIED_MEMORY, is no longer supported.
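
For anyone porting code that called the removed built-ins, here is a minimal sketch of the replacement. It assumes the `__float2half()` and `__half2float()` device functions declared in `cuda_fp16.h`. The kernel name and the host helper `roundtrip_through_half` are hypothetical names made up for this illustration; they are not part of the toolkit.

```cuda
#include <cmath>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Converts each input to half precision and back on the device.
__global__ void roundtrip_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half h = __float2half(in[i]);  // float -> half (round to nearest even)
        out[i] = __half2float(h);        // half -> float (exact widening)
    }
}

// Hypothetical host helper: returns x after a float -> half -> float round
// trip on the GPU, or NaN if any CUDA call fails.
float roundtrip_through_half(float x)
{
    float *d_in = nullptr;
    float *d_out = nullptr;
    float result = NAN;
    float tmp = 0.0f;

    bool ok = cudaMalloc(&d_in, sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_out, sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_in, &x, sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        roundtrip_kernel<<<1, 1>>>(d_in, d_out, 1);
        // The blocking copy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&tmp, d_out, sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) result = tmp;

    cudaFree(d_in);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return result;
}
```

Calling `roundtrip_through_half(1.5f)` should hand back 1.5 unchanged, since that value fits in half precision, while `0.1f` comes back slightly off because it has to be rounded to the nearest half value.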
Resolved Issues
General CUDA
  • MPS Server returns an exit status of 1 when it successfully exits.
  • The performance of cudaLaunchCooperativeKernelMultiDevice() APIs has been improved.
  • Strict alias warnings when using GCC to compile code that uses __half data types (cuda_fp16.h) have been disabled.
  • The Warp Intrinsics __shfl() function for FP16 data types now has a *_sync equivalent.
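
As a rough illustration of the FP16 `*_sync` shuffle mentioned in the list above, the sketch below sums 32 values across one warp using the `__half` overload of `__shfl_down_sync()` from `cuda_fp16.h`. The partial sum is deliberately kept in `float` and only converted to `__half` for the exchange, so the code does not depend on native half arithmetic (an assumption made for portability; on GPUs with FP16 arithmetic you could keep everything in `__half`). The names `warp_sum_kernel` and `warp_sum_via_half` are hypothetical.

```cuda
#include <cmath>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Sums 32 values across a single warp; lane 0 writes the total.
__global__ void warp_sum_kernel(const float *in, float *out)
{
    const unsigned full_mask = 0xffffffffu;  // all 32 lanes participate
    float sum = in[threadIdx.x];
    for (int offset = 16; offset > 0; offset >>= 1) {
        __half mine = __float2half(sum);
        // FP16 overload of the synchronizing shuffle.
        __half theirs = __shfl_down_sync(full_mask, mine, offset);
        sum += __half2float(theirs);
    }
    if (threadIdx.x == 0) *out = sum;
}

// Hypothetical host helper: host_vals must point to 32 floats.
// Returns the warp-wide sum, or NaN if any CUDA call fails.
float warp_sum_via_half(const float *host_vals)
{
    const size_t bytes = 32 * sizeof(float);
    float *d_in = nullptr;
    float *d_out = nullptr;
    float result = NAN;
    float tmp = 0.0f;

    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess
           && cudaMalloc(&d_out, sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_in, host_vals, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        warp_sum_kernel<<<1, 32>>>(d_in, d_out);  // exactly one warp
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&tmp, d_out, sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) result = tmp;

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Because every partial sum is squeezed through half precision on the way between lanes, this only stays exact for values that half can represent, such as small integers. For general data the result carries FP16 rounding error.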
CUDA Libraries
  • The cufftXtMalloc() API now allocates the correct amount of memory for multi-GPU 2D and 3D plans.