Author Topic: NVIDIA CUDA Toolkit 11.0.2 final release  (Read 1854 times)


Stefan

NVIDIA CUDA Toolkit 11.0.2 final release
« on: July 07, 2020, 09:20:23 PM »
Download now

 This section summarizes the changes in CUDA 11.0 GA since the 11.0 RC release.


General CUDA
 
  • Added support for Ubuntu 20.04 LTS on x86_64 platforms.
  • Arm server platforms (arm64 sbsa) are supported with NVIDIA T4 GPUs.
NPP New Features
 
  • Batched Image Label Markers Compression removes sparseness between the marker label IDs output by a LabelMarkers call.
  • Image Flood Fill functionality fills a connected region of an image with a specified new value.
  • Stability and performance fixes to Image Label Markers and Image Label Markers Compression.
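For readers unfamiliar with the two NPP operations, here is a minimal CPU sketch in plain Python of what they compute. It is not the NPP API or its GPU implementation. The function names `compress_labels` and `flood_fill` are hypothetical, and the sketch assumes that label 0 means background, that compressed IDs are assigned in ascending order, and that regions are 4-connected.

```python
from collections import deque


def compress_labels(labels):
    """Remap sparse label IDs to the dense range 1..K (0 stays background)."""
    # Assumption: 0 is background and IDs are renumbered in ascending order.
    ids = sorted({v for row in labels for v in row if v != 0})
    remap = {old: new for new, old in enumerate(ids, start=1)}
    return [[remap.get(v, 0) for v in row] for row in labels]


def flood_fill(image, seed, new_value):
    """Fill the 4-connected region containing seed=(row, col), in place."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    old = image[r0][c0]
    if old == new_value:
        return image  # nothing to do, and avoids an endless loop
    image[r0][c0] = new_value
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == old:
                image[nr][nc] = new_value
                queue.append((nr, nc))
    return image


sparse = [[0, 7, 7], [0, 0, 42], [19, 0, 42]]
dense = compress_labels(sparse)  # IDs 7, 19, 42 become 1, 2, 3
```

The breadth-first queue keeps the fill iterative, so large regions do not hit Python's recursion limit.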
nvJPEG New Features
 
  • nvJPEG allows the user to allocate separate memory pools for each chroma subsampling format, which helps avoid memory re-allocation overhead. This behavior is enabled by passing the newly added flag NVJPEG_FLAGS_ENABLE_MEMORY_POOLS to the nvjpegCreateEx API.
  • The nvJPEG encoder now allows the compressed bitstream to reside in GPU memory.
cuBLAS New Features
 
  • cuBLASLt Matrix Multiplication adds support for fused ReLU and bias operations for all floating point types except double precision (FP64).
  • Improved batched TRSM performance for matrices larger than 256.
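To make the fused ReLU and bias operations concrete, here is a small CPU reference in plain Python of the arithmetic such an epilogue performs: D = relu(A·B + bias). It is only a sketch of the math, not cuBLASLt code. The name `matmul_bias_relu` is hypothetical, and the sketch assumes one bias value per output row, broadcast across the columns.

```python
def matmul_bias_relu(a, b, bias):
    """Return relu(a @ b + bias) for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # Assumption: bias[i] is added to every element of output row i.
            out[i][j] = max(0.0, acc + bias[i])
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
result = matmul_bias_relu(a, identity, [-2.0, 1.0])
```

Fusing these steps matters on the GPU because the bias add and the clamp happen while the result tile is still on chip, instead of in separate passes over memory.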
cuSOLVER New Features
 
  • Added a 64-bit API for GESVD. The new routine cusolverDnGesvd_bufferSize() fills in the parameters missing from the 32-bit API cusolverDn[S|D|C|Z]gesvd_bufferSize() so that it can estimate the size of the workspace accurately.
  • Added single-process multi-GPU Cholesky factorization capabilities (POTRF, POTRS and POTRI) to the cusolverMG library.
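As a reminder of what the Cholesky routines compute, here is a minimal single-threaded sketch in plain Python. It is not the cusolverMG API. The names `potrf_lower` and `potrs_lower` are hypothetical and only borrow the LAPACK-style naming: the first factors a symmetric positive-definite matrix as A = L·Lᵀ, the second uses that factor to solve A·x = b. POTRI, which forms the inverse from the same factor, is left out.

```python
import math


def potrf_lower(a):
    """Return lower-triangular L such that a == L @ L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def potrs_lower(l, b):
    """Solve (L @ L^T) x = b by forward, then backward substitution."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):  # forward: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = y
        s = sum(l[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / l[i][i]
    return x


factor = potrf_lower([[4.0, 2.0], [2.0, 3.0]])
solution = potrs_lower(factor, [2.0, 5.0])  # approximately [-0.5, 2.0]
```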
cuSOLVER Resolved Issues
 
  • Fixed an issue where SYEVD/SYGVD would fail and return error code 7 if the matrix is zero and the dimension is bigger than 25.
cuSPARSE New Features
 
  • Added new Generic APIs for Axpby (cusparseAxpby), Scatter (cusparseScatter), Gather (cusparseGather), and Givens rotation (cusparseRot). The __nv_bfloat16 and __nv_bfloat162 data types and 64-bit indices are also supported.
  • This release adds the following features for cusparseSpMM:
     
    • Support for row-major layout for both the CSR and COO formats
    • Support for 64-bit indices
    • Support for __nv_bfloat16 and __nv_bfloat162 data types
    • Support for the following strided batch modes:
      • Ci=A⋅Bi
      • Ci=Ai⋅B
      • Ci=Ai⋅Bi
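The three strided batch modes can be read as broadcasting rules over the batch index. Here is a minimal CPU sketch in plain Python, not cuSPARSE code. The helpers `csr_spmm` and `batched_spmm` are hypothetical, each sparse matrix is assumed to be a `(row_ptr, col_idx, vals)` CSR tuple, and the dense operands are row-major lists of rows.

```python
def csr_spmm(csr, b):
    """Multiply a CSR matrix by a dense row-major matrix b."""
    row_ptr, col_idx, vals = csr
    n_cols = len(b[0])
    out = []
    for r in range(len(row_ptr) - 1):
        row = [0.0] * n_cols
        for p in range(row_ptr[r], row_ptr[r + 1]):
            for j in range(n_cols):
                row[j] += vals[p] * b[col_idx[p]][j]
        out.append(row)
    return out


def batched_spmm(a_batch, b_batch):
    """Compute C_i = A_i . B_i; a batch of length 1 is shared by every i."""
    count = max(len(a_batch), len(b_batch))
    if len(a_batch) not in (1, count) or len(b_batch) not in (1, count):
        raise ValueError("batch sizes must match or be 1")
    results = []
    for i in range(count):
        a = a_batch[i] if len(a_batch) > 1 else a_batch[0]
        b = b_batch[i] if len(b_batch) > 1 else b_batch[0]
        results.append(csr_spmm(a, b))
    return results


a1 = ([0, 1, 2], [0, 1], [1.0, 2.0])  # [[1, 0], [0, 2]]
a2 = ([0, 1, 1], [1], [3.0])          # [[0, 3], [0, 0]]
b1 = [[1.0, 2.0], [3.0, 4.0]]
b2 = [[0.0, 1.0], [1.0, 0.0]]

shared_a = batched_spmm([a1], [b1, b2])      # C_i = A . B_i
shared_b = batched_spmm([a1, a2], [b1])      # C_i = A_i . B
pairwise = batched_spmm([a1, a2], [b1, b2])  # C_i = A_i . B_i
```

In the real library the batches live in one allocation and are addressed through a batch stride. The lists here only stand in for that indexing.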
cuFFT New Features
 
  • cuFFT now accepts __nv_bfloat16 input and output data type for power-of-two sizes with single precision computations within the kernels.
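To illustrate what bfloat16 storage with single-precision computation means for precision, here is a short sketch in plain Python. It is not cuFFT code and all helper names are hypothetical. It relies on bfloat16 being the upper 16 bits of an IEEE float32, and it assumes round-to-nearest-even on conversion. NaN handling is left out.

```python
import struct


def float_to_bf16_bits(x):
    """Round x to bfloat16 and return the 16-bit pattern (NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even: add 0x7FFF plus the lowest bit that is kept.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b):
    """Widen a bfloat16 bit pattern back to a float (exact)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (the sizes the note above refers to)."""
    return n > 0 and (n & (n - 1)) == 0


stored = bf16_bits_to_float(float_to_bf16_bits(3.14159))  # 3.140625
```

Only about three significant decimal digits survive the round trip. Keeping the arithmetic in single precision means this rounding is paid once on input and once on output rather than at every butterfly stage.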