Author Topic: NVIDIA CUDA Toolkit 11.0.3 Update 1  (Read 4110 times)

Stefan
NVIDIA CUDA Toolkit 11.0.3 Update 1
« on: August 07, 2020, 05:47:42 PM »

 New Features
 
  • General CUDA
    • CUDA 11.0 Update 1 is a minor update that is binary compatible with CUDA 11.0. This release will work with all versions of the R450 NVIDIA driver.
    • Added support for SUSE SLES 15.2 on x86_64 and arm64 platforms.
    • A new user stream priority value has been added. This lowers the value of greatestPriority returned by cudaDeviceGetStreamPriorityRange by 1, allowing applications to create "low, medium, high" priority streams rather than just "low, high".
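To illustrate the extra priority level, here is a minimal Python sketch of how an application might label the values in the range reported by cudaDeviceGetStreamPriorityRange. It does not call CUDA. The helper name label_priorities and the example ranges (0 to -1, and 0 to -2) are assumptions for illustration only; the actual range depends on the device. In CUDA, a numerically lower value means a higher priority.

```python
def label_priorities(least_priority, greatest_priority):
    """Map the integer priorities in a reported range to labels.

    A numerically lower value means a higher priority, so
    greatest_priority must be <= least_priority.
    """
    if greatest_priority > least_priority:
        raise ValueError("greatest_priority must be <= least_priority")
    # Walk from the lowest priority (largest number) to the highest.
    levels = list(range(least_priority, greatest_priority - 1, -1))
    if len(levels) == 1:
        return {"default": levels[0]}
    if len(levels) == 2:
        return {"low": levels[0], "high": levels[1]}
    # Three or more levels: use both ends and the middle value.
    return {
        "low": levels[0],
        "medium": levels[len(levels) // 2],
        "high": levels[-1],
    }


# Assumed example values, not read from a real device.
print(label_priorities(0, -1))  # two levels: low, high
print(label_priorities(0, -2))  # three levels: low, medium, high
```

In a real application the two integers would come from cudaDeviceGetStreamPriorityRange, and the chosen value would be passed to cudaStreamCreateWithPriority.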
  • CUDA Compiler
    • NVCC now supports new flags --forward-unknown-to-host-compiler and --forward-unknown-to-host-linker to forward unknown flags to the host compiler and linker, respectively. Please see the nvcc documentation or output of nvcc --help for details.
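As a rough sketch of how a build script might use the new compiler flag, the Python helper below assembles an nvcc command line that forwards flags nvcc does not recognize to the host compiler. The helper name nvcc_command, the file names, and the choice of -Wall as the forwarded flag are hypothetical; check nvcc --help for the exact forwarding rules.

```python
def nvcc_command(source, output, extra_host_flags=()):
    """Build an nvcc argument list that forwards unknown flags.

    extra_host_flags are options nvcc itself does not recognize and that
    are meant for the host compiler.
    """
    cmd = ["nvcc"]
    if extra_host_flags:
        # Without this flag, nvcc rejects options it does not know.
        cmd.append("--forward-unknown-to-host-compiler")
        cmd.extend(extra_host_flags)
    cmd.extend(["-o", output, source])
    return cmd


# Hypothetical file names and host flag, for illustration only.
cmd = nvcc_command("app.cu", "app", ["-Wall"])
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run on a machine where nvcc is installed.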
  • cuBLAS
    • The cuBLAS API was extended with a new function, cublasSetWorkspace(), which lets the user set the cuBLAS library workspace to a user-owned device buffer. cuBLAS then uses this buffer to execute all subsequent calls to the library on the currently set stream.
    • The cuBLASLt experimental logging mechanism can be enabled in two ways:
      • By setting the following environment variables before launching the target application:
        • CUBLASLT_LOG_LEVEL=<level> - where level is one of the following levels:
          • "0" - Off - logging is disabled (default)
          • "1" - Error - only errors will be logged
          • "2" - Trace - API calls that launch CUDA kernels will log their parameters and important information
          • "3" - Hints - hints that can potentially improve the application's performance
          • "4" - Heuristics - heuristics log that may help users to tune their parameters
          • "5" - API Trace - API calls will log their parameters and important information
        • CUBLASLT_LOG_MASK=<mask> - where mask is a combination of the following masks:
          • "0" - Off
          • "1" - Error
          • "2" - Trace
          • "4" - Hints
          • "8" - Heuristics
          • "16" - API Trace
        • CUBLASLT_LOG_FILE=<value> - where value is a file name in the format of "<file_name>.%i"; %i will be replaced with the process ID. If CUBLASLT_LOG_FILE is not defined, the log messages are printed to stdout.
      • By using the runtime API functions defined in the cublasLt header:
        • typedef void(*cublasLtLoggerCallback_t)(int logLevel, const char* functionName, const char* message) - A type of callback function pointer.
        • cublasStatus_t cublasLtLoggerSetCallback(cublasLtLoggerCallback_t callback) - Sets a callback function that is called for every message logged by the library.
        • cublasStatus_t cublasLtLoggerSetFile(FILE* file) - Sets the output file for the logger. The file must be open and have write permissions.
        • cublasStatus_t cublasLtLoggerOpenFile(const char* logFile) - Sets the path at which the logger should create the log file.
        • cublasStatus_t cublasLtLoggerSetLevel(int level) - Sets the log level to one of the levels listed above.
        • cublasStatus_t cublasLtLoggerSetMask(int mask) - Sets the log mask to a combination of the masks listed above.
        • cublasStatus_t cublasLtLoggerForceDisable() - Disables the logger for the entire session. Once this function has been called, the logger cannot be reactivated in the current session.
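As a small illustration of the environment-variable route, the Python sketch below builds an environment for launching a target application with cuBLASLt logging enabled. It assumes that a "combination" of masks means a bitwise OR of the listed values. The helper name cublaslt_log_env, the category names, and the log file name are hypothetical, and nothing here calls cuBLASLt itself.

```python
import os

# Mask values as listed in the notes above.
LOG_MASKS = {
    "error": 1,
    "trace": 2,
    "hints": 4,
    "heuristics": 8,
    "api_trace": 16,
}


def cublaslt_log_env(categories, log_file=None, base_env=None):
    """Return a copy of the environment with cuBLASLt logging variables set."""
    mask = 0
    for name in categories:
        # Assumption: masks are combined with a bitwise OR.
        mask |= LOG_MASKS[name]
    env = dict(os.environ if base_env is None else base_env)
    env["CUBLASLT_LOG_MASK"] = str(mask)
    if log_file is not None:
        # "%i" is passed through unchanged; per the notes, the library
        # replaces it with the process ID.
        env["CUBLASLT_LOG_FILE"] = log_file
    return env


# Hypothetical log file name; an empty base_env keeps the output short.
env = cublaslt_log_env(["error", "hints"], log_file="cublaslt.log.%i", base_env={})
print(env)
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the target application. If CUBLASLT_LOG_FILE is left unset, the notes say log messages go to stdout.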
  Resolved Issues
 
  • CUDA Libraries: CURAND
    • Fixed an issue that caused linker errors about the multiple definitions of mtgp32dc_params_fast_11213 and mtgpdc_params_11213_num when including curand_mtgp32dc_p_11213.h in different compilation units.
  • CUDA Libraries: cuBLAS
    • Some tensor core accelerated strided batched GEMM routines would result in misaligned memory access exceptions when batch stride wasn't a multiple of 8.
    • Tensor core accelerated cublasGemmBatchedEx (pointer-array) routines would use slower kernel variants because they assumed poor alignment of the pointers in the pointer array. These routines now assume that the pointers are well aligned, as noted in the documentation.
  • Math API
    • nv_bfloat16 comparison functions could trigger a fault with misaligned addresses.
    • Performance improvements in half and nv_bfloat16 basic arithmetic implementations.
  • CUDA Tools
    • A non-deterministic hanging issue on calls to cusolverRfBatchSolve() has been resolved.
    • Resolved an issue where using libcublasLt_sparse.a pruned by nvprune caused applications to fail with the error cudaErrorInvalidKernelImage.
    • Fixed an issue that prevented code from building in Visual Studio if placed inside a .cu file.
  Known Issues
 
  • nvJPEG
    • NVJPEG_BACKEND_GPU_HYBRID has an issue when handling bit-streams which have corruption in the scan.
   Deprecations
 None.