NVIDIA CUDA Toolkit 11.1 released
« on: September 24, 2020, 05:35:49 PM »
Download now
  • Added support for NVIDIA Ampere GPU architecture-based GA10x GPUs (compute capability 8.6), including the GeForce RTX-30 series.
  • Enhanced CUDA compatibility across minor releases enables CUDA applications to be compatible with all versions of a particular CUDA major release.
  • CUDA 11.1 adds a new PTX Compiler static library that allows compilation of PTX programs using a set of APIs provided by the library. See for details.
  • Added version 7.1 of the Parallel Thread Execution (PTX) instruction set architecture (ISA). For more details on new (sm_86 target, mma.sp) and deprecated instructions, see the PTX documentation.
  • Added support for Fedora 32 and Debian 10.3 Buster on x86_64 platforms.
  • Unified programming model for:
    • async-copy
    • async-pipeline
    • async-barrier (cuda::barrier)
  • Added hardware-accelerated sparse texture support.
  • Added support for read-only mapping for cudaHostRegister.
  • Multi-threaded launch to different CUDA streams is supported.
  • CUDA Graphs enhancements:
    • improved graphExec update
    • external dependencies
    • extended memcopy APIs
    • presubmit
  • Introduced a new system-level interface using /dev-based capabilities for cgroups-style isolation with MIG.
  • Improved MPS error handling when using multiple GPUs.
  • A fatal GPU exception generated by a Volta+ MPS client will be contained within the devices affected by it and other clients using those devices. Clients running on the other devices managed by the same MPS server can continue running as normal.
  • Users can now configure and query the per-context time slice duration for a GPU via nvidia-smi. Configuring the time slice will require administrator privileges and the allowed settings are default, short, medium and long. The time slice will only be applicable to CUDA applications that are executed after the configuration is applied.
  • Improved detection and reporting of unsupported configurations.
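
For anyone updating build scripts for the new GA10x parts: the notes above mention compute capability 8.6 and the sm_86 target, so here is a rough host-side C++ sketch of how a build helper might assemble the matching nvcc -gencode option. The function names gencode_flag and toolkit_targets_sm86 are hypothetical helpers made up for this post, not part of the CUDA Toolkit, and the version check only encodes the statement above that sm_86 support arrived in 11.1.

```cpp
#include <string>

// Hypothetical helper (not a CUDA Toolkit API): builds an nvcc -gencode
// option for a compute capability such as 8.6 (GA10x, per the notes above).
// With embed_ptx == false the option requests native code (code=sm_XY);
// with embed_ptx == true it requests PTX (code=compute_XY) for JIT at load time.
std::string gencode_flag(int major, int minor, bool embed_ptx = false) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    const std::string code = embed_ptx ? "compute_" + cc : "sm_" + cc;
    return "-gencode arch=compute_" + cc + ",code=" + code;
}

// Hypothetical check: assumes sm_86 can be targeted by toolkit 11.1 or newer,
// which is all the release notes above state.
bool toolkit_targets_sm86(int toolkit_major, int toolkit_minor) {
    return toolkit_major > 11 || (toolkit_major == 11 && toolkit_minor >= 1);
}
```

Calling gencode_flag(8, 6) yields "-gencode arch=compute_86,code=sm_86", which can be appended to an nvcc command line; passing true as the third argument gives the PTX variant instead. Keeping the version check separate lets a script fall back to an older target when it finds an 11.0 toolkit.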