NVIDIA Ampere GA100 GPU: 8192 CUDA Cores and 54-Billion Transistors


NVIDIA Ampere GPU

NVIDIA’s boss has unveiled, from his kitchen (better than from the toilet), the new A100 GPU based on the Ampere architecture. The A100 is the Tensor Core GPU implementation of the full GA100 die. The A100 has no RT (ray tracing) cores and is aimed at datacenters; whether the full GA100 includes RT cores, and if so how many, has not been disclosed yet.

Full Ampere GA100 GPU specifications:

  • GA100 GPU built on a 7nm manufacturing process
  • 54 billion transistors
  • 8192 CUDA cores
  • 128 SMs (64 CUDA cores per SM)
  • Tensor cores: 512 (4 tensor cores per SM)
  • Third-generation Tensor Cores with TensorFloat-32 (TF32) support
  • New BFloat16 (BF16)/FP32 mixed-precision Tensor Core operations
  • FP32 performance: 23 TFLOPS
  • FP64 performance: 11.5 TFLOPS (FP64 = 1/2 * FP32)
  • Memory: 48GB HBM2 – memory bus width: 6144-bit (6 HBM2 stacks, 12 512-bit memory controllers)
  • CUDA Compute Capability: 8.0
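
The FP32 figures above follow from a simple rule: each CUDA core retires one fused multiply-add (2 FLOPs) per clock. As a sketch, assuming a ~1.41 GHz boost clock (the A100's published boost; NVIDIA has not published a clock for the full GA100):

```python
def peak_fp32_tflops(cuda_cores, boost_clock_ghz):
    # Each CUDA core retires one FMA (= 2 FLOPs) per clock cycle.
    return cuda_cores * 2 * boost_clock_ghz / 1000.0

# A100: 6912 cores at 1.41 GHz boost
print(round(peak_fp32_tflops(6912, 1.41), 1))  # -> 19.5

# Full GA100: 8192 cores at the same assumed clock
print(round(peak_fp32_tflops(8192, 1.41), 1))  # -> 23.1
```

This reproduces the 19.5 TFLOPS quoted for the A100 and the ~23 TFLOPS quoted for the full GA100.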

NVIDIA Ampere GA100 full GPU architecture:

NVIDIA Ampere GA100 streaming multiprocessor (SM):

 
A100 Tensor Core GPU specifications:

  • GA100 GPU built on a 7nm manufacturing process
  • 54 billion transistors
  • 6912 CUDA cores
  • 108 SMs (64 CUDA cores per SM)
  • Tensor cores: 432 (4 tensor cores per SM)
  • FP32 performance: 19.5 TFLOPS
  • FP64 performance: 9.7 TFLOPS (FP64 = 1/2 * FP32)
  • Memory: 40GB HBM2 – memory bus width: 5120-bit (5 HBM2 stacks, 10 512-bit memory controllers)
  • CUDA Compute Capability: 8.0
  • TDP: 400W
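
The memory bus width translates directly into bandwidth: bytes per second = (bus width in bits / 8) × per-pin data rate. A rough sketch, assuming an HBM2 data rate of about 2.43 Gbps per pin (inferred from the ~1555 GB/s NVIDIA quotes for the A100; the rate is not stated in this article):

```python
def hbm2_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    # (bus width / 8) bytes transferred per pin-clock * per-pin data rate
    return bus_width_bits / 8 * data_rate_gbps

# A100: 5120-bit bus, ~2.43 Gbps per pin (assumed)
print(round(hbm2_bandwidth_gbs(5120, 2.43)))  # -> 1555 GB/s
```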

 
3 thoughts on “NVIDIA Ampere GA100 GPU: 8192 CUDA Cores and 54-Billion Transistors”

  1. NV

    https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf

    NVIDIA A100 Tensor Core GPU Architecture whitepaper.

    https://devblogs.microsoft.com/directx/directx-heart-linux/

    “What about the most popular compute API out there today, you ask?

    We are pleased to announce that NVIDIA CUDA acceleration is also coming to WSL! CUDA is a cross-platform API and can communicate with the GPU through either the WDDM GPU abstraction on Windows or the NVIDIA GPU abstraction on Linux.

    We worked with NVIDIA to build a version of CUDA for Linux that directly targets the WDDM abstraction exposed by /dev/dxg. This is a fully functional version of libcuda.so which enables acceleration of CUDA-X libraries such as cuDNN, cuBLAS, TensorRT.

    Support for CUDA in WSL will be included with NVIDIA’s WDDMv2.9 driver. Similar to D3D12 support, support for the CUDA API will be automatically installed and available on any glibc-based WSL distro if you have an NVIDIA GPU. The libcuda.so library gets deployed on the host alongside libd3d12.so, mounted and added to the loader search path using the same mechanism described previously.

    In addition to CUDA support, we are also bringing support for NVIDIA-docker tools within WSL. The same containerized GPU workload that executes in the cloud can run as-is inside of WSL. The NVIDIA-docker tools will not be pre-installed, instead remaining a user installable package just like today, but the package will now be compatible and run in WSL with hardware acceleration.

    For more details and the latest on the upcoming NVIDIA CUDA support in WSL, please visit http://developer.nvidia.com/cuda/wsl.”

    WDDM 2.9 support is coming to Windows, with NVIDIA’s WDDM 2.9 driver to follow.
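
    The quoted post explains that WSL mounts libcuda.so into the loader search path. A minimal probe (not from the article) that checks whether a CUDA driver library is visible to the dynamic loader:

    ```python
    import ctypes.util

    def cuda_available():
        # find_library searches the same paths the dynamic linker uses,
        # so it will see a libcuda.so mounted and registered by WSL.
        return ctypes.util.find_library("cuda") is not None

    print(cuda_available())
    ```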

  2. Stefan

    A100 comes in at least 3 variations 😉

    NVIDIA_DEV.20B0 = “NVIDIA GRID A100X”
    NVIDIA_DEV.20BE = “NVIDIA GRID A100A”
    NVIDIA_DEV.20BF = “NVIDIA GRID A100B”
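
    These strings are PCI device IDs under NVIDIA's vendor ID 0x10DE; a small hypothetical lookup table built from the IDs listed above:

    ```python
    # PCI device IDs from the INF strings above (PCI vendor 0x10DE = NVIDIA).
    A100_VARIANTS = {
        0x20B0: "NVIDIA GRID A100X",
        0x20BE: "NVIDIA GRID A100A",
        0x20BF: "NVIDIA GRID A100B",
    }

    def variant_name(device_id):
        # Return the marketing name for a known A100 PCI device ID.
        return A100_VARIANTS.get(device_id, "unknown")

    print(variant_name(0x20B0))  # -> NVIDIA GRID A100X
    ```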
