NVIDIA Volta GV100 GPU Announced
NVIDIA’s CEO just announced, during the GTC 2017 keynote, the new Volta GV100 GPU that powers the Tesla V100 compute accelerator.
In short, the GV100 as shipped in the Tesla V100 has 5120 CUDA cores and 320 texture units (80 of the chip's 84 SMs are enabled), a 4096-bit HBM2 memory interface, and is built on a 12nm FFN process.
A full GV100 GPU consists of six GPCs, 84 Volta SMs, 42 TPCs (each including two SMs), and eight 512-bit memory controllers (4096 bits total). Each SM has 64 FP32 Cores, 64 INT32 Cores, 32 FP64 Cores, and 8 new Tensor Cores. Each SM also includes four texture units.
With 84 SMs, a full GV100 GPU has a total of 5376 FP32 cores, 5376 INT32 cores, 2688 FP64 cores, 672 Tensor Cores, and 336 texture units. Each memory controller is attached to 768 KB of L2 cache, and each HBM2 DRAM stack is controlled by a pair of memory controllers. The full GV100 GPU includes a total of 6144 KB of L2 cache.
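The totals above are just the per-SM counts multiplied by the number of SMs. Here is a minimal Python sketch of that arithmetic. The names `PER_SM` and `gpu_totals` are made up for illustration, and the 80-SM figure for the Tesla V100 is inferred from the article's 640 Tensor Cores at 8 per SM.

```python
# Per-SM resource counts quoted in the article.
PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8, "texture": 4}


def gpu_totals(num_sms):
    """Multiply each per-SM count by the number of SMs (hypothetical helper)."""
    return {unit: count * num_sms for unit, count in PER_SM.items()}


# Full die: 42 TPCs with two SMs each.
full_gv100 = gpu_totals(42 * 2)

# Assumption: the Tesla V100 enables 80 SMs (640 Tensor Cores / 8 per SM).
tesla_v100 = gpu_totals(640 // 8)

# Eight memory controllers, each attached to 768 KB of L2 cache.
l2_cache_kb = 8 * 768

# Eight 512-bit memory controllers.
memory_bus_bits = 8 * 512

print("Full GV100:", full_gv100)
print("Tesla V100:", tesla_v100)
print("L2 cache (KB):", l2_cache_kb)
print("Memory bus (bits):", memory_bus_bits)
```

The 80-SM row gives the 5120 CUDA cores and 320 texture units quoted for the Tesla V100. The 84-SM row gives the figures for the full die.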
The Volta architecture brings a new streaming multiprocessor (the Volta SM) and CUDA compute capability 7.0, and it introduces Tensor Cores, the most important feature of the Volta GV100.
Tesla P100 delivered considerably higher performance for training neural networks compared to the prior generation NVIDIA Maxwell and Kepler architectures, but the complexity and size of neural networks have continued to grow. New networks that have thousands of layers and millions of neurons demand even higher performance and faster training times.
The new Tensor Cores are what let the Volta GV100 architecture deliver the performance required to train large neural networks. Tesla V100’s Tensor Cores deliver up to 120 Tensor TFLOPS for training and inference applications. Compared to the P100, Tensor Cores on the Tesla V100 provide up to 12x higher peak TFLOPS for deep learning training (versus P100 FP32 operations) and up to 6x higher peak TFLOPS for deep learning inference (versus P100 FP16 operations). The Tesla V100 GPU contains 640 Tensor Cores: 8 per SM across its 80 enabled SMs.
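As a rough sanity check on the 120 Tensor TFLOPS figure, the peak rate can be estimated from the Tensor Core count. The sketch below relies on two figures that are not given in this article and should be treated as assumptions: that each Tensor Core performs 64 fused multiply-adds per clock (one 4x4x4 matrix operation), and that the GPU boost clock is about 1455 MHz. The function name is made up for illustration.

```python
TENSOR_CORES = 640             # from the article: 8 per SM on Tesla V100
FMA_PER_CORE_PER_CLOCK = 64    # assumption: one 4x4x4 matrix FMA per clock
FLOPS_PER_FMA = 2              # one multiply plus one add
BOOST_CLOCK_HZ = 1.455e9       # assumption: ~1455 MHz boost clock


def peak_tensor_tflops(cores, fma_per_clock, clock_hz):
    """Theoretical peak in TFLOPS: cores x FMAs per clock x 2 FLOPs x clock."""
    return cores * fma_per_clock * FLOPS_PER_FMA * clock_hz / 1e12


peak = peak_tensor_tflops(TENSOR_CORES, FMA_PER_CORE_PER_CLOCK, BOOST_CLOCK_HZ)
print(f"Estimated peak: {peak:.1f} Tensor TFLOPS")
```

Under those assumptions the estimate lands just under 120 TFLOPS, which is consistent with the "up to 120" figure. It is a theoretical peak, not a measured throughput.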
More information about the GV100 architecture can be found in this article: Inside Volta: The World’s Most Advanced Data Center GPU.