NVIDIA Turing GPU Architecture in-depth
« on: September 14, 2018, 06:29:26 PM »
An 87-page PDF document that covers the Turing GPU architecture:


New Streaming Multiprocessor (SM)
Turing introduces a new processor architecture, the Turing SM, that delivers a dramatic boost in
shading efficiency, achieving 50% improvement in delivered performance per CUDA Core
compared to the Pascal generation. These improvements are enabled by two key architectural
changes. First, the Turing SM adds a new independent integer datapath that can execute
instructions concurrently with the floating-point math datapath. In previous generations,
executing these instructions would have blocked floating-point instructions from issuing. Second,
the SM memory path has been redesigned to unify shared memory, texture caching, and memory
load caching into one unit. This translates to 2x more bandwidth and more than 2x more capacity
available for L1 cache for common workloads.
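
To get a feel for why a separate integer datapath helps, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not how the hardware actually schedules work: it assumes one instruction issues per cycle per datapath and ignores dependencies, warp scheduling and memory latency. The function names and the 100:36 instruction mix are hypothetical, picked purely for illustration.

```python
def issue_cycles(fp_ops: int, int_ops: int, concurrent: bool) -> int:
    """Toy model: one instruction issues per cycle per datapath."""
    if concurrent:
        # Independent INT and FP datapaths: the longer stream sets the time.
        return max(fp_ops, int_ops)
    # Single shared datapath: integer instructions take slots that
    # floating-point instructions could otherwise have used.
    return fp_ops + int_ops


def speedup(fp_ops: int, int_ops: int) -> float:
    """Ratio of shared-datapath cycles to concurrent-datapath cycles."""
    shared = issue_cycles(fp_ops, int_ops, concurrent=False)
    split = issue_cycles(fp_ops, int_ops, concurrent=True)
    return shared / split


if __name__ == "__main__":
    # Hypothetical instruction mix (assumption, not a measured figure).
    print(speedup(fp_ops=100, int_ops=36))
```

In this idealized model the gain depends only on how much integer work (address calculations, comparisons, loop counters) is mixed in with the floating-point math; a shader with no integer instructions would see no benefit from this particular change.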


Turing Tensor Cores
Tensor Cores are specialized execution units designed specifically for performing the tensor /
matrix operations that are the core compute function used in Deep Learning. Similar to Volta
Tensor Cores, the Turing Tensor Cores provide tremendous speed-ups for matrix computations at
the heart of deep learning neural network training and inferencing operations. Turing GPUs
include a new version of the Tensor Core design that has been enhanced for inferencing. Turing
Tensor Cores add new INT8 and INT4 precision modes for inferencing workloads that can tolerate
quantization and don’t require FP16 precision. Turing Tensor Cores bring new deep learning-based
AI capabilities to GeForce gaming PCs and Quadro-based workstations for the first time. A
new technique called Deep Learning Super Sampling (DLSS) is powered by Tensor Cores. DLSS
leverages a deep neural network to extract multidimensional features of the rendered scene and
intelligently combine details from multiple frames to construct a high-quality final image. DLSS
uses fewer input samples than traditional techniques such as TAA, while avoiding the algorithmic
difficulties such techniques face with transparency and other complex scene elements.
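
For readers wondering what "tolerate quantization" means in practice, below is a minimal pure-Python sketch of a quantized dot product. It assumes simple symmetric linear quantization with one scale per vector, which is just one common scheme and not necessarily what any particular inference library does. The helper names and the sample values are made up for illustration. The point is only that the multiply-accumulate runs on small integers, and the result is scaled back to floating point afterwards.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization to signed integers.

    Returns (quantized_ints, scale). Hypothetical helper for illustration.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(a, b, num_bits=8):
    """Dot product computed on quantized integers, then rescaled to float."""
    qa, scale_a = quantize(a, num_bits)
    qb, scale_b = quantize(b, num_bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return acc * scale_a * scale_b


if __name__ == "__main__":
    # Made-up sample data (assumption).
    weights = [0.12, -0.53, 0.91, 0.04]
    activations = [1.5, 0.25, -0.75, 2.0]
    exact = sum(w * x for w, x in zip(weights, activations))
    print("float:", exact)
    print("INT8 :", quantized_dot(weights, activations, num_bits=8))
    print("INT4 :", quantized_dot(weights, activations, num_bits=4))
```

With these sample values the INT8 result stays close to the floating-point one, while INT4 drifts noticeably further. That is the trade-off: lower precision means less data to move and multiply, but only networks that tolerate the rounding error can use it.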

Real-Time Ray Tracing Acceleration
Turing introduces real-time ray tracing that enables a single GPU to render visually realistic 3D
games and complex professional models with physically accurate shadows, reflections, and
refractions. Turing’s new RT Cores accelerate ray tracing and are leveraged by systems and
interfaces such as NVIDIA’s RTX ray tracing technology, and APIs such as Microsoft DXR, NVIDIA
OptiX™, and Vulkan ray tracing to deliver a real-time ray tracing experience.
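
The excerpt above does not go into what the RT Cores compute, but NVIDIA describes them elsewhere as accelerating bounding volume hierarchy (BVH) traversal and ray/triangle intersection tests. As a rough illustration of the second part, here is a software version of a ray/triangle test using the textbook Möller-Trumbore algorithm. This is a stand-in chosen for clarity; the actual hardware method is not described in the post, and the function names here are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray to the hit, or None."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det              # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det      # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None        # ignore hits behind the ray origin


if __name__ == "__main__":
    tri = ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
    print(ray_triangle((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), *tri))  # hit
    print(ray_triangle((2.0, 2.0, 0.0), (0.0, 0.0, 1.0), *tri))    # miss
```

A ray tracer runs a test like this for a very large number of ray and triangle pairs per frame, after a BVH has narrowed down which triangles are worth testing, which is why moving it into dedicated hardware makes real-time rates feasible.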