Optimizing Vulkan (VKR) and DirectX12 (DXR) Applications Using Nsight Graphics

Many GPU performance analysis tools are based on a capture-and-replay mechanism, where a frame is first captured (either in memory or to disk) and then replayed multiple times to be profiled. Nsight Graphics: GPU Trace differs in that it directly profiles the frames emitted by a live application, without requiring subsequent frames to be identical. This approach makes the tool simpler than replay-based profilers, and therefore less likely to fail as graphics APIs evolve.

In a GDC 2019 talk, we showed how to apply the top-down P3 performance-triage method for optimizing any DX12 GPU workload using GPU Trace. One year later, with the release of Nsight Graphics 2020.2, the tool has evolved substantially. First, it now officially supports the Vulkan API (and all extensions, including VK_NV_ray_tracing). Second, it has a new “Advanced Mode”, which captures additional metrics over subsequent frames and presents them in a single view.

Specifically, in release 2020.2, the Advanced Mode metrics are:

- All of the SM Warp-Issue-Stall Reasons (“Why are my warp latencies high?”)
- All of the SM Warp-Launch-Stall Reasons (“Why is my warp occupancy low?”)
- The L1TEX Hit Rate (“Should I increase the spatial locality of my L1TEX accesses?”)
- An L2 Traffic Breakdown by Source Unit (“Which GPU units are causing most of the L2/VRAM memory traffic?”)
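
As a rough illustration of what the last two metrics in the list above express, here is a minimal Python sketch. GPU Trace computes and displays these values itself; the function names, unit names, and counter values below are hypothetical and are not part of any Nsight Graphics API. The sketch only shows the arithmetic behind a hit rate and a per-source traffic share.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


def traffic_breakdown(bytes_by_unit: dict[str, int]) -> dict[str, float]:
    """Share of total traffic attributed to each source unit, largest first."""
    total = sum(bytes_by_unit.values())
    if total == 0:
        return {unit: 0.0 for unit in bytes_by_unit}
    ranked = sorted(bytes_by_unit.items(), key=lambda kv: kv[1], reverse=True)
    return {unit: count / total for unit, count in ranked}


# Hypothetical counter values, for illustration only.
l1tex = hit_rate(hits=750, misses=250)
l2 = traffic_breakdown({"unit_a": 600, "unit_b": 300, "unit_c": 100})

print(f"L1TEX hit rate: {l1tex:.0%}")
for unit, share in l2.items():
    print(f"L2 traffic from {unit}: {share:.0%}")
```

Read this way, a low hit rate points at poor spatial locality in the accesses, and the ranked breakdown shows which unit to look at first when L2/VRAM traffic is the limiter.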

In this post, we show how to apply the P3 method using a performance-optimization example from Wolfenstein: Youngblood (VKR).

- Optimizing VK/VKR and DX12/DXR Applications Using Nsight Graphics: GPU Trace Advanced Mode Metrics
- Wolfenstein: Youngblood Update Adds Ray-Traced Reflections, NVIDIA DLSS and NVIDIA Highlights

NVIDIA GPU Trace with Wolfenstein: Youngblood RTX