Author Topic: Debugging Vulkan driver crash (similar to NVIDIA Aftermath)  (Read 580 times)



0 Members and 1 Guest are viewing this topic.

JeGX

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1430
    • View Profile
    • Geeks3D.com
Debugging Vulkan driver crash (similar to NVIDIA Aftermath)
« on: March 30, 2018, 04:21:57 PM »
Quote
New generation, explcit graphics APIs (Vulkan and DirectX 12) are more efficient, involve less CPU overhead. Part of it is that they don't check most errors. In old APIs (Direct3D 9, OpenGL) every function call was validated internally, returned success of failure code, while driver crash indicated a bug in driver code. New APIs, on the other hand, rely on developer doing the right thing. Of course some functions still return error code (especially ones that allocate memory or create some resource), but those that record commands into a command buffer just return void. If you do something illegal, you can expect undefined behavior. You can use Validation Layers / Debug Layer to do some checks, but otherwise everything may work fine on some GPUs, you may get incorrect result, or you may experience driver crash or timeout (called "TDR"). Good thing is that (contrary to old Windows XP), crash inside graphics driver doesn't cause "blue screen of death" or machine restart. System just restarts graphics hardware and driver, while your program received VK_ERROR_DEVICE_LOST code from one of functions like vkQueueSubmit. Unfortunately, you then don't know which specific draw call or other command caused the crash.

NVIDIA proposed solution for that: they created NVIDIA Aftermath library. It lets you (among other things) record commands that write custom "marker" data to a buffer that survives driver crash, so you can later read it and see which command was successfully executed last. Unfortunately, this library works only with NVIDIA graphics cards and only in D3D11 and D3D12.

I was looking for similar solution for Vulkan. When I saw that Vulkan can "import" external memory, I thought that maybe I could use function vkCmdFillBuffer to write immediate value to such buffer and this way implement the same logic. I then started experimenting with extensions: VK_KHR_get_physical_device_properties_2, VK_KHR_external_memory_capabilities, VK_KHR_external_memory, VK_KHR_external_memory_win32, VK_KHR_dedicated_allocation. I was basically trying to somehow allocate a piece of system memory and import it to Vulkan to write to it as Vulkan buffer. I tried many things: CreateFileMapping + MapViewOfFile, HeapCreate + HeapAlloc and other ways, with various flags, but nothing worked for me. I also couldn't find any description or sample code of how these extensions could be used in Windows to import some system memory as Vulkan buffer.

Everything changed when I learned that creating normal device memory and buffer inside Vulkan is enough! It survives driver crash, so its content can be read later via mapped pointer. No extensions required. I don't think this is guaranteed by specification, but it seems to work on both AMD and NVIDIA cards.

...

Links:
- http://asawicki.info/news_1677_debugging_vulkan_driver_crash_-_equivalent_of_nvidia_aftermath.html
- https://github.com/sawickiap/MISC/blob/master/VulkanAfterCrash.h