GPU Caps Viewer 1.43.1.0 vk_phong_lighting2 background flicker + hang.

Started by Dorian, December 05, 2019, 02:30:14 PM

Dorian

Hi,

In the test mentioned in the title I see a flickering background on Intel Skylake graphics, and a GPU hang after some time.
On an NVidia 1080 there is no flicker, but after some time the torus stops spinning (the fps counter keeps updating).

I couldn't attach the validation layers because of the bug mentioned in the other topic (handles truncated to 32-bit),
but as an experiment I ran this demo with GeeXLab_0.29.7.0_FREE_win64 (I copied the whole gxldemos folder).

Launching:
GeeXLab.exe /no_menubar /demofile="./gxldemos/vk-lighting/main2.xml"

This gives the following errors in the Khronos validation layer (from SDK 1.1.126.0):

VUID-vkBeginCommandBuffer-commandBuffer-00049(ERROR / SPEC): msgNum: 0 - Calling vkBeginCommandBuffer() on active VkCommandBuffer 0x2b5c3ed00c0[] before it has completed. You must check command buffer fence before this call. The Vulkan spec states: commandBuffer must not be in the recording or pending state. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkBeginCommandBuffer-commandBuffer-00049)
    Objects: 1
       [0]  0x2b5c3ed00c0, type: 6, name: NULL
VUID-vkBeginCommandBuffer-commandBuffer-00049(ERROR / SPEC): msgNum: 0 - Calling vkBeginCommandBuffer() on active VkCommandBuffer 0x2b5c3ed00c0[] before it has completed. You must check command buffer fence before this call. The Vulkan spec states: commandBuffer must not be in the recording or pending state. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkBeginCommandBuffer-commandBuffer-00049)
    Objects: 1
       [0]  0x2b5c3ed00c0, type: 6, name: NULL


The flicker and the hang are gone if I force vkQueueWaitIdle after each vkQueueSubmit. So the cause is either re-recording a command buffer that has not finished executing, or some other synchronization error.
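
To illustrate the first hypothesis, here is a minimal sketch in C of the command buffer lifecycle that the validation message refers to. It is a toy model, not the Vulkan API and not GeeXLab's code: ToyFence, ToyCommandBuffer and the toy_* functions are hypothetical names, and the simulated GPU only finishes its work when the fence is waited on. The comments note which real call each function stands in for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the command buffer states named in the validation message.
   All names here are hypothetical; this is not the Vulkan API. */
typedef enum { CB_INITIAL, CB_RECORDING, CB_EXECUTABLE, CB_PENDING } CbState;

typedef struct { bool signaled; } ToyFence;

typedef struct {
    CbState   state;
    ToyFence *fence;  /* fence given at submit time, NULL before first submit */
} ToyCommandBuffer;

/* Stands in for vkBeginCommandBuffer: refused while recording or pending.
   (The model ignores command pool reset flags.) */
static bool toy_begin(ToyCommandBuffer *cb) {
    if (cb->state == CB_RECORDING || cb->state == CB_PENDING) return false;
    cb->state = CB_RECORDING;
    return true;
}

/* Stands in for vkEndCommandBuffer. */
static bool toy_end(ToyCommandBuffer *cb) {
    if (cb->state != CB_RECORDING) return false;
    cb->state = CB_EXECUTABLE;
    return true;
}

/* Stands in for vkQueueSubmit with a fence: the buffer becomes pending. */
static bool toy_submit(ToyCommandBuffer *cb, ToyFence *fence) {
    assert(fence != NULL);
    if (cb->state != CB_EXECUTABLE) return false;
    fence->signaled = false;
    cb->fence = fence;
    cb->state = CB_PENDING;
    return true;
}

/* Stands in for vkWaitForFences: in this model the simulated GPU finishes
   here, the fence signals, and the buffer leaves the pending state. */
static void toy_wait_fence(ToyCommandBuffer *cb) {
    if (cb->state == CB_PENDING) {
        cb->fence->signaled = true;
        cb->state = CB_EXECUTABLE;
    }
}

/* One frame: optionally wait for the previous submission, then
   re-record and resubmit the same command buffer. */
static bool toy_frame(ToyCommandBuffer *cb, ToyFence *fence, bool wait_first) {
    if (wait_first) toy_wait_fence(cb);
    return toy_begin(cb) && toy_end(cb) && toy_submit(cb, fence);
}
```

In this model, calling toy_frame with wait_first set to false fails on the second frame because the buffer is still pending, which mirrors VUID-vkBeginCommandBuffer-commandBuffer-00049. With wait_first set to true every frame succeeds. Forcing vkQueueWaitIdle after each submit gives the same ordering, but it waits for everything on the queue rather than for one submission.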

Dorian

Looking at the api_dump of the presentation commands:

Thread 0, Frame 1315:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue                          = 000001C100AB5C30
    submitCount:                    uint32_t                         = 1
    pSubmits:                       const VkSubmitInfo*              = 000001C100942ED8
        pSubmits[0]:                    const VkSubmitInfo               = 000001C100942ED8:
            sType:                          VkStructureType                  = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
            pNext:                          const void*                      = NULL
            waitSemaphoreCount:             uint32_t                         = 1
            pWaitSemaphores:                const VkSemaphore*               = 000001C100942F98
                pWaitSemaphores[0]:             const VkSemaphore                = 000001C1009FBAC0
            pWaitDstStageMask:              const VkPipelineStageFlags*      = 00000057424FF140
                pWaitDstStageMask[0]:           const VkPipelineStageFlags       = 1024 (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT)
            commandBufferCount:             uint32_t                         = 1
            pCommandBuffers:                const VkCommandBuffer*           = 000001C109934358
                pCommandBuffers[0]:             const VkCommandBuffer            = 000001C1098CA460
            signalSemaphoreCount:           uint32_t                         = 0
            pSignalSemaphores:              const VkSemaphore*               = NULL
    fence:                          VkFence                          = 000001C1097B3690

Thread 0, Frame 1315:
vkQueuePresentKHR(queue, pPresentInfo) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue                          = 000001C100AB5C30
    pPresentInfo:                   const VkPresentInfoKHR*          = 00000057424FF600:
        sType:                          VkStructureType                  = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR (1000001001)
        pNext:                          const void*                      = NULL
        waitSemaphoreCount:             uint32_t                         = 0
        pWaitSemaphores:                const VkSemaphore*               = NULL
        swapchainCount:                 uint32_t                         = 1
        pSwapchains:                    const VkSwapchainKHR*            = 000001C17DEAAD08
            pSwapchains[0]:                 const VkSwapchainKHR             = 000001C100946560
        pImageIndices:                  const uint32_t*                  = 000001C17DEAAD20
            pImageIndices[0]:               const uint32_t                   = 1
        pResults:                       VkResult*                        = NULL


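vkQueueSubmit signals no semaphore here (signalSemaphoreCount is 0), and vkQueuePresentKHR waits on none (waitSemaphoreCount is 0). The submit should signal a semaphore through pSignalSemaphores, and that same semaphore should be passed to vkQueuePresentKHR in pWaitSemaphores.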

Spec quote from https://vulkan.lunarg.com/doc/view/1.0.33.0/linux/vkspec.chunked/ch29s06.html:
"The processing of the presentation happens in issue order with other queue operations, but semaphores have to be used to ensure that prior rendering and other commands in the specified queue complete before the presentation begins."
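
The check done by eye on the dump above can be sketched as a small C helper. This is a simplified model under stated assumptions: ToySemaphore, ToySubmitInfo and ToyPresentInfo are hypothetical stand-ins that keep only the semaphore fields of VkSubmitInfo and VkPresentInfoKHR, and present_waits_on_submit is an illustrative name, not a Vulkan or validation layer function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the semaphore fields of VkSubmitInfo and
   VkPresentInfoKHR are modeled, with a plain integer as the handle. */
typedef uint64_t ToySemaphore;

typedef struct {
    uint32_t            signalSemaphoreCount;
    const ToySemaphore *pSignalSemaphores;
} ToySubmitInfo;

typedef struct {
    uint32_t            waitSemaphoreCount;
    const ToySemaphore *pWaitSemaphores;
} ToyPresentInfo;

/* Returns true if the present waits on at least one semaphore that the
   submit signals, i.e. rendering is ordered before presentation. */
static bool present_waits_on_submit(const ToySubmitInfo *s,
                                    const ToyPresentInfo *p) {
    assert(s != NULL && p != NULL);
    for (uint32_t i = 0; i < s->signalSemaphoreCount; ++i) {
        for (uint32_t j = 0; j < p->waitSemaphoreCount; ++j) {
            if (s->pSignalSemaphores[i] == p->pWaitSemaphores[j]) {
                return true;
            }
        }
    }
    return false;
}
```

With the values from the dump (signalSemaphoreCount = 0 and waitSemaphoreCount = 0) the helper returns false. It returns true once the submit signals a semaphore and the present waits on that same handle.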

JeGX

Yes, Vulkan synchronization is tricky because I want to get the fastest possible frame loop. When I use semaphores to sync presentation, it works, but it's slower, and sometimes it's even slower than OpenGL! I will update GeeXLab with a modified version of the Vulkan plugin that uses semaphores, so you'll be able to test it.

JeGX

I reproduced the issue (torus stops spinning). It's weird: I added the semaphore to improve sync robustness and it looks like performance is not impacted. I was sure I saw some performance penalties in the past. I will release a new build of GeeXLab tomorrow with semaphore + shader module fix in {64+32}-bit.


Dorian

Thank you JeGX, I see an improvement. I no longer see the hang on either NVidia or Intel.

On Intel I still see flicker (less often than before), but not in all tests. The torus from CapsViewer flickers, the torus from the demo pack does not.

On version GeeXLab_0.29.8.1_FREE_win64, when I run:
GeeXLab.exe  /no_menubar /demofile="./demopack-vk/torus/vk/main.xml"
with Khronos validation (1.1.126.0), I get a single error repeated multiple times (probably every frame).

[...]
UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x2b820bde2a8[] is waiting on VkSemaphore 0x3047930000000016[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
[...]


If I copy the torus example from 1.43.1.0, I get flicker on Intel, and some new errors too:
GeeXLab.exe  /no_menubar /demofile="./gxldemos_capsViewer/vk-lighting/main2.xml"

UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x1fd665efa98[] is waiting on VkSemaphore 0x3047930000000016[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
VUID-vkBeginCommandBuffer-commandBuffer-00049(ERROR / SPEC): msgNum: 0 - Calling vkBeginCommandBuffer() on active VkCommandBuffer 0x1fd6b26ca68[] before it has completed. You must check command buffer fence before this call. The Vulkan spec states: commandBuffer must not be in the recording or pending state. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkBeginCommandBuffer-commandBuffer-00049)
    Objects: 1
        [0] 0x1fd6b26ca68, type: 6, name: NULL
UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x1fd665efa98[] is waiting on VkSemaphore 0x3047930000000016[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
VUID-vkBeginCommandBuffer-commandBuffer-00049(ERROR / SPEC): msgNum: 0 - Calling vkBeginCommandBuffer() on active VkCommandBuffer 0x1fd6b26ca68[] before it has completed. You must check command buffer fence before this call. The Vulkan spec states: commandBuffer must not be in the recording or pending state. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkBeginCommandBuffer-commandBuffer-00049)
    Objects: 1
        [0] 0x1fd6b26ca68, type: 6, name: NULL
UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x1fd665efa98[] is waiting on VkSemaphore 0x3047930000000016[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
[...]
Multiple UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress
[...]

UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x1fd665efa98[] is waiting on VkSemaphore 0x3047930000000016[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
VUID-vkFreeCommandBuffers-pCommandBuffers-00047(ERROR / SPEC): msgNum: 0 - Attempt to free VkCommandBuffer 0x1fd6b26ca68[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)
    Objects: 1
        [0] 0x1fd6b26ca68, type: 6, name: NULL
VUID-vkDestroyFramebuffer-framebuffer-00892(ERROR / SPEC): msgNum: 0 - Cannot call vkDestroyFramebuffer on VkFramebuffer 0xedbd50000000010[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to framebuffer must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkDestroyFramebuffer-framebuffer-00892)
    Objects: 1
        [0] 0xedbd50000000010, type: 24, name: NULL
VUID-vkDestroyFramebuffer-framebuffer-00892(ERROR / SPEC): msgNum: 0 - Cannot call vkDestroyFramebuffer on VkFramebuffer 0x1f91b40000000011[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to framebuffer must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkDestroyFramebuffer-framebuffer-00892)
    Objects: 1
        [0] 0x1f91b40000000011, type: 24, name: NULL
VUID-vkDestroyImageView-imageView-01026(ERROR / SPEC): msgNum: 0 - Cannot call vkDestroyImageView on VkImageView 0x948acd0000000008[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to imageView must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkDestroyImageView-imageView-01026)
    Objects: 1
        [0] 0x948acd0000000008, type: 14, name: NULL
VUID-vkDestroyImageView-imageView-01026(ERROR / SPEC): msgNum: 0 - Cannot call vkDestroyImageView on VkImageView 0xa540ac0000000009[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to imageView must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkDestroyImageView-imageView-01026)
    Objects: 1
        [0] 0xa540ac0000000009, type: 14, name: NULL




I haven't analyzed the api_dump yet.

JeGX

Ok there is still an issue in the command buffer submit / swapchain presentation / sync. I will work on this.

Dorian

I see you fixed this in 0.28.9.2.
I tested it successfully on both NVidia and Intel: it works and no validation errors are printed.
The torus demo project from CapsViewer 1.43.1.0 also works if launched with this SDK.

Thanks JeGX!

JeGX

Cool to know it's fixed now.
I will update GPU Caps Viewer in the next few days.