How to Hack and Speed Up Direct3D 11 Render Calls


Humus has explained a cool hack that speeds up Direct3D 11 render calls a little. ID3D11DeviceContext is an abstract class made only of pure virtual functions. The principle of the hack is to fetch the real function pointer from the virtual table once and then call it directly, which avoids the virtual-table lookup on every call. For example, here is how we can accelerate ID3D11DeviceContext::DrawIndexed().

In the virtual table of the ID3D11DeviceContext object, the entry index of DrawIndexed is 12 because 12 function pointers come before it. DrawIndexed is the 6th method declared in ID3D11DeviceContext. ID3D11DeviceContext inherits from ID3D11DeviceChild, which has 4 virtual functions and in turn inherits from IUnknown, which has 3. DrawIndexed is therefore the 13th function (3 + 4 + 6) of ID3D11DeviceContext, so its zero-based index is 12.
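The slot arithmetic above can be written as a compile-time check. This is a small sketch: the constant names are hypothetical, and the per-interface method counts are copied from the hierarchy described in this post rather than derived from d3d11.h.

```cpp
#include <cstddef>

// Virtual methods contributed by each base interface
// (IUnknown: 3, ID3D11DeviceChild: 4), as listed in this post.
constexpr std::size_t kIUnknownMethods = 3;
constexpr std::size_t kDeviceChildMethods = 4;

// DrawIndexed is the 6th method declared in ID3D11DeviceContext,
// so its zero-based position inside that interface is 5.
constexpr std::size_t kDrawIndexedLocalIndex = 5;

// Zero-based vtable slot = methods inherited from all bases + local index.
constexpr std::size_t kDrawIndexedSlot =
    kIUnknownMethods + kDeviceChildMethods + kDrawIndexedLocalIndex;

static_assert(kDrawIndexedSlot == 12, "DrawIndexed is expected in slot 12");
```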

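#include <d3d11.h>

// DrawIndexed(IndexCount, StartIndexLocation, BaseVertexLocation), with the
// implicit "this" pointer made explicit as the first parameter.
typedef void (STDMETHODCALLTYPE *DrawIndexed_func)
 (ID3D11DeviceContext *, UINT, UINT, INT);

static DrawIndexed_func fast_DrawIndexed = 0;

// ctx: an already created ID3D11DeviceContext*.
// The first pointer-sized field of a COM object points to its virtual table.
void **virtual_table = *(void ***)ctx;
fast_DrawIndexed = (DrawIndexed_func)(virtual_table[12]); // slot 12 = DrawIndexed

// Call through the cached pointer, passing ctx as "this".
// IndexCount = 3 indices per face (a triangle list is assumed here).
fast_DrawIndexed(ctx, num_faces * 3, 0, 0);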

This hack reduces the generated assembly code by 2 instructions: 6 instructions instead of 8 for the standard call.

I quickly tested this hack in the D3D11 render path of MSI Kombustor and it works fine. I didn’t notice a real difference in speed, but it should be slightly faster because fewer instructions are executed per call.

Here is an overview of the ID3D11DeviceContext class hierarchy (only the methods up to DrawIndexed are shown; the comments give each method's zero-based vtable slot):

IUnknown
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(......) = 0; // slot 0
    virtual ULONG STDMETHODCALLTYPE AddRef(......) = 0; // slot 1
    virtual ULONG STDMETHODCALLTYPE Release(......) = 0; // slot 2

ID3D11DeviceChild : public IUnknown
    virtual void STDMETHODCALLTYPE GetDevice(......) = 0; // slot 3
    virtual HRESULT STDMETHODCALLTYPE GetPrivateData(......) = 0; // slot 4
    virtual HRESULT STDMETHODCALLTYPE SetPrivateData(......) = 0; // slot 5
    virtual HRESULT STDMETHODCALLTYPE SetPrivateDataInterface(......) = 0; // slot 6

ID3D11DeviceContext : public ID3D11DeviceChild
    virtual void STDMETHODCALLTYPE VSSetConstantBuffers(......) = 0; // slot 7
    virtual void STDMETHODCALLTYPE PSSetShaderResources(......) = 0; // slot 8
    virtual void STDMETHODCALLTYPE PSSetShader(......) = 0; // slot 9
    virtual void STDMETHODCALLTYPE PSSetSamplers(......) = 0; // slot 10
    virtual void STDMETHODCALLTYPE VSSetShader(......) = 0; // slot 11
    virtual void STDMETHODCALLTYPE DrawIndexed(......) = 0; // slot 12
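To try the trick without a D3D11 device, the hierarchy can be mirrored with plain C++ classes that declare the same number of virtual methods in the same order. This is a minimal sketch, not D3D11 code: the Mock*/RecordingContext types, fetch_draw_indexed() and fast_draw_demo() are hypothetical names. It relies on implementation-defined behavior: single inheritance, no virtual destructor, the vtable pointer stored at the start of the object (as with GCC, Clang and MSVC), and a calling convention where `this` is passed like a first argument (true on x86-64 and ARM64 with GCC/Clang; on 32-bit MSVC plain member functions use __thiscall, which is one reason the real D3D11 typedef keeps STDMETHODCALLTYPE).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for IUnknown (3 methods), ID3D11DeviceChild (4)
// and the first six methods of ID3D11DeviceContext. Only the count and the
// order of the virtual methods matter; default bodies keep the mock short.
struct MockUnknown {
    virtual long QueryInterface() { return 0; }
    virtual unsigned long AddRef() { return 1; }
    virtual unsigned long Release() { return 0; }
};

struct MockDeviceChild : MockUnknown {
    virtual void GetDevice() {}
    virtual long GetPrivateData() { return 0; }
    virtual long SetPrivateData() { return 0; }
    virtual long SetPrivateDataInterface() { return 0; }
};

struct MockDeviceContext : MockDeviceChild {
    virtual void VSSetConstantBuffers() {}
    virtual void PSSetShaderResources() {}
    virtual void PSSetShader() {}
    virtual void PSSetSamplers() {}
    virtual void VSSetShader() {}
    virtual void DrawIndexed(unsigned index_count, unsigned start_index,
                             int base_vertex) = 0;  // expected in slot 12
};

// Concrete "driver" object that records the last DrawIndexed call.
struct RecordingContext : MockDeviceContext {
    unsigned last_index_count = 0;
    void DrawIndexed(unsigned index_count, unsigned, int) override {
        last_index_count = index_count;
    }
};

// Free-function signature with the implicit "this" made explicit.
typedef void (*DrawIndexed_func)(MockDeviceContext*, unsigned, unsigned, int);

// Same idea as the article: read the vtable pointer stored at the start of
// the object, then fetch slot 12. memcpy is used to read the hidden vtable
// pointer so the compiler's strict-aliasing assumptions are not violated.
DrawIndexed_func fetch_draw_indexed(MockDeviceContext* ctx) {
    void** virtual_table = nullptr;
    std::memcpy(&virtual_table, ctx, sizeof virtual_table);
    return reinterpret_cast<DrawIndexed_func>(virtual_table[12]);
}

// Calls DrawIndexed through the fetched pointer and returns the index count
// the object recorded, so it can be compared with an ordinary virtual call.
unsigned fast_draw_demo(unsigned index_count) {
    RecordingContext ctx;
    DrawIndexed_func fast_DrawIndexed = fetch_draw_indexed(&ctx);
    assert(fast_DrawIndexed != nullptr);
    fast_DrawIndexed(&ctx, index_count, 0, 0);
    return ctx.last_index_count;
}
```

Calling fast_draw_demo(36) should record 36, the same value an ordinary ctx.DrawIndexed(36, 0, 0) call would record; if the mock's slot numbering were off by one, a different method would be invoked and the count would stay at 0.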



  • hornet

    …will only be faster if you’re actually CPU limited. Guessing your GPU-testing app is GPU-limited, assuming you wrote it well :)

  • susheel

    2 instructions isn’t very much. Maybe not worth it. What do those two instructions cost in real-time terms? On modern CPUs, most instructions such as mov are very, very fast.

    Then there is the question of portability. Assuming this code works with a 64-bit compiler, what about 32-bit? Also, different compilers *could* lay out the VTable differently. Will this work with other compilers?

    99% of the bottleneck in speed (if there is one) with regard to the Direct3D Draw*() functions is data transfer to the gfx card. If there is indeed a bottleneck, why not batch and try to minimize draw calls?

    On a side note: Direct3D pointers are COM based and should not be typecast directly. This might work in this case, but generalizing this behavior to other functions might lead to unexpected and catastrophic failure, since we do not know how D3D internally manages its reference-counted COM pointers.