(Demo) Meshlets and Mesh Shaders (Vulkan)






For an introduction to mesh shaders with GeeXLab, here are some links:

 
A mesh shader can output (send to the rasterizer) only a limited number of primitives (points, lines or triangles). For example, on a GeForce RTX 2070, a mesh shader can output a maximum of 256 vertices and 512 primitives:

OpenGL:
– GL_MAX_MESH_OUTPUT_VERTICES_NV: 256
– GL_MAX_MESH_OUTPUT_PRIMITIVES_NV: 512

Vulkan (VkPhysicalDeviceMeshShaderPropertiesNV structure):
– maxMeshOutputVertices: 256
– maxMeshOutputPrimitives: 512

When the primitive mode is triangles, the output of a mesh shader is always a small mesh, called a meshlet. The smallest meshlet is a single triangle, like in this article.

With these limitations (max output vertices and max output primitives), how can we process/render an existing big mesh with a mesh shader?

One solution is to split the big mesh into multiple meshlets. Each meshlet is then processed by one work group.

Example: if the big mesh is decomposed into 10 meshlets, the processing with mesh shaders would be:

num_meshlets = 10
...
...
local first = 0
local num_workgroups = num_meshlets
gh_vk.draw_mesh_tasks(first, num_workgroups)
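
Since one work group processes one meshlet, the number of work groups depends on how many triangles each meshlet may hold. Here is a minimal C sketch of that arithmetic. The helper name min_meshlet_count is hypothetical (it is not part of the GeeXLab API), and the result is only a lower bound because the per-meshlet vertex limit can force a split earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of meshlets (and therefore work groups) needed to hold
   triangle_count triangles when a meshlet stores at most
   max_triangles_per_meshlet triangles. Hypothetical helper: this is a
   lower bound only, the vertex limit can require more meshlets. */
uint32_t min_meshlet_count(uint32_t triangle_count,
                           uint32_t max_triangles_per_meshlet)
{
    assert(max_triangles_per_meshlet > 0);
    /* Ceiling division written so that it cannot overflow. */
    return triangle_count / max_triangles_per_meshlet +
           (triangle_count % max_triangles_per_meshlet != 0 ? 1u : 0u);
}
```

For example, a mesh of 5000 triangles (an arbitrary count chosen for illustration) split at 16 triangles per meshlet needs at least min_meshlet_count(5000, 16) = 313 work groups.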

 
In GeeXLab 0.32.0, I added new functions to generate meshlets and to access meshlet data (vertices and indices). Here is how to generate meshlets for a torus:

mesh = gh_mesh.create_torus(5.0, 1.5, 50)
-- max values for GPU buffer allocation
meshlet_max_vertices = 64 
meshlet_max_triangles = 126 
meshlet_max_indices = meshlet_max_triangles * 3

meshlet_vertices = 64
meshlet_indices = 16 * 3 -- 16 triangles
gh_mesh.meshlet_generate(mesh, meshlet_vertices, meshlet_indices)

The gh_mesh.meshlet_generate() function generates meshlets using a basic algorithm: the index buffer is scanned and triangles are added to a meshlet until the maximum number of vertices or indices is reached. Alternatively, you can code your own meshlet generation routine. All you need are the following functions of the GeeXLab API:
gh_mesh.get_face_vertex_indices()
gh_object.get_num_faces()
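
To make the basic algorithm more concrete, here is a C sketch of a greedy meshlet builder working on a plain triangle index list. It is an illustration under assumptions, not the actual code of gh_mesh.meshlet_generate(): the names build_meshlets and find_vertex are hypothetical, and the sketch assumes that a meshlet's indices[] array refers to slots of its own vertices[] array, which is how the mesh shader of the demo reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MESHLET_MAX_VERTICES 64
#define MESHLET_MAX_INDICES  378 /* up to 126 triangles */

typedef struct {
    uint32_t vertices[MESHLET_MAX_VERTICES]; /* indices into the mesh vertex list */
    uint32_t indices[MESHLET_MAX_INDICES];   /* indices into vertices[] above */
    uint32_t vertex_count;
    uint32_t index_count;
} Meshlet;

/* Slot of mesh vertex v in m->vertices, or m->vertex_count if it is absent. */
static uint32_t find_vertex(const Meshlet *m, uint32_t v)
{
    uint32_t i = 0;
    while (i < m->vertex_count && m->vertices[i] != v)
        ++i;
    return i;
}

/* Greedy split of a triangle list into meshlets (hypothetical helper).
   Returns the number of meshlets written to out[], or 0 if the limits are
   invalid, the mesh is empty, or out[] is too small. */
size_t build_meshlets(const uint32_t *mesh_indices, size_t triangle_count,
                      uint32_t max_vertices, uint32_t max_indices,
                      Meshlet *out, size_t out_capacity)
{
    assert(mesh_indices != NULL && out != NULL);
    if (max_vertices < 3 || max_vertices > MESHLET_MAX_VERTICES ||
        max_indices < 3 || max_indices > MESHLET_MAX_INDICES ||
        triangle_count == 0 || out_capacity == 0)
        return 0;

    size_t count = 1;
    Meshlet *m = &out[0];
    memset(m, 0, sizeof *m);

    for (size_t t = 0; t < triangle_count; ++t) {
        const uint32_t *tri = &mesh_indices[3 * t];

        /* Conservative count of the corners missing from the current meshlet. */
        uint32_t missing = 0;
        for (int c = 0; c < 3; ++c)
            if (find_vertex(m, tri[c]) == m->vertex_count)
                ++missing;

        /* Start a new meshlet when this triangle would exceed a limit. */
        if (m->vertex_count + missing > max_vertices ||
            m->index_count + 3 > max_indices) {
            if (count == out_capacity)
                return 0;
            m = &out[count++];
            memset(m, 0, sizeof *m);
        }

        /* Append the triangle, reusing vertices already in the meshlet. */
        for (int c = 0; c < 3; ++c) {
            uint32_t slot = find_vertex(m, tri[c]);
            if (slot == m->vertex_count)
                m->vertices[m->vertex_count++] = tri[c];
            m->indices[m->index_count++] = slot;
        }
    }
    return count;
}
```

With the two triangles {0,1,2} and {2,1,3}, a limit of 64 vertices gives a single meshlet with 4 vertices and 6 indices, while a limit of 3 vertices forces the second triangle into a new meshlet. The linear search in find_vertex keeps the sketch short; it is cheap here because a meshlet holds at most 64 vertices.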

A possible structure (in C) for a meshlet could be:

struct Meshlet
{
  uint32_t vertices[64];
  uint32_t indices[378]; // up to 126 triangles
  uint32_t vertex_count;
  uint32_t index_count;
};

This structure is used in the demo.

A meshlet does not store vertex positions, only the indices of its vertices in the vertex list of the mesh.
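
The following C11 sketch shows two consequences of this structure: the size of the storage buffer that holds the meshlets, and how a triangle corner is mapped back to a vertex of the original mesh. The function names are hypothetical. The size check assumes the std430 layout used by the mesh shader of the demo, where an array of uint has a 4-byte stride, so the structure takes 64 + 378 + 2 = 444 words of 32 bits with no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t vertices[64];
    uint32_t indices[378]; /* up to 126 triangles */
    uint32_t vertex_count;
    uint32_t index_count;
} Meshlet;

/* 444 words of 4 bytes = 1776 bytes per meshlet, no padding expected. */
_Static_assert(sizeof(Meshlet) == 444 * sizeof(uint32_t),
               "unexpected padding in Meshlet");

/* Bytes needed by a storage buffer holding meshlet_count meshlets. */
size_t meshlet_buffer_size(size_t meshlet_count)
{
    return meshlet_count * sizeof(Meshlet);
}

/* Maps corner number `corner` of the meshlet (0 .. index_count - 1) to an
   index in the vertex list of the mesh: indices[] selects a slot of
   vertices[], and that slot stores the mesh vertex index. */
uint32_t meshlet_corner_to_mesh_vertex(const Meshlet *m, uint32_t corner)
{
    assert(corner < m->index_count);
    uint32_t slot = m->indices[corner];
    assert(slot < m->vertex_count);
    return m->vertices[slot];
}
```

With this layout, 313 meshlets need 313 * 1776 = 555888 bytes, whatever the number of triangles actually stored in each meshlet.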

Why a maximum of 64 vertices and 126 triangles?

NVIDIA in this article recommends using up to 64 vertices and 126 primitives:

We recommend using up to 64 vertices and 126 primitives. The ‘6’ in 126 is not a typo. The first generation hardware allocates primitive indices in 128 byte granularity and needs to reserve 4 bytes for the primitive count. Therefore 3 * 126 + 4 maximizes the fit into a 3 * 128 = 384 bytes block. Going beyond 126 triangles would allocate the next 128 bytes. 84 and 40 are other maxima that work well for triangles.
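
The arithmetic of this recommendation can be checked with a few lines of C. This is a simplified model based only on the quote above, not on any hardware documentation: it assumes one byte per primitive index (as implied by 3 * 126 + 4), 4 bytes for the primitive count and a rounding up to the next multiple of 128 bytes. The function name primitive_alloc_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRIM_BLOCK_BYTES 128u /* allocation granularity from the quote */
#define PRIM_COUNT_BYTES 4u   /* reserved for the primitive count */

/* Bytes allocated for the primitive indices of a meshlet made of
   triangle_count triangles, assuming one byte per index. */
uint32_t primitive_alloc_bytes(uint32_t triangle_count)
{
    /* 512 is the max number of output primitives quoted for the RTX 2070. */
    assert(triangle_count <= 512u);
    uint32_t used = 3u * triangle_count + PRIM_COUNT_BYTES;
    /* Round up to a whole number of 128-byte blocks. */
    return (used + PRIM_BLOCK_BYTES - 1u) / PRIM_BLOCK_BYTES * PRIM_BLOCK_BYTES;
}
```

With this model, 126 triangles use 382 bytes and fit in 384 bytes, while 127 triangles need 385 bytes and jump to 512 bytes. Similarly, 84 triangles fill exactly 256 bytes and 40 triangles fit in a single 128-byte block.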

126 primitives give 378 indices. 64 vertices and 378 indices are the recommended maximum values, and these maximums are needed to size the memory allocations (for the storage buffer and in the mesh shader). In practice, 42 triangles is more or less the largest number of primitives that works fine:

meshlet_vertices = 64
meshlet_indices = 42 * 3 -- 42 triangles
gh_mesh.meshlet_generate(mesh, meshlet_vertices, meshlet_indices)

In the demo, I set 16 triangles per meshlet:

meshlet_vertices = 64
meshlet_indices = 16 * 3 -- 16 triangles
gh_mesh.meshlet_generate(mesh, meshlet_vertices, meshlet_indices)

 
Here is the mesh shader used in the demo:

#version 450

#extension GL_NV_mesh_shader : require

layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

//-------------------------------------
// transform_ub: Uniform buffer for transformations
//
layout (std140, binding = 0) uniform uniforms_t
{ 
  mat4 ViewProjectionMatrix;
  mat4 ModelMatrix;
} transform_ub;

//-------------------------------------
// vb: storage buffer for vertices.
//
struct s_vertex
{
	vec4 position;
	vec4 color;
};

layout (std430, binding = 1) buffer _vertices
{
	s_vertex vertices[];
} vb;


//-------------------------------------
// mbuf: storage buffer for meshlets.
//
struct s_meshlet
{
	uint vertices[64];
	uint indices[378]; // up to 126 triangles
	uint vertex_count;
	uint index_count;
};

layout (std430, binding = 2) buffer _meshlets
{
	s_meshlet meshlets[];
} mbuf;


// Mesh shader output block.
//
layout (location = 0) out PerVertexData
{
  vec4 color;
} v_out[];   // [max_vertices]


// Color table for drawing each meshlet with a different color.
//
#define MAX_COLORS 10
vec3 meshletcolors[MAX_COLORS] = {
  vec3(1,0,0), 
  vec3(0,1,0),
  vec3(0,0,1),
  vec3(1,1,0),
  vec3(1,0,1),
  vec3(0,1,1),
  vec3(1,0.5,0),
  vec3(0.5,1,0),
  vec3(0,0.5,1),
  vec3(1,1,1)
};

void main()
{
  uint mi = gl_WorkGroupID.x;
  uint thread_id = gl_LocalInvocationID.x;

  uint vertex_count = mbuf.meshlets[mi].vertex_count;
  for (uint i = 0; i < vertex_count; ++i)
  {
    uint vi = mbuf.meshlets[mi].vertices[i];
    vec4 Pw = transform_ub.ModelMatrix * vb.vertices[vi].position;
    vec4 P = transform_ub.ViewProjectionMatrix * Pw;

    // GL->VK conventions...
    P.y = -P.y;
    P.z = (P.z + P.w) / 2.0;

    gl_MeshVerticesNV[i].gl_Position = P;

    v_out[i].color = vb.vertices[vi].color * vec4(meshletcolors[mi%MAX_COLORS], 1.0);
  }

  uint index_count = mbuf.meshlets[mi].index_count;
  gl_PrimitiveCountNV = uint(index_count) / 3;

  for (uint i = 0; i < index_count; ++i)
  {
    gl_PrimitiveIndicesNV[i] = uint(mbuf.meshlets[mi].indices[i]);
  }
}
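
The two lines under the "GL->VK conventions" comment deserve a short explanation. The matrices produce a clip-space position with OpenGL conventions (Y up, depth between -1 and 1 after the perspective divide), while Vulkan expects Y down and a depth between 0 and 1. The C sketch below reproduces the same arithmetic on the CPU so that the result can be checked. The Vec4 type and the function names are hypothetical and only serve the illustration.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Converts a clip-space position computed with OpenGL conventions to
   Vulkan conventions. Same arithmetic as in the mesh shader:
   flip Y and remap z from [-w, w] to [0, w]. */
Vec4 gl_clip_to_vk_clip(Vec4 p)
{
    p.y = -p.y;
    p.z = (p.z + p.w) / 2.0f;
    return p;
}

/* Depth after the perspective divide. */
float ndc_depth(Vec4 p)
{
    assert(p.w != 0.0f);
    return p.z / p.w;
}
```

A point on the OpenGL near plane (z = -w) ends up with a depth of 0 and a point on the far plane (z = w) ends up with a depth of 1, which is the range Vulkan expects.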

 
The demo

The demo is available in the mesh shaders demopack in the geexlab-demopack-mesh-shaders/vk/meshlets/ folder.

The meshlet demo generates meshlets from a torus mesh, fills GPU buffers (a storage buffer for mesh vertices, a second storage buffer for meshlets and a uniform buffer for transformation matrices) and creates pipeline objects (meshlet_pipeline_wireframe and meshlet_pipeline_solid).

The rendering of the meshlets with the mesh shader (FRAME script) is done with this piece of Lua code:

gh_vk.descriptorset_bind(meshlet_ds)

local pipeline = meshlet_pipeline_solid
if (wireframe == 1) then
  pipeline = meshlet_pipeline_wireframe
end
gh_vk.pipeline_bind(pipeline)
		
local first = 0
local num_workgroups = num_meshlets_render
gh_vk.draw_mesh_tasks(first, num_workgroups)

 
num_meshlets_render contains the number of meshlets that have to be rendered. This variable can be controlled with a slider in the ImGui interface.

 
In the following screenshot, all 313 meshlets are drawn:
GeeXLab - Mesh Shaders - Meshlets in Vulkan

 
In this screenshot, only 188 meshlets are rendered:
GeeXLab - Mesh Shaders - Meshlets in Vulkan




