Recent Posts

Pages: [1] 2 3 ... 10
3D-Tech News Around The Web / NVIDIA CUDA Toolkit 11.0.2 final release
« Last post by Stefan on July 07, 2020, 09:20:23 PM »
Download now

 This section summarizes the changes in CUDA 11.0 GA since the 11.0 RC release.

General CUDA
  • Added support for Ubuntu 20.04 LTS on x86_64 platforms.
  • Arm server platforms (arm64 sbsa) are supported with NVIDIA T4 GPUs.
NPP New Features
  • Batched Image Label Markers Compression that removes sparseness between marker label IDs output from LabelMarkers call.
  • Image Flood Fill functionality fills a connected region of an image with a specified new value.
  • Stability and performance fixes to Image Label Markers and Image Label Markers Compression.
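
To make the two new NPP operations concrete, here is a minimal CPU-side sketch in Python. It is not the NPP API: the function names, the treatment of label 0 as background, and the 4-connected neighbourhood are all assumptions made for illustration.

```python
def compress_labels(image):
    """Remap sparse label IDs to contiguous IDs 1..N (first-seen, row-major order).

    Label 0 is treated as background and kept as 0 (assumption of this sketch).
    Returns the relabeled image and the number of distinct labels found.
    """
    mapping = {}
    out = []
    for row in image:
        new_row = []
        for label in row:
            if label != 0 and label not in mapping:
                mapping[label] = len(mapping) + 1
            new_row.append(mapping.get(label, 0))
        out.append(new_row)
    return out, len(mapping)


def flood_fill(image, row, col, new_value):
    """Fill the 4-connected region containing (row, col) with new_value.

    Returns a new image; the input is left untouched.
    """
    height, width = len(image), len(image[0])
    old_value = image[row][col]
    out = [list(r) for r in image]
    if old_value == new_value:
        return out
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if 0 <= r < height and 0 <= c < width and out[r][c] == old_value:
            out[r][c] = new_value
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return out
```

For example, compress_labels turns the sparse IDs 7, 42 and 900 into 1, 2 and 3, which is the "removing sparseness" the release note describes.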
nvJPEG New Features
  • nvJPEG allows the user to allocate separate memory pools for each chroma subsampling format. This helps avoid memory re-allocation overhead. This can be controlled by passing the newly added flag NVJPEG_FLAGS_ENABLE_MEMORY_POOLS to the nvjpegCreateEx API.
  • The nvJPEG encoder now allows the compressed bitstream to reside in GPU memory.
cuBLAS New Features
  • cuBLASLt Matrix Multiplication adds support for fused ReLU and bias operations for all floating point types except double precision (FP64).
  • Improved batched TRSM performance for matrices larger than 256.
cuSOLVER New Features
  • Added a 64-bit API for GESVD. The new routine cusolverDnGesvd_bufferSize() fills in the parameters missing from the 32-bit API cusolverDn[S|D|C|Z]gesvd_bufferSize() so that it can estimate the size of the workspace accurately.
  • Added the single process multi-GPU Cholesky factorization capabilities POTRF, POTRS and POTRI in cusolverMG library.
cuSOLVER Resolved Issues
  • Fixed an issue where SYEVD/SYGVD would fail and return error code 7 if the matrix is zero and the dimension is bigger than 25.
cuSPARSE New Features
  • Added new Generic APIs for Axpby (cusparseAxpby), Scatter (cusparseScatter), Gather (cusparseGather), Givens rotation (cusparseRot). __nv_bfloat16/ __nv_bfloat162 data types and 64-bit indices are also supported.
  • This release adds the following features for cusparseSpMM:
    • Support for row-major layout for cusparseSpMM for both CSR and COO format
    • Support for 64-bit indices
    • Support for __nv_bfloat16 and __nv_bfloat162 data types
    • Support for the following strided batch mode:
      • Ci=A⋅Bi
      • Ci=Ai⋅B
      • Ci=Ai⋅Bi
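
The three strided batch modes differ only in which operand is shared across the batch. The following dense reference sketch in Python shows the semantics; it does not use cuSPARSE, and the helper names and the convention that a one-element batch is broadcast are assumptions of this sketch.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def batched_matmul(a_batch, b_batch):
    """Compute C_i = A_i * B_i for every batch index i.

    A batch containing a single matrix is shared by all batch entries, which
    covers the C_i = A * B_i and C_i = A_i * B modes as well.
    """
    if len(a_batch) > 1 and len(b_batch) > 1 and len(a_batch) != len(b_batch):
        raise ValueError("batch sizes must match or one of them must be 1")
    count = max(len(a_batch), len(b_batch))

    def pick(batch, i):
        return batch[0] if len(batch) == 1 else batch[i]

    return [matmul(pick(a_batch, i), pick(b_batch, i)) for i in range(count)]
```

Passing [A] together with [B0, B1] gives the Ci=A⋅Bi mode, [A0, A1] with [B] gives Ci=Ai⋅B, and two lists of equal length give Ci=Ai⋅Bi.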
cuFFT New Features
  • cuFFT now accepts __nv_bfloat16 input and output data type for power-of-two sizes with single precision computations within the kernels.
3D-Tech News Around The Web / Screen space shadows
« Last post by JeGX on July 07, 2020, 10:55:54 AM »
After working on the Spartan game engine for so long, it became increasingly obvious that there are many interesting things that I could be writing about. However, I kept postponing it as I was growing fond of Hugo and didn’t want to invest any content (or money) in WordPress anymore. The good thing is that I’ve finally found the courage to transition to this slick and fast site you’re browsing now!

I want to start things off with a simple and short blog post, yet have some immediate results we can enjoy. You know, something like the kind of instant gratification you get by watching a Bob Ross episode, an approach which I believe to be one of the most efficient forms of conveying information. So, let’s explore something that, with a little bit of effort, might give us just that. Time for some screen space shadows.


Loss of small-scale detail when doing shadow mapping is a typical problem, especially with lights that aim to cover a large portion of the scene (like directional lights). As we’ve seen, screen space shadows can help a lot but before we explore them in further detail, let’s see how most of the games we enjoy handle small-scale shadow quality:

- The player is allowed to keep increasing the shadow resolution. It’s a costly approach but it works and it happens to be the most common.
- The player sees lights with very high shadow resolution, during key moments like character close-ups. This approach doesn’t suffer from typical screen space issues but it does involve the hard work of manually tweaking lights, per scene.
- The player gets the extra treatment that is screen space shadows. In some cases, the shadows are even aided by information from other render passes, which helps alleviate some screen space issues even further.

Full article:

Screen space shadow demo

Code:
// Settings
static const uint  g_sss_steps            = 8;     // Quality/performance
static const float g_sss_ray_max_distance = 0.05f; // Max shadow length
static const float g_sss_tolerance        = 0.01f; // Error in favor of reducing gaps
static const float g_sss_step_length      = g_sss_ray_max_distance / (float)g_sss_steps;

float ScreenSpaceShadows(Surface surface, Light light)
{
    // Compute ray position and direction (in view-space)
    float3 ray_pos = mul(float4(surface.position, 1.0f), g_view).xyz;
    float3 ray_dir = mul(float4(-light.direction, 0.0f), g_view).xyz;

    // Compute ray step
    float3 ray_step = ray_dir * g_sss_step_length;

    // Ray march towards the light
    float occlusion = 0.0f;
    float2 ray_uv   = 0.0f;
    for (uint i = 0; i < g_sss_steps; i++)
    {
        // Step the ray
        ray_pos += ray_step;

        // Compute the difference between the ray's and the camera's depth
        ray_uv            = project_uv(ray_pos, g_projection);
        float depth_z     = get_linear_depth(ray_uv);
        float depth_delta = ray_pos.z - depth_z;

        // If the ray is behind what the camera "sees" (positive depth_delta),
        // but by less than twice the tolerance
        if (abs(g_sss_tolerance - depth_delta) < g_sss_tolerance)
        {
            // Consider the pixel to be shadowed/occluded
            occlusion = 1.0f;
        }
    }

    // Fade out as we approach the edges of the screen
    occlusion *= screen_fade(ray_uv);
    return 1.0f - occlusion;
}

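
The occlusion test in the shader above is compact, so here is a CPU-side sketch of the same ray march in Python that can be stepped through by hand. The constants mirror the shader settings. The scene_depth_at callback is a hypothetical stand-in for project_uv plus get_linear_depth, and the screen-edge fade is left out, so treat this as an illustration of the depth test only.

```python
SSS_STEPS = 8
SSS_RAY_MAX_DISTANCE = 0.05
SSS_TOLERANCE = 0.01
SSS_STEP_LENGTH = SSS_RAY_MAX_DISTANCE / SSS_STEPS


def is_occluded(ray_z, scene_z, tolerance=SSS_TOLERANCE):
    """Same test as the shader: true when 0 < depth_delta < 2 * tolerance."""
    depth_delta = ray_z - scene_z
    return abs(tolerance - depth_delta) < tolerance


def screen_space_shadow(ray_pos, ray_dir, scene_depth_at):
    """March from ray_pos along ray_dir (view-space tuples) towards the light.

    scene_depth_at(x, y) returns the linear depth the camera sees at that
    position (hypothetical helper). Returns 0.0 if shadowed, 1.0 if lit.
    """
    x, y, z = ray_pos
    for _ in range(SSS_STEPS):
        x += ray_dir[0] * SSS_STEP_LENGTH
        y += ray_dir[1] * SSS_STEP_LENGTH
        z += ray_dir[2] * SSS_STEP_LENGTH
        if is_occluded(z, scene_depth_at(x, y)):
            return 0.0  # the shader keeps looping; the result is the same
    return 1.0
```

The tolerance acts as an assumed thickness: a ray sample slightly behind the visible surface counts as occluded, while a sample far behind it (for example behind a distant wall) does not.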
mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. It was initially developed by Daan Leijen for the run-time systems of the Koka and Lean languages. It is a drop-in replacement for malloc and can be used in other programs without code changes.

Main features:

- small and consistent: the library is about 6k LOC using simple and consistent data structures. This makes it very suitable to integrate and adapt in other projects. For runtime systems it provides hooks for a monotonic heartbeat and deferred freeing (for bounded worst-case times with reference counting).

- free list sharding: the big idea: instead of one big free list (per size class) we have many smaller lists per memory "page" which both reduces fragmentation and increases locality -- things that are allocated close in time get allocated close in memory. (A memory "page" in mimalloc contains blocks of one size class and is usually 64KiB on a 64-bit system).

- eager page reset: when a "page" becomes empty (with increased chance due to free list sharding) the memory is marked to the OS as unused ("reset" or "purged") reducing (real) memory pressure and fragmentation, especially in long running programs.

- secure: mimalloc can be built in secure mode, adding guard pages, randomized allocation, encrypted free lists, etc. to protect against various heap vulnerabilities. The performance penalty is usually around 10% on average over our benchmarks.

- first-class heaps: efficiently create and use multiple heaps to allocate across different regions. A heap can be destroyed at once instead of deallocating each object separately.

- bounded: it does not suffer from blowup [1], has bounded worst-case allocation times (wcat), bounded space overhead (~0.2% meta-data, with at most 12.5% waste in allocation sizes), and has no internal points of contention using only atomic operations.

- fast: In our benchmarks (see below), mimalloc outperforms other leading allocators (jemalloc, tcmalloc, Hoard, etc), and usually uses less memory (up to 25% more in the worst case). A nice property is that it does consistently well over a wide range of benchmarks. There is also good huge OS page support for larger server programs.
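
The free list sharding and eager page reset ideas from the list above can be pictured with a toy model in Python. This is not mimalloc's actual data structure or API: the Page class, its method names and the use of byte offsets are assumptions made purely to illustrate one free list per "page" of a single size class.

```python
PAGE_SIZE = 64 * 1024  # the usual mimalloc "page" size on 64-bit systems


class Page:
    """One "page" holding blocks of a single size class, with its own free list."""

    def __init__(self, block_size, page_size=PAGE_SIZE):
        self.block_size = block_size
        capacity = page_size // block_size
        # Stack of free block offsets; the lowest offset is handed out first.
        self.free = [i * block_size for i in reversed(range(capacity))]
        self.used = 0

    def alloc(self):
        """Pop a block from this page's free list, or return None if it is full."""
        if not self.free:
            return None
        self.used += 1
        return self.free.pop()

    def free_block(self, offset):
        """Return a block to this page's free list.

        Returns True when the page became empty, which is the moment an
        allocator could mark the memory as unused ("reset") to the OS.
        """
        self.free.append(offset)
        self.used -= 1
        return self.used == 0
```

Because each page has its own list, two allocations made back to back come from neighbouring offsets of the same page, which is the locality benefit described above.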

3D-Tech News Around The Web / OpenVX API for Raspberry Pi
« Last post by JeGX on July 07, 2020, 10:44:38 AM »
Raspberry Pi is excited to bring the Khronos OpenVX 1.3 API to our line of single-board computers.

OpenVX for computer vision

OpenVX™ is an open, royalty-free API standard for cross-platform acceleration of computer vision applications developed by The Khronos Group.

Now with added Raspberry Pi

The Khronos Group and Raspberry Pi have come together to work on an open-source implementation of OpenVX™ 1.3 that passes conformance on Raspberry Pi. The implementation passes the Vision, Enhanced Vision, and Neural Net conformance profiles specified in OpenVX 1.3.

- Full Article
- OpenVX sample implementation @ github
- OpenVX homepage

OpenVX logo

Today we released PIX 2006.26 which can be downloaded here. This release contains support for fence signal-wait arrows in GPU captures, document tab behavior improvements, buffer viewer enhancements, and many bugfixes.

3D-Tech News Around The Web / NVIDIA GeForce RTX 3070 and RTX 3070 Ti Possible Specs
« Last post by JeGX on July 07, 2020, 10:33:14 AM »
Rumored specifications of GeForce RTX 3070 and 3070 Ti:

GeForce RTX 3070
- GPU: GA104-300
- CUDA cores: 2944
- Streaming Multiprocessors: 46
- Memory: 8GB GDDR6, 256-bit

GeForce RTX 3070 Ti
- GPU: GA104-400
- CUDA cores: 3072
- Streaming Multiprocessors: 48
- Memory: 8GB GDDR6, 256-bit

Thanks for the pointer to assimp.  I downloaded assimp and added triangle strips to the AC loader and it works fine now.
3D-Tech News Around The Web / Vulkan API specifications 1.2.146 released
« Last post by Stefan on July 04, 2020, 08:11:51 PM »
Change log for July 3, 2020 Vulkan 1.2.146 spec update:

  * Update release number to 146 for this update.

Github Issues:

  * Fix valid usage generation script for optional bitmasks in a
    non-optional array (public pull request 1228).
  * Add lunr to `package.json` and update the locally cached copy (public
    pull request 1238).
  * Require that newly released extensions have etext:*_SPEC_VERSION `1`
    (public issue 1263).
  * Add to the NOTE in slink:VkPhysicalDeviceIDProperties, advising
    implementations on returning unique pname:deviceUUID values and avoiding
    hardwired values, especially 0 (public issue 1273).
  * Add noscript fallback for HTML output (public pull request 1289).
  * Fix duplicated VUIDs in flink:vkCmdExecuteGeneratedCommandsNV (public
    pull request 1304).
  * Fix link markup in <<ray-traversal, Ray Traversal>> chapter, nested link
    markup, and linear equation markup in
    <<textures-unnormalized-to-integer>> (public pull requests 1305, 1306,

Internal Issues:

  * Add comments to extending enums in the generated API interfaces showing
    which core version and/or extensions provide the enum, matching recent
    changes to show this information for commands and structures (internal
    issue 1431).
  * Only allow code:Invocation memory scope in the
    <<spirvenv-module-validation-standalone, Standalone SPIR-V Validation>>
    section when memory semantics is *None* (internal issue 1782).
  * Make reflow script handle literal block delimiters and lines containing
    only whitespace properly (internal issues 2039, 2042).
  * Clarify definition of <<limits-maxFragmentCombinedOutputResources,
    pname:maxFragmentCombinedOutputResources>> (internal issue 2236).
  * Add missing `errorcodes=` XML attributes for some
    `<<VK_EXT_display_control>>` commands.
  * Clarify <<features-extentperimagetype, allowed extent values based on
    image type>> and the related <<limits-maxImageDimension1D>>,
    <<limits-maxImageDimension2D>>, <<limits-maxImageDimension3D>>,
    <<limits-maxImageDimensionCube>> limits (internal merge request 3922).
  * Remove redundant valid usage statement
    VUID-VkFramebufferCreateInfo-flags-03188 (internal merge request 3934).
  * Update style guide to recommend new extension spec language be contained
    in existing asciidoctor files, unless it's of enough scope to create a
    new chapter (internal merge request 3955).

New Extensions:

  * `<<VK_EXT_directfb_surface>>` (public pull requests 1292, 1294).
  * `<<VK_EXT_fragment_density_map2>>` (internal merge request 3914).
Geeks3D's GPU Tools / Re: Furmak FPS
« Last post by Max on July 03, 2020, 05:45:35 PM »
I just tried it and it works now, thank you very much for the support  :)
Geeks3D's GPU Tools / Re: Furmak FPS
« Last post by JeGX on July 03, 2020, 05:39:52 PM »
Curious... I also had some minor issues with FurMark... First time it happens. Seems to be related to driver 451.48.

Did you try pressing the I key? This key lets you hide/show the information.