Recent Posts

Some notes on implementing the ARB_shader_storage_buffer_object OpenGL extension in Mesa and the Intel i965 driver.

In my previous post I introduced ARB_shader_storage_buffer_object, an OpenGL 4.3 feature that is coming soon to Mesa and the Intel i965 driver. While that post focused on explaining the features introduced by the extension, in this post I’ll dive into some of the implementation aspects, for those who are curious about this kind of stuff. Be warned that some parts of this post will be specific to Intel hardware.


Another interesting thing we had to deal with is address alignment. UBOs work with layout std140. In this setup, elements in the UBO definition are aligned to 16-byte boundaries (the size of a vec4). It turns out that GPUs can usually optimize reads and writes to multiples of 16 bytes, so this makes sense. However, as I explained in my previous post, SSBOs also introduce a packed layout mode known as std430.
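To make the difference between the two layouts concrete, here is a minimal C sketch of how the stride of an array of float scalars or vectors differs between std140 and std430. The enum and function names are hypothetical and are not taken from Mesa. It only covers float scalars and vectors, and relies on the rule that std140 rounds array element alignment up to that of a vec4 while std430 does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not Mesa code. */
enum layout { LAYOUT_STD140, LAYOUT_STD430 };

/* Base alignment in bytes of a float scalar or vector with 1..4 components.
 * A vec3 is aligned like a vec4 in both layouts. */
static size_t base_alignment(unsigned components)
{
    assert(components >= 1 && components <= 4);
    if (components == 3)
        components = 4;
    return (size_t)components * 4u; /* 4 bytes per float component */
}

/* Distance in bytes between consecutive elements of an array of such values. */
static size_t array_stride(enum layout layout, unsigned components)
{
    size_t stride = base_alignment(components);
    /* std140 rounds array elements up to vec4 (16-byte) alignment;
     * std430 keeps the tighter base alignment. */
    if (layout == LAYOUT_STD140 && stride < 16)
        stride = 16;
    return stride;
}
```

With these rules, an array of floats has a 16-byte stride under std140 but a 4-byte stride under std430, which is why a driver can no longer assume 16-byte aligned accesses for SSBOs.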

Intel hardware provides a number of messages that we can use through the Data Port interface to write to memory. Each message has different characteristics that make it more suitable for certain scenarios, like the pixel mask I discussed before. For example, some of these messages can write data in chunks of 16 bytes (that is, they write vec4 elements, or OWORDs in the language of the technical docs). One could think that these messages are great when you work with vector data types. However, they also introduce the problem of dealing with partial writes: what happens when you only write to one element of a vector? Or to a buffer variable that is smaller than a vector? What if you write columns in a row_major matrix?

In these scenarios, using these messages requires masking the writes, because you need to disable the channels in the vec4 element that you don’t want to write. Of course, the hardware provides the means to do this: we only need to set the writemask of the destination register of the message instruction to select the right channels.
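The effect of a writemask on a 16-byte write can be simulated on the CPU. The following C sketch is only an illustration of the idea, not the i965 driver code or the actual hardware message. The macro and function names are hypothetical, and the mask is assumed to use one bit per vec4 channel.

```c
#include <assert.h>

/* Hypothetical channel bits, one per vec4 component. */
#define WRITEMASK_X 0x1u
#define WRITEMASK_Y 0x2u
#define WRITEMASK_Z 0x4u
#define WRITEMASK_W 0x8u

/* Simulate a 16-byte (vec4) write in which only the channels enabled
 * in writemask are stored; disabled channels keep their old value. */
static void masked_vec4_write(float dst[4], const float src[4],
                              unsigned writemask)
{
    assert((writemask & ~0xFu) == 0); /* only four channels exist */
    for (unsigned ch = 0; ch < 4; ch++) {
        if (writemask & (1u << ch))
            dst[ch] = src[ch];
    }
}
```

Writing only the y component of a vec4 then amounts to calling `masked_vec4_write(dst, src, WRITEMASK_Y)`: the other three channels of the 16-byte chunk are left untouched.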

Full post:

A simple introduction to SSBO can be found here:

3D-Tech News Around The Web / (PR) AMD FirePro S9170 server GPU with 32GB memory
« Last post by JeGX on July 08, 2015, 02:39:52 PM »
AMD FirePro S9170 Server GPU Offers Unmatched Onboard Memory to Support Large Dataset Computations.

AMD (NASDAQ: AMD) today announced the new AMD FirePro™ S9170 server GPU, the world’s first and fastest 32GB single-GPU server card for DGEMM heavy double-precision workloads, with support for OpenCL™ 2.0. Based on the second-generation AMD Graphics Core Next (GCN) GPU architecture, this new addition to the AMD FirePro™ server GPU family is capable of delivering up to 5.24 TFLOPS of peak single precision compute performance while enabling full throughput double precision performance, providing up to 2.62 TFLOPS of peak double precision performance.

Designed with compute-intensive workflows in mind, the AMD FirePro S9170 server GPU is ideal for data center managers who oversee clusters within academic or government bodies, oil and gas industries, or deep neural network compute cluster development.

“AMD is recognized as an HPC industry innovator as the graphics provider with the top spot on the November 2014 Green500 List. Today the best GPU for compute just got better with the introduction of the AMD FirePro S9170 server GPU to complement AMD’s impressive array of server graphics offerings for high performance compute environments,” said Sean Burke, corporate vice president and general manager, AMD Professional Graphics group. “The AMD FirePro S9170 server GPU can accelerate complex workloads in scientific computing, data analytics, or seismic processing, wielding an industry-leading 32GB of memory. We designed the new offering for supercomputers to achieve massive compute performance while maximizing available power budgets.”

“There are some HPC workloads which require as much data as possible to stay resident on the device, and so the 32GB of memory provided by AMD FirePro S9170, the largest available on a single GPU, will enable the acceleration of scientific calculations that were previously impossible,” said Simon McIntosh-Smith, head of the Microelectronics Research Group at the University of Bristol. “For example, our new OpenCL version of the SNAP transport code from Los Alamos National Laboratory needs to keep as much data resident on the device as possible, and so the 32GB of memory will let us run problems of a much more interesting size faster than ever before. The large memory, combined with the 320GB/s memory bandwidth and double precision floating point performance, will make the AMD FirePro S9170 server GPU a ‘killer’ solution device for many HPC applications.”


“We have been developing a fully-parallel computational tool based on the AMD GPU heterogeneous computing platform and OpenCL,” said Omid Mahabadi, co-founder and director, Geomechanica Inc. “This tool accurately captures the complex physics of massive mines plus oil and gas fields rapidly and reliably. Thanks to the impressive 32GB of memory of the new cards, we expect to run computations on massive data structures containing tens of millions of data elements. The combination of rapid double-precision operations with the large memory capacity enables accurate, detailed, and reliable computations. A similar performance using CPUs would likely require much higher capital and maintenance costs. Moving forward, we plan to take advantage of the recent features of the OpenCL 2.0 open API to further enhance the performance of our software.”

Full press release:
3D-Tech News Around The Web / Interview with Neil Trevett about the Vulkan API
« Last post by JeGX on July 08, 2015, 02:29:23 PM »
- The Vulkan API is lower-level than OpenGL (the programmer is responsible for memory and thread management, for example). What triggered this decision?

A low level API has simpler drivers.  This means reduced driver overhead – which results in higher performance for CPU limited applications – and fewer differences between multiple GPU vendors’ implementations.  Also, another fundamental advantage of handing the application more control is that the driver has to do less ‘behind the scenes’ management – resulting in much more reliable and predictable performance which doesn’t hit unexpected road bumps as the driver undertakes complex housekeeping tasks.

- What concrete improvements will gamers see when Vulkan is used by video game studios? Can they expect better performance and better graphics, or is it just about simplifying the studios’ back-end work?

For applications that are CPU limited, which happens on desktop, and even more on mobile, end users should notice better performing applications with less stuttering and halting.

- When will the first version of Vulkan be released?

Vulkan is still on schedule to have specs and implementations before the end of the year.

Full interview:
English forum / Re: How can I show animated character models in my GLSL Hacker
« Last post by JeGX on July 08, 2015, 10:03:33 AM »
Currently GLSL Hacker has no built-in support for animated characters.
I plan to add the support of animation stored in FBX format in the future.

You can also do animation based on morph targets. There is minimal support for that with the
gh_mesh.do_linear_tweening() function, which linearly moves all vertices between two meshes (the start and end targets).
The tween mesh is the mesh that is rendered:

Code:
gh_mesh.do_linear_tweening(start_mesh, end_mesh, tween_mesh, alpha)
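
For readers curious about what such a tweening step computes, here is a minimal C sketch of linear interpolation between two meshes. It is not GLSL Hacker's implementation: the struct and function names are hypothetical, and it assumes that "linear move" means each tween vertex is start + alpha * (end - start) with alpha in [0, 1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex type: position only. */
struct vec3 { float x, y, z; };

/* Linear interpolation between a and b for t in [0, 1]. */
static float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Fill tween[i] with the position interpolated between start[i] and end[i].
 * All three arrays must hold count vertices. */
static void linear_tween(const struct vec3 *start, const struct vec3 *end,
                         struct vec3 *tween, size_t count, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (size_t i = 0; i < count; i++) {
        tween[i].x = lerp(start[i].x, end[i].x, alpha);
        tween[i].y = lerp(start[i].y, end[i].y, alpha);
        tween[i].z = lerp(start[i].z, end[i].z, alpha);
    }
}
```

Under that assumption, alpha = 0 reproduces the start mesh, alpha = 1 reproduces the end mesh, and animating alpha over time produces the morph.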

NVIDIA has released a driver that brings fixes for the title Sony Vegas Pro.

You can download it from this page:
3D-Tech News Around The Web / NVIDIA Quadro driver 348.27
« Last post by Stefan on July 07, 2015, 05:47:44 PM »
ODE Driver
  • This is the second release for the R346 drivers, the ‘Optimal Drivers for Enterprise’ [ODE].  ODE branches are dedicated to relatively long term stability for ISV certification, OEMs, and Enterprise customers.

New in Release 348.27:
  • OpenGL hardware acceleration on Windows Remote Desktop
  • CUDA 7.0
  • Nview Version 146.33
  • Workstation application compatibility fixes. Please read the release notes for more information on product support, feature limitations, driver fixes and known compatibility issues.
English forum / Re: 200 demos converted and counting...
« Last post by Stefan on July 07, 2015, 05:42:58 PM »
Alps - shader of the week contains 2 typos in line #262.
I wonder why such bugs slip through in Shadertoy but not in GLSL Hacker?

replace Map(p) < .5 ? dist.x = halfwayT : dist.y = halfwayT;
with     Map(p) < .5 ; dist.x = halfwayT ; dist.y = halfwayT;
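
Note that the replacement line, as written, evaluates the comparison and discards it, then assigns both dist.x and dist.y. If the ternary was meant to update only one of the two components depending on the test, an if/else keeps that behavior. A minimal sketch in C, under that assumption (the struct and function names are hypothetical, and the density parameter stands in for the value of Map(p)):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shader's vec2. */
struct vec2 { float x, y; };

/* Update exactly one component of dist, chosen by the density test.
 * This mirrors what the ternary appears to intend; the actual intent
 * of the shader author is an assumption. */
static void update_dist(struct vec2 *dist, float density, float halfway_t)
{
    assert(dist != NULL);
    if (density < 0.5f)
        dist->x = halfway_t;
    else
        dist->y = halfway_t;
}
```

The same if/else form is valid GLSL and avoids relying on how a given compiler parses assignments inside a conditional expression.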

English forum / How can I show animated character models in my GLSL Hacker
« Last post by asail0712 on July 07, 2015, 02:16:53 PM »
I can't find an API for this in the guide.
Can I show animated models in GLSL Hacker?
General Discussion / Re: Frequent TDRs with GTX 760
« Last post by Skylark on July 05, 2015, 06:50:34 PM »
Thanks a lot for the pointers to the hotfix driver. Trying it now.

I knew there was bound to be some info on the GeForce forums somewhere but honestly, wading through that large number of posts a day is just too much for me :-) Hence my presence here, I've followed Geeks3D for years and knew someone would have good info.

I'll post again with my findings after a while using the hotfix driver. Thanks again.