Recent Posts

Pages: 1 ... 4 5 [6] 7 8 ... 10
SetStablePowerState.exe: Disabling GPU Boost on Windows 10 for more deterministic timestamp queries on NVIDIA GPUs

With all modern graphics APIs (D3D11, D3D12, GL4 and Vulkan), an application can query the elapsed GPU time for any given range of render calls by using timestamp queries. Most game engines today use this mechanism to measure the GPU time spent on a whole frame and per pass. This blog post includes full source code for a simple D3D12 application (SetStablePowerState.exe) that can be run to disable and restore GPU Boost at any time, for all graphics applications running on the system. Disabling GPU Boost helps produce more deterministic GPU times from timestamp queries. And because the clocks are changed at the system level, you can run SetStablePowerState.exe even if your game uses a graphics API other than D3D12. The only requirements are that you run Windows 10 and have the Windows 10 SDK installed.
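As a rough illustration of what a timestamp query gives you: a begin/end query pair yields two tick values, which convert to elapsed milliseconds using the timer frequency reported by the API (in D3D12, via ID3D12CommandQueue::GetTimestampFrequency). A minimal sketch of that readback arithmetic, with a hypothetical helper name:

```cpp
#include <cstdint>

// Convert a pair of GPU timestamp values (in ticks) to elapsed milliseconds,
// given the timer frequency (ticks per second) reported by the API, e.g. from
// ID3D12CommandQueue::GetTimestampFrequency. Hypothetical helper, not the
// blog post's code.
double TicksToMilliseconds(uint64_t begin, uint64_t end, uint64_t ticksPerSecond)
{
    return (double)(end - begin) * 1000.0 / (double)ticksPerSecond;
}
```

Note that the timestamp counter frequency itself is fixed; what GPU Boost changes is how long the measured work takes, which is why the converted times drift between runs.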


On some occasions, we have found ourselves confused by the fact that the measured GPU time for a given pass we were working on would change over time, even if we did not make any change to that pass. The GPU times would be stable within a run, but would sometimes vary slightly from run to run. Later on, we learned that this can happen as a side effect of the GPU having a variable Core Clock frequency, depending on the current GPU temperature and possibly other factors such as power consumption. This can happen with all GPUs that have variable frequencies, and can happen with all NVIDIA GPUs that include a version of GPU Boost, more specifically all GPUs based on the Kepler, Maxwell and Pascal architectures, and beyond.

NB: forum doesn't allow longer headlines  ::)
3D-Tech News Around The Web / Raspberry Pi OpenGL performance work
« Last post by JeGX on September 14, 2016, 06:36:48 PM »
Eric Anholt, who works on the Raspberry Pi's graphics driver at Broadcom, gives some details about an upcoming performance boost in the Raspberry Pi's OpenGL driver.

I spent last week working on the glmark2 performance issues. I now have a NIR patch out for the pathological conditionals test (it's now faster than on the old driver), and a branch for job shuffling (+17% and +27% on the two desktop tests).

Here's the basic idea of job shuffling:

We're a tiled renderer, and tiled renderers get their wins from having a Clear at the start of the frame (indicating we don't need to load any previous contents into the tile buffer). When your frame is done, we flush each tile out to memory. If you do your clear, start rendering some primitives, and then switch to some other FBO (because you're rendering to a texture that you're planning on texturing from in your next draw to the main FBO), we have to flush out all of those tiles, start rendering to the new FBO, and flush its rendering. Then, when you come back to the main FBO, we have to reload your old cleared-and-a-few-draws tiles.

Job shuffling deals with this by separating the single GL command stream into separate jobs per FBO.  When you switch to your temporary FBO, we don't flush the old job, we just set it aside.  To make this work we have to add tracking for which buffers have jobs writing into them (so that if you try to read those from another job, we can go flush the job that wrote it), and which buffers have jobs reading from them (so that if you try to write to them, they can get flushed so that they don't get incorrectly updated contents).
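The bookkeeping described above can be sketched as a small dependency tracker: each buffer remembers which job writes it and which jobs read it, and a conflicting access flushes the offending job. This is an illustrative sketch of the idea only, not the actual driver code; all names are hypothetical.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch of per-FBO job tracking: jobs are set aside instead of flushed on
// every FBO switch, and only flushed when a buffer dependency forces it.
struct JobTracker {
    std::map<std::string, std::string> writer;             // buffer -> job writing it
    std::map<std::string, std::set<std::string>> readers;  // buffer -> jobs reading it
    std::vector<std::string> flushed;                      // flush order, for inspection

    void flush(const std::string& job) {
        flushed.push_back(job);
        // Drop the flushed job from all dependency tracking.
        for (auto it = writer.begin(); it != writer.end(); )
            it = (it->second == job) ? writer.erase(it) : std::next(it);
        for (auto& r : readers) r.second.erase(job);
    }

    // 'job' reads 'buf': flush whichever job last wrote that buffer.
    void read(const std::string& job, const std::string& buf) {
        auto w = writer.find(buf);
        if (w != writer.end() && w->second != job) flush(w->second);
        readers[buf].insert(job);
    }

    // 'job' writes 'buf': flush all other jobs still reading that buffer,
    // so they don't see incorrectly updated contents.
    void write(const std::string& job, const std::string& buf) {
        auto r = readers.find(buf);
        if (r != readers.end()) {
            auto pending = r->second;  // copy: flush() mutates readers
            for (const auto& other : pending)
                if (other != job) flush(other);
        }
        writer[buf] = job;
    }
};
```

With this in place, switching FBOs costs nothing by itself; only a texture-from-rendered-buffer (read-after-write) or a re-render (write-after-read) triggers a flush.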

Complete story:
3D-Tech News Around The Web / MSI GTX 1080 Limited Edition 30th Anniversary
« Last post by JeGX on September 14, 2016, 06:29:34 PM »
MSI is celebrating its 30th anniversary as a leading manufacturer of innovative PC hardware. Over the past 30 years, MSI has earned a reputation for providing products featuring cutting-edge technology and for striving to create and use only the best-quality components.

To celebrate this milestone, MSI has created an exclusive limited edition graphics card, combining the excellence of MSI GAMING graphics cards with a unique custom designed EK waterblock for this anniversary edition. The exceptionally classy waterblock features infused RGB LED lights that can be set to any of 16.8 million colors by using the MSI Gaming App.

At the heart of this exclusive card is NVIDIA’s GeForce® GTX 1080 GPU to provide all the power you need at up to 4K resolution gaming. The card comes fully assembled in a closed loop liquid cooling configuration that is covered by warranty and maintenance-free. Enclosed in the exquisite and sturdy wooden box is a small gift which is perfect for enjoying the latest epic games in full comfort.

- Press Release

The CryEngine will add support for Vulkan in version 5.3 (mid-November 2016), and Direct3D 12 multi-GPU support is planned for version 5.4 (late February / GDC 2017).

- CryEngine roadmap in Graphics and Rendering section
- news @
- news @

3D-Tech News Around The Web / Vertex Cache Measurement
« Last post by JeGX on September 14, 2016, 06:14:08 PM »
Now that DX11 has given us UAVs in all the other shading stages as well, I decided to try the equivalent for the vertex cache. By “Vertex Cache”, I mean the Post-transform vertex re-use cache. That is, the thing which enables us to re-use vertex shading results across duplicated vertices in a mesh.

Using UAVs in a VS, we can use SV_VertexID to do an atomic increment into a buffer containing one counter for each vertex. An atomic inc is necessary here because we don’t actually know what the vertex distribution algorithm is, and we could theoretically process a given vert in more than one VS thread simultaneously. For that matter, HW could simply be duplicating all the verts. We won’t know until we’ve looked at the results. Using this approach, we end up with a buffer telling us the exact number of times that each vert was processed during the draw. From this, we can directly calculate the ACMR (average cache miss ratio) of the mesh.

- article
- github
3D-Tech News Around The Web / Masked Software Occlusion Culling Implementation
« Last post by JeGX on September 14, 2016, 06:08:01 PM »
This code accompanies the research paper "Masked Software Occlusion Culling", and implements an efficient alternative to the hierarchical depth buffer algorithm. Our algorithm decouples depth values and coverage, and operates directly on the hierarchical depth buffer. It lets us efficiently parallelize both coverage computations and hierarchical depth buffer updates.

This code is mainly optimized for the AVX2 instruction set, and some AVX specific instructions are required for best performance. However, we also provide SSE 4.1 and SSE 2 implementations for backwards compatibility. The appropriate implementation will be chosen during run-time based on the CPU's capabilities.
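Run-time selection between the three code paths boils down to a CPU-feature check at startup. A minimal sketch of that dispatch, assuming a GCC/Clang toolchain (`__builtin_cpu_supports`); the library's actual detection code differs:

```cpp
#include <string>

// Pick the best available SIMD code path at run time, falling back from
// AVX2 to SSE 4.1 to the baseline SSE2 implementation. Illustrative only.
std::string SelectImplementation()
{
    if (__builtin_cpu_supports("avx2"))   return "AVX2";
    if (__builtin_cpu_supports("sse4.1")) return "SSE4.1";
    return "SSE2";  // baseline for x86-64
}
```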

- MaskedOcclusionCulling @ github
- Masked Software Occlusion Culling @Intel
Geeks3D's GPU Tools / FurMark released
« Last post by JeGX on September 13, 2016, 03:03:35 PM »
A maintenance release of FurMark is available.

Version - 2016-09-13
+ added command line parameter to enable or disable the dynamic background (/enable_dyn_bkg=1 or
! updated: GPU Shark and GPU-Z 1.11.0.
! updated: ZoomGPU 1.19.3 (GPU monitoring library).

3D-Tech News Around The Web / HWiNFO32 and HWiNFO64 v5.36 released
« Last post by Stefan on September 13, 2016, 11:22:51 AM »
Changes in HWiNFO32 & HWiNFO64 v5.36 - Released on Sep-13-2016:
  • Enhanced sensor monitoring on ASUS 970 Pro Gaming/Aura.
  • Fixed recognition of AMD Radeon RX 460.
  • Added support of Areca SAS/SATA RAID controllers.
  • Fixed some sensor values for ASUS STRIX X99, X99-DELUXE II, RAMPAGE V EDITION 10 and X99-A II boards.
  • Fixed GPU Voltage monitoring with On Semi NCP81022.
  • Improved appearance on Hi-Res displays with custom scaling.
  • Fixed memory timings for Apollo Lake.
  • Improved support of Gemini Lake.
  • Fixed and improved support of Polaris 11 family.
  • Added monitoring of GPU Memory Errors for ECC-capable NVIDIA GPUs.
  • Added option to specify individual polling rate for EC sensor.
  • Enhanced support of Intel Kaby Lake.
BlazingDB is an extremely fast SQL database able to handle petabyte-scale data. BlazingDB requires a CUDA-enabled GPU with CUDA compute capability 3.0 or higher.

Gathering petabytes of data about your customers is cool, but how can you take advantage of this data? BlazingDB lets you run high-performance SQL on a database using a ton of GPUs.


Relying on GPUs for a database is quite interesting. GPUs can run a ton of tasks in parallel and present a clear advantage for very specific tasks. In particular, companies have been using GPUs a lot lately for image processing and machine learning applications — but it’s the first time I’m hearing about taking advantage of GPUs for databases.


That’s where BlazingDB shines. You can do sums, use predicates and run through many, many database entries in little time. The company just started accepting customers in June 2016, and there are already big Fortune 100 companies that want to use BlazingDB.

3D-Tech News Around The Web / iBow docking station: Boost your Mac with Extra Graphics!
« Last post by JeGX on September 12, 2016, 05:14:35 PM »
iBow is a new project on Kickstarter: it's a docking station with a desktop-grade graphics card for your 13/15-inch MacBook Pro or your Mac mini.

iBow's design allows you to replace the graphics card easily according to your requirements, enhancing your graphics experience. iBow was developed to accommodate the largest video cards currently available on the market.

