« on: October 24, 2016, 07:00:05 PM »
So it seems the EVGA series has a hotspot that reaches over 100 °C around the VRAM/VRM area (Micron VRAM is only rated up to ~95 °C). This leads either to the weird black screen bug or to combustion of certain modules in that area.
“The test used in the referenced review from Toms Hardware (Germany) is running under Furmark, an extreme usage case, as most overclockers know. We believe this is a good approach to have some idea about the graphics card limit, and the thermal performance under the worst case scenario. EVGA has performed a similar qualification test during the design process, at a higher ambient temperature (30C in chamber) with a thermal coupler probe directly contacting the key components and after the Toms Hardware (Germany) review, we have retested this again. The results in both tests show the temperature of PWM and memory is within the spec tolerance under the same stress test, and is working as originally designed with no issues.
With this being said, EVGA understands that lower temperatures are preferred by reviewers and customers.
During our recent testing, we have applied additional thermal pads between the backplate and the PCB and between the baseplate and the heatsink fins, with the results shown below. We will offer these optional thermal pads free of charge to EVGA owners who want to have a lower temperature. These thermal pads will be ready soon; and customers can request them on Monday, October 24th, 2016. Also, we will work with Toms Hardware to do a retest.”
This bleeds into GPUOpen. AMD wants to assist developers in taking (some of) the reins for game-GPU optimization. Asked for an introduction to GPUOpen, Koduri told us:
“To get the best performance out of the GPU, the best practices, the best techniques to render shadows, do lighting, draw trees, whatever – there are different ways to do that. But what is the best way to do that? We figured out that that value add kind of moves into engines. It's basically in the game engines, and the games themselves, have to figure out [optimal techniques with new APIs]. They have to do more heavy lifting, figuring out what's the most optimal thing to do.
“The drivers themselves have become very thin. I can't do something super special inside the driver to work around a game's inefficiency and make it better. And we used to do that in Dx11 and before, where when we focus on a particular game and we find that the game isn't doing the most efficient thing for our hardware, we used to have application profiles for each application. You could exactly draw the same thing if you change the particular shaders that they have to something else. We did manual optimization in the drivers. With these low overhead APIs, we can't actually – we don't touch anything, it's just the API, whatever the game passes, it goes to the hardware. There's nothing that we do.
“We have a lot of knowledge in optimization inside AMD, and so do our competitors, so how do we get all of that knowledge easily accessible to the game developers? We have lots of interesting libraries and tools inside AMD. Let's make it accessible to everyone. Let's invite developers to contribute as well, and build this ecosystem of libraries, middleware, tools, and all, that are completely open and would work on not just AMD hardware, but on other people's hardware. The goal is to make every game and every VR experience get the best out of the hardware. We started this portal with that vision and goal, and we had a huge collection of libraries that we [put out]. It's got good traction. It also became a good portal for developers to share best practices. Recently we had nice blogs [...] sharing their techniques and all. More often than not, these blogs have links to source code as well.”
The GPU is a black box for 20 years now. A black box abstracted by very thick APIs, very thick runtimes, very thick voodoo magic. We are trying to get the voodoo magic out of the GPU software stack, and we believe there – there is still voodoo magic in transistors and how we assemble them, and in game engines, compute engines, libraries, the middleware. Voodoo magic in the driver middle-layers is not beneficial to anybody, because it's preventing the widespread adoption of GPUs.
I just "upgraded" our home network with a Pi-Hole, an interesting project that implements a DNS server with a known-list of ad- and privacy trackers. The result is that everyone on your network that uses that DNS server gets an adblocker for free, without configuration work.
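To make the idea concrete, here is a minimal sketch of a blocklist-based resolver. It is not Pi-hole's actual code: the names `make_resolver` and `fake_upstream`, the exact-match lookup, and the `0.0.0.0` answer for blocked names are all assumptions made for illustration.

```python
# Hypothetical sketch of blocklist-based DNS filtering (not Pi-hole's code).
BLOCKED_ANSWER = "0.0.0.0"  # assumed sinkhole address for blocked names


def make_resolver(blocklist, upstream):
    """Return a resolve(name) function that filters names on the blocklist.

    `upstream` is any callable mapping a hostname to an address; in a real
    setup it would forward the query to a normal DNS server.
    """
    # Normalize once so lookups are case-insensitive and ignore a trailing dot.
    blocked = {name.lower().rstrip(".") for name in blocklist}

    def resolve(name):
        key = name.lower().rstrip(".")
        if key in blocked:
            # Known ad/tracker host: answer locally, never ask upstream.
            return BLOCKED_ANSWER
        return upstream(key)

    return resolve


def fake_upstream(name):
    # Stand-in for a real upstream DNS query; always returns a fixed address.
    return "93.184.216.34"


resolve = make_resolver(["ads.example.com", "tracker.example.net"], fake_upstream)
```

Every client that points at such a resolver gets the filtering without any per-device setup, which is the point made above.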
For those looking for some in-depth written explanation, I’ve also decided to write a series of blog posts that should hopefully shed some light on the basics of using SG’s in rendering. The first post provides background material by explaining common approaches to storing pre-computed lighting data in lightmaps and/or probes. The second post focuses on explaining the basics of Spherical Gaussians, and demonstrating some of their more useful properties. The third post explains how the various SG properties can be used to compute diffuse lighting from an SG light source. The fourth post goes even deeper and covers methods for approximating the specular contribution from an SG light source. The fifth post explores some approaches for using SG’s to create a compact approximation of a lighting environment, and compares the results with spherical harmonics. Finally, the sixth post discusses features present in the lightmap baking demo that we’ve released on GitHub.
- OpenSSH DSA key generation has been disabled by default. It is important to update OpenSSH keys prior to upgrading. Additionally, Protocol 1 support has been removed.
- OpenSSH has been updated to 7.2p2.
- Wireless support for 802.11n has been added.
- By default, the ifconfig(8) utility will set the default regulatory domain to FCC on wireless interfaces. As a result, newly created wireless interfaces with default settings will have less chance to violate country-specific regulations.
- The svnlite(1) utility has been updated to version 1.9.4.
- The libblacklist(3) library and applications have been ported from the NetBSD Project.
- Support for the AArch64 (arm64) architecture has been added.
- Native graphics support has been added to the bhyve(8) hypervisor.
- Broader wireless network driver support has been added.
Last week I spent working on the glmark2 performance issues. I now have a NIR patch out for the pathological conditionals test (it's now faster than on the old driver), and a branch for job shuffling (+17% and +27% on the two desktop tests).
Here's the basic idea of job shuffling:
We're a tiled renderer, and tiled renderers get their wins from having a Clear at the start of the frame (indicating we don't need to load any previous contents into the tile buffer). When your frame is done, we flush each tile out to memory. If you do your clear, start rendering some primitives, and then switch to some other FBO (because you're rendering to a texture that you're planning on texturing from in your next draw to the main FBO), we have to flush out all of those tiles, start rendering to the new FBO, and flush its rendering. Then, when you come back to the main FBO, we have to reload your old cleared-and-a-few-draws tiles.
Job shuffling deals with this by separating the single GL command stream into separate jobs per FBO. When you switch to your temporary FBO, we don't flush the old job, we just set it aside. To make this work we have to add tracking for which buffers have jobs writing into them (so that if you try to read those from another job, we can go flush the job that wrote it), and which buffers have jobs reading from them (so that if you try to write to them, they can get flushed so that they don't get incorrectly updated contents).
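The tracking described above can be sketched roughly as follows. This is a toy model, not the driver's code: the `Job` and `Context` classes, the `draw`/`flush` names, and the use of plain strings for buffers are all hypothetical, and "flushing" just appends to a log instead of submitting work to hardware.

```python
# Toy model of per-FBO job tracking (hypothetical names, not driver code).
class Job:
    def __init__(self, fbo):
        self.fbo = fbo      # buffer this job writes to
        self.draws = []     # draw calls queued in this job
        self.reads = set()  # buffers this job samples from


class Context:
    def __init__(self):
        self.jobs = {}       # buffer -> pending job writing into it
        self.readers = {}    # buffer -> set of pending jobs reading from it
        self.flush_log = []  # order in which jobs were submitted

    def draw(self, fbo, name, textures=()):
        # Reading a buffer with a pending writer: flush that writer first.
        for tex in textures:
            writer = self.jobs.get(tex)
            if writer is not None and tex != fbo:
                self.flush(writer)
        # Writing a buffer that pending jobs read: flush those readers first,
        # so they do not see the new contents.
        for reader in list(self.readers.get(fbo, ())):
            if reader.fbo != fbo:
                self.flush(reader)
        # Switching FBOs does not flush; the old job is simply set aside.
        job = self.jobs.get(fbo)
        if job is None:
            job = self.jobs[fbo] = Job(fbo)
        job.draws.append(name)
        for tex in textures:
            job.reads.add(tex)
            self.readers.setdefault(tex, set()).add(job)

    def flush(self, job):
        if self.jobs.get(job.fbo) is job:
            del self.jobs[job.fbo]
        for tex in job.reads:
            self.readers[tex].discard(job)
        self.flush_log.append((job.fbo, list(job.draws)))

    def flush_all(self):
        for job in list(self.jobs.values()):
            self.flush(job)


ctx = Context()
ctx.draw("main", "clear+a")                  # start the main frame
ctx.draw("shadow", "depth pass")             # switch FBO: main job set aside
ctx.draw("main", "b", textures=["shadow"])   # reading shadow flushes its job
ctx.flush_all()                              # end of frame
```

In this model the main FBO's job is submitted once, with both of its draws, instead of being flushed and reloaded around the render-to-texture pass.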
MSI is celebrating its 30th anniversary as a leading manufacturer of innovative PC hardware. During the past 30 years, MSI has earned a reputation for providing products featuring cutting edge technology and striving to create and use only the best quality components.
To celebrate this milestone, MSI has created an exclusive limited edition graphics card, combining the excellence of MSI GAMING graphics cards with a unique custom designed EK waterblock for this anniversary edition. The exceptionally classy waterblock features infused RGB LED lights that can be set to any of 16.8 million colors by using the MSI Gaming App.
At the heart of this exclusive card is NVIDIA’s GeForce® GTX 1080 GPU to provide all the power you need at up to 4K resolution gaming. The card comes fully assembled in a closed loop liquid cooling configuration that is covered by warranty and maintenance-free. Enclosed in the exquisite and sturdy wooden box is a small gift which is perfect for enjoying the latest epic games in full comfort.
Now that DX11 has given us UAVs in all the other shading stages as well, I decided to try the equivalent for the vertex cache. By “Vertex Cache”, I mean the Post-transform vertex re-use cache. That is, the thing which enables us to re-use vertex shading results across duplicated vertices in a mesh.
Using UAVs in a VS, we can use SV_VertexID to do an atomic increment into a buffer containing one counter for each vertex. An atomic inc is necessary here because we don’t actually know what the vertex distribution algorithm is, and we could theoretically process a given vert in more than one VS thread simultaneously. For that matter, HW could simply be duplicating all the verts. We won’t know until we’ve looked at the results. Using this approach, we end up with a buffer telling us the exact number of times that each vert was processed during the draw. From this, we can directly calculate the ACMR (average cache miss ratio) of the mesh.
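Once the counter buffer has been read back, the final step is simple arithmetic. A minimal sketch, assuming the draw was an indexed triangle list and that ACMR is taken as vertex shader invocations per triangle; the function name `acmr` and the sample counter values are made up for illustration.

```python
# Hypothetical post-processing of the per-vertex counter buffer on the CPU.
def acmr(vs_invocations, index_count):
    """Vertex shader invocations per triangle for one draw call.

    vs_invocations: one counter per vertex, as written by the atomic
                    increments in the vertex shader.
    index_count:    number of indices in the draw (triangle list assumed).
    """
    triangle_count = index_count // 3
    if triangle_count == 0:
        raise ValueError("draw call contains no complete triangle")
    return sum(vs_invocations) / triangle_count


# A quad drawn as two triangles sharing an edge (6 indices, 4 vertices).
perfect_reuse = acmr([1, 1, 1, 1], 6)  # every vertex shaded exactly once
no_reuse = acmr([1, 2, 2, 1], 6)       # shared vertices shaded twice
```

For a triangle list, a value of 3.0 means no vertex results were reused at all; lower values mean the cache is doing useful work.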
This code accompanies the research paper "Masked Software Occlusion Culling", and implements an efficient alternative to the hierarchical depth buffer algorithm. Our algorithm decouples depth values and coverage, and operates directly on the hierarchical depth buffer. It lets us efficiently parallelize both coverage computations and hierarchical depth buffer updates.
This code is mainly optimized for the AVX2 instruction set, and some AVX specific instructions are required for best performance. However, we also provide SSE 4.1 and SSE 2 implementations for backwards compatibility. The appropriate implementation will be chosen during run-time based on the CPU's capabilities.
Gathering petabytes of data about your customers is cool, but how can you take advantage of this data? BlazingDB lets you run high-performance SQL on a database using a ton of GPUs.
Relying on GPUs for a database is quite interesting. GPUs can run a ton of tasks in parallel and present a clear advantage for very specific tasks. In particular, companies have been using GPUs a lot lately for image processing and machine learning applications — but it’s the first time I’m hearing about taking advantage of GPUs for databases.
That’s where BlazingDB shines. You can do sums, use predicates and run through many, many database entries in little time. The company just started accepting customers in June 2016, and there are already big Fortune 100 companies that want to use BlazingDB.
The design of iBow docking allows you to replace graphics cards easily according to your requirements to enhance the graphics experience. iBow was developed to accommodate the largest video cards currently available in the market.