« on: September 15, 2016, 04:29:20 PM »
I spent last week working on the glmark2 performance issues. I now have a NIR patch out for the pathological conditionals test (it's now faster than on the old driver), and a branch for job shuffling (+17% and +27% on the two desktop tests).
Here's the basic idea of job shuffling:
We're a tiled renderer, and tiled renderers get their wins from having a Clear at the start of the frame (indicating we don't need to load any previous contents into the tile buffer). When your frame is done, we flush each tile out to memory. If you do your clear, start rendering some primitives, and then switch to some other FBO (because you're rendering to a texture that you're planning on texturing from in your next draw to the main FBO), we have to flush out all of those tiles, start rendering to the new FBO, and flush its rendering. Then, when you come back to the main FBO, we have to reload your old cleared-and-a-few-draws tiles.
Job shuffling deals with this by separating the single GL command stream into separate jobs per FBO. When you switch to your temporary FBO, we don't flush the old job, we just set it aside. To make this work we have to add tracking for which buffers have jobs writing into them (so that if you try to read those from another job, we can go flush the job that wrote it), and which buffers have jobs reading from them (so that if you try to write to them, they can get flushed so that they don't get incorrectly updated contents).
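The tracking described above can be sketched roughly as follows. This is only an illustrative model, not the driver's actual code: the names `Job`, `Context`, `draw`, and `flush_order` are all hypothetical, buffers are represented by plain strings, and "flushing" just records the order in which jobs would be submitted.

```python
class Job:
    """Pending rendering for one FBO (hypothetical model, not driver code)."""

    def __init__(self, fbo):
        self.fbo = fbo
        self.reads = set()   # buffers this job textures from
        self.draws = 0       # number of draws queued in this job
        self.flushed = False


class Context:
    def __init__(self):
        self.jobs = {}         # fbo -> pending Job rendering to it
        self.writer = {}       # buffer -> pending Job writing it
        self.readers = {}      # buffer -> set of pending Jobs reading it
        self.flush_order = []  # FBO names, in the order jobs were flushed

    def get_job(self, fbo):
        job = self.jobs.get(fbo)
        if job is None:
            # About to write to fbo: flush jobs still reading it, so they
            # see the old contents rather than the new ones.
            for reader in list(self.readers.get(fbo, ())):
                self.flush(reader)
            job = Job(fbo)
            self.jobs[fbo] = job
            self.writer[fbo] = job
        return job

    def draw(self, fbo, textures=()):
        job = self.get_job(fbo)
        for tex in textures:
            # About to read tex: flush the job that is writing it, if any.
            w = self.writer.get(tex)
            if w is not None and w is not job:
                self.flush(w)
            job.reads.add(tex)
            self.readers.setdefault(tex, set()).add(job)
        job.draws += 1

    def flush(self, job):
        if job.flushed:
            return
        job.flushed = True
        self.flush_order.append(job.fbo)
        self.jobs.pop(job.fbo, None)
        if self.writer.get(job.fbo) is job:
            del self.writer[job.fbo]
        for buf in job.reads:
            self.readers[buf].discard(job)


ctx = Context()
ctx.draw("main")                        # clear + a few draws on the main FBO
ctx.draw("temp")                        # switch FBOs: main job is set aside
ctx.draw("main", textures=["temp"])     # reading temp flushes only temp's job
```

In this model the main FBO's job is never flushed when switching to the temporary FBO. It is only flushed later if something writes to a buffer it reads from, or at the end of the frame.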
MSI is celebrating its 30th anniversary as a leading manufacturer of innovative PC hardware. During the past 30 years, MSI has earned a reputation for providing products featuring cutting edge technology and striving to create and use only the best quality components.
To celebrate this milestone, MSI has created an exclusive limited edition graphics card, combining the excellence of MSI GAMING graphics cards with a unique custom designed EK waterblock for this anniversary edition. The exceptionally classy waterblock features infused RGB LED lights that can be set to any of 16.8 million colors by using the MSI Gaming App.
At the heart of this exclusive card is NVIDIA’s GeForce® GTX 1080 GPU to provide all the power you need at up to 4K resolution gaming. The card comes fully assembled in a closed loop liquid cooling configuration that is covered by warranty and maintenance-free. Enclosed in the exquisite and sturdy wooden box is a small gift which is perfect for enjoying the latest epic games in full comfort.
Now that DX11 has given us UAVs in all the other shading stages as well, I decided to try the equivalent for the vertex cache. By “Vertex Cache”, I mean the Post-transform vertex re-use cache. That is, the thing which enables us to re-use vertex shading results across duplicated vertices in a mesh.
Using UAVs in a VS, we can use SV_VertexID to do an atomic increment into a buffer containing one counter for each vertex. An atomic inc is necessary here because we don’t actually know what the vertex distribution algorithm is, and we could theoretically process a given vert in more than one VS thread simultaneously. For that matter, HW could simply be duplicating all the verts. We won’t know until we’ve looked at the results. Using this approach, we end up with a buffer telling us the exact number of times that each vert was processed during the draw. From this, we can directly calculate the ACMR (average cache miss ratio) of the mesh.
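As a rough illustration of that last step, here is how the ACMR could be computed on the CPU once the counter buffer has been read back. The function name `acmr` and the sample counter values are made up for the example, and it assumes an indexed triangle list, so the triangle count is the index count divided by three.

```python
def acmr(invocation_counts, index_count):
    """Average cache miss ratio: vertex shader invocations per triangle.

    invocation_counts: per-vertex counters read back from the UAV buffer.
    index_count: number of indices in the draw (triangle list assumed).
    """
    if index_count <= 0 or index_count % 3 != 0:
        raise ValueError("index_count must be a positive multiple of 3")
    triangles = index_count // 3
    return sum(invocation_counts) / triangles


# A quad drawn as two triangles with indices [0, 1, 2, 2, 1, 3].
# With perfect re-use every vertex is shaded once:
print(acmr([1, 1, 1, 1], 6))  # 2.0
# If the two shared vertices are each shaded twice:
print(acmr([1, 2, 2, 1], 6))  # 3.0
```

With no re-use at all, every index triggers a shader invocation and the ratio is 3.0. Lower values mean the cache is catching more of the duplicated vertices.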
This code accompanies the research paper "Masked Software Occlusion Culling", and implements an efficient alternative to the hierarchical depth buffer algorithm. Our algorithm decouples depth values and coverage, and operates directly on the hierarchical depth buffer. It lets us efficiently parallelize both coverage computations and hierarchical depth buffer updates.
This code is mainly optimized for the AVX2 instruction set, and some AVX-specific instructions are required for best performance. However, we also provide SSE 4.1 and SSE 2 implementations for backwards compatibility. The appropriate implementation is chosen at run time based on the CPU's capabilities.
Gathering petabytes of data about your customers is cool, but how can you take advantage of this data? BlazingDB lets you run high-performance SQL on a database using a ton of GPUs.
Relying on GPUs for a database is quite interesting. GPUs can run a ton of tasks in parallel and present a clear advantage for very specific tasks. In particular, companies have been using GPUs a lot lately for image processing and machine learning applications — but it’s the first time I’m hearing about taking advantage of GPUs for databases.
That’s where BlazingDB shines. You can do sums, use predicates and run through many, many database entries in little time. The company just started accepting customers in June 2016, and there are already big Fortune 100 companies that want to use BlazingDB.
The design of iBow docking allows you to replace graphics cards easily according to your requirements to enhance the graphics experience. iBow was developed to accommodate the largest video cards currently available in the market.
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.
So here are some bad situations I've gotten myself into, and how I eventually got myself out of them, in plain English.
The new DOOM is a perfect addition to the franchise, using the new id Tech 6 engine where ex-Crytek Tiago Sousa now assumes the role of lead renderer programmer after John Carmack’s departure.
Historically id Software is known for open-sourcing their engines after a few years, which often leads to nice remakes and breakdowns. Whether this will stand true with id Tech 6 remains to be seen but we don’t necessarily need the source code to appreciate the nice graphics techniques implemented in the engine.
Unlike most Windows games released these days, DOOM doesn’t use Direct3D but offers an OpenGL and Vulkan backend.
Vulkan being the new hot thing and Baldur Karlsson having recently added support for it in RenderDoc, it was hard to resist poking into DOOM internals. The following observations are based on the game running with Vulkan on a GTX 980 with all the settings on Ultra. Some are guesses; others are taken from the SIGGRAPH presentation by Tiago Sousa and Jean Geffroy.
Zepto ransomware is a relatively new player in the ransomware scene, and it’s closely related to the infamous Locky ransomware. Taking a closer look at Zepto’s code, we found that the code is pretty much the same as Locky’s code, but it has been slightly modified. The malware authors behind Zepto use the same methods used to spread Locky, and even the infection vector and the TOR payment page are the same, which makes us think that the people behind Locky are now spreading Zepto. The only difference between Locky and Zepto is the ransom demand. Zepto’s demand is much higher than Locky’s: 3 Bitcoins (approximately $1,850).
Security researchers have discovered a sophisticated strain of malware which has shifted across platforms in order to target Mac OS X users.
This week, Kaspersky Lab security experts revealed the existence of Backdoor.OSX.Mokes, an OS X-based variation of the Mokes malware family which was discovered back in January.
According to the team, the malicious code is now able to operate on all major operating systems including Windows, Linux and Mac.
Stefan Ortloff, a researcher with Kaspersky Lab's Global Research and Analysis Team, says the sample which was investigated by the team came unpacked, but he suspects that versions in the wild are packed, just like other OS variants of the malware.
The new strain of malware is written in C++ using the cross-platform application framework Qt, and is linked to OpenSSL.
When executed for the first time, the malicious code copies itself to a variety of system library locations, hiding away in folders belonging to apps and services including Skype, Google, Firefox and the App Store. Mokes then tampers with the PC to achieve persistence and connects to the C&C server using HTTP on TCP port 80.
Data center workloads are changing. Not long ago these systems were primarily used to handle storage and serve up web pages, but now they’re increasingly tasked with AI workloads like understanding speech, text, images and video or analyzing big data for insights.
Billions of consumers want instant answers to a multitude of questions, while enterprise companies want to analyze mountains of data to better serve their customers’ needs. Where do those answers come from? Data centers.
As a leader in server systems, IBM saw this trend coming several years ago, and partnered with us to accelerate new data center workloads. After four years of development, IBM today introduced its Power System S822LC for High Performance Computing powered by NVIDIA Tesla P100 GPUs and NVLink to facilitate high-performance analytics and enable deep learning on ever increasing mountains of data.
What is USD?
Pipelines capable of producing computer graphics films and games typically generate, store, and transmit great quantities of 3D data, which we call "scene description". Each of many cooperating applications in the pipeline (modeling, shading, animation, lighting, fx, rendering) typically has its own special form of scene description tailored to the specific needs and workflows of the application, and neither readable nor editable by any other application. Universal Scene Description (USD) is the first publicly available software that addresses the need to robustly and scalably interchange and augment arbitrary 3D scenes that may be composed from many elemental assets.
USD provides for interchange of elemental assets (e.g. models) or animations. But unlike other interchange packages, USD also lets you assemble and organize any number of assets into virtual sets, scenes, and shots, transmit them from application to application, and non-destructively edit them (as overrides), with a single, consistent API, in a single scenegraph. USD provides a rich toolset for reading, writing, editing, and rapidly previewing 3D geometry and shading. In addition, because USD's core scenegraph and "composition engine" are agnostic of 3D, USD can be extended in a maintainable way to encode and compose data in other domains.