Hi. I’m Forrest Smith and welcome to GameDevDaily. GameDevDaily is a curated platform for sharing knowledge with professional game developers. It’s heavily inspired by Mike Acton’s AltDevBlogADay which enabled and inspired many developers, including myself, to start writing.
The streets of downtown Austin, just cleared of music festival attendees and auto racing fans, are now filled with enthusiasts of a different sort. This year the city is host to SC15, the largest event for supercomputing systems and software, and AMD is on site to meet with customers and technology partners. The hardware is here, of course, including industry-leading AMD FirePro™ graphics and the upcoming AMD Opteron™ A1100 64-bit ARM® processor. However, the big story for AMD at the show this year is the Boltzmann Initiative, delivering new software tools to take advantage of the processing power of our products, including those on the future roadmap, like the new “Zen” x86 CPU core coming next year. Ludwig Boltzmann was a theoretical physicist and mathematician who developed critical formulas for predicting the behavior of different forms of matter. Today, these calculations are central to work done by the scientific and engineering communities we are targeting with these tools.
Finally, applications already developed in CUDA can now be ported to C++. This is achieved using the new Heterogeneous-compute Interface for Portability (HIP) tool, which translates CUDA runtime API calls into portable C++ code. AMD testing shows that in many cases 90 percent or more of CUDA code can be automatically converted into C++ by HIP. The remainder requires manual porting, but this should take a matter of days, not months as before. Once ported, the application could run on a variety of underlying hardware, and enhancements could be made directly in C++. The overall effect would be greater platform flexibility and reduced development time and cost.
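To make the idea concrete, here is a toy sketch of the kind of mechanical renaming that automatic porting relies on. This is emphatically not the real hipify tool — just a few lines of Python illustrating why "90 percent or more" of CUDA runtime calls can be converted automatically: most of the work is a one-to-one identifier mapping.

```python
import re

# Toy illustration, NOT the real hipify tool: much of HIP's automatic
# porting boils down to renaming CUDA runtime identifiers to their HIP
# equivalents. The mapping below covers only a handful of common calls.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(source: str) -> str:
    """Replace whole-word CUDA runtime identifiers with their HIP names."""
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)  # longest match first
    pattern = re.compile(r"\b(" + "|".join(names) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)

snippet = "cudaMalloc(&buf, n); cudaMemcpy(buf, src, n, cudaMemcpyHostToDevice);"
print(toy_hipify(snippet))
# hipMalloc(&buf, n); hipMemcpy(buf, src, n, hipMemcpyHostToDevice);
```

The remaining "manual programming" the press release mentions is exactly what a rename pass cannot handle: inline PTX, vendor-specific intrinsics, and build-system changes.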
As we can see, ray-tracing and rasterization are not mutually exclusive. Simplified variants of ray-tracing are already used for complex lighting effects in games – implemented completely in shaders using simplified, shader-friendly scene representations. These effects are where ray-tracing hardware could come in handy, replacing or extending shader-based ray-casting hacks with real ray-tracing. This can still operate on a simplified scene, or even on the real, highly complex scene.
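The "shader-friendly scene representation" point is worth grounding: many in-game ray casts test against analytic shapes rather than full triangle geometry. Below is a minimal sketch of such a cast — a ray/sphere intersection in Python (the scene and names are illustrative, not from any particular engine):

```python
import math

# A minimal sketch of a shader-style ray cast against a simplified scene:
# one analytic sphere instead of full triangle geometry. Illustrative only.

def ray_sphere(origin, direction, center, radius):
    """Return distance t to the nearest hit, or None if the ray misses.

    Assumes `direction` is normalized.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None          # ray misses the sphere entirely
    t = -b - math.sqrt(disc)  # nearest of the two intersections
    return t if t > 0 else None

# Ray from the origin straight down +z toward a unit sphere at z = 5.
print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```

In a real shader the same math runs per-pixel; dedicated ray-tracing hardware would replace this kind of hand-rolled intersection loop with traversal of a full acceleration structure.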
According to a new report published by 3Dcenter.org, NVIDIA could use GDDR5X memory instead of HBM2 for some of its next-generation Pascal GPUs next year.
If NVIDIA does decide to go with GDDR5X memory on select Pascal GPUs, it makes sense to expect the consumer-oriented cards to feature GDDR5X while HBM2 is reserved for enthusiast-grade cards such as the GeForce GTX Titan X successor and the high-end Quadro range.
This tool allows you to visualize, in real-time, in a browser, how complex functions distort the complex plane, like in the Conformal Pictures Wikipedia entry.
The rendered image is created by evaluating the user-supplied function and then using the result to look up a color in an image which is infinitely tiled over the complex plane. By changing the expression in the input field, you can visualize how various functions distort the plane.
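The lookup described above can be sketched in a few lines. This is a toy model, not the tool's actual implementation: evaluate f(z), wrap the result into the unit square via fractional parts (which is what makes the image tile infinitely), and sample a tiny stand-in "image" — here a 2×2 checkerboard:

```python
# Toy sketch of tiled texture lookup over the complex plane: evaluate f(z),
# wrap the result into the unit square, and sample a repeating tile there.
# The function and the 2x2 checkerboard "image" are illustrative.

def tile_color(f, z, tile):
    """Evaluate f at z and sample the infinitely repeated tile at the result."""
    w = f(z)
    rows, cols = len(tile), len(tile[0])
    # Fractional parts wrap w into [0, 1) x [0, 1), tiling the whole plane.
    u = min(int((w.real % 1.0) * cols), cols - 1)
    v = min(int((w.imag % 1.0) * rows), rows - 1)
    return tile[v][u]

checker = [["black", "white"],
           ["white", "black"]]

print(tile_color(lambda z: z * z, 0.25 + 0.25j, checker))  # black
```

Rendering a full image just repeats this lookup for the z under every pixel, which is why distortions of the checkerboard directly visualize how f warps the plane.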
With the release of Windows* 10 on July 29 and the release of the 6th generation Intel® Core™ processor family (code-name Skylake), we can now look closer into resource binding specifically for Intel® platforms.
The previous article “Introduction to Resource Binding in Microsoft DirectX* 12” introduced the new resource binding methods in DirectX 12 and concluded that with all these choices, the challenge is to pick the most desirable binding mechanism for the target GPU, types of resources, and their frequency of update.
This article describes how to pick different resource binding mechanisms to run an application efficiently on specific Intel GPUs.
If you have started developing or porting games that use DX12, you have probably realized that it is, in many ways, a very different beast compared to DX11. A lot of responsibility is suddenly placed on your shoulders.
This responsibility amounts to not only getting things functionally correct but also (amongst other things) mastering multi-threaded command list submission to really get the best performance out of the new API.
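The multi-threaded submission pattern mentioned above can be modeled in a few lines. This is a toy model in plain Python — not D3D12 API calls — but it shows the shape of the technique: several threads each record their own command list in parallel, then one thread submits the lists to the queue in a fixed, deterministic order.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of DX12-style parallel command recording. Each worker thread
# records its own command list; submission order stays deterministic.
# The string "commands" are stand-ins for real GPU commands.

def record_commands(thread_id, draw_calls):
    """Stand-in for recording one command list on one thread."""
    return [f"thread{thread_id}:draw{d}" for d in draw_calls]

def build_frame(num_threads=4, draws_per_thread=2):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(record_commands, t, range(draws_per_thread))
                   for t in range(num_threads)]
        # Iterating futures in submission order keeps the frame's command
        # stream deterministic even though recording ran in parallel.
        return [cmd for f in futures for cmd in f.result()]

print(build_frame(2, 1))  # ['thread0:draw0', 'thread1:draw0']
```

The hard part in a real engine is everything this sketch omits: per-thread command allocators, fence-based lifetime tracking, and balancing recording work evenly across threads.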
There isn’t yet a lot of written advice out there, which makes it hard to avoid mistakes other people have already made and to benefit from the tricks other developers have mastered to drive DX12 efficiently.
We at NVIDIA have started writing down what we know works well and what doesn’t work well with DX12 and we’d like to share this with you in a living document that will change over time as we learn new things about DX12.
This whitepaper focuses on just the compute architecture components of Intel processor graphics gen9. For shorthand, in this paper we may use the term gen9 compute architecture to refer to just those compute components. The whitepaper also briefly discusses the gen9-derived instantiation of Intel HD Graphics 530 in the recently released Intel® Core™ i7-6700K processor for desktop form factors.
Qt 5.5 brings you more solutions for even faster development workflows, a more powerful UI creation offering for keeping pace with market demands in multimedia and 3D user experiences, preliminary support for upcoming Windows 10 development, and more for connectivity. Plus, of course, improvements across the whole Qt framework.
New Features in Qt 5.5
- Improvements and enhancements to all major Qt modules, on all platforms
- All Windows builds are now automatically dynamic regarding OpenGL vs. ANGLE backends, no need to manually configure deployment anymore!
- Qt Bluetooth includes full support for Bluetooth Low Energy and is supported on Linux, Embedded Linux, Android, and iOS
- Qt Canvas 3D module fully supported
- Qt 3D 2.0 introduced as technology preview
- Qt Location technology preview lets you integrate maps, geocoding, routing and places into your application. The included mapping backends are: Nokia HERE, OpenStreetMap and MapBox
- Qt Multimedia has support for GStreamer 1.0, enhancing the multimedia capabilities on Linux-based systems. Integrating video/camera content into Qt Quick graphics is now also easy.
- Qt WebEngine has been updated to Chromium version 40 and the public APIs have been extended, for instance with full integration with Qt WebChannel
- TreeView control included in Qt Quick Controls
- The former “Qt Quick Enterprise Controls”, including various industrial gauges, dials, and other controls, have been migrated into Qt Quick Controls and are now also available to Qt open source users
- Full and official support for Red Hat Enterprise Linux 6.6
- Qt Creator 3.4
- Qt WebKit, Qt Script and Qt Quick 1 modules are now deprecated.
Today we released a new Hotfix driver 353.49 that addresses the following issue:
Sony Vegas Pro crashes
Windows 10 installation issue introduced with previous 353.45 driver
In addition, this driver also includes the same fixes which were part of our previous 353.38 hotfix driver release:
Delays when starting or switching apps & games with GSYNC enabled
In my previous post I introduced ARB_shader_storage_buffer_object, an OpenGL 4.3 feature that is coming soon to Mesa and the Intel i965 driver. While that post focused on explaining the features introduced by the extension, in this post I’ll dive into some of the implementation aspects, for those who are curious about this kind of stuff. Be warned that some parts of this post will be specific to Intel hardware.
Another interesting thing we had to deal with is address alignment. UBOs work with the std140 layout, in which elements in the UBO definition are aligned to 16-byte boundaries (the size of a vec4). This makes sense, since GPUs can usually optimize reads and writes to multiples of 16 bytes. However, as I explained in my previous post, SSBOs also introduce a packed layout mode known as std430.
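The difference between the two layouts is easy to see with a worked example. The sketch below is hand-written for illustration (not derived from the Mesa sources); it computes member offsets for `float a; float b[2];`, where std140 rounds each array element's alignment and stride up to 16 bytes while std430 packs them tightly:

```python
# Sketch of std140 vs std430 offset computation for: float a; float b[2];
# std140 rounds array element alignment/stride up to 16 bytes (a vec4);
# std430 does not. Simplified: real rules also cover vec3, structs, etc.

def layout_offsets(members, array_elem_align):
    """members: list of (name, size, base_align, is_array_elem) tuples."""
    offset, result = 0, {}
    for name, size, align, is_array_elem in members:
        if is_array_elem:
            align = max(align, array_elem_align)
        offset = (offset + align - 1) // align * align  # round up to alignment
        result[name] = offset
        # Array elements also pad their stride up to the alignment.
        offset += max(size, align) if is_array_elem else size
    return result

members = [("a", 4, 4, False),        # float a;
           ("b[0]", 4, 4, True),      # float b[2];
           ("b[1]", 4, 4, True)]

print(layout_offsets(members, 16))  # std140: {'a': 0, 'b[0]': 16, 'b[1]': 32}
print(layout_offsets(members, 4))   # std430: {'a': 0, 'b[0]': 4, 'b[1]': 8}
```

A float array that occupies 48 bytes under std140 shrinks to 8 bytes under std430 — which is exactly why a packed layout matters for large SSBOs.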
Intel hardware provides a number of messages that we can use through the Data Port interface to write to memory. Each message has different characteristics that make it more suitable for certain scenarios, like the pixel mask I discussed before. For example, some of these messages can write data in chunks of 16 bytes (that is, they write vec4 elements, or OWORDs in the language of the technical docs). One could think that these messages are great when you work with vector data types; however, they also introduce the problem of dealing with partial writes: what happens when you write to only one element of a vector? Or to a buffer variable that is smaller than a vector? What if you write columns in a row_major matrix? And so on.
In these scenarios, using these messages introduces the need to mask the writes, because you need to disable the channels in the vec4 element that you don’t want to write. Of course, the hardware provides the means to do this: we only need to set the writemask of the destination register of the message instruction to select the right channels.
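A toy model makes the writemask idea concrete. Since the message always touches a full 16-byte vec4, a partial write must disable the channels it shouldn't change; the channel names below follow the usual xyzw convention, but the code is an illustration, not Intel's instruction encoding:

```python
# Toy model of a vec4 writemask: the message writes a whole 16-byte vec4,
# so partial writes disable the channels that must be preserved.

def masked_write(dest_vec4, src_vec4, writemask):
    """writemask is a string like 'xz' naming the channels to overwrite."""
    channels = "xyzw"
    return [s if c in writemask else d
            for d, s, c in zip(dest_vec4, src_vec4, channels)]

# Write only the .y channel; the other three channels keep their old values.
print(masked_write([1, 2, 3, 4], [9, 9, 9, 9], "y"))  # [1, 9, 3, 4]
```

On real hardware the mask is a field of the destination register rather than a per-write loop, but the effect on memory is the same: unselected channels are left untouched.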
AMD (NASDAQ: AMD) today announced the new AMD FirePro™ S9170 server GPU, the world’s first and fastest 32GB single-GPU server card for DGEMM-heavy double-precision workloads, with support for OpenCL™ 2.0. Based on the second-generation AMD Graphics Core Next (GCN) GPU architecture, this new addition to the AMD FirePro™ server GPU family is capable of delivering up to 5.24 TFLOPS of peak single-precision compute performance while enabling full-throughput double precision, providing up to 2.62 TFLOPS of peak double-precision performance.
Designed with compute-intensive workflows in mind, the AMD FirePro S9170 server GPU is ideal for data center managers who oversee clusters within academic or government bodies, oil and gas industries, or deep neural network compute cluster development.
“AMD is recognized as an HPC industry innovator as the graphics provider with the top spot on the November 2014 Green500 List. Today the best GPU for compute just got better with the introduction of the AMD FirePro S9170 server GPU to complement AMD’s impressive array of server graphics offerings for high performance compute environments,” said Sean Burke, corporate vice president and general manager, AMD Professional Graphics group. “The AMD FirePro S9170 server GPU can accelerate complex workloads in scientific computing, data analytics, or seismic processing, wielding an industry-leading 32GB of memory. We designed the new offering for supercomputers to achieve massive compute performance while maximizing available power budgets.”
“There are some HPC workloads which require as much data as possible to stay resident on the device, and so the 32GB of memory provided by AMD FirePro S9170, the largest available on a single GPU, will enable the acceleration of scientific calculations that were previously impossible,” said Simon McIntosh-Smith, head of the Microelectronics Research Group at the University of Bristol. “For example, our new OpenCL version of the SNAP transport code from Los Alamos National Laboratory needs to keep as much data resident on the device as possible, and so the 32GB of memory will let us run problems of a much more interesting size faster than ever before. The large memory, combined with the 320GB/s memory bandwidth and double precision floating point performance, will make the AMD FirePro S9170 server GPU a ‘killer’ solution device for many HPC applications.”
“We have been developing a fully-parallel computational tool based on the AMD GPU heterogeneous computing platform and OpenCL,” said Omid Mahabadi, co-founder and director, Geomechanica Inc. “This tool accurately captures the complex physics of massive mines plus oil and gas fields rapidly and reliably. Thanks to the impressive 32GB of memory of the new cards, we expect to run computations on massive data structures containing tens of millions of data elements. The combination of rapid double-precision operations with the large memory capacity enables accurate, detailed, and reliable computations. A similar performance using CPUs would likely require much higher capital and maintenance costs. Moving forward, we plan to take advantage of the recent features of the OpenCL 2.0 open API to further enhance the performance of our software.”
- The Vulkan API is more low-level than OpenGL (the programmer is responsible for memory and thread management, for example). What triggered this decision?
A low level API has simpler drivers. This means reduced driver overhead – which results in higher performance for CPU limited applications – and fewer differences between multiple GPU vendors’ implementations. Also, another fundamental advantage of handing the application more control is that the driver has to do less ‘behind the scenes’ management – resulting in much more reliable and predictable performance which doesn’t hit unexpected road bumps as the driver undertakes complex housekeeping tasks.
- What concrete improvements will gamers see, when Vulkan is used by video game studios? Can they expect better performance and better graphics, or is it just about simplifying studios backend work?
For applications that are CPU limited, which happens on desktop, and even more on mobile, end users should notice better performing applications with less stuttering and halting.
- When will the first version of Vulkan be released?
Vulkan is still on schedule to have specs and implementations before the end of the year.