« on: June 11, 2010, 09:48:02 AM »
With the June 2010 DirectX SDK, one of our work items was to try out the various DirectX 11 samples on NVIDIA's DirectX 11 graphics parts (NVIDIA GeForce GTX 470/480) now that they are available. For the August 2009 and February 2010 releases, we only had AMD/ATI DirectX 11 graphics cards available (ATI Radeon HD 5000 Series). Video cards have traditionally competed on a mix of features, performance, and price. These days they increasingly also compete on power consumption; this has always been true in the mobile and laptop space, but it is becoming important even in desktops.
There has been a lot of focus in Direct3D 10, 10.1, and 11 on minimizing the 'feature fragmentation' problem in the Direct3D API (best demonstrated by the "sea of caps" in the Direct3D 9 Card Capabilities spreadsheet we ship in the DirectX SDK), to simplify the programmer's job of using these APIs efficiently. This effort really started with Direct3D 9 Shader Model 3.0, which tightened the specification somewhat. It is also much of what the Feature Level concept introduced in Direct3D 10.1, and the '10level9' feature levels of DirectX 11, try to address in a more manageable way. Performance can still vary a great deal between vendors, and even between the same vendor's cards at different price points, but we hope this at least constrains the degrees of freedom the programmer has to worry about.
Our work with the NVIDIA hardware for this release has provided insight into some areas where programmers need to pay attention to differences between vendors' cards. The biggest difference I noticed was the number of MSAA quality levels exposed by AMD vs. NVIDIA. This information is obtained via the CheckMultisampleQualityLevels method in Direct3D 10.x and 11. The ATI Radeon HD 5000 Series provides only one quality level per sample count, while the NVIDIA GeForce GTX 470/480 exposes several fine-grained quality levels per sample count. This highlighted a few UI bugs in some of the samples, as well as in DXUT/DXUT11, that were corrected in the June 2010 release. Be sure to test the behavior of any MSAA settings and quality levels in your DX10.x and DX11 programs on both vendors' hardware. Another area to pay close attention to is DirectCompute synchronization and timing behavior. Because DirectCompute is a low-level exposure of GPU behavior, it is more subject to architectural differences, so be sure to test any use of DirectCompute on hardware from multiple vendors.
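To illustrate the kind of UI logic the quality-level difference affects, here is a minimal sketch that expands per-sample-count results into selectable MSAA modes. It assumes the level counts were already obtained from `ID3D11Device::CheckMultisampleQualityLevels(format, sampleCount, &numQualityLevels)`, where valid quality values run from 0 to `numQualityLevels - 1` and a count of 0 means the sample count is unsupported. The names `MsaaMode` and `BuildMsaaModes` are hypothetical and not part of DXUT.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One selectable MSAA mode: a sample count plus a quality level.
struct MsaaMode {
    std::uint32_t sampleCount;
    std::uint32_t quality;  // valid range: 0 .. (reported level count - 1)
};

// Hypothetical helper: expands (sample count, number of quality levels) pairs,
// i.e. what CheckMultisampleQualityLevels reported for each count, into the
// list of modes a settings UI could offer. A level count of 0 means the sample
// count is unsupported, so it contributes no modes.
std::vector<MsaaMode> BuildMsaaModes(
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& reported) {
    std::vector<MsaaMode> modes;
    for (const auto& [sampleCount, levelCount] : reported) {
        assert(sampleCount >= 1 && "sample count must be at least 1");
        // Offer every reported quality level; do not assume only level 0 exists.
        for (std::uint32_t q = 0; q < levelCount; ++q) {
            modes.push_back({sampleCount, q});
        }
    }
    return modes;
}
```

With one level per sample count the list has one entry per count; with several levels per count the same code produces several entries per count, which is exactly the case a UI that hard-codes quality 0 would mishandle.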
Basically, an A-buffer is a simple list of fragments per pixel. Previous methods of implementing it on DX10-generation hardware required multiple passes to capture an interesting number of fragments per pixel. They were essentially based on depth peeling, with enhancements that allow capturing more than one layer per geometry pass, such as the k-buffer (which suffers from read-modify-write hazards) and the stencil-routed k-buffer. Bucket sort depth peeling can capture up to 32 fragments per geometry pass, but with only 32 bits per fragment (just a depth) and at the cost of potential collisions.
All these techniques were complex and basically limited by the maximum of 8 render targets that were writable by the fragment shader.
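One way to avoid the render-target limit on newer hardware is to store each pixel's fragments as a linked list in a shared pool. The post does not describe a specific implementation, so the following is only a CPU-side sketch of that idea, assuming a per-pixel head index plus a global fragment pool; on a GPU the insert would use an atomic counter and an atomic exchange on the head pointer. The `Fragment` and `ABuffer` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One captured fragment; `next` links fragments that cover the same pixel.
struct Fragment {
    float depth;
    std::uint32_t color;  // packed RGBA
    std::int32_t next;    // index of the next fragment for this pixel, -1 ends the list
};

class ABuffer {
public:
    ABuffer(int width, int height)
        : width_(width), height_(height),
          heads_(static_cast<std::size_t>(width) * height, -1) {}

    // Capture pass: prepend a fragment to its pixel's list. Fragments can arrive
    // in any order, and the count per pixel is bounded only by the pool size.
    void Insert(int x, int y, float depth, std::uint32_t color) {
        std::int32_t& head = heads_[Index(x, y)];
        nodes_.push_back({depth, color, head});
        head = static_cast<std::int32_t>(nodes_.size() - 1);
    }

    // Resolve pass: gather one pixel's fragments and sort them front to back.
    std::vector<Fragment> SortedFragments(int x, int y) const {
        std::vector<Fragment> frags;
        for (std::int32_t i = heads_[Index(x, y)]; i != -1; i = nodes_[i].next) {
            frags.push_back(nodes_[i]);
        }
        std::sort(frags.begin(), frags.end(),
                  [](const Fragment& a, const Fragment& b) { return a.depth < b.depth; });
        return frags;
    }

private:
    std::size_t Index(int x, int y) const {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_;
    int height_;
    std::vector<std::int32_t> heads_;  // per-pixel list head, -1 = no fragments
    std::vector<Fragment> nodes_;      // shared fragment pool for all pixels
};
```

All fragments are captured in a single geometry pass, and sorting is deferred to the resolve step instead of being done by repeated peeling.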
This technology preview is a snapshot of some internal research we have been working on and talking about at various conferences for the past couple of years. Interest in GPU-accelerated AI has continued to grow, so we are making this (unsupported) snapshot available for developers who would like to experiment with the technology.
In Houdini 11, new Voronoi-based fracturing tools will make it easier to break up objects either before a simulation or automatically during a simulation...
Our particle fluids are now up to 70 times faster with the new FLIP (Fluid Implicit Particle) solver as compared to Houdini 10’s SPH solver, making it ideal for generating multiple iterations. In addition, this new solver is seamlessly integrated with existing particle operations [POPs] making the results highly directable. New buoyancy controls make it easier to float rigid objects and you can even smash up an object by combining these fluid tools with the new fracturing tools...
Hardware Rendering has also been enhanced with high quality OpenGL shading of lights and shadows as well as GPU-assisted volumes, unlimited lights and support for diffuse, specular, opacity, environment, bump and normal maps. Houdini’s Flipbook tools now support all these FX and can capture high dynamic range beauty passes.
In addition, we have improved the lighting interface for Houdini 11. We have new light types such as Global Illumination, Portal, Sky, Indirect and Geometry. The Geometry Lights let you turn any 3D object into a light emitting surface then use a surface shader to control the light emission. The geometry can also be animating or deforming for even cooler results...
With every major release of thinkingParticles, new features are introduced, extending the power and flexibility of thinkingParticles by an order of magnitude compared to its predecessor. Release 4 represents a milestone in advancing the feature set.
In this class, we will introduce OpenCL™. We start with an overview of GPU compute since the desire to take advantage of modern GPU computational power in general applications was a main motivator in the development of OpenCL™. The discussion includes some of the early APIs developed to harness the increasing programmable computational power available in modern graphics processors.
We then introduce the anatomy and programming model of OpenCL™ and take you through some of the highlights of installing the ATI Stream SDK v2, which includes support for OpenCL™ 1.0 on x86 CPUs and AMD GPUs. Then, the practical portions of the OpenCL™ runtime and kernel specifications are discussed in detail.
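The core of the OpenCL programming model is the NDRange: a kernel runs once per work-item, and work-items are grouped into work-groups. The sketch below emulates a 1-D NDRange serially on the host to show how the built-in ids relate (`get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)` when no global offset is used). `WorkItemIds` and `RunNDRange1D` are hypothetical names; a real device runs work-items concurrently, so no execution order should be assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Ids a 1-D OpenCL work-item can query inside a kernel.
struct WorkItemIds {
    std::size_t global_id;   // get_global_id(0)
    std::size_t local_id;    // get_local_id(0)
    std::size_t group_id;    // get_group_id(0)
    std::size_t local_size;  // get_local_size(0)
};

// Serially invokes `kernel` once per work-item of a 1-D NDRange.
void RunNDRange1D(std::size_t global_size, std::size_t local_size,
                  const std::function<void(const WorkItemIds&)>& kernel) {
    // OpenCL 1.0 requires the global size to be a multiple of the local size.
    assert(local_size > 0 && global_size % local_size == 0);
    for (std::size_t group = 0; group < global_size / local_size; ++group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            kernel({group * local_size + local, local, group, local_size});
        }
    }
}
```

A vector-add "kernel" written as a lambda indexes every buffer with `global_id`, just as an OpenCL kernel would use `get_global_id(0)`.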
At the end, we discuss optimization tips to help you avoid common pitfalls when coding your applications in OpenCL™. For students who may have existing code written for the proprietary CUDA interface, we discuss the easy steps involved in porting that code to OpenCL™.
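Much of porting a CUDA kernel to OpenCL C is mechanical renaming of built-ins and qualifiers. The table below lists well-known equivalents (x dimension only); the naive text-substitution function is a hypothetical illustration, not a real porting tool. It does not add the `__global` address-space qualifier that OpenCL requires on buffer pointer arguments, and it does not touch host code, which must be rewritten against the OpenCL runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Common CUDA-to-OpenCL C kernel-language equivalents (1-D, x dimension).
// The combined global-index expression is listed first so it is rewritten
// before its individual parts.
const std::vector<std::pair<std::string, std::string>> kCudaToOpenCL = {
    {"blockIdx.x * blockDim.x + threadIdx.x", "get_global_id(0)"},
    {"__syncthreads()", "barrier(CLK_LOCAL_MEM_FENCE)"},
    {"__global__", "__kernel"},
    {"__shared__", "__local"},
    {"threadIdx.x", "get_local_id(0)"},
    {"blockIdx.x", "get_group_id(0)"},
    {"blockDim.x", "get_local_size(0)"},
    {"gridDim.x", "get_num_groups(0)"},
};

// Hypothetical helper: replaces every occurrence of each mapped identifier.
std::string RenameCudaIdentifiers(std::string src) {
    for (const auto& [from, to] : kCudaToOpenCL) {
        for (std::size_t pos = src.find(from); pos != std::string::npos;
             pos = src.find(from, pos + to.size())) {
            src.replace(pos, from.size(), to);
        }
    }
    return src;
}
```

If the index expression is spaced differently, its parts are still renamed individually to `get_group_id(0) * get_local_size(0) + get_local_id(0)`, which computes the same value.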
Download the free 3D Laboratory, which allows interaction using the WiiMote as well as stereoscopic visualization (like 3D cinema), creation of 3D models from mathematical equations, and much more.
CAPS, a software company that focuses on manycore development, has announced an OpenCL code generator within the just-released 2.3 version of its HMPP directive-based hybrid compiler.
The CUDA back-end generator has been enhanced with Fermi capabilities and this release brings support for more native compilers with Intel ifort/icc, GNU gcc/gfortran and PGI pgcc/pgfort compilers, enabling developers to freely use their favorite compiler with HMPP 2.3.
Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.
The OpenCL back-end expands the portfolio of targets supported by HMPP to AMD ATI GPUs. The OpenCL version of HMPP fully supports AMD and NVIDIA GPU compute processors, giving users a wider set of hybrid platforms on which to execute their applications. The recently released NVIDIA Tesla 20-series GPUs, based on the CUDA architecture codenamed "Fermi", are also supported by HMPP 2.3.
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
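The abstract does not give the exact formulation, so the following is a serial sketch under an assumption: a "full" 2-D cross-correlation that, for every relative shift of the reference, takes the dot product of the overlapping region. For n x n images that is (2n-1)^2 shifts times up to n^2 multiply-adds, which matches the stated O(n^4) cost. `Matrix` and `CrossCorrelate` are hypothetical names; the two outer shift loops are independent, which is what a CUDA, Pthreads, or OpenMP version would parallelize.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Full 2-D cross-correlation of two n x n single-precision images (assumed
// square and the same size). out[dy + n - 1][dx + n - 1] holds the sum of
// img[y][x] * ref[y + dy][x + dx] over all positions where both are in range.
Matrix CrossCorrelate(const Matrix& img, const Matrix& ref) {
    assert(!img.empty() && img.size() == ref.size());
    const long n = static_cast<long>(img.size());
    Matrix out(2 * n - 1, std::vector<float>(2 * n - 1, 0.0f));
    for (long dy = -(n - 1); dy < n; ++dy) {          // every vertical shift
        for (long dx = -(n - 1); dx < n; ++dx) {      // every horizontal shift
            float sum = 0.0f;
            for (long y = 0; y < n; ++y) {
                const long ry = y + dy;
                if (ry < 0 || ry >= n) continue;      // row outside the overlap
                for (long x = 0; x < n; ++x) {
                    const long rx = x + dx;
                    if (rx < 0 || rx >= n) continue;  // column outside the overlap
                    sum += img[y][x] * ref[ry][rx];   // dot-product term
                }
            }
            out[dy + n - 1][dx + n - 1] = sum;
        }
    }
    return out;
}
```

The zero-shift entry in the center of the output is the plain dot product of the two images; entries near the corners use progressively smaller overlaps.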
[Changelog for Afterburner 1.6.0 Beta 6]
- Added voltage control for one more MSI N240GT Low Profile graphics card version
- Added initial NVIDIA GeForce GTX 465 series graphics cards support
- Changed marketing name for MSI R5670-PD512 in hardware database
- Core clock is now used as primary overclocking domain instead of shader clock on GeForce GTX 400
- Unlocked memory downclocking ability on GeForce GTX 400 series
- Default core voltages for NVIDIA GeForce GTX 400 series cards are no longer hardcoded into the database. MSI Afterburner now uses the new NVIDIA driver API to read variable default fused voltages on GeForce GTX 400 series graphics cards. Please note that the new API requires updated release 256 drivers, which are not yet available to the public. Under currently available drivers, MSI Afterburner will not affect core voltage at all when restoring defaults
- Upper limit for the core voltage slider on NVIDIA GeForce GTX 400 series graphics cards has been raised to 1213mV. Please note that regular GTX 400 cards will not allow you to go beyond the reference voltage limits (1087mV for GTX 470 and 1138mV for GTX 480) until an unlocked BIOS is flashed
- Fixed <Link> button state saving/restoring issue on old NVIDIA cards caused by introducing Fermi family support
- MSI On-Screen Display server has been upgraded to version 3.7.1. New server improves On-Screen Display 3D rendering mode compatibility with Source engine based games and Star Trek Online and contains updated profiles list
[Changelog for Kombustor 1.08]
- New: added the GPU voltage to the GPU monitoring zone.
- Change: GPU indexing now follows the same scheme as Afterburner.
- Change: added the name of the graphics card to temperature graphs.
- Change: removed the auto-start of Afterburner.
- Bugfix: the benchmarking params group remained grayed out.
- Minor bugfixes and changes.