« on: June 09, 2010, 09:55:37 PM »
This technology preview is a snapshot of some internal research we have been working on and talking about at various conferences for the past couple of years. The level of interest in GPU-accelerated AI has continued to grow, so we are making this (unsupported) snapshot available for developers who would like to experiment with the technology.
In Houdini 11, new Voronoi-based fracturing tools will make it easier to break up objects either before a simulation or automatically during a simulation...
Our particle fluids are now up to 70 times faster with the new FLIP (Fluid Implicit Particle) solver as compared to Houdini 10’s SPH solver, making it ideal for generating multiple iterations. In addition, this new solver is seamlessly integrated with existing particle operations [POPs] making the results highly directable. New buoyancy controls make it easier to float rigid objects and you can even smash up an object by combining these fluid tools with the new fracturing tools...
Hardware Rendering has also been enhanced with high quality OpenGL shading of lights and shadows as well as GPU-assisted volumes, unlimited lights and support for diffuse, specular, opacity, environment, bump and normal maps. Houdini’s Flipbook tools now support all these FX and can capture high dynamic range beauty passes.
In addition, we have improved the lighting interface for Houdini 11. We have new light types such as Global Illumination, Portal, Sky, Indirect and Geometry. The Geometry Lights let you turn any 3D object into a light emitting surface then use a surface shader to control the light emission. The geometry can also be animating or deforming for even cooler results...
With every major release of thinkingParticles, new features are introduced that extend its power and flexibility by an order of magnitude compared to its predecessor. Release 4 represents a milestone in advancing the feature set.
In this class, we will introduce OpenCL™. We start with an overview of GPU compute since the desire to take advantage of modern GPU computational power in general applications was a main motivator in the development of OpenCL™. The discussion includes some of the early APIs developed to harness the increasing programmable computational power available in modern graphics processors.
We then introduce the anatomy and programming model of OpenCL™ and take you through some of the highlights of installing the ATI Stream SDK v2 which includes support for OpenCL™ 1.0 on x86 CPUs and AMD GPUs. Then, the practical portions of the OpenCL™ runtime and kernel specifications are discussed in detail.
At the end, we discuss optimization tips to help you avoid common pitfalls when coding your applications in OpenCL™. For students who may have existing code written for the proprietary interface, CUDA, we discuss the easy steps involved in porting that code to OpenCL™.
Download the free 3D Laboratory, which allows interaction using the WiiMote as well as stereoscopic visualization (cinema 3D-like), creation of 3D models from mathematical equations and much more.
CAPS, a software company that focuses on manycore development, has announced an OpenCL code generator within the just-released 2.3 version of its HMPP directive-based hybrid compiler.
The CUDA back-end generator has been enhanced with Fermi capabilities and this release brings support for more native compilers with Intel ifort/icc, GNU gcc/gfortran and PGI pgcc/pgfort compilers, enabling developers to freely use their favorite compiler with HMPP 2.3.
Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.
The OpenCL back-end expands the portfolio of targets supported by HMPP to the AMD ATI GPUs. The OpenCL version of HMPP fully supports AMD and NVIDIA GPU compute processors, giving users a wider set of hybrid platforms on which to execute their applications. The recently released NVIDIA Tesla 200-series GPUs, based on the CUDA architecture codenamed "Fermi", are also supported by HMPP 2.3.
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
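To make the O(n^4) operation count in the abstract above concrete, here is a minimal sketch in plain Python. The abstract does not give the exact formulation, so this assumes a circular (wrap-around) cross-correlation of two equally sized n x n images; the function names `cross_correlate` and `best_shift` are illustrative and do not come from the paper. Each of the n*n candidate shifts costs one n*n dot product, which is where the fourth power comes from.

```python
def cross_correlate(image, reference):
    """Naive circular cross-correlation of two n x n images (lists of lists).

    Assumption: wrap-around indexing; the paper's exact variant is not stated.
    There are n*n shifts, and each one needs an n*n dot product: O(n^4) total.
    """
    n = len(image)
    result = [[0.0] * n for _ in range(n)]
    for dy in range(n):              # vertical shift
        for dx in range(n):          # horizontal shift
            acc = 0.0
            for y in range(n):       # dot product over all pixels
                img_row = image[(y + dy) % n]
                ref_row = reference[y]
                for x in range(n):
                    acc += img_row[(x + dx) % n] * ref_row[x]
            result[dy][dx] = acc
    return result


def best_shift(image, reference):
    """Return the (dy, dx) shift with the highest correlation score."""
    corr = cross_correlate(image, reference)
    n = len(corr)
    shifts = ((dy, dx) for dy in range(n) for dx in range(n))
    return max(shifts, key=lambda s: corr[s[0]][s[1]])
```

As a usage example, if `reference` holds a single bright pixel at row 0, column 0 and `image` holds the same pixel at row 1, column 2, then `best_shift(image, reference)` returns that displacement. The two outer loops are independent of each other, which is the kind of task parallelism the abstract describes; the inner dot product is the part that maps onto short-vector (SSE/VSX) instructions.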
[Changelog for Afterburner 1.6.0 Beta 6]
- Added voltage control for one more MSI N240GT Low Profile graphics card version
- Added initial NVIDIA GeForce GTX 465 series graphics cards support
- Changed marketing name for MSI R5670-PD512 in hardware database
- Core clock is now used as primary overclocking domain instead of shader clock on GeForce GTX 400
- Unlocked memory downclocking ability on GeForce GTX 400 series
- Default core voltages for NVIDIA GeForce GTX 400 series cards are no longer hardcoded into the database. MSI Afterburner now uses the new NVIDIA driver API to read variable default fused voltages on GeForce GTX 400 series graphics cards. Please note that the new API requires updated release 256 drivers, which are not yet publicly available. MSI Afterburner will not affect core voltage at all when restoring defaults under currently available drivers
- Upper limit for core voltage slider on NVIDIA GeForce GTX 400 series graphics cards has been raised to 1213mV. Please note that regular GTX 400 cards will not allow you to go beyond the reference voltage limits (1087mV for GTX 470 and 1138mV for GTX 480) until an unlocked BIOS is flashed
- Fixed <Link> button state saving/restoring issue on old NVIDIA cards caused by introducing Fermi family support
- MSI On-Screen Display server has been upgraded to version 3.7.1. New server improves On-Screen Display 3D rendering mode compatibility with Source engine based games and Star Trek Online and contains updated profiles list
[Changelog for Kombustor 1.08]
- New: added the GPU voltage in the GPU monitoring zone.
- Change: GPU indexing now follows the same indexing scheme as Afterburner.
- Change: added name of the graphics card in temperature graphs.
- Change: removed the auto-start of Afterburner.
- Bugfix: the benchmarking params group was still grayed.
- Minor bugfixes and changes.
This installment returns to the topic of mixing OpenGL and CUDA C within the same application first introduced in Part 15 of this series. Part 15 demonstrated how to create 2D images with CUDA C on a pixel-by-pixel basis and display them with OpenGL through the use of PBOs (Pixel Buffer Objects).
This article will complete that discussion by demonstrating how to use VBOs (Vertex Buffer Objects) to create 3D images with CUDA C and render them using OpenGL as 3D collections of points, wire frame images, and surfaces.