Messages - Stefan

Pages: 1 ... 106 107 [108] 109 110 ... 145
2141
Quote
The unveiling of OpenCL on the ZMS processors highlights how the performance and flexibility of the underlying StemCell Computing architecture can now be leveraged by developers using an industry standard API to bring new levels of performance to applications targeting low-power platforms.

 OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for parallel processors such as the ZiiLABS ZMS processors.

"OpenCL enables developers to unlock the full potential of the underlying StemCell Computing architecture to deliver new levels of performance across a broad range of applications," said Tim Lewis, director of marketing of ZiiLABS. "The OpenCL based ray-tracing and video filter demos provide a glimpse of the floating-point performance and flexibility that developers can exploit on ZMS-based platforms and products."
Read the full Press Release
OpenCL Early Access Program

ZiiLABS is currently inviting developers with innovative ideas to use OpenCL for consumer class handheld and connected platforms to join an OpenCL Early Access Program that will provide selected partners an early release of the ZiiLABS OpenCL SDK* for ZMS processors.

2142
Quote
If you have a powerful OpenCL GPU available in a system, it is a good idea to understand its capabilities. You can do so by running algorithms in their CPU and GPU versions, and comparing the results. You might be surprised with the processing power that most modern software is wasting when it runs on a computer with an OpenCL GPU.

Full story at DDJ

2143
Fixed a problem with SLI SFR, AFR, and SLIAA modes with GeForce GTX 480 and GeForce GTX 470 and high-resolution display modes.

FreeBSD x86
Linux AMD 64 bit
Linux IA 32 bit
Solaris x86/x64

2144
3D-Tech News Around The Web / DigiTecK3D UDK Skin Shader
« on: June 11, 2010, 03:14:14 AM »
DT3D UDK skin shader is a real-time multi-layered skin model implementation in the Unreal 3.x Engine. The shader is based on research from “Efficient Rendering of Human Skin”, implementing most of the effects described in the paper including the multi-layered diffusion scattering in skin. Along with the multi-layered subsurface scattering, the shader implements a physically based specular term to better capture the specularity of skin. This is a great shader for artists who want to push their characters to the next level. It can also be quite useful for artists using the UDK for pre-vis work.

Download here



[via]

2145
3D-Tech News Around The Web / DirectX 11 Hardware Vendor Differences
« on: June 11, 2010, 02:28:59 AM »
Quote
With the June 2010 DirectX SDK, one of our work items was to try out the various DirectX 11 samples against the NVIDIA DirectX 11 graphics parts (NVIDIA GeForce GTX 470/480) now that they are available. For the August 2009 and February 2010 releases, we only had the AMD/ATI DirectX 11 graphics cards available (ATI Radeon HD 5000 Series). Video cards have traditionally competed on a mix of features, performance, and price. These days they are increasingly also competing on power consumption--while this has always been true in the mobile & laptop space, it is becoming increasingly important even in desktops.

There has been a lot of focus in Direct3D 10, 10.1, and 11 to try to minimize the 'feature fragmentation' problem in the Direct3D API (best demonstrated by the "sea of caps" in the Direct3D 9 Card Capabilities spreadsheet we ship in the DirectX SDK) to help simplify the programmer's job trying to efficiently use these APIs. This effort really started with Direct3D 9 Shader Model 3.0 trying to tighten down the specification a bit more. This is also a lot of what the Feature Level concept introduced in Direct3D 10.1 and the '10level9' feature levels of DirectX 11 is trying to address in a more manageable way. Performance differences can still vary a great deal between vendors and will vary a lot even between the same vendor's cards at different price-points, but we hope it at least helps constrain the degrees of freedom the programmer has to concern themselves with.

Our work with the NVIDIA hardware for this release has provided insight into some areas that programmers need to pay attention to with respect to different vendors' cards. The biggest difference I noticed was the number of MSAA quality levels exposed by AMD vs. NVIDIA. This information is obtained via the CheckMultisampleQualityLevels method in Direct3D 10.x and 11. The ATI Radeon HD 5000 Series only provides one quality level per sample count, while the NVIDIA GeForce GTX 470/480 exposes a number of fine-grain quality levels per sample count. This highlighted a few UI bugs in some of the samples as well as DXUT/DXUT11 that were corrected in the June 2010 release. Be sure to test the behavior of any MSAA settings and quality levels in your DX10.x and DX11 programs on both vendors' hardware. Another area to pay close attention to is DirectCompute synchronization and timing behavior. DirectCompute as a low-level exposure of the GPU behavior is more subject to architectural differences, so be sure to test any use of DirectCompute on hardware from multiple vendors.

Source: MSDN
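
The MSAA point in the quote above can be illustrated with a small sketch. It is written in Python rather than Direct3D so that it stays self-contained. `query_quality_levels`-style callbacks here are a hypothetical stand-in for a call such as CheckMultisampleQualityLevels (which reports the number of quality levels for a sample count, with 0 meaning unsupported), and the two capability tables are made-up illustrative values, not measured hardware data. The idea is simply to enumerate the levels at run time and clamp any saved setting instead of assuming a fixed count.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a driver query such as CheckMultisampleQualityLevels:
# takes a sample count, returns how many quality levels exist (0 = unsupported).
QualityQuery = Callable[[int], int]


def enumerate_msaa_modes(query: QualityQuery,
                         sample_counts=(1, 2, 4, 8, 16)) -> Dict[int, int]:
    """Map each supported sample count to its number of quality levels."""
    modes = {}
    for count in sample_counts:
        levels = query(count)
        if levels > 0:
            modes[count] = levels
    return modes


def clamp_setting(modes: Dict[int, int], count: int, quality: int) -> Tuple[int, int]:
    """Clamp a saved (sample count, quality) pair to what this device exposes."""
    if not modes:
        return 1, 0  # nothing reported: fall back to no multisampling
    if count not in modes:
        # Use the highest supported count that does not exceed the request.
        candidates = [c for c in modes if c <= count]
        count = max(candidates) if candidates else min(modes)
    # Valid quality values run from 0 to (levels - 1).
    quality = max(0, min(quality, modes[count] - 1))
    return count, quality


# Made-up capability tables for illustration only (not real hardware data).
one_level_table = {1: 1, 2: 1, 4: 1, 8: 1}
many_level_table = {1: 1, 2: 3, 4: 17, 8: 33}

modes_a = enumerate_msaa_modes(lambda c: one_level_table.get(c, 0))
modes_b = enumerate_msaa_modes(lambda c: many_level_table.get(c, 0))

# A setting saved on the second device is clamped on the first one.
print(clamp_setting(modes_b, 4, 16))
print(clamp_setting(modes_a, 4, 16))
```

A settings UI built this way repopulates its quality list whenever the sample count changes, which is the kind of behavior worth testing on hardware from both vendors.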

2146
Review at AnandTech

Download 32 bit Vista/7 driver
Download 64 bit Vista/7 driver

[via]

Thanks to UDA I can now watch 1080p videos at almost quadruple speed with VLC and an 8800GTX  ;D


2147
3D-Tech News Around The Web / HWiNFO for DOS is now freeware
« on: June 10, 2010, 05:42:36 PM »
Changes in HWiNFO v5.5.0 - Released on: Jun-08-2010:

    * HWiNFO is now FREEWARE !
    * Fixed Poulsbo SMBus access.
    * Added VIA VN1000 chipset support.
    * Added Patsburg PCH support.
    * Added nVidia D12U, T20, Tesla C2050/C2070/M2050/M2070/S2050/S2070, Quadro Q11U-3, MCP83 and some other models.
    * Improved support of several mature graphics adapters.

2148
Quote
A-Buffer:
Basically an A-buffer is a simple list of fragments per pixel. Previous methods to implement it on DX10 generation hardware required multiple passes to capture an interesting number of fragments per pixel. They were essentially based on depth-peeling, with enhancements allowing more than one layer to be captured per geometric pass, like the k-buffer and stencil routed k-buffer that suffer from read-modify-write hazards. Bucket sort depth peeling can capture up to 32 fragments per geometry pass but with only 32 bits per fragment (just a depth) and at the cost of potential collisions.
All these techniques were complex and basically limited by the maximum of 8 render targets that were writable by the fragment shader.

Full story at Icare3D
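
To make the "list of fragments per pixel" idea concrete, here is a minimal CPU-side sketch in Python. It is not the Icare3D implementation (which runs on the GPU); the class name `ABuffer`, the fragment layout, and the convention that a larger depth means farther from the camera are all assumptions made for illustration. Fragments are stored in arrival order in a single pass, then sorted by depth and blended back to front when the pixel is resolved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Color = Tuple[float, float, float]
Fragment = Tuple[float, Color, float]  # (depth, rgb, alpha)


class ABuffer:
    """Per-pixel list of fragments, filled in any order in a single pass."""

    def __init__(self) -> None:
        self.pixels: Dict[Tuple[int, int], List[Fragment]] = defaultdict(list)

    def add_fragment(self, x: int, y: int, depth: float,
                     rgb: Color, alpha: float) -> None:
        # No depth test and no sorting here: every fragment is kept.
        self.pixels[(x, y)].append((depth, rgb, alpha))

    def resolve(self, x: int, y: int,
                background: Color = (0.0, 0.0, 0.0)) -> Color:
        """Sort fragments by depth and blend them back to front."""
        # Assumption: larger depth = farther away, so draw those first.
        fragments = sorted(self.pixels.get((x, y), []),
                           key=lambda frag: frag[0], reverse=True)
        color = background
        for _depth, rgb, alpha in fragments:
            color = tuple(alpha * src + (1.0 - alpha) * dst
                          for src, dst in zip(rgb, color))
        return color


buf = ABuffer()
# Submitted near-first on purpose: insertion order does not matter.
buf.add_fragment(0, 0, depth=0.2, rgb=(1.0, 0.0, 0.0), alpha=0.5)  # near, red
buf.add_fragment(0, 0, depth=0.8, rgb=(0.0, 0.0, 1.0), alpha=1.0)  # far, blue
result = buf.resolve(0, 0)
print(result)
```

Because every fragment is kept, the result does not depend on submission order, which is what the multi-pass depth-peeling approaches mentioned in the quote have to work around.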

2149
Download here [via]

new extension exposed:
GL_AMD_transform_feedback3_lines_triangles

new profiles:
\Live\GameClient.exe
Risen.exe




2150
Quote
This technology preview is a snapshot of some internal research we have been working on and talking about at various conferences for the past couple years. The level of interest in GPU-accelerated AI has continued to grow, so we are making this (unsupported) snapshot available for developers who would like to experiment with the technology.

2151
3D-Tech News Around The Web / Houdini 11 sneak peek
« on: June 06, 2010, 04:27:14 AM »
Quote
In Houdini 11, new Voronoi-based fracturing tools will make it easier to break up objects either before a simulation or automatically during a simulation...
Our particle fluids are now up to 70 times faster with the new FLIP (Fluid Implicit Particle) solver as compared to Houdini 10’s SPH solver, making it ideal for generating multiple iterations. In addition, this new solver is seamlessly integrated with existing particle operations [POPs] making the results highly directable. New buoyancy controls make it easier to float rigid objects and you can even smash up an object by combining these fluid tools with the new fracturing tools...

Hardware Rendering has also been enhanced with high quality OpenGL shading of lights and shadows as well as GPU-assisted volumes, unlimited lights and support for diffuse, specular, opacity, environment, bump and normal maps. Houdini’s Flipbook tools now support all these FX and can capture high dynamic range beauty passes.
In addition, we have improved the lighting interface for Houdini 11. We have new light types such as Global Illumination, Portal, Sky, Indirect and Geometry. The Geometry Lights let you turn any 3D object into a light emitting surface then use a surface shader to control the light emission. The geometry can also be animating or deforming for even cooler results...

Check out the video

2152
3D-Tech News Around The Web / ATI GPUPerfAPI 2.3 available
« on: June 04, 2010, 10:38:55 PM »
GPUPerfAPI is AMD's library for accessing GPU performance counters on ATI Radeon graphics cards. It is used by GPU PerfStudio 2 and the ATI Stream Profiler and is now available to third party developers who wish to incorporate it within their own applications. GPUPerfAPI supports DirectX10, DirectX11, OpenGL, and OpenCL applications.
Features

Version 2.3 (6/4/10)

    * Supports DirectX10, DirectX11, OpenGL on ATI Radeon 2000 series and newer.
    * Supports OpenCL on ATI Radeon 4000 series and newer.
    * Provides derived counters based on raw HW performance counters.
    * Manages memory automatically - no allocations required.
    * Requires ATI Catalyst driver 10.1 or later.

Documentation
GPUPerfAPI User Guide (pdf)

Download
GPUPerfAPI v2.3 (1.8MB)

2153
Quote
With every major release of thinkingParticles new features are introduced, extending the power and flexibility of thinkingParticles by a magnitude, as compared to its predecessor. Release 4 represents a milestone in advancing the feature set.

Full news here

2154
3D-Tech News Around The Web / ComputeMark 2 finally released
« on: June 04, 2010, 06:09:41 PM »
Today the ComputeMark team has released a new version of ComputeMark (DirectX 11 Compute Shader benchmark) with tons of new features - new demos, presets, website and more...

Link 1: Download ComputeMark v2.0
Link 2: Download ComputeMark v2.0

Gallery at VR-Zone

2155
Thermalright demonstrates their upcoming fan at Computex.
PCGH noticed Furmark was running for more than 6 hours.

2156
Quote
In this class, we will introduce OpenCL™. We start with an overview of GPU compute since the desire to take advantage of modern GPU computational power in general applications was a main motivator in the development of OpenCL™. The discussion includes some of the early APIs developed to harness the increasing programmable computational power available in modern graphics processors.

We then introduce the anatomy and programming model of OpenCL™ and take you through some of the highlights of installing the ATI Stream SDK v2 which includes support for OpenCL™ 1.0 on x86 CPUs and AMD GPUs. Then, the practical portions of the OpenCL™ runtime and kernel specifications are discussed in detail.

At the end, we discuss optimization tips to help you avoid common pitfalls when coding your applications in OpenCL™. For students who may have existing code written for the proprietary interface, CUDA, we discuss the easy steps involved in porting that code to OpenCL™.

2157
3D-Tech News Around The Web / OpenCL 3D Laboratory & tutorials
« on: June 02, 2010, 07:48:20 PM »
Quote
Download the free 3D Laboratory, which allows interaction by using the WiiMote and also stereoscopic visualization (cinema 3D-like), creation of 3D models from mathematical equations and much more.

2158
Quote
Each year NVIDIA organizes a meeting with key customers and partners around its professional products.

Full story at 3d-test in French - English translation

2159
3D-Tech News Around The Web / OpenCL Code Generator Announced
« on: June 02, 2010, 04:10:07 PM »
Quote
CAPS, a software company that focuses on manycore development, has announced an OpenCL code generator within the just-released 2.3 version of its HMPP directive-based hybrid compiler.

The CUDA back-end generator has been enhanced with Fermi capabilities and this release brings support for more native compilers with Intel ifort/icc, GNU gcc/gfortran and PGI pgcc/pgfort compilers, enabling developers to freely use their favorite compiler with HMPP 2.3.

Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.

The OpenCL back-end expands the portfolio of targets supported by HMPP to the AMD ATI GPUs. The OpenCL version of HMPP fully supports AMD and NVIDIA GPU compute processors, bringing to users a wider set of hybrid platforms they can execute their applications on. The recently released NVIDIA Tesla 20-series GPUs, based on the CUDA architecture codenamed "Fermi", are also supported by HMPP 2.3.

Source DDJ

2160
Quote
In this work, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.

Download whitepaper from IBM

[via]
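
For readers wondering where the O(n^4) in the abstract above comes from, here is a naive sketch in plain Python. It is not the paper's code: the function names, the zero-padding at the borders, and the assumption that both matrices have the same shape are mine. For n x n inputs there are on the order of n^2 candidate shifts, and each one needs an n^2 dot product.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def cross_correlate(image: Matrix, reference: Matrix) -> Matrix:
    """Naive 2D cross-correlation over all integer shifts.

    Assumes both matrices have the same shape. Entries that fall outside
    the reference after shifting are treated as zero.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for dy in range(-(rows - 1), rows):          # O(n) vertical shifts
        out_row = []
        for dx in range(-(cols - 1), cols):      # O(n) horizontal shifts
            total = 0.0
            for y in range(rows):                # O(n^2) dot product per shift
                ry = y + dy
                if not 0 <= ry < rows:
                    continue
                for x in range(cols):
                    rx = x + dx
                    if 0 <= rx < cols:
                        total += image[y][x] * reference[ry][rx]
            out_row.append(total)
        result.append(out_row)
    return result


def best_shift(image: Matrix, reference: Matrix) -> Tuple[int, int]:
    """Return the (dy, dx) shift with the highest correlation value."""
    corr = cross_correlate(image, reference)
    rows, cols = len(image), len(image[0])
    _value, r, c = max((value, r, c)
                       for r, row in enumerate(corr)
                       for c, value in enumerate(row))
    return r - (rows - 1), c - (cols - 1)


# A single bright pixel at (0, 0) in the image and at (1, 2) in the reference.
image = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
reference = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
corr = cross_correlate(image, reference)
shift = best_shift(image, reference)
print(shift)
```

The innermost multiply-add is the part that the abstract describes being vectorized with SSE/VSX or mapped to CUDA threads; the four nested loops are why small datasets that fit in cache favor the multi-core CPUs.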
