OpenCL Code Generator Announced
« on: June 02, 2010, 04:10:07 PM »
CAPS, a software company that focuses on manycore development, has announced an OpenCL code generator within the just-released 2.3 version of its HMPP directive-based hybrid compiler.

The CUDA back-end generator has been enhanced with Fermi capabilities and this release brings support for more native compilers with Intel ifort/icc, GNU gcc/gfortran and PGI pgcc/pgfort compilers, enabling developers to freely use their favorite compiler with HMPP 2.3.

Based on GPU programming and tuning directives, HMPP offers an incremental programming model that allows developers with different levels of expertise to fully exploit GPU hardware accelerators in their legacy code.

The OpenCL back-end expands the portfolio of targets supported by HMPP to AMD ATI GPUs. The OpenCL version of HMPP fully supports AMD and NVIDIA GPU compute processors, giving users a wider set of hybrid platforms on which to run their applications. The recently released NVIDIA Tesla 200-series GPUs, based on the CUDA architecture codenamed "Fermi", are also supported by HMPP 2.3.

Source DDJ

In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
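
To make the O(n^4) cost in the abstract concrete, here is a minimal brute-force sketch of 2D cross-correlation in plain Python. It is not the paper's code: the function names `cross_correlate` and `best_shift`, the square n x n input, and the "pick the shift with the largest dot product" comparison are assumptions made for illustration. There are O(n^2) candidate shifts and each one costs an O(n^2) dot product, which gives the O(n^4) total.

```python
def cross_correlate(image, reference):
    """Brute-force 2D cross-correlation of two n x n matrices (lists of lists).

    Entry [dy + n - 1][dx + n - 1] of the result is the dot product of
    `reference` with `image` shifted down by dy rows and right by dx columns.
    """
    n = len(image)
    size = 2 * n - 1
    out = [[0.0] * size for _ in range(size)]
    for dy in range(-(n - 1), n):            # O(n) vertical shifts
        for dx in range(-(n - 1), n):        # O(n) horizontal shifts
            acc = 0.0
            # Only the overlapping region contributes to the dot product.
            for y in range(max(0, dy), min(n, n + dy)):
                for x in range(max(0, dx), min(n, n + dx)):
                    acc += reference[y][x] * image[y - dy][x - dx]
            out[dy + n - 1][dx + n - 1] = acc
    return out


def best_shift(image, reference):
    """Return the (dy, dx) shift of `image` that best matches `reference`."""
    n = len(image)
    corr = cross_correlate(image, reference)
    size = len(corr)
    i, j = max(
        ((i, j) for i in range(size) for j in range(size)),
        key=lambda ij: corr[ij[0]][ij[1]],
    )
    return i - (n - 1), j - (n - 1)
```

The four nested loops are the part that the paper parallelizes with threads and short-vector instructions; the innermost loop is the dot product that maps onto SSE/VSX or GPU threads.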

Download whitepaper from IBM


Afterburner 1.6.0 Beta 6
« on: May 31, 2010, 03:57:13 PM »
Download at Guru3D

[Changelog for Afterburner 1.6.0 Beta 6]
- Added voltage control for one more MSI N240GT Low Profile graphics card version
- Added initial NVIDIA GeForce GTX 465 series graphics cards support
- Changed marketing name for MSI R5670-PD512 in hardware database
- Core clock is now used as primary overclocking domain instead of shader clock on GeForce GTX 400
- Unlocked memory downclocking ability on GeForce GTX 400 series
- Default core voltages for NVIDIA GeForce GTX 400 series cards are no longer hardcoded into the database. Now MSI Afterburner uses the new NVIDIA driver API to read variable default fused voltages on GeForce GTX 400 series graphics cards. Please note that the new API requires updated release 256 drivers, which are not available to the public yet. MSI Afterburner will not affect core voltage at all when restoring defaults under currently available drivers
- Upper limit for core voltage slider on NVIDIA GeForce GTX 400 series graphics cards has been upped to 1213mV. Please note that regular GTX 400 cards will not allow you to go beyond the reference voltage limits (1087mV for GTX 470 and 1138mV for GTX 480) until an unlocked BIOS is flashed
- Fixed <Link> button state saving/restoring issue on old NVIDIA cards caused by introducing Fermi family support
- MSI On-Screen Display server has been upgraded to version 3.7.1. New server improves On-Screen Display 3D rendering mode compatibility with Source engine based games and Star Trek Online and contains updated profiles list

[Changelog for Kombustor 1.08]
- New: added the GPU voltage in the GPU monitoring zone.
- Change: GPU indexing now follows the same indexing scheme as Afterburner.
- Change: added name of the graphics card in temperature graphs.
- Change: removed the auto-start of Afterburner.
- Bugfix: the benchmarking params group was still grayed.
- Minor bugfixes and changes.

Intel - Advanced Rendering Techniques
« on: May 29, 2010, 10:09:44 AM »
This article goes further into the details of the varying procedures associated with rendering in Autodesk* Softimage*. Rendering is as nuanced a task as lighting is and incorporates many of the principles covered in previous articles. You don't want to spend hours of effort creating detailed models and precise animation only to create substandard imagery in the final output of your scene, and there are many interrelated parameters and options to consider in this phase of scene creation. As with lighting, rendering is often one of the least-understood aspects of 3D imaging, but it's critical to achieving high-quality animations.

The ATI Catalyst 10.5a Hotfix provides resolution for the following issue:

    * Battlefield: Bad Company 2 - Long load times for new maps when using a Windows® 7 / Windows Vista® based system with an: 
          o ATI Radeon HD 4xxx series graphics card
          o ATI Radeon HD 3xxx series graphics card
          o ATI Radeon HD 2xxx series graphics card


Unigine's old demos/benchmarks now also contain a D3D11 render path.

Changes in Tropics version 1.3

    * Added stereo 3D support in several modes:
          o Anaglyph
          o Separate images
          o 3D Vision
          o iZ3D
    * Several performance optimizations
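
For readers unfamiliar with the anaglyph mode listed above, here is a rough sketch of the general idea, assuming the common red-cyan scheme: the red channel comes from the left-eye image and the green and blue channels from the right-eye image. This is not Unigine's implementation; the function name `anaglyph` and the representation of images as nested lists of (r, g, b) tuples are made up for the example.

```python
def anaglyph(left, right):
    """Combine two same-sized RGB images into one red-cyan anaglyph image.

    `left` and `right` are lists of rows, each row a list of (r, g, b) tuples.
    Red is taken from the left-eye view, green and blue from the right-eye view.
    """
    return [
        [(l[0], r[1], r[2]) for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]
```

Viewed through red-cyan glasses, each eye then sees mostly its own view, which is what produces the depth impression.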

Changes in Sanctuary version 2.3

    * Added support of 3D Vision
    * Several performance optimizations

Sculptris - free 3D modeler
« on: May 28, 2010, 12:19:12 PM »
Sculptris wants you to make 3D models. Download it and have a go! I'm sure you will enjoy it.

Watch the trailer!

This installment returns to the topic of mixing OpenGL and CUDA C within the same application, first introduced in Part 15 of this series. Part 15 demonstrated how to create 2D images with CUDA C on a pixel-by-pixel basis and display them with OpenGL through the use of PBOs (Pixel Buffer Objects).
This article will complete that discussion by demonstrating how to use VBOs (Vertex Buffer Objects) to create 3D images with CUDA C and render them using OpenGL as 3D collections of points, wire frame images, and surfaces.

Full story at Dr. Dobb's Journal

NVIDIA Geforce SLI profile tool
« on: May 28, 2010, 11:58:19 AM »
After updating my display driver, I no longer see NvApps.xml. Did something change?

Yes. In Release 256 and later display drivers, NVIDIA has made some significant enhancements to the infrastructure of the 3D settings in the NVIDIA Control Panel. These infrastructure changes will make the 3D settings and profiles faster and more robust and provide 3rd party developers with full access and control through a new API. The new infrastructure no longer uses XML to store some of the settings like SLI profiles. Instead, all 3D settings and profiles are fully integrated into the new API with support for versioning, Unicode executable names, and improved access performance. This now includes not just control over DirectX settings, but also OpenGL and CUDA settings. Instead of editing NvApps.xml, we have created a simple tool that enables SLI customers to export their SLI profiles to a text file, edit them, and then import them back into the driver.

Download here

NVIDIA Geforce SSAA tool
« on: May 28, 2010, 11:54:33 AM »
In the launch drivers for GeForce GTX 400 series GPUs, there was a bug in the Transparency Antialiasing implementation that enabled full-screen supersampling. Is there any way to still get full-screen supersampling in Release 256?
Yes. Release 256 drivers do fix the implementation of Transparency Antialiasing (TRAA) and now offer up to 25% performance improvements with TRAA enabled. However, since some of our gaming enthusiasts really liked the full-screen supersampling, we have created a tool for users that allows them to enable 2x, 4x, and 8x full-screen supersampling.
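
As background on what full-screen supersampling does, here is a minimal sketch of the general technique: the scene is rendered at a higher resolution and the extra samples are averaged down to the final pixels. This says nothing about how the NVIDIA tool or driver implements it. The function name `downsample`, the grayscale float samples, and the ordered-grid layout (so that 4x corresponds to `factor=2`, a 2 x 2 block per pixel) are assumptions for illustration.

```python
def downsample(samples, factor):
    """Box-filter a supersampled grayscale image down to display resolution.

    `samples` is a list of rows of floats rendered at `factor` times the
    display resolution in each dimension. Every output pixel is the average
    of one factor x factor block of samples.
    """
    height = len(samples) // factor
    width = len(samples[0]) // factor
    out = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    total += samples[py * factor + sy][px * factor + sx]
            row.append(total / (factor * factor))
        out.append(row)
    return out
```

Because every pixel of the frame is shaded several times, the cost grows with the sample count, which is why this mode is much heavier than transparency antialiasing alone.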

Download here

NVIDIA 197.90 Quadro WHQL available
« on: May 28, 2010, 10:53:31 AM »
Changes in Version 197.90
The following sections list the important changes and the most common issues resolved since driver version 197.59.
Windows Vista/Windows 7 32-bit Issues
Single GPU Issues
- Graphisoft ArchiCAD 14–ACE profile is needed for this application.
- AutoCAD–with Windows Aero turned on, the system freezes after opening or closing the AutoCAD file dialog.
Multi-GPU Issues
- [SLI Mosaic], Quadro Plex D2: Quadro Plex units connected to the DuHIC cannot be
Windows Vista/Windows 7 64-bit Issues
Single GPU Issues
- Quadro FX 3700: DirectModel toolkit–stuttering occurs when rendering display lists.
Multi-GPU Issues
- [SLI], Quadro Plex D2 via DuHIC, G-Sync: When running a frame lock demo and then pausing it, the frame counter jumps to a random value

Fixed Issues–Windows XP 32-bit
- Quadro FX 5500/4600/1700: Blue-screen crash occurs when the “/3GB” switch is added to the Windows boot.ini file.

AMD blog here
Screenshots here
Video here
Direct download here

SM5 GPU is mandatory   :P

The RTM release of the Windows SDK for Windows 7 and .NET Framework 4 is now available for download in either ISO or Web Setup format. Here are a few key features of this Software Development Kit (SDK):

    * Smaller/Faster: at less than 600MB, this SDK is less than half the size of the previous SDK, producing a faster install with a smaller on disk footprint
    * Cleaner Setup: setup screens have been grouped into native, managed, and common buckets to help you more easily choose the components you need
    * VC++ 2010 Compilers (32 & 64-bit): use the new C++ compilers and CRT that also ship in Visual Studio 2010 for improved run-time and design-time performance
    * Microsoft Help System 1.0: the new help system introduced with Visual Studio 2010 that enables you to view documentation either online or offline and selectively choose which documentation to maintain offline
    * .NET Framework 4 Tools and Reference Assemblies: use tools and reference assemblies updated specifically for .NET Framework 4 development
    * MSBuild: support for .NET Framework 4 MSBuild in the SDK Command line for building native and managed applications using new Visual Studio 2010 project files (such as the vcxproj file for C++ applications)

Thanks, works fine with 8800GTX (G80)  :)

ATI Catalyst 10.5 WHQL available
« on: May 26, 2010, 07:13:51 PM »
Download here

ATI Catalyst™ 10.5 Driver – What’s New?

As usual here is the list of optimised applications and games (343) :

hl2.exe EFLC.exe Singularity.exe RTS-*.exe SR2_pc.exe Medieval_TD.exe HasteGame*.exe Napoleon.exe DeadIslandGame*.exe Demigod.exe OFDR.exe empires2.exe Heaven.exe MassEffect2.exe Conviction*.exe RUSE.exe AVP3.exe daorigins.exe LEGOIndy2.exe RocketKnight.exe Bioshock2.exe dairydash.exe BFBC*Game.exe BF1943Game.exe dirt2*.exe SamHD.exe Sanctuary.exe WinDVD.exe iw4?p*.exe Avatar.exe S8Game-F.exe rfg*.exe Borderlands.exe ShippingPC-SkyGame.exe BattlefieldHeroes.exe Ceville.exe gpl.exe ShippingPC-BmGame.exe \ZenoClash\hl2.exe APGame.exe GHWT.exe Gothic III Forsaken Gods.exe ??5DX9.exe Saboteur.exe CrimeCraft.exe BurningWheels*.exe Republic Heroes.exe SupremeCommander2.exe AA3Game.exe StreetFighterIV.exe Guitar Hero Aerosmith.exe KillingFloor.exe RiseOfTheArgonauts.exe DamnGame.exe DS.exe kb.exe SC2*.exe CoJBiBGame_x86.exe Overlord*.exe GameClient.exe ghost_w32.exe Wolverine.exe bsp.exe Fuel.exe bionic_commando.exe ElvenLegacy*.exe grimmgame.exe flashpoint*.exe Shift.exe theHunter*.exe TLR*.exe Wolf2.exe TerminatorSalvation.exe prototype?.exe eXperience112.exe Battleforge.exe EndWar.exe SilentHill.exe cabalmain.exe Client.exe WheelmanGame*.exe CompatAFR-1x1.exe Unigine.exe wanted.exe DragonAge.exe TS3*.exe DLords.exe FreeRunning.exe DOW2.exe godfather2.exe arma2.exe Empire.exe *.scr.EE3.exe MirrorsEdge.exe cstrike.exe Tropics.exe Legendary.exe fear2*.exe BurnoutParadise.exe Prince of Persia.exe Dead Space.exe biahh.exe war3.exe Mercenaries2.exe Merc2-Demo.exe left4dead*.exe Yeti_Final_Win32.exe FallOut3.exe CoDWaW*.exe tru.exe FF2client.exe RCT3.exe trgame.exe PT2Start.exe Transformers*.exe GTAIV.exe acad.exe aJewelQuestSolitaire.exe SeriousSam.exe FarCry2*.exe ProjectG.exe Jewel Quest Solitaire.exe Flip Words*.exe ExeFile.exe blacksite.exe MOHA.exe TurokGame.exe crossfire.exe kaneandlynch.exe Buildalot2.exe Legend.exe thief.exe GunBound.gme.SEGA Rally*.exe HAWX.exe SpaceSiege.exe SporeApp.exe AgeOfConan.exe tra.exe Jericho.exe MEM_7.exe Stranger.exe DevilMayCry*.exe 
MassEffect.exe GRID.exe witcher.exe mahjongg_artifacts.exe Studio.exe Diner_Dash_Flo_On_The_Go.exe Big Kahuna Reef.*.Chuzzle.exe Backspin.exe AcesOfTheGalaxy.exe \half-life 2 Demo\hl2.exe \portal\hl2.exe \team fortress 2\hl2.exe \half-life 2 deathmatch\hl2.exe \half-life 2 episode two\hl2.exe \half-life 2 episode one\hl2.exe \half-life 2 lostcoast\hl2.exe \half-life 2\hl2.exe \counter-strike source\hl2.exe \day of defeat source\hl2.exe \half-life deathmatch source\hl2.exe \half-life source\hl2.exe R6Vegas2_Game.exe AssassinsCreed*.exe Validator.exe GH3.exe xrEngine.exe FFOW.exe Settlers6*.exe MonsterGame.exe nfs.exe BA2.exe DiRT.exe ForceSingleGPU.exe TW2008.exe TW2006.exe game.exe SupremeCommander.exe hl.exe SpiderSolitaire.exe Solitaire.exe PurblePlace.exe Minesweeper.exe Mahjong.exe InkBall.exe Hearts.exe FreeCell.exe chess.exe R6Vegas_Game.exe 3dsmax*.exe Crysis*.exe UT3*.exe Wargame-g4wlive.exe hellgate*.exe SinEpisodes.exe iw3mp.exe iw3sp.exe Matrix.exe *Stranglehold.exe Bioshock.exe wic*.exe LostPlanet*.exe sims.icd.nhl2007.exe GodFather.exe pc_matador.exe AcroRd32.exe XR_3DA.exe Scarface.exe TestDriveUnlimited.exe mm.exe Gothic3.exe HitmanBloodMoney.exe NWN2*.exe TW2007.exe ARX.exe SplinterCell4.exe NFSC_demo.exe NFSC.exe RelicCOH.exe starwars_pc.exe LegoStarWarsII.exe fifa07*.exe fsx.exe primarysurf.exe CoJ.exe FEARXP.exe BF2142*.exe JustCause*.exe nhl06.exe battleofthegods.exe cccprev.exe RomeTW*.exe H5_Game.exe Condemned.exe trl.exe Inventor.exe Dwm.exe legends.exe gt.exe graw*.exe sweaw.exe game.dat.Timeshift*.exe nbalive06.exe oblivion.exe x3*.exe gwdev.exe pop3.exe USM.exe BattlefrontII.exe speedDemo.exe speed.exe narnia.exe white.exe KingKong*.exe 3DMark06*.exe RD3.exe Age3.exe Suffering2*.exe Sam2.exe KingKongDemo*.exe BOS.exe CoD2?P_s.exe Fable.exe EiB.exe Sims2EP2.exe DungeonSiege2.exe fs9.exe AFR-FriendlyD3D.exe FEARspdemo.exe FEAR.exe ACTOFWAR*.exe Ehshell.exe X2-Demo.exe X2.exe tribesv_?pdemo_en.exe Swat4SPDemo.exe PandoraMultiPlayerDemo.exe 
Sims2EP1.exe PCMark05.exe PCMark04.exe PainGame.exe Speed2demo.exe MaxPayne2Demo.exe FFXiWinBench.exe FFXiBench.exe Biademo.exe Tiger 2004.exe Snowblind-Demo.exe Snowblind.exe BreedSPD.exe Breed.exe XPANDRALLY.exe TV_CD_DVD.exe TRAOD*.exe Battlefront.exe Sims2.exe ShadowVault.exe patriots.exe thrones.exe Pariah.exe nba2005.exe mohpa.exe GW.exe pol.exe Driv3r.exe DFX.exe Bia.exe TechDemo.exe aquamark.exe w40k.exe pop2.exe FarCry.exe WoW.exe EverQuest2.exe Speed2.exe Lithtech.exe Swat4.exe SwgClient_r.exe SplinterCell3.exe Painkiller.exe MaxPayne2.exe FlatOutdemo.exe CMR5.exe BfVietnam.exe LockOn.exe 3DMark05.exe 3DMark03.exe 3DMark2001SE.exe 3DMark2001.exe BF2.exe Morrowind.exe TW2005.exe TW2004.exe halo.exe UT2004.exe UT2003.exe RD2D.exe RD2.exe CT3.exe pop.exe RaceDriver.exe SplinterCell2.exe SplinterCell.exe
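
Several entries in the list above use wildcards (`*` and `?`). How the Catalyst driver actually matches them is not documented here, so the following is only a sketch under the assumption that they behave like shell-style globs and that matching is case-insensitive, as Windows file names usually are. The helper name `matching_patterns` is hypothetical; the four patterns are copied from the list.

```python
from fnmatch import fnmatchcase

# A few patterns copied from the profile list above.
PATTERNS = ["hl2.exe", "BFBC*Game.exe", "iw4?p*.exe", "Crysis*.exe"]


def matching_patterns(executable, patterns=PATTERNS):
    """Return every pattern that matches the given executable name.

    Assumes shell-style semantics: '*' matches any run of characters and
    '?' matches exactly one. Both sides are lowercased so the comparison
    is case-insensitive.
    """
    name = executable.lower()
    return [p for p in patterns if fnmatchcase(name, p.lower())]
```

Under these assumptions `BFBC2Game.exe` would fall under `BFBC*Game.exe`, and both `iw4mp.exe` and `iw4sp.exe` would fall under `iw4?p*.exe`.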

and the OpenGL extensions (183) :

GL_EXT_vertex_array GL_EXT_abgr GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_texture3D GL_EXT_bgra GL_EXT_packed_pixels GL_EXT_rescale_normal GL_EXT_separate_specular_color GL_SGIS_texture_edge_clamp GL_EXT_texture_edge_clamp GL_EXT_texture_lod GL_SGIS_texture_lod GL_EXT_draw_range_elements GL_ARB_imaging GL_EXT_histogram GL_ARB_texture_compression GL_ARB_texture_cube_map GL_EXT_texture_cube_map GL_NV_texgen_reflection GL_EXT_texgen_reflection GL_ARB_multisample GL_ARB_multitexture GL_ARB_texture_env_add GL_EXT_texture_env_add GL_ARB_texture_env_combine GL_EXT_texture_env_combine GL_ARB_texture_env_dot3 GL_EXT_texture_env_dot3 GL_ARB_texture_border_clamp GL_ARB_transpose_matrix GL_ATI_texture_float GL_SGIS_generate_mipmap GL_NV_blend_square GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_ARB_depth_texture GL_ARB_shadow GL_ARB_shadow_ambient GL_EXT_fog_coord GL_EXT_multi_draw_arrays GL_SUN_multi_draw_arrays GL_ARB_point_parameters GL_EXT_point_parameters GL_EXT_secondary_color GL_EXT_blend_func_separate GL_EXT_stencil_wrap GL_ARB_texture_env_crossbar GL_EXT_texture_lod_bias GL_ARB_texture_mirrored_repeat GL_IBM_texture_mirrored_repeat GL_EXT_texture_mirror_clamp GL_ARB_window_pos GL_ARB_vertex_buffer_object GL_ARB_occlusion_query GL_EXT_shadow_funcs GL_ARB_shader_objects GL_ARB_vertex_shader GL_ARB_fragment_shader GL_ARB_shading_language_100 GL_ARB_draw_buffers GL_ATI_draw_buffers GL_ARB_texture_non_power_of_two GL_ARB_point_sprite GL_EXT_blend_equation_separate GL_ATI_separate_stencil GL_ARB_pixel_buffer_object GL_EXT_pixel_buffer_object GL_EXT_texture_sRGB GL_ARB_fragment_program GL_ARB_texture_rectangle GL_EXT_texture_rectangle GL_ARB_fragment_program_shadow GL_ARB_vertex_program GL_EXT_texture_compression_s3tc GL_EXT_texture_filter_anisotropic GL_EXT_framebuffer_object GL_EXT_packed_depth_stencil GL_EXT_compiled_vertex_array GL_NV_copy_depth_to_color GL_ARB_texture_snorm GL_EXT_texture_snorm GL_ATI_texture_env_combine3 
GL_ATI_texture_mirror_once GL_ARB_texture_float GL_ATI_texture_compression_3dc GL_KTX_buffer_region GL_ATI_fragment_shader GL_WIN_swap_hint GL_ATI_meminfo WGL_EXT_swap_control GL_ATI_envmap_bumpmap GL_EXT_gpu_program_parameters GL_ARB_framebuffer_sRGB GL_EXT_framebuffer_sRGB GL_EXT_packed_float GL_EXT_texture_shared_exponent GL_EXT_texture_compression_latc GL_ARB_texture_compression_rgtc GL_EXT_texture_compression_rgtc GL_AMD_performance_monitor GL_AMD_vertex_shader_tessellator GL_AMDX_vertex_shader_tessellator GL_AMD_texture_texture4 GL_ARB_texture_gather GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_map_buffer_range GL_NV_half_float GL_NV_float_buffer GL_ARB_color_buffer_float GL_ARB_depth_buffer_float GL_ARB_shader_texture_lod GL_ARB_draw_instanced GL_EXT_draw_instanced GL_ARB_instanced_arrays GL_EXT_texture_swizzle GL_EXT_gpu_shader4 GL_EXT_texture_array GL_NV_explicit_multisample GL_ARB_texture_multisample GL_AMD_shader_stencil_export GL_ARB_blend_func_extended GL_AMD_texture_cube_map_array GL_ARB_texture_cube_map_array GL_ARB_draw_elements_base_vertex GL_ARB_occlusion_query2 GL_EXT_bindable_uniform GL_EXT_transform_feedback GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_vertex_array_object GL_EXT_vertex_array_bgra GL_ARB_vertex_array_bgra GL_NV_conditional_render GL_EXT_draw_buffers2 GL_ARB_framebuffer_object GL_EXT_texture_integer GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_buffer_object GL_EXT_texture_buffer_object GL_ARB_copy_buffer GL_EXT_copy_buffer GL_NV_fragment_program2 GL_NV_vertex_program3 GL_ARB_draw_buffers_blend GL_AMD_draw_buffers_blend GL_ARB_geometry_shader4 GL_EXT_geometry_shader4 GL_EXT_shader_atomic_counters GL_NV_primitive_restart GL_ARB_provoking_vertex GL_EXT_provoking_vertex GL_ARB_compatibility GL_ARB_uniform_buffer_object GL_EXT_texture_compression_bptc GL_ARB_seamless_cube_map GL_AMD_seamless_cubemap_per_texture GL_ARB_depth_clamp 
GL_ARB_texture_query_lod GL_ARB_sample_shading GL_ARB_shader_subroutine GL_ARB_gpu_shader5 GL_ARB_draw_indirect GL_ARB_fragment_coord_conventions GL_ARB_shader_bit_encoding GL_ARB_vertex_type_2_10_10_10_rev GL_AMDX_debug_output GL_ARB_timer_query GL_EXT_timer_query GL_AMD_name_gen_delete GL_ARB_sync GL_EXT_texture_buffer_object_rgb32 GL_ARB_sampler_objects GL_ARB_explicit_attrib_location GL_AMD_conservative_depth GL_ARB_tessellation_shader GL_ARB_gpu_shader_fp64 GL_EXT_vertex_attrib_64bit
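
When checking a space-separated extension list like the one above for a particular extension, a plain substring search can give false positives because some names are prefixes of others. A small sketch, using an excerpt of the list (the helper name `has_extension` is made up for the example):

```python
# Excerpt from the extension list above.
EXCERPT = (
    "GL_EXT_transform_feedback GL_ARB_transform_feedback2 "
    "GL_ARB_transform_feedback3 GL_ARB_vertex_array_object"
)


def has_extension(extension_string, name):
    """Return True only if `name` appears as a whole token in the list."""
    return name in extension_string.split()


# Whole-token check: "GL_ARB_transform_feedback" is not in the excerpt,
# even though it is a substring of "GL_ARB_transform_feedback2".
print(has_extension(EXCERPT, "GL_ARB_transform_feedback2"))
print(has_extension(EXCERPT, "GL_ARB_transform_feedback"))
print("GL_ARB_transform_feedback" in EXCERPT)
```

The same pitfall applies to pairs such as GL_ARB_shadow and GL_ARB_shadow_ambient in the full list.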

Tech Soft 3D announced it has signed an agreement with Adobe to take over development and support of Adobe’s 3D CAD translation suite and PDF publishing SDK.

Full story at Fireuser

SpeedTree Hand Drawing + Physics
« on: May 25, 2010, 07:56:51 PM »
SpeedTreeMiddleware, May 25, 2010
A Live Oak tree is hand drawn on-the-fly directly in the SpeedTree 5.1 Modeler.
After the tree is created in under 1 minute, physical interaction is provided instantly via the APEX Vegetation Module, a division of PhysX™ by NVIDIA.
More examples at

NVIDIA GeForce GTX 480M introduced
« on: May 25, 2010, 07:11:48 PM »
Today NVIDIA introduced the new GeForce GTX 480M GPU for Notebooks.

And it costs only 823€  ::)

