Nvidia originally scheduled Fermi to launch in November 2009, but the launch was delayed until CES in January 2010 due to defects, according to market rumors. However, the company recently notified graphics card makers that the official launch will now be in March 2010, the sources noted.
There is an exception to this: high-power graphics cards. We love these. They make games sexy, and that makes us sexy. At the heart of these is the GPU, and when Nvidia announces it has a new and wonderful one, it is time to take notice. It's codenamed Fermi, after the renowned nuclear physicist Enrico Fermi.
The silicon has been designed from the ground up to match the latest concepts in parallel computing. The basic feature list reads thus: 512 CUDA Cores, Parallel DataCache, Nvidia GigaThread and ECC support.
Clear? There are three billion transistors for starters, compared to 1.4 billion in a GT200 and a mere 681 million on a G80. There's shared, configurable L1 and L2 cache and support for up to 6GB of GDDR5 memory.
The block diagram of Fermi looks like the floor plan of a dystopian holiday camp. Sixteen rectangles, each with 32 smaller ones inside, all nice and regimented in neat rows. That's your 16 SM (Streaming Multiprocessor) blocks with 512 little execution units, called CUDA cores, spread among them.
Each SM has local memory, register files, load/store units and a thread scheduler to run its 32 associated cores. Each of these can run a floating point or an integer instruction every clock. It can also run double precision floating point operations at half that rate, which will please the maths department.
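To put that hierarchy in programming terms, here is a minimal CUDA sketch (my own illustration, not Nvidia sample code): each block of threads is scheduled onto one SM, and every thread computes its own global index from blockIdx, blockDim and threadIdx.

    #include <cstdio>
    #include <cuda_runtime.h>

    /* One thread per array element: blockIdx/blockDim/threadIdx give every
       thread a unique global index. Each 256-thread block runs on a single
       SM, and Fermi's 16 SMs work through the grid of blocks in parallel. */
    __global__ void scaleAdd(float *out, const float *a, const float *b,
                             float s, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = s * a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *ha = new float[n], *hb = new float[n], *hout = new float[n];
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dout;
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dout, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                          /* threads per block  */
        int blocks  = (n + threads - 1) / threads;  /* blocks in the grid */
        scaleAdd<<<blocks, threads>>>(dout, da, db, 3.0f, n);
        cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);

        printf("out[0] = %f\n", hout[0]);           /* expect 5.000000 */

        cudaFree(da); cudaFree(db); cudaFree(dout);
        delete[] ha; delete[] hb; delete[] hout;
        return 0;
    }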
In all honesty, we can completely understand the rampant issue of piracy, and there is a worrisome number that I won't stop repeating. Crytek is a good example of a company that wanted to stay PC-only but simply could not build a business model due to the multi-million dollar damage caused by prospective buyers opting for pirated copies of the game. When Epic Games released their Unreal Tournament III game, they recorded 40 million different installations trying to access the online servers for multiplayer action.
In short, if those 40 million people had gone on and purchased the game instead of downloading it from The Pirate Bay and similar sites, Epic Games would have earned approximately two billion dollars [40M times the $49.95 recommended price, with some copies at $39.95 in the US and some at Euro 49.99, about $62.44 at the time, in EU lands]. Now, imagine what Epic would be able to do with an influx in excess of a billion dollars. Would Unreal Engine 4 need 4-6 years to develop on a limited budget, or could Tim hire as many people as he needs and deliver an engine perfectly optimized for the whole spectrum of PC hardware?
OpenGL 3.2 and GLSL 1.5 are available, but there is a lack of both simple and complex example programs. On this webpage, I want to fill this gap by providing example programs using OpenGL 3.2 and GLSL 1.5 with GLEW. Please note that none of the example programs use any deprecated OpenGL functions.
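As a rough starting point, here is the kind of setup such examples begin with, assuming freeglut for window and context creation (this is my own sketch, not one of the programs from that page): request an OpenGL 3.2 core profile context and let GLEW confirm it before any GLSL 1.5 shaders are loaded.

    #include <GL/glew.h>
    #include <GL/freeglut.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitContextVersion(3, 2);               /* ask for OpenGL 3.2      */
        glutInitContextProfile(GLUT_CORE_PROFILE);  /* no deprecated functions */
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH);
        glutCreateWindow("OpenGL 3.2 / GLSL 1.5 check");

        glewExperimental = GL_TRUE;                 /* needed for core profiles */
        if (glewInit() != GLEW_OK || !GLEW_VERSION_3_2) {
            std::fprintf(stderr, "OpenGL 3.2 is not available\n");
            return 1;
        }
        std::printf("GL version:   %s\n", (const char *)glGetString(GL_VERSION));
        std::printf("GLSL version: %s\n",
                    (const char *)glGetString(GL_SHADING_LANGUAGE_VERSION));
        return 0;
    }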
The NVIDIA Texture Tools is a collection of image processing and texture manipulation tools, designed to be integrated in game tools and asset conditioning pipelines.
The primary features of the library are mipmap and normal map generation, format conversion and DXT compression.
DXT compression is based on Simon Brown's squish library. The library also contains an alternative GPU-accelerated compressor that uses CUDA and is one order of magnitude faster.
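To give a feel for why DXT compression maps so well onto a GPU, here is a naive CUDA sketch of DXT1 block compression (my own illustration; it is not how squish or the nvtt compressor actually pick endpoints): every 4x4 texel block is independent, so one thread can own one output block.

    #include <cstdint>
    #include <cuda_runtime.h>

    __device__ uint16_t toRGB565(int r, int g, int b)
    {
        return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }

    /* Naive DXT1: every 4x4 texel block becomes one 64-bit block holding two
       RGB565 endpoint colours and sixteen 2-bit palette indices. One thread
       owns one output block; 'image' is tightly packed RGBA8 with width and
       height that are multiples of 4. Output layout assumes little-endian. */
    __global__ void compressDXT1(const uchar4 *image, uint64_t *blocks,
                                 int width, int height)
    {
        int bx = blockIdx.x * blockDim.x + threadIdx.x;   /* block column */
        int by = blockIdx.y * blockDim.y + threadIdx.y;   /* block row    */
        int blocksWide = width / 4;
        if (bx >= blocksWide || by >= height / 4)
            return;

        /* Bounding box of the 16 texels; its corners become the endpoints. */
        uchar4 t[16];
        int minR = 255, minG = 255, minB = 255, maxR = 0, maxG = 0, maxB = 0;
        for (int i = 0; i < 16; ++i) {
            int x = bx * 4 + (i % 4);
            int y = by * 4 + (i / 4);
            t[i] = image[y * width + x];
            minR = min(minR, (int)t[i].x);  maxR = max(maxR, (int)t[i].x);
            minG = min(minG, (int)t[i].y);  maxG = max(maxG, (int)t[i].y);
            minB = min(minB, (int)t[i].z);  maxB = max(maxB, (int)t[i].z);
        }
        uint16_t c0 = toRGB565(maxR, maxG, maxB);   /* c0 >= c1 keeps the  */
        uint16_t c1 = toRGB565(minR, minG, minB);   /* 4-colour block mode */

        /* Palette: the two endpoints plus the two 1/3 interpolants
           (computed from the unquantised 8-bit values for simplicity). */
        int pal[4][3] = {
            { maxR, maxG, maxB },
            { minR, minG, minB },
            { (2*maxR + minR) / 3, (2*maxG + minG) / 3, (2*maxB + minB) / 3 },
            { (maxR + 2*minR) / 3, (maxG + 2*minG) / 3, (maxB + 2*minB) / 3 },
        };

        /* Nearest palette entry (squared RGB distance) for each texel. */
        uint32_t indices = 0;
        for (int i = 0; i < 16; ++i) {
            int best = 0, bestDist = 1 << 30;
            for (int p = 0; p < 4; ++p) {
                int dr = t[i].x - pal[p][0];
                int dg = t[i].y - pal[p][1];
                int db = t[i].z - pal[p][2];
                int d  = dr*dr + dg*dg + db*db;
                if (d < bestDist) { bestDist = d; best = p; }
            }
            indices |= (uint32_t)best << (2 * i);
        }
        if (c0 == c1)     /* solid block: avoid the 3-colour/transparent mode */
            indices = 0;

        blocks[by * blocksWide + bx] =
            (uint64_t)c0 | ((uint64_t)c1 << 16) | ((uint64_t)indices << 32);
    }

Because each block needs only its own 16 texels and writes a single 64-bit result, thousands of such threads can run at once, which is where the order-of-magnitude speedup of the CUDA compressor comes from.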
We have two OpenGL based graphics systems in Qt. One for OpenGL 1.x, which is primarily implemented using the fixed functionality pipeline in combination with a few ARB fragment programs. It was written for desktops back in the Qt 4.0 days (2004-2005) and has grown quite a bit since. You can enable it by writing -graphicssystem opengl1 on the command line. It is currently in life-support mode, which means that we will fix critical things like crashes, but otherwise leave it be. It is not a focus for performance from our side, though it does perform quite nicely for many scenarios.
Our primary focus is the OpenGL/ES 2.0 graphics system, which is written to run on modern graphics hardware. It does not use a fixed functionality pipeline, only vertex shaders and fragment shaders. Since Qt 4.6, this is the default paint engine used for QGLWidget. Only when the required feature set is not available will we fall back to using the 1.x engine instead. When we refer to our OpenGL paint engine, it's the 2.0 engine we're talking about.
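As a rough illustration of what that means in code (my own sketch, assuming Qt 4.6 with the QtOpenGL module), painting into a QGLWidget with QPainter goes through the OpenGL paint engine, and the text above says the legacy 1.x system can still be forced from the command line:

    #include <QApplication>
    #include <QGLWidget>
    #include <QPainter>
    #include <QPaintEvent>

    /* A QGLWidget whose QPainter calls are handled by Qt's OpenGL paint
       engine (the 2.0 engine on capable hardware, as of Qt 4.6). */
    class GLCanvas : public QGLWidget
    {
    protected:
        void paintEvent(QPaintEvent *)
        {
            QPainter p(this);
            p.fillRect(rect(), Qt::darkBlue);
            p.setPen(Qt::white);
            p.drawText(20, 30, "Painted through the GL paint engine");
        }
    };

    int main(int argc, char **argv)
    {
        /* Per the text above, "-graphicssystem opengl1" on the command
           line forces the legacy OpenGL 1.x system instead. */
        QApplication app(argc, argv);
        GLCanvas canvas;
        canvas.resize(320, 240);
        canvas.show();
        return app.exec();
    }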
In less than one hour, I went from my rather complex SSE inline assembly to a simple, clear Mandelbrot implementation... that ran... 15 times faster!
Let me say this again: 1500% faster. Jaw dropping. Or put a different way: I went from 147fps at 320x240... to 210fps... at 1024x768!
I only have one comment for my fellow developers: it is clear that I was lucky - the algorithm in question was perfect for a CUDA implementation. You won't always get this kind of speedup (while at the same time doing it with clearer and significantly less code).
But what I am saying is that you must start looking into these things: CUDA, OpenCL, etc.
#define ITERA 256                        /* max iterations; value chosen for illustration */
__constant__ int lookup[ITERA];          /* palette table, filled by the host */

__global__ void CoreLoop( int *p,
    float xld, float yld, /* Left-Down coordinates */
    float xru, float yru, /* Right-Up coordinates */
    int MAXX, int MAXY)   /* Window size */
{
    float t1, t2, o1, o2;
    float re, im, rez, imz;
    unsigned k, result = 0;

    unsigned idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx >= (unsigned)(MAXX*MAXY))
        return;                           /* guard for a partial last block */
    int y = idx / MAXX;
    int x = idx % MAXX;

    re = (float) xld + (xru-xld)*x/MAXX;
    im = (float) yld + (yru-yld)*y/MAXY;
    rez = 0.0f;
    imz = 0.0f;
    k = 0;
    while (k < ITERA) {
        o1 = rez * rez;
        o2 = imz * imz;
        t2 = 2 * rez * imz;
        t1 = o1 - o2;
        rez = t1 + re;
        imz = t2 + im;
        if (o1 + o2 > 4) {                /* escaped: remember the iteration */
            result = k;
            break;
        }
        k++;
    }
    p[y*MAXX + x] = lookup[result]; // Palettized lookup
}
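The kernel above only covers the device side. A hedged sketch of the matching host-side launch might look like this (the buffer names, palette, fractal window coordinates and 256-thread block size are my own choices, not necessarily the author's): one thread per pixel, MAXX*MAXY threads split into 256-thread blocks.

    #include <cstdlib>
    #include <cuda_runtime.h>

    int main()
    {
        const int MAXX = 1024, MAXY = 768;
        int *h_pixels = (int *)malloc(MAXX * MAXY * sizeof(int));
        int *d_pixels = 0;
        cudaMalloc((void **)&d_pixels, MAXX * MAXY * sizeof(int));

        int palette[ITERA];                             /* fill the device-side   */
        for (int i = 0; i < ITERA; ++i)                 /* lookup table with a    */
            palette[i] = i * 0x010101;                  /* simple grey ramp       */
        cudaMemcpyToSymbol(lookup, palette, sizeof(palette));

        int threads = 256;                              /* threads per block      */
        int blocks  = (MAXX * MAXY + threads - 1) / threads;
        CoreLoop<<<blocks, threads>>>(d_pixels,
                                      -2.0f, -1.2f,     /* left-down  (xld, yld)  */
                                       1.0f,  1.2f,     /* right-up   (xru, yru)  */
                                      MAXX, MAXY);
        cudaMemcpy(h_pixels, d_pixels, MAXX * MAXY * sizeof(int),
                   cudaMemcpyDeviceToHost);             /* bring the frame back   */

        /* ... hand h_pixels to whatever blits the frame to the screen ... */
        cudaFree(d_pixels);
        free(h_pixels);
        return 0;
    }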
The Unreal Engine, created by Epic Games, contains a breathtaking 2.5m lines of code. As Tim Sweeney, technical director, puts it: "That's roughly comparable to the complexity of a whole operating system a decade ago."
"Game development is at the cutting edge in many disciplines," says Sweeney. "The physics in modern games includes rigid body dynamics and fluid simulation algorithms that are more advanced than the approaches described in research papers."
The unofficial, unconfirmed but quite real Radeon HD 4860 made by Sapphire has debuted in the US and is currently up for grabs for $130. This model is powered by the 55nm RV790 GPU and has 640 Stream Processors, a 256-bit memory interface, a dual-slot cooler, CrossFireX support, plus DVI, HDMI and DisplayPort outputs.
Kaspersky uses an NVIDIA Tesla GPU to detect new viruses and achieves a 360-fold performance increase over a common CPU.
Since the real performance of NVIDIA's next-generation Fermi-based GPUs has yet to be seen, in this article we're comparing the HD 5970 with NVIDIA's current fastest dual-GPU graphics card, the GeForce GTX 295, and checking whether the HD 5970 is capable of challenging a Radeon HD 5870 CrossFire setup. Of course we'll also explore its temperature, power consumption and overclocking potential.
Recently via email we were asked to run a comparison of the different anti-aliasing and image rendering options between the ATI/AMD and NVIDIA Linux drivers and hardware. Well, we have now run a few quantitative and qualitative tests at different anti-aliasing levels under Linux. For those that want to run the tests themselves with their own drivers and hardware, we also have provided instructions on how you can easily do so using the Phoronix Test Suite 2.4 "Lenvik" development build -- it is irresistibly easy.
AMD is now working away on the first low-end desktop graphics cards to feature DirectX 11 support, those powered by the Redwood and Cedar 40nm GPUs.