Geeks3D Forums
Topic: A realtime Mandelbrot zoomer in SSE assembly and CUDA
JeGX
« on: December 16, 2009, 04:22:46 PM »

Link: http://users.softlab.ece.ntua.gr/~ttsiod/mandelSSE.html

Last weekend, I got to play with an NVIDIA GT240 (around $100). Having read a lot of blogs about GPU programming, I downloaded the CUDA SDK and started reading some of the samples.

Quote
In less than one hour, I went from my rather complex SSE inline assembly to a simple, clear Mandelbrot implementation... that ran... 15 times faster!

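Let me say this again: roughly 1500% of the original speed. Jaw dropping. Or put a different way: I went from 147fps at 320x240... to 210fps... at 1024x768! (That is 10.24 times the pixels per frame at about 1.43 times the frame rate, so roughly 14.6 times the pixels per second.)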

I only have one comment for my fellow developers: it is clear that I was lucky. The algorithm in question was a perfect fit for a CUDA implementation. You won't always get this kind of speedup (while at the same time ending up with clearer and significantly less code).

What I am saying is that you should start looking into these things: CUDA, OpenCL, etc.

Code:
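__global__ void CoreLoop( int *p,
  float xld, float yld, /* Left-Down coordinates */
  float xru, float yru, /* Right-Up coordinates */
  int MAXX, int MAXY)   /* Window size */
{
    // ITERA (iteration limit) and lookup[] (palette) are used as in the
    // original post; their definitions are not shown here.
    float re, im, rez, imz;
    float t1, t2, o1, o2;
    int k;
    unsigned result = 0;
    unsigned idx = blockIdx.x*blockDim.x + threadIdx.x;

    // Skip surplus threads if the grid covers more than MAXX*MAXY pixels.
    if (idx >= (unsigned)(MAXX*MAXY))
        return;

    // One thread per pixel: recover (x, y) from the linear index.
    int y = idx / MAXX;
    int x = idx % MAXX;

    // Map the pixel to the point c = re + i*im in the complex plane.
    re = xld + (xru-xld)*x/MAXX;
    im = yld + (yru-yld)*y/MAXY;

    // Iterate z = z*z + c from z = 0 until |z|^2 exceeds 4.
    rez = 0.0f;
    imz = 0.0f;
    k = 0;
    while (k < ITERA)
    {
        o1 = rez * rez;
        o2 = imz * imz;
        t2 = 2.0f * rez * imz;
        t1 = o1 - o2;
        rez = t1 + re;
        imz = t2 + im;
        if (o1 + o2 > 4.0f)
        {
            result = k;
            break;
        }
        k++;
    }
    // result stays 0 for points that never escape.
    p[y*MAXX + x] = lookup[result]; // Palettized lookup
}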
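
To sanity-check the kernel's output, a plain C reference of the same per-pixel loop can be handy. The sketch below is not from the original post: the names escape_iteration, render_reference and blocks_for are hypothetical, max_iter stands in for ITERA (whose value is not shown), and the raw iteration count is stored instead of going through the lookup[] palette. blocks_for is the usual ceiling division for choosing how many blocks cover one thread per pixel.

```c
/* Same recurrence as the kernel: returns the iteration at which |z|^2
   is first seen to exceed 4, or 0 if the point never escapes within
   max_iter iterations. max_iter is a stand-in for the post's ITERA. */
static unsigned escape_iteration(float re, float im, int max_iter)
{
    float rez = 0.0f, imz = 0.0f;
    for (int k = 0; k < max_iter; k++) {
        float o1 = rez * rez;
        float o2 = imz * imz;
        float t2 = 2.0f * rez * imz;
        rez = (o1 - o2) + re;
        imz = t2 + im;
        /* Like the kernel, test the squared magnitude of the previous z. */
        if (o1 + o2 > 4.0f)
            return (unsigned)k;
    }
    return 0;
}

/* Fills out[y*width + x] with the raw escape iteration of each pixel,
   using the kernel's index-to-pixel and pixel-to-plane mapping. */
static void render_reference(unsigned *out,
                             float xld, float yld, float xru, float yru,
                             int width, int height, int max_iter)
{
    for (int idx = 0; idx < width * height; idx++) {
        int y = idx / width;
        int x = idx % width;
        float re = xld + (xru - xld) * x / width;
        float im = yld + (yru - yld) * y / height;
        out[idx] = escape_iteration(re, im, max_iter);
    }
}

/* Number of blocks so that blocks * threads_per_block >= n. */
static unsigned blocks_for(unsigned n, unsigned threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}
```

For a 1024x768 window with 256 threads per block (an assumed block size), blocks_for(1024 * 768, 256) gives 3072 blocks. Comparing render_reference against the iteration counts computed on the GPU is one simple way to catch indexing mistakes.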