Pawel Artymowicz lab: GPU supercomputing & CUDA: GPU >> CPU

in early 2007, nVidia opened up the gates to a paradise. a free, if not entirely open-source, project called CUDA made its debut. it's a platform for general-purpose computing on graphics devices, utilizing the massively parallel architecture of today's GPUs, or graphics processing units. in effect, all the recent nvidia cards became capable of carrying out parallel computational tasks. their raw power now typically exceeds that of a CPU by a factor of ~10^2.

CUDA is best explained in this wiki page.
It is nicely illustrated using real-life applications in the
nVidia CUDA Zone. Typical speedups w.r.t. a cpu are 5-100x.

mythbusters were hired to illustrate the power of parallel processing and constructed a cute, monstrous parallel paint gun to paint mona lisa in less than 80 ms. it's fun to watch the monster and its 1024 paintballs flying in slow-mo to meet their final destination: the canvas.

without much exaggeration one can say that gpus could only be ignored for so long - as soon as there's a solution that speeds up your program 100 times, you have no choice but to change to that track, no matter how comfortable your old one was.

for me, the old path was clusters and MPI. I started 10 yrs ago with a cluster of sun ultra5 workstations, and later continued with a cluster of custom built pc's in rack mounts.
hydra cluster, 5 GFLOPs
sample application of hydra
ANTARES cluster, 61-144 GFLOPs

MPI is a language (more precisely, a protocol and the libraries that implement it) for the exchange of data between nodes of a cluster. physical exchange was facilitated by commodity gigabit ethernet switches that became affordable about that time.

clusters were great, and most of today's supercomputers are essentially built like that: farms of dozens to tens of thousands of machines hooked up by relatively slow interconnects. distributed memory and distributed processing power. which is ok for some problems, like hydrodynamics w/o radiation transfer or self-gravity in astrophysics, or frame-by-frame movie rendering and postproduction in a studio.
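why do problems like plain hydrodynamics tolerate slow interconnects? because each node only needs the edge cells of its neighbours. here is a minimal sketch of that idea, not real MPI: plain python lists stand in for cluster nodes, and a 1D diffusion step stands in for the hydro solver. all function names here are made up for illustration.

```python
# sketch only: python lists play the role of cluster nodes; no MPI is involved.

def diffuse_step(u, left, right, alpha=0.25):
    """one explicit diffusion step on a 1D chunk, given one ghost cell per side."""
    padded = [left] + u + [right]
    return [padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

def step_single(u, alpha=0.25):
    """reference: the whole grid on one machine, with fixed zero boundaries."""
    return diffuse_step(u, 0.0, 0.0, alpha)

def step_distributed(chunks, alpha=0.25):
    """each 'node' owns one chunk and receives only its neighbours' edge cells."""
    new_chunks = []
    for rank, chunk in enumerate(chunks):
        # the only data that would cross the interconnect: two numbers per node
        left = chunks[rank - 1][-1] if rank > 0 else 0.0
        right = chunks[rank + 1][0] if rank < len(chunks) - 1 else 0.0
        new_chunks.append(diffuse_step(chunk, left, right, alpha))
    return new_chunks

def split(u, n_nodes):
    """cut the grid into equal chunks (assumes len(u) is divisible by n_nodes)."""
    size = len(u) // n_nodes
    return [u[i * size:(i + 1) * size] for i in range(n_nodes)]

grid = [0.0] * 16
grid[8] = 1.0                 # a single spike to spread out
chunks = split(grid, 4)       # 4 pretend nodes, 4 cells each
for _ in range(10):
    grid = step_single(grid)
    chunks = step_distributed(chunks)
flat = [x for chunk in chunks for x in chunk]
```

the distributed run reproduces the single-machine run while exchanging just two values per node per step. self-gravity or radiation transfer breaks this picture, since every cell then needs to hear from far-away cells.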

so clusters were great, but not trouble-free. you had to wait (hours or days, depending on how ambitious your computation was!) for the requested number of processors on some big national supercomputer to be allocated to your simulation. or you could decide to build your own little cluster, if you had the money, space and time for it. that made more sense to many, and could cost your grant agency 'only' $20k or so, unless you really needed an smp (shared-memory) machine; then you had to multiply the cost 5-10 times.

you and your associates could only run a system of a few dozen nodes at best. beyond that magical number, frequent individual component breakdowns, software upgrades, and so on, needed to be taken care of by a professional sysadmin or technician (whom you could not afford; so you used your nodes praying they wouldn't fail, and did not repair those that eventually did).

on a small cluster, long-term scientific simulations could in practice be done in 2D but rarely in 3D, unless you were very lucky with your problem and/or very patient.

* * *

let's skip to 2007 then. why is a gpu a hundred times more powerful than a cpu?
today, both are capable of parallel computation, since they have a multi-core structure.
each gpu core is far less advanced on the control/vectorization side, and a bit slower.
but your ~$350 4-core intel processor is no match for the 216-240 cores of a ~$350 nvidia gpu, on the newest G200 card series (GeForce gtx260, gtx280, and from january 2009 also a 2-gpu card, the gtx295, with 480 cores). ASSUMING YOU CAN harness the combined power of those gpu cores...
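harnessing those cores mostly means cutting the work into thousands of tiny independent threads, one per data element. below is a rough sketch of that model, emulated serially in python rather than written in real CUDA. the arguments block_idx, block_dim and thread_idx mirror CUDA's blockIdx.x, blockDim.x and threadIdx.x; the launch() helper and the block size of 256 are assumptions made for illustration.

```python
# sketch only: a serial python emulation of the one-thread-per-element model.

def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """what a single gpu thread would do: compute exactly one output element."""
    i = block_idx * block_dim + thread_idx   # global index of this thread
    if i < n:                                # guard: the last block may overhang
        out[i] = a * x[i] + y[i]

def launch(kernel, n_blocks, block_dim, *args):
    """hypothetical launcher: a gpu runs these threads concurrently, we just loop."""
    for block_idx in range(n_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

block_dim = 256                               # assumed threads per block
n_blocks = (n + block_dim - 1) // block_dim   # ceiling division, covers all n
launch(saxpy_kernel, n_blocks, block_dim, n, 2.0, x, y, out)
```

no thread depends on any other, which is what lets hundreds of simple cores work at once. problems that cannot be cut up this way will not see the big speedups.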

so here's the challenge: to build and program a massively parallel system (with hundreds of computing nodes or cores) that is a bit more environmentally friendly than the old clusters: much less noise, much less total electrical power used, and finally much, much more bang for the buck.
and that means: a supercomputer in a single computer case, performing thousands of GFLOPs!
perhaps a thousand times the number of operations you could perform 10 yrs ago.
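a quick back-of-the-envelope check of those ratios, using only the figures quoted in this post (hydra at 5 GFLOPs, 4 cpu cores vs 240 gpu cores). the target values in the loop are assumptions standing in for "thousands of GFLOPs", not measurements.

```python
# sketch only: figures are the ones quoted in the post; targets are assumed.

hydra_gflops = 5.0              # the old hydra cluster
cpu_cores, gpu_cores = 4, 240   # ~$350 cpu vs ~$350 gpu

def speedup(target_gflops, baseline_gflops):
    """how many times faster the target is than the baseline."""
    return target_gflops / baseline_gflops

core_ratio = gpu_cores / cpu_cores        # cores per chip at the same price
needed_for_1000x = 1000 * hydra_gflops    # GFLOPs needed to be 1000x hydra

for target in (1000.0, 2000.0, 5000.0):   # assumed single-case targets
    print(f"{target:.0f} GFLOPs = {speedup(target, hydra_gflops):.0f}x hydra")
```

so "a thousand times" hydra means about 5000 GFLOPs in one case. the core count alone gives a factor of 60 at equal price; it says nothing yet about how fast each core is.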

Sunday, December 21, 2008