Sunday, December 21, 2008

so far so good

the ZMachine is a hardware success! (excuse my enthusiasm - this is the first computer I've built completely from scratch.) in a midtower with an ATX-format 790i motherboard I have 3 powerful, overclocked GTX 280 GPUs (680 MHz core clock, up from the 602 MHz stock setting). together they pack 720 cores waiting to be used in parallel. the quad-core CPU keeps the GPUs fed and isn't choking on the workload.
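
(for the curious, this is roughly how CUDA sees those cards - a minimal sketch of my own, not part of my test code, using the standard cudaGetDeviceCount / cudaGetDeviceProperties calls; on a GTX 280 each multiprocessor holds 8 scalar cores, so 30 multiprocessors per card means 240 cores, 720 in total:)

  // list_gpus.cu - sketch: enumerate the CUDA devices and their multiprocessors
  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      int n = 0;
      cudaGetDeviceCount(&n);
      printf("found %d CUDA device(s)\n", n);
      for (int i = 0; i < n; ++i) {
          cudaDeviceProp p;
          cudaGetDeviceProperties(&p, i);
          // clockRate is reported in kHz (and it's the shader clock, not the core clock)
          printf("device %d: %s, %d multiprocessors, %.0f MHz\n",
                 i, p.name, p.multiProcessorCount, p.clockRate / 1000.0);
      }
      return 0;
  }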

hydraulic assembly was fun, and it turns out you CAN achieve zero leaks ;-) the Zalman is a bit cramped inside, but after several hours you figure out how to route all those power connectors and tubes. it's really a high-end box made of 4-5 mm aluminum plates and doors, and it looks and sounds good.

Zalman's 3 l/min pump (that's the maximum rating; in practice it's about half that value with all those waterblocks in the loop) is doing fine. in fact, when run on the automatic setting, the cooling system never goes beyond the minimum fan speed (1000 rpm) and flow rate (~1 l/min), since the CPU and GPUs aren't able to heat the coolant beyond about 46 °C (I think the Zalman controller alarms and does strange things like shutting the system off above 60 °C). this is one of the reasons the system is very quiet when it isn't running CUDA. maximum load, producing up to ~1000 W of heat, does push my northbridge fan to audible levels, since the north- and southbridge aren't watercooled and easily heat up above 80 °C.
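
(a quick back-of-the-envelope check of my own, not a measurement: at a flow of ~1 l/min, dumping ~1000 W into the water raises its temperature per pass by roughly

  dT ~ P / (rho * Q * c_p) = 1000 W / (1 kg/l * 1/60 l/s * 4186 J/(kg K)) ~ 14 K,

so coolant in the mid-40s °C under full load is about what you'd expect if the radiator returns it in the low 30s.)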

the new X58 chipset (for the i7 Nehalem processors, socket LGA1366) will not have that problem. nevertheless, I'm waiting for updated waterblock mounts (some are available already) and a larger number of PCIe lanes on those new platforms. PCIe bandwidth seems to be a big problem with X58/i7: as far as I know, they don't yet offer a 3 x16 PCIe configuration like my EVGA mobo does (which only degrades one of the three slots to the 1.0 standard, equivalent in bandwidth to an x8 slot).
the X58 manufacturers are apparently under no pressure to widen PCIe, since the SLI/CrossFire bridge takes over from the PCI Express bus the duty of gluing the cards together. but why would they go back and reduce the PCIe throughput compared to the 790i? beats me, unless it is a limitation of the QPI section of the CPU right now.
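
(if you want to see what your own slots actually deliver, the SDK's bandwidthTest example does the job; below is a stripped-down sketch of the same idea - standard runtime calls only, buffer size picked arbitrarily - timing a pinned host-to-device copy on whichever device is current:)

  // pcie_bw.cu - sketch: rough host-to-device PCIe bandwidth on the current device
  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      const size_t bytes = 64 << 20;        // 64 MB test buffer (arbitrary size)
      const int reps = 10;
      void *h = 0, *d = 0;
      cudaMallocHost(&h, bytes);            // pinned host memory, needed for full PCIe speed
      cudaMalloc(&d, bytes);

      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);
      cudaEventRecord(start, 0);
      for (int i = 0; i < reps; ++i)
          cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
      cudaEventRecord(stop, 0);
      cudaEventSynchronize(stop);

      float ms = 0.0f;
      cudaEventElapsedTime(&ms, start, stop);
      printf("host->device: %.2f GB/s\n", reps * bytes / (ms * 1.0e6));
      return 0;
  }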

you may notice that I did not mention SLI. unfortunately, SLI and CUDA aren't friends yet :-( but somebody wanting to use 3-way SLI could do it in the Zalman LQ1000 box. despite occasional thermal crash problems I'm rather happy with the system tests so far. those problems arise only when the machine is overloaded with computations (multiple large simulations per GPU), and are likely due to a very busy north/southbridge on the 790i motherboard (this should not be a factor on the newer i7/socket LGA1366 boards, as mentioned - you won't have a fan on the northbridge there).

the ZMachine is certainly pushing the limits, trying to be small, powerful and quiet at the same time. it seems that it will handle scientific simulations very well, as they tend to be similar to the simple fluid and particle codes included as examples in the SDK. by the way, those examples are really useful. if I wanted to run the kind of test I described on my clusters I'd need dozens to hundreds of CPUs. I can't give you an estimate of the speedup yet; I only know that running the SDK's examples in emulated mode, where you compile with flags forcing execution on the CPU, is not a fair comparison, because the programs would in reality be run very differently on CPUs.
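
(to make the "flags forcing execution on the CPU" part concrete: with the current toolkit the same .cu file can be built normally for the GPU, or with nvcc's -deviceemu flag, which defines the __DEVICE_EMULATION__ macro and simulates each GPU thread with a host thread. a trivial sketch of my own, not one of the SDK codes:)

  // emu_check.cu - build with "nvcc emu_check.cu" for the GPU,
  // or "nvcc -deviceemu emu_check.cu" for the (very slow) CPU emulation mode
  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void scale(float *x, float a, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= a;
  }

  int main() {
  #ifdef __DEVICE_EMULATION__
      printf("device-emulation build: the kernel runs on the CPU\n");
  #else
      printf("native build: the kernel runs on the GPU\n");
  #endif
      const int n = 1 << 20;
      float *x = 0;
      cudaMalloc((void **)&x, n * sizeof(float));
      scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
      cudaThreadSynchronize();              // CUDA 2.x-era synchronization call
      cudaFree(x);
      return 0;
  }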

* * *

of course, since nvidia is doubling the graphics cards' transistor counts and performance so rapidly, and the first cards to appear are the air cooled ones, there is a valid question of water vs. air cooling. in a server room, rackmounted Tesla machines may be the best choice (though not necessarily the cheapest). I myself may soon build an air cooled production-run machine to be kept away from people's desks, based on the upcoming GTX 295 GPU. it will certainly be much cheaper, about half the price of the existing GTX 280 cards. [about 1.66 of my cards are needed to match the planned performance of one new dual-GPU 295, so I had to pay $850 (CAD) * 1.66 ~ $1400 (CAD) for performance I will soon be able to buy for under $500 (US) or $600 (CAD).] but two 295s will be much louder than my current setup! moreover, the GPUs are only 40% or so of my system's price.

if the situation with motherboards continues for more than a year, CUDA will make sense only on those fast x16 slots, each of which will have to be shared by two GPUs (decreasing the per-GPU communication bandwidth we have now?). we will have at most 4 GPUs; power consumption and air cooling will restrict the clock speeds (e.g. ~580 MHz on the upcoming GTX 295 vs. 680 MHz on my 280s). until something changes radically in PCIe bus design, or in the cooling/overclocking of GPUs, we're going to see slow progress in CUDA hardware in the near future, as the 4 new GPUs on 2 cards will not be much faster than the current 3 water-cooled GPUs.
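
(rough estimate of my own, taking the quoted core clocks as a crude proxy for overall throughput: 4 GPUs * 240 cores * ~580 MHz vs. 3 GPUs * 240 cores * 680 MHz gives a ratio of about 2320/2040 ~ 1.14, so only ~15% more raw throughput - and that's before any penalty from two GPUs sharing one x16 slot.)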

so enjoy the moment and, in theory, get onto the list of the 500 fastest supercomputers (which currently ends at some 10 TFLOPS) for just $20k, right now!
