Tuesday, March 17, 2009

a welcome to two baby brothers

well, I am definitely not given to moderation. actually, now that I've spent all the money, I am :-)

two new machines:

* * *


they are very similar to cudak1, described below; you couldn't tell them apart from the outside, so I'm not posting pictures today.
the same overall profile of a quiet, small, personal supercomputer. an office machine on which to learn cuda and develop applications, running 64-bit linux fedora 10 and cuda, as well as doing everything else you expect of a desktop workstation. (I crammed like 100 audio cd's into my rhythmbox files.)

major differences: X58 architecture with a 4-core (8-thread) Nehalem Intel chip (2.68 GHz, I believe). only slightly overclocked gpus: 720 cores on 3 nVidia GeForce GTX280 H2O (as opposed to the H2OC in the original cudak1). different communication bandwidths over the PCI-e bus, on a P6T6 workstation board by Asus. it has a multiplexer that prevents the permanent degradation of the link to the middle graphics card seen in the 790i architecture. that doesn't mean you can push 3 x (5-6) GB/s concurrently to all 3 cards, but that is probably an unlikely request in practice: the cpu simply cannot feed 3 cards at max speed. at least all three cards are now on a more level playing field. they reach the same peak bandwidth of 5.8 GB/s, which Asus calls "true 3-way SLI". whatever. these motherboards are rare, though, I understand.

the new Intel cpu also includes its own memory controller, offloading tasks from the previously overworked, or at least overheating, northbridge. and that means: finally no noisy, microscopic northbridge fan on the P6T6 mobo.

a nice surprise: you open up a system monitor and there are 8 separate cpus reported and graphed (twice the number of actual cores on Nehalem thanks to hyperthreading).

at some point I may describe in more detail one technical mod I made to thermally stabilize the motherboard. I reversed all of the Zalman's fans and am now blowing the hot air from the radiator outside, as God intended. the result: higher coolant and card temps (still comfortably low vis-a-vis specs) but, importantly, lower component and air temps inside the box --> no more thermal hang-ups of the motherboard.

* * *


this one is a step-brother, not a twin brother, of cudak1 vel Z-machine. it's an air-cooled monster
in a Thermaltake Armour full-tower case, with watercooling applied to the Nehalem cpu but with 3 air-cooled GTX295 dual-gpu cards. so 6 gpus this time, not 3, although each is a little slower (main clock ~579 MHz, as opposed to the overclocked H2OC @ 680 MHz). that translates to a theoretical peak performance of 5+ TFLOPs. benchmarks heat the gpus up to 91 C, and the system becomes a bit noisy.
oh well... the noise is a fair price to pay for the theoretical performance of a small campus-scale computing center. (I made a small mechanical modification to improve air flow, preemptively. so far I could not crash this system thermally, but I haven't tried extra hard, "just" 6 or 8 benchmarks running at the same time.)
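for the curious, here is a back-of-envelope python sketch of where a "5+ TFLOPs" figure can come from. note that the ~579 MHz above is the core clock; on GT200-class chips the stream processors run on a separate, faster shader clock. the sketch assumes (these numbers are not my own measurements) the reference GTX295 shader clock of 1242 MHz, 240 cores per gpu, and the usual peak-counting convention of 3 single-precision flops per core per clock (a dual-issued MAD plus MUL). the function name `peak_gflops` is just a hypothetical helper for illustration.

```python
# Back-of-envelope theoretical peak for cudak3 (3 x GTX295, 2 GPUs per card).
# ASSUMPTIONS: reference shader clock of 1242 MHz (stock, not overclocked),
# 240 cores per GPU, and 3 flops per core per clock (MAD = 2 flops, plus a
# dual-issued MUL = 1 flop), which is how peak numbers for GT200 are usually quoted.

FLOPS_PER_CORE_PER_CLOCK = 3


def peak_gflops(cores: int, shader_clock_mhz: float) -> float:
    """Hypothetical helper: theoretical single-precision peak of one GPU, in GFLOP/s."""
    # cores * (million cycles per second) * flops per cycle = MFLOP/s; /1000 -> GFLOP/s
    return cores * shader_clock_mhz * FLOPS_PER_CORE_PER_CLOCK / 1000.0


gpus = 3 * 2                      # three dual-GPU cards
per_gpu = peak_gflops(240, 1242)  # one GT200 chip on a GTX295
total_tflops = gpus * per_gpu / 1000.0

print(f"per GPU: {per_gpu:.1f} GFLOP/s")
print(f"cudak3 total: {total_tflops:.2f} TFLOP/s")
```

this works out to roughly 894 GFLOP/s per gpu and about 5.4 TFLOP/s for the box, which is consistent with "5+". real kernels rarely get anywhere near that, of course.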

total system cost of cudak2 and 3 was on the order of $5.5k CAD each; that is currently something like $4.4k USD. more info later.

a big can of Ooops.

the Z-machine described previously (cudak1) had an accident after just 2 months of duty.
its flow meter, the little paddle wheel that sends signals to the Zalman's control logic, got stuck unobserved at night. the cooling system panicked and switched itself off instead of pumping even more coolant. if you wonder what could happen next... do the words chernobyl & tsunami mean anything to you? Zalman apparently handles emergencies badly and uses badly designed, faulty flow meters. what a shame, it's a very nice box, as I said. in an emergency it should absolutely shut down and protect the computer, not just itself.

anyway, I won't spend time describing this accident. my doctor says it's bad for my blood pressure. a reconfigured system is running again. limping a bit but alive. I cleaned the !@#$&^% flow meter.