gamesfairy @ 2008-12-28 04:35:00
I recently threw together a small compute cluster from some spare parts I had. It comprises five Pentium 3 chips clocked at 1 GHz and one mobile P4 clocked at 1.4 GHz. Each node has a relatively small amount of RAM, around the 64MB mark, mainly because I don't have more lying around.
The cluster boots via a modified version of the (rather excellent) LTSP environment. Each node is diskless: it boots via PXE, downloads a kernel and initrd, and mounts its root filesystem over NFS. Obviously, disk access is slow in this configuration (compounded by the fact that the 'server' machine runs IDE disks at UDMA 66), but the setup kicks out relatively little heat and chomps relatively little power. Each node can be powered on via Wake-on-LAN (WoL), which is important since the cluster is stored away from my house, making physical maintenance hard. There's a MOSIX kernel compiled, but not set up properly, as I don't need MOSIX yet. It's a shame that openMosix died (although it's great that I can get a student license for MOSIX!)
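Wake-on-LAN works by broadcasting a "magic packet": six 0xFF bytes followed by the target network card's MAC address repeated 16 times. As a rough illustration (not the tool used on this cluster, which isn't named here), a minimal Python sender might look like this. UDP port 9 and the 255.255.255.255 broadcast address are common conventions rather than requirements, and the function names are made up for the example.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a WoL magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    # Accept "aa:bb:cc:dd:ee:ff" or "aa-bb-cc-dd-ee-ff" notation.
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet for one node as a UDP broadcast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting has to be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

For example, `wake("00:11:22:33:44:55")` (a placeholder MAC) would try to wake one node; the node's BIOS and network card still need WoL enabled for the packet to have any effect.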
Anyway, one thing the cluster is configured for is distcc, a distributed C compiler. There is some discussion over the optimum number of compile jobs to run in parallel (the '-j' flag passed to 'make'), so I thought I'd do some testing.
I took a vanilla 2.6.27-10 kernel, ran 'make defconfig', and then ran 'make CC=distcc HOSTCC=distcc -j $j', with $j ranging from 1 to 10. The results were interesting, and are shown below:

These results were taken when the cluster consisted of four PIII 1 GHz chips, with around 64 or 128MB of memory per node. distcc was configured not to use the server machine as a compilation host (any more than was necessary).
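A sweep like the one described could be scripted roughly as follows. This is a minimal Python sketch, not the script actually used for these results. It assumes a 'make clean' between runs (how the tree was reset between builds isn't stated), and it expresses "don't use the server as a compilation host" by leaving localhost out of DISTCC_HOSTS, which is one way to do it. The node hostnames, kernel path and helper names are hypothetical.

```python
import os
import subprocess
import time
from typing import Dict, List, Optional


def distcc_hosts(nodes: List[str], jobs_per_node: Optional[int] = None) -> str:
    """Build a DISTCC_HOSTS value that lists only the compute nodes.

    'localhost' is left out on purpose, so the server is not handed compile
    jobs; it still runs make, preprocesses and links. An optional '/N'
    suffix caps the number of concurrent jobs sent to each host.
    """
    if jobs_per_node is None:
        return " ".join(nodes)
    return " ".join(f"{node}/{jobs_per_node}" for node in nodes)


def make_command(jobs: int) -> List[str]:
    """The make invocation used here, with the -j value as a parameter."""
    return ["make", "CC=distcc", "HOSTCC=distcc", "-j", str(jobs)]


def time_build(jobs: int, kernel_dir: str, hosts: str) -> float:
    """Clean the tree, then time one full build at the given -j (seconds)."""
    env = dict(os.environ, DISTCC_HOSTS=hosts)
    # Assumption: 'make clean' between runs. It removes build output but keeps
    # the .config produced by 'make defconfig'.
    subprocess.run(["make", "clean"], cwd=kernel_dir, env=env, check=True)
    start = time.monotonic()
    subprocess.run(make_command(jobs), cwd=kernel_dir, env=env, check=True)
    return time.monotonic() - start


def sweep(kernel_dir: str, nodes: List[str], max_jobs: int = 10) -> Dict[int, float]:
    """Build once per -j value from 1 to max_jobs, recording wall-clock times."""
    hosts = distcc_hosts(nodes)
    return {j: time_build(j, kernel_dir, hosts) for j in range(1, max_jobs + 1)}
```

Something like `sweep("/path/to/linux", ["node1", "node2", "node3", "node4"])` would run all ten builds against four nodes, with DISTCC_HOSTS set to `node1 node2 node3 node4`. Measuring wall-clock time with `time.monotonic()` avoids surprises if the system clock is adjusted mid-build.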
We can clearly see that 6-8 parallel jobs are optimal, with the kernel taking 15 minutes to compile. The elevated time at -j 5 is possibly a freak value, and the longer times above -j 8 are probably the result of network congestion.
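Reading a plateau rather than a single minimum can be made a little more mechanical. Below is a small Python sketch, assuming a mapping from -j value to build time in seconds. The function name and the 5% tolerance are hypothetical choices for illustration, not something used to produce these results.

```python
from typing import Dict, List, Tuple


def best_jobs(timings: Dict[int, float], tolerance: float = 0.05) -> Tuple[int, List[int]]:
    """Return the fastest -j value and every -j within `tolerance` of it.

    A run of near-equal times suggests any value in that range is a
    reasonable choice, and that a lone slow reading may just be noise.
    """
    if not timings:
        raise ValueError("no timings supplied")
    fastest = min(timings, key=timings.__getitem__)
    limit = timings[fastest] * (1 + tolerance)
    near = sorted(j for j, t in timings.items() if t <= limit)
    return fastest, near
```

Picking a value from the middle of the returned range, rather than the single fastest run, is less sensitive to one lucky or unlucky build.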
So there you have it: for a five-node, NFS-root, no-swap, low-memory P3 1 GHz cluster, use -j 7 :D
Hope this post is useful for someone. Please leave a comment if it is, or just if you think I'm wrong!