"Speed, it seems to me, provides the one
genuinely modern pleasure."
-- Aldous Huxley (1894 - 1963)
download 0.4 | news | [Manual: HTML PDF] | CVS | Freshmeat | [IRC: #distcc on OPN] | mailing list
distcc is a program to distribute compilation of C code across several machines on a network. distcc should always generate the same results as a local compile, is simple to install and use, and is often significantly faster than a local compile.
Unlike other distributed build systems, distcc does not require all machines to share a filesystem, have synchronized clocks, or to have the same libraries or header files installed. Machines can be running different operating systems, as long as they have compatible binary formats or cross-compilers. (Currently it is being tested on gcc-linux-x86 and gcc-freebsd-x86.)
distcc sends the complete preprocessed source code across the network for each job, so all it requires of the volunteer machines is that they be running the distccd daemon, and that they have an appropriate compiler installed.
distcc is designed to be used with GNU make's parallel-build feature (-j). Shipping files across the network takes time, but few cycles on the client machine. Any files that can be built remotely are essentially "for free" in terms of client CPU.
distcc is quite new but has successfully compiled the Linux kernel, rsync, KDE, Samba and Ethereal, sometimes over twice as fast as a single machine.
Typical results: building Samba HEAD on a single HP x2000 (1700MHz P4, 1GB) takes 7 minutes, 15 seconds. Using distcc across three such machines on a 10Mbps hub takes only 3 minutes, 9 seconds (130% faster) and generates an identical binary.
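The "130% faster" figure can be checked from the two wall-clock times quoted above. A small sketch of the arithmetic follows; the helper name `percent_faster` is made up for illustration.

```python
def percent_faster(baseline_seconds, improved_seconds):
    """How much more work per unit time the improved run does, in percent."""
    return (baseline_seconds / improved_seconds - 1.0) * 100.0

local_build = 7 * 60 + 15    # 7 minutes 15 seconds on one machine
distcc_build = 3 * 60 + 9    # 3 minutes 9 seconds across three machines

speedup = local_build / distcc_build
print(f"speedup: {speedup:.2f}x "
      f"({percent_faster(local_build, distcc_build):.0f}% faster)")
# prints "speedup: 2.30x (130% faster)"
```

With three machines the ideal speedup would be 3x, so the measured 2.3x leaves some room that is presumably taken up by preprocessing, linking and network transfer on the client.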
distcc is distributed under the GNU General Public Licence v2.
Here's the good stuff:
You can use Freshmeat's subscription feature to be notified of new releases.
There is a single mailing list for development and use questions.
There will be a seminar about distcc at AOSS4, the Australian Open Source Symposium, on Saturday July 20, at UNSW in Sydney.
Back in Australia. I'll probably do an 0.5 release soon with the changes that have accumulated. If you want to see it, look on the freeze_0_5 branch in CVS.
In the meantime I made some progress on compression and on the test suite, but that's not committed yet.
automake is quite a pain, particularly when people with slightly different versions try to build from CVS. Should you keep the generated files in CVS or not? I feel a bit uncertain whether it's worth the hassle or not.
distcc now has more of a test suite written in Python, with some C scaffolding to support it. I'm very pleased with this approach, as it seems more productive than using either sh or C. The main drawback is that you can't run it on machines without Python. I'm happy to see the very cool Subversion project is using this approach too.
Partly to support the test suite, distcc needs a way to specify an alternative port for opening TCP connections. It would seem to make sense to add in a syntax for SSH connections at this time. I think something like HOST[:PORT] for plain TCP, and USER@HOST[:COMMAND] for SSH. With rsync, it's proven quite important to allow people to explicitly specify the remote daemon name, since it can be difficult or confusing to get the search path set properly under sshd.
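To make the proposed syntax concrete, here is a rough Python sketch of how the two forms could be told apart. It is only an illustration of the idea, not distcc's code: the `HostSpec` type, the `parse_host_spec` function and the host names, port and command in the examples are all hypothetical.

```python
from typing import NamedTuple, Optional

class HostSpec(NamedTuple):
    transport: str                 # "tcp" or "ssh"
    host: str
    user: Optional[str] = None     # only set for SSH
    port: Optional[int] = None     # only set for plain TCP
    command: Optional[str] = None  # remote daemon name, only for SSH

def parse_host_spec(spec: str) -> HostSpec:
    """Parse HOST[:PORT] (plain TCP) or USER@HOST[:COMMAND] (SSH)."""
    if "@" in spec:
        # An "@" marks the SSH form; what follows ":" is a command.
        user, _, rest = spec.partition("@")
        host, _, command = rest.partition(":")
        if not user or not host:
            raise ValueError(f"bad SSH host specification: {spec!r}")
        return HostSpec("ssh", host, user=user, command=command or None)

    # Otherwise it is the TCP form; what follows ":" must be a port number.
    host, colon, port_text = spec.partition(":")
    if not host:
        raise ValueError(f"bad TCP host specification: {spec!r}")
    port = None
    if colon:
        try:
            port = int(port_text)
        except ValueError:
            raise ValueError(f"bad port in {spec!r}") from None
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {spec!r}")
    return HostSpec("tcp", host, port=port)

# Hypothetical examples of both forms.
print(parse_host_spec("build1"))
print(parse_host_spec("build1:4200"))
print(parse_host_spec("mbp@build2:/opt/distcc/bin/distccd"))
```

Keying on the "@" keeps the two forms unambiguous, since the text after the colon means a port in one case and a command in the other.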
What I neglected to say the other day is that I'm hoping LZO will be cheap enough that it can just be always on. I like Havoc's thoughts on not multiplying configuration options beyond necessity. Doing this will necessitate bumping the version and forcing both sides to upgrade, but that's why it's called beta.
lzo
I'm looking at using LZO compression for squishing source and object files for transit. Typical compression results are:
    80977  distcc.i
    20217  distcc.i.1.gz
    19408  distcc.i.2.gz
    18858  distcc.i.3.gz
    26397  distcc.i.1.lzo
    19334  distcc.i.9.lzo
lzo at default (level 1) compression is substantially faster than gzip, enough that time(1) has trouble measuring it accurately.
gzip at level 3, where the time usage is moderate, produces output 23% of the uncompressed size. lzo at level 1, where the time usage is even smaller, produces output 32% of the size of the original. lzo level 9 is slower than gzip.
One way to look at this is that lzo is 40% larger than gzip; but on the other hand reducing network traffic to 32% of the original value may be sufficient for most situations.
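The percentages come straight from the byte counts listed above. A short sketch that recomputes them; the dictionary labels are just shorthand for the listed files.

```python
original = 80977  # bytes in distcc.i

# Compressed sizes in bytes, keyed by tool and level.
compressed = {
    "gzip -1": 20217,
    "gzip -2": 19408,
    "gzip -3": 18858,
    "lzo -1": 26397,
    "lzo -9": 19334,
}

for name, size in compressed.items():
    print(f"{name:8s} {size:6d} bytes  {size / original:6.1%} of original")

# How much bigger the fast lzo output is than the gzip -3 output.
extra = compressed["lzo -1"] / compressed["gzip -3"] - 1
print(f"lzo -1 is {extra:.0%} larger than gzip -3")
```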
Solaris: Solaris should be much closer to working in CVS HEAD. If you have a Sun machine, please try it and let me know. Thanks to Petter Reinholdtsen for helping with this.
I'm trying to work out a good way to schedule jobs across SMP machines: even if the machine as a whole is relatively slow, it still makes sense to keep roughly one job per CPU. Ideally the clients would not need to know too many details about all the machines they're talking to, and the protocol will stay fairly simple. The current plan is to make the daemon only accept as many jobs as it can handle, and the client will intelligently back off at that point. That also allows the owner of the volunteer machine to limit the amount of work it does. I did some work towards that tonight.
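The plan can be pictured with a toy model like the one below. This is a sketch of the idea only, with no networking and none of distcc's real code: the `Volunteer` class, the `place_job` function, the host names and the retry and delay values are all invented for illustration.

```python
import threading
import time

class Volunteer:
    """Stand-in for a daemon that accepts at most max_jobs jobs at once."""

    def __init__(self, name, max_jobs):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_jobs)

    def try_accept(self):
        """Return True if a job slot was free, False if the daemon is full."""
        return self._slots.acquire(blocking=False)

    def finish(self):
        """Give the slot back once the job is done."""
        self._slots.release()

def place_job(volunteers, retries=5, first_delay=0.05, sleep=time.sleep):
    """Try each volunteer in turn; if all are full, back off and retry."""
    delay = first_delay
    for attempt in range(retries):
        for volunteer in volunteers:
            if volunteer.try_accept():
                return volunteer
        if attempt < retries - 1:
            sleep(delay)
            delay *= 2   # wait longer each time everything is busy
    return None          # the caller could fall back to compiling locally

# A two-CPU machine takes two jobs, the single-CPU machine one.
farm = [Volunteer("smp-box", 2), Volunteer("old-box", 1)]
placed = [place_job(farm, retries=1) for _ in range(4)]
print([v.name if v else "none" for v in placed])
# prints ['smp-box', 'smp-box', 'old-box', 'none']
```

The client never needs to know how many CPUs each machine has; the limit lives on the volunteer, which is also where its owner can turn it down.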
Another open question is how to handle cross-compilation. A benefit of the distcc design is that all you need to do is set the compiler name appropriately, and of course have that compiler installed. In the common case of homogeneous machines, you shouldn't need to do anything; so perhaps the cc name should only be changed if the hosts are of different types.
I'm testing distcc against GARNOME, and it seems to be building correctly, except that Sawfish (?) might have a race condition in its Makefile. Apparently it takes over an hour to build the whole thing, so ccache and/or distcc should be a big win.
bje suggested that distcc ought to be able to assemble (as opposed to compile) remotely, and HEAD does that now.
There's now also quite a nice little test suite in Python. It's much easier than writing tests in sh or Tcl. One of the tests builds a program made of C and assembly remotely, and then checks that the result executes properly. Once you get into the swing of writing tests it becomes much easier.
And all this adds up to release 0.4: not huge changes, but some worthwhile improvements to be getting on with.
<bje> I'm in distcc heaven :-)
<bje> 50 minute builds down to 15 minutes.
And an anonymous happy customer writes:
<anon> it's one of those programs that, if it was a dog, you'd give a good petting, a scratch behind the ears, and say "good doggie!"
Celso writes:
bulmalug is the Majorca local LUG weblog, and it's one of the more famous in the Spanish world (over 6000 daily visits). It's an introductory article explaining the advantages of distcc and how to use distccd, distcc and the environment variables. It's the beginning of the distcc world domination ;)