rzip
rzip is a compression program, similar in functionality to gzip or
bzip2, but able to take advantage of long-distance redundancies in files,
which can sometimes allow rzip to produce much better compression
ratios than other programs.
The original idea behind rzip is described in my PhD thesis (see http://samba.org/~tridge/), but the implementation in this
version is considerably improved from the original implementation. The
new version is much faster and also produces a better compression
ratio.
Latest release
The latest release is rzip 2.1.
Changes in this release include:
- added the -L compression level option
- minor portability fixes
- fixed a bug that could prevent some files from being
decompressed
You can get this release from the download directory.
Advantages
The principal advantage of rzip is that it has an effective history
buffer of 900 Mbyte. This means it can find matching pieces of the
input file over huge distances compared to other commonly used
compression programs. The gzip program by comparison uses a history
buffer of 32 kbyte and bzip2 uses a history buffer of 900 kbyte.
The second advantage of rzip over bzip2 is that it is usually
faster. This may seem surprising at first given that rzip uses the
bzip2 library as a backend (for handling the short-range compression),
but it makes sense when you realise that rzip has usually reduced
the data a fair bit before handing it to bzip2, so bzip2 has to do
less work.
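As a rough illustration of both points, here is a small Python sketch. It is not rzip's actual algorithm: the function toy_long_range_pass and the CHUNK size are hypothetical, invented for this example, and the matching only finds repeats that are aligned to chunk boundaries. It builds an input whose only redundancy lies 2 Mbyte back, then compares plain DEFLATE (the method gzip uses) and plain bzip2 against a two-stage approach in which a long-range pass removes the repeat before bzip2 sees the data.

```python
import bz2
import random
import zlib

CHUNK = 4096  # hypothetical chunk size, chosen only for this illustration


def toy_long_range_pass(data, chunk=CHUNK):
    """Return (literals, refs) after removing repeated chunk-aligned pieces.

    Assumption: this is a toy stand-in for a long-range matching stage and
    is not how rzip itself finds matches.
    """
    seen = {}               # chunk contents -> offset of first occurrence
    literals = bytearray()  # bytes that still need short-range compression
    refs = []               # (position, source_offset, length) back-references
    for pos in range(0, len(data), chunk):
        piece = data[pos:pos + chunk]
        src = seen.get(piece)
        if src is None:
            seen[piece] = pos
            literals += piece
        else:
            refs.append((pos, src, len(piece)))
    return bytes(literals), refs


# 2 MiB of incompressible pseudo-random bytes, followed by an exact copy.
# The copy sits 2 MiB away, beyond a 32 kbyte or 900 kbyte history.
n = 2 * 1024 * 1024
block = random.Random(0).getrandbits(8 * n).to_bytes(n, "little")
data = block + block

deflate_size = len(zlib.compress(data, 9))  # DEFLATE, as used by gzip
bzip2_size = len(bz2.compress(data, 9))     # 900 kbyte blocks at level 9

literals, refs = toy_long_range_pass(data)
# The small cost of storing the back-references is ignored here.
two_stage_size = len(bz2.compress(literals, 9))

print("input bytes:          ", len(data))
print("DEFLATE alone:        ", deflate_size)
print("bzip2 alone:          ", bzip2_size)
print("long-range then bzip2:", two_stage_size)
print("bytes handed to bzip2:", len(literals))
```

On this artificial input the two general-purpose compressors should stay close to the full input size, because neither can see the repeat, while the two-stage result should be roughly half of it. The backend also receives only half as many bytes, which is the reason given above for the speed advantage.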
Disadvantages
rzip is not for everyone! The two biggest disadvantages are that you
can't pipeline rzip (so it can't read from standard input or write to
standard output), and that it uses lots of memory. A typical
compression run on a large file might use a couple of hundred MB of
RAM. If you have RAM to burn and want the best possible compression
ratio then rzip is probably for you; otherwise stick with bzip2 or
gzip.
Documentation
See the manual page
License
rzip is released under the GNU General Public License version 2 or
later. See the COPYING file in the source distribution for details.
Performance
Compression benchmarks are always tricky things. The existing
benchmarks I am aware of all deal with very small files, and if you
are thinking of using rzip then you are almost certainly not
interested in small files! For this reason I created a new compression
corpus in 1998 which I called the "large-corpus". Of course, typical
file sizes are getting bigger all the time, so the term "large" may
not be all that appropriate any more, but it certainly has much larger
files than the commonly used compression corpora.
You can get a copy of the large-corpus files from http://samba.org/ftp/tridge/large-corpus/.
In the following I show the compression ratios of the large-corpus for
rzip 2.0, gzip 1.3.5 and bzip2 1.0.2 on my Debian Linux laptop. In all
cases the programs were run with their maximum compression options.
File Name | rzip | gzip | bzip2 |
large-corpus/archive | 6.03 | 3.64 | 4.97 |
large-corpus/emacs | 5.08 | 3.66 | 4.62 |
large-corpus/linux | 5.54 | 4.24 | 5.23 |
large-corpus/samba | 9.55 | 3.50 | 4.78 |
large-corpus/spamfile | 29.95 | 8.43 | 14.23 |
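For readers who want to produce comparable figures for their own files, the following Python sketch shows one way to compute a compression ratio. It assumes the ratio is the original size divided by the compressed size, so that larger is better; the page does not state the definition, but the numbers in the table are consistent with it. The helper compression_ratio and the sample data are hypothetical, and Python's zlib module is used as a stand-in for gzip (same DEFLATE method, slightly different container).

```python
import bz2
import zlib


def compression_ratio(original, compressed):
    """Original size divided by compressed size (assumed definition)."""
    return len(original) / len(compressed)


# Hypothetical sample input; replace with the contents of a real file,
# for example open(path, "rb").read().
sample = b"From: someone@example.com\nSubject: hello\n\n" * 10000

ratios = {
    "DEFLATE (zlib level 9)": compression_ratio(sample, zlib.compress(sample, 9)),
    "bzip2 (level 9)": compression_ratio(sample, bz2.compress(sample, 9)),
}

for name, ratio in ratios.items():
    print("%s: %.2f" % (name, ratio))
```

The sample here is tiny and highly repetitive, so its ratios say nothing about the large-corpus results; the sketch only shows the arithmetic.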
Related Programs
Con Kolivas has released a very interesting variant of rzip, called
lrzip, which can use multiple compressor backends and achieve even
better compression.
Authors
The original author of rzip is Andrew Tridgell. Version 2 of rzip also
contains a lot of work from Paul Russell.
Download
You can download the latest release from the download directory.
For the bleeding edge, you can fetch rzip via CVS or
rsync. To fetch via CVS use the following command:
cvs -d :pserver:cvs@pserver.samba.org:/cvsroot co rzip
To fetch via rsync use this command:
rsync -Pavz samba.org::ftp/unpacked/rzip .
Andrew Tridgell
rzip AT tridgell.net