rzip

rzip is a compression program, similar in functionality to gzip or bzip2, but able to take advantage of long-distance redundancies in files, which can sometimes allow rzip to produce much better compression ratios than other programs. The original idea behind rzip is described in my PhD thesis (see http://samba.org/~tridge/), but the implementation in this version is considerably improved from the original. The new version is much faster and also produces a better compression ratio.

Latest release

The latest release is rzip 2.1. You can get this release from the download directory.

Advantages

The principal advantage of rzip is that it has an effective history buffer of 900 Mbyte. This means it can find matching pieces of the input file over huge distances compared to other commonly used compression programs. The gzip program by comparison uses a history buffer of 32 kbyte and bzip2 uses a history buffer of 900 kbyte. The second advantage of rzip over bzip2 is that it is usually faster. This may seem surprising at first given that rzip uses the bzip2 library as a backend (for handling the short-range compression), but it makes sense when you realise that rzip has usually reduced the data a fair bit before handing it to bzip2, so bzip2 has to do less work.
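
The effect of the history buffer size can be seen with a small experiment. The following is a minimal sketch in Python and is not part of rzip: it uses the standard zlib and bz2 modules as stand-ins for gzip and bzip2 (an assumption, though they use the same underlying algorithms), and compresses a block of pseudo-random data followed by an exact copy of itself. When the copy is closer than the history buffer the compressor can find it; when it is further away the second copy gets no benefit from the first. All helper names are hypothetical.

```python
import bz2
import random
import zlib


def pseudo_random_bytes(n, seed=0):
    """Return n reproducible pseudo-random bytes (incompressible on their own)."""
    return random.Random(seed).getrandbits(8 * n).to_bytes(n, "little")


def ratio(data, compress):
    """Compression ratio, taken here as original size / compressed size."""
    return len(data) / len(compress(data))


def zlib_max(data):
    return zlib.compress(data, 9)  # deflate, 32 kbyte history window


def bz2_max(data):
    return bz2.compress(data, 9)  # 900 kbyte blocks


def repeated(block_size):
    """A random block followed by an identical copy, block_size bytes apart."""
    block = pseudo_random_bytes(block_size)
    return block + block


results = {
    "zlib, copy 16 kbyte away": ratio(repeated(16_000), zlib_max),
    "zlib, copy 100 kbyte away": ratio(repeated(100_000), zlib_max),
    "bz2, copy 100 kbyte away": ratio(repeated(100_000), bz2_max),
    "bz2, copy 1 Mbyte away": ratio(repeated(1_000_000), bz2_max),
}

for name, value in results.items():
    print(f"{name}: ratio {value:.2f}")
```

In each pair, the first case should compress noticeably and the second should stay at a ratio of about 1, because the duplicate lies outside the window or block. A compressor with a much larger history, which is what rzip provides, would be expected to catch the distant copy as well.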

Disadvantages

rzip is not for everyone! The two biggest disadvantages are that you can't pipeline rzip (so it can't read from standard input or write to standard output), and that it uses lots of memory. A typical compression run on a large file might use a couple of hundred MB of RAM. If you have RAM to burn and want the best possible compression ratio then rzip is probably for you, otherwise stick with bzip2 or gzip.
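
Because rzip works on named files rather than streams, a program that holds its data in memory has to go through temporary files. Here is a minimal sketch of such a wrapper in Python, under some assumptions: an rzip binary is on the PATH, and its -k (keep input file) and -o (set output file name) options behave as described in the manual page. The function names and the runner parameter are hypothetical and exist only for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def rzip_command(src, dst):
    """Build the command line; -k and -o are assumed to match the manual page."""
    return ["rzip", "-k", "-o", str(dst), str(src)]


def compress_via_tempfiles(data, runner=subprocess.run):
    """Compress a bytes object by writing it to disk and running rzip on it.

    runner is injectable so the file handling can be exercised without rzip.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"
        dst = Path(tmp) / "input.rz"
        src.write_bytes(data)
        # check=True raises CalledProcessError if rzip exits with an error.
        runner(rzip_command(src, dst), check=True)
        return dst.read_bytes()
```

Calling compress_via_tempfiles(some_bytes) would return the compressed bytes. Note that this keeps the whole input and output in memory, on top of whatever rzip itself uses, so it only suits data that comfortably fits in RAM.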

Documentation

See the manual page

License

rzip is released under the GNU General Public License version 2 or later. See the COPYING file in the source distribution for details.

Performance

Compression benchmarks are always tricky things. The existing benchmarks I am aware of all deal with very small files, and if you are thinking of using rzip then you are almost certainly not interested in small files! For this reason I created a new compression corpus in 1998 which I called the "large-corpus". Of course, typical file sizes are getting bigger all the time, so the term "large" may not be all that appropriate any more, but it certainly has much larger files than the commonly used compression corpuses.

You can get a copy of the large-corpus files from http://samba.org/ftp/tridge/large-corpus/.

Below I show the compression ratios for the large-corpus files using rzip 2.0, gzip 1.3.5 and bzip2 1.0.2 on my Debian Linux laptop. In all cases the programs were run with their maximum compression options.

File Name               rzip   gzip  bzip2
large-corpus/archive    6.03   3.64   4.97
large-corpus/emacs      5.08   3.66   4.62
large-corpus/linux      5.54   4.24   5.23
large-corpus/samba      9.55   3.50   4.78
large-corpus/spamfile  29.95   8.43  14.23
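
To read these figures in terms of file sizes, it helps to convert the ratios. Assuming the ratio means original size divided by compressed size (this page does not state it explicitly), the size of rzip's output relative to another program's output for the same file is the other program's ratio divided by the rzip ratio. A short Python sketch using the figures above; the names are illustrative only:

```python
# Compression ratios copied from the table above: (rzip, gzip, bzip2).
ratios = {
    "archive": (6.03, 3.64, 4.97),
    "emacs": (5.08, 3.66, 4.62),
    "linux": (5.54, 4.24, 5.23),
    "samba": (9.55, 3.50, 4.78),
    "spamfile": (29.95, 8.43, 14.23),
}


def relative_size(rzip_ratio, other_ratio):
    """Size of the rzip output as a fraction of the other program's output,
    assuming ratio = original size / compressed size."""
    return other_ratio / rzip_ratio


vs_gzip = {name: relative_size(r, g) for name, (r, g, b) in ratios.items()}
vs_bzip2 = {name: relative_size(r, b) for name, (r, g, b) in ratios.items()}

for name in ratios:
    print(f"{name:9s} {vs_gzip[name]:.0%} of gzip, {vs_bzip2[name]:.0%} of bzip2")
```

On these figures the rzip output for samba and spamfile comes out at roughly half the size of the bzip2 output, while for emacs and linux the gain over bzip2 is under 10%.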

Related Programs

  • Con Kolivas has released a very interesting variant of rzip, called lrzip, which can use multiple compressor backends and achieve even better compression.

Authors

The original author of rzip is Andrew Tridgell. Version 2 of rzip also contains a lot of work from Paul Russell.

Download

You can download the latest release from the download directory.

For the bleeding edge, you can fetch rzip via CVS or rsync. To fetch via cvs use the following command:

  cvs -d :pserver:cvs@pserver.samba.org:/cvsroot co rzip

To fetch via rsync use this command:

  rsync -Pavz samba.org::ftp/unpacked/rzip .

Andrew Tridgell
rzip AT tridgell.net