Saturday, 4 June 2011

lrzip tarball of all 40 linux-2.6 kernels

With the 2.6 Linux kernel now officially finished, I'm providing a tarball of all 40 of the three-point kernel releases (2.6.0 to 2.6.39) as an lrzip archive. This is a convenient way of getting all the releases in a relatively low-bandwidth form. Previous archives of this nature have had a few downloads, so I figured I'd complete the set for those who want to download it, and use it as an ad for lrzip:

163.9MB:
linux-2.6.0-2.6.39.tar.lrz

10.3GB:
linux-2.6.0-2.6.39.tar

The original file is a tarball of all the three-point Linux kernel release tarballs from 2.6.0 to 2.6.39, coming to a grand total of 10.3GB.

Of course this is a plug for lrzip on something it is particularly good at compressing.

lrzip can be obtained here:
http://lrzip.kolivas.org
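
To unpack the download once you have lrzip installed (lrunzip is the decompression wrapper that ships with lrzip; lrzip -d does the same thing):

lrunzip linux-2.6.0-2.6.39.tar.lrz
tar xf linux-2.6.0-2.6.39.tar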

About this archive: it was compressed with lrzip version 0.606 on a quad-core 3GHz Core 2, onto a relatively slow external USB2 hard drive, with the following options:

lrzip -UL 9 linux-2.6.0-2.6.39.tar
Total time: 00:56:19.88

It would have compressed a lot faster without the -L 9 option, but given this is the "final" archive of 2.6, I figured I'd push it a bit further. lrzip can compress it even further with zpaq as an option, but that makes decompression much slower, so I'd personally find the archive less useful.
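
If you do want to try the zpaq route on this archive yourself, the invocation would be along these lines (just a sketch of the flags; I haven't benchmarked it on this tarball):

lrzip -Uz -L 9 linux-2.6.0-2.6.39.tar

The -U keeps the unlimited-size rzip window so the long-range redundancy between kernel releases is still found; -z only swaps the back-end compressor from lzma to zpaq.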

Of course someone will ask how it compares to xz, so for completeness:

xz -9 linux-2.6.0-2.6.39.tar
Total time: 2:05:32.218

11067473920 linux-2.6.0-2.6.39.tar
1535618848 linux-2.6.0-2.6.39.tar.xz 13.8%
171879382 linux-2.6.0-2.6.39.tar.lrz 1.6%

EDIT:
I figured I'd do the rest as well.

Here is a tarball of all 161 stable three-point releases from linux-1.0 to linux-2.6.39:

211MB:
linux-1.0-2.6.39.tar.lrz

19617064960 linux-1.0-2.6.39.tar
221368298 linux-1.0-2.6.39.tar.lrz 1.1%

As compressing this workload is mostly I/O bound, I performed it on an SSD drive this time:

linux-1.0-2.6.39.tar - Compression Ratio: 88.617. Average Compression Speed: 8.026MB/s.
Total time: 00:38:50.76

11 comments:

  1. Wow, that's *very* nice! I wonder how large a (text only) Wikipedia would be!

  2. Wikipedia, unlike the Linux kernel, does not have much in the way of redundant information, and in fact the first 100MB (enwik8) and first 1000MB (enwik9) of Wikipedia text are very common compression benchmarks. Lrzip does very well, but only compared to the "regular" compression algorithms; dedicated algorithms designed specifically for that purpose do better. But then, lrzip is meant as a general-purpose compression program for large files, not one dedicated to a single type of data.

    See:
    Large text compression benchmark

  3. OK, thanks!
    Lrzip works with a dictionary as well, right? Would it make sense to use the dictionary from that file as a basis for kernel archives? If I understand compression correctly, a large part of the data would be a dictionary. Would it then be possible to make archives based on that dictionary that are smaller to download once you have the "main linux source" dictionary? Or put another way: how much of those 164 MB is dictionary, and how much is each kernel's unique data?

    You didn't take the user/system times with /usr/bin/time, right? ;)

  4. D., it looks like you may want to use... git.

  5. I also wish your lrzip would get a bit more attention.

    It is brilliant.

  6. This comment has been removed by the author.

  7. Sorry about that. I thought I could post images, but I don't know how (or if) we can, so I'll just put the URLs to the images below. Inspired by ck's post, I thought I would give this a shot with the 2009 assembly of the complete human genome (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/) rather than the 2.6 series of kernel sources, for shits and giggles.

    I compared gzip, lzma2, rzip/lzma, bzip2, and rar, compressing and decompressing the dataset, which totaled ~3 GB, on an Intel Xeon 3360 clocked @ 3.40 GHz (quad core) with 8 GB of RAM.

    Filesize/ratio
    http://img202.imageshack.us/img202/1383/sizej.png

    The top tier in terms of file size comprised rar, rzip/lzma, and lzma2. Bzip2 and gzip gave the largest archives; gzip produced an archive that was over 112 MB larger than the next closest competitor.

    Compression speed
    http://img863.imageshack.us/img863/9987/compress.png

    In terms of compression speed, bzip2 was the quickest. The second tier was occupied by rar and gzip. A distant fourth was rzip/lzma, and a very distant fifth was lzma2, which took almost 10x longer than the fastest. The label markers in the plot represent encode throughput (MB/s).

    Decompression speed
    http://img508.imageshack.us/img508/589/decompress.png

    Gzip decompressed fastest. A close second tier contained both rzip/lzma and lzma2. Bzip2 took nearly 5x longer than the fastest and approx. 2x as long as the second tier. Rar's decompression performance of over 6-1/2 min was the longest measured. Again, the label markers in the plot represent throughput (MB/s).

    Data table
    http://img855.imageshack.us/img855/9253/tableii.png

  8. Thanks all. Don't forget: if you're really after small with lrzip, try -z, and if you're after fast, try -l.
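
    For example, on the tarball from the post (only illustrating the flags, no new timings here):
    lrzip -z linux-2.6.0-2.6.39.tar
    lrzip -l linux-2.6.0-2.6.39.tar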

  9. lrztar -l chromFa
    compression: 1521 MB in 5.67 min (2.006x and 10.90 MB/s)
    decompression: 13 sec

    lrztar -l chromFa
    compression: 680.8 MB in 36.25 min (4.483x and 1.40 MB/s)

    decompression: 42.5 min

  10. Hey graysky. For those results you just posted, I assume the second one was actually lrztar -z. Note that compressing sequentially (tar first, then lrzip on the tarball) is also faster and smaller than using lrztar, i.e.:
    tar cf chromFa.tar chromFa
    lrzip chromFa.tar

    will produce better compression than
    lrztar chromFa

    lrztar is just there for convenience but is not as efficient at compression of really large files.

  11. @ck - yeah, typo on my part. I actually tried it both ways with the standard settings but didn't see a difference (1-2 MB as I recall).
