With the 2.6 Linux kernel now officially finished, I'm providing all 40 of the three-point kernel releases as a single lrzip tarball. This is a convenient, relatively low-bandwidth way of getting all the releases, and previous archives of this kind have had a fair few downloads, so I figured I'd complete the archive for those who want to download it, and use it as an ad for lrzip:
The original file is a tarball of all the three-point Linux kernel release tarballs from 2.6.0 to 2.6.39, coming to a grand total of 10.3GB.
Of course this is a plug for lrzip on something it is particularly good at compressing.
lrzip can be obtained here:
About this archive: it was compressed with lrzip version 0.606 on a quad-core 3GHz Core 2, working off a relatively slow external USB2 hard drive, with the following options:
lrzip -UL 9 linux-2.6.0-2.6.39.tar
Total time: 00:56:19.88
It would have compressed a lot faster without the -L 9 option, but given this is the "final" archive of 2.6, I figured I'd push it a bit further. lrzip can compress it even further with the zpaq option, but that makes decompression much slower, so I'd personally find the archive less useful.
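To make the trade-offs concrete, here is a sketch of the three invocations being compared, assuming the lrzip 0.6x flags (-U unlimited window, -L 1..9 compression level, -z zpaq backend) and the tarball name from this post; the guard makes it a no-op when lrzip or the tarball isn't present.

```shell
#!/bin/sh
# Sketch only: -o output names are illustrative, not from the post.
TARBALL=linux-2.6.0-2.6.39.tar
if command -v lrzip >/dev/null 2>&1 && [ -f "$TARBALL" ]; then
    lrzip -U "$TARBALL"                           # default lzma level: faster, slightly larger
    lrzip -U -L 9 -o "$TARBALL.L9.lrz" "$TARBALL" # what the post used: slower, smaller
    lrzip -U -z -o "$TARBALL.zpaq.lrz" "$TARBALL" # zpaq: smallest, but slow to decompress
    echo "done"
else
    echo "lrzip or $TARBALL not available; skipping"
fi
```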
Of course someone will ask how it compares to xz, so for completeness:
xz -9 linux-2.6.0-2.6.39.tar
Total time: 2:05:32.218
1535618848 linux-2.6.0-2.6.39.tar.xz 13.8%
171879382 linux-2.6.0-2.6.39.tar.lrz 1.6%
I figured I'd do the rest as well.
Here is a tarball of all 161 stable 3 point releases from linux-1.0 to linux-2.6.39:
221368298 linux-1.0-2.6.39.tar.lrz 1.1%
As compressing this workload is mostly I/O bound, I performed it on an SSD drive this time:
linux-1.0-2.6.39.tar - Compression Ratio: 88.617. Average Compression Speed: 8.026MB/s.
Total time: 00:38:50.76
Wow, that's *very* nice! I wonder how large a (text only) Wikipedia would be!
Wikipedia, unlike the Linux kernel, does not have much in the way of redundant information; in fact, the first 100MB (enwik8) and first 1000MB (enwik9) of Wikipedia text are a very common benchmark for compression. lrzip does very well, but only compared to the "regular" compression algorithms; dedicated algorithms designed specifically for that purpose do better. But then, lrzip is meant as a general-purpose compression program for large files, not one dedicated to a single type of data.
Large text compression benchmark
Ok, thanks!
lrzip works with a dictionary as well, right? Would it make sense to use the dictionary from that file as a basis for kernel archives? If I understand compression correctly, a large part of the data would be the dictionary. Would it then be possible to make archives based on that dictionary that are smaller to download once you have the "main Linux source" dictionary? Or put another way: how much of those 164 MB is dictionary, and how much is each kernel's unique data?
You didn't take the user/system times with /usr/bin/time, right? ;)
D., it looks like you may want to use.. git.
I also wish your lrzip would get a bit more attention.
It is brilliant.
This comment has been removed by the author.
Sorry about that. I thought it posted images, but I don't know how (or if) we can, so I'll just put the URLs to the images below. Inspired by ck's post, I thought I would give this a shot with the 2009 assembly of the complete human genome (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/) rather than the 2.6 series of source, for shits and giggles.
I compared gzip, lzma2, rzip/lzma, bzip2, and rar compressing and decompressing the datasets, which totaled ~3 GB, on an Intel Xeon 3360 clocked @ 3.40 GHz (quad core) with 8 GB of RAM.
The top tier in terms of file size was rar, rzip/lzma, and lzma2. Bzip2 and gzip gave the largest archives; gzip produced an archive that was over 112 MB larger than the next closest competitor.
In terms of compression speed, bzip2 was the quickest. The second tier was occupied by rar and gzip. A distant fourth was rzip/lzma; a very distant fifth was lzma2, which took almost 10x longer than the fastest. The label markers in the plot represent encode throughput (MB/s).
Gzip decompressed fastest. A close second tier contained both rzip/lzma and lzma2. Bzip2 took nearly 5x longer than the fastest and approx. 2x as long as the second tier. Rar's decompression time of over 6-1/2 min was the longest measured. Again, the label markers in the plot represent throughput (MB/s).
Thanks all. Don't forget: if you're really after small with lrzip, try -z, and if you're after fast, try -l.
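For the curious, the two extremes of that speed/size trade-off look roughly like this. This is a sketch using the chromFa directory from graysky's test below; -l selects the fast lzo backend and -z the slow zpaq backend, and the chromFa.tar.lrz output name is assumed from lrztar's default naming. The guard makes it a no-op when lrztar or the data isn't present.

```shell
#!/bin/sh
if command -v lrztar >/dev/null 2>&1 && [ -d chromFa ]; then
    lrztar -l chromFa      # fast: lzo backend, least compression
    rm -f chromFa.tar.lrz  # clear the default output before the second run
    lrztar -z chromFa      # small: zpaq backend, slowest to (de)compress
    echo "done"
else
    echo "lrztar or chromFa not available; skipping"
fi
```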
lrztar -l chromFa
compression: 1521 MB in 5.67 min (2.006x and 10.90 MB/s)
decompression: 13 sec
lrztar -l chromFa
compression: 680.8 MB in 36.25 min (4.483x and 1.40 MB/s)
decompression: 42.5 min
Hey graysky. Those results you just posted: I assume the second one was actually lrztar -z. Note that sequential compression is also faster and smaller than using lrztar.
tar cf chromFa.tar chromFa
lrzip chromFa.tar
will produce better compression than
lrztar chromFa
lrztar is just there for convenience but is not as efficient at compressing really large files.
@ck - yeah, typo on my part. I actually tried it both ways with the standard settings but didn't see a difference (1-2 MB as I recall).