Thursday 7 April 2011

Quick lrzip comparison

So I was building a kernel package for myself, and when I was done I saw a large directory and thought, what the heck, I'll compress it with lrzip for grins.

bzip2 size: 1831647009 time: 12m50s (command "tar cjf")
7z    size:  945054166 time: 36m23s (command "7za a")
lrzip size:  586087630 time: 17m09s (command "lrztar")

These were done on a quad core 3GHz machine with 8GB RAM using nothing but default options. Note that lrzip was run through the lrztar wrapper to keep the comparison fair, since that is a single command without temporary files; as a result it took 3 passes to compress. If I compressed it with a temporary file it would be smaller again (i.e. tar cf directory.tar directory && lrzip directory.tar).
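To put the numbers above side by side, here is a minimal sketch that works out the relative archive sizes and the run times in seconds. The sizes and times are copied from the results above; the helper names (`relative_size`, `to_seconds`) are made up for illustration. The uncompressed size of the directory was not recorded, so only sizes relative to each other can be computed, not true compression ratios.

```python
# Archive sizes in bytes and wall-clock times, copied from the results above.
sizes = {"bzip2": 1831647009, "7z": 945054166, "lrzip": 586087630}
times = {"bzip2": (12, 50), "7z": (36, 23), "lrzip": (17, 9)}  # (minutes, seconds)


def relative_size(name, baseline):
    """Size of one archive as a fraction of another archive's size."""
    return sizes[name] / sizes[baseline]


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to total seconds."""
    return minutes * 60 + seconds


if __name__ == "__main__":
    for name in sizes:
        pct = 100 * relative_size(name, "bzip2")
        secs = to_seconds(*times[name])
        print(f"{name:6s} {pct:5.1f}% of bzip2 size, {secs} s")
```

By this measure the lrzip archive is roughly 32% of the bzip2 archive and roughly 62% of the 7z archive, while taking about 4 minutes longer than bzip2 and less than half the time of 7z.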

Of course there is parallel bzip2, which speeds up bzip2's compression but has no effect on the compression ratio. And then there is xz, which slows down 7z's compression and has no effect on the compression ratio. So why is xz becoming the new de facto standard? Probably because it's not aimed squarely at the Windows market the way 7z is, and feels more politically correct to the Linux crowd. Maybe it's because it's called lzma2, so it must be better than lzma since it has a higher version number. I've never seen any performance or compression advantage of any significance with lzma2 versus lzma. Personally I'm relatively unimpressed with xz, so I don't really understand it. Even less understandable is why the kernel has both lzma AND xz support in it now.

I wish I could get people more excited about lrzip :)

6 comments:

  1. Looks promising! Any chance this will be maintained in debian?

  2. lzma2 does not offer significant improvements over lzma. Mostly, the multi-threading is simplified. There is also a theoretical small gain for incompressible data. And... I think that's it.
    I'm not even sure if it can benefit from 64-bit memory for larger buffers.

  3. Debian unstable has a package, but an old 0.551 version if I recall. Jari Aalto is maintaining it.

  4. I didn't know about it, it's quite nice. It doesn't make sense to have a huge amount of RAM and not use it for an advantage like this. Maybe the message just needs to get out to gain support.

  5. This comment has been removed by the author.

  6. I replaced the largest (10-50GB) bzip2 archives with lrzip. On a multi-CPU or multicore system with more than 4 or 8GB of RAM it runs fast, with a much better compression ratio (circa 20% gain over bzip2). I started testing in 2011, and in 2012 lrzip is my primary compression utility. By the way, I agree, xz is a "misunderstanding". VERY GOOD JOB CON!
