Saturday, 13 November 2010

lrzip-0.530 and multithreading speed-ups.

As planned, I finally managed to thread the compression phase of lrzip to parallelise it as much as possible. Originally, data was fed into a buffer which rzip acted on, and when the buffer was full it was handed over to the backend compressor. All of the backends were single threaded except for lzma, which didn't even scale beyond 2 CPUs. The implementation in the latest released version of lrzip, 0.530, does it the way I described in my previous blog post.

First it feeds data into the rzip buffer, which preprocesses data until it has enough to pass on to a compression thread. It then hands that data to a compression thread and continues reading, letting rzip work on new data while the compression thread does the second-phase compression in the background. Once rzip has enough data for another thread, it spawns one, and so on until there are as many threads as CPUs. After that it keeps reading until the first thread is free, reuses it, and so on.
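To make the hand-off concrete, here is a rough Python sketch of that scheme. It is not lrzip's actual code: `preprocess` is a hypothetical stand-in for the rzip stage (it just passes data through), `compress_stream` is an invented name, and Python's standard `lzma` module stands in for the backend compressor. The only point is the shape of the loop: one serial first stage, up to one compression job per CPU, and a wait on the oldest job once all of them are busy.

```python
import lzma
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def preprocess(chunk: bytes) -> bytes:
    """Hypothetical stand-in for the rzip stage; here it passes data through."""
    return chunk


def compress_stream(chunks, workers=None):
    """Preprocess chunks serially, compress them on up to `workers` threads."""
    workers = workers or os.cpu_count() or 1
    in_flight = deque()  # pending compression jobs, oldest first
    compressed = []      # finished blocks, in input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks:
            block = preprocess(chunk)  # serial first stage
            if len(in_flight) == workers:
                # Every thread is busy: wait for the oldest job before reusing it.
                compressed.append(in_flight.popleft().result())
            in_flight.append(pool.submit(lzma.compress, block))
        # Input exhausted: collect whatever is still running.
        while in_flight:
            compressed.append(in_flight.popleft().result())
    return compressed
```

Calling `compress_stream(chunks, workers=4)` on an iterable of byte strings returns one compressed block per chunk, in the original order, so the blocks can be written out sequentially.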

Well, the results of this are about as good as I could have hoped for. While the faster lzo compression backend only gains a small speedup, the slower the backend, the bigger the speedup. It becomes impressive once zpaq is in use, where I was able to get a 4x speedup on a quad core. That makes lrzip with zpaq almost as fast as regular xz! However, since zpaq takes just as long to decompress as it does to compress, and I haven't threaded the decompression phase, it ends up taking about 4x longer to decompress than it did to compress (grin). So zpaq isn't -quite- at the usable stage just yet, but it may well be in the near future.
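One rough way to see why a slower backend gains more is Amdahl's law: only the backend's share of the run time is spread across threads, so the larger that share, the closer the speedup gets to the thread count. The sketch below is a simplified model, not a measurement. The function name and the backend fractions are made up for illustration, and it ignores the overlap between the rzip stage and the compression threads.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when `parallel_fraction` of the run time is split over `threads`."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical backend shares of total run time, NOT measured values.
    for share in (0.30, 0.80, 0.99):
        print(f"backend share {share:.2f}: {amdahl_speedup(share, 4):.2f}x on 4 threads")
```

With a backend that accounts for nearly all of the run time the model approaches 4x on four threads, while a backend that is only a small part of the total barely moves.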

So, what everyone has been waiting for (all 3 of you): benchmarks! A 10GB virtual image file, compressed on a 3GHz quad core from an SSD.

Compression  Size (bytes)  Compress  Decompress
None         10737418240
gzip         2772899756    5m47s     2m46s
bzip2        2704781700    16m15s    6m19s
xz           2272322208    50m58s    3m52s
7z           2242897134    26m36s    5m41s
lrzip        1299228155    16m12s    4m32s
lrzip -M     1079682231    12m03s    4m05s
lrzip -l     1754694010    5m30s     3m12s
lrzip -lM    1414958844    5m15s     2m57s
lrzip -zM    1066902006    71m20s    4h08m
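For anyone who prefers ratios to raw byte counts, a few lines of Python turn the sizes in the table into compression ratios (original size divided by compressed size). The byte counts are copied from the table; the names `ORIGINAL`, `SIZES` and `ratios` are just for this sketch.

```python
ORIGINAL = 10737418240  # uncompressed size in bytes, from the "None" row

# Compressed sizes in bytes, copied from the table.
SIZES = {
    "gzip": 2772899756,
    "bzip2": 2704781700,
    "xz": 2272322208,
    "7z": 2242897134,
    "lrzip": 1299228155,
    "lrzip -M": 1079682231,
    "lrzip -l": 1754694010,
    "lrzip -lM": 1414958844,
    "lrzip -zM": 1066902006,
}


def ratios(sizes=SIZES, original=ORIGINAL):
    """Return {name: original / compressed} for every row."""
    return {name: original / size for name, size in sizes.items()}


if __name__ == "__main__":
    # Print the rows from best ratio to worst.
    for name, ratio in sorted(ratios().items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<10} {ratio:6.2f}x")
```

On these numbers, lrzip -M and lrzip -zM both land at roughly 10x, against a little under 4x for gzip.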

Kick arse.

Get it here, and remember freshmeat may not have updated their download links yet:
lrzip

Next stop: parallelising the decompression phase. I doubt anything but zpaq will really benefit from this, but it would be great to have a zpaq based compression format that is usably fast.

3 comments:

  1. Although I don't see a lot of posts on these developments, I am sure that I am not one of only three people enjoying reading about your work on lrzip. Keep it up!

    Galen

  2. Some feedback from the 2nd of 3 ;)
    Tested a 4.5GB VirtualBox vdi image file on a dual core with plain lrzip

    0.5.2: Total time: 00:29:38.969 Ratio: 1.774 (2.5GB)
    0.530: Total time: 00:15:17.109 Ratio: 1.638 (2.7GB)

    So half the time, not so bad ;)
    Both versions gave some warnings:
    LZMA ERROR: 2. Try a smaller compression window

    I'm using a 32-bit machine with 3.5 GB RAM.

    But now the negative feedback: with both versions the -M switch leads to a memory access error nearly immediately. I'm using Linux 2.6.35.8-ck 32-bit.

    CU sysitos

  3. Thanks very much for the feedback! Can you please email me the output of the -M failure when it's passed the -vv options? Also if you have the time, the -vv options added to the LZMA error examples would also be helpful. Thanks!
