Wednesday 17 November 2010

lrzip-0.540 and multithreaded decompression.

Unable to put it down, I tackled the last big thing I had planned for lrzip in the short term: multithreaded decompression. In a similar fashion to how I tackled it on compression (but in reverse, obviously), I modified it to take the file and hand out chunks to as many threads as there are CPUs, and let them all start decompressing their chunks independently. As soon as the first thread completes, it hands its buffer over to the rzip stage and makes itself available to take on more chunks, and so on.
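A rough sketch of that scheme in C with POSIX threads might look like the following. This is not lrzip's code: the chunk format (a toy run-length encoding standing in for lzma/bzip2/gzip/lzo/zpaq) and the names `chunk_t`, `rle_decode()` and `parallel_decompress()` are hypothetical. Workers pull the next chunk index from a shared counter, decompress outside the lock and publish the buffer, while the calling thread, playing the part of the rzip stage, consumes buffers strictly in chunk order.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk format for illustration: (count, byte) pairs, a toy
 * run-length encoding standing in for the real backend compressor. */
typedef struct { const unsigned char *data; size_t len; } chunk_t;

typedef struct {
    const chunk_t *chunks;
    size_t nchunks, next;   /* next = next chunk index to hand out */
    unsigned char **bufs;   /* decompressed buffer per chunk */
    size_t *lens;
    int *done;              /* set once bufs[i] has been handed over */
    pthread_mutex_t lock;
    pthread_cond_t ready;
} pool_t;

/* Toy backend decompressor: expands (count, byte) pairs. */
static unsigned char *rle_decode(const chunk_t *c, size_t *outlen)
{
    size_t total = 0, pos = 0;
    for (size_t i = 0; i + 1 < c->len; i += 2)
        total += c->data[i];
    unsigned char *out = malloc(total ? total : 1);
    if (!out) return NULL;
    for (size_t i = 0; i + 1 < c->len; i += 2) {
        memset(out + pos, c->data[i + 1], c->data[i]);
        pos += c->data[i];
    }
    *outlen = total;
    return out;
}

/* Each worker takes the next chunk, decompresses it without holding the
 * lock, hands the buffer over, then goes back for another chunk. */
static void *worker(void *arg)
{
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        size_t i = p->next++;
        pthread_mutex_unlock(&p->lock);
        if (i >= p->nchunks) return NULL;
        size_t n = 0;
        unsigned char *buf = rle_decode(&p->chunks[i], &n);
        pthread_mutex_lock(&p->lock);
        p->bufs[i] = buf;
        p->lens[i] = n;
        p->done[i] = 1;
        pthread_cond_broadcast(&p->ready);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Decompress nchunks on up to nthreads workers, feeding results strictly in
 * chunk order to a serial stage (a memcpy into out, standing in for rzip).
 * Returns the number of bytes written, or -1 on error. */
long parallel_decompress(const chunk_t *chunks, size_t nchunks, int nthreads,
                         unsigned char *out, size_t cap)
{
    if (nchunks == 0) return 0;
    pool_t p = { .chunks = chunks, .nchunks = nchunks };
    p.bufs = calloc(nchunks, sizeof *p.bufs);
    p.lens = calloc(nchunks, sizeof *p.lens);
    p.done = calloc(nchunks, sizeof *p.done);
    pthread_t *tids = calloc(nthreads > 0 ? (size_t)nthreads : 1, sizeof *tids);
    long written = -1;
    if (p.bufs && p.lens && p.done && tids) {
        pthread_mutex_init(&p.lock, NULL); pthread_cond_init(&p.ready, NULL);
        int started = 0;
        for (int t = 0; t < nthreads; t++)
            if (pthread_create(&tids[started], NULL, worker, &p) == 0)
                started++;
        if (started == 0)
            worker(&p);  /* no threads could be created: decompress inline */
        written = 0;
        for (size_t i = 0; i < nchunks; i++) {
            pthread_mutex_lock(&p.lock);
            while (!p.done[i])  /* wait for chunk i, whichever thread has it */
                pthread_cond_wait(&p.ready, &p.lock);
            pthread_mutex_unlock(&p.lock);
            if (written >= 0 && p.bufs[i] && (size_t)written + p.lens[i] <= cap) {
                memcpy(out + written, p.bufs[i], p.lens[i]);
                written += (long)p.lens[i];
            } else {
                written = -1;  /* allocation failure or output too small */
            }
            free(p.bufs[i]);
        }
        for (int t = 0; t < started; t++)
            pthread_join(tids[t], NULL);
        pthread_cond_destroy(&p.ready); pthread_mutex_destroy(&p.lock);
    }
    free(p.bufs); free(p.lens); free(p.done); free(tids);
    return written;
}
```

Called with three chunks encoding "aaabb", "c" and "dddd" and four threads, it writes "aaabbcdddd" to `out` and returns 10 regardless of which worker finishes first, because the consumer only ever waits on the next chunk in sequence.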

This technique needs a file that was compressed with enough chunks in the first place. Every file will already have at least two chunks, because rzip always uses two streams, but it will obviously scale best with files that were compressed with the multithreaded version.

So how does it perform? Well, much like the compression side, the slower the backend compressor in use, the better the speedup with more CPUs.
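One standard way to reason about why is Amdahl's law. As an assumed model (nothing lrzip itself computes), suppose the backend decompression of independent chunks parallelises perfectly while everything else, such as the rzip stage, stays serial. The backend's share `p` of the single-threaded run time then bounds the speedup on `n` CPUs:

```c
/* Amdahl's law, used here as an assumed model rather than anything lrzip
 * computes:
 *   p     = fraction of the single-threaded run spent in the backend
 *           decompressor (assumed to parallelise perfectly across chunks),
 *   1 - p = work that stays serial (e.g. the rzip stage),
 *   n     = number of CPUs. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

With a hypothetical `p` of 0.5 on 4 CPUs this gives 1.6x, while a hypothetical `p` of 0.95, the sort of share you might expect from a very slow backend, gives about 3.5x.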

Going to my regular benchmark, here are the before and after decompression times on a quad core machine:

Options    Single-threaded  Multithreaded
lrzip      4m32s            3m07s
lrzip -M   4m05s            2m50s
lrzip -l   3m12s            2m23s
lrzip -lM  2m57s            2m20s
lrzip -zM  4h08m            1h12m

Note that -z compression and decompression had slightly braindamaged screen output in v0.530, which is also what the benchmark above used, so it actually didn't perform as well as it will in v0.540, where that has been fixed. Clearly, though, it's massively faster.
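The per-row speedups in that table are simply single-threaded time divided by multithreaded time. A small C helper makes the arithmetic explicit; `parse_duration()` and `speedup()` are hypothetical names, and the parser only understands the h/m/s style of duration used in the table.

```c
#include <stdlib.h>

/* Parse durations such as "4m32s", "04h08m" or "72m0s" into seconds.
 * Returns -1 on malformed input. */
double parse_duration(const char *s)
{
    double total = 0.0;
    while (*s) {
        char *end;
        double v = strtod(s, &end);
        if (end == s)
            return -1.0;            /* no number where one was expected */
        switch (*end) {
        case 'h': total += v * 3600.0; break;
        case 'm': total += v * 60.0; break;
        case 's': total += v; break;
        default:  return -1.0;      /* missing or unknown unit */
        }
        s = end + 1;                /* skip past the unit letter */
    }
    return total;
}

/* Speedup = single-threaded wall time / multithreaded wall time. */
double speedup(const char *single, const char *multi)
{
    return parse_duration(single) / parse_duration(multi);
}
```

That works out to about 1.45x for plain lrzip, 1.44x with -M, 1.34x with -l and 1.26x with -lM, against about 3.4x for -zM, which fits the point that the slowest backend benefits most.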

Summary: Kick arse.

So I'm very pleased with the performance of lrzip on SMP now. With the help of a friend online who had access to a 48-CPU machine (no, he wasn't running BFS :\), we tried various files to see how far we could get lrzip to scale. To actually use all the CPUs, we needed a file large enough to provide work for most of them, and to keep them busy long enough to show the effect of that scaling.

lrzip -z linux-2.6.36.tar
real 1m6.552s
user 17m16.660s

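So this ran roughly 16x faster than it would have run single threaded (about 17 minutes of CPU time in just over a minute of wall time), but the file still wasn't large enough to keep all the CPUs busy.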

The KDE source fit the bill reasonably well, at 1967093760 bytes (about 1.8GiB).

lrzip -z kde.tar
real 4m33.285s
user 92m26.650s

This one ran over 20x faster than it would have run single threaded. I don't know what the upper limit will be, but clearly this is a massive improvement in performance, and it brings zpaq up to usable speeds with enough cores.
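Those speedup estimates come from dividing the user CPU time by the real (wall-clock) time in the `time` output. The helpers below are hypothetical and make one assumption explicit: that a single-threaded run would take roughly as long as the user time, which ignores system time and any threading overhead, so the ratios are approximate.

```c
/* Effective parallelism from `time` output: user CPU seconds divided by
 * wall-clock (real) seconds. Treating user time as a stand-in for the
 * single-threaded run time is an assumption; it ignores system time and
 * any threading overhead. */
double parallelism(double user_s, double real_s)
{
    return user_s / real_s;
}

/* Average fraction of the machine kept busy. */
double efficiency(double par, int ncpus)
{
    return par / ncpus;
}

/* Convert the "XmY.YYYs" fields printed by the shell's time builtin. */
double mins_secs(int m, double s)
{
    return m * 60.0 + s;
}
```

For the kernel tarball this gives about 15.6, and for the KDE tarball about 20.3; spread over 48 CPUs, that is roughly a third and two fifths of the machine kept busy on average.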

Get it here (too lazy to post on freshmeat right now):
