This technique needs a file that was compressed into enough chunks in the first place. The absolute minimum any file will have is two chunks, because rzip always uses 2 streams, but it will obviously scale best with files already compressed with the multithreaded version.
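To illustrate why the chunk count matters, here is a minimal Python sketch of decompressing independently compressed chunks in parallel. It is not lrzip's code or file format: zlib stands in for the backend compressor, and `compress_chunks`, `decompress_parallel` and the chunk size are names and values made up for the example. The point is only that each chunk can be handed to a separate worker, so a file with two chunks can keep at most two CPUs busy.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into fixed-size pieces and compress each one on its own.

    Hypothetical helper: zlib is only a stand-in for a real backend.
    """
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_parallel(chunks: List[bytes], workers: int = 4) -> bytes:
    """Decompress every chunk in a worker pool and rejoin them in order.

    pool.map returns results in input order, so the output is rebuilt
    correctly even when chunks finish at different times.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    original = bytes(range(256)) * 1000
    chunks = compress_chunks(original, 64 * 1024)
    restored = decompress_parallel(chunks, workers=4)
    print(len(chunks), "chunks, round trip ok:", restored == original)
```

The useful parallelism is capped at `min(len(chunks), workers)`, which is why a file written by the multithreaded compressor, with many chunks, gives the decompressor more to work with.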
So how does it perform? Well, much like the compression side, the slower the backend compressor in use, the better the speedup with more CPUs.
Going to my regular benchmark, here are the before and after effects on decompression on a quad core:
Options      Singlethread  Multithread
lrzip        4m32s         3m07s
lrzip -M     4m05s         2m50s
lrzip -l     3m12s         2m23s
lrzip -lM    2m57s         2m20s
lrzip -zM    04h08m        72m0s
Note that the -z compress/decompress had a slightly braindamaged screen output in v0.530, and in the benchmark above, so it actually didn't perform as well there as it will in v0.540, where that's been fixed. Clearly, though, it's massively faster.
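For anyone who wants the table as speedup ratios, here is a small Python sketch that converts the times above to seconds and divides single-threaded by multithreaded. The helper names (`to_seconds`, `speedups`) are just made up for the example; the timings are copied straight from the table.

```python
import re

# (options, single-threaded time, multithreaded time) from the table above
RESULTS = [
    ("lrzip", "4m32s", "3m07s"),
    ("lrzip -M", "4m05s", "2m50s"),
    ("lrzip -l", "3m12s", "2m23s"),
    ("lrzip -lM", "2m57s", "2m20s"),
    ("lrzip -zM", "04h08m", "72m0s"),
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert strings such as '4m32s' or '04h08m' to whole seconds."""
    return sum(
        int(number) * UNIT_SECONDS[unit]
        for number, unit in re.findall(r"(\d+)([hms])", text)
    )


def speedups(results):
    """Map each option string to single-threaded time / multithreaded time."""
    return {
        options: to_seconds(single) / to_seconds(multi)
        for options, single, multi in results
    }


if __name__ == "__main__":
    for options, ratio in speedups(RESULTS).items():
        print(f"{options:<10} {ratio:.2f}x")
```

On these numbers the faster backends come out at roughly 1.3x to 1.5x on the quad core, and zpaq at about 3.4x, which fits the pattern that the slower the backend, the better it scales.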
Summary: Kick arse.
So I'm very pleased with the performance of lrzip on SMP now. With the help of a friend online who had access to a 48 CPU machine (no he wasn't running BFS :\), we tried various files to see how much we could get lrzip to scale. To actually use all the CPUs, we needed a file large enough to provide work for most of the CPUs, and to keep them busy for long enough to show the effect of that scaling.
lrzip -z linux-2.6.36.tar
So this ran about 17x faster than it would have run single threaded, but still wasn't large enough.
The kde source fit that reasonably well, at 1967093760 bytes.
lrzip -z kde.tar
This one ran over 20x faster than it would have run single threaded. I don't know what the upper limit will be, but clearly this has been a massive improvement in performance, and brings zpaq into usable speeds with enough cores.
Get it here (too lazy to post on freshmeat right now): lrzip.kolivas.org