A few bug reports have been trickling in since I last released a version of lrzip, which, as I said last time, is both good and bad. It's helped direct my workflow on this, even though I keep trying to let it settle down. I'd love to leave the code alone since it mostly works quite well, and the common mantra says that "release early, release often" is a good thing, but in practice that doesn't seem to work. If you release stuff while it's still broken, it gets a bad reputation, and people will dismiss it that time around and often not give it a chance again. So if there are show-stopper bugs, they need attending to ASAP.
So anyway, the fun part is implementing stuff rather than just fixing bugs, so for the next release I did both, since I had to fix the bugs I knew about anyway. I've mentioned before that lrzip breaks data into rzip streams and compresses each stream a block at a time. I had implemented multithreading by splitting each block up according to the number of CPUs, but at the end of a stream it waited to get all the blocks back before moving onto the next stream. For this release I moved the threading earlier in the overall process, allowing the rzip stage to move onto the next stream while the back end threads are still busy compressing data from the previous stream. This means that if a file is large enough (greater than approximately 2/3 of ram), once all CPUs are in use with back end threads, they should remain in use until the end of the file is reached. This speeds up compression of large files on SMP machines.
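In pseudocode terms, the change amounts to not joining the back end threads at every stream boundary, only once at end of file. Here is a toy pthread sketch of the idea — none of these names are from the real lrzip source, and the "compression" is just a counter:

```c
#include <pthread.h>

#define NBLOCKS 8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int compressed;

/* stand-in for an lzma/bzip2/zpaq back-end thread doing real work */
static void *compress_block(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    compressed++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* spawn a back-end thread per block of each stream, and only wait
   for them once, at end of file, rather than per stream */
int run_pipeline(void)
{
    pthread_t threads[NBLOCKS];
    int i;

    for (int stream = 0; stream < 2; stream++) {
        for (int b = 0; b < NBLOCKS / 2; b++) {
            i = stream * (NBLOCKS / 2) + b;
            pthread_create(&threads[i], NULL, compress_block, NULL);
        }
        /* the old behaviour: a pthread_join() loop sat here,
           stalling the rzip stage at every stream boundary */
    }
    for (i = 0; i < NBLOCKS; i++)
        pthread_join(threads[i], NULL);
    return compressed;
}
```

The real code obviously has to hand each thread its block of data and cap the thread count at the number of CPUs, but the structural change is just where the join happens.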
While looking for bugs to do with lzma compression failing due to running out of ram, I discovered why changing from level 7 compression to level 9 did nothing... It's been a long time since the lzma library was imported into lrzip, and I had completely forgotten that it only goes up to level 7! The default lzma compression level is meant to be 5, so I've scaled it down, meaning lrzip will now use the best all-round compromise level for lzma. This also affects the window sizes used internally by the lzma library. At level 5 the windows are 16MB in size, going up to 64MB at level 7. As one kind person noted in the comments on this blog, lzma uses up to 12x the memory of the compression window, so each thread could have needed up to 800MB. This explained why people were seeing lzma fail on window sizes despite apparently having enough ram. While the new level loses slightly on compression, it also improves compression speed significantly, and should pretty much eliminate lzma compression failing due to "too big window sizes".
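The back-of-the-envelope memory maths looks like this. The 16MB and 64MB window sizes and the 12x multiplier are from the paragraph above; the doubling-per-level mapping in this helper is my illustrative guess, not lrzip's actual code:

```c
/* approximate lzma window size per compression level:
   16MB at level 5, doubling up to 64MB at level 7
   (illustrative mapping, valid for levels 5 to 7 only) */
static unsigned long lzma_window_mb(int level)
{
    return 16UL << (level - 5);
}

/* lzma can use up to 12x the window size in memory per thread */
static unsigned long lzma_mem_mb(int level)
{
    return 12 * lzma_window_mb(level);
}
```

So a single level 7 thread can want 12 x 64MB = 768MB (the "up to 800MB" above), and four such threads on a quad core already demand around 3GB before the rzip stage gets any ram at all.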
With the two speed improvements, here are the sort of speed ups one can see with version 0.550 of lrzip:
10GB virtual image on quad core with 8GB ram:
lrzip 15m45s -> 11m03s
lrzip -M 12m03s -> 9m30s
And just as a reminder, xz completed this in 50m58s and the file size was almost halved by lrzip compared to xz.
Benchmarks can be found here:
In version 0.544 of lrzip, I had changed multithreading to use two separate groups of threads for stream 0 and stream 1, to make decompression more robust for out-of-order storage in the archive. Unfortunately this doubled the possible number of threads, and that brought a now all-too-familiar problem: running out of ram on big files. In 0.550 I changed it to use only one thread for stream 0 rather than multiple ones, since there is usually only one block of stream 0 before multiple blocks of stream 1. This can slow down decompression a little, but reliability is more important.
I put some effort into tackling the dreaded running-out-of-ram problem with multiple threads all competing for ram. What lrzip does now is detect when the compression back end has failed (which is usually for memory reasons) and, instead of failing completely, wait for the previous thread to complete its compression before trying again. This serialises work that would otherwise have been done in parallel, so less ram is required since only one thread is likely to be using it at a time. I did this on both compression and decompression. I also made the lzma back end first try a lower compression level (since this uses a smaller window) before failing, and finally resort to bzip2 if it never succeeds. This should make running out of memory a non-fatal problem, since lrzip will just try again using less ram at the cost of speed.
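The fallback cascade might be sketched like this. Everything here is illustrative: `try_lzma` is a stub standing in for the real back end (its second parameter fakes the level at which the window fits in ram), and the wait-for-the-previous-thread step is elided:

```c
#include <string.h>

/* hypothetical return codes, for illustration only */
enum { COMP_OK, COMP_NOMEM };

/* stub back end: pretend lzma only succeeds once the level (and
   hence the window) drops below some ram-dependent threshold */
static int try_lzma(int level, int ram_ok_below_level)
{
    return level < ram_ok_below_level ? COMP_OK : COMP_NOMEM;
}

/* step the lzma level down (smaller window each time) before
   finally resorting to bzip2 -- sketch, not lrzip's actual code */
const char *compress_with_fallback(int level, int ram_ok_below_level)
{
    for (; level >= 1; level--) {
        if (try_lzma(level, ram_ok_below_level) == COMP_OK)
            return "lzma";
    }
    return "bzip2";
}
```

The point is that a memory failure becomes a retry loop rather than a fatal error: the output may be compressed at a lower level, or with a different back end, but it is still produced.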
On the subject of constraining memory usage, I finally bit the bullet and dropped the maximum mmap down to only 1/3 of ram, since I can use my sliding mmap to cover the rest of the chunk. It is a tiny bit slower, but far more reliable, and leaves a lot more ram for the compression back ends. This means my sliding mmap implementation, which was originally just an experiment, is now used routinely for any file larger than 1/3 of ram.
My "spreading out of thread spawning evenly" idea in the last release was a failure. I reverted it since it provided no speed ups and occasionally slowed things down.
I fixed a few other minor errors to do with not callocing enough ram for some structs, which looked really scary, but I have no idea if they were causing problems. Hopefully they fixed some random problem somewhere :P
So I was very proud of this release and made it go gold as version 0.550...
About 5 minutes after releasing it, I found a couple of new bugs that were pretty nasty. When lzma worked on an incompressible block, it now failed outright instead of just leaving that block uncompressed. That was an easy fix. I also found that I had broken compression from STDIN a while back without noticing, and the compression summary at the end was reading 0. So I quickly corrected those and bumped the version up to 0.551.
So get it here:
No doubt more bugs will show up and I'll have to tend to those as well, but this should be a more robust version since a lot of failure cases in 0.544 have been addressed.
Other ideas I'm considering for lrzip at this stage, based on requests, include a good error testing and recovery mechanism using either Reed-Solomon or low-density parity-check codes, and encrypted password protection. I have looked at a few libraries for both of these, but am open to suggestions if people have recommendations.
EDIT: People didn't understand how the chunks and streams work, so I've copied this comment I made below into the blogpost.
Lrzip splits files up multiple times. Imagine a file is twice the size of ram. It will first be split into chunks:
A B C
where each chunk is 1/3 the size of the file.
Then the rzip stage is performed on chunk A, splitting it into streams:
A0 A1
where A0 is the "dictionary" and A1 the "matches".
Then each stream is split up into further chunks for the compression back end:
A1A A1B A1C A1D A1E
and so on, and each chunk is passed onto the compressor (such as lzma) and the results written to disk.
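To make the naming concrete, here is a throwaway helper that builds those labels — the lettering scheme is just the one used in this explanation, not anything in lrzip itself:

```c
#include <stdio.h>

/* build a block label like "A1B": chunk letter, stream number,
   back-end block letter (illustrative only) */
static const char *block_label(char chunk, int stream, int block)
{
    static char buf[4];
    snprintf(buf, sizeof buf, "%c%d%c", chunk, stream, 'A' + block);
    return buf;
}
```

So the second back-end block of stream 1 of the first chunk is A1B, and so on through the file.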
All this used to be done in series.
Version 0.530 made the last chopping-up part happen in parallel, since the slow compression back end (lzma or zpaq) was the time-consuming part of compression. However, once streams A0 and A1 had finished their rzip stage, lrzip would wait for A0A A0B A1A A1B etc. to finish compressing completely before moving onto chunk B.
Version 0.551 now allows lrzip to move onto chunk B while A0A A0B A1A A1B etc. are all still compressing.
This means that with the faster compression back ends (lzo, gzip and bzip2), the rate-limiting step is the stream production of A0 and A1. The pipe dream is to parallelise those too, but they rely on using as much ram as possible to make the rzip stage effective, so it would compromise compression greatly (and be messy as hell to implement).