Yep, there are random fixes in this one.
Oh wait, you probably want more detail than that.
Lrzip splits the data into two streams during its rzip stage: one (stream 0) with the "dictionary" parts it ends up finding multiple copies of, and the other (stream 1) with the rest. It turns out that during the multithreaded decompression modification I made the naive assumption that there would be only one chunk of stream 0 data per compressed window. Unfortunately I can't tell which stream a chunk belongs to until I've already read it, and throwing lots of chunks into threads to decompress means I potentially have a whole lot of data in a different order from the one the rzip re-expansion stage expects. I was loath to modify the lrzip archive format yet again so soon after changing it, so with the help of the only person I could find who could reproduce the decompression problem, I worked out how to predict which stream the data came from and modified the decompression stage accordingly. I'm still not happy with it because it still feels fragile, but I need more time to investigate to ensure I'm doing the right thing. This is unlikely to hit anyone decompressing anything less than many, many gigabytes in size (the archive itself is still fine, and the previous non-multithreaded version should decompress it ok).
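Roughly, the reordering side of it looks like this; a simplified sketch in plain pthreads with made-up names, not the actual lrzip code:

/* Each chunk records which stream it belongs to and its original
 * position, so however the worker threads finish, the rzip
 * re-expansion stage can pull chunks back out in the order it
 * expects. */
#include <pthread.h>
#include <stddef.h>

struct chunk {
    int stream;          /* 0 = "dictionary" data, 1 = the rest */
    size_t seq;          /* original position in the stream */
    unsigned char *buf;  /* decompressed data, filled in by a worker */
    size_t len;
    int done;            /* worker sets this and signals when finished */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Called by a worker thread when its chunk is decompressed. */
static void chunk_ready(struct chunk *c)
{
    pthread_mutex_lock(&lock);
    c->done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Called by the re-expansion stage: blocks until the next
 * sequential chunk is ready, regardless of completion order. */
static struct chunk *next_in_order(struct chunk *chunks, size_t expect)
{
    pthread_mutex_lock(&lock);
    while (!chunks[expect].done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return &chunks[expect];
}

The fragile part is predicting which stream a chunk belongs to before handing it to a thread; the in-order reassembly itself is the easy bit.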
Much more importantly for common-case usage, I limited the lrzip compression window in 0.542 to only 300MB because it seemed to break with larger windows on 32 bit. Unfortunately, I applied the limit -before- chopping the window up into smaller chunks for multithreading, and on all architectures, not just 32-bit ones. So I fixed that, which means the default compression window should be significantly larger in 0.543.
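In rough pseudo-C (hypothetical names, just to illustrate the ordering of the fix):

/* Illustration only: the 300MB safety cap now applies per chunk,
 * after the window is divided among the threads, and only on
 * 32-bit builds. */
#include <stddef.h>

#define CAP_32BIT ((size_t)300 * 1024 * 1024)

static size_t chunk_size(size_t window, int threads)
{
    size_t chunk = window / threads;

    /* 0.542 capped `window` itself before the division, on every arch */
    if (sizeof(void *) == 4 && chunk > CAP_32BIT)
        chunk = CAP_32BIT;
    return chunk;
}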
Finally, it was clear that on extra large files the rate-limiting step in multithreaded workloads was the rzip stage, and since the compression/decompression can now run in a separate thread from the rzip component, I made the rzip process run at a lower 'nice' level than the back-end threads, which afforded a small speed-up too.
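On Linux, nice values apply per thread, so the back-end workers can simply renice themselves upwards when they start; something like the sketch below (not necessarily how lrzip actually does it):

/* Raise the nice value of the calling thread only, leaving the
 * rzip thread at its lower (more favourable) nice level.
 * setpriority() with a Linux thread id affects just that thread. */
#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int renice_this_thread(int nice_val)
{
    pid_t tid = syscall(SYS_gettid);

    return setpriority(PRIO_PROCESS, tid, nice_val);
}

/* e.g. each back-end compression thread starts by calling
 * renice_this_thread(19); */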
I still have more planned for lrzip but was hoping to take a small break for a while to do other things now that it's more or less stable. In the meantime, apparently when it's run on a system with uclibc instead of glibc, sysconf() fails to report RAM properly, which needs fixing. That should be easy enough to work around (sketched below), but is a little disappointing from uclibc.

The other planned changes involve committing a more robust fix for archives with multiple stream 0 chunks, and more multithreading improvements. Currently one chunk at a time is either compressed or decompressed concurrently with the rzip preprocessing/expansion stage, and then all the threads need to complete before moving on to the next chunk. The next logical change is to move the threading even higher up the chain to be able to process multiple chunks concurrently, keeping all CPUs busy at all times. This last change is unlikely to make a huge difference on default settings, but should speed up zpaq-based de/compression even more when a file is large enough (bigger than available RAM).
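For the uclibc workaround, the obvious approach is to fall back to asking the kernel directly when sysconf() comes up empty, roughly like this (a sketch, not the committed fix):

/* If sysconf() can't report physical pages, as reported on
 * uclibc, ask the kernel directly via sysinfo(). */
#include <unistd.h>
#include <sys/sysinfo.h>

static unsigned long long detect_ram(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long size = sysconf(_SC_PAGE_SIZE);
    struct sysinfo si;

    if (pages > 0 && size > 0)
        return (unsigned long long)pages * size;
    if (!sysinfo(&si))
        return (unsigned long long)si.totalram * si.mem_unit;
    return 0;   /* give up; caller picks a conservative default */
}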
Oh, and lrzip on OSX still needs fixing (see the previous post about named semaphores).