One bug of note was that the md5 calculation on files that had compressed blocks greater than 4GB in size was wrong. This was very suspicious for a 32 bit overflow error. Indeed Serge Belyshev did some excellent detective work and found the culprit to be in the glibc implementation of md5, which is used by lrzip. This only affects using the md5 library components, not the md5sum command line utility which uses a different rolling algorithm so glibc userspace never hit it. The bug in question was amusing in the way it shows one of the many naive ways we dealt with 32 bit limitations in the past. It assumed anything larger than a 32bit chunk was just 2^31 + (chunk size modulo 2^31). That means it would never work with a chunk larger than 2^32. The fix has been pushed upstream and is now incorporated into lrzip.
Another bug, as reported on this blog by a commenter, was that of creating corrupt very small archives (less than 64 bytes). This has been fixed by disabling the back end compression when the chunk is less than 64 bytes and just using the rzip first stage.
A lot of the other work in this release was just getting it to compile on osx. Numerous issues showed up as always, and I didn't have access to an osx machine on the previous release to fix it. This time I used my wife's laptop ;) . One of the issues, for example, was that osx didn't see itself as #ifdef unix, which I thought was a little amusing. Another unexpected surprise was that the default osx filesystem is not case sensitive which caused a conflict lrzip.h vs Lrzip.h. Alas I have no other BSDs to try compiling it on so I'm not sure if they're fixed with this.
Interestingly, I still have to disable md5 calculation on the osx build. The md5 is calculated the same on compression and decompression within lrzip, but it disagrees with the result returned from the ports version of md5! This defeats the whole purpose of including md5 in it since the point of it is to have a command line result to compare to. I'm guessing there's an endianness dispute there somewhere and haven't ever tracked it down, since osx has done an endian flip in the past. lrzip still uses crc32 checking of each block internally so it's not like there isn't any integrity checking.
Finally what would a release be without some new benchmarks? Nothing performance-wise has changed in lrzip since the last version, but I have access to a 12 thread CPU machine with 32GB of ram now, so I did some quick benchmarks with the classic 10GB virtual image I've been using till now.
Compression Size Percentage Compress Time Decompress Time None 10737418240 100.0 gzip 2772899756 25.8 3m56s 2m15s pbzip2 2705814394 25.2 1m41s 1m46s lrzip 1095337763 10.2 2m54s 2m21sNote that with enough ram and CPU, lrzip is actually faster than gzip (which does compression in place) and comparable on decompression, despite a huge increase in compression. pbzip2 is faster than both but its compression is almost no better than gzip.
Hmm, lrztar in version 0.613 gives me an error:ReplyDelete
lrztar -Uz test_directory
Cannot have -o and -O or -S together
Fatal error - exiting
altough lrztar from 0.612 worked perfectly.
Darn, did I break it? I guess I should rush a .614 release when I can. Thanks for bug report.ReplyDelete
No problem. Oh, and good job with the BFS ans BFQ. I just can't imagine living without them.ReplyDelete