A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, MuQSS, BFS and -ck.
Tuesday, 31 July 2012
Once again I find myself writing a post saying there will be delays with the resync of BFS and -ck for the new linux kernel. This time, for most people, the reason will be quite unexpected. As you may have read on this blog last year, I was invited to interview with Google for a job as a software engineer, and in the end I was turned down for lacking adequate breadth of knowledge. That was probably for the best anyway, since I have a full-time unrelated career and the jump would have been too great. Then a small company noticed the OpenCL work I had done on cgminer for bitcoin and asked if I was interested in writing some software for them. The work involves writing OpenCL frameworks so they can provide distributed computing capability to clients. They were quite happy to forgo the regular interview and pretty much everything else normally involved in employing someone, and before long we were talking contracts instead. Since the work itself looked like a lot of fun, I decided to take the opportunity.
Anyway, long story short, I'm doing a little contract work for them, and my kernel work will take a slightly lower priority in the meantime. I'm not abandoning it, but the next release will be delayed some more. Apologies for any inconvenience this may cause in the interim.
Friday, 13 July 2012
lrzip-0.614
This release is a quick hotfix for broken lrztar in lrzip 0.613.
https://freecode.com/projects/long-range-zip
Sunday, 8 July 2012
lrzip-0.613
lrzip 0.612 has been out in the wild for a while now, and the good news is that there have been very few bug reports in that time. After letting enough issues accumulate in my inbox, I've created a pure-bugfix maintenance release in version 0.613:
long-range-zip 0.613
One bug of note was that the md5 calculation was wrong on files with compressed blocks larger than 4GB. That looked very much like a 32-bit overflow, and indeed Serge Belyshev did some excellent detective work and found the culprit in the glibc implementation of md5, which lrzip uses. This only affects code using the md5 library functions directly, not the md5sum command line utility, which uses a different rolling algorithm, so ordinary glibc userspace never hit it. The bug was amusing in the way it shows one of the many naive ways we dealt with 32-bit limitations in the past: it assumed anything larger than a 32-bit chunk was just 2^31 + (chunk size modulo 2^31), which means it could never work with a chunk larger than 2^32. The fix has been pushed upstream and is now incorporated into lrzip.
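To make the arithmetic concrete, here is a small Python sketch of the assumption exactly as described above. It is not glibc's code, only the same formula applied to chunk sizes of at least 2^31 bytes: reconstructing a size n as 2^31 + (n mod 2^31) happens to be right for every n below 2^32 and wrong for every n from 2^32 up, which fits the symptom of only blocks over 4GB being affected.

```python
# Illustration of the size assumption as described in the post; this is NOT
# glibc's md5 source, just the same arithmetic for chunk sizes >= 2^31.
TWO_31 = 2 ** 31

def naive_size(n: int) -> int:
    """Reconstruct a chunk size the naive way: 2^31 + (n mod 2^31)."""
    return TWO_31 + (n % TWO_31)

def naive_is_correct(n: int) -> bool:
    """True when the naive reconstruction gives back the real size."""
    return naive_size(n) == n

# Sizes from 2^31 up to just below 2^32 survive; anything from 2^32 (4GB) up does not.
for n in (2 ** 31, 2 ** 32 - 1, 2 ** 32, 5 * 2 ** 30):
    status = "ok" if naive_is_correct(n) else "WRONG"
    print(f"{n:>12} -> {naive_size(n):>12}  {status}")
```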
Another bug, reported on this blog by a commenter, was that lrzip created corrupt archives from very small files (less than 64 bytes). This has been fixed by disabling the back-end compression when a chunk is less than 64 bytes and using only the rzip first stage.
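A minimal sketch of that kind of fallback, assuming each chunk carries a tag saying how it was stored. The function names, the tag strings and the use of Python's zlib as a stand-in back end are all hypothetical, not lrzip's actual code; the only detail taken from the text is the 64-byte threshold below which the back end is skipped.

```python
import zlib

# Hypothetical sketch: names, tags and zlib as the back end are assumptions,
# not lrzip's real implementation. Only the 64-byte threshold comes from the post.
MIN_BACKEND_CHUNK = 64  # chunks smaller than this skip back-end compression

def compress_chunk(rzip_output: bytes) -> tuple[str, bytes]:
    """Return (method, payload) for one chunk after the rzip first stage."""
    if len(rzip_output) < MIN_BACKEND_CHUNK:
        # Too small for the back end: store the first-stage output as-is.
        return ("none", rzip_output)
    return ("zlib", zlib.compress(rzip_output))

def decompress_chunk(method: str, payload: bytes) -> bytes:
    """Undo compress_chunk using the stored method tag."""
    if method == "none":
        return payload
    if method == "zlib":
        return zlib.decompress(payload)
    raise ValueError(f"unknown method: {method}")
```

Tagging each chunk keeps decompression unambiguous: the reader never has to guess from the size alone whether the back end was used.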
Much of the remaining work in this release was just getting it to compile on osx. Numerous issues showed up as always, and I didn't have access to an osx machine for the previous release to fix them. This time I used my wife's laptop ;) . One of the issues, for example, was that osx didn't see itself as #ifdef unix, which I found a little amusing. Another unexpected surprise was that the default osx filesystem is not case sensitive, which caused a conflict between lrzip.h and Lrzip.h. Alas, I have no other BSDs to try compiling it on, so I'm not sure whether they are fixed by this as well.
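A clash like lrzip.h vs Lrzip.h can be caught before it bites on a case-insensitive filesystem. This hypothetical helper groups file names that differ only in case; lrzip.h and Lrzip.h come from the text, the other names are just filler. Using casefold() is an approximation of how a case-insensitive filesystem compares names, not an exact model of the osx filesystem.

```python
from collections import defaultdict

def case_collisions(filenames):
    """Group names that would clash on a case-insensitive filesystem (approximation)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    # Only groups with more than one spelling are real collisions.
    return [sorted(names) for names in groups.values() if len(names) > 1]

print(case_collisions(["lrzip.h", "Lrzip.h", "main.c", "stream.c"]))
```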
Interestingly, I still have to disable md5 calculation on the osx build. lrzip calculates the same md5 on compression and decompression, but it disagrees with the result returned by the ports version of md5! That defeats the whole purpose of including md5, since the point is to have a result you can compare against a command line tool. I'm guessing there's an endianness disagreement somewhere, since osx has switched endianness in the past (from PowerPC to x86), but I've never tracked it down. lrzip still checks each block internally with crc32, so it's not as though there is no integrity checking.
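Since the cause was never tracked down, this is only a hypothetical illustration of how an endianness slip could make two md5 tools print different strings for identical data: the standard hex digest versus the same 16 bytes read as four little-endian 32-bit words and written back big-endian. The sample input is arbitrary.

```python
import hashlib
import struct

data = b"lrzip"  # arbitrary sample input
digest = hashlib.md5(data).digest()  # 16 bytes in RFC 1321 byte order

# What md5sum-style tools print: the 16 bytes in order, as hex.
canonical = digest.hex()

# Hypothetical slip: read the digest as four little-endian 32-bit words,
# then write each word back big-endian. This reverses the bytes within
# every 4-byte group, so the printed string no longer matches.
words = struct.unpack("<4I", digest)
swapped = b"".join(struct.pack(">I", w) for w in words).hex()

print("canonical:", canonical)
print("swapped:  ", swapped)
```

Both strings contain exactly the same bytes, only in a different order, which is why this kind of bug produces a digest that is consistent with itself yet disagrees with every other tool.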
Finally, what would a release be without some new benchmarks? Nothing performance-related has changed in lrzip since the last version, but I now have access to a 12-thread machine with 32GB of RAM, so I ran some quick benchmarks on the classic 10GB virtual image I've been using until now.
Compression  Size (bytes)  Percentage  Compress Time  Decompress Time
None         10737418240   100.0
gzip         2772899756    25.8        3m56s          2m15s
pbzip2       2705814394    25.2        1m41s          1m46s
lrzip        1095337763    10.2        2m54s          2m21s

Note that with enough RAM and CPU, lrzip actually compresses faster than gzip (which does compression in place) and is comparable on decompression, despite compressing far better. pbzip2 is faster than both, but its compression is barely better than gzip's.
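As a quick check of the benchmark arithmetic, this sketch recomputes the Percentage column (plus a compression ratio) from the byte counts in the table. Note that 10737418240 bytes is exactly 10 * 2^30, so the "10GB" image is 10 GiB.

```python
ORIGINAL = 10_737_418_240  # bytes: the uncompressed image, exactly 10 * 2**30

# Compressed sizes in bytes, copied from the benchmark table.
sizes = {
    "gzip": 2_772_899_756,
    "pbzip2": 2_705_814_394,
    "lrzip": 1_095_337_763,
}

def percent_of_original(size: int) -> float:
    """Compressed size as a percentage of the original, to one decimal place."""
    return round(100 * size / ORIGINAL, 1)

for name, size in sizes.items():
    ratio = ORIGINAL / size
    print(f"{name:<7} {percent_of_original(size):5.1f}%  ratio {ratio:.2f}:1")
```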
Tuesday, 3 July 2012
BFS 424, linux-3.4-ck3
As seen on this blog previously, a bug showed up in 3.4-ck2/BFS 423, related to unplugged I/O management, that could lead to severe stalls/hangs. I'm officially releasing BFS 424 and upgrading 3.4-ck2 to 3.4-ck3, which incorporates just this one change.
BFS 424:
3.4-sched-bfs-424.patch
3.4-ck3:
3.4-ck3/
Those already on -ck2 can simply apply the incremental patch to bring themselves up to date:
3.4bfs423-424.patch
Enjoy!