-ck hacking: bug

Showing posts with label bug. Show all posts

Tuesday, 20 March 2012

BFS issues and linux 3.3

Linux 3.3 is out, but I'm not releasing BFS for it at this stage. The reason is that a regression has been reported as showing up in BFS and it's proving hard to track down.

The issue involves a slowdown under load when the load is niced. The problem is the slowdown does not occur all the time, and I've only seen it in the wild once and have not been able to repeat it since. I've audited the code and have not yet found the culprit, but when it does happen, it is very obvious with mouse stalling for seconds. The 'top' output is usually a give away that something has gone wrong because the 'PR' column should normally report values of 1-41 for a nice 19 load. However when it happens, it will show values much higher, in the 42-81 range which should not happen. Unfortunately, the best hint for me would be to find what version of BFS it was introduced, and look for the change responsible, and since I can't even reproduce the problem most of the time, I can't do this regression testing.

So I'm appealing to the BFS users out there to see if anyone has this problem more regularly that has the time to try older versions of BFS. By older versions of BFS, I don't mean the same version of BFS on older kernels, but to try the first version of BFS that was available for that kernel. Running something continuously as a 'nice'd load is required to reproduce it, where the load is equal to the number of CPUs, so for example 'nice -19 make -j4' continuously in a kernel tree on a quad core machine. I'm hoping that someone out there is able to reproduce it and can do the regression testing. Thanks in advance. In the meantime, I'll keep auditing code and comparing new to old versions in the hope something stands out.

EDIT: An alternative approach was to try moving to 3.3 and make minor fixes along the way to see if the problem persists. Consider this patch a pre-release for now (CPU accounting appears all screwy still):

EDIT2: Fixed CPU accounting, bumped version to 418:
3.3-sched-bfs-418.patch

Friday, 3 June 2011

2.6.39 BFS test 9 - is this the one?

Hopefully this test patch should fix all the problems with BFS 404 on 2.6.39:

bfs404-test9.patch
Ubuntu Packages : grab the 2.6.39-ck1-3 package

Please report back if you haven't already! Thanks to everyone who has tested so far! Your feedback has been absolutely essential on this weird and wonderful bug.

Monday, 30 May 2011

2.6.39 BFS progress

TL;DR: 2.6.39 BFS fixed maybe?

After walking away from the code for a while, annoyed at the bug I couldn't track down, I had another good look at what might be happening. It appears that while the grq lock is dropped in schedule() to perform the block plug flush, a call to the task via try_to_wake_up may be missed entirely, leaving the task deactivated when it should actually keep running. Anyway, first tests from the people on these blog comments are reassuring.

Here is a cleaned up and slightly modified version of the "test8" patch that has so far been stable and shows to have fixed the problem for a handful of people:

Apply to 2.6.39-ck1 or 2.6.39 with BFS 404:
bfs404-recheck_unplugged.patch

In response to requests for packaged versions, I've uploaded a 2.6.39-ck1-2 ubuntu package which includes this change:
Ubuntu Packages

Please test and report back! If this fixes the problem, I'll be releasing it as ck2.

Thursday, 21 April 2011

BFS 0.401

I was meant to be on holidays this week, and indeed I've been away from home somewhere warm. While BFS was supposed to be the last thing I cared about, I was fortunate enough to have other people actually find some bugfixes to BFS. First up was _sid_ who found some very small optimisations that I've committed to the new version of BFS. But even more impressively, Serge Belyshev found a long standing bug that would cause bad latencies when Hz values were low, due to the "last_ran" variable not being set. This may well have been causing a significant latency disadvantage to BFS when Hz was 100.

As you can see in this graph, worst case latencies could be 100 times better with this bug fixed. While it will affect all Hz values, it is most significant at low Hz and probably unnoticeable by the time you're on 1000Hz. Those who are on low Hz configurations, especially those on say android, will notice a dramatic speedup moving to BFS 401.

So get it here (available for 2.6.38.3, 2.6.35.12 and 2.6.32.38):
BFS PATCHES

Again, thanks VERY much to the testers and even more to those contributing bugfixes and code.

Friday, 25 February 2011

lrzip-0.570 for the uncorrupt.

When I last blogged about lrzip, I mentioned the corruption on decompression issue a user was seeing in the field. This bug, not surprisingly, worried me greatly so I set out on a major hunt to eliminate it, and make lrzip more reliable on decompression. After extensive investigation, and testing on the part of the user, to cut a long story short, the corruption was NEVER THERE.

The problem he was encountering was on decompressing a 20GB logfile, he would compare it to the original file with the 'cmp' command. On decompressing the file and comparing it, there would be differences in the file at random places. This made me think there was a memory corruption somewhere in lrzip. However he also noted that the problem went away on his desktop machine when he upgraded from Debian Lenny to Squeeze. So we knew something fishy was going on. Finally it occurred to me to suggest he try simply copying the 20GB logfile and then running 'cmp' on it. Lo and behold just copying a file of that size would randomly produce a file that had differences in it. This is a disturbing bug, and had it been confined to one machine, would have pointed the finger at the hardware. However he had reproduced it on the desktop PC as well, and the problem went away after upgrading his distribution. This pointed to a corruption problem somewhere in the layers between write() and what ends up on the disk. Anyway this particular problem is now something that needs to be tackled elsewhere (i.e. Debian).

Nonetheless, the corruption issue got me thinking about how I could make lrzip more reliable on decompression when it is mandatory that what is on disk is the same as what was originally compressed. Till now, lrzip has silently internally used crc32 to check the integrity of each decompressed block before writing it to disk. crc32 still has its place and is very simple, but it has quite a few collisions once you have files in the gigabyte size (collisions being files with the same CRC value despite being different). Fortunately, even with a hash check as simple as CRC, if only one byte changes in a file, the value will never be the same. However the crc was only being done on each decompressed chunk and not the whole file. So I set out to change over to MD5. After importing the MD5 code from coreutils and modifying it to suit lrzip, I added an md5 check during the compression phase, and put the MD5 value in the archive itself. For compatibility, the CRC check is still done and stored, so that the file format is still compatible with all previous 0.5 versions of lrzip. I hate breaking compatibility when it's not needed. On decompression, lrzip will now detect what is the most powerful hash function in the archive and use that to check the integrity of the data. One major advantage of md5 is that you can also use md5sum which is standard on all modern linux installations to compare the value to that stored in the archive on either compression or decompression. I took this idea one step further, and added an option to lrzip (-c) to actually do an md5 on the file that has been written to disk on decompression. This is to ensure that what is written on disk is what was actually extracted! The Debian lenny bug was what made me think this would be a useful feature. I've also added the ability to display the md5 hash value with a new -H option, even if the archive was not originally stored with an md5 value.

One broken "feature" for a while now has been multi-threading on OS-X. I have blogged previously about how OS-X will happily compile software that uses unnamed semaphores, yet when you try to run the program, it will say "feature unimplemented". After looking for some time at named semaphores, which are clunky in the extreme by comparison, it dawned on me I didn't need semaphores at all and could do with pthread_mutexes which are supported pretty much everywhere. So I converted the locking primitive to use mutexes instead, and now multi-threading on OS-X works nicely. I've had one user report it scales very well on his 8-way machine.

Over the last few revisions of lrzip, apart from the multi-threaded changes which have sped it up, numerous changes to improve the reliability of compression/decompression (to prevent it from running out of memory or corrupting data) unfortunately also have slowed it down somewhat. Being a CPU scheduler nut myself, I wasn't satisfied with this situation so I set out to speed it up. A few new changes have made their way into version 0.570 which do precisely that. The new hash check of both md5 and crc, which would have slowed it down now with an extra check, are done now only on already buffered parts of the main file. On a file that's larger than your available ram, this gives a major speed up. Multi-threading now spawns one extra thread as well, to take into account that the initial start up of threads is partially serialised, which means we need more threads available than CPUs. One long term battle with lrzip, which is never resolved, is how much ram to make available for each stage of the rzip pre-processing and then each thread for compression. After looking into the internals of the memory hungry lzma and zpaq, I was able to more accurately account for how much ram each thread would use, and push the amount of ram available per compression thread. The larger the blocks sent to the compression back end, the smaller the resulting file, and the greater the multi-threading speed up, provided there's enough data to keep all threads busy. Anyway the final upshot is that although more threads are in use now (which would decrease compression), compression has been kept approximately the same, but is actually faster.

Here's the latest results from my standard 10GB virtual image compression test:

Compression  Size         Percentage  Compress Time  Decompress Time
None         10737418240  100.0
gzip         2772899756    25.8          5m47s        2m46s
bzip2        2704781700    25.2         16m15s        6m19s
xz           2272322208    21.2         50m58s        3m52s
7z           2242897134    20.9         26m36s        5m41s
lrzip        1372218189    12.8         10m23s        2m53s
lrzip -U     1095735108    10.2          8m44s        2m45s
lrzip -l     1831894161    17.1          4m53s        2m37s
lrzip -lU    1414959433    13.2          4m48s        2m38s
lrzip -zU    1067075961     9.9         69m36s       69m35s

Lots of other internal changes have gone into it that are too numerous to go into depth here (see the Changelog for the short summary), but some user visible changes have been incorporated. Gone is the annoying bug where it would sit there waiting for stdin input if it was called without any arguments. The help information and manual page have been dramatically cleaned up. The -M option has been abolished in favour of just the -U option being used. The -T option no longer takes an argument and is just on/off. A -k option has been added to "keep corrupt/broken files" while corrupt/broken files generated on compression/decompression are automatically deleted by default. The -i information option now gives more information, and has verbose(+) mode to give a breakdown of the lrzip archive, like the following -vvi example:

Detected lrzip version 0.5 file.
../temp/enwik8.lrz:
lrzip version: 0.5 file
Compression: rzip + lzma
Decompressed file size: 100000000
Compressed file size: 26642293
Compression ratio: 3.753
MD5 used for integrity testing
MD5: a1fa5ffddb56f4953e226637dabbb36a
Rzip chunk 1:
Stream: 0
Offset: 25
Block   Comp    Percent Size
1       lzma    58.1%   867413 / 1493985        Offset: 22687516        Head: 0
Stream: 1
Offset: 25
Block   Comp    Percent Size
1       lzma    28.8%   5756191 / 20000000      Offset: 75      Head: 5756266
2       lzma    28.4%   5681891 / 20000000      Offset: 5756291 Head: 11438182
3       lzma    28.2%   5630256 / 20000000      Offset: 11438207        Head: 17068463
4       lzma    28.1%   5619003 / 20000000      Offset: 17068488        Head: 23554929
5       lzma    28.5%   3087298 / 10841364      Offset: 23554954        Head: 0
Rzip compression: 92.3% 92335349 / 100000000
Back end compression: 28.9% 26642052 / 92335349
Overall compression: 26.6% 26642052 / 100000000

I didn't bother blogging about version 0.560 because all the while 0.570 was under heavy development as well and I figured I'd wrap it all up as a nice big update instead. I'm also very pleased that Peter Hyman, who helped code for lrzip some time ago, has once again started contributing code.

That's probably enough babbling. You can get it here once freshmeat updates its links:
lrzip

Monday, 25 October 2010

Minor BFS 357 Bug(?) on 2.6.36

I received a bug report from someone running BFS on 2.6.36. They hit this BUG_ON in the new code present only in 2.6.36:

+static void try_to_wake_up_local(struct task_struct *p)
+{
+ struct rq *rq = task_rq(p);
+ bool success = false;
+
+ BUG_ON(rq != this_rq());

This looks fairly straight forward and this code path is only used by the new worker code in 2.6.36. However it shouldn't be hit unless something else is calling this function (indirectly via schedule()) wrongly. Anyway they hit it it seems via the iwlwifi code. No idea how, but it's actually harmless to wake up a task on another runqueue in BFS, so simply removing this BUG_ON fixes it.

Here's a patch to apply to BFS if you're running it on 2.6.36 and run into this bug:
bfs357-worker_fix.patch which just removes the BUG_ON. However it makes me wonder if this bug is in mainline and only those who hit this bug can confirm or otherwise by running 2.6.36 vanilla and there's no point reporting it when it's so vague.