Tuesday 20 March 2012

BFS issues and linux 3.3

Linux 3.3 is out, but I'm not releasing BFS for it at this stage. The reason is that a regression has been reported as showing up in BFS and it's proving hard to track down.

The issue involves a slowdown under load when the load is niced. The problem is the slowdown does not occur all the time, and I've only seen it in the wild once and have not been able to repeat it since. I've audited the code and have not yet found the culprit, but when it does happen, it is very obvious with mouse stalling for seconds. The 'top' output is usually a give away that something has gone wrong because the 'PR' column should normally report values of 1-41 for a nice 19 load. However when it happens, it will show values much higher, in the 42-81 range which should not happen. Unfortunately, the best hint for me would be to find what version of BFS it was introduced, and look for the change responsible, and since I can't even reproduce the problem most of the time, I can't do this regression testing.

So I'm appealing to the BFS users out there to see if anyone has this problem more regularly that has the time to try older versions of BFS. By older versions of BFS, I don't mean the same version of BFS on older kernels, but to try the first version of BFS that was available for that kernel. Running something continuously as a 'nice'd load is required to reproduce it, where the load is equal to the number of CPUs, so for example 'nice -19 make -j4' continuously in a kernel tree on a quad core machine. I'm hoping that someone out there is able to reproduce it and can do the regression testing. Thanks in advance. In the meantime, I'll keep auditing code and comparing new to old versions in the hope something stands out.

EDIT: An alternative approach was to try moving to 3.3 and make minor fixes along the way to see if the problem persists. Consider this patch a pre-release for now (CPU accounting appears all screwy still):

EDIT2: Fixed CPU accounting, bumped version to 418:


  1. I might be able to do some regression testing. What exactly would I be looking for? Higher values in the PR column? Would a virtual machine work to test this or could that create false positives? Would older hardware be more likely to expose this bug? My main machine is quite new (socket FM1) so I don't think that I would be able to run older kernels too far back, but I do have a PowerPC G3 Mac that should easily be able to run anything from 2.6.0 and up. What is the earliest release you suspect would exhibit this bug? Could the bug be specific to compiling with a particular version of gcc? Did the bug show up for you on a machine with hyperthreading?

  2. That would be much appreciated. I'm guessing it happened somewhere in the transition from late BFS 3xx to 416 so you wouldn't have to go back very far. I'm expecting it to be in the early BFS 400s, guessing linux 2.6.39ish. Judging by the symptom, I don't think it would matter whether it's hyperthread or not, but it might need to at least be SMP, dual core or dual thread or more. I'm led to believe it shows up after some time, so perhaps there's some accumulated counter that needs to go off before it becomes a problem. They giveaway would be a PRI > 41 for a non SCHED_IDLEPRIO task, and it exhibits this only in the setting of a niced load, where nice 19 has the most dramatic effect. It should be the same within a VM as a real machine.

  3. Could you release BFS beta for 3.3 to give it a test? I've got no problem you mentioned.

  4. I haven't done the port to 3.3 yet because all my BFS time is being consumed with tracking down this bug instead.

  5. Furthermore, 3.3 appears to have one of the biggest CPU scheduler code churns in a long time, meaning it will involve quite some work.

  6. Looks like I overestimated how much work it would be. Putting a link in the post for a preview patch.

    1. ck. Please fix the patch link. You add blogger website link before the real link. :D

  7. Thanks. Fixed. There is a known bug to do with displayed accounting but that shouldn't affect its behaviour.

  8. Okay, fixed the CPU accounting and bumped the version up to 418. There's an incremental called 417 stuff in the test/ directory if anyone just wants the fix.

  9. I just tried testing for the bug with 3.2-ck1 by compiling a kernel with "make -j4" niced to 19 on a quad cpu/core VM. The highest value I saw for priority was 41. It seems I am unable to induce the behavior this bug.

    1. Thanks for trying. This is why I was appealing for help here, because I can't reproduce it either and I haven't found what specific circumstances create the problem. Make -j4 niced to 19 is a very very very common workload on my quad core and I don't see it.

  10. I have just tried compiling kernel and KDE 4.8.1 on my Intel Core i7-2670QM with nice -n19 make -j8 on 3.3 with BFS 418. It looks like I am not affected by this bug either. I will try to compile something bigger as soon as possible and see if it will work fine.

  11. CK - I ran the standard make benchmark on 3.2.11/3.2.11+bfs v0.416 and 3.3.0/3.3.0+bfs v0.418 running it a total of 15 times. As expected, there is a statistically significant difference between both pairs of kernels (cfq/bfs under 3.2.11 and cfq/bfs under 3.3.0) with the bfs patched kernels out-performing the cfq kernels.

    Kudos as usual!

    The 'make benchmark' is compiling the linux 3.3.0 via 'make -j16 BzImage' and timing the result, the repeating 14 times. This is done via a bash script. Why 16 threads? I tested this on a dual quad system (with HT enabled).

    .post h3 {
    background:url(http://s14.postimage.org/s3vhdkxlt/anova.png) no-repeat;
    margin:.25em 0 0;
    padding:0 0 4px 30px;
    font-family:trebuchet ms;

    Here is the ANOVA plot of the data: http://s14.postimage.org/s3vhdkxlt/anova.png

    Here is the data table for those who want to analyze it on their own: http://pastebin.com/7W0HADvH

  12. Thanks for those as usual. Note that you can add links to your comments with ordinary html by linking anova.png

    I'll be posting a release candidate shortly for BFS with a few minor code changes as well. I notice how easy it is to type cfq when you meant cfs too ;)

  13. I see a new BFS-420, but this my experience was with bfs-418, therefore I tell it here. This night I compiled chromium on my mac-mini dual-core 64bit machine and using nice, which resulted htop:

    1734 ral 1 0 374M 19396 8360 S 0.0 0.5 1:34.62 ├─ /usr/bin/yakuake
    4599 ral 1 0 24132 2416 1864 S 0.0 0.1 0:00.34 │ ├─ /bin/bash
    10703 root 1 0 43372 1416 1072 S 0.0 0.0 0:00.01 │ │ └─ su -
    11165 root 42 0 24928 3356 1964 S 0.0 0.1 0:00.15 │ │ └─ -su
    26710 root 47 0 24248 2488 1436 R 4.0 0.1 0:01.50 │ │ └─ htop
    4586 ral 1 0 23960 2220 1752 S 0.0 0.1 0:00.00 │ ├─ /bin/bash
    4612 root 1 0 43368 1412 1068 S 0.0 0.0 0:00.01 │ │ └─ su -
    4626 root 42 0 24928 3360 1928 S 0.0 0.1 0:00.73 │ │ └─ -su
    30480 root 42 19 249M 146M 3760 S 0.0 4.0 0:37.86 │ │ └─ /usr/bin/python3 /usr/bin/emerge -auvDN wor
    11059 portage 42 19 4216 608 472 S 0.0 0.0 0:00.03 │ │ └─ [www-client/chromium-17.0.963.83] sandbo
    11077 portage 42 19 28892 4784 1704 S 0.0 0.1 0:00.07 │ │ └─ /bin/bash /usr/lib64/portage/bin/ebui
    11122 portage 42 19 29036 4104 944 S 0.0 0.1 0:00.00 │ │ └─ /bin/bash /usr/lib64/portage/bin/e
    11138 portage 42 19 26532 2376 1652 S 0.0 0.1 0:00.00 │ │ └─ /bin/bash /usr/lib64/portage/bi
    11140 portage 42 19 404M 387M 1012 S 0.0 10.5 0:31.78 │ │ └─ make -j2 --quiet chrome chro
    29272 portage 42 19 12680 1368 932 S 0.0 0.0 0:00.00 │ │ └─ /usr/x86_64-pc-linux-gnu/
    29274 portage 42 19 21380 6000 1232 S 0.0 0.2 184467440│ │ ├─ /usr/lib/gcc/x86_64-pc
    29273 portage 80 19 170M 141M 4920 R 81.0 3.8 0:01.79 │ │ └─ /usr/libexec/gcc/x86_6

    I hope you can copy-paste and see. Perhaps easier to see these percentage numbers which is much too high vor syslog (the first number):
    41263268 21114581:29 Ss syslog-ng
    73.0 0:00 RN+ cc1plus
    77.5 0:01 RN+ cc1plus
    7.0 52:09 Ss+ X
    3.1 23:25 Sl kwin
    1.7 13:16 Sl firefox
    0.8 0:37 SN+ emerge
    0.7 0:31 SN+ make

    Ralph Ulrich

  14. Thanks Ralph. None of the 420 changes will go beyond possible fixes for that issue compared to 418 that you're already running. I see you have some at 42+ in your testing, but I need to confirm that you have not started anything SCHED_IDLEPRIO as well, as those numbers would be normal for that. The other things is to experience a dramatic slowdown with everything else while it happens. Do you have that?

  15. I have just in this moment compiled linux-3.3 with bfs-420 and want to reboot and test recompile chromium with my Gentoo using

    MAKEOPTS=" -j2 --quiet"
    CFLAGS=" -march=native -O2 -pipe -Wno-unused "
    CXXFLAGS="-march=native -O1 -pipe -Wno-unused "

    in my /root/.bashrc
    schedtool -D $$
    I comment this out for next reboot!

    Or do you want me to test 418 further?

    Ralph Ulrich

  16. No that's fine, thanks! If you were using schedtool -D then those numbers are normal and you have not experienced the rare bug. Reboot and enjoy!