Friday 4 July 2014

BFS 0.449

Hot on the heels of the BFS 0.448 release, I was experimenting with some ideas (nothing productive so far) when I discovered that the long-standing "CPU locality" code, which determines the relationship between CPUs (e.g. whether they're SMT siblings or separate physical CPUs), was broken. So I've fixed the code that determines that, and BFS now prints what it believes the relationship (called locality) to be in dmesg on startup. Example output from a CPU with 2 cores and 2 threads per core:

[ 0.100217] LOCALITY CPU 0 to 1: 1
[ 0.100220] LOCALITY CPU 0 to 2: 2
[ 0.100221] LOCALITY CPU 0 to 3: 2
[ 0.100222] LOCALITY CPU 1 to 2: 2
[ 0.100223] LOCALITY CPU 1 to 3: 2
[ 0.100224] LOCALITY CPU 2 to 3: 1
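To make the locality numbers above concrete, here is a small userspace sketch in C that derives them from a topology table. It is not the BFS code: the core_id/package_id arrays and the function names are hypothetical, and the values 0 (same CPU) and 3 (different package) are my assumptions. Only 1 for SMT siblings and 2 for separate cores in one package are taken from the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical topology for a 2-core, 2-thread-per-core CPU, matching the
 * example: CPUs 0/1 share one core, CPUs 2/3 share the other. */
#define SKETCH_NR_CPUS 4
static const int core_id[SKETCH_NR_CPUS]    = {0, 0, 1, 1};
static const int package_id[SKETCH_NR_CPUS] = {0, 0, 0, 0};

/* Lower value = closer. 1 and 2 follow the dmesg example; 0 and 3 are assumed. */
int cpu_locality(int a, int b)
{
        assert(a >= 0 && a < SKETCH_NR_CPUS && b >= 0 && b < SKETCH_NR_CPUS);
        if (a == b)
                return 0;       /* same logical CPU (assumed value) */
        if (package_id[a] == package_id[b] && core_id[a] == core_id[b])
                return 1;       /* SMT siblings on one physical core */
        if (package_id[a] == package_id[b])
                return 2;       /* separate cores in the same package */
        return 3;               /* different package (assumed value) */
}

/* Print each pair once, in the same shape as the dmesg lines. */
void print_locality(void)
{
        for (int a = 0; a < SKETCH_NR_CPUS; a++)
                for (int b = a + 1; b < SKETCH_NR_CPUS; b++)
                        printf("LOCALITY CPU %d to %d: %d\n",
                               a, b, cpu_locality(a, b));
}
```

With this table, print_locality() prints the same six pairs and values as the example, without the timestamps.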

I've also added the namespace fix as posted here by Bogdan Trach (thanks!). Diff from BFS 0.448 and full patch here:

BFS 3.15 patches

The changes in this patch may improve CPU throughput and decrease latency under certain circumstances, but no benchmarking so far has shown any statistically significant difference.



  1. I don't understand exactly what this fix changes, but it works well together with BFQ & TuxOnIce on 3.15.3.

    @ Alfred Chen: BTW, does this fix obsolete two of your patches, namely "[BFS] Refine locality to ranking and siblings/cache idle code" && "[BFS] locality doesn't need to be kmalloc" ?

    Thank you all for your concerted work on BFS,

    Manuel Krause

    1. Are you using just bfs-0.449 and none of the extra patches by Alfred Chen or post-factum?

      The issue with ath9k still seems to persist with bfs449 and 3.15.3. Other than that, it seems to be working great.

      Thanks again for your effort Con :)

    2. Me? I just omitted/reverted the two patches from Alfred Chen mentioned above and then patched 3.15.3 with CK's incremental BFS 448-449 patch. The rest of AC's patches stayed in my kernel setup and work.
      Sorry, I can't say anything about ath9k; it's not in use here.

      Manuel Krause

    3. @ooo: Addendum: I never applied the "Use prefered raid6 gen function" and "phc-intel 0.3.2 patch" patches, as I didn't understand the first and don't need the second. *MK

    4. Don't use "Use prefered raid6 gen function"; it's hardcoded to optimize for my CPU (Core 2) to reduce kernel boot time. PHC is another well-known patch for undervolting the CPU, but not everyone uses it these days. Neither of them is BFS-related.

      As for the other BFS patches I'm working on, I will sync them up with 0.449 and post back soon.

    5. Here are my patches synced with 0.449:

      #1 [BFS] Fix goto unlock in get_nohz_timer_target()

      #2 [BFS] Refine locality to ranking and siblings/cache idle code.

      #3 [BFS] locality doesn't need to be kmalloc.

    7. @ Alfred Chen: Thank you for the updated patches! Fresh patches applied, and they work fine in combination, too!

      But I'd like to ask: what do I actually need? So far, I've had both SCHED_MC and SCHED_SMT enabled in my .config. After a little research I found out that my Intel P8400 Core 2 Duo doesn't have Hyper-Threading, even though /proc/cpuinfo does export the "ht" flag. And Con's new LOCALITY code prints only "LOCALITY CPU 0 to 1: 2". I've read somewhere that keeping SMT support enabled on ht-flagged systems could benefit code that relies on that extension, even if the hardware doesn't support it. I'm unsure which of this is a fairy tale and which setting is the "right" one.

      Thank you for sharing your ideas,

    8. @Manuel
      Well, this is the first time I've seen the ht flag on Core 2 Duo CPUs. I guess it means the CPU could support HT, but physically it has just one thread per core. CONFIG_SCHED_SMT is only used in scheduler code; if you enable it without having SMT, the scheduler wastes CPU instructions considering it.

      So, IMO, KISS: you don't have it, so disable it.
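Whether SMT is really present can be checked without trusting the "ht" flag (on Intel, as far as I know, that CPUID bit is also set on multi-core parts without Hyper-Threading). A minimal sketch in C, assuming the standard sysfs topology file thread_siblings_list, which lists the hardware threads sharing a core as a cpulist such as "0" or "0-1". The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs in a Linux cpulist string such as "0", "0-1" or "0,4\n".
 * Returns -1 if the string is malformed. */
int cpulist_count(const char *s)
{
        int count = 0;

        assert(s != NULL);
        while (*s != '\0' && *s != '\n') {
                char *end;
                long lo = strtol(s, &end, 10);
                long hi = lo;

                if (end == s)
                        return -1;              /* no number where one was expected */
                if (*end == '-') {              /* a range such as "8-11" */
                        const char *p = end + 1;

                        hi = strtol(p, &end, 10);
                        if (end == p || hi < lo)
                                return -1;
                }
                count += (int)(hi - lo + 1);
                s = end;
                if (*s == ',')
                        s++;                    /* move to the next element */
                else if (*s != '\0' && *s != '\n')
                        return -1;              /* unexpected character */
        }
        return count;
}

/* Number of hardware threads on the core of `cpu`, read from sysfs;
 * -1 on error. A result of 1 means that core has no SMT sibling. */
int smt_threads_on_core(int cpu)
{
        char path[96], buf[256];
        FILE *f;
        int n = -1;

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
        f = fopen(path, "r");
        if (f == NULL)
                return -1;
        if (fgets(buf, sizeof buf, f) != NULL)
                n = cpulist_count(buf);
        fclose(f);
        return n;
}
```

On a Core 2 Duo such as the P8400 I would expect smt_threads_on_core(0) to return 1, which matches the "LOCALITY CPU 0 to 1: 2" line, though I haven't run it on that chip.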

  3. @ck Here are 3 more patches for BFS; please review them if you have time. Comments are welcome.

    #1 [BFS] Sync up try_to_wake_up* functions
    -- Sync try_to_wake_up* functions with mainline, so that syncing up from release to release is easier

    #2 [BFS] Add WARN_ON_ONCE if rq is dereferenced in wait_task_inactive()
    -- Add WARN_ON_ONCE; if it doesn't show up within 1-2 releases, the !rq check can be safely removed

    #3 [BFS] Remove runqueue wake_list.
    -- The runqueue wake_list is used in ttwu_queue_remote(). In BFS,
    tasks are just put into grq, so wake_list is not needed.
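Patch #2 relies on WARN_ON_ONCE() reporting only the first time its condition is hit, while still evaluating to the condition every time. A rough userspace imitation of that behaviour, assuming GCC or Clang since it uses a statement expression (as the kernel macro does). WARN_ON_ONCE_SKETCH, warn_count and rq_missing() are hypothetical names; rq_missing() only stands in for the !rq check in wait_task_inactive().

```c
#include <assert.h>
#include <stdio.h>

/* Number of warnings actually printed, so the "once" can be verified. */
int warn_count;

/* Evaluates to the truth value of cond, but prints only the first time this
 * particular call site sees it true (one static flag per expansion). */
#define WARN_ON_ONCE_SKETCH(cond) ({                                    \
        static int warned_;                                             \
        int hit_ = !!(cond);                                            \
        if (hit_ && !warned_) {                                         \
                warned_ = 1;                                            \
                warn_count++;                                           \
                fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
        }                                                               \
        hit_;                                                           \
})

/* Hypothetical stand-in for the rq check: returns 1 when rq is missing,
 * so the caller could keep its existing defensive path. */
int rq_missing(const void *rq)
{
        if (WARN_ON_ONCE_SKETCH(rq == NULL))
                return 1;
        return 0;
}
```

Repeated calls with a NULL rq keep returning 1, but only the first one prints a warning, so the log is not flooded while the check stays in place.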

    1. All three added patches have worked fine here since they were published; 1d9h48m uptime so far.

    2. Hi Alfred. Thanks for your continued attention to the code.

      1. The try_to_wake_up functions in mainline have an extra function for dealing with other flags which BFS does not use, so there is no point adding it.
      2. While the idea of removing the check for rq over time is sound, the problem is that behaviour differs between architectures, so even if the warning doesn't show up on x86*, there is no guarantee that it is safe to remove.
      3. A valid cleanup.

      Also, a note to others using Alfred's patches: I still disagree with patch [2] in his previous list. I will try to put accepted patches that are pending into my bfs directory.

    3. @ck
      Thanks for the review.
      1. I am not adding the extra function, mainly just syncing up function name changes such as ttwu_post_activation -> ttwu_do_wakeup. As for the flags, I will check whether they are used by other functions and try to clean them up.
      2. After expanding the percpu macro, it is basically var_address + per_cpu_offset. Despite the different ways of determining per_cpu_offset on different architectures, it is very unlikely to return a NULL pointer.

      I have 2 more minor changes from the past two weeks and will post them after testing.

    4. What does the percpu macro refer to? Do you mean the optimisation you removed in your first set of patches? Checking whether SMT siblings exist is a CPU-architecture-dependent action, not one based on hard-coded values. An SMT kernel running on an AMD CPU will only discover that there are no siblings once it is running.

    5. I am just talking about #2 [BFS] Add WARN_ON_ONCE if rq is dereferenced in wait_task_inactive() in my second reply.

    6. Then how are you going to know whether, on other architectures, there's a codepath during the context switch with IRQs unlocked where rq can briefly be NULL when it is dereferenced, which is all that check is testing for?

    7. Let's explore it in a little more detail:
      a. Expanding the task_rq() macro: task_rq(p) => cpu_rq(task_cpu(p)); task_cpu(p) => task_thread_info(p)->cpu; task_thread_info(task) => ((struct thread_info *)(task)->stack). cpu_rq() is the final step, and it is a per_cpu variable, which is basically var_address + per_cpu_offset. So rq is looked up by the CPU index stored in the task's thread info.
      b. If we consider every possibility of calling task_rq() without any lock held or IRQs disabled, it turns out that it may crash the kernel, or rq could be any value rather than zero. Checking for a NULL dereference is not enough for all cases here. I did a little test printing the return values of cpu_rq(0~19) on my machine, which has just 2 CPUs:
      [ 14.903959] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
      [ 21.450042] cpu_rq(0) ffff88013fc12880.
      [ 21.450049] cpu_rq(1) ffff88013fd12880.
      [ 21.450055] cpu_rq(2) 0000000000012880.
      [ 21.450057] cpu_rq(3) 0000000000012880.
      [ 21.450061] cpu_rq(4) 0000ffff00012880.
      [ 21.450063] cpu_rq(5) 0000000000012880.
      [ 21.450066] cpu_rq(6) 0000000000012880.
      [ 21.450068] cpu_rq(7) 0000000000012880.
      [ 21.450071] cpu_rq(8) ffffffff817c858b.
      [ 21.450074] cpu_rq(9) 0000000000012880.
      [ 21.450077] cpu_rq(10) 0000000000012880.
      [ 21.450080] cpu_rq(11) 0000000000012880.
      [ 21.450083] cpu_rq(12) 0000000000012880.
      [ 21.450086] cpu_rq(13) ffffffff8103fc50.
      [ 21.450090] cpu_rq(14) ffffffff8103fbd0.
      [ 21.450092] cpu_rq(15) 0000000000012880.
      [ 21.450096] cpu_rq(16) ffffffff8103fc10.
      [ 21.450099] cpu_rq(17) 0000000000012880.
      [ 21.450102] cpu_rq(18) 0000000000012880.
      [ 21.450105] cpu_rq(19) 0000000000012880.
      c. There is no such check in mainline's wait_task_inactive() either. If BFS causes an issue here that mainline doesn't, and the check is only needed for a specific arch, it should be wrapped in an arch macro.
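Point (a) reduces to simple address arithmetic, which a toy model can show. This sketch assumes a 64-bit build and assumes &runqueues is 0x12880, as the entries with a zero offset suggest; the offsets for CPUs 0 and 1 are obtained by subtracting 0x12880 from the logged addresses. All names are hypothetical and the real per_cpu() machinery is arch-specific.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Assumed link-time address of the per-CPU runqueues variable. */
static const uint64_t runqueues_addr = UINT64_C(0x12880);

/* Offsets for CPUs 0 and 1 derived from the dmesg lines; CPUs 2 and 3 are
 * absent on the test machine and modelled with a zero offset. */
static const uint64_t model_per_cpu_offset[MODEL_NR_CPUS] = {
        UINT64_C(0xffff88013fc00000),
        UINT64_C(0xffff88013fd00000),
        0,
        0,
};

/* Model of cpu_rq(cpu): variable address plus that CPU's offset. The bounds
 * assert only keeps the toy from reading past its array; the kernel macro
 * does no such check. */
uint64_t model_cpu_rq(int cpu)
{
        assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
        return runqueues_addr + model_per_cpu_offset[cpu];
}

/* The only case a NULL test can catch: address plus offset exactly zero. */
int model_rq_is_null(int cpu)
{
        return model_cpu_rq(cpu) == 0;
}
```

With a zero offset the result is the bare variable address 0x12880, which is small but not NULL, so a NULL check does not flag a bogus CPU index.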

    8. Thanks. Looks convincing. I might run with it without even bothering with the check at all then.

  4. Hi Con,

    thanks for the new version of BFS. I already know that the problem with hibernate on ath9k still persists, but you should know that the problem is even worse on 3.15. Now even suspend doesn't work anymore without a kernel hang. No problem so far with CFS.

    Using BFS on zen kernel 3.15.5.

    Regards sysitos.

    Btw, more importantly, enjoy your trip ;)

    1. I can say that this is not an ath9k issue. I've reverted the ath9k driver to its 3.13 state, and that didn't help. So we need to look for the bug somewhere else.

    2. Hi,

      Ok, I can't verify this. I only tested with the current 3.12.24 kernel (btw, the best kernel for my Dell Vostro 3750 ;) ) with BFS, BFQ and exFAT, and here everything works fine (hibernate and suspend). My tests with 3.14 were already written down here: without the ath9k module everything works, and with CFS everything works. So maybe loading ath9k triggers the bug, maybe some new network algorithm, but I am not an expert ;(. My test with 3.15 shows that suspend leads to a kernel hang nearly instantly if some network monitors are running. With no network activity there is no hang. But hibernate always leads to a hang. So much for my findings.

      Btw, pf, which kernels do you support with your pf-kernel: some long-term kernels too, or, like the zen-kernel (which I use at the moment), only the newest one? Thanks.

      Regards sysitos

    3. @post-factum
      The ath9k bug with ck is actually already present in the 3.13 kernel.
      I just tested vanilla 3.13.0 with bfs-446 and got the same hang after waking up from suspend.

      I'm quite sure 3.12 was the version that still worked though. I will be testing it soon and report back.

    4. continued..
      I tested bfs-444 on both 3.12.0 and 3.12.24, and it doesn't freeze after waking up from suspend.

      So whatever broke it happened between 3.12 and 3.13, or between bfs-444 and 446.

    5. I've found several commits with tasklet usage fixes, and the symptoms were quite similar. I guess this bug is related to tasklet scheduling.

    6. Good investigating so far, guys. One of PF's screenshots shows a possible recursive lock around the sched_init_smp function, and that did change in mainline between 3.12 and 3.13, so perhaps those changes were not portable to BFS.

    7. Would be happy to try a possible patch ^_^.

    8. Even better, it found a real bug ;)

      Try this:

    9. Tried the sisrevert patch against vanilla 3.15.5 and bfs449, but my system still freezes after suspend.

      It's odd that with 3.15+bfs449 I can suspend and resume _once_ without any issues, but after another suspend+resume the system freezes. With 3.13 (and 3.14, if I remember correctly) the system would freeze right after the first suspend+resume.

    10. Oh well, an unrelated bug then, and probably one that never struck. I didn't think that code path should be hit on suspend/resume anyway. Keep looking...

    11. Try this on top of the sisrevert patch as well:

    12. I accidentally replied below instead of in this thread and can't remove the post :(

      Anyway, as I said there, sis-test.patch on top of the sisrevert patch still results in exactly the same suspend behavior as with plain bfs449.

    13. Unfortunately, those patches do not fix ath9k issue :(.

    14. Thanks PF. Just to confirm once more: is this only an issue with Tux On Ice compiled in as well, and is BFS + ath9k alone okay?

    15. Hi Con. No, I don't use Tux On Ice, so BFS + ath9k alone isn't okay. I tested it with TOI too, with the same hang.

      regards sysitos

    16. For me the issue happens after TOI usage, but as you can see, other people face it even without TOI.

    17. So I switched my mini-PCI WLAN card again, hopefully for the last time, because it's a nightmare to do this on a Dell Vostro 3750. In other words, I can't test the hibernate bug with the ath9k wifi card and BFS anymore.

      Btw, I'm now using an Intel 7260 wifi card with Bluetooth: good/excellent performance on 2.4 GHz, and even on 5 GHz it's great, with only 2 antennas (I wasn't aware that this works at all).
      And hibernate/suspend does work with BFS ;)

      PS: But remember, there are different models with different specs on the market!

      Regards sysitos

  5. sis-test.patch on top of the sisrevert patch still results in exactly the same suspend behavior as with plain bfs449.

    1. Never mind, it was a (very) long shot anyway; thanks for testing it.

    2. I am just testing these patches:

      Suspend to RAM is OK; I don't use hibernate and have it disabled in .config.
      My system as ever: MacMini Core2Duo

      Thanks from a happy user from hot summer Hamburg, Germany

  6. Here's another test patch; there is a small chance it might help with this issue:

    Note to others: unless patches are in the pending/ directory, I'm not planning on including them. These test patches are just experiments for this particular issue.

    1. I tested this against 3.15.5 and sisrevert patch (nothing else from pending/ or test/). However the suspend issue remains with ath9k.

      There seems to be some change though: Previously I could usually suspend and resume once without any issues, and the system would freeze only after second resume.

      With the sched_affinity_locks patch, 3 out of 5 times my system froze right after the first suspend/resume. The other two times it froze after the second resume.

      So maybe it's a change for the worse, or it could just be coincidence.

      Anyway, thanks again for your continuing efforts to solve this issue :)

    2. Unfortunately, this patch doesn't fix the issue for me as well.

      Nevertheless, I've found out that ksoftirqd hangs just here:

      The __cond_resched() call from within run_ksoftirqd() eats lots of CPU time when the system hangs.

    3. Thanks guys. @PF I've been staring at the lock debugging photos you took as they seem the key to finding this issue though I'm still stumped as to what it is about that particular module that makes it happen. I'm still looking whenever I have time and we'll nail this sooner or later..

    4. You'd laugh, Con, but… the issue is fixed by turning on ath9k powersaving (default is off).

      I don't know what to say ;). All I know about it is that ath9k powersaving is implemented using timers, and they are probably related to the CPU scheduler locking issue as well.

      Anyway, I'm not 100% sure, but my machine survived after two hibernation cycles. Will test more.

    5. @pf, how did you enable powersaving? With the ps_enable=1 option?
      I tried that but my system still locks up after second suspend.

      However this time my system actually worked a few seconds longer after resume than usually, and only locked up when wireless started connecting, which apparently takes a bit longer with ps_enable=1.

    6. @ooo, I've done this:

      modprobe ath9k ps_enable=1
      iw dev wlp1s0 set power_save on

    7. @pf, okay, that's what I did, except I didn't enable power_save with iw.
      That didn't seem to make any difference, though: still a lockup after the second suspend.

    8. @pf No difference with ps_enable=1. I always tested with this switch, because it seems to give me better connection stability. Same experience as ooo.

    9. Guys, would you like to try whether this kind of workaround works for you? It's not a solution, though.

      When I played with an Eee PC years ago, the wifi driver didn't work after resuming from suspend, so I had to unload the wifi driver module (and maybe the iw module) and modprobe it back after the system resumed.

      I think you need to play with hook scripts in /etc/pm/sleep.d; you also need to restart upper-layer services/applications to get it working.

      Good luck.

    10. Sorry, the detailed steps should be:
      1. unload modules before suspend.
      2. load modules after resume
      3. deal with upper layer service/application

    11. Alfred, please read my detailed bug report here: (several emails there from June and July).

      I've reported that unloading/modprobing the module triggers the same issue, in an even more controlled manner.

    12. I was having lockups too with the latest kernel + ck; this fixed it for me.

      Probably unrelated, but it might help someone else.

  7. Hi Con,

    For testing the ath9k suspend bug, which patch should I use on top of the BFS 3.15 patch? Only this one, or both from /testing, with or without all the patches from /pending?

    Regards sysitos

    1. Hi Con,
      I tested only with the new patch. For me it seems to be better, but the CPU hang still remains (for suspend and hibernate). Now, though, the WLAN does reconnect after hibernating, which was not the case without it. Dmesg shows multiple times that CPU0 hung for at least 23 s; top shows kernel threads eating the CPU ;), reaching values of 20 and more. The system isn't usable anymore.
      But it's good to see that you are working on it. Thanks again.

      Regards sysitos