Monday 9 September 2013

BFS 0.441, 3.11-ck1

Announcing a resync and update of the BFS CPU scheduler for linux-3.11

BFS by itself:

Full -ck1 patchset including separate patches:

Apart from the usual resync to keep up with the mainline churn, there are a few additions from BFS 0.440. A number of changes dealing with wake lists as done by mainline were added that were missing from the previous code. There is a good chance that these were responsible for a large proportion of the suspend/resume issues people were having with BFS post linux 3.8. Of course I can't guarantee that all issues have been resolved, but it has been far more stable in my testing so far.

The other significant change is to check for throttled CPUs when choosing an idle CPU to move a process to, which should impact the behaviour and possibly throughput when using a scaling CPU governor, such as ondemand.

Those of you still using the evil proprietary Nvidia binary driver (as I still do) will encounter some issues and will need to use a patched pre-release driver from them if you build it yourself, until they release a new driver.

That is all for now.



  1. Thanks CK! Glad you were able to find some time to dig into the suspend problem. I know the linux-ck user base will gladly provide feedback.

    I will perform the usual "make" benchmarks and post to my blog (linking here). Stay tuned.

    1. This comment has been removed by the author.

    2. I did the usual 'make' benchmark comparing BFS v0.441 to CFS in Linux version 3.11. Usual results found: BFS is statistically faster. See details in blogpost:

      ...why can't I edit my posts? Only option is to delete.

  2. "which should impact the behaviour and possibly throughput"

    positively or negatively?

  3. Thanks a lot for this update!

  4. It works, but still needed the following patch to make kernel bootable:

    1. Ah yes darn, forgot about that entirely.

    2. Nah it's just a matter of not enabling a config option.

    3. @ck - Probably best to include in a ck2 patchset though, no? Some users may enable CONFIG_RCU_USER_QS...

  5. THANKS!

    I appreciate your work.

  6. Thank you for your work.

    Quick question: for those of us using the evil binary driver from NVIDIA, which patched pre-release driver are you referring to?

    1. I'd like to second this request. Where can I obtain this patched NVidia driver?

    2. I ended up downloading the tarball for the Arch nvidia-ck packages, and applying the patch to the nvidia sources installed by apt.

  7. Sleep/suspend still doesn't work well on my machine. How could I debug?
    I'm running Archlinux with graysky binaries. When the sleep procedure is initiated, the disks power down, but the PC remains with the fans on and a blinking white cursor on the screen.

    1. I've got the same problem as you.

    2. Same here. At least the machine does not hang anymore; it simply does not suspend, whether I use systemctl suspend or KDE (which uses pm-utils, I think). Using Archlinux and graysky's repo binaries for Nehalem processors.

    3. Odd, all of my suspend resume issues have gone away, but I'm not using systemd so perhaps that's related? Try adding the no_console_suspend option to your kernel boot command and see if there's any meaningful information there.

    4. This comment has been removed by the author.

    5. CONFIG_HZ_1000 fixed the suspend issue for me.
      Everything lower than this and it breaks at:
      smpboot: CPU 1 is now offline

      Most of the time it reaches CPU 5 and then crashes.

    6. With no_console_suspend this is what I got:

    7. @big-bum - Can you try recompiling with 1000 Hz tick rate as Jan suggested? Does that fix it?

    8. Yes. Recompiled with HZ=1000, now it works OK.
      All 4 suspend tests worked fine. I think 1000Hz is the solution.

    9. If it still needs 1000Hz to work correctly, then there is still a bug.

    10. +1 for 1000Hz which solves suspend issue

    11. This solution didn't work all the time... I get random crashes on resume. So I reverted to a lower HZ and tried to debug the problem again.
      This time the backtraces look a bit more helpful:

    12. Here is another backtrace for the blocked task.
      Happens on "echo 0 > /sys/bus/cpu/devices/cpu2/online"

      [ 152.502382] task PC stack pid father
      [ 152.502405] zsh D ffff88023fa91ac0 0 5282 5281 0x00000000
      [ 152.502410] ffff880236123ca0 0000000000000082 ffff88023689f330 ffff880221869fd8
      [ 152.502413] ffff880221869fd8 ffff880221869fd8 ffff880236123ca0 ffff880221868000
      [ 152.502415] ffff880221869e80 00000000ffffffff ffffffff8195c744 ffff880236123ca0
      [ 152.502418] Call Trace:
      [ 152.502425] [] ? schedule_preempt_disabled+0x1a/0x20
      [ 152.502428] [] ? __mutex_lock_slowpath+0x134/0x220
      [ 152.502430] [] ? mutex_lock+0xe/0x20
      [ 152.502434] [] ? store_online+0x1f/0xc0
      [ 152.502439] [] ? sysfs_write_file+0xb9/0x140
      [ 152.502443] [] ? vfs_write+0xb3/0x1e0
      [ 152.502446] [] ? SyS_write+0x43/0x90
      [ 152.502448] [] ? page_fault+0x1f/0x30
      [ 152.502452] [] ? system_call_fastpath+0x1a/0x1f
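The blocked write in the backtrace above can be retried without a full suspend cycle by taking a single CPU offline and bringing it back through sysfs. The following is only a minimal sketch, assuming root privileges and the same /sys/bus/cpu/devices path quoted in the report; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the sysfs path is the one quoted in the report above.

# Print the sysfs "online" control file for logical CPU $1.
cpu_online_file() {
    printf '/sys/bus/cpu/devices/cpu%s/online\n' "$1"
}

# Take CPU $1 offline, wait a moment, then bring it back online.
# Needs root. CPU 0 often cannot be offlined, so start with CPU 1 or higher.
cycle_cpu() {
    f=$(cpu_online_file "$1")
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    echo 0 > "$f" && sleep 1 && echo 1 > "$f"
}

# Example (as root): cycle_cpu 2
```

If the write to the control file never returns, the shell ends up in the same D state as the zsh process in the trace, which would suggest the hang is in the CPU offline path rather than in the suspend code itself.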

  8. Regarding the microsleep ondemand fix, I would like to have a backport to linux-3.10.
    (This LTS kernel will stay in use for longer among many Gentoo users, for example.)

    Greetings and thanks from Hamburg, Germany
    Ralph Ulrich

  9. Regarding the sleep/suspend issue, I have obtained the following message after an
    albeit successful resume on a thinkpad R60:

    WARNING: CPU: 1 PID: 1804 at kernel/trace/ring_buffer.c:2571 rb_reserve_next_event.isra.48+0x227/0x327()
    Delta way too big! 18446741873297462717 ts=18446744063829990609 write stamp = 2190532527892
    If you just came from a suspend/resume,
    please switch to the trace global clock:
    echo global > /sys/kernel/debug/tracing/trace_clock

    Hope this may be of help
    Greetings from Rome,

  10. Will these changes be ported to kernel 3.10?

    As it is a long term kernel I'd like to keep using it for a while.

    1. I wasn't planning on it but I guess you're saying you want me to.

    2. More than an "I want you to", it is a "would be a good idea".

      As kernel 3.10 is a longterm one, it will be the first choice for those who need a stable kernel; that does not sit well with BFS and the suspend issues :(

    3. I'll be sticking to 3.10 for a while. And I guess quite a few other people too. Though of course I understand that maintaining two different trees isn't everyone's idea of fun. Fortunately, BFS for 3.10 works without issues for me (unless a minor kernel update breaks it.)

    4. I am not saying "I want you to", what I meant is: "would be a good idea".

      As kernel 3.10 is a longterm one, many people will stick to that, having these fixes in BFS for 3.10 would make those people (like me) able to use a stable -ck kernel for a while.

  11. I'm experiencing high battery drain on BFS. My laptop's battery depletes after ~2h on BFS compared to 4-5h on CFS. :(

  12. Kernel 3.11 is so good that I don't have to use BFS anymore...

  13. This comment has been removed by the author.

  14. 3.11 with BFS & BFQ is worse than ever seen.

    I've ported my setup to the new machine, and many things got better, but the usual problems remained:
    SWAP, SHM & their interaction got worse. Unusable with 3.11.

    With 3.10.12 + both patchsets I don't have as many problems.

    And I won't use 3.11.z any longer with your promoted patches, CK, until there is a usable kernel for them.

    Please, also, prepare a backport of BFS-422 for the longterm 3.10.x!

    Thanks, Manuel Krause

  15. I meant BFS 442 (not 422) Sorry for typo. Manuel

  16. And the backport should clarify if it's 3.11 OR 3.10 related, Manuel

  17. Dear graysky,
    I noticed that if I use the newest linux-ck (Sandybridge), my external USB mouse stops working after about 5 sec.
    With linux-3.11.1-1 all seems right.
    Has anybody else noticed that? Or is it just me (Lenovo E530 i7)?

    1. Sounds like a powersaving option, but if you're saying the non-ck kernel does not exhibit this behavior and that is the only change, that does raise my level of suspicion. The real test would be for you to compile linux-ck from the AUR twice:

      1) Untouched PKGBUILD, compile, install reboot. Does this happen?

      2) Since the ck1 patchset is preapplied to the configs in the linux-ck package, you will need to grab the official ARCH config (or config.x86_64) and overwrite the ones included in the linux-ck.tar.gz source package. Then update the sha256sums in the PKGBUILD, and then comment out the line that patches the source with CK1 (line 120 in 3.11.1-2-ck). Now compile, install, reboot, recheck. Does it still happen? If so, it is highly likely to be something in the ck1 patchset.

    2. Thanks! I use repo-ck so I don't change anything.
      I will try what you described using the AUR tomorrow and report back to you.
      1000 times thanks, graysky!

    3. Strange things happened: it seems that if I have the power cable of my laptop plugged in, the mouse continues to work. But as soon as I unplug the power cable, the mouse switches off.
      So it might be a patchset problem. Any tips for power settings configuration?
      All the best.

  18. Dear graysky, it seems that it is a problem with the patchset; I did all the things you said and still have the same problem. But before the update to 3.11.1-2 everything worked right. So maybe something is wrong with that?

  19. Mystery solved!
    All that weird stuff happened because of Laptop-Tools (USB Suspend); now it is disabled and everything works perfectly.

    1. From what I've read, laptop-mode-tools is an evil package that contains some deprecated code. Glad you got your problem solved.

  20. I made the backport of
    3.11-sched-bfs-442.patch to
    for consumption with longterm stable Linux-3.10

    It was quite easy. Besides just copying, I only
    added a spinlock.h function alias.

    See the attachment in the Gentoo bug at

    Greeting from Hamburg, Germany
    Ralph Ulrich

    1. Courtesy of PaulBredbury, who made the diff
      310-sched-bfs-440-to-442, which you can find there:

      I wonder: should I also apply Alfred Chen's relocation of mutex_lock (see further down in this blog thread) for Linux-3.10?
      Ralph Ulrich

  21. Here is my patch to remove dead lock warning at system boot up which I reported some time ago.

    diff --git a/kernel/sched/bfs.c b/kernel/sched/bfs.c
    index 763d417..3a617be 100644
    --- a/kernel/sched/bfs.c
    +++ b/kernel/sched/bfs.c
    @@ -6940,6 +6940,7 @@ void __init sched_init_smp(void)

    + mutex_lock(&sched_domains_mutex);
    * Set up the relative cache distance of each online cpu from each
    @@ -6953,7 +6954,6 @@ void __init sched_init_smp(void)
    for_each_online_cpu(cpu) {
    struct rq *rq = cpu_rq(cpu);

    - mutex_lock(&sched_domains_mutex);
    for_each_domain(cpu, sd) {
    int locality, other_cpu;

    @@ -6983,7 +6983,6 @@ void __init sched_init_smp(void)
    rq->cpu_locality[other_cpu] = locality;
    - mutex_unlock(&sched_domains_mutex);

    * Each runqueue has its own function in case it doesn't have
    @@ -6999,6 +6998,7 @@ void __init sched_init_smp(void)
    + mutex_unlock(&sched_domains_mutex);
    void __init sched_init_smp(void)

    1. Thanks a lot, Alfred!

    2. Alfred Chen puts the
      mutex_lock (respective mutex_unlock)

      outside of

      If mutex_lock is meant for one cpu
      this sound reasonable? Is it?
      Ralph Ulrich

    3. @Ralph Ulrich
      Thanks for the review and comment. As I understand it, sched_domains_mutex is not per-CPU. There is existing code which makes sure sched_domains_mutex is held before entering the for_each_cpu loop; the code can be viewed online at

    4. @Chen, I booted both Linux-3.11 and Linux-3.10 (see my backport above) with your patch applied:
      Runs fine! Feels faster!
      Also another guy in the Gentoo forum ran with your patch!

      I wonder if benchmarks of Graysky of Archlinux would see the difference?
      Ralph Ulrich

    5. @Ralph Ulrich
      Thanks for testing. As this fix only touches __init sched_init_smp, which is invoked just once when the system boots, I don't expect any performance boost.

  22. I finally managed to at least find a workaround for the suspend crash.
    It seems there is a problem with the process migration in the CPU offline code.
    Simply run this command before suspending:
    ps -eo pid | xargs -I'{}' taskset -pc 0 {}
    This pins all processes to CPU 0, so no migration is needed.
    After resuming, run this (change the range to match your CPU count; 0-7 is for a 4-core i7 with hyperthreading):
    ps -eo pid | xargs -I'{}' taskset -pc 0-7 {}
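The two commands above hard-code the CPU list. Below is a sketch of the same workaround with the list derived from the machine. It assumes `nproc --all` matches the number of logical CPUs (plain `nproc` would not do, since it only counts the CPUs the calling process is allowed to run on). The function names are illustrative only.

```shell
#!/bin/sh
# Print a taskset-style CPU list "0-(n-1)" for n logical CPUs.
cpu_list() {
    n=$1
    if [ "$n" -le 1 ]; then
        echo 0
    else
        echo "0-$((n - 1))"
    fi
}

# Pin every process to the CPU list in $1. Errors are ignored because
# some PIDs exit before taskset reaches them and some kernel threads
# refuse an affinity change.
pin_all() {
    ps -eo pid= | while read -r pid; do
        taskset -pc "$1" "$pid" >/dev/null 2>&1 || true
    done
}

# Before suspend (as root):  pin_all 0
# After resume (as root):    pin_all "$(cpu_list "$(nproc --all)")"
```

Note that this overwrites any affinity a process had set deliberately, so it is a workaround for testing rather than something to leave in place permanently.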

  23. Hi graysky, it's me again.
    I updated my linux-ck-sandybridge via repo-ck to 3.11.5 and now I get this problem when I try to load the nvidia module.

    sudo modprobe nvidia
    modprobe: ERROR: could not insert 'nvidia': Exec format error

    This doesn't happen if I use the stock Arch kernel (3.11.4).

    Should i run the nvidia-bug-report and post the results?


  24. Downloaded the new nvidia-ck-sandybridge driver, now works! Thanks!

  25. Yes, OT, but I think it's interesting:

    "Here's Why Radeon Graphics Are Faster On Linux 3.12"

    have fun

    1. The mentioned patch works for me with LTS Linux-3.10.16-442.bfs
      But I had to disable bfs patching of cpufreq-ondemand.c
      Ralph Ulrich

    2. also "Nouveau Performance Is Faster With Linux 3.12"

      i think, these patches are a great improvement, very interesting for LTS user

      with a few modifications, these 3 patches work for me with 3.10+bfs+bfq and 3.11+bfs+bfq

      Con, please, what do you think, 80 or 63, 100000 or 10 ...?

      # ondemand-patches
      #define DEF_SAMPLING_DOWN_FACTOR (1)
      #define MAX_SAMPLING_DOWN_FACTOR (100000)

      # bfs
      #define DEF_SAMPLING_DOWN_FACTOR (1)
      #define MAX_SAMPLING_DOWN_FACTOR (10)


    3. Thank you for encouraging to apply these three patches! Currently running @3.11.5+BFS+BFQ with integrated Intel graphics GM45. Seems to work well and I hope we won't face any latency or power consumption related regressions. I'm looking forward to PHORONIX testing the latter..

      BTW, in my kernel from the openSUSE repo the only difference is
      #define DEF_FREQUENCY_UP_THRESHOLD (63) -- vs. -- (80).
      There is no #define MAX_SAMPLING_DOWN_FACTOR (10) in BFS/ck1.

      Cheers, Manuel

    4. @graysky: ... and maybe you want to benchmark this, too. B-D

    5. Manuel, you're right, the only difference is
      #define DEF_FREQUENCY_UP_THRESHOLD (63) -- vs. -- (80).

      there is a #define MAX_SAMPLING_DOWN_FACTOR (10), but it's for
      cpufreq_conservative.c - (100000) is for cpufreq_ondemand.c and
      there is no change of this in v3-1-3-cpufreq-ondemand....patch

      sorry, in the heat of the battle..... ;)


    6. No problem, that you agree. ;-)

      Would be interesting, though, what you say about that 63 vs. 80 difference.

      ATM, I can tell that my notebook's fan is running at higher rpm than before, which is o.k. when also running the BOINC client and not on battery. And if I interpret the PHORONIX diagrams correctly, this should be the intended result of the patches: that we don't have idle cycles if they are not indicated by the actual CPU load.

      Greets, Manuel

  26. performance drop with ck1, can somebody confirm this?

    Phoronix Test Suite v4.8.3
    running only point 3: "Run Complex System Test"

    3.11.5+stable_queue / 3.11.5+stable_queue+ck1
    3048 / 2173 TPS -- PostMark
    7545.56 / 7026.06 MB/s -- RAMspeed SMP [Average/Integer]
    7598.59 / 7142.49 MB/s -- RAMspeed SMP [Average/Floating Point]
    43.35 / 44.09 Seconds -- C-Ray
    22213.19 / 14038.33 RPS -- Apache Benchmark

    i will test bfs (not ck) later.
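To put the figures above on a common scale, the relative change against the baseline can be computed per row. This is a small sketch using awk; `pct_change` is a made-up helper name, and the two example calls use the PostMark and Apache values from the table above.

```shell
#!/bin/sh
# Relative change from baseline $1 to new value $2, in percent,
# rounded to one decimal place.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Figures from the table above (baseline / ck1):
pct_change 3048 2173          # PostMark, TPS
pct_change 22213.19 14038.33  # Apache Benchmark, RPS
```

For these two rows the drop works out to roughly 29% for PostMark and 37% for Apache. For C-Ray the sign has to be read the other way round, since it reports seconds and lower is better.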

  27. 3.11.5+stable_queue / 3.11.5+stable_queue+ck1 / 3.11.5+stable_queue+bfs
    3048 / 2173 / 2083 TPS -- PostMark
    7545.56 / 7026.06 / 7037.25 MB/s -- RAMspeed SMP [Average/Integer]
    7598.59 / 7142.49 / 7148.92 MB/s -- RAMspeed SMP [Average/Floating Point]
    43.35 / 44.09 / 44.05 Seconds -- C-Ray
    22213.19 / 14038.33 / 13624.50 RPS -- Apache Benchmark

    1. What do you mean with "+stable_queue"?

      Thanks, Manuel

    2. latest 3.11-stable patches from

  28. baseline 3.11.6 / bfs 3.11.6
    7076 / 7142 -- PostMark
    15483.20 / 15483.20 -- RAMspeed SMP [Average/Integer]
    15500.74 / 15500.74 -- RAMspeed SMP [Average/Floating Point]
    25.46 / 24.25 -- C-Ray
    19177.54 / 33466.23 -- Apache Benchmark

    1. thx Jan, can you post your config, please!?

  29. Kernel: 3.11.6 | Scheduler: cfs/bfs

    _1: - .config from siduction
    cfs_1: - IRQ_TIME_ACCOUNTING=y
    apart from that, cfs_1 & bfs_1 are identical (except NUMA_BALANCING | CGROUP_CPUACCT | CGROUP_SCHED | SCHED_AUTOGROUP (all disabled by bfs))

    _2: - .config by me
    cfs_2: - TICK_CPU_ACCOUNTING=y
    apart from that, cfs_2 & bfs_2 are identical (except NUMA_BALANCING | CGROUP_CPUACCT | CGROUP_SCHED | SCHED_AUTOGROUP !!!and!!! TICK_CPU_ACCOUNTING (all disabled by bfs))
    bfs_2 & bfs_3 are identical, except IRQ_TIME_ACCOUNTING/TICK_CPU_ACCOUNTING

    cfs_1 / bfs_1 / cfs_2 / bfs_2 / bfs_3
    2403.00 / 1693.00 / 3086.00 / 2174.00 / 2130.00 TPS -- PostMark
    7242.90 / 7463.08 / 7481.70 / 7067.49 / 7260.16 MB/s -- RAMspeed SMP [Int]
    7339.97 / 7504.71 / 7576.60 / 7110.45 / 7400.83 MB/s -- RAMspeed SMP [Float]
    0044.18 / 0043.27 / 0043.36 / 0043.99 / 0043.33 Sec -- C-Ray
    18725.01 / 4364.33 / 23478.09 / 14183.98 / 14443.12 RPSec -- Apache Benchmark

    there is a great perf. improvement with cfs and .config (cfs_2)

    with bfs and a nearly identical config (bfs_2), the performance will drop
    with TICK_CPU_ACCOUNTING=y, the loss in the RAMspeed test is not so high (bfs_3)

    1. Thank you for your great effort!
      Did you notice any desktop latency related regressions when running your bfs_3 .config (e.g. compared to bfs_2)?

      Greetings, Manuel Krause

    2. Currently running 3.11.6 with BFS, BFQ, the three aforementioned "Radeon faster" patches and the manual enablement of TICK_CPU_ACCOUNTING=y instead of IRQ...

      Works well and seems to be a bit snappier when browsing with latest Firefox ESR while watching .avi videos in parallel. Also, editing >this here< is better with this setup when having the normal use case up.

      Thanks, Manuel Krause

      P.S.: Why does Con Kolivas not contribute to his blog in recent time?

    3. I have nothing to say right now about the kernel code?

    4. Manuel,
      sorry, I can't say anything about desktop interactivity and responsiveness. I'm only trying to figure out which config options cause the perf. drop. I am not sure about anything at the moment and I will do some more benchmarks. What I can say:
      TICK_CPU_ACCOUNTING seems to be faster than IRQ_TIME_ACCOUNTING
      CONFIG_NO_HZ_IDLE seems to be faster than CONFIG_HZ_PERIODIC

      Please be patient (for a few days (or weeks)), I will post my findings.


    5. I don't know how much RAM throughput affects performance, but in Graysky's compile benchmark in 3.11 BFS was clearly the winner:

    6. Kernel: 3.11.6+bfs+bfq

      HZ_PERIODIC+IRQ_TIME_ACCOUNTING - 2000 TPS - 7035/7083 MB/s - 44.06 Sec - 09502 RPSec
      HZ_PERIODIC+TICK_CPU_ACCOUNTING - 2016 TPS - 7330/7455 MB/s - 43.31 Sec - 09558 RPSec
      NO_HZ_IDLE+IRQ_TIME_ACCOUNTING -- 2438 TPS - 7748/7813 MB/s - 43.22 Sec - 10110 RPSec
      NO_HZ_IDLE+TICK_CPU_ACCOUNTING -- 2427 TPS - 7800/7863 MB/s - 43.23 Sec - 10037 RPSec

      I could not figure out, which option affected the apache performance.

    7. interbench-0.31



    8. +compaction(+migration)+transparent_hugepage
      2551 TPS - 7461/7565 MB/s - 43.15 Sec - 09961 RPSec

      2525 TPS - 7479/7558 MB/s - 43.14 Sec - 10074 RPSec

      same as above (-zram), but with

      2551 TPS - 7443/7483 MB/s - 43.20 Sec - 09984 RPSec

      2688 TPS - 7456/7528 MB/s - 43.18 Sec - 10258 RPSec

      2688 TPS - 7464/7523 MB/s - 43.17 Sec - 10154 RPSec

      2631 TPS - 7520/7560 MB/s - 43.18 Sec - 10215 RPSec

      2747 TPS - 7530/7543 MB/s - 43.13 Sec - 10280 RPSec

      2688 TPS - 7754/7820 MB/s - 43.16 Sec - 09844 RPSec

      2688 TPS - 7833/7920 MB/s - 43.15 Sec - 10169 RPSec


      2688 TPS - 7758/7847 MB/s - 43.22 Sec - 10037 RPSec

    9. finally

      # CONFIG_COMPACTION is not set
      # CONFIG_CLEANCACHE is not set
      # CONFIG_CMA is not set
      # CONFIG_KSM is not set
      # CONFIG_FRONTSWAP is not set
      # CONFIG_ZSWAP is not set
      # CONFIG_ZRAM is not set
      # CONFIG_ZRAM_DEBUG is not set

      but I'm not sure about

      feel free

      greets tww

    10. Lots of low-jitter stuff here:

      Peace Be With You.

  30. Just noticed that while I am launching a compile job with -j num_core_in_my_system, UI responsiveness is impacted (ie: mouse cursor severely micro-blocking, web browser taking a very long time to load pages).

    I am wondering if this behavior is surprising and if using schedtool -I on the Xserver would be a good idea to workaround this issue.

    1. > I am wondering if this behavior is surprising and if using schedtool -I on the Xserver would be a good idea to workaround this issue.

      not surprising if all processes run in same scheduling class. You should start your compiler task with schedtool -D. In fact, I start every xterm that way by adding the following to my .profile and .bashrc (the latter sources the former):

      if [ "$TERM" = "xterm" ]; then
        schedtool -D $$
        ionice -c 3 -p $$
      fi

    2. Martin, awesome trick. I'll give it a shot and set the UI terminal windows' priority to IDLE. So basically every command-line job has lower priority than anything using the GUI.

      I'm assuming that all children inherit priority settings.

      It will fix the GUI responsiveness for sure, but I would have thought that, out of the box, responsiveness would be better.

    3. Hi Oliver,

      For make jobs I use toolsched from Con; you can find it on this blog, but about 1 year back. It automatically uses schedtool for make (and other user-defined apps). But with big compile jobs with heavy I/O, there could be a problem with "vm.dirty_ratio", as described by me below. If the cache is too big, writing the data back to disk takes too long and blocks the other apps.

      Btw. Con, I do have a question about toolsched. If I use 'sudo make install' then I get an error that make is not found. OK, I can use 'su -c "make install" ', but that's not sudo. Any suggestions?

      cu sysitos

  31. I applied BFS to the Ubuntu source.
    It seems this line
    " struct sched_rt_entity rt;"
    just needs to be moved out of the #else,
    but I moved the three lines out of the #else.

    1. Now I have removed the #else above that code.
      I think it will work normally,
      but it needs to be tested.

    2. According to my test,
      just removing the #else above
      "struct sched_rt_entity rt;"
      makes the compile succeed.
      Will there be a patch that fixes this automatically?

  32. I have been experiencing random shutdowns since some time ago. I cannot say for sure after which upgrade, but I'm positive it only occurs in the 3.11 series.
    My laptop simply shuts down (not a reboot!), the battery LED lighting up for a second. After this, pressing the power button doesn't "wake up" the screen, which remains black and without any power (i.e. not even the backlight turns on). I press the power button once again, for a hard shutdown, and then everything works again. I cannot see a pattern -- sometimes this happens as soon as I boot into X, other times it can be after hours, even days.
    Booting with the vanilla ARCH kernel never elicits this behaviour, hence my suspicion that the kernel might be the culprit.

  33. Any urw locks patch for grq? Old one fails with BFS v442.

  34. Phoronix: "BFS Scheduler Lost Some Charm With Linux 3.11"

    1. 3.11-6.towo-siduction-amd64
      2450 TPS - 7448/7569 MB/s - 43.25 Sec - 18703 RPSec

      3.11.6-1 vanilla+siductionconf+localmodconfig -bfs
      2427 TPS - 7497/7507 MB/s - 43.35 Sec - 18592 RPSec

      3.11.6-3 vanilla+siductionconf+localmodconfig +bfs
      1724 TPS - 7457/7557 MB/s - 43.20 Sec - 04376 RPSec

      3.11.6-2 vanilla+siductionconf+localmodconfig -bfs XXX
      2380 TPS - 7870/7940 MB/s - 43.43 Sec - 19334 RPSec

      3.11.6-2t vanilla+siductionconf+localmodconfig -bfs YYY
      2525 TPS - 7780/7857 MB/s - 43.40 Sec - 18922 RPSec

      3.11.6-4 vanilla+siductionconf+localmodconfig +bfs XXX
      1712 TPS - 7746/7819 MB/s - 43.19 Sec - 04431 RPSec

      3.11.6-4t vanilla+siductionconf+localmodconfig +bfs YYY
      1923 TPS - 7763/7825 MB/s - 43.20 Sec - 04015 RPSec

      # CONFIG_COMPACTION is not set
      # CONFIG_CLEANCACHE is not set
      # CONFIG_CMA is not set
      # CONFIG_KSM is not set
      # CONFIG_FRONTSWAP is not set
      # CONFIG_ZSWAP is not set
      # CONFIG_ZSMALLOC is not set
      # CONFIG_ZRAM is not set
      # CONFIG_ZRAM_DEBUG is not set
      # CONFIG_CONTEXT_TRACKING is not set
      # CONFIG_RCU_FAST_NO_HZ is not set
      # CONFIG_USER_NS is not set


    2. @ Jan Killius - can you give me your config, please!?

      for clarification:

  35. Any thoughts about a release date of linux-ck 3.12?

  36. Once again, a question for you latency-addicted people:
    Does someone of you know any knobs, hints or web-links on how to ease the pain of heavy swapping I/O?
    (I've ported my setup to the newer machine with Core2duo, 4GB RAM, 4GB /dev/shm & 10GB of swap on the 2nd disk. Often using the /dev/shm as a RAMDISK to decode files to and re-code them back to disk from there.) Now with 3.11.7+BFQ+BFS as of ck1. During swapping, avi video replay will stutter in frames for video and sound.

    I've, so far, experimented with /proc/sys/vm/dirty_ratio, .../dirty_background_ratio, .../swappiness, and even with "schedtool -R -p 99 -n -19 `pidofproc -n kswapd0`" or "ionice -p `pidofproc -n kswapd0`" -- but don't see any direction to follow.

    I'm in doubt if there is an issue between /dev/shm + swap + physical RAM "communication".

    Thank you in advance for sharing your ideas/findings,
    Manuel Krause

    1. Hi Manuel,

      I had similar problems here (but the causes could be different!) with newer kernels plus BFS and BFQ: writing to slow devices (a USB thumb drive and an NFS share over my WLAN) leads to an unusable system. The whole system freezes for 10 seconds or more (you can see it on the jumping clock applet). I'm using an i7 with 8GB RAM here, and that should not happen on Linux ;). I found a solution for my problem, described here. I changed the default of 40 to "vm.dirty_ratio = 4" and "vm.dirty_background_ratio = 2". Now it's really snappier, with no more dropouts.

      Maybe it could help you too.
      cu sysitos

    2. Hi Manuel, you can try also

      the .config options from above

      "Set sampling_down_factor to greater than 1 it acts as a multiplier for the scheduling interval for reevaluating load when the CPU is at its top speed due to high load. This improves performance by reducing the overhead of load evaluation and helping the CPU stay at its top speed when truly busy, rather than shifting back and forth in speed. Valid values are from 1-100000"
      cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
      echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

      "The value is in milliseconds, and the default value is set to 6ms. Valid values are from 1 to 1000. Decreasing the value will decrease latencies at the cost of decreasing throughput, while increasing it will improve throughput, but at the cost of worsening latencies"
      cat /proc/sys/kernel/rr_interval
      echo 4 > /proc/sys/kernel/rr_interval

      and adding -fno-defer-pop to cflags and cxxflags:

    3. Hi Manuel,

      if you use an Intel Core2Duo, then I would prefer X86_INTEL_PSTATE and switch from the ondemand governor (at least on 3.11; 3.12 should be better with ondemand) to powersave (laptop) or performance (PC). I had some trouble with it, already described on this blog, with some detailed links.

      cu sysitos

    4. Thank you very much for these hints, so far! Needs some time for testing. :-)

      Regarding the cflags & cxxflags: I'm not sure where to add the -fno-defer-pop. I've added them at the end of HOSTCFLAGS & HOSTCXXFLAGS, but reading /usr/src/linux/kernel/bounds.s this is not persistent, -fdefer-pop remains active.

      Any ideas? Thank you in advance,

    5. ./Makefile

      KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
      -fno-strict-aliasing -fno-common \
      -Werror-implicit-function-declaration \
      -Wno-format-security \
      -fno-delete-null-pointer-checks -fno-defer-pop

      I believe...

    6. Thank you, Anonymous @10 November 2013 12:25, that made the -fno-defer-pop persistent! -- And it seems to have a positive effect.

      I have to say, that I already had applied the ondemand-governor patches when they came up in this blog's thread.
      Setting /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor to sth. > 1 doesn't help.
      Regarding the vm tunables, I've now:
      8 > /proc/sys/vm/dirty_ratio (default openSUSE 12.3 @ 20)
      3 > /proc/sys/vm/dirty_background_ratio (default -"- @ 10)
      70 > /proc/sys/vm/swappiness (default -"- @ 60)
      Going lower with these values I'd get more stuttering in A/V playback during swapping. But it's very hard to subjectively distinguish interactivity with each variation of these three tunables.
      Additionally, setting the BFQ runtime tunable "low_latency" to 0 (i.e. off instead of 1 == on == default) results in worse interactivity. Just for the record.
      And... then I've set CONFIG_RCU_BOOST_PRIO=94 (results in -95 in htop), letting me manually adjust the prios of kswapd, kwin, Xorg and pulseaudio via schedtool higher or lower than RCU_BOOST_PRIO. But from that point on, at the latest, I'm flying blind.
      Adjusting rr_interval slightly higher / lower than 6 (BFS default) both decreased interactivity here.

      Best regards, Manuel Krause
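When experimenting with the three vm tunables discussed above, it helps to record the current values before each change so that runs can be compared. This is a minimal sketch; `show_vm_tunables` is a hypothetical helper, and its optional directory argument exists only so the function can be tried against a scratch directory instead of /proc/sys/vm.

```shell
#!/bin/sh
# Print "name = value" for dirty_ratio, dirty_background_ratio and swappiness.
# $1 is the directory to read from and defaults to /proc/sys/vm.
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_ratio dirty_background_ratio swappiness; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}

show_vm_tunables
```

Redirecting the output to a dated file before each experiment gives a simple log to go with the subjective impressions.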

    7. How much does making -fno-defer-pop persistent affect performance? Also, I saw you and Mike suggesting different tweaks and was wondering if there is any consensus about which values to use for the vm tunables on a desktop (activities like web browsing, watching movies, compiling...)

    8. Sorry, guys'n gals, I'm quite new with linux benchmarking.
      I want to deliver the above questioned performance-diff in the way of the previously posted ones and installed the most recent PTS and miss a GUI or a value in the console version to select "point 3 'complex system test'" as mentioned in "Anonymous 19 October 2013 06:33" in this thread.

      Thank you for any help,
      Manuel Krause

    9. #!/usr/bin/env xdg-open
      [Desktop Entry]
      Name=Phoronix Test Suite
      GenericName=Benchmarking Utility
      Comment=An Automated, Open-Source Testing Framework
      Comment[fr]=Outil pour étalonner les performances de votre ordinateur
      Exec=phoronix-test-suite interactive

    10. OMG! Sorry! It obviously was too late last night and I was too blind to find that "/usr/bin/phoronix-test-suite interactive" would lead to the selections menu. Some follow-up questions: Do you run it with X desktop up and in a console window? As root? Or booted into single user session without X?

      Thank you, Manuel Krause

    11. It's a pity -- the results don't show really relevant performance differences in PTS complex system test (with only KDE up in a konsole) on here:

      -fdefer-pop / -fno-defer-pop
      159 / 161 Tps - PostMark
      2810.11 / 2810.07 MB/s - RAMspeed integer
      3095.08 / 3077.63 MB/s - RAMspeed floating point
      106.47 / 106.38 s - C-Ray
      6027.83 / 6003.55 Rps - Apache

      Best regards, Manuel Krause
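To put the numbers above into perspective, here is a small sketch that computes the relative difference per test. The values are copied from the results posted above; the dictionary layout and the function name are only illustrative, not part of PTS.

```python
# Results as posted above: (value with -fdefer-pop, value with -fno-defer-pop).
results = {
    "PostMark (Tps)": (159.0, 161.0),
    "RAMspeed integer (MB/s)": (2810.11, 2810.07),
    "RAMspeed floating point (MB/s)": (3095.08, 3077.63),
    "C-Ray (s)": (106.47, 106.38),  # seconds, so lower is better here
    "Apache (Rps)": (6027.83, 6003.55),
}


def relative_diff_percent(baseline, candidate):
    """Change of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Relative change of the -fno-defer-pop build against the -fdefer-pop build.
diffs = {
    name: relative_diff_percent(defer, no_defer)
    for name, (defer, no_defer) in results.items()
}

for name, pct in diffs.items():
    print(f"{name}: {pct:+.2f} %")
```

Every test stays within roughly 1.3 % in either direction, which is probably within run-to-run noise, although only repeated runs could confirm that.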

    12. Hi Manuel,

      as I already wrote, if you are on OpenSuse (like me) or any other non-Ubuntu distro, you don't need the ondemand governor patches. Ubuntu simply forgot to re-enable X86_INTEL_PSTATE, which you should use for your Intel Core 2 Duo. It is much better than the other governors and handles the CPU clock speed on its own. But maybe Con could say more about whether and how BFS has an influence there.

      PS: As you can read in the Phoronix comments on the benchmark test of the improved ondemand code in Linux 3.12, this applies only to Ubuntu distros without X86_INTEL_PSTATE.

      CU sysitos

    13. Mmh, I'm not sure about X86_INTEL_PSTATE; it has been enabled in my kernel .config for months. But how can I be sure that it's active, or see the difference if it's not? The active driver is acpi-cpufreq, the current governor is ondemand.

      Regarding the consensus about the vm tunables, I have to say that I had ported my best-effort settings from my old single-core system to my new dual-core one. I now see that Mike's (sysitos') settings are considerably better for multicore systems.
      echo 4 > /proc/sys/vm/dirty_ratio             # openSUSE 12.3 default: 20
      echo 2 > /proc/sys/vm/dirty_background_ratio  # openSUSE 12.3 default: 10
      These seem to eliminate most of the A/V hiccups I had with heavy swapping.
      I still hold the opinion that adjusting these will remain a per-system, per-user job, which is not what we want for the future of Linux.

      Best regards, Manuel Krause
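One way to answer the question of which driver is actually in use is to read the cpufreq entries in sysfs. The following is a minimal sketch, assuming the standard `scaling_driver` and `scaling_governor` files under `/sys/devices/system/cpu/cpuN/cpufreq`; the function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def cpufreq_status(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return (driver, governor) for one CPU; a missing entry yields None."""
    cpufreq_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        path = cpufreq_dir / name
        return path.read_text().strip() if path.is_file() else None

    return read("scaling_driver"), read("scaling_governor")


driver, governor = cpufreq_status()
print(f"driver: {driver}, governor: {governor}")
```

If the driver reads acpi-cpufreq, as reported above, the intel_pstate driver is not the one handling frequency scaling, regardless of what the kernel .config enables.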

    14. Hi Manuel,

      a "ls /sys/devices/system/cpu/intel_pstate" should output something like this: "max_perf_pct min_perf_pct no_turbo". Then PSTATE is active.

      For my vm settings I even had to lower them, because writing to a USB thumb drive with ntfs3g still blocks the system.
      So my /etc/sysctl.conf now contains:
      vm.dirty_bytes = 209715200
      vm.dirty_background_bytes = 104857600
      (Btw. you can use either bytes or ratio values there)

      cu sysitos

      PS: Using OpenSuse 13.1 now ;)
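The byte values in the sysctl.conf lines above are simply 200 MiB and 100 MiB. As a rough sketch of that arithmetic (the helper name and the sanity check are my own additions, not something from the kernel or from sysitos' setup):

```python
MIB = 1024 * 1024


def dirty_limits_conf(dirty_mib, background_mib):
    """Build sysctl.conf lines for the byte-based dirty limits."""
    # Assumption: the background threshold is meant to sit below the hard limit.
    if background_mib >= dirty_mib:
        raise ValueError("background limit should be below the hard limit")
    return (
        f"vm.dirty_bytes = {dirty_mib * MIB}\n"
        f"vm.dirty_background_bytes = {background_mib * MIB}\n"
    )


# 200 MiB hard limit, 100 MiB background limit
print(dirty_limits_conf(200, 100), end="")
```

This reproduces the two values quoted above. As far as the kernel's vm sysctl documentation goes, the *_bytes and *_ratio tunables are counterparts: only one of each pair is in effect at a time, and the other reads back as 0.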

    15. Hi, Mike,
      the INTEL_PSTATE appears not to be supported for my "Penryn" Core2Duo P8400, neither by the hardware nor by the kernel pstate driver? The kernel doesn't report anything, even when booting with "intel_pstate=enable" added to the command line. But I only did a short (i.e. too long) web search, which didn't turn up results stating from which generation of Intel CPUs this feature was implemented. I assume from Core i3 onwards.

      Btw., lowering dirty_background_* and dirty_* in the vm would follow CK's approach up to 3.7-ck1, which set both *_ratio values to 1 in 'mm-decrease_default_dirty_ratio-1.patch' and which he has dropped since 3.8. (For my old single-core system this "=1" setup was unusable back then.)

      I, personally, will wait some weeks for openSUSE 13.1 and the repos to mature. I had too many bad experiences in the past when upgrading openSUSE shortly after the release date.

      Best regards, Manuel Krause

  37. under KDE as a normal user - i´m too lazy for something else

    1. sorry, something goes wrong with my firefox

  38. For those who are waiting for BFS on 3.12: I have ported bfs-0441 to 3.12. There are 3 conflicts, but they seem to be minor ones. After resolving the conflicts and building the kernel, it runs on my core2 machine.

    Until ck releases a new version of BFS for 3.12, you can try this out.

    bfs patch at

    And also my patch to fix the circle dead-lock

    All credit goes to ck. :)

    1. Unfortunately, this BFS port breaks resuming from hibernation for me :(. It shows nothing but a blank screen and hangs.

    2. Hi Chen,

      maybe I'm missing something, but could you please describe the advantages of your circle dead-lock patch and why it should be used?
      As I see in the patch, you take the mutex_lock once for the whole system, and not per CPU like Con does (which on my i7 means 8x lock and unlock).


      Btw, maybe you can contribute your patch to the ZEN Kernel team.

      cu sysitos

    3. @Mike ,
      was the answer Chen gave me on 2013-10-10 on this page not good enough?

    4. @Ulrich,

      thx, I "overread" (overlooked) it, because there is no deadlock boot warning here and I googled for the wrong sentence. Thx.

      CU sysitos

  39. @ graysky: I really hope to see an updated version of your report ( when bfs for 3.13 comes out :-)