Friday, 25 March 2016

BFS 469, linux-4.4-ck1, linux-4.5-ck1

Announcing an updated BFS for linux-4.4 and 4.5 based kernels.

BFS by itself:
4.5-sched-bfs-469.patch

-ck branded linux-4.5-ck1 patches:
linux-4.5-ck1



This is purely a resync of BFS 467 from 4.3-ck3 to the current kernels. The only changes are extra documentation of the interactive tunable in the scheduler documentation and a build warning fix for uniprocessor builds.


While linux-4.5 is the latest kernel, as I had been slow in syncing up and missed 4.4, and given that 4.4 is deemed a Long Term Stable release, I've provided resyncs with both. Version number differences of 467/469 are only due to syncing with different kernels and otherwise they are only trivially different.


The patches are fairly new without a great deal of testing, so the usual warnings apply, but given how long it took me to get around to catching up, I didn't want to delay releasing them.


Enjoy!
Please enjoy
-ck

42 comments:

  1. Many thanks

  2. Great job, CK! 4.4.6-1-ck is ticking away for me. Thank you.

  3. Thanks Con! 4.5 patch works great with kernel 4.5.0 (gentoo box) but 4.4 patch broke suspend/hibernate with kernel version 4.4.5 and 4.4.6. Everything is ok up to kernel 4.4.4.

    thomas

    Replies
    1. That's a shame. I only built it against 4.4.0 and don't have time to keep up with incrementals, so that pretty much ruins the whole point of providing a patch for 4.4. Oh well, maybe mainline will unintentionally unbreak it in a later 4.4 release.

    2. You sure it's ck1-related? Have you tried the control experiment against 4.4.5 without ck-1?

      https://bugs.archlinux.org/task/48752

    3. Actually I now have 2 people who have reported the bug is there with mainline for them too on 4.4.6 and 4.5, so I can confirm that BFS likely isn't at fault.

    4. In my case gentoo sources-4.4.6 have no problem with susp/hib; the same sources with ck-1 have the problem. I did not test 4.4.5.

      thomas

    5. I can confirm - while 4.4.4 ran fine for weeks, upgrading to 4.4.6 broke suspend. Machine just hangs with lights on and doesn't go to sleep.

      4.5.0 everything works. I certainly hope they didn't backport any relevant changes to upcoming 4.5.y incrementals.

    6. It's probably the opposite. They may have backported changes from 4.5 into 4.4.6 in which case you could try applying the 4.5 patch onto 4.4.6 instead. Not sure if it cleanly applies or not but there were suspend changes in 4.5.

    7. I tested and suspend works on 4.4.6 with Intel Haswell

  4. Hi Con!

    I've run into an odd situation, more than once. I run on a Btrfs root filesystem using a KDE Plasma 5 desktop.

    When I forcibly close Konsole while it is logged into a systemd root shell via "machinectl shell", the system will randomly lock up and begin consuming copious amounts of CPU, enough to send the fans blowing hard and loud.

    I wasn't even able to safely shut down via magic sysrq. My only option was to risk a hard shutdown, which has unrecoverably corrupted my root Btrfs filesystem twice in a row.

    Exiting the machinectl shell by typing "exit" has gone without issue... I wonder why merely closing via the titlebar as opposed to typing "exit" has such an effect.

    Regardless, thank you for such a wonderful scheduler, Con! :D

    Kyle

    Replies
    1. After testing various configurations, with and without BFS, I'm starting to think it's not BFS that is causing these problems.

      Please ignore my previous post. :P

      Kyle

  5. Many thanks.
    Lightning fast.
    4.5-zen.
    Intel xeon.
    No problems.
    Much appreciated.


  6. Running without issues for a couple of days on 4.5.
    Thanks for the update

  7. Thanks for the update, 4.4.6-ck1 works fine, suspend to disk works without problems too, with
    echo shutdown > /sys/power/disk
    echo disk > /sys/power/state

  8. I compiled 4.5 kernel w/:
    *) ubuntu standard config
    *) ck1 patch
    *) BFQ
    *) enabled BFQ and set timer to 300

    This results in very good speed, but unfortunately, after 2 days of usage with a couple of suspends in between, it crashes (nothing in the logs, caps lock happily blinking), and it seems that it always crashes while watching YouTube (HTML5 player).
    This is not the first time this has happened: Xanmod crashed for me as well when it used BFS, but that was the 4.4 version.

    Does anyone else have crashes like I do? I need to find the common denominator: is it really BFS, as it seems to me now?

    HW: i7 (mobile), HD4000, 16GB RAM
    Software: Ubuntu 16.04, quite a lot of stuff opened

    I'll report back if I dig up the cause.

    Replies
    1. So, you bet the crash is because of 300Hz (or 500 in Xanmod's case)?
      With 300Hz, it was really a sweet spot for me: the laptop was NOT hot or blowing, battery life was good and the desktop was fluid, but only when I compiled it with the ivybridge optimizations from your patch :)

      Somewhere in BFS description, I read that we should at least select 300hz, 1K was not mandatory...

      How sure are You that low hz is the culprit?

    2. 300Hz is for servers.
      On desktops/laptops you want 1000Hz.

      At 1000Hz your system is also more responsive; throughput might suffer, but that rarely plays a role on desktops/laptops...

      4.5
      bfs
      bfq
      1000Hz
      Periodic timer ticks
      High Resolution Timer Support

      Might not be ideal on laptops...
      ...but running stable on 2 workstations here, third is in the making although i freed the kernel of every energy-saving "bloat".

      Will try on 2 i5 laptops soon.
      Maybe something will show up.
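For reference on the 300Hz versus 1000Hz discussion above, the timer tick length follows directly from the configured frequency. This is a minimal sketch of that arithmetic only; `tick_period_us` is a hypothetical helper name, not a kernel function, and the tick length is just one of several factors in perceived responsiveness.

```c
/* Length of one timer tick in microseconds for a given CONFIG_HZ value.
 * Hypothetical helper for illustration; not part of the kernel. */
unsigned long tick_period_us(unsigned int hz)
{
	return 1000000UL / hz;	/* integer division, rounds down */
}
```

With these definitions, `tick_period_us(300)` gives 3333 and `tick_period_us(1000)` gives 1000, so a 1000Hz kernel ticks roughly every 1 ms versus about 3.3 ms at 300Hz.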

    3. Where did you find BFQ for kernel 4.5? The latest version is for 4.4.

    4. Con, hangs happen in Liquorix as well, just waaaay later, BFS accelerates hangs quite quickly (as expected).

      BFQ is just a disk scheduler patch, I apply the patch and if it succeeds then it's fine.

  9. Hi Con,

    I would be interested to know your opinion of the new CPUFreq "schedutil" governor, by Rafael Wysocki, which will probably be making an appearance in mainline as of 4.7. The skeleton code is already present in 4.6-rc1, where it is integrated into CFS via the update_load_avg() function to provide information on CPU utilization. How do you foresee this working with BFS, if indeed you think it's worthwhile?

    Thanks for all the good work with BFS, it's been my scheduler of choice for many years!

    Replies
    1. I have no opinion on it, but it would be trivial to implement support on BFS when it's needed.

  10. Hi Con,

    what is your opinion on the full tickless kernel (CONFIG_NO_HZ_FULL)? Are there many downsides to using it on a laptop?
    It seems quite few people are actually using it. I have read quite a lot about it, but still, what do you think...

  11. hi,
    there is recent research on schedulers and some testcases interesting to reproduce with BFS.
    http://events.linuxfoundation.org/sites/events/files/slides/SCHED_DEADLINE-20160404.pdf

  12. "Is The Linux Kernel Scheduler Worse Than People Realize?"

    www.phoronix.com/scan.php?page=news_item&px=Linux-Kernel-Scheduler-Bad
    www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf

    YES! ;)

    Replies
    1. http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf

    2. Finally they come to the same conclusion Con stated more than ten years ago: CFS is broken due to too many heuristics.

      For the many CPUs of today they think of a new hierarchical scheduler. Some years ago Con had the idea to implement a recursive self forking 'many' scheduler. Though I don't know if I did fully understand back then ... but I do know that any hierarchy has its established semantic, an invested heuristic kind of ...

    3. No doubt you are referring to this post of mine:
      https://lkml.org/lkml/2012/12/20/509

    4. tools and patches: https://github.com/jplozi/wastedcores

    5. @ulenrich, you're confused. CFS is broken in that for the people who wrote the paper, their bigger (not fully connected) NUMA topology system produced those numbers (this is, by the way, nothing you can run BFS on). One member of our scheduler team actually tested those patches (https://lkml.org/lkml/2016/4/23/194), and saw quote, unquote, "0% improvements on the systems I tested, for some simple workloads". An actual awful NUMA bug was later fixed by Mike in that same thread (https://lkml.org/lkml/2016/4/27/63).

      The reason that CFS, not BFS, is the superior and only scheduler of choice in the mainline Linux kernel is the horrible scaling problems that BFS suffers from when used on systems with 16+ cores. For desktop CPUs, it's excellent.

      CFS is a compromise on both of those points, so obviously it will be slower on the workloads BFS addresses, while BFS is limited to tackling only a certain market.

  13. Where can I find BFQ for kernel 4.5? The latest version is for kernel 4.4.

    Replies
    1. Have a look at:
      https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/xzzmZ1Vat-8

      and simply use the 4.4.0-v7r11 patches. I don't understand why they haven't released an official version so far.

      BR Manuel Krause

  14. Could you give me some pointers as to where to plug in the schedutil call into the BFS code? This is the equivalent CFS commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/kernel/sched/fair.c?id=277edbabf6fece057b14fb6db5e3a34e00f42f42

    Naturally the RT/deadline versions are much simpler, eg https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/kernel/sched/rt.c?id=277edbabf6fece057b14fb6db5e3a34e00f42f42 but BFS deserves better than that.

  15. This program summarizes scheduler run queue latency as a histogram, showing
    how long tasks spent waiting their turn to run on-CPU.
    github.com/iovisor/bcc/blob/master/tools/runqlat_example.txt
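The histogram idea described above can be sketched in plain C. This is only an illustration of power-of-two bucketing of latency samples, not the tool's implementation (which collects its data inside the kernel); the names `log2_bucket`, `record_latencies`, and `NBUCKETS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 32	/* assumption: enough buckets for microsecond latencies */

/* Index of the power-of-two bucket for v: bucket 0 holds 0..1,
 * bucket 1 holds 2..3, bucket 2 holds 4..7, and so on. */
unsigned int log2_bucket(uint64_t v)
{
	unsigned int k = 0;

	while (v > 1) {
		v >>= 1;
		k++;
	}
	return k < NBUCKETS ? k : NBUCKETS - 1;
}

/* Add n run queue latency samples (in microseconds) to the histogram. */
void record_latencies(const uint64_t *usecs, size_t n,
		      unsigned int hist[NBUCKETS])
{
	for (size_t i = 0; i < n; i++)
		hist[log2_bucket(usecs[i])]++;
}
```

Comparing two schedulers would then amount to collecting samples under the same workload on each kernel and comparing how the counts spread across the buckets.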

    Replies
    1. That's very cool. Anyone feel like using this tool to take BFS v mainline for a spin?

  16. Heya,

    Wanted to poke around the code, and took a look at niffy_diff.

    There is a wraparound problem in the test: if jiff_diff = -1, max_diff will always be less than niff_diff, so the function returns 1 (i.e. min_diff). All other negative values do not alter niff_diff?

    I assume that a negative jiff_diff is unexpected. Since two places in the code call niffy_diff with jiff_diff = 1, I *think* you could get rid of 2 branches (at least 1 OR branch that cannot be optimized out) in that case with:

    static inline void niffy_diff_one(s64 *niff_diff)
    {
            /* The cast to unsigned makes every negative value larger than JIFFIES_TO_NS(1). */
            if (unlikely((u64)*niff_diff > JIFFIES_TO_NS(1)))
                    *niff_diff = 1;
    }
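The behaviour described in this comment can be reproduced in user space. The following is a minimal sketch, not the BFS source: `niffy_diff_model` is a hypothetical reconstruction based only on the description above (unsigned bounds, clamp to `min_diff` when the value falls outside them), `JIFFIES_TO_NS` assumes HZ=1000, and `niffy_diff_one_model` is the single-compare variant proposed here. It does not claim the two functions agree for every input.

```c
#include <stdint.h>

typedef int64_t s64;
typedef uint64_t u64;

/* Assumption: HZ = 1000, so one jiffy is 1,000,000 ns. */
#define HZ 1000
#define JIFFIES_TO_NS(j) ((s64)(j) * (1000000000LL / HZ))

/* Hypothetical model of the range check under discussion (not BFS code).
 * The bounds are unsigned, so a negative bound wraps to a huge value. */
void niffy_diff_model(s64 *niff_diff, int jiff_diff)
{
	u64 min_diff = jiff_diff > 1 ? (u64)JIFFIES_TO_NS(jiff_diff - 1) : 1;
	u64 max_diff = (u64)JIFFIES_TO_NS(jiff_diff + 1);

	if ((u64)*niff_diff < min_diff || (u64)*niff_diff > max_diff)
		*niff_diff = (s64)min_diff;
}

/* Proposed variant for jiff_diff == 1: a negative value cast to
 * unsigned exceeds any small positive bound, so one compare suffices. */
void niffy_diff_one_model(s64 *niff_diff)
{
	if ((u64)*niff_diff > (u64)JIFFIES_TO_NS(1))
		*niff_diff = 1;
}
```

Under these assumptions, a value of 500 passed with `jiff_diff = -1` is clamped to 1 because `max_diff` becomes 0, while `jiff_diff = -2` leaves it untouched because `max_diff` wraps to a very large number. Note that in this model the upper bound for `jiff_diff = 1` is two jiffies, whereas the variant compares against one, so the thresholds would need checking against the real code.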

    Replies
    1. -1 is never supposed to happen in the first place.

  17. Hi
    I'm trying to port the scheduler to the 4.6 kernel, but there are too many changes in the scheduler subsystem, so I can't even get the kernel to compile (it throws undefined reference errors at the final link).
    So I want to ask: how long must we wait for our BestForeverScheduler for 4.6? :-)
    P.S sorry for bad English. :-(

  18. There's a 4.6-ck1 directory showing up on the FTP, but it's empty!

    Such a tease! >.<

    Replies
    1. I've resynced but it's still unstable so it's waiting for me to fix it.
