Announcing an updated BFS for linux-4.4 and 4.5 based kernels.
BFS by itself:
4.5-sched-bfs-469.patch
-ck branded linux-4.5-ck1 patches:
linux-4.5-ck1
This is purely a resync of BFS 467 from 4.3-ck3 to the current kernels. The only change is extra documentation of the interactive tunable in the scheduler documentation, and a build warning fix for uniprocessor builds.
While linux-4.5 is the latest kernel, as I had been slow in syncing up and missed 4.4, and given that 4.4 is deemed a Long Term Stable release, I've provided resyncs with both. Version number differences of 467/469 are only due to syncing with different kernels and otherwise they are only trivially different.
The patches are fairly new without a great deal of testing, so the usual warnings apply, but given how long it took me to get around to catching up, I didn't want to delay releasing them.
Enjoy!
Please enjoy
-ck
Many thanks
Great job, CK! 4.4.6-1-ck is ticking away for me. Thank you.
Thanks Con! 4.5 patch works great with kernel 4.5.0 (gentoo box) but 4.4 patch broke suspend/hibernate with kernel version 4.4.5 and 4.4.6. Everything is ok up to kernel 4.4.4.
thomas
That's a shame. I only built it against 4.4.0 and don't have time to keep up with incrementals, so that pretty much ruins the whole point of providing a patch for 4.4. Oh well, maybe mainline will unintentionally unbreak it in a later 4.4 release.
You sure it's ck1-related? Have you tried the control experiment against 4.4.5 without ck-1?
https://bugs.archlinux.org/task/48752
Actually I now have 2 people who have reported the bug is there with mainline for them too on 4.4.6 and 4.5, so I can confirm that BFS likely isn't at fault.
In my case gentoo-sources 4.4.6 has no problem with suspend/hibernate; the same sources with ck1 have the problem. I did not test 4.4.5.
thomas
I can confirm - while 4.4.4 ran fine for weeks, upgrading to 4.4.6 broke suspend. Machine just hangs with lights on and doesn't go to sleep.
On 4.5.0 everything works. I certainly hope they didn't backport any relevant changes to upcoming 4.5.y incrementals.
It's probably the opposite. They may have backported changes from 4.5 into 4.4.6 in which case you could try applying the 4.5 patch onto 4.4.6 instead. Not sure if it cleanly applies or not but there were suspend changes in 4.5.
I tested and suspend works on 4.4.6 with Intel Haswell.
Hi Con!
I've run into an odd situation, more than once. I run on a Btrfs root filesystem using a KDE Plasma 5 desktop.
When I forcibly close Konsole while it is logged into a systemd root shell via "machinectl shell", the system will randomly lock up and begin consuming copious amounts of CPU, enough to send the fans blowing hard and loud.
I wasn't even able to safely shut down via magic sysrq. My only option was to risk a hard shutdown, which has unrecoverably corrupted my root Btrfs filesystem twice in a row.
Exiting the machinectl shell by typing "exit" has gone without issue... I wonder why merely closing via the titlebar as opposed to typing "exit" has such an effect.
Regardless, thank you for such a wonderful scheduler, Con! :D
Kyle
After testing various configurations, with and without BFS, I'm starting to think it's not BFS that is causing these problems.
Please ignore my previous post. :P
Kyle
Many thanks.
Lightning fast.
4.5-zen.
Intel xeon.
No problems.
Much appreciated.
Running without issues for a couple of days on 4.5.
Thanks for the update
Thanks for the update. 4.4.6-ck1 works fine, and suspend to disk works without problems too, with:
echo shutdown > /sys/power/disk
echo disk > /sys/power/state
I compiled a 4.5 kernel with:
*) ubuntu standard config
*) ck1 patch
*) BFQ
*) enabled BFQ and set timer to 300
This results in very good speed, but unfortunately, after 2 days of usage with a couple of suspends in between, it crashes (nothing in the logs, caps lock happily blinking), and it seems that it always crashes while watching YouTube (HTML5 player).
This is not the first time this has happened: Xanmod crashed for me as well when it used BFS, but that was the 4.4 version.
Does anyone else have crashes like I do? I need to find the common denominator and whether it really is BFS, as it seems to me now.
HW: i7 (mobile), HD4000, 16GB RAM
Software: Ubuntu 16.04, quite a lot of stuff opened
I'll get back if I dig up the cause.
1000 Hz tick rate.
So, you bet the crash is because of 300Hz (or 500 in Xanmod's case)?
With 300Hz it was really a sweet spot for me: the laptop was NOT hot or blowing, battery life was good and the desktop was fluid, but only when I compiled it with the ivybridge optimizations from your patch :)
Somewhere in the BFS description I read that we should select at least 300Hz; 1K was not mandatory...
How sure are you that low Hz is the culprit?
300Hz is for servers.
On desktops/laptops you want 1000Hz.
At 1000Hz your system is also more responsive, although throughput might suffer, but that rarely plays a role on desktops/laptops...
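For a rough sense of what the two settings mean: the HZ value is the number of periodic timer ticks per second, so the tick period is simply its inverse. A minimal sketch of that arithmetic follows; the helper name tick_period_us is made up for illustration and is not a kernel function.

```c
/*
 * Length of one periodic timer tick in whole microseconds for a given
 * HZ setting. Hypothetical helper for illustration only; integer
 * division truncates, so 300 Hz gives 3333 rather than 3333.33.
 */
static unsigned long tick_period_us(unsigned int hz)
{
	return 1000000UL / hz;
}
```

So a 300Hz kernel ticks roughly every 3.3 ms and a 1000Hz kernel every 1 ms. This says nothing about whether either setting is behind the crashes reported above.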
4.5
bfs
bfq
1000Hz
Periodic timer ticks
High Resolution Timer Support
Might not be ideal on laptops...
...but running stable on 2 workstations here; a third is in the making, although I freed the kernel of every energy-saving "bloat".
Will try on 2 i5 laptops soon.
Maybe something will show up.
Where did you find BFQ for kernel 4.5? The latest version is for 4.4.
Con, hangs happen in Liquorix as well, just waaaay later, BFS accelerates hangs quite quickly (as expected).
BFQ is just a disk scheduler patch, I apply the patch and if it succeeds then it's fine.
Hi Con,
I would be interested to know your opinion of the new CPUFreq "schedutil" governor, by Rafael Wysocki, which will probably be making an appearance in mainline as of 4.7. The skeleton code is already present in 4.6-rc1, where it is integrated into CFS via the update_load_avg() function to provide information as to the CPU utilization. How do you foresee this working with BFS, if indeed you think it's worthwhile?
Thanks for all the good work with BFS, it's been my scheduler of choice for many years!
I have no opinion on it, but it would be trivial to implement support on BFS when it's needed.
Hi Con,
What is your opinion on the full tickless kernel (CONFIG_NO_HZ_FULL)? Are there many downsides to using it on a laptop?
It seems quite few are actually using it. I have read quite a lot about it, but still, what do you think...
Hi,
There is recent research on schedulers, with some test cases that would be interesting to reproduce with BFS.
http://events.linuxfoundation.org/sites/events/files/slides/SCHED_DEADLINE-20160404.pdf
"Is The Linux Kernel Scheduler Worse Than People Realize?"
www.phoronix.com/scan.php?page=news_item&px=Linux-Kernel-Scheduler-Bad
www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf
YES! ;)
http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf
Finally they come to the same conclusion Con stated more than ten years ago: CFS is broken due to too many heuristics.
For the many CPUs of today they are thinking of a new hierarchical scheduler. Some years ago Con had the idea to implement a recursive, self-forking 'many' scheduler. I don't know if I fully understood it back then ... but I do know that any hierarchy has its established semantics, a kind of invested heuristic ...
No doubt you are referring to this post of mine:
https://lkml.org/lkml/2012/12/20/509
tools and patches: https://github.com/jplozi/wastedcores
@ulenrich, you're confused. CFS is broken in that for the people who wrote the paper, their bigger (not fully connected) NUMA topology system produced those numbers (this is, by the way, nothing you can run BFS on). One member of our scheduler team actually tested those patches (https://lkml.org/lkml/2016/4/23/194), and saw quote, unquote, "0% improvements on the systems I tested, for some simple workloads". An actual awful NUMA bug was later fixed by Mike in that same thread (https://lkml.org/lkml/2016/4/27/63).
The reason that CFS, not BFS, is the superior and only scheduler of choice in the mainline Linux kernel is the horrible scaling problems that BFS suffers from when used on systems with 16+ cores. For desktop CPUs, it's excellent.
CFS is a compromise on both of those points, so obviously it will be slower at what BFS addresses, and BFS is limited to tackling only a certain market.
Where can I find BFQ for kernel 4.5? The latest version is for kernel 4.4.
Have a look at:
https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/xzzmZ1Vat-8
and simply use the 4.4.0-v7r11 patches. I don't understand why they haven't released an official version so far.
BR Manuel Krause
thanks
Could you give me some pointers as to where to plug the schedutil call into the BFS code? This is the equivalent CFS commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/kernel/sched/fair.c?id=277edbabf6fece057b14fb6db5e3a34e00f42f42
Naturally the RT/deadline versions are much simpler, e.g. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/kernel/sched/rt.c?id=277edbabf6fece057b14fb6db5e3a34e00f42f42 but BFS deserves better than that.
This program summarizes scheduler run queue latency as a histogram, showing how long tasks spent waiting their turn to run on-CPU.
github.com/iovisor/bcc/blob/master/tools/runqlat_example.txt
That's very cool. Anyone feel like using this tool to take BFS v mainline for a spin?
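For anyone unfamiliar with that kind of output, here is a minimal user-space sketch of power-of-two latency bucketing, the general style of histogram such tools print. It is not bcc's code: the names NBUCKETS, log2_bucket and build_hist are invented for illustration, and it assumes latencies have already been collected as microsecond values.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 32	/* illustrative size, not taken from any real tool */

/*
 * Map a value to its power-of-two bucket:
 * 0..1 -> 0, 2..3 -> 1, 4..7 -> 2, 8..15 -> 3, and so on.
 * Values too large for the table land in the last bucket.
 */
static unsigned int log2_bucket(uint64_t v)
{
	unsigned int b = 0;

	while (v > 1 && b < NBUCKETS - 1) {
		v >>= 1;
		b++;
	}
	return b;
}

/* Zero the histogram, then count each latency sample into its bucket. */
static void build_hist(const uint64_t *lat_us, size_t n,
		       unsigned int hist[NBUCKETS])
{
	size_t i;

	for (i = 0; i < NBUCKETS; i++)
		hist[i] = 0;
	for (i = 0; i < n; i++)
		hist[log2_bucket(lat_us[i])]++;
}
```

Comparing two schedulers would then come down to comparing how the counts spread across the higher buckets under the same workload.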
Heya,
Wanted to poke around the code, and took a look at niffy_diff.
There is a wraparound problem in the test: if jiff_diff = -1, max_diff will always be less than niff_diff and the function returns 1 (i.e. min_diff). All other negative values do not alter niff_diff?
I assume that a negative jiff_diff is unexpected. With two places in the code calling niffy_diff with jiff_diff = 1, I *think* you could get rid of 2 branches (at least 1 OR branch that cannot be optimized out) in that case with:
static inline void niffy_diff_one(s64 *niff_diff)
{
	/* Casting to unsigned makes all negative numbers larger than JIFFIES_TO_NS(1). */
	if (unlikely((u64)*niff_diff > JIFFIES_TO_NS(1)))
		*niff_diff = 1;
}
-1 is never supposed to happen in the first place.
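The unsigned-cast trick proposed in the comment above can be tried outside the kernel. The following is a user-space sketch only: HZ is assumed to be 1000, JIFFIES_TO_NS here is a stand-in macro rather than the kernel's definition, and unlikely() is dropped because it is only a branch hint.

```c
#include <stdint.h>

/* Assumed tick rate and a stand-in for the kernel macro (illustration only). */
#define HZ 1000
#define JIFFIES_TO_NS(j) ((int64_t)(j) * (1000000000LL / HZ))

/*
 * Clamp a nanosecond delta for the "one jiffy elapsed" case.
 * Casting to unsigned turns every negative value into a very large
 * positive one, so one comparison rejects both negative deltas and
 * deltas above the upper bound.
 */
static inline void niffy_diff_one(int64_t *niff_diff)
{
	if ((uint64_t)*niff_diff > (uint64_t)JIFFIES_TO_NS(1))
		*niff_diff = 1;
}
```

One difference worth noting: a delta of exactly 0 passes through unchanged here, whereas a separate lower-bound check against a min_diff of 1 would raise it to 1.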
Hi
I'm trying to port the scheduler to the 4.6 kernel, but found that there are too many changes in the scheduler subsystem, so I can't even get the kernel to compile (it throws undefined reference errors at the final link).
So I want to ask: how long must we wait for our BestForeverScheduler for 4.6? :-)
P.S. Sorry for my bad English. :-(
There's a 4.6-ck1 directory showing up on the FTP, but it's empty!
Such a tease! >.<
I've resynced but it's still unstable so it's waiting for me to fix it.