-ck hacking: MuQSS - The Multiple Queue Skiplist Scheduler v0.112

Monday, 17 October 2016

MuQSS - The Multiple Queue Skiplist Scheduler v0.112

Here's an updated version of MuQSS.

For 4.8.*:
4.8-sched-MuQSS_112.patch

For 4.7.*:
4.7-sched-MuQSS_112.patch

Git tree here as 4.7-muqss or 4.8-muqss branches:
https://github.com/ckolivas/linux

It's getting close now to the point where it can replace BFS in -ck releases. Thanks to the many people testing and reporting back, some other misbehaviours were discovered and their associated fixes have been committed.

In particular,
- Balancing across CPUs was not looking at higher and lower scheduling policies correctly (SCHED_ISO, SCHED_IDLEPRIO and realtime policies)
- A serious stall/hang could happen with tasks using sched_yield (such as f@h client and numerous GPU drivers)
- Some minor accounting issues on new tasks with affinity set were fixed
- Overhead was further decreased on task selection
- Spurious preemption on CPUs where the preempted task had already gone are now avoided
- Spurious wakeup on CPUs that were assumed and are no longer idle are avoided
- A potential race in suspending to ram was fixed
- Old unused code from BFS was removed, along with unnecessary intermediate variables.
- Clean ups
- Some work towards actually documenting MuQSS in Documentation/scheduler/sched-MuQSS.txt was done, though incomplete.

Enjoy!
お楽しみ下さい
-ck

22 comments:

Holger Hoffstätte17 October 2016 at 23:05
I've been trying various versions of MuQSS (while staying on BFS for production/work) and muqss-112 is the first one that passes my "wiggle test" - niced kernel build on all 8 vcores, playing a HD movie in vlc and frantically jiggling a terminal window around no longer causes stalls or jerks; it's completely smooth all the time,as if idle. Well done! \o/

One suggestion: I've noticed that the global_rq contains various atomic_t counters. It might be a good idea to make them cacheline-aligned so that they don't incur false sharing, which can lead to pretty pathological stalls esp. with contended SMT threads. I can create a GH pull if you like.
ReplyDelete
Replies
jwh717 October 2016 at 23:46
Running x86-64 MuQSS v112 no problems; also did a suspend/resume for good measure. Looking good!
ReplyDelete
Replies
Anonymous18 October 2016 at 05:34
Thanks Con.
Updated the results with MuQSS112 (interactive=1 only).
https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

The performance is more or less the same as with MuQSS110, with a slight improvement on make -j2.

Pedro
ReplyDelete
Replies
kernelOfTruth18 October 2016 at 08:04
Con,

I'm not certain whether it's only imagination but with .112 reaction with compiz is very very good:
no delays during app switching right now

Might run compiz with sched_yield later to see how that works out

also portage (Gentoo Linux' package manager) seems to work really quickly with it

Once a new Chromium version is out I'll do a compilation test (update) and see if I can run additional backup jobs to really stress the system to see if I can still do work (that would be close to the ultimate test, well - in the extreme probably adding a game to the mix - we'll see about that ;) )

Thanks !
ReplyDelete
Replies
ulenrich18 October 2016 at 19:06
To run this new scheduler on a 2009 Mac Mini Core Duo Intel processor machine: How much a slow down compared to using BFS would one experience?
How big an overhead of this "it takes a thousend" cpus scheduler is it?
ReplyDelete
Replies
ulenrich18 October 2016 at 20:44
Yeah, I can remember: You wrote about some years ago how to allocate dynamically more scheduler units (run queues or some alike) ....

The old talking point of BFS that at some number of processors BFS isn't well performing ... isn't any more!
Also this was an obstacle going mainline, wasn't it?
ReplyDelete
Replies
Anonymous19 October 2016 at 05:40
@ck:
The issues I've reported last time for v0.111 have completely gone away with MuQSS v0.112 (without changes to the rest of the system software).
Thank you for your great work!
With this test run I've also been lucky to find a tunable again for my ancient TOI revision, named "no_flusher_thread", that, defaulting to 1 and now set to 0, makes the whole combination (MuQSS, BFQ, WBT, TOI) work fine without failures or performance regression. I'm glad that I can report 10 successful hibernations, done from time to time, within 1 1/2 days uptime atm.
Maybe that tunable eases some race condition/ timing issue, that an effective MuQSS brings into that old TOI algorithms. Painful, that I don't have enough programming knowledge to interpret it in depth.

BR, Manuel Krause
ReplyDelete
Replies
jwh719 October 2016 at 23:54
@ck, my i686 noSMP build fails with the latest patches thru 35d6279 (Merge branch '4.8-muqss'), but it succeeded with 1b7e569 (cacheline alignment). x64 built fine with the latest though; I'm running it now.

Here are the build log output snips: https://gist.github.com/jeremywh7/944c10e189300086f1de58b8fa7fc0b4#gistcomment-1901214
ReplyDelete
Replies
Anonymous28 October 2016 at 04:31
Running MuQSS (by means of the Liquorix kernel), also see
https://liquorix.net/atom

Just earlier, the combination of PulseAudio suspended via pasuspender (to run an application using ALSA) while alt-tabbing to Google Chrome to doublecheck something caused an unrecoverable stall, a clean shutdown was no longer viable (it would've probably taken hours, everything was incredibly slow).
ReplyDelete
Replies

Add comment