-ck hacking: MuQSS - The Multiple Queue Skiplist Scheduler v0.111

Tuesday, 11 October 2016

Lots of bugfixes, lots of improvements, build fixes, you name it.

For 4.8:
4.8-sched-MuQSS_111.patch

For 4.7:
4.7-sched-MuQSS_111.patch

And in a complete departure from BFS, there is now a git tree, which suits constant development like this far better than BFS's massive ports for each stable release:

https://github.com/ckolivas/linux

Look in the pending/ directory to see all the patches that went into this, or read the git changelog. In particular, numerous warnings were fixed, throughput improved compared to v0.108, SCHED_ISO was rewritten for multiple queues, potential races/crashes were addressed, and build fixes for different configurations were committed.

I haven't been able to track down the bizarre latency issues reported by runqlat. When I try to reproduce them myself I get nonsense latency values greater than the history of the earth, so I suspect an interface bug with BPF reporting the values. It doesn't seem to affect actual latency in any way.
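As a rough illustration of how a reporting bug alone can produce absurd latencies, here is a minimal sketch. It assumes, purely as a guess and not as a diagnosis of runqlat or BPF, that a latency is computed as an unsigned 64-bit difference of two nanosecond timestamps and that one recorded start time is bogus. The helper names and sample timestamps are hypothetical.

```python
U64_MASK = (1 << 64) - 1  # unsigned 64-bit arithmetic wraps modulo 2**64


def u64_delta_ns(start_ns: int, end_ns: int) -> int:
    """Hypothetical helper: end - start as an unsigned 64-bit counter computes it."""
    return (end_ns - start_ns) & U64_MASK


def ns_to_years(ns: int) -> float:
    """Convert nanoseconds to years (365.25-day years)."""
    return ns / 1e9 / (365.25 * 24 * 3600)


# Sane sample: queued at t=1_000_000 ns, ran at t=1_050_000 ns.
sane = u64_delta_ns(1_000_000, 1_050_000)

# Bogus sample (assumption): the recorded start is 1 ns *after* the end,
# so the subtraction wraps around instead of going negative.
wrapped = u64_delta_ns(1_050_001, 1_050_000)

print(sane)                  # 50000
print(wrapped)               # 18446744073709551615
print(ns_to_years(wrapped))  # a little over 584 years
```

A single wrapped sample like this would dominate any histogram while saying nothing about how long tasks really waited, which would be consistent with real latency being unaffected.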

EDIT: Updated to version 0.111 which has a fix for suspend/resume.

Enjoy!
お楽しみ下さい
-ck

48 comments:

Anonymous 11 October 2016 at 23:46
Thanks Con.
I've updated the results with muqss110, both throughput and runqlat.
https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing
I've put the old muqss108 results in the sheet 'dev 4.8 muqss' together with all old muqss releases.

For the runqlat tests, did you apply this commit: https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=58bfea9532552d422bde7afa207e1a0f08dffa7d
I ask because runqlat has been broken since 4.8-rc4 (https://github.com/iovisor/bcc/issues/728).

Pedro
Anonymous 12 October 2016 at 07:21
@ck:
With v0.110, my TOI port fails completely, already while writing kernel data during the first hibernation attempt. I've bisected it down to muqss108-008-delay_cpu_switch.patch, which introduces the failure.
Maybe you can find time to have a look at the introduced code.
This is still with kernel 4.7.7.
In the meantime, I'll finally rebuild some BFQ/WBT/TOI-free kernels, stepwise, to see if they spit out messages, traces or similar developers' delights. ;-)
I really don't want to bother you with this, but we need a fix for this anyways.

BR, Manuel Krause
jwh7 12 October 2016 at 09:48
I also noticed a failure to resume (from S3) on my x64 PC with my latest update (through patch 11, I think), while the previous one (through patch 3) was fine. So it's good to see it's getting sorted out; thanks guys!
Anonymous 12 October 2016 at 16:37
@Con,

I added 4.8.1 mux results to my Unigine benchmark gsheets. Performance is good.

There is one more thing I wanted to write about.
See, I use a VRQ kernel (which is Alfred's improvements over BFS) on my host machine, and I compile kernels in an Ubuntu (6 vCPU) VBox VM, which until yesterday ran an Ubuntu kernel. After compiling MUX 110 I installed it in the VM just to see how it fares there.
The results were astonishing: compilation of a 4.7.7 kernel usually takes ~ 2:15, but now, with a VRQ host kernel and a MUX guest kernel, compilation time is almost halved to ~ 1:15, which is a huuuuuge improvement for me.
It seems to me that it does not matter whether MUX or VRQ kernels are used on both sides; the point is that the scheduler matters! I could see that the compilation took over all 6 vCPUs, at about 575% CPU usage; usually it does not go higher than 300-350%.
Thanks!

Br, Eduardo
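The figures quoted in the comment above can be worked through in a few lines. This is only a sketch using the approximate numbers given there (~2:15, ~1:15, ~575%, 300-350%, 6 vCPUs); the helper name is made up for illustration.

```python
def to_seconds(mm_ss: str) -> int:
    """Hypothetical helper: convert an 'M:SS' wall-clock time to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


before = to_seconds("2:15")  # Ubuntu guest kernel, 135 s
after = to_seconds("1:15")   # MUX guest kernel, 75 s

speedup = before / after        # how many times faster the build finished
reduction = 1 - after / before  # fraction of wall-clock time saved

vcpus = 6
util_new = 575 / (vcpus * 100)      # share of the 6 vCPUs kept busy now
util_old_low = 300 / (vcpus * 100)  # lower end of the earlier range
util_old_high = 350 / (vcpus * 100) # upper end of the earlier range

print(f"speedup: {speedup:.2f}x")                  # speedup: 1.80x
print(f"time saved: {reduction:.1%}")              # time saved: 44.4%
print(f"vCPU utilisation now: {util_new:.1%}")     # vCPU utilisation now: 95.8%
print(f"before: {util_old_low:.1%} to {util_old_high:.1%}")  # before: 50.0% to 58.3%
```

So "almost halved" corresponds to roughly a 1.8x speedup, which is in line with CPU utilisation rising from a bit over half of the vCPUs to nearly all of them.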
ck 13 October 2016 at 11:52
Updated to version 0.111 which has a fix for suspend/resume, but otherwise has no performance/behavioural changes.
Anonymous 14 October 2016 at 02:00
I have tested this on an older single-core (2.7GHz) machine and it's good, but there's some freezing when playing or changing songs while the CPU is under load.

Xmorph.
jwh7 14 October 2016 at 12:14
I have the old netbook running 4.8.1 v111 noSM{P,T} without issue (last -ck for it was 4.3.1!); I also now have the x64 PC running with the "clean up bind_zero" from git (fdd879d*700e92a) as well.
kernelOfTruth 15 October 2016 at 00:15
Hi Con,

Compilation time for a new kernel with 0.110 was also one of the lowest so far (less than 8 minutes),

tested GRID Autosport and it was really smooth - no stuttering observable,

smooth video playback with MPV while running compiz (composited desktop) also worked kind of fine,

will test now v111 with pending changes from github

Thanks A LOT !
Anonymous 16 October 2016 at 02:10
Hi,
It seems that there are some bad interactions between muqss 0.111 and the Nvidia blob driver.
Today I switched to muqss and my PC froze for 10 seconds while playing Starcraft 2 under wine, then recovered. Two hours later there was another freeze, this one not recoverable (the machine was still reachable via ssh).
dmesg after the first freeze shows this:
Oct 15 14:36:31 kernel: NVRM: GPU at PCI:0000:04:00: GPU-6b577e9d-dc5f-ad2d-f8a0-c23db512691f
Oct 15 14:36:31 kernel: NVRM: Xid (PCI:0000:04:00): 8, Channel 0000002b
Oct 15 14:36:33 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 14:36:37 kernel: NVRM: Xid (PCI:0000:04:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034

No messages for the second freeze.
With BFS, Starcraft 2+wine has always worked flawlessly.
The only things that have changed are: kernel 4.8 ==> 4.8.1 and BFS 0.512 ==> MuQSS 0.111 (no nvidia driver change).
Now, I've switched back to 4.8.1 and BFS 0.512.
Anonymous 17 October 2016 at 07:38
With the last 12 pending commits applied on top of 111, superuser applications don't get more CPU attention than any normal user program; they seem to get _less_ attention.

Please look into that. BR, Manuel Krause
ck 17 October 2016 at 09:29
Found a nasty behavioural bug when sched_yield, which a lot of GPU drivers and compositing managers use, is called with interactive=1. It would lead to serious stalls and misbehaviour. It has been fixed in git and will be in the next released version, which I'm planning to do today.
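For readers unfamiliar with the pattern, here is a minimal sketch of a sched_yield-based wait loop, the general kind of usage that makes the scheduler's handling of the call matter. It uses Python's os.sched_yield (available on Unix only) purely for illustration; it is not taken from any driver or compositor, the names wait_for and worker are hypothetical, and it says nothing about how MuQSS implements the call.

```python
import os
import threading


def wait_for(flag: threading.Event, max_spins: int = 1_000_000) -> int:
    """Poll flag, yielding the CPU on every pass; return the number of yields."""
    spins = 0
    while not flag.is_set() and spins < max_spins:
        os.sched_yield()  # ask the scheduler to run something else first
        spins += 1
    return spins


done = threading.Event()
worker = threading.Timer(0.01, done.set)  # stands in for "the GPU finished"
worker.start()

spins = wait_for(done)
worker.join()  # after this the flag is guaranteed to be set

print(done.is_set(), spins)
```

How quickly such a loop gets the CPU back after each yield is entirely up to the scheduler, so a scheduler that deprioritises a yielding task too aggressively would show up as stalls in exactly this kind of code.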
Anonymous 17 October 2016 at 16:41
@Con,

I encountered this in version 111: http://pastebin.com/cPzRSxFx
The same old smp_processor_id bug.
Is this fixed in version 112 as well?

br, Eduardo