-ck hacking: linux-4.8-ck4, MuQSS CPU scheduler v0.116

Monday, 24 October 2016

Yet another bugfix release for MuQSS and the -ck patchset, with one of the most substantial latency fixes yet. Everyone on a previous 4.8 patchset of mine should upgrade. Sorry about the frequency of these releases, but I just can't allow a known buggy release to be the latest version.

4.8-ck4 patchset:
http://ck.kolivas.org/patches/4.0/4.8/4.8-ck4/

MuQSS by itself for 4.8:
4.8-sched-MuQSS_116.patch

MuQSS by itself for 4.7:
4.7-sched-MuQSS_116.patch

I'm hoping this is the release that allows me to not push any more -ck versions out till 4.9 is released since it addresses all remaining issues that I know about.

A lingering bug that has been troubling me for some time was causing occasional massive latencies. Thanks to some detective work by Serge Belyshev, I was able to narrow it down to a single-line fix, which dramatically improves measured worst-case latency. Throughput is virtually unchanged. The bug also had flow-on effects in other areas, showing up as sometimes-unused CPU cycles and weird stalls on some workloads.

sched_yield was reverted to the old BFS mechanism again, which GPU drivers prefer; it wasn't working previously on MuQSS because of the first bug. The difference is substantial: drivers (such as the proprietary nvidia one) and apps that use it a lot (such as the Folding@home client) now behave much better.
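For readers wondering what "using sched_yield a lot" looks like, here is a minimal sketch of a polling loop that gives up the CPU between checks. It is only an illustration: the `wait_for_flag` helper, the timer and the spin limit are made up for this example, and it is not taken from any driver or from the Folding@home client. What the kernel does with the yielding task is up to the CPU scheduler, which is the part this release changes.

```python
import os
import threading


def wait_for_flag(flag: threading.Event, max_spins: int = 1_000_000) -> int:
    """Poll `flag`, yielding the CPU between checks; return the number of yields.

    Hypothetical helper for illustration only.
    """
    spins = 0
    while not flag.is_set() and spins < max_spins:
        os.sched_yield()  # voluntarily give up the CPU (Unix only)
        spins += 1
    return spins


# Simulate "another task" that finishes its work after roughly 10 ms.
flag = threading.Event()
setter = threading.Timer(0.01, flag.set)
setter.start()

spins = wait_for_flag(flag)
setter.join()
print(f"stopped polling after {spins} yields")
```

A loop like this depends heavily on how quickly the scheduler lets the other task run and then returns to the poller, which is why yield behaviour shows up so clearly in this kind of workload.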

The bugs that were introduced late into ck3/MuQSS 115 were reverted.

The results come up quite well now with interbench (my latency under load benchmark) which I have recently updated and should now give sensible values:

https://github.com/ckolivas/interbench

If you're baffled by interbench results, the most important number is % deadlines met, which should be as close to 100% as possible, followed by max latency, which should be as low as possible for each section. In the near future I'll announce an official new release version.
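To make those two numbers concrete, here is a small sketch of how they relate to a list of latency samples. This is not interbench's actual code or output format: the `summarize` function, the sample values and the 5 ms deadline are all assumptions chosen for illustration.

```python
def summarize(latencies_ms, deadline_ms):
    """Return (% of samples within the deadline, worst latency in ms).

    Hypothetical helper; interbench computes its own figures internally.
    """
    met = sum(1 for lat in latencies_ms if lat <= deadline_ms)
    pct_met = 100.0 * met / len(latencies_ms)
    return pct_met, max(latencies_ms)


# Made-up samples: four quick wakeups and one stall.
samples = [0.1, 0.3, 0.2, 7.5, 0.1]
pct_met, worst = summarize(samples, deadline_ms=5.0)
print(f"deadlines met: {pct_met:.1f}%  max latency: {worst} ms")
```

A single stall barely moves the average but shows up immediately in both the deadline percentage and the max latency, which is why those are the numbers to watch.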

Pedro, in the comments section, was previously using runqlat from the bcc tools to test latencies as well. After some investigation it became clear to me that the tool was buggy and did not work properly with BFS/MuQSS either, so I've provided a slightly updated version here which should work properly:

runqlat.py
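runqlat reports run-queue latency as a histogram with power-of-two buckets. The sketch below shows that kind of bucketing on plain Python numbers so the output is easier to read. It is not the code from runqlat.py: the function names, the exact bucket boundaries and the sample values are assumptions for illustration.

```python
from collections import Counter


def log2_histogram(latencies_us):
    """Count samples per power-of-two bucket, keyed by bit length.

    Bucket 0 holds the value 0; bucket k >= 1 holds [2**(k-1), 2**k - 1].
    Hypothetical helper, not part of bcc.
    """
    buckets = Counter()
    for lat in latencies_us:
        buckets[max(int(lat), 0).bit_length()] += 1
    return dict(sorted(buckets.items()))


def bucket_range(k):
    """Return the inclusive (low, high) range in microseconds of bucket k."""
    return (0, 0) if k == 0 else (1 << (k - 1), (1 << k) - 1)


# Made-up latency samples in microseconds.
hist = log2_histogram([0, 1, 3, 5, 6, 900])
for k, count in hist.items():
    low, high = bucket_range(k)
    print(f"{low:>6} -> {high:<6} : {count}")
```

With buckets like these, a good result has nearly all samples in the low rows, and the interesting part is how far down the occupied rows extend.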

Enjoy!
お楽しみ下さい (Please enjoy)
-ck

52 comments:

Anonymous, 24 October 2016 at 22:19
Hi Con,

thanks for your fantastic job with BFS and now with MuQSS.

I have been an anonymous user of BFS for years now and I have been doing my own testing on MuQSS.

MuQSS had been working flawlessly on my laptop up to V0.111. When I tried V0.115 I noticed a regression in the compiled kernel. Basically, under full load the kernel with V0.115 starts to stall progressively and eventually freezes the laptop.

My idea of full load is compiling a new kernel with 4 processes while running a web browser and glxgears with optirun (nothing outstanding).

The laptop is an HP with an Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz and 8GB RAM, using the p-state powersave governor. It has a hybrid GPU (Intel+NVidia).

I traced the problem back to a commit between V0.111 and V0.112, specifically commit fdd879d37e6ca088410511e9f1146c328700e92a.

I'm writing this message from a kernel with V0.111 plus all the patches up to V0.112, except the one in commit fdd879d37e6ca088410511e9f1146c328700e92a.

Also tested V0.116 and the problem still persists in this version.

Any idea what could be happening?

Cheers

PB

Anonymous, 24 October 2016 at 22:45
You got it Con. No more regression with intel pstate.

MuQSS112+intel-pstate powersave (linux 4.8.1)
make j2 317.21
bz2 62.11
xz 104.78

MuQSS116+intel-pstate powersave (linux 4.8.4)
make j2 341.32
bz2 60.23
xz 105.60

And thanks for the modified runqlat tool. I'll test it later.
Can I also use this one on CFS for a fair comparison?

Pedro

Holger Hoffstätte, 25 October 2016 at 00:48
With v116 I'm now (again) seeing "NOHZ: local_softirq_pending 10" messages under load. First saw those with early versions, but I'm sure they went away at some point (with or after v112 I think). Some kind of starvation?

monotykamary, 26 October 2016 at 02:16
When opening osu! through Wine, the ck3 patch would freeze the whole system; ck4 makes the whole system stutter to a near hang.

This is the journal from the moment of the crash. Not too sure if it logged it correctly:
10月 25 13:30:27 pulseaudio[474]: E: [pulseaudio] bluez5-util.c: GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: t
10月 25 13:30:27 dbus[350]: [system] Failed to activate service 'org.bluez': timed out
10月 25 13:30:32 dbus[350]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service'
10月 25 13:30:44 dbus-daemon[453]: Activating service name='org.gnome.GConf'
10月 25 13:30:44 dbus-daemon[453]: Successfully activated service 'org.gnome.GConf'
10月 25 13:30:57 dbus[350]: [system] Failed to activate service 'org.bluez': timed out
10月 25 13:31:23 dbus[350]: [system] Activating via systemd: service name='org.freedesktop.UDisks2' unit='udisks2.service'
10月 25 13:31:23 systemd[1]: Starting Disk Manager...
10月 25 13:31:23 udisksd[2015]: udisks daemon version 2.1.7 starting
10月 25 13:31:24 dbus[350]: [system] Successfully activated service 'org.freedesktop.UDisks2'
10月 25 13:31:24 systemd[1]: Started Disk Manager.
10月 25 13:31:24 udisksd[2015]: Acquired the name org.freedesktop.UDisks2 on the system message bus
10月 25 13:31:41 wpa_supplicant[392]: wlp6s0: CTRL-EVENT-SCAN-FAILED ret=-22
10月 25 13:32:13 systemd[1]: Starting CUPS Scheduler...
10月 25 13:32:14 dbus[350]: [system] Activating via systemd: service name='org.freedesktop.ColorManager' unit='colord.service'
10月 25 13:32:14 systemd[1]: Starting Manage, Install and Generate Color Profiles...
10月 25 13:32:14 dbus[350]: [system] Successfully activated service 'org.freedesktop.ColorManager'
10月 25 13:32:14 systemd[1]: Started Manage, Install and Generate Color Profiles.
10月 25 13:32:14 colord[2239]: failed to get session [pid 2238]: No such device or address
10月 25 13:32:15 systemd[1]: Started CUPS Scheduler.
10月 25 13:35:44 wpa_supplicant[392]: wlp6s0: CTRL-EVENT-SCAN-FAILED ret=-22
10月 25 13:37:40 systemd-logind[347]: Power key pressed.
10月 25 13:37:40 systemd-logind[347]: Powering Off...

Anonymous, 26 October 2016 at 05:50
I have noticed that the game ARK: Survival Evolved runs badly with the ck kernel when AMD Cool'n'Quiet is enabled. I get very short FPS drops to 10 or 15 fps every 1 or 2 seconds. With the stock kernel I get 20 fps at worst, and only when moving around fast; otherwise it's around 30 fps or more. Disabling AMD Cool'n'Quiet also works around it. I have noticed that CPU usage spikes a lot in ARK, so maybe something isn't fast enough to clock the CPU higher? In game the CPU clocks between about 1400 MHz and 4200 MHz. I have an AMD FX-8350 and an Nvidia GTX 1070 (just got it and was a bit sad about those stutters).

Anonymous, 26 October 2016 at 05:57
You might try applying all four patches which have been committed since v0.116 was tagged (https://github.com/ckolivas/linux/commits/4.8-muqss).

Anonymous, 26 October 2016 at 12:02
@CK:
I reported success too early.
I got the following after the third resume attempt:
http://pastebin.com/dReCcQGm

Maybe you'd understand it.
BR, Manuel Krause

Anonymous, 27 October 2016 at 00:44
Would just like to report that the latest 4 patches (after 0.116 / -ck4) fixed problems with games for me, mainly Dota 2
(jitter, mini-freezes every 10 seconds and FPS fluctuations).
I am using the latest legacy Nvidia drivers, 340.98.
So thank you.

Anonymous, 28 October 2016 at 01:52
@ck:
Out of curiosity I've also added the two newest commits on top of v0.116 (all six in total) and enabled the newly available settings:
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
(I'm in doubt whether the latter two are useful.)

At resume I get the following warning, which you may want to have a look at: http://pastebin.com/FPAbxAv0

BR, Manuel Krause

ck, 29 October 2016 at 08:09
New muqss and -ck version coming out soon so if you were about to build a new kernel for 4.8.5, hold out a few hours :)