-ck hacking: linux-4.8-ck6, MuQSS version 0.135

Saturday, 5 November 2016

Announcing a new version of MuQSS and a -ck release

4.8-ck6 patchset:
http://ck.kolivas.org/patches/4.0/4.8/4.8-ck6/

MuQSS by itself for 4.8:
4.8-sched-MuQSS_135.patch

MuQSS by itself for 4.7:
4.7-sched-MuQSS_135.patch

Git tree:
https://github.com/ckolivas/linux

A week has passed since the last major update to BFS and -ck was posted, allowing me to concentrate on receiving and responding to any bug reports. As it turns out, there were very few apart from the recurring local_softirq_pending warning/stalls. This is nice because it means MuQSS is mostly ~stable now. Mainline has even had more "stable" releases in the same time as MuQSS for 4.8, moving to 4.8.6 in the interim.

In this version I've added aggressive handling of pending softirqs in the hope that the warnings and stalls all go away. The true reason the handling of softirqs is being dropped still escapes me, but it is likely related to the fact that MuQSS does a lot of lockless rescheduling across CPUs to decrease overhead, which does not give the guarantees that locking would.

Additionally, I've added a number of APIs to the kernel for schedule timeouts specified in milliseconds. These use the highres timers, which are now mandatory for MuQSS. The reason for doing this is that there are many timeouts in the kernel that specify values below 10ms, while the timer resolution at 100Hz only guarantees that such timeouts complete in under 20ms.
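To make the resolution problem concrete, here is a minimal userspace sketch of how a millisecond timeout gets rounded up to whole ticks. It is a simplified model, not the kernel's code: the function names are made up for illustration, and it only handles Hz values that divide 1000 evenly.

```c
#include <assert.h>

/*
 * Hypothetical model: round a millisecond timeout up to whole ticks
 * (jiffies). Only valid when hz divides 1000 evenly, e.g. 100, 250, 1000.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
    assert(hz > 0 && hz <= 1000 && 1000 % hz == 0);
    unsigned int ms_per_tick = 1000 / hz;

    /* Round up so the timeout is never shorter than requested. */
    return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* Nominal length in milliseconds of the rounded-up timeout. */
static unsigned long model_rounded_timeout_ms(unsigned int ms, unsigned int hz)
{
    return model_msecs_to_jiffies(ms, hz) * (1000 / hz);
}
```

In this model a 1ms or 5ms request at 100Hz both become one 10ms tick, while the same requests at 1000Hz stay at 1ms and 5ms.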

I've also done a code sweep across the entire kernel looking for timeout calls under 50ms and replaced them with the new interface. Additionally there are numerous places in the kernel where schedule_timeout(1) is used and a "minimum timeout" is expected, yet this is entirely Hz dependent, again being up to 20ms in duration. I've replaced all these with a 1ms timeout, emulating what would happen on a 1000Hz kernel, but without the overhead of running the higher Hz kernel. I'm not entirely sure this will equate to any real world improvements, but the fact it's used in things like audio drivers makes me think it might.
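The Hz dependence of a raw tick count such as schedule_timeout(1) can be illustrated with a tiny userspace calculation. This is only a sketch of the nominal tick length; the helper name is hypothetical and it says nothing about when within a tick the timeout actually fires.

```c
#include <assert.h>

/*
 * Hypothetical helper: nominal length in microseconds of `ticks` jiffies
 * on a kernel built with the given HZ.
 */
static unsigned long long model_jiffies_to_usecs(unsigned long ticks,
                                                 unsigned int hz)
{
    assert(hz > 0);
    return (unsigned long long)ticks * 1000000ULL / hz;
}
```

One tick is nominally 10000us at 100Hz, 4000us at 250Hz and 1000us at 1000Hz, so the same "1" means a tenfold different wait depending on the build configuration.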

Finally I've replaced the standard msleep call from userspace to use highres timers. There may be userspace applications that expect msleep to give a sleep that resembles what was asked of it, instead of something Hz limited, and that assumption on the userspace coders' part could be leading to slowdowns in userspace. Calls to msleep() from userspace now give 100us accuracy at 100Hz instead of 20ms.
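For anyone wanting to see how closely a short sleep matches its request on their own system, a rough userspace measurement could look like the sketch below. It uses the POSIX nanosleep() and clock_gettime() calls; the function name is made up, and it does not exercise the kernel's msleep() directly, so treat it as a general sanity check rather than a test of this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for request_ns nanoseconds and return the
 * time that actually elapsed on CLOCK_MONOTONIC, or -1 on error.
 */
static long long measure_sleep_ns(long request_ns)
{
    assert(request_ns >= 0);

    struct timespec start, end, rem;
    struct timespec req = {
        .tv_sec = request_ns / 1000000000L,
        .tv_nsec = request_ns % 1000000000L,
    };

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    /* If a signal interrupts the sleep, continue with the remainder. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return (end.tv_sec - start.tv_sec) * 1000000000LL
           + (end.tv_nsec - start.tv_nsec);
}
```

Calling measure_sleep_ns(1000000) requests 1ms; the amount by which the returned value exceeds 1000000 is the overshoot on that particular run.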

All these timing changes add overhead since they're trying to emulate the timing accuracy of running at 1000Hz but in a latency-focused scheduler I believe they're appropriate, and they do not incur the overhead that actually changing Hz would incur. Additionally they add accuracy to timers and timeouts that 1000Hz does not afford.

In the -ck tarball of broken-out patches, I've kept these timer changes separate to allow the MuQSS scheduler to be applied by itself should they prove problematic. Keeping them separate will also make merging with future kernels easier.

Enjoy!
Please enjoy.
-ck

60 comments:

ck, 5 November 2016 at 16:03
Thanks. Fortunately harmless.

Anonymous, 5 November 2016 at 17:19
Linux 4.8 MuQSS 0.135

I get this message on startup:

[ 242.153019] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.

The system hangs whenever I have osu! running in Wine, or JACK with Audacious, while on battery (a CPU-heavy app may be needed to trigger this).

The good news is that the softirq messages are all gone.

kernelOfTruth, 6 November 2016 at 02:06
And another one:

[ 0.064678] TSC deadline timer enabled
[ 0.064680] smpboot: CPU0: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz (family: 0x6, model: 0x3c, stepping: 0x3)
[ 0.064689] Performance Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.
[ 0.064716] ... version: 3
[ 0.064722] ... bit width: 48
[ 0.064728] ... generic registers: 4
[ 0.064734] ... value mask: 0000ffffffffffff
[ 0.064740] ... max period: 0000ffffffffffff
[ 0.064746] ... fixed-purpose events: 3
[ 0.064751] ... event mask: 000000070000000f
[ 0.078104] NMI watchdog: Disabling watchdog on nohz_full cores by default
[ 0.091452] x86: Booting SMP configuration:
[ 0.091458] .... node #0, CPUs: #1
[ 0.176792] ------------[ cut here ]------------
[ 0.176811] WARNING: CPU: 1 PID: 16 at kernel/sched/cputime.c:721 get_vtime_delta+0x87/0xb2
[ 0.176818] Modules linked in:
[ 0.176826] CPU: 1 PID: 16 Comm: migration/1 Not tainted 4.8.6_dtop-I.16 #1
[ 0.176832] Hardware name: ASUS All Series/P9D WS, BIOS 2202 05/14/2015
[ 0.176840] 0000000000000086 00000000175948c5 ffff9bdffa8e3cf0 ffffffff8956df44
[ 0.176847] 0000000000000000 0000000000000000 ffff9bdffa8e3d30 ffffffff89121073
[ 0.176855] 000002d100000000 0032dcd2b6161e57 0000000000000000 ffff9bdffa898000
[ 0.176862] Call Trace:
[ 0.176872] [] dump_stack+0x4d/0x63
[ 0.176879] [] __warn+0xc5/0xe0
[ 0.176886] [] warn_slowpath_null+0x18/0x1a
[ 0.176893] [] get_vtime_delta+0x87/0xb2
[ 0.176900] [] vtime_account_idle+0x9/0x13
[ 0.176907] [] vtime_common_task_switch+0x16/0x28
[ 0.176914] [] finish_task_switch+0xbb/0x2da
[ 0.176922] [] __schedule+0x8c8/0xb90
[ 0.176928] [] ? preempt_schedule+0x1e/0x20
[ 0.176937] [] ? ___preempt_schedule+0x16/0x18
[ 0.176945] [] ? sort_range+0x1d/0x1d
[ 0.176951] [] schedule+0x86/0xce
[ 0.176959] [] __kthread_parkme+0x39/0x5c
[ 0.176965] [] kthread+0xd6/0xe4
[ 0.176972] [] ret_from_fork+0x1f/0x40
[ 0.176979] [] ? kthread_create_on_node+0x1ac/0x1ac
[ 0.176986] ---[ end trace fb7f61e5ef93b6b1 ]---
[ 0.177115] #2 #3 #4 #5 #6 #7
[ 0.683611] x86: Booted up 1 node, 8 CPUs
[ 0.683626] smpboot: Total of 8 processors activated (54301.91 BogoMIPS)

Anonymous, 6 November 2016 at 18:16
Here is the log you requested: http://pastebin.com/kZpqzYKt

Also, the first time I tried to record the log, the system stuttered and the mouse pointer jumped around. dmesg didn't show anything unusual, but the reboot got stuck forever at the remount RO stage, so I used sysrq to shut down. Here is the log from that time: http://pastebin.com/EqdwGPuZ

Holger Hoffstätte, 7 November 2016 at 00:44
It seems that the hrtimer changes subtly broke task accounting: with 0.135 I see individual processes (esp. burners like longer-running C++ compilers) accounted with >100% CPU time (usually ~105-110%), which is clearly impossible.

Anonymous, 7 November 2016 at 01:44
@ck:
Regarding the 4.7 kernel 135 MuQSS patch and if you're willing to support it for another little while:
I've needed to change one hunk for kernel/sched/idle.c to eliminate a build failure:
@@ -213,7 +219,10 @@ static void cpu_idle_loop(void)

__current_set_polling();
quiet_vmstat();
- tick_nohz_idle_enter();
+ if (unlikely(softirq_pending(cpu)))
+ pending = true;
+ else
+ tick_nohz_idle_enter();

while (!need_resched()) {
check_pgt_cache();

where -----
+ if (unlikely(softirq_pending(cpu)))
should be -----
+ if (unlikely(softirq_pending(smp_processor_id())))

if I understood this correctly.

And additionally for kernels from 4.7.7 upwards the hunk for kernel/sched/sched.h should be taken from the 4.8 MuQSS patch.
Then, kernel compiles and works fine :-) Thank you Con!
(I hope the above cited patch lines are understandable.)

BR, Manuel Krause

Anonymous, 8 November 2016 at 05:08
Thanks as always Con.

I've run the usual benchmarks. I used my desktop this time (Intel Haswell 4770k). The fan on my laptop is starting to fail. So no comparison with older MuQSS, sorry.

I've put the results in a new spreadsheet:
https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

Nothing new in terms of throughput. MuQSS is roughly on par with CFS, except under partial load (make j2 and j4).

I also ran interbench -L 8 on CFS@300Hz, CFS@1000Hz and MuQSS135@100Hz, with intel_pstate+powersave frequency governor.
It doesn't show differences between any of the kernels, so I wonder if I did things right. I used interbench from your git repo.

Pedro

Anonymous, 8 November 2016 at 11:41
1000 Hz, periodic timer ticks, nice low latency with rr_interval of 3.
Impressed.

Will 100Hz make latency worse? Yes? No?

Anonymous, 8 November 2016 at 17:23
My laptop didn't completely turn off after suspend (keyboard and screen lights still on, screen blanked, laptop getting hot). I was still able to wake the system up using the power button, and was able to retrieve this log: http://pastebin.com/zJifGCWm

Anonymous, 10 November 2016 at 19:24
I've been using your patches for a long time to give some life to an old Atom Z520, with great satisfaction.

Since MuQSS (I'm now using version 0.135 with HZ 100) I've been experiencing random panics at boot.
When the boot process doesn't hang, the system seems to work very well (except hibernation, but I'm not sure that is related).

The same kernel without MuQSS works and boots with no problems.

Attached here is part of the trace:
[] __hrtimer_run_queues+0xcb/0x2a0
[] ? perf_trace_sched_switch+0x180/0x180
[] hrtimer_interrupt+0x8a/0x180
[] local_apic_timer_interrupt+0x32/0x60
[] smp_apic_timer_interrupt+0x34/0x3c
[] apic_timer_interrupt+0x34/0x3c

ck, 11 November 2016 at 09:01
These warnings should now be fixed in git.

Anonymous, 11 November 2016 at 17:00
Does VirtualBox require cgroups or Isochronous scheduling?
https://lkml.org/lkml/2016/10/29/4
Windows 7, Ubuntu, and Arch guests on an Arch host are extremely sluggish to load and run; audio is broken and lags, rendering the guests unusable; the WinXP guest blue screens on launch. Arch Linux 4.8.6-2-ck (piledriver), since MuQSS.

Anonymous, 12 November 2016 at 01:38
It turns out the problem with osu! only appears if you set the FPS limiter to Unlimited. I'm unable to reproduce the problem with the FPS limiter set to 120fps or 240fps.

You may need a complex map such as this one: https://osu.ppy.sh/s/157896

Play the map on Xtra difficulty with Auto mod (press F2 and select Auto).

I'm currently using MuQSS 1769b2d on Linux 4.8.
Intel HD Graphics 5500.

Anonymous, 12 November 2016 at 06:07
Thank you for the reply, ck. I figured it did, but could only find a cgroups reference to a *.slice file on my PC regarding vbox. Setting up cgroups to work with MuQSS is way over my head for now, but again, thanks.