-ck hacking: linux-4.8-ck8, MuQSS version 0.144

Tuesday, 22 November 2016

Here's a new release to go along with and commemorate the 4.8.10 stable release (they're releasing stable releases faster than my development code now.)

linux-4.8-ck8 patch:
patch-4.8-ck8.lrz

MuQSS by itself:
4.8-sched-MuQSS_144.patch

There are a small number of updates to MuQSS itself.
Notably, interactive mode behaves better when SMT nice is enabled, when realtime tasks are running, or when there are users of CPU affinity. Previously, tasks queued behind one of those as the highest priority task would not get scheduled on a CPU, which would transiently refuse to run them.
The old hacks for CPU frequency changes from BFS have been removed, leaving the tunables to default as per mainline.
The default of 100Hz has been removed, and in its place a new, recommended 128Hz has been implemented. This is just a silly microoptimisation to take advantage of the fast shifts that /128 allows on CPUs compared to /100, and it is close enough to 100Hz to otherwise behave the same.
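
To make the arithmetic concrete, here is a small Python sketch (not kernel code; the helper names are made up for illustration). It shows that, for non-negative integers, dividing by 128 gives the same result as shifting right by 7 bits, and it compares the tick length at the two rates.

```python
# Illustrative only: these helpers are hypothetical and not taken from the kernel.

def whole_seconds(ticks: int, hz: int) -> int:
    """Whole seconds covered by a tick count, using integer division."""
    return ticks // hz


def whole_seconds_128(ticks: int) -> int:
    """Same result for hz=128 using a shift, because 128 == 2**7."""
    return ticks >> 7


def tick_length_ms(hz: int) -> float:
    """Length of one timer tick in milliseconds."""
    return 1000.0 / hz


if __name__ == "__main__":
    # The shift and the division agree for every non-negative tick count.
    for ticks in (0, 127, 128, 1000, 123456):
        assert whole_seconds_128(ticks) == whole_seconds(ticks, 128)

    print(tick_length_ms(100))  # 10.0
    print(tick_length_ms(128))  # 7.8125
```

No such shortcut exists for /100, since 100 is not a power of two.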

For the -ck patch only, I've reinstated updated and improved versions of the high resolution timeouts. They improve the behaviour of userspace that is inappropriately Hz dependent, so that low Hz choices do not affect latency.
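
As a rough illustration of what "Hz dependent" means for latency, the Python sketch below models a tick-based timeout as rounding up to the next whole tick. This is a deliberate simplification and an assumption for illustration; the real kernel paths differ in detail, and the function name is hypothetical.

```python
import math


def tick_rounded_timeout_us(requested_us: int, hz: int) -> float:
    """Simplified model: a tick-based timeout can only expire on a tick
    boundary, so the request is rounded up to a whole number of ticks."""
    tick_us = 1_000_000 / hz
    return math.ceil(requested_us / tick_us) * tick_us


if __name__ == "__main__":
    # A 1000us request under three different Hz settings.
    for hz in (100, 128, 1000):
        print(hz, tick_rounded_timeout_us(1000, hz))
```

In this model a 1000us request costs a full tick at low Hz (10000us at 100Hz, 7812.5us at 128Hz), which is the kind of gap a high resolution timeout is meant to avoid.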
Additionally by request I've added a couple of tunables to adjust the behaviour of the high res timers and timeouts.
/proc/sys/kernel/hrtimer_granularity_us
and
/proc/sys/kernel/hrtimeout_min_us

Both of these are in microseconds and can be set from 1-10,000. The first is how accurate high res timers will be in the kernel and is set to 100us by default (on mainline it is Hz accuracy).
The second is how small to make a request for a "minimum timeout" generically in all kernel code. It is set to 1000us by default (on mainline it is one tick).

I doubt you'll find anything useful by tuning these but feel free to go nuts. Decreasing the second tunable much further risks breaking some driver behaviour.
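
For anyone who does want to experiment, here is a minimal Python sketch for reading and setting the two tunables. The paths and the 1-10,000 range come from the text above; the function names and the `base` parameter are assumptions made for this example, and writing the real files requires root on a kernel that has the -ck8 patch applied.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")
TUNABLES = ("hrtimer_granularity_us", "hrtimeout_min_us")
MIN_US, MAX_US = 1, 10_000  # documented range, in microseconds


def read_tunable(name: str, base: Path = SYSCTL_DIR) -> int:
    """Return the current value of one tunable, in microseconds."""
    if name not in TUNABLES:
        raise ValueError(f"unknown tunable: {name}")
    return int((base / name).read_text().strip())


def write_tunable(name: str, value_us: int, base: Path = SYSCTL_DIR) -> None:
    """Set one tunable after checking it against the documented range."""
    if name not in TUNABLES:
        raise ValueError(f"unknown tunable: {name}")
    if not MIN_US <= value_us <= MAX_US:
        raise ValueError(f"{value_us}us is outside {MIN_US}-{MAX_US}us")
    (base / name).write_text(f"{value_us}\n")


if __name__ == "__main__":
    for name in TUNABLES:
        if (SYSCTL_DIR / name).exists():
            print(name, read_tunable(name))
        else:
            print(name, "not present on this kernel")
```

The `base` parameter exists only so the helpers can be tried against a scratch directory before touching the live files.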

Enjoy!
お楽しみ下さい (Please enjoy)
-ck

115 comments:

Anonymous, 22 November 2016 at 14:38
cyclictest (cyclictest -N -S -p 80) avg times increased by a factor of 10 with this version.

duud

Anonymous, 22 November 2016 at 15:56
on 4.8.10 kernel, 4.8-ck8 patchset

patch -p1 < ../4.8-ck8/patches/0006-Implement-min-and-msec-hrtimeout-un-interruptible-sc.patch
patching file include/linux/sched.h
Hunk #1 FAILED at 437.
1 out of 1 hunk FAILED -- saving rejects to file include/linux/sched.h.rej
patching file kernel/time/hrtimer.c
Hunk #1 FAILED at 1788.
1 out of 1 hunk FAILED -- saving rejects to file kernel/time/hrtimer.c.rej

Florian, 22 November 2016 at 21:33
Hi,

I just updated to MuQSS v0.144 and for the first time I saw this kernel message in my Arch syslog during boot:

kernel: APIC calibration not consistent with PM-Timer: 93ms instead of 100ms

My HZ-config on my Core2 Duo:

CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ_FULL is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_128=y

Does this have anything to do with HZ_128=y? Do I have to tune one of the following options?

/proc/sys/kernel/hrtimer_granularity_us is defaulting to 100 and
/proc/sys/kernel/hrtimeout_min_us defaults to 1000.

Thanks,

Florian.

Anonymous, 23 November 2016 at 00:49
Hello.
Here come the usual benchmarks. The kernel configuration is Arch Linux's 4.8.7 one. The intel-pstate + powersave frequency governor is used.

CFS vs MuQSS144
http://openbenchmarking.org/result/1611224-LO-CFSVSMUQS05

MuQSS140 vs MuQSS144
http://openbenchmarking.org/result/1611224-LO-MUQSS140164

There is some small improvement with MuQSS144+interactive=1, notably on ebizzy.

Pedro

Anonymous, 23 November 2016 at 01:55
Hey,

Just wanted to let you know that I still have issues with the spotify scrolling even with this new release (Same workload as I described in my email).
I have tested this with 1kHz so far. I will test the new 128Hz in a moment.

Anonymous, 23 November 2016 at 02:39
@ck:
In a first quick test I see progress between 140 and 144 regarding my FF "issue".
Running 4.8.10 with the -ck8 timer patches at default 128Hz. And very low system base load. Very nice :-)
BR, Manuel Krause

Anonymous, 23 November 2016 at 06:17
Hi Con.
I've some questions on interbench.

I ran it several times and I found that sometimes there are big variations with CFS in 'Max Latency', '% Desired CPU' and '% Deadlines Met'. Average latencies are more consistent, but still vary.
I tried with both intel-pstate performance and powersave. Doesn't make a difference.
I tried running interbench longer (-t 90), and it is better.

You can see the results here:
https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

The colors mean:
blue = within +- 10% of the reference
green = better
red = worse
The reference is the first run on the left.

I wonder if such variations are expected.
If so, how can one do a fair comparison between schedulers?

Interestingly, there are fewer variations with MuQSS (maybe due to the heuristics in CFS?).

Pedro

Anonymous, 23 November 2016 at 12:27
Thanks a lot, this release seems to alleviate TF2's startup time & fps problems. I also haven't run into any crashes or issues in an hour of testing.

Subjectively, my whole system seems more responsive and also boots a little quicker, but that could be placebo as I haven't done scientific testing. That said, I could run a YouTube video in the background while gaming without noticing any change in input responsiveness (which I usually do notice in that case), so I guess something in this release does improve that.

~ kiwii, the anon who filed the TF2 bug report

Anonymous, 23 November 2016 at 18:24
With the concerns regarding security and privacy, distributions like Debian have started to ship grsecurity in their repos. Only in sid for now.

Any plans to make -ck compatible with grsec (in the [near] future)?

Anonymous, 25 November 2016 at 03:29
Thanks a lot. Very responsive on an i7-870 quad-core (overclocked), even more so than CFS. I also used "KBUILD_CFLAGS += -falign-functions=1 -falign-jumps=1 -falign-loops=1 -falign-labels=1 -fno-builtin -pipe" at line 200 of /arch/x86/Makefile, which seems to give a nice boost.

Florian, 25 November 2016 at 06:08
Thanks for the tip concerning

-falign-functions=1 -falign-jumps=1 -falign-loops=1 -falign-labels=1 -fno-builtin -pipe

I used to do "KBUILD_CFLAGS += -march=native -mtune=native -pipe" before and am now testing whether I notice any speed boost on my Core2 Duo. It compiles and boots without any issues; perhaps I will discover some milliseconds of speed boost. ;-)

Anonymous, 28 November 2016 at 08:32
Thanks for the patches.
Hello Linux Desktop ;).
Had to revert to 4.8(.0) though since 4.8.7-4.8.11 were too slow for my taste.

kernelOfTruth, 1 December 2016 at 05:13
Con, is the scheduler responsible for interaction with workqueues ?

Just got the 53 second lockup while browsing with chromium, having compiz active

afaik I got more of these in the past few days, X was frozen but it could be rebooted via Magic SYSRQ Key,

didn't know that it would take a minute or longer for it to "pass", otherwise I would have waited longer and reported here earlier ...

http://pastebin.com/tdeKZ9ai

[more than 4096 chars]

thunderrd, 2 December 2016 at 01:36
375.xx drivers are riddled with bugs, for the last month or so. I wouldn't get anxious until I saw the next major update. Even Folding@Home is crippled in the newer drivers, so we have to stay with 343.xx.

Anonymous, 5 December 2016 at 05:12
Is there a way to configure the kernel with something like CONFIG_SCHED_BFS_AUTOISO, but for MuQSS?

Anonymous, 5 December 2016 at 07:10
I see, I assumed that https://github.com/zen-kernel/zen-kernel/blob/4.7/master/init/Kconfig#L75 was your work.

Anonymous, 5 December 2016 at 07:14
Having said that, is there a way to use an automatic sched_iso policy for X using MuQSS?

Anonymous, 6 December 2016 at 10:07
@ck:
Quite a nice one from Virtualbox after trying 1024Hz:
/tmp/vbox.1/r0drv/linux/the-linux-kernel.h:332:3: error: #error "HZ is not a multiple of 1000, the GIP stuff won't work right!"
# error "HZ is not a multiple of 1000, the GIP stuff won't work right!"

BR, Manuel Krause

Anonymous, 6 December 2016 at 11:28
Hi ck, a while back you offered an Ubuntu 4.8.7-ck7 kernel. That is running ever so smoothly, so I took to building a more recent 4.8.12 kernel, including your latest MuQSS patches. It builds fine, but I must be missing an important part of the puzzle, as I can't get it to boot. Would you consider posting your Ubuntu kernel build script here (if you use one)?

monotykamary, 8 December 2016 at 18:43
osu! still crashes, hangs, and locks up (the entire system) for me even with the workaround mentioned in earlier post comments. After several tries with ck and ck-ivybridge, I did get a different result in dmesg:

snd_hda_intel 0000:00:1b.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj

CPU: Intel i5-3317u
RAM: 7853MiB
GPU: Intel HD4000/NVIDIA 640M LE
WM (tested): bspwm/i3
Dist: Arch Linux (Reinstalled twice)
Device: Dell 3421

Anonymous, 12 December 2016 at 09:21
Thanks for 4.8-ck8.
How is 4.9 coming along? :)