-ck hacking: linux-4.8-ck7, MuQSS version 0.140

Saturday, 12 November 2016

Another week has passed, another stable linux release, and to follow, another -ck and MuQSS release.

linux-4.8-ck7 patch:
patch-4.8-ck7.lrz

Split out patches:
http://ck.kolivas.org/patches/4.0/4.8/4.8-ck7/patches/

MuQSS by itself for 4.8:
4.8-sched-MuQSS_140.patch

MuQSS by itself for 4.7:
4.7-sched-MuQSS_140.patch

This release marks a change towards conservative changes only.

I've rolled back the extensive timer changes outside the main scheduler code. There are too many assumptions made about timeouts in the kernel code that are potentially problematic in the real world, and there is code that is poorly prepared for freezer usage (suspend to ram) that breaks. Additionally, not a single user reported a workload that they noticed benefited from the lower latency accurate timeouts. Finally, the added overhead is demonstrable in throughput benchmarks, and when doing comparisons with mainline it is doing MuQSS a disservice to mix in other code that it's not actually responsible for.

There are also a small number of bugfixes for warnings/crashes in the updated MuQSS that showed up after the last release as people are using it on more and varied hardware in the wild now. These may have positive effects on other less defined issues in the wild too.

The -ck release also includes an updated version of BFQ, and with it I would like to issue a warning. I have heard rumours that a number of users have reported filesystem corruption with the combination of BTRFS and BFQ. If you are using this filesystem, I urge you not to compile in BFQ at all, or at the very least not to make it the default I/O scheduler, and to use it selectively only on devices that hold a different filesystem (I still recommend people use ext4). I would like to encourage users who have run into this problem to report it to the BFQ maintainer.
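Before deciding where to use BFQ selectively, it helps to see which I/O scheduler each block device is currently using. The following is a minimal read-only sketch, assuming the usual sysfs layout in which `/sys/block/<dev>/queue/scheduler` lists the available schedulers and marks the active one in square brackets; the function names `parse_schedulers` and `report` are made up for this illustration.

```python
from pathlib import Path


def parse_schedulers(text):
    """Return (active, available) from a sysfs scheduler line.

    Example input: 'noop deadline [cfq] bfq'. The bracketed entry is the
    scheduler currently in use; active is None if nothing is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def report(sys_block=Path("/sys/block")):
    """Print the active I/O scheduler of every block device (read-only)."""
    if not sys_block.is_dir():
        print("no /sys/block here; is this a Linux system?")
        return
    for dev in sorted(sys_block.iterdir()):
        sched_file = dev / "queue" / "scheduler"
        if sched_file.is_file():
            active, available = parse_schedulers(sched_file.read_text())
            print(f"{dev.name}: active={active} available={available}")


if __name__ == "__main__":
    report()
```

Changing the scheduler for a single device is done by writing its name to the same file as root, for example `echo bfq > /sys/block/sdX/queue/scheduler`, where `sdX` is a placeholder for the device in question.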

I've cleaned up the patches in the -ck tarball once again so that related changes are combined into single patches. This will ease the burden of porting to the next major linux kernel release and allow users to easily select which patches they wish to use themselves.

As always, make sure to give me your feedback, bug reports, warnings, and bitcoin.

Enjoy!
(Please enjoy)
-ck

132 comments:

Anonymous, 12 November 2016 at 15:30
I've found out which settings cause the osu! system lockup bug to manifest.

As long as the FPS Limiter in osu! is set to a value lower than Unlimited, the bug won't show up. If the value is set to Unlimited, playing any map results in a system lockup.

The bug is easier to trigger when a complex beatmap is used, such as this one (Game Over difficulty): https://osu.ppy.sh/s/332532

NOTE: You don't have to play the game; just turn the Auto mod on and the map will be played by a bot.

Oleksandr Natalenko, 12 November 2016 at 20:56
Btrfs+BFQ issues? Bullshit. I use that all the time on multiple machines, and had no issues at all.

Anonymous, 13 November 2016 at 03:09
@ck:
Regarding the timer changes rollback for non-scheduler code, I assume that you still recommend the HZ_100 config, right?

BR, Manuel Krause

Anonymous, 13 November 2016 at 14:23
@Pedro

Could you please benchmark this:
https://sourceforge.net/projects/xanmod/files/sources/linux-4.8.6-xanmod8_4.8.6-xanmod8.orig.tar.gz/download
(not the newer one 4.8.7!)

It has some of the CFS parameters set for low latency by default.
Feels very snappy for me, especially when running xonotic, very low input latency. I prefer this one for xonotic.

Anonymous, 14 November 2016 at 04:32
^^tested xanmod kernel on my custom slackware install. while it feels pretty responsive for a cfs kernel (also on xonotic) 4.8.6 muqss runs better on that old core2duo machine. also xanmod modified the makefile to -Ofast which is questionable.

kernelOfTruth, 14 November 2016 at 08:10
Even though 100 Hz or even 300 Hz might provide better throughput, there were several issues with 100 Hz (tested):

Upon reboot, unmounting the dozens of ZFS subvolumes took ages, whereas with 300 Hz it was faster and with 1000 Hz it was almost instantaneous.

Running Deus Ex Mankind Divided at 100 Hz wasn't fun: showing the augmentation menus (which have videos running on them) would regularly slow down and clog the animation/FPS, as would looking left and right or starting to move.

There was much more lag involved compared with 1000 Hz, where it runs significantly smoother.

This might in part be due to running with CONFIG_NO_HZ_FULL at 100 Hz and now with CONFIG_NO_HZ_IDLE=y at 1000 Hz, but there's quite a difference.

# CONFIG_RCU_FAST_NO_HZ is OFF

Anonymous, 15 November 2016 at 07:22
No problems, old Intel cpu.
Very fast, love it.

Anonymous, 15 November 2016 at 14:00
Having input freezes (USB 2.0, mouse, keyboard, 500 Hz polling) when running Xorg + Xonotic with SCHED_ISO on 4.8.7-ck7.
4.8.7-ck6+0001-Make-freezable-timeouts-not-use-the-highres-timers.patch is fine although when enabling multicore-scheduler support in the kernel config it will tend to input freeze also.
1000 Hz, periodic timer ticks.
Input freeze as in game continues normally but accepts no input anymore for maybe 3 to 7 seconds (time varies) on random occasions which can be like a few minutes apart.
Apart from that 4.8.7-ck7 seems to run better.
No other patches.

ck, 15 November 2016 at 18:32
Just for grins I've put generic Ubuntu LTS kernel packages in the ck7 directory as well.

Anonymous, 15 November 2016 at 22:46
New benchmarks of MuQSS140 on linux 4.8.7 here:
http://openbenchmarking.org/result/1611152-LO-CFSVSMUQS66

MuQSS135 vs MuQSS140 here:
http://openbenchmarking.org/result/1611150-LO-MUQSS135127

Nothing new with these results.

Pedro

Anonymous, 16 November 2016 at 10:40
>"Additionally, not a single user reported a workload that they noticed benefited from the lower latency accurate timeouts"

I found one. Probably Valve's at fault here, but still: Team Fortress 2 takes about 20 minutes to start up with ck7 @ 100Hz while it takes "only" 2 minutes with the other patches from the previous release and also only about 2 minutes with the old 1k Hz default.
(I think it's compiling shaders during that time; only uses 1 cpu core but at about 100% though..)

And even when I wait the 20 minutes, TF2 is running at an unplayable <15fps while with the 4.8-ck6 patchset it is running at 60 to 120fps.

This is on a system that takes under 20 seconds to boot, just for perspective.

OS: Arch Linux, linux-ck-k10-4.8.7-2 from graysky's repo-ck
CPU: k10, amd phenom II x4 955 black edition (OC at 3.7GHz)
GPU: amd radeon hd 7870 with fglrx/catalyst 15.12
board: asus m4a77td (not the pro variant)

Let me know if I should provide any logs/further info or do some tests.
Also if this is something that has to be fixed in TF2, I'd be grateful for any hints on what I should include in a bug report over at valve's github.

$ zcat /proc/config.gz |grep HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_MACHZ_WDT=m
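
The same check can be scripted when comparing several kernels. This is a minimal sketch, assuming the running kernel exposes its configuration at `/proc/config.gz` as in the output above; `hz_settings` and `tick_ms` are names invented for this example.

```python
import gzip
import re
from pathlib import Path


def hz_settings(config_text):
    """Collect CONFIG_*HZ* options from kernel config text.

    'CONFIG_X=value' maps to the value string;
    '# CONFIG_X is not set' maps to None.
    """
    settings = {}
    for line in config_text.splitlines():
        match = re.match(r"(CONFIG_\w*HZ\w*)=(.*)$", line)
        if match:
            settings[match.group(1)] = match.group(2)
            continue
        match = re.match(r"# (CONFIG_\w*HZ\w*) is not set$", line)
        if match:
            settings[match.group(1)] = None
    return settings


def tick_ms(settings):
    """Length of one timer tick in milliseconds for the configured HZ."""
    return 1000 / int(settings["CONFIG_HZ"])


if __name__ == "__main__":
    config = Path("/proc/config.gz")
    if config.is_file():
        with gzip.open(config, "rt") as fh:
            found = hz_settings(fh.read())
        for name, value in found.items():
            print(name, "=", value if value is not None else "(not set)")
        if "CONFIG_HZ" in found:
            print("tick length:", tick_ms(found), "ms")
    else:
        print("/proc/config.gz not available on this system")
```

Like the `grep HZ` above, the pattern also picks up unrelated options that merely contain "HZ", such as CONFIG_MACHZ_WDT.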

ck, 16 November 2016 at 11:27
If you post a message and it doesn't show up here, just wait as I eventually go through and unmark things as spam. Blogger is very aggressive at marking comments as spam.

ck, 16 November 2016 at 19:35
For those that have applications that are definitely faster at higher Hz configs, it would be interesting to see if just replacing the msleep calls from userspace with high resolution ones is enough to fix the slowdown. Try this patch on top of ck7: 0001-Use-hrtimeouts-when-possible-instead-for-msleep.patch
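
To illustrate why a tick-based sleep gets coarser at lower Hz, here is a rough model rather than the kernel's actual code. It assumes that a jiffy-based sleep rounds the request up to whole jiffies and adds one extra jiffy so that it never returns early, and that a timeout of n jiffies fires somewhere between n-1 and n ticks from now; both are simplifying assumptions, and `jiffy_sleep_bounds_ms` is a hypothetical helper.

```python
def jiffy_sleep_bounds_ms(requested_ms, hz):
    """Approximate (min, max) real sleep in ms for a jiffy-based timeout.

    Model (an assumption, see the text above): round the request up to
    whole jiffies, add one jiffy, and let the timer fire between n-1 and
    n ticks after it was armed.
    """
    # Integer ceiling division: jiffies needed to cover requested_ms.
    jiffies = -(-requested_ms * hz // 1000) + 1
    tick = 1000 / hz
    return (jiffies - 1) * tick, jiffies * tick


if __name__ == "__main__":
    for hz in (100, 250, 300, 1000):
        low, high = jiffy_sleep_bounds_ms(1, hz)
        print(f"HZ={hz}: a 1 ms sleep takes roughly {low:.1f} to {high:.1f} ms")
```

Under this model, a loop that sleeps 1 ms per iteration runs about ten times slower at 100 Hz than at 1000 Hz, which is the kind of slowdown a high resolution timeout would avoid.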

zar, 17 November 2016 at 14:05
I've definitely noticed that MuQSS leads to a less responsive desktop when compiling very large projects, like Unreal Engine. Could this just be because it's more efficient than BFS at using all available resources?

graysky, 18 November 2016 at 06:55
Running 4.8.8 with ck7 results in a much lower idle CPU frequency than without it on my i7-4790K (Haswell). I wrote a script that samples the CPU frequency once per second and then uses a Python script to plot a histogram of the data. I ran this script under 4.8.8 and 4.8.8+ck7 and the differences were striking: much lower idle frequency with the ck patchset/MuQSS.

Median frequency without patch = 3.72 GHz
Median frequency with patch = 1.13 GHz

Link to script: https://github.com/graysky2/bin/blob/master/cpufreq_histogram.sh

Histogram without patch:
# NumSamples = 180; Min = 799.80; Max = 4400.80
# Mean = 3286.151667; Variance = 1422892.793164; SD = 1192.850700; Median 3718.850000
# each ∎ represents a count of 1
799.8000 - 1159.9000 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (9.44%)
1159.9000 - 1520.0000 [ 5]: ∎∎∎∎∎ (2.78%)
1520.0000 - 1880.1000 [ 7]: ∎∎∎∎∎∎∎ (3.89%)
1880.1000 - 2240.2000 [ 6]: ∎∎∎∎∎∎ (3.33%)
2240.2000 - 2600.3000 [ 8]: ∎∎∎∎∎∎∎∎ (4.44%)
2600.3000 - 2960.4000 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (17.78%)
2960.4000 - 3320.5000 [ 7]: ∎∎∎∎∎∎∎ (3.89%)
3320.5000 - 3680.6000 [ 6]: ∎∎∎∎∎∎ (3.33%)
3680.6000 - 4040.7000 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ (6.11%)
4040.7000 - 4400.8000 [ 81]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (45.00%)

Histogram with patch:
# NumSamples = 180; Min = 799.30; Max = 4400.10
# Mean = 1612.930556; Variance = 1172476.469566; SD = 1082.809526; Median 1127.550000
# each ∎ represents a count of 1
799.3000 - 1159.3800 [ 95]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (52.78%)
1159.3800 - 1519.4600 [ 27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (15.00%)
1519.4600 - 1879.5400 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (8.89%)
1879.5400 - 2239.6200 [ 4]: ∎∎∎∎ (2.22%)
2239.6200 - 2599.7000 [ 7]: ∎∎∎∎∎∎∎ (3.89%)
2599.7000 - 2959.7800 [ 6]: ∎∎∎∎∎∎ (3.33%)
2959.7800 - 3319.8600 [ 5]: ∎∎∎∎∎ (2.78%)
3319.8600 - 3679.9400 [ 0]: (0.00%)
3679.9400 - 4040.0200 [ 4]: ∎∎∎∎ (2.22%)
4040.0200 - 4400.1000 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (8.89%)
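
The sample-then-histogram approach behind the numbers above can be sketched roughly as follows. This is not the linked script: it assumes the frequency is read from the `cpu MHz` lines of `/proc/cpuinfo` and averaged across cores once per interval, and the function names are invented for the example.

```python
import re
import statistics
import time
from pathlib import Path


def parse_mhz(cpuinfo_text):
    """Extract every 'cpu MHz' value from /proc/cpuinfo-style text."""
    return [float(v) for v in
            re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.M)]


def sample(count, interval=1.0, cpuinfo=Path("/proc/cpuinfo")):
    """Take `count` samples, each the mean frequency across all cores."""
    samples = []
    for _ in range(count):
        values = parse_mhz(cpuinfo.read_text())
        if values:  # some architectures do not report 'cpu MHz'
            samples.append(statistics.mean(values))
        time.sleep(interval)
    return samples


def histogram(samples, bins=10):
    """Return a list of (low, high, count) tuples for equal-width bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values match
    counts = [0] * bins
    for s in samples:
        idx = min(int((s - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def render(samples, bins=10):
    """Format samples as a text histogram with one '#' per sample."""
    lines = [f"# NumSamples = {len(samples)}; "
             f"Median = {statistics.median(samples):.2f}"]
    for low, high, count in histogram(samples, bins):
        lines.append(f"{low:10.4f} - {high:10.4f} [{count:4d}]: {'#' * count}")
    return "\n".join(lines)
```

Calling `print(render(sample(180)))` on an otherwise idle machine would take about three minutes and produce a table in the same general shape as the ones above.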

ck, 18 November 2016 at 14:07
I've added updated and improved versions of all the timer patches in ck6 back into the -ck git branch. They'll still be kept separate from the muqss code but assuming these latest patches improve behaviour without the bugs they previously introduced, I can roll them into another -ck release (muqss has no pending changes.)

Anonymous, 19 November 2016 at 03:23
blogspot seems to have a problem with replying/ displaying at 99th+ posts. Please have a look, thank you.
BR, Manuel Krause

Anonymous, 19 November 2016 at 17:29
Hi, I tested the ck-ivybridge kernel on the Manjaro distro yesterday and it didn't perform very well for me...

Just an example of what was going on...
I have about 5-6 autostart applications when I log in...
Clementine was playing music and I opened Steam; Clementine stopped playing music while Steam was loading, and started playing again once Steam had loaded...
I was so shocked by that simple task alone that I didn't perform any additional tests...

The above behavior is not happening with Manjaro Kernels

Anonymous, 20 November 2016 at 08:44
I tested the kernel again today but got the same thing.
Maybe it is because I am using mdadm RAID 0 with 2x SSD?

ck, 20 November 2016 at 10:20
I don't have time to release anything today. Maybe Tuesday; 4.8.10 will probably be out by then.

Anonymous, 21 November 2016 at 07:01
Before Con prepares a new patch publishing run... a question to the people with the softirq error messages: Have these gone away by now? And if so, by what means? With his included workaround patch?
Thank you, BR Manuel Krause

Anonymous, 21 November 2016 at 10:27
Finally I got time to test this. Until now I was still using BFS 0.512 with the backport patches (so basically the 4.8-bfs branch). The thing I noticed was that MuQSS can't deal with high CPU loads as well. For example, with BFS I had a very smooth desktop experience where everything worked well (no lags) while compiling big projects like LLVM (plus extra workloads like running Spotify, Firefox with 50 tabs, 2x Windows VMs, Chromium with 8 tabs, PyCharm). With MuQSS (from linux-ck 4.8.9) everything was lagging while compiling LLVM, literally everything; even scrolling in Spotify led to a delay of some milliseconds, which did not happen with BFS.

I know I am quite late to the party, as I didn't want to test the older MuQSS releases and so couldn't report this earlier. Also, I am not sure how I can provide more information. The easiest way for me to reproduce this is to compile LLVM + Clang (ninja -j8/make -j8) while scrolling a large Spotify playlist up and down. You will notice lags while doing it, which again did not happen with BFS.

Anonymous, 4 February 2017 at 22:29
thank you
you can also read Why Linux is Free?

Unknown, 19 April 2017 at 12:03
I'm hitting a BUG when trying to create a QEMU/KVM VM; any ideas? I saw a similar BUG in previous user comments regarding BFS for 4.8, where someone hit the same BUG when trying to use VirtualBox.

Apr 18 20:08:35 ROG audit[5962]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-f02a7ff8-d128-4db2-
Apr 18 20:08:35 ROG kernel: audit: type=1400 audit(1492564115.688:55): apparmor="STATUS" operation="profile_replace" profile="unconfined"
Apr 18 20:08:35 ROG kernel: usercopy: kernel memory overwrite attempt detected to ffff9b05d3ece708 (kmalloc-8) (128 bytes)
Apr 18 20:08:35 ROG kernel: ------------[ cut here ]------------
Apr 18 20:08:35 ROG kernel: kernel BUG at /usr/src/linux-4.10.0/mm/usercopy.c:75!
Apr 18 20:08:35 ROG kernel: invalid opcode: 0000 [#1] SMP
Apr 18 20:08:35 ROG kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 b
Apr 18 20:08:35 ROG kernel: cryptd snd_hwdep snd_pcm intel_cstate nvidia(POE) intel_rapl_perf snd_seq_midi saa7164 snd_seq_midi_event sn
Apr 18 20:08:35 ROG kernel: multipath linear uas usb_storage hid_generic usbhid hid raid0 i915 i2c_algo_bit drm_kms_helper syscopyarea s
Apr 18 20:08:35 ROG kernel: CPU: 0 PID: 4052 Comm: libvirtd Tainted: P OE 4.10.0-19+my-generic #21
Apr 18 20:08:35 ROG kernel: Hardware name: ASUS All Series/MAXIMUS VII GENE, BIOS 3003 10/28/2015
Apr 18 20:08:35 ROG kernel: task: ffff9b05aa5c5300 task.stack: ffffaaf342f30000
Apr 18 20:08:35 ROG kernel: RIP: 0010:__check_object_size+0x77/0x1d6
Apr 18 20:08:35 ROG kernel: RSP: 0018:ffffaaf342f33ee0 EFLAGS: 00010282
Apr 18 20:08:35 ROG kernel: RAX: 000000000000005e RBX: ffff9b05d3ece708 RCX: 0000000000000000
Apr 18 20:08:35 ROG kernel: RDX: 0000000000000000 RSI: ffff9b05efa0dbc8 RDI: ffff9b05efa0dbc8
Apr 18 20:08:35 ROG kernel: RBP: ffffaaf342f33f00 R08: 0000000000000005 R09: 0000000000000551
Apr 18 20:08:35 ROG kernel: R10: 0000000000000008 R11: ffffffffa84469cd R12: 0000000000000080
Apr 18 20:08:35 ROG kernel: R13: 0000000000000000 R14: ffff9b05d3ece788 R15: ffff9b05d3ece708
Apr 18 20:08:35 ROG kernel: FS: 00007f410c5d1700(0000) GS:ffff9b05efa00000(0000) knlGS:0000000000000000
Apr 18 20:08:35 ROG kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 18 20:08:35 ROG kernel: CR2: 00007f41196eaaa0 CR3: 00000003f0da5000 CR4: 00000000001406f0
Apr 18 20:08:35 ROG kernel: Call Trace:
Apr 18 20:08:35 ROG kernel: SyS_sched_setaffinity+0x6b/0xe0
Apr 18 20:08:35 ROG kernel: entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 18 20:08:35 ROG kernel: RIP: 0033:0x7f41188425dc
Apr 18 20:08:35 ROG kernel: RSP: 002b:00007f410c5d0798 EFLAGS: 00000246 ORIG_RAX: 00000000000000cb
Apr 18 20:08:35 ROG kernel: RAX: ffffffffffffffda RBX: 00007f41193e271c RCX: 00007f41188425dc
Apr 18 20:08:35 ROG kernel: RDX: 00007f40e81211e0 RSI: 0000000000000080 RDI: 000000000000174c
Apr 18 20:08:35 ROG kernel: RBP: 00007f40e83155d0 R08: 00007f40e81de0e0 R09: 0000000000000000
Apr 18 20:08:35 ROG kernel: R10: 00007f40e81211e0 R11: 0000000000000246 R12: 00007f40e83155d0
Apr 18 20:08:35 ROG kernel: R13: 00007f41196eaa90 R14: 0000000000000001 R15: 00007f410c5d1698
Apr 18 20:08:35 ROG kernel: Code: c7 c2 13 4f ed a7 48 c7 c6 d1 da e9 a7 48 c7 c7 60 a5 e9 a7 48 0f 44 d1 48 c7 c1 8a 2e e9 a7 48 0f 44 f
Apr 18 20:08:35 ROG kernel: RIP: __check_object_size+0x77/0x1d6 RSP: ffffaaf342f33ee0
Apr 18 20:08:35 ROG kernel: ---[ end trace 7f5e3e96a69c8802 ]---