Wednesday, 28 April 2021

linux-5.12-ck1, MuQSS version 0.210 for linux-5.12

Announcing a new -ck release, 5.12-ck1, with the latest version of the Multiple Queue Skiplist Scheduler, version 0.210. These are patches designed to improve system responsiveness and interactivity, with specific emphasis on the desktop, but configurable for any workload.

This was a resync and build bugfix from 5.11-ck1. The only new change to the -ck patch is the ability to reselect ondemand and conservative governors with Intel Pstate, and to deselect schedutil.
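
As a rough sketch of how to check which governor ended up active after booting this kernel, the snippet below reads the standard cpufreq sysfs files for CPU 0. The helper name `read_governors` is made up for illustration, and it assumes cpufreq is exposed at the usual sysfs path. Which governors are listed depends on the kernel config and the cpufreq driver in use.

```python
from pathlib import Path

# Usual sysfs location of the cpufreq policy for CPU 0 (assumed to exist).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_governors(cpufreq_dir=CPUFREQ_DIR):
    """Return (current_governor, available_governors) from a cpufreq directory.

    read_governors is a hypothetical helper, not part of the -ck patchset.
    """
    cpufreq_dir = Path(cpufreq_dir)
    current = (cpufreq_dir / "scaling_governor").read_text().strip()
    available = (cpufreq_dir / "scaling_available_governors").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_governors()
    except OSError as exc:
        # No cpufreq support, or a different sysfs layout.
        print(f"could not read cpufreq information: {exc}")
    else:
        print(f"current governor:    {current}")
        print(f"available governors: {', '.join(available)}")
```

If ondemand or conservative were selected in the kernel config and the driver supports them, they should show up in the second line of output.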

 

 linux-5.12-ck1:

patch-5.12-ck1.xz

Git tree:

5.12-ck


MuQSS only:

0001-MultiQueue-Skiplist-Scheduler-v0.210.patch

Git tree:

5.12-muqss


Web: kernel.kolivas.org

Enjoy!
お楽しみ下さい
-ck

36 comments:

  1. Thanks for your work!

  2. Thank you very much!

  3. Hi, @ckolivas. There are some fixes coming from zen-kernel and from me. Could you review them?

    fixes for
    https://github.com/ckolivas/linux/commit/9c7270e3e1dcf6c990a7ccd5fbf5d21e5a1ef2e2#diff-cdac74c1da5cfa0a30beabf049b3058ff354b315704e1cd4776f82d66796dffb
    https://github.com/ckolivas/linux/commit/4b14ea13eccf84c9c429e3459b777fed674d811f#diff-41fafa60f2e3316826fc2c6126ffdc2cc018332ac9e307d80be55f62ddc1aeae

    see https://github.com/zen-kernel/zen-kernel/commit/f1eb219c76a218c76b065e7a4380a3cbde7c5f73

    fixes for
    https://github.com/ckolivas/linux/commit/9c7270e3e1dcf6c990a7ccd5fbf5d21e5a1ef2e2#diff-4853e8b72f312cc8497dd737b1f7e6bb52f59fe133e965723fe168a1220c3b78

    diff --git a/drivers/block/swim.c b/drivers/block/swim.c
    index ac5c170..87d0d40 100644
    --- a/drivers/block/swim.c
    +++ b/drivers/block/swim.c
    @@ -371,7 +371,6 @@ static inline int swim_step(struct swim __iomem *base)
     	for (wait = 0; wait < HZ; wait++) {

     		set_current_state(TASK_INTERRUPTIBLE);
    -		schedule_timeout(1);
     		schedule_min_hrtimeout();

     		swim_select(base, RELAX);

    Replies
    1. Thanks, I added your fix to zen-kernel/5.12/muqss:
      https://github.com/zen-kernel/zen-kernel/commit/68e54611dda766838c743158164318a4e40d8e27

    2. Thanks for those. I haven't done a proper code sweep on those in a while. Any chance you could submit pull requests to my git instead?

  4. FYI Alfred Chen got his scheduler to work properly with schedutil with this commit.
    https://gitlab.com/alfredchen/linux-prjc/-/commit/d9f8735ff184981cfe16057642e0126d32f7d945

    Replies
    1. I didn't say schedutil won't work with MuQSS. I'm just offering the option of disabling it.

  5. Thanks Con.

    I've done some benchmarks.
    Nothing new on the throughput side.

    I've experimented with turbostat.
    If the energy results are accurate (they are at least coherent), schedutil is indeed less energy efficient than ondemand. And surprisingly, ck+ondemand is more energy efficient than CFS+schedutil.


    https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

    Pedro
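
The numbers above came from turbostat. A different way to get a package energy figure on Intel machines is the powercap sysfs interface, sketched below. This is not how the spreadsheet was produced. It assumes the intel_rapl driver exposes package 0 at `/sys/class/powercap/intel-rapl:0`, that the counter wraps at most once during the run, and that the files are readable (this may need root). All helper names are made up for illustration.

```python
import subprocess
import sys
import time
from pathlib import Path

# Powercap directory for CPU package 0 (assumed; differs on other setups).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(start_uj, end_uj, max_range_uj):
    """Microjoules used between two counter readings, allowing one wraparound."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter wrapped: count up to the top of its range, then from zero.
    return end_uj + max_range_uj - start_uj


def average_watts(delta_uj, seconds):
    """Average power in watts for an energy delta over a duration."""
    return delta_uj / 1_000_000 / seconds


def read_uj(name, rapl_dir=RAPL_DIR):
    """Read one integer-valued powercap file such as energy_uj."""
    return int((Path(rapl_dir) / name).read_text())


def measure(cmd, rapl_dir=RAPL_DIR):
    """Run cmd and return (microjoules used by the package, elapsed seconds)."""
    max_range = read_uj("max_energy_range_uj", rapl_dir)
    start_uj, start_t = read_uj("energy_uj", rapl_dir), time.monotonic()
    subprocess.run(cmd, check=True)
    end_uj, end_t = read_uj("energy_uj", rapl_dir), time.monotonic()
    return energy_delta_uj(start_uj, end_uj, max_range), end_t - start_t


if __name__ == "__main__":
    if len(sys.argv) > 1:
        delta_uj, seconds = measure(sys.argv[1:])
        print(f"{delta_uj / 1_000_000:.1f} J in {seconds:.1f} s "
              f"({average_watts(delta_uj, seconds):.1f} W average)")
    else:
        print("usage: pass the benchmark command as arguments")
```

Running the same benchmark command under each scheduler and governor combination and comparing the joule totals gives a comparison in the same spirit as the one above.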

    Replies
    1. That's a very interesting discovery, thanks for that. Looks like it was worth it to keep ondemand available for those that don't have hwp.

  6. System: Ryzen 3700x

    qemu-system-x86_64 -enable-kvm -smp 8 -vga virtio -drive file=/dev/vg0/vmubuntu,if=virtio,media=disk,index=0,format=raw,cache=none -m 8G -display sdl,gl=on

    with rqshare=llc or smt the system freezes completely
    rqshare=all/none/mc working ok

    =mc reports total runqueues=1. Is this even the correct behavior?
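
When comparing rqshare settings like this, it helps to confirm which value the running kernel was actually booted with. The sketch below pulls a `key=value` parameter out of a kernel command line string. `kernel_param` is a hypothetical helper, and treating the last occurrence as the effective one is an assumption.

```python
def kernel_param(cmdline, key):
    """Return the value of key=value in a kernel command line, or None.

    kernel_param is a made-up helper. If the key appears more than once,
    the last occurrence is returned (assumed to be the one that takes effect).
    """
    value = None
    for token in cmdline.split():
        name, sep, rest = token.partition("=")
        if sep and name == key:
            value = rest
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
    else:
        print(f"rqshare = {kernel_param(cmdline, 'rqshare')}")
```

A result of `None` means no rqshare parameter was passed, so the kernel's configured default applies.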

    Replies
    1. Freezing is obviously not correct behaviour, but yes MC should give you only one runqueue without two (emulated or real) physical cores.

    2. Ok, thanks for the quick reply. New info: System only freezes with -smp 8 and -smp 32.
      I also used 2, 4, 6, 7, 9 and 16. No freeze here. Really strange.

  7. I must say, this specific release is giving me some incredible performance gains (on the gaming side).

    Comparing this against PDS, BMQ, and even the recently made CacULE scheduler, I am getting the highest frame rates with MuQSS, all tested with the tkg kernel. BMQ/PDS, for example, gave me a constant 120-190 FPS in Krunker depending on the map I played. CacULE was around the same but would stutter sometimes. With MuQSS I am getting 170-250 FPS, a gain large enough to carry over to other games as well (Overwatch has zero stuttering whatsoever, even while the shaders are compiling, in contrast with the other schedulers, with CacULE stuttering the most).

    Thanks for your amazing work on updating this scheduler, and I can't wait to see how it improves in future versions.

    Replies
    1. Thanks for the feedback! I uh, did... nothing new to MuQSS. It's been stable and unchanged for many versions now. Perhaps you're seeing the effect of a CPU frequency scaling governor change?

    2. Maybe, but I made sure for all of my tests that I was using the performance governor because I couldn't believe my eyes when I tried the new release and I wanted to make sure that this wasn't a coincidental effect.

      For reference, I am using an Intel i5 6th gen Skylake 2.3 GHz base clock, but I have Turbo Boost enabled so I can get 2.8 GHz at max, and I used cpupowergui to monitor the scaling frequencies across the different schedulers and they were all the same (between 2500-2700 MHz). So I am not sure if this is the reason for the performance gains.

    3. Sounds comprehensive. Don't get me wrong, I believe you, but I didn't do anything special this release. There may have been some bottleneck to MuQSS from mainline code that coincidentally got fixed.

  8. Hello.

    I'm using Archlinux x86_64 and compile my own package with the ck patch (based on
    the PKGBUILD from AUR:linux-ck, https://aur.archlinux.org/packages/linux-ck/). It seems
    that the ck patch doesn't work well with GCC11 and "old" processors.

    On my notebook with an Intel Core i5 2410M processor (sandybridge), kernel 5.12.4
    doesn't boot, while the same kernel version from the distribution package boots
    fine.

    The strange thing is that I also compile kernels with the ck patch (packages) for
    two other processor types (haswell and skylake), and these kernels work fine on
    hosts with such processors.

    The AUR:linux-ck "package" allows compiling "optimized" kernels for a given
    processor type (sandybridge, haswell or skylake), and this is the only difference
    between these "packages".

    The previously used kernel, 5.12.2, was compiled with GCC10, while 5.12.4 is
    compiled with GCC11. But the same compiler is used for the kernel from the distribution
    package. I suspect an issue with the ck patch and the GCC11 compiler because when I
    recompiled the 5.12.2 kernel with GCC11 it didn't boot, even when I built the "generic"
    one (AUR:linux-ck allows not only selecting a specific architecture but also
    choosing the "generic x86_64" one).

    It seems that other users face a similar issue (two users reported a problem on
    the AUR:linux-ck page).

    Replies
    1. Sorry, that was a false alarm! There was a problem with the build script (PKGBUILD), gcc11 and the wrong optimization level. After graysky updated the PKGBUILD, I recompiled the kernel for sandybridge and was able to boot linux-ck-sandybridge without a glitch.

  9. I cannot start docker containers when using the 5.12.5-1-ck-skylake kernel in Archlinux. I get this error:

    $ docker run --rm -it -p 80:80 ckulka/baikal:nginx
    docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: load program: invalid argument: unknown.

    which does not happen if I use the stock Archlinux kernel. Is this normal/expected or is there an issue with the CK patchset and docker? Thanks.

    Replies
    1. Yes, MuQSS provides a basic shim that docker and other applications can use. But if the actual knobs are tuned and stressed to get a specific behavior, the chance of the container failing to launch is rather high.

      However, you're using the ck kernel in the AUR. You should verify that the cgroup settings are actually enabled first in the kernel config. I don't know how good the maintainer is at keeping the ck config in sync with Arch upstream.

    2. Ok, I should have just tried it. Using latest Liquorix (running MuQSS, 5.12.6 base), the docker command you gave works.

      Post a comment on the AUR package and hopefully the maintainer can figure out what they're missing.

      If you want working MuQSS with docker _today_, you can install Liquorix here (binary available in pinned comment): https://aur.archlinux.org/packages/linux-lqx/

    3. Thanks Steven for your checks. Actually, the Archlinux linux-ck maintainer (graysky) usually suggests posting issues here, but I'll try posting to the AUR anyway.

    4. As suggested by the artafinde user in the AUR page, forcing the cgroup v1 behavior by setting the kernel parameter `systemd.unified_cgroup_hierarchy=0` makes docker work with linux-ck. I don't understand the meaning and the implications of this, but I'm reporting this here anyway. Is this something that can/will be fixed in the CK patchset and/or docker?
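
For context, `systemd.unified_cgroup_hierarchy=0` asks systemd to mount the legacy cgroup v1 hierarchy instead of the unified v2 one. A quick way to see which layout a running system uses is to look for the `cgroup.controllers` file, which only exists on a cgroup v2 mount. The sketch below assumes cgroups are mounted at `/sys/fs/cgroup`; `cgroup_mode` is a made-up helper name.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup layout from the files under the cgroup mount point.

    cgroup_mode is a hypothetical helper; the mount point is an assumption.
    """
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "unified (v2)"
    if (root / "unified" / "cgroup.controllers").is_file():
        # v1 controllers with an extra v2 tree mounted under "unified".
        return "hybrid"
    if root.is_dir():
        return "legacy (v1)"
    return "unknown"


if __name__ == "__main__":
    print(f"cgroup layout: {cgroup_mode()}")
```

With the kernel parameter above in effect, this would be expected to report a legacy or hybrid layout rather than unified.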

  10. Hi! I would like to thank you for your work. The computer works marvelously with your kernel. However, I encountered a big issue with my wireless module, a Realtek RTL8821AE. The download speed and especially the upload speed are very low, and I was forced to use an older kernel. My card works very well with kernel 5.4 LTS. Every newer kernel since then has had a very poor implementation of the driver for this specific card. Could you fix that, please, and include the older driver in your latest kernels? I'd appreciate that! Thanks!

    Replies
    1. If it's not MuQSS-related (check whether your card works properly with the default CFS scheduler), then this is the wrong place to post. The author can't fix every possible performance issue in the Linux kernel.
      I suggest you file a bug report with the mainline kernel or your distro maintainers.

  11. Hi Con,

    Something "interesting" happened in one of the mainline versions (I guess) and MuQSS did not work optimally on my machine.
    So I went to refresh my memory of the last patches I made :)

    It seems that I have fixed the stutters, and performance is back with LLC as well as MC on a lightly loaded system, such as when gaming. It looks like it even improved: not by much, but I could measure it, e.g. Unigine Valley shows a 0.97% improvement.
    When all cores are busy, there is virtually no difference from the current version.

    Please take a look at the patches in folder 2021.07 (these include some documentation fixes too):
    https://drive.google.com/drive/folders/1MxUcptaOgPbPgJoUdeq0GkEuoeyaRHdG?usp=sharing

    If they are ok, maybe they are worth including in the next version.

    BR,
    Eduardo

    Replies
    1. Eduardo, can you publish these patches to a git repository with a summary+description of what you're fixing? I would start with forking the 5.12-muqss branch that Con maintains here: https://github.com/ckolivas/linux/tree/5.12-muqss.

      Overall, Google Drive is fine for sharing media and documents with friends and family, but all the alternatives for sharing code and code snippets are much better (github, gitlab, etc).

    2. I make those patches once in 2 years :)
      But I'll check what needs to be done to share them properly.

      BR,
      Eduardo

    3. While I was working on proper patch sharing via github, I had an idea to try one more small change, so the patches will be released eventually, but I'll need some time to test the idea.

      @damentz, just to share my findings: I noticed that your fix for zen uses rr_interval 4 instead of 6, which I was testing too. I found that it was better on a lightly loaded machine, which includes gaming (if there are enough resources, of course). However, when all cores were loaded (compiling) there was a small but measurable performance loss, which is expected.

    4. Submitted a pull request: https://github.com/ckolivas/linux/pull/24

    5. Hey Eduardo, I added your commits to Zen Kernel's 5.12/muqss branch. Haven't tested them yet but I'll get to that this weekend.

      https://github.com/zen-kernel/zen-kernel/commits/5.12/muqss

      And I hope you don't mind, I tweaked the summary of both commits to match our standards in the branch:

      5ea144678c39 muqss: Fix documentation regarding MC LLC runqueue sharing
      08f6a6f8ef28 muqss: Tune CPU selection

    6. Aaaand the patches didn't go so well. As part of https://github.com/damentz/liquorix-package/issues/60, I reverted the CPU tuning commit since it causes an oops on the i7-7700k:

      smp: Bringing up secondary CPUs ...
      x86: Booting SMP configuration:
      .... node #0, CPUs: #1 #2 #3 #4
      MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
      TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
      #5 #6 #7
      smp: Brought up 1 node, 8 CPUs
      smpboot: Max logical packages: 1
      smpboot: Total of 8 processors activated (67200.00 BogoMIPS)
      MuQSS possible/present/online CPUs: 8/8/8
      MuQSS sharing MC runqueue from CPU 0 to CPU 1
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at kernel/sched/MuQSS.c:7608 share_and_free_rq+0xe/0x6b
      Modules linked in:
      CPU: 0 PID: 1 Comm: MuQSS/0 Not tainted 5.12.0-16.1-liquorix-amd64 #1 liquorix 5.12-22ubuntu1~focal
      Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0049.060120200516 06/01/2020
      RIP: 0010:share_and_free_rq+0xe/0x6b
      Code: c7 c7 00 fe 4c 82 e8 92 87 7b fe 48 c7 c7 f5 f1 10 82 e8 84 b6 f7 fe 31 c0 5b 5d c3 55 48 89 fd 53 83 7e 30 00 48 89 f3 74 02 <0f> 0b 48 8b bb f0 00 00 00 e8 95 d6 85 fe 48 8b bb f8 00 00 00 e8
      RSP: 0000:ffffc90000073ea8 EFLAGS: 00010002
      RAX: 000000000000002d RBX: ffff88885166c100 RCX: c0000000ffffefff
      RDX: 0000000000000000 RSI: ffff88885166c100 RDI: ffff88885162c100
      RBP: ffff88885162c100 R08: 0000000000000000 R09: ffffc90000073cf8
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
      R13: ffff88885166c100 R14: ffff88885162c100 R15: 0000000000011470
      FS: 0000000000000000(0000) GS:ffff888851600000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8888717ff000 CR3: 0000000005410001 CR4: 00000000003706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    7. ok, I'll look into this, from the trace it does not look related, but maybe I'm mistaken...

      The real challenge would be to find some intel CPU to check this on :)

      Have you tried other rq sharing methods, or did you use just the default mc?

      BR,
      Eduardo

    8. I kept Liquorix on MC so as not to make too many changes at once. You can verify in the issue and linked commit that I only reverted the CPU tuning commit and nothing else. So something in the patch caused the oops above.

    9. I'm trying to improve the best-CPU selection a little more, but I'll probably push a quick fix for this particular issue first (it is clear what happens, but not clear to me why, or why I cannot reproduce it).
      BR,
      Eduardo

    10. I have pushed a new version of the patch which supposedly fixes the issue of tasks being scheduled onto a CPU before the queues are shared.
      Please try this version and report back whether that works for you.
      BR,
      Eduardo
