Monday 31 December 2018

linux-4.20-ck1, MuQSS version 0.185 for linux-4.20

Announcing a new -ck release, 4.20-ck1, with the latest version of the Multiple Queue Skiplist Scheduler, version 0.185. These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but configurable for any workload.

linux-4.20-ck1:
-ck1 patches:
Git tree:
MuQSS only:
Download:
Git tree:


Web: http://kernel.kolivas.org


In addition to a resync from 4.19-ck1 I've extended the runqueue sharing options to all CPUs as well, meaning it can be used in NUMA hardware as a single runqueue if desired.

Merry Christmas, and have a happy new year everyone. May your new year be filled with good health, stable kernels, and more bitcoin adoption and value.

Enjoy!
Please enjoy
-ck

70 comments:

  1. Thanks so much.
    Perfect way to start into the new year.
    Happy New Year!

  2. Damnit, I love you Con. Happy new year

  3. ooo, good, thx for new ver Con<3

  4. Happy 2019, and thank you for your perseverance :-)

    What do you think about the GNU Hurd project?

  5. Runs great <3.

  6. Hi Con, I'm having a compilation issue. x64 builds fine. But when I try to build for x86, I'm getting the following compile error:
    https://del.dog/mojukeyoqa

    (the 4.19 patch worked fine for x86)

    Thanks for all your work!

    Replies
    1. There are some fixes in git on both the muqss and -ck branch, courtesy of SSB. Try those :)

    2. Worked like a charm, thanks again!

  7. Hi,

    I'm reporting this here as it is the current blog item, but it affects all recent MuQSS and kernel releases. I've observed it since kernel version 4.17, when I introduced encryption for my main workstation. I'm using LUKS (Linux Unified Key Setup) and the system is configured to read the pass codes from the console relatively early during kernel boot.

    With the vanilla kernel it works as designed. However, as soon as I compile in MuQSS, all key presses sort of "bounce". In other words each key press results in any number of characters being generated, making the blind input of key phrases impossible. I have not been able to use MuQSS since then.

    Any ideas how I could fix it?

    Thanks

    Replies
    1. I used to have this issue too; it was probably solved by enabling periodic timer ticks (CONFIG_HZ_PERIODIC=y). A workaround was to use a USB keyboard. But it was really long ago and it's hard to recall the details; my current MuQSS config just does not exhibit this issue.

    2. Config for 4.14 which does not exhibit the issue:

      https://drive.google.com/file/d/1k4hHglg-gdPOJMGK-s3gfGwCfwmnld1K/view?usp=sharing

      Hope that helps.

    3. Thank you a lot. The timer tick option did the trick. I'm back on MuQSS. ^_^
      It still means that there is some underlying issue preventing me from using MuQSS should the kernel ever go tickless.

    4. Were you using MuQSS alone or with the rest of the -ck patches?

    5. The issue being solved with periodic ticks does not surprise me actually. Ran into a different issue with MuQSS and the (default) idle dynticks.

      Some audio distortion in WINE that I eventually solved by recompiling the kernel with full periodic ticks.

      This was with just MuQSS, sans -ck.

      The kernel will probably eventually go full tickless by default, so whatever the underlying problem may be, it does need to be addressed.

    6. As people (or is it person?) keep repeating ad nauseam. Perhaps the ck patches should become part of muqss since muqss is intrinsically a tickless scheduler and relies on highres timeouts to work properly, but unfortunately the highres timers in mainline are stupidly tick resolution limited...

    7. No offense was implied, no need to infer any either.

    8. To answer the earlier question, I have been applying the pure MuQSS patch, no other kernel patches. Also, I'd like to clarify that I have no deep insight into kernel development. I just picked the rumour up here...

  8. Smooth as silk. Partially because of the low level VLA work on 4.20 itself and the rest is MuQSS' doing.

    Great work as always. Amazing job on including NUMA nodes in the single runqueue option.

    Replies
    1. I agree.
      Great job and smooth AF 4.20.1.
      Thanks.

  9. Hi all, I was wondering if other people are seeing a 'psi: task underflow!' message when booting 4.20-1 linux-ck kernel?
    More info on psi (pressure stall information for CPU, memory, and IO): https://lwn.net/Articles/763629/

    When adding 'psi=0' kernel parameter to effectively disable psi, this message goes away. Alas my experience with/knowledge of psi is lacking, so I cannot judge if this is a wise thing to do, or at all related to linux-ck or MuQSS...

    $ pacman -Q linux-ck-core2
    linux-ck-core2 4.20-1

    dmesg snippet:
    ...
    [ 0.509321] MuQSS locality CPU 0 to 1: 2
    [ 0.509323] Sharing MC runqueue from CPU 1 to CPU 0
    [ 0.509327] CPU 0 RQ order 0 RQ 1
    [ 0.509328] CPU 1 RQ order 0 RQ 1
    [ 0.509329] CPU 0 CPU order 0 RQ 0
    [ 0.509331] CPU 0 CPU order 1 RQ 1
    [ 0.509332] CPU 1 CPU order 0 RQ 1
    [ 0.509333] CPU 1 CPU order 1 RQ 0
    [ 0.509334] MuQSS runqueue share type MC total runqueues: 1
    [ 0.509542] psi: task underflow! cpu=0 t=2 tasks=[0 0 0] clear=4 set=0
    ...

    full dmesg: https://ptpb.pw/xwAE.log

    Replies
    1. PSI support is new on MuQSS and completely untested at this stage and probably broken. That said, it's a debugging feature that you won't be using so there's not much point enabling it.

    2. Thanks for clearing that up so quickly!

    3. > it's a debugging feature

      It isn't. Or, it is, but to the same extent as loadavg.

    4. Whereas it may not be a literal debugging feature in the strictest sense of the word, it is a feature that is most commonly used by and most commonly useful for developers.

      That does make its use case mostly of a debugging nature.

    5. psi: task underflow! cpu=0 t=2 tasks=[0 0 0 1] clear=c set=0

      on 5.7.4-ck1 on a ryzen 1600
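
For anyone collecting these reports, the two 'psi: task underflow!' lines quoted in this thread can be pulled apart with a small parser. This is only a sketch in Python: the regular expression is modelled on those two lines rather than on the kernel source, and treating clear= and set= as hexadecimal is an assumption based on the "clear=c" value in the second report. The name parse_psi_underflow is made up for illustration.

```python
import re

# Pattern modelled on the two log lines quoted in this thread; the kernel's
# actual format string may differ between versions.
PSI_UNDERFLOW = re.compile(
    r"psi: task underflow! cpu=(?P<cpu>\d+) t=(?P<t>\d+) "
    r"tasks=\[(?P<tasks>[\d ]+)\] "
    r"clear=(?P<clear>[0-9a-f]+) set=(?P<set>[0-9a-f]+)"
)


def parse_psi_underflow(line):
    """Return the fields of a 'psi: task underflow!' line, or None if absent."""
    match = PSI_UNDERFLOW.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "t": int(match["t"]),
        "tasks": [int(n) for n in match["tasks"].split()],
        # Assumption: clear/set are hexadecimal bit masks (see "clear=c").
        "clear": int(match["clear"], 16),
        "set": int(match["set"], 16),
    }
```

Feeding it each line of a saved dmesg and keeping the non-None results gives a quick list of affected CPUs and task counters to attach to a bug report.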

  10. Hi Con,

    First of all, thank you for your continuous work with the patchset.

    I have a question about using the 'workqueue.power_efficient' kernel boot parameter, which can be used to disable per-cpu workqueues in order to improve power efficiency, and how it relates to the runqueue sharing in MuQSS.

    I understand these are two different things, but I'm curious whether the per-cpu workqueues should work in any way differently with MuQSS compared to vanilla kernel, that should be taken into consideration with the kernel configuration.

    Do you have any thoughts or recommendations about using the workqueue.power_efficient option with MuQSS enabled kernel?

    Thank you again, and I hope you'll have a great year.

    Replies
    1. It should just work the same as in vanilla, though I have no informed opinion on its usage as such.

    2. Thanks for the clarification.
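
To check whether a boot parameter such as workqueue.power_efficient (or psi=0 from the earlier comment) was actually passed to the running kernel, the command line in /proc/cmdline can be inspected. A minimal sketch, assuming a simple space-separated command line with no quoted values; parse_cmdline and read_cmdline are hypothetical helper names.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Parameters given without '=' map to None. Quoted values containing
    spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

For example, parse_cmdline(read_cmdline()).get("workqueue.power_efficient") returns the value given at boot, or None if the parameter was not passed or was passed without a value.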

  11. Replies
    1. No, docker and containers that use CPU scheduler cgroups in general do not work at all with MuQSS. There is no 'containment' as such, and the cgroups are only there to allow systems to run that mandate their existence.

    2. Which is a good thing imho.

    3. I actually suspect systemd was doing something right before v240, and that the change in v240 is what broke support in docker. One can switch back to CFS to use docker with modern systemd, but why does modern systemd suddenly make MuQSS incompatible?

      I'd say it's probably worth investigating restoring the behavior that let MuQSS work without the cpuacct cgroup controller. Docker doesn't _need_ it to work properly, and especially for my use case, I just use docker to build kernels, so I don't really care how docker wants to use cgroups to manage CPU usage.

      And really, that leaves the question: how hard would it be to add the simplest possible shim to MuQSS, even if the cgroup itself is functionally useless? Obviously we were fine before without it, but now suddenly docker needs it since systemd updated. Very bizarre.

    4. Also, I forgot to link to the docker issue for anyone that's unaware. There's already a confirmation that downgrading systemd lets you run docker with MuQSS.

      https://github.com/docker/for-linux/issues/552

  12. I regret to inform you that after a long use of the MuQSS, I decided to try the CFS + cgroup + ulatencyd combination and this combination turned out to be more beneficial for use on the desktop.

    Although the system began to use more RAM, with a large load it behaves more smoothly and more responsively. There is also no interruption of sound reproduction. In normal operation, the consumption of electricity has decreased.

    Replies
    1. Switched back to CFS as well here; although without ulatencyd (as it has been abandoned).

      CFS, tickless, 100 Hz, BFQ ( and scsi_mod.use_blk_mq=1 ). Since there is even talk of outright dumping all legacy IO schedulers and there seems to be some sort of interaction between at least BFQ_MQ and cgroups; the latter of which MuQSS does not support well.

      Not overly happy with recent developments in the kernel but, well... there are politics in play as well and the direction is set. So... what can we do but adapt?

  13. I made an effort to create a "tickless" system with the complete -ck patchset, but I think I have something wrong in my .config for that.

    Running "make defconfig" does not seem to set CONFIG_NO_HZ_FULL, so I am not sure what the correct way to implement this is, tbh.

    What I did experience was that when compiling with -j12 (i7 8700K), the desktop was more or less useless, and I even had a gcc error spewing out something about "resource temporarily unavailable". So obviously I have done something wrong when setting the build parameters.

    Could you point me to something that MUST be set for a full tickless and amazingly performing system? :) E.g. CONFIG_NO_HZ_FULL and stuff like that.
    Using only MuQSS and CONFIG_NO_HZ_IDLE++ seems to be OK, but I wanted to try the "full tickless" type of system.

    Replies
    1. I did it once, full tickless with MuQSS. I took the base Ubuntu generic kernel, recompiled it as full tickless, and used the resulting config as the basis for a tickless MuQSS kernel.

      It did work.

      But, accounting is off with a tickless MuQSS, the consequences of which might be harmless but personally, I am not sure we can rule out problems with CPU states as well as governors functioning normally when accounting is off as is the case with tickless MuQSS.

    2. I keep trying to tell people to not make completely tickless kernels. No idle ticks is ideal for MuQSS. There is no advantage to a completely tickless kernel even for mainline for a normal desktop or mobile device.

    3. And yet, kernels across the board seem to be adopting it:

      - Mainline Linux (for years now)
      - FreeBSD (from 9 on)
      - The Solaris kernel (from Solaris 8 on)
      - The NT kernel (from Win 8 on)
      - The Zircon kernel (Google's new microkernel)

      So, here we have kernels aimed at desktops, servers as well as mobile and even embedded devices all adopting or at the very least allowing a tickless mode. Whether or not there is an actual advantage to it is becoming moot. It is simply becoming the industry standard.

    4. No, it's not that there's no advantage. It's DISadvantageous, but that's fine, you can keep posting here to taunt me on this issue.

    5. Oh, then I misunderstood, tbh. You wrote:
      ---
      As people (or is it person?) keep repeating ad nauseam. Perhaps the ck patches should become part of muqss since muqss is intrinsically a tickless scheduler and relies on highres timeouts to work properly, but unfortunately the highres timers in mainline are stupidly tick resolution limited...
      ---

      And I thought you actually meant that the -ck patchset was MEANT TO be configured as "tickless" (CONFIG_NO_HZ_FULL). Guess I did not really grasp the meaning of that :)

      It works fine with CONFIG_NO_HZ_IDLE=y though.

    6. My bad then. I mean intrinsically it doesn't depend on ticks as such, but the configuration of ticks should be nohz idle as you correctly figured out.
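
The conclusion of this thread (no idle ticks, i.e. CONFIG_NO_HZ_IDLE=y, rather than CONFIG_NO_HZ_FULL) can be checked against a kernel .config with a few lines of Python. A minimal sketch: it assumes the three tick options mentioned in the comments are mutually exclusive choices, and tick_mode is a hypothetical helper name.

```python
# The three tick-handling options mentioned in the comments.
TICK_OPTIONS = ("CONFIG_HZ_PERIODIC", "CONFIG_NO_HZ_IDLE", "CONFIG_NO_HZ_FULL")


def tick_mode(config_text):
    """Return the tick option set to 'y' in a kernel .config, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips blanks and "# CONFIG_X is not set" lines
        name, value = line.split("=", 1)
        if name in TICK_OPTIONS and value == "y":
            return name
    return None
```

Calling tick_mode(open(".config").read()) in the kernel source directory should report CONFIG_NO_HZ_IDLE for the setup recommended in this thread.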

  14. Apparently the 4.20.8 patch breaks the build for 4.20-ck1 for at least kvm-intel with the following error:

    ERROR: "sched_smt_present" [arch/x86/kvm/kvm-intel.ko] undefined!

    Reverting this commit allows the build to finish: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v4.20.8&id=f29a8be0e5d28f89c835cbae700e67a383280916

    I'm assuming the proper fix would be adding "EXPORT_SYMBOL_GPL(sched_smt_present);" somewhere in MuQSS sources..

    Replies
    1. Thank you very much, sir.

    2. It seems that is the same change as for the PDS scheduler too. I will see if I can do a test compile when I get home.
      The change is in kernel/sched/MuQSS.c
      Adding the line like this, I guess:
      +DEFINE_STATIC_KEY_FALSE(sched_smt_present);
      +EXPORT_SYMBOL_GPL(sched_smt_present);
      +#endif

    3. Seemed to work for me:
      ---
      diff --git a/0001-MultiQueue-Skiplist-Scheduler-version-v0.185.patch b/0001-MultiQueue-Skiplist-Scheduler-version-v0.185.patch
      index 1b235e8..bf61ce0 100644
      --- a/0001-MultiQueue-Skiplist-Scheduler-version-v0.185.patch
      +++ b/0001-MultiQueue-Skiplist-Scheduler-version-v0.185.patch
      @@ -1598,7 +1598,7 @@ new file mode 100644
      index 000000000000..e8610b659791
      --- /dev/null
      +++ b/kernel/sched/MuQSS.c
      -@@ -0,0 +1,7437 @@
      +@@ -0,0 +1,7438 @@
      +// SPDX-License-Identifier: GPL-2.0
      +/*
      + * kernel/sched/MuQSS.c, was kernel/sched.c
      @@ -1828,6 +1828,7 @@ index 000000000000..e8610b659791
      +
      +#ifdef CONFIG_SCHED_SMT
      +DEFINE_STATIC_KEY_FALSE(sched_smt_present);
      ++EXPORT_SYMBOL_GPL(sched_smt_present);
      +#endif
      +
      +#else
      ---

    4. Indeed, the above change looks correct and seems to work.

      Here's a patch against -ck patched kernel sources: https://pastebin.com/EPMEir9b

    5. I get this with just the MUQSS patch:

      (Stripping trailing CRs from patch; use --binary to disable.)
      patching file kernel/sched/MuQSS.c
      patch unexpectedly ends in middle of line
      Hunk #1 succeeded at 227 with fuzz 1.

    6. Is this a problem?

    7. Updated patch: https://github.com/SveSop/kernel_cybmod/blob/MuQSS/0001-MultiQueue-Skiplist-Scheduler-version-v0.185.patch

      I should probably have named it _v2.patch or something, but well.. Soon 5.0 kernel, and it will be a new version anyway :)

    8. All good.
      Thanks.

    9. So I use just that one^ for MUQSS only (no ck)?

    10. It is the MuQSS patch, and not the whole -ck patch set.

      You can either replace this "fixed" patch in your -ck patchset for a full -ck kernel, or use it as a single patch if you only want MuQSS. (My github has the full -ck patchset if interested in that).

    11. Kind of late to the party, but you can refer to Zen Kernel's MuQSS branch if you're having trouble building - we probably already ran into and fixed many build problems due to stable patches.

      https://github.com/zen-kernel/zen-kernel/commit/7c660f6524371c3f9d693deb9595ff6c0725942c
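
The fix discussed in this thread amounts to one extra line after the existing definition in kernel/sched/MuQSS.c. For those who prefer to script it rather than apply a patch, here is a rough sketch in Python. It assumes the definition appears on a line of its own exactly as quoted in the comments, and add_smt_export is a made-up name, not part of any patchset.

```python
DEFINE_LINE = "DEFINE_STATIC_KEY_FALSE(sched_smt_present);"
EXPORT_LINE = "EXPORT_SYMBOL_GPL(sched_smt_present);"


def add_smt_export(source):
    """Insert the export right after the definition unless already present."""
    if EXPORT_LINE in source:
        return source  # already fixed, leave the text untouched
    lines = source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.strip() == DEFINE_LINE:
            if not line.endswith("\n"):
                lines[index] = line + "\n"
            lines.insert(index + 1, EXPORT_LINE + "\n")
            return "".join(lines)
    raise ValueError("definition of sched_smt_present not found")
```

Running it twice on the same text is harmless, since the second call finds the export already in place and returns the input unchanged.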

  15. The pastebin is missing a newline at the end, which results in the error.
    It still applies fine regardless, as stated in the last line of the patch output (or you can add the newline yourself if the error bothers you).

    Replies
    1. And this was obviously meant to be a reply for the >=4.20.8 patch comments... :|
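
Adding the missing newline by hand is trivial, but it can also be scripted. A small sketch in Python, with ensure_trailing_newline and fix_file as hypothetical names; it does not touch the trailing CRs that patch reported stripping.

```python
def ensure_trailing_newline(data):
    """Append a newline to non-empty bytes that do not already end with one."""
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


def fix_file(path):
    """Rewrite the file at path only if its final newline is missing."""
    with open(path, "rb") as handle:
        data = handle.read()
    fixed = ensure_trailing_newline(data)
    if fixed != data:
        with open(path, "wb") as handle:
            handle.write(fixed)
```

Working on bytes rather than text keeps the rest of the patch file exactly as downloaded.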

  16. Any ETA for 5.0?

    Replies
    1. Might be a while, I'm guessing. With the complete removal of legacy IO schedulers (yes, it happened) and BFQ's interaction with cgroups, MuQSS might need some work to be fully compatible with 5.0.

    2. Yes.
      Thanks.
      MUQSS is still the best.
      Well worth the wait.

    3. How would any of the IO scheduler changes require changes to MuQSS? Nothing was changed within the kernel's default CPU scheduler either for that reason.

      Also, there were barely any changes to BFQ, definitely nothing related to cgroups. Additionally, MuQSS has never supported cgroups, so even if there were any such changes, I don't think they would require huge amounts of work.

      Energy Aware Scheduler is probably the biggest scheduler related change, but I'm not sure whether that requires big changes for MuQSS.

    4. MuQSS not supporting cgroups is exactly the problem. Some time ago I predicted, in response to some proposed patches, the removal of the legacy IO schedulers.

      As of right now, I am also predicting that the hierarchical support (cgroup support; see KConfig for reference) of BFQ will become non-optional, it will become mandatory. An integral part of BFQ. At that point we can expect MuQSS to no longer fully support BFQ. And since BFQ is the only remaining viable option for an IO scheduler on non-SSD devices (MQ-DEADLINE is just a joke for heavy IO), well... I think the pattern should become obvious.

      Why do you think the legacy IO schedulers were removed? The argument was to simplify the code, maintainability. Obviously they are going to play that card for blk_mq and the mq schedulers as well. At which point the aforementioned hierarchical support of BFQ will become non-optional.

      Additionally, BFQ is being pushed HARD as the de-facto standard IO scheduler. And cgroups are likewise being pushed hard as the de-facto standard for priority handling.

      Regarding CFS not being changed -- CFS has fully supported cgroups from the day those were implemented, since it predates cgroups (although not by much).

      Continuing not to support cgroups will probably be fine for 5.0, probably even for the remainder of 2019. But at some point, they will simply become unavoidable. Probably mid-2020, I'm guessing.

    5. So much misinformation... MuQSS has nothing to do with BFQ, nor anything to do with any I/O schedulers. -ck also has nothing to do with BFQ nor any I/O schedulers.

    6. It's actually already up on git. Lacking only separate patches and an announce.

    7. Looking forward to the availability of the patches. Somehow I am too stupid to create them from git on my own ...

    8. How to git2patch(es)?

    9. The patches are uploaded. Too busy to announce right now.
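
On the question of turning a git branch into a patch: one generic approach is to diff the branch against the upstream release it is based on. This is a sketch of that general technique in Python, not necessarily how the -ck patches themselves are produced. The ref names in the usage note are placeholders, and diff_command and write_patch are hypothetical helpers.

```python
import subprocess


def diff_command(base, head):
    """Build the git invocation that prints the changes from base to head."""
    return ["git", "diff", f"{base}..{head}"]


def write_patch(repo_dir, base, head, out_path):
    """Run the diff inside repo_dir and save it as a single patch file."""
    result = subprocess.run(
        diff_command(base, head),
        cwd=repo_dir,
        check=True,           # raise if git reports an error
        capture_output=True,  # collect the diff instead of printing it
    )
    with open(out_path, "wb") as handle:
        handle.write(result.stdout)
```

Something like write_patch("linux", "v5.0", "some-ck-branch", "ck.patch") would then produce a single combined patch, assuming the repository has a v5.0 tag and a branch by that name.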

  17. [ 1.535455] ------------[ cut here ]------------
    [ 1.535460] Current state: 1
    [ 1.535464] WARNING: CPU: 1 PID: 0 at 0xffffffff8108c865
    [ 1.535466] Modules linked in:
    [ 1.535469] CPU: 1 PID: 0 Comm: MuQSS/1 Not tainted 5.0.7-ck1 #3
    [ 1.535471] Hardware name: System manufacturer System Product Name/Crosshair IV Formula, BIOS 3029 10/09/2012
    [ 1.535474] RIP: 0010:0xffffffff8108c865
    [ 1.535476] Code: 04 77 29 89 f1 ff 24 cd 38 76 a0 81 80 3d 53 1b bd 00 00 75 17 89 c6 48 c7 c7 90 c6 ad 81 c6 05 41 1b bd 00 01 e8 7b ae fa ff <0f> 0b 48 83 c4 08 5b c3 48 8b 47 60 48 85 c0 75 64 83 fe 03 89 73
    [ 1.535480] RSP: 0018:ffff888437c43f50 EFLAGS: 00010082
    [ 1.535482] RAX: 0000000000000010 RBX: ffff888437c504c0 RCX: ffffffff81c1fdb8
    [ 1.535483] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff81f8fcac
    [ 1.535485] RBP: 7fffffffffffffff R08: 00000000000001f0 R09: 0000000000000000
    [ 1.535487] R10: 0720072007200720 R11: 0720072007200720 R12: 7fffffffffffffff
    [ 1.535489] R13: ffff888437c56900 R14: ffff888437c569f8 R15: ffff888437c56a38
    [ 1.535491] FS: 0000000000000000(0000) GS:ffff888437c40000(0000) knlGS:0000000000000000
    [ 1.535493] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1.535494] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006e0
    [ 1.535496] Call Trace:
    [ 1.535498]
    [ 1.535500] 0xffffffff8108e7cb
    [ 1.535502] 0xffffffff810817cb
    [ 1.535503] 0xffffffff81601568
    [ 1.535505] 0xffffffff8160117f
    [ 1.535506]
    [ 1.535507] RIP: 0010:0xffffffff8100f592
    [ 1.535509] Code: 0f ba e0 24 72 11 65 8b 05 bb eb ff 7e fb f4 65 8b 05 b2 eb ff 7e c3 bf 01 00 00 00 e8 17 e0 07 00 65 8b 05 a0 eb ff 7e fb f4 <65> 8b 05 97 eb ff 7e fa 31 ff e8 ff df 07 00 fb c3 66 66 2e 0f 1f
    [ 1.535512] RSP: 0018:ffffc9000007bf00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    [ 1.535515] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
    [ 1.535516] RDX: 000000005b7f1466 RSI: 0000000000000001 RDI: 0000000000000380
    [ 1.535518] RBP: ffffffff81c601a8 R08: 0000000000000000 R09: 0000000000019840
    [ 1.535520] R10: 0000001e3c819be7 R11: 000000007260bc7a R12: 0000000000000000
    [ 1.535522] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    [ 1.535524] 0xffffffff8105bd2f
    [ 1.535526] 0xffffffff8105bf5b
    [ 1.535527] 0xffffffff810000d4
    [ 1.535529] ---[ end trace 71fe021b29fa5d1f ]---

    I am having this problem on all my Phenom II systems. It looks like some kind of interrupt problem. I tried enabling the nothreadedirqs option and also the fix for broken boot IRQs option, but neither had any effect. I enabled stack traces, but for some reason they are not shown :x

    Does anyone know what this might be? I searched around and found this, which may be helpful: https://pastebin.com/y0aXvBNP
    (this is not mine but it looks very much like mine)

    Thanks :)
