Announcing a new -ck release, 5.6-ck2, with the latest version of the Multiple Queue Skiplist Scheduler, version 0.2. These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but configurable for any workload.
This is a maintenance release to address a build failure on -ck1 when built with full dynticks, and to fix some cosmetic CPU load accounting issues. Upgrading is not required unless you are affected by the aforementioned issues or are rebuilding for a new stable release. It's worth pointing out that changing the reported load might have repercussions for how CPU frequency scaling behaves.
linux-5.6-ck2:
-ck2 patch:
Git tree:
MuQSS only:
Download:
Git tree:
Web: http://kernel.kolivas.org
As an aside, it has been brought to my attention that the Mesa code uses SCHED_IDLEPRIO for what it considers low priority threads. In the mainline kernel this only makes them lower priority than regular tasks, but on MuQSS, which has true idle scheduling, it can potentially lead to stalls under load. Once a thread has stalled for an extended time, it may not progress normally, depending on how the code expects to run. This could lead to GUI stalls in applications that use Mesa, of which there are quite a few now, such as Firefox. I've been considering submitting a change to the Mesa code in the hope they approach this differently, but I am a pragmatist and expect the turnaround time and acceptability of the changes may be drawn out and unsatisfactory. So I am alternatively considering softening the idle scheduling and making it configurable, so that it behaves more like mainline's by default and can optionally be set to true idle scheduling. In the meantime, I've prepared some hacked Mesa packages for those on Ubuntu 20.04 variants that disable this behaviour, but this is a kludge only for the time being:
mesa-ubuntu20.04 packages
Here is a much better patch for Mesa that converts threads to nice 19 SCHED_BATCH instead:
0001-Linux-Change-minimum-priority-threads-from-SCHED_IDL.patch
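For readers wondering what a nice 19 SCHED_BATCH thread looks like in practice, here is a minimal C sketch. It is not the code from the patch above: the helper name lower_thread_to_batch is made up for illustration, and it assumes Linux with glibc, where both the scheduling policy and the nice value are per-thread attributes.

```c
#define _GNU_SOURCE             /* exposes SCHED_BATCH and syscall() in glibc */
#include <sched.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not Mesa's code):
 * lower the calling thread to nice 19 SCHED_BATCH.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
static int lower_thread_to_batch(void)
{
    /* Non-realtime policies require a static priority of 0. */
    struct sched_param sp = { .sched_priority = 0 };
    /* Kernel thread ID of the caller; the raw syscall also works on glibc < 2.30. */
    pid_t tid = (pid_t)syscall(SYS_gettid);

    /* On Linux this changes the policy of that one thread only. */
    if (sched_setscheduler(tid, SCHED_BATCH, &sp) != 0)
        return -1;

    /* On Linux, PRIO_PROCESS with a thread ID sets that thread's nice value. */
    if (setpriority(PRIO_PROCESS, (id_t)tid, 19) != 0)
        return -1;

    return 0;
}
```

A worker thread would call lower_thread_to_batch() once at startup. Unlike true idle scheduling, a nice 19 SCHED_BATCH thread still gets a small share of CPU time under load, which is the property the patch relies on.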
Enjoy!
Please enjoy
-ck
Could you link to the mesa email thread or bug you've created, please?
I haven't made a thread or bug report (yet?)
Thank you very much.
Could you clarify what was changed for the custom mesa packages, so that it could possibly be implemented for other distributions as well? I tried digging through the package files and comparing them to the original Ubuntu package files, but couldn't figure this out.
Just this https://aargh.no-ip.org/u_queue-no-sched-idle.patch
Thank you.
From a quick glance at the code, it looks like currently Mesa only uses SCHED_ISO when creating shader caches, so I'd imagine there wouldn't be any real world difference between the SCHED_ISO implementations in -ck and mainline kernels in this case. In theory -ck kernel's SCHED_ISO may even result in better performance, since the cache creation would leave more resources available for more important things. (Note that I'm not any kind of expert in Mesa or GPU stuff in general, so I could be mistaken on this).
In any case, this would mainly have any effect with some games, and only on the first run when the caches are being created (which often happens when loading the game, rather than during the actual gameplay).
Of course, there's a chance that Mesa, or other applications would use SCHED_ISO somewhere where the different implementations actually matter, so it probably wouldn't hurt to have an upstream-compatible idle scheduling solution available.
Nevertheless, I think recompiling Mesa over this is probably not worth the trouble at the moment.
As far as I can see, it uses SCHED_IDLE which is the issue entirely. Not sure what you're looking at. There is no use of SCHED_ISO.
See:
https://cgit.freedesktop.org/mesa/mesa/tree/src/util/u_queue.c#n334
Sorry, I meant SCHED_IDLE but my tired brain messed up.
You can replace SCHED_ISO with SCHED_IDLE in every instance in the above post.
This has been confirmed to create GUI stalls which go away when the scheduling policy is removed, so whilst it sounds good in principle, in reality it is problematic. SCHED_IDLE in mainline is very different to that in MuQSS.
Interesting. Thanks for the clarification.
Is there some specific test case for the GUI stalls? Or other information/discussions about this somewhere?
Here is a more comprehensive solution to the problem. It still makes the most of Linux's ability to set policy and priority per thread, by switching to nice 19 SCHED_BATCH instead:
http://ck.kolivas.org/mesa-ubuntu20.04/0001-Linux-Change-minimum-priority-threads-from-SCHED_IDL.patch
If I can ever figure out the mesa submission process I will try to submit a merge request.
And here's the merge request. Fingers crossed:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4912
Thanks! https://www.phoronix.com/scan.php?page=news_item&px=Mesa-GUI-Stalls-Fixed-Kolivas
may be correct?
--- pid_t tid = gettid();
+++ pid_t tid = syscall(SYS_gettid);
I see your point. gettid() was added to glibc 2.30. I suppose it's possible someone will be building a new mesa on an old distribution which would require the syscall wrapper.
yes, that's right.
Thanks, I've added it to the git merge request.
Thanks.
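To illustrate the glibc point raised in this exchange, here is a small C sketch of a thread ID helper that does not depend on the gettid() wrapper added in glibc 2.30. The name portable_gettid is invented for this example and is not taken from the merge request; it assumes Linux, where SYS_gettid is defined in sys/syscall.h.

```c
#define _GNU_SOURCE             /* makes syscall() visible in unistd.h */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption): return the kernel thread ID
 * of the caller by invoking the system call directly, so the code also
 * builds against glibc versions older than 2.30 that lack gettid().
 */
static pid_t portable_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread of a process the returned value equals getpid(); in any other thread it is a distinct ID that can be passed to calls such as sched_setscheduler().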
The mesa devs were very accommodating and helped me negotiate their convoluted merge process, and the code is now in the master mesa git, which is great news.
cool.
When can we expect a release with these changes?
Will these changes be backported to older versions of Mesa?
No idea of their process of merging to stable releases. I seriously doubt it will be backported unless there's a strong push by someone saying it's a bugfix. I suspect we won't be seeing this change in stable packages in distros till their next release cycle unless you're on a rolling distro.
So, does this mean it's "ok" to use full dynticks config? (tickless), or is it just to fix compile problems for those that insist on using tickless? :)
I seem to remember some posts a while back that kinda concluded that it was not really recommended with ck/muqss?
Yes it's fine now with -ck2, it was always a problem with forced context tracking, but is still unlikely to be advantageous.
Somehow, I'm never able to boot either the ck1 or the ck2 kernel on my new ThinkPad X390 (i7 Gen10). I tried with/without irqthread and both idle tickless and full dynticks, all combinations with no success.
Worst case scenario, try disabling all runqueue sharing, which is the riskiest part of the code. Add the following to the kernel command line to test it without needing to rebuild:
rqshare=none
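When testing boot parameters like this, it can help to confirm that the running kernel was actually booted with them. Below is a rough C sketch that looks for an exact token in /proc/cmdline. The function names are made up for illustration, and it assumes a Linux system with /proc mounted and a command line shorter than 4 KiB.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True for the whitespace characters that separate command-line tokens. */
static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: return true if the kernel command line string
 * `cmdline` contains the exact token `param`, e.g. "rqshare=none".
 */
static bool cmdline_has_token(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    while (*p) {
        while (is_sep(*p))          /* skip separators before a token */
            p++;
        const char *start = p;
        while (*p && !is_sep(*p))   /* advance to the end of the token */
            p++;
        if (plen > 0 && (size_t)(p - start) == plen &&
            strncmp(start, param, plen) == 0)
            return true;
    }
    return false;
}

/* Read /proc/cmdline and check it for `param`; false if it cannot be read. */
static bool running_with_token(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return false;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cmdline_has_token(buf, param);
}
```

Calling running_with_token("rqshare=none") after a reboot tells you whether the bootloader passed the option through. It only checks the command line and says nothing about whether the scheduler honoured the setting.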
I press the power button a few times for a short while when booting stalls, it will boot up on my machine eventually.
I tried rqshare=none and also the other options (e.g. smt, all); nothing works. I also tried them with nothreadirqs and am still unable to boot.
I remember having a similar issue with some combination of CONFIG_HZ, threadirqs, rqshare, rr_interval & iso_cpu settings.
You can also try a different HZ (100 or 1000), or periodic ticks (nohz=off command line parameter). Or maybe some combination of nohz, 1000HZ and muqss-only (without other patches).
There also might be some distro or config-related problem.
Sounds strange cwt.
Are you sure you use the same kernel (source/patches) as the one that you normally use?
Just asking cos if you are using a distro kernel (eg. Ubuntu kernel), and do not use the same source you could run into some patched kernel driver or whatnot that is not added to your "custom" kernel with MuQSS.
I use vanilla kernel with minimal patches, one from https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v9.1%2B_kernel_v5.5%2B.patch to enable native optimization, another one is https://github.com/dolohow/uksm/blob/master/v5.x/uksm-5.6.patch.
With these two patches and the config from the distro (Fedora 32), I just made some changes like "Preemptible Kernel" and removed some drivers for hardware that doesn't exist on my laptop, and the kernel works perfectly. However, after adding the ck patch, or even just the MuQSS patch, the kernel won't boot.
Can You remove "quiet" from the kernel commandline, try to boot and share what's on the screen when it crashes?
Yeah, "it won't boot" means it won't boot at all, not boot for a while and then crash. However, here is the screen, just a cursor in the top left: http://imgur.com/gallery/kn7HlhA
The only time recently I had that happen was when I tried to compile the kernel with gcc-10. Just a black screen with the "cursor" in the left corner.
I did not bother to debug it more, because the gcc-10 I use is not really "production" ready I guess (Ubuntu testing PPA). I am only mentioning this since I think Fedora 32 has gcc-10 now?
Something about CONFIG_STACKPROTECTOR_STRONG or some shit, but it should then happen anyways, and not only when using MuQSS i guess?
Yes, Fedora 32 came with gcc10, but with the same compiler I can build and run vanilla kernel. I will try again without native optimization or with gcc9.
Some information about gcc-10 vs. kernel
https://github.com/Frogging-Family/linux-tkg/issues/7
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit?id=f670269a42bfdd2c83a1118cc3d1b475547eac22
Mentioning the first one, cos this is a custom kernel with different schedulers - like muqss/bmq/pds and the last one actually mentioning that 5.6 kernel is not 100% ready for gcc-10.
https://lore.kernel.org/lkml/20200423161126.GD26021@zn.tnic/
I have not tried it, but if you find the kernel boots fine when compiled with gcc-9 (like it does for me), you could try skim through this and test the patch with gcc-10?
I also cannot boot the CK kernel 5.6 on Fedora 32. I always build a custom kernel, and vanilla (5.6/5.7) and BMQ (5.6) kernels both work fine. With the CK kernel built using GCC or LLVM I cannot boot at all; I just get a black screen, and removing the quiet option does nothing because the kernel never outputs anything.
I also tried the GCC patch and it does not help, but given it didn't work with LLVM I guess that makes sense.
You could try the 5.6-muqss tree. There are some fixes in there.
gcc9 cannot easily be installed and run on Fedora 32, so I tried building the kernel with clang10 instead, and the result was still the same: the ck kernel cannot boot. However, a vanilla kernel built with clang10 boots and runs perfectly. I also tried the BMQ patch with gcc10 and that kernel works too. Is it possible that there is something in the CK or MuQSS patch that causes (or triggers) a problem with gcc10?
Using the 5.6-muqss tree makes no difference, still unable to boot.
Arch Linux just updated to gcc-10.1, and I was able to successfully build and boot the 5.6.13-ck1 kernel (built with -O3 -march=native, and no other changes) with it.
Hi.
I get a panic when the kernel is built with the option CONFIG_PSI=y
https://pastebin.com/vnV1GkdD
Did you try to boot with psi=0? dmesg doesn't report anything about psi with this option
In config
CONFIG_PSI_DEFAULT_DISABLED=y
---
Currently schedutil also doesn't work correctly with alternative schedulers (muqss, pds, bmq)
ok. thanks.
Hi,
I get an error with the eth0 interface in my USB-C adapter when using the linux-ck kernel in Archlinux. The problem is not present with stock kernels, so I guess it may be due to the ck patchset. The issue is described here:
https://bbs.archlinux.org/viewtopic.php?pid=1907858#p1907858
I can provide additional info if needed. Thanks.
If you've built with irqthreads force enabled, try disabling them. Add the nothreadirqs parameter to your kernel command line. Otherwise, try the extra bugfixes committed to the muqss git tree.
Thanks Con, I tried with the nothreadirqs flag to no avail. I never built the patched kernel so I don't know how to try the muqss git. If there's nothing else that I can try then I'll stick with the stock kernel and see if the new patchset for Linux 5.7 will fix the problem.
It turned out that the problem was not related to the CK patchset, and was also present in the vanilla kernel. The culprit was the powersave setting for the ethernet adapter of the USB-C dock; disabling it by means of a udev rule fixed the problem. I'm sure I had no problem at the end of February, but even reverting to the kernel/firmware versions from that date didn't fix the issue (it just gave different, milder symptoms). Nevertheless, the patchset was ok, so sorry for the noise and thanks for the support.
tanx