Comments on -ck hacking: MuQSS - The Multiple Queue Skiplist Scheduler v0.112
Blog author: ck. Feed last updated 2024-03-28 15:50 +11:00; 22 comments, newest first.

Anonymous (2016-10-28 04:32 +11:00):
Edit: It might also be worth noting that this was using the schedutil CPUFreq governor.

Anonymous (2016-10-28 04:31 +11:00):
I'm running MuQSS via the Liquorix kernel (see also https://liquorix.net/atom).

Just now, suspending PulseAudio with pasuspender (to run an application that uses ALSA directly) and then alt-tabbing to Google Chrome to double-check something caused an unrecoverable stall. A clean shutdown was no longer viable; everything was so slow that it would probably have taken hours.

Anonymous (2016-10-21 04:15 +11:00):
@ck:
With kernel 4.7.9 and your 4.8 add-ons up to 7e3bed6f from GitHub, plus my full combination of patches, everything is behaving well too. Nice :-)

BR, Manuel Krause

jwh7 (2016-10-20 17:01 +11:00):
OK... I built and am running x86 SMP and i686 noSMP kernels, both with commits up to 27fe1ef (fix for UP), plus the just-released BFQ v8r4, the 4.8.3-rc test release and a couple of upstream patches. All good... :)

ck (2016-10-20 07:30 +11:00):
Yes, that's correct; thanks for testing it. Now that I've fixed other bugs in the code, I may be able to go back to the old way of yielding (like BFS), but your testing needs to confirm that's where the problem lies.
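The __GL_YIELD variable discussed in this thread is read by NVIDIA's proprietary OpenGL driver. According to NVIDIA's driver README, NOTHING means never yield, USLEEP means call usleep(0), and any other value (or leaving it unset) means call sched_yield(). This is why it can work around a scheduler whose sched_yield() behaviour is aggressive. The sketch below is a hypothetical busy-wait that applies the same three-way policy. parse_yield_mode, relax and wait_until are illustrative names, not driver code.

```c
#define _DEFAULT_SOURCE /* declares usleep() even under strict C modes */
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* How a polling thread gives up the CPU between checks. */
enum yield_mode {
    YIELD_SCHED,   /* call sched_yield(): the default when unset */
    YIELD_USLEEP,  /* call usleep(0) instead */
    YIELD_NOTHING, /* never yield; keep spinning */
};

/* Map a __GL_YIELD-style string to a mode; unknown values fall back
 * to sched_yield(), mirroring the documented default. */
static enum yield_mode parse_yield_mode(const char *value)
{
    if (value == NULL)
        return YIELD_SCHED;
    if (strcmp(value, "NOTHING") == 0)
        return YIELD_NOTHING;
    if (strcmp(value, "USLEEP") == 0)
        return YIELD_USLEEP;
    return YIELD_SCHED;
}

/* One back-off step between polls. */
static void relax(enum yield_mode mode)
{
    switch (mode) {
    case YIELD_SCHED:
        sched_yield();
        break;
    case YIELD_USLEEP:
        usleep(0);
        break;
    case YIELD_NOTHING:
        break; /* pure spin: the CPU stays busy */
    }
}

/* Hypothetical busy-wait: poll `done` until another thread sets it. */
static void wait_until(atomic_int *done)
{
    enum yield_mode mode = parse_yield_mode(getenv("__GL_YIELD"));
    while (!atomic_load_explicit(done, memory_order_acquire))
        relax(mode);
}
```

With USLEEP or NOTHING, the waiting thread no longer depends on how the kernel implements sched_yield(). That makes these values a useful way to test whether yield handling is the cause of a stall.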

kernelOfTruth (2016-10-20 07:28 +11:00):
You mean

export __GL_YIELD=NOTHING

?

It seems to be running better with it, but it needs more testing.

Thanks

jwh7 (2016-10-20 01:18 +11:00):
Commit cc32bf3* seems to define set_nr_and_not_polling both for "CONFIG_SMP && TIF_POLLING_NRFLAG" and in its #else branch, but set_nr_if_polling is not defined in the #else branch.

* Implement wake lists for CPUs that don't share cache

jwh7 (2016-10-19 23:54 +11:00):
@ck, my i686 noSMP build fails with the latest patches up to 35d6279 (Merge branch '4.8-muqss'), but it succeeded with 1b7e569 (cacheline alignment). The x64 build with the latest patches was fine, though; I'm running it now.

Here are the build log snippets: https://gist.github.com/jeremywh7/944c10e189300086f1de58b8fa7fc0b4#gistcomment-1901214

ck (2016-10-19 09:01 +11:00):
Try the GL yield usleep workaround. The cause could be the very aggressive change I made to sched_yield, which may not be required any more.
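jwh7's two build reports above point at a common #if/#else pitfall. One branch defines a helper (set_nr_if_polling) that the other branch lacks. Presumably some caller is still compiled in the UP configuration, so a noSMP build fails while SMP builds succeed. The sketch below is a minimal userspace illustration that assumes a simplified, non-atomic flags word. struct task, NEED_RESCHED_FLAG and POLLING_FLAG are hypothetical stand-ins; only the two function names come from the comment. It shows the principle that both branches must provide the same helpers, not the fix actually committed to MuQSS.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's thread-info flag bits. */
#define NEED_RESCHED_FLAG (1u << 0)
#define POLLING_FLAG      (1u << 1)

struct task {
    unsigned int flags;
};

#if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
/* Set need-resched; return true if the caller must still send an IPI,
 * i.e. the target CPU is not polling. Real code does this atomically. */
static bool set_nr_and_not_polling(struct task *p)
{
    unsigned int old = p->flags;
    p->flags |= NEED_RESCHED_FLAG;
    return !(old & POLLING_FLAG);
}

/* Only set need-resched if the target is polling (no IPI needed). */
static bool set_nr_if_polling(struct task *p)
{
    if (!(p->flags & POLLING_FLAG))
        return false;
    p->flags |= NEED_RESCHED_FLAG;
    return true;
}
#else
/* No polling support: always flag the task and ask for an IPI. */
static bool set_nr_and_not_polling(struct task *p)
{
    p->flags |= NEED_RESCHED_FLAG;
    return true;
}

/* The counterpart reported missing: never claim the target was polling,
 * so callers take their non-polling path. */
static bool set_nr_if_polling(struct task *p)
{
    (void)p;
    return false;
}
#endif
```

Compiling this file with neither macro defined exercises the UP branch. Deleting the stub reproduces the kind of "undefined function" failure described in the report.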

kernelOfTruth (2016-10-19 08:29 +11:00):
Okay, great :)

However, I'm still seeing some kind of occasional stuttering.

Reproducer:

Run Konqueror (4.14.24, which seems to use Qt5)
or
Chromium (55.0.2873.0, 64-bit),

then browse through the bookmarks with the mouse.

While moving the mouse pointer up or down, it "hangs" (as if stuck, [driver?] transmission interrupted), then continues after 0.5-2 seconds.

Chromium has a bookmark file of 20+ MiB;

in the case of Konqueror, it had just been launched and I was going to Settings -> scrolling down

(the intention was to move to Settings -> Load View Profile -> Filesystem).

compiz is running in default mode WITHOUT workarounds or export __GL_YIELD=USLEEP.

X and the whole system were NOT yet run with reniced IRQs (threadirqs appended to the kernel command line),

e.g.

pnvidia=$(pgrep "irq/.*-nvidia")
[[ -n $pnvidia ]] && chrt -f -p 84 $pnvidia

was NOT used.

If I recall correctly, this has happened before on MuQSS regardless of whether compiz, kwin or xfwm4 was the window decorator/compositor.

Thanks
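The `chrt -f -p 84` command quoted above moves a thread to the SCHED_FIFO real-time policy at priority 84. With threadirqs, interrupt handlers run in kernel threads named like irq/N-name, which is what the pgrep pattern matches. The sketch below is a minimal C equivalent, assuming Linux and the POSIX sched_setscheduler() interface. set_fifo_priority is a hypothetical helper name, not part of chrt.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Switch `pid` (0 = calling thread) to SCHED_FIFO at `prio`, roughly what
 * `chrt -f -p PRIO PID` does. Returns 0 on success, -1 with errno set. */
static int set_fifo_priority(pid_t pid, int prio)
{
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    if (min == -1 || max == -1)
        return -1; /* errno already set by the query */
    if (prio < min || prio > max) {
        errno = EINVAL; /* on Linux the valid range is 1..99 */
        return -1;
    }

    struct sched_param param = { .sched_priority = prio };
    /* Needs CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO; otherwise EPERM. */
    return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

chrt -p takes a single PID. If the pgrep pattern ever matches several IRQ threads, calling a helper like this once per PID (or looping over the matches in the shell) covers that case.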

Anonymous (2016-10-19 05:40 +11:00):
@ck:
The issues I reported last time for v0.111 have completely gone away with MuQSS v0.112 (with no changes to the rest of the system software).
Thank you for your great work!
During this test run, I was also lucky enough to find a tunable again in my ancient TOI revision, named "no_flusher_thread". It defaults to 1; setting it to 0 makes the whole combination (MuQSS, BFQ, WBT, TOI) work fine without failures or performance regressions. I'm glad to report 10 successful hibernations, done from time to time, within 1 1/2 days of uptime so far.
Maybe that tunable eases some race condition or timing issue that an effective MuQSS brings out in those old TOI algorithms. It's a pity I don't have enough programming knowledge to analyse it in depth.

BR, Manuel Krause

jwh7 (2016-10-19 00:41 +11:00):
@ulenrich: I've been running MuQSS without issue on an old Asus EeePC 701 from 2007. It has a single-core Intel Celeron-M ULV 353 running at 630 MHz stock; luckily I can overclock it to 990 MHz.

ulenrich (2016-10-18 20:44 +11:00):
Yeah, I remember: some years ago you wrote about how to dynamically allocate more scheduler units (run queues or something like that)...

The old talking point about BFS, that it doesn't perform well beyond some number of processors... isn't valid any more!
That was also an obstacle to going mainline, wasn't it?

ck (2016-10-18 19:12 +11:00):
The idea is that this scheduler is a drop-in replacement for BFS where you won't notice any difference at all; that is why it took me years to come up with a design that had the best of both worlds. It should be perfectly fine on an old Mac mini.

ulenrich (2016-10-18 19:06 +11:00):
If I run this new scheduler on a 2009 Mac mini with an Intel Core Duo processor, how much of a slowdown would I experience compared to BFS?
How big is the overhead of this "it takes a thousand" CPUs scheduler?

ck (2016-10-18 08:25 +11:00):
Thanks, KoT. It's probably not your imagination, since there were quite significant scheduling logic issues that weren't fixed until 112. It should now be equal to or better than BFS in every way, and as you can see from the comments here, you're not the only one who's noticed.
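ulenrich's overhead question comes down to how much extra work a multiple-queue design does per scheduling decision. The sketch below is a simplified illustration loosely modelled on the multiple-queue idea in the post title; it is not MuQSS code. Each CPU has its own queue sorted by virtual deadline. Picking the next task means peeking at the head of every queue and taking the earliest, so the cost of a pick grows with the number of queues rather than the number of tasks. Sorted arrays stand in for skip lists, and there is no locking. struct task, struct cpu_queue and pop_earliest are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record: only the virtual deadline matters here. */
struct task {
    int id;
    uint64_t deadline;
};

/* Hypothetical per-CPU queue, kept sorted by ascending deadline so the
 * earliest task is always at index `head`. */
struct cpu_queue {
    struct task *tasks;
    size_t head;
    size_t len;
};

/* Peek at the first entry of every queue; return the index of the queue
 * whose head has the earliest deadline, or -1 if all queues are empty. */
static int pick_earliest_queue(const struct cpu_queue *qs, size_t nqueues)
{
    assert(qs != NULL || nqueues == 0);
    int best = -1;
    uint64_t best_deadline = UINT64_MAX;

    for (size_t i = 0; i < nqueues; i++) {
        if (qs[i].head >= qs[i].len)
            continue; /* empty queue */
        uint64_t d = qs[i].tasks[qs[i].head].deadline;
        if (best < 0 || d < best_deadline) {
            best = (int)i;
            best_deadline = d;
        }
    }
    return best;
}

/* Remove and return the earliest-deadline task across all queues. */
static struct task *pop_earliest(struct cpu_queue *qs, size_t nqueues)
{
    int q = pick_earliest_queue(qs, nqueues);
    if (q < 0)
        return NULL;
    return &qs[q].tasks[qs[q].head++];
}
```

On a two-core machine such as the Mac mini mentioned above, each pick in this sketch compares just two queue heads.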

kernelOfTruth (2016-10-18 08:04 +11:00):
Con,

I'm not certain whether it's only my imagination, but with .112 compiz reacts very, very well:
no delays during app switching so far.

I might run compiz with sched_yield later to see how that works out.

Portage (Gentoo Linux's package manager) also seems to run really quickly with it.

Once a new Chromium version is out, I'll do a compilation test (update) and see whether I can run additional backup jobs to really stress the system while still being able to work. That would be close to the ultimate test; in the extreme, probably adding a game to the mix, we'll see about that ;)

Thanks!

Anonymous (2016-10-18 05:34 +11:00):
Thanks Con.
I've updated the results with MuQSS112 (interactive=1 only).
https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

The performance is more or less the same as with MuQSS110, with a slight improvement on make -j2.

Pedro

jwh7 (2016-10-17 23:46 +11:00):
Running x86-64 MuQSS v112 with no problems; I also did a suspend/resume for good measure. Looking good!

ck (2016-10-17 23:15 +11:00):
Oops, I should have said: create a pull request and I'll see what it looks like, thanks!

ck (2016-10-17 23:09 +11:00):
Great, thanks! I've been toying with the idea of just using the same runqueue variables that mainline does instead of most of those atomic counters anyway, leaving only the idle CPU map.

Holger Hoffstätte (2016-10-17 23:05 +11:00):
I've been trying various versions of MuQSS (while staying on BFS for production/work), and muqss-112 is the first one that passes my "wiggle test". With a niced kernel build running on all 8 vcores, playing an HD movie in VLC and frantically jiggling a terminal window around no longer causes stalls or jerks; it's completely smooth all the time, as if idle. Well done! \o/

One suggestion: I've noticed that the global_rq contains various atomic_t counters. It might be a good idea to make them cacheline-aligned so that they don't incur false sharing, which can lead to pretty pathological stalls, especially with contended SMT threads. I can create a GitHub pull request if you like.
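Holger's false-sharing point can be illustrated in userspace C11. Giving each hot atomic counter its own cache line means one CPU incrementing a counter does not keep invalidating the line that holds the others. The sketch assumes a 64-byte cache line and uses hypothetical field names; it is not the layout of MuQSS's global_rq. In kernel code, the usual tool is the ____cacheline_aligned_in_smp annotation rather than C11 alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache-line size: common on x86-64, but not universal. */
#define CACHELINE 64

/* Each counter sits alone in a cache-line-aligned wrapper, so updates to
 * one counter do not bounce the line holding another between CPUs. */
struct padded_counter {
    alignas(CACHELINE) atomic_long value;
};

/* Hypothetical field names, loosely styled after a global runqueue. */
struct global_counters {
    struct padded_counter nr_running;
    struct padded_counter nr_uninterruptible;
    struct padded_counter nr_switches;
};

static_assert(sizeof(struct padded_counter) == CACHELINE,
              "each counter occupies exactly one cache line");
static_assert(offsetof(struct global_counters, nr_switches) == 2 * CACHELINE,
              "counters start on separate cache lines");

static void counter_inc(struct padded_counter *c)
{
    /* Relaxed ordering: these are statistics, not synchronization. */
    atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
}

static long counter_read(struct padded_counter *c)
{
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}
```

The cost is memory: in this sketch, three 8-byte counters grow from 24 bytes to 192.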