Comments on -ck hacking: linux-5.2-ck1, MuQSS version 0.193 for linux-5.2

2019-10-13T01:19:02.507+11:00

Thank you very much, sir.

2019-10-12T12:37:38.194+11:00

Might as well post the full command in case the above link is updated:

sed -i -e '/^-CFLAGS/ s,+=,:=,' -i -e '/^+CFLAGS/ s,+=,:=,' patch-5.2-ck1
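
For anyone wondering what that one-liner actually changes, here is a minimal sketch run against a made-up three-line excerpt in unified-diff style. The sample lines are hypothetical and are NOT the real contents of patch-5.2-ck1. It uses the same two sed expressions, but reads from stdin instead of editing a file in place. Presumably the point is that the CFLAGS lines in the patch then match the objtool Makefile as changed in 5.2.18, but that is an assumption, so check it against your own tree.

```shell
# Hypothetical excerpt: one removed line, one added line, one context line.
# These lines are invented for illustration, not copied from patch-5.2-ck1.
sample='-CFLAGS += -Wall
+CFLAGS += -Wall -O2
 CFLAGS += -g'

# Same two expressions as the one-liner above, applied to stdin.
# Only lines starting with "-CFLAGS" or "+CFLAGS" get "+=" turned into ":=".
fixed=$(printf '%s\n' "$sample" |
  sed -e '/^-CFLAGS/ s,+=,:=,' -e '/^+CFLAGS/ s,+=,:=,')

printf '%s\n' "$fixed"
```

The context line (the one starting with a space) is left untouched, because neither address pattern matches it.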

2019-10-12T12:33:39.036+11:00

The linux-ck PKGBUILD on the Arch User Repository has a one-liner fix you can run against the patch-5.2-ck1 file: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=linux-ck#n138

(Replace the './"${_ckpatch}"' at the end with the location of the patch, and it should apply against >=5.2.18 afterwards)

2019-10-08T11:13:22.994+11:00

Any patch for the inexperienced?

2019-10-03T08:49:51.527+10:00

Thanks for reporting.
I am hoping for 5.3-ck1 to arrive soon.

2019-10-02T22:49:54.611+10:00

A commit (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/tools/objtool?h=v5.2.18&id=47af17950b03b748eea68ad7613f8d8b4c688d45) in the 5.2.18 patch conflicts with 5.2-ck1 (due to https://github.com/ckolivas/linux/commit/40846db6244abc4696bcad4f889016e1952630f4).

It should be simple enough to fix by hand, but I thought I'd mention it here in case anyone's looking for an explanation.

2019-09-30T16:58:26.826+10:00

Sveinar, this is interesting, thanks for the info.
I'll go and check PDS, I thought it was dead :) I switched to MuQSS because BMQ had teething issues and PDS was not updated for newer kernels.
But MuQSS was acting weird with runqueues and llc, so I have hopefully fixed it (at least I tried); it's working well now.

Nevertheless, out of interest, I'll run my usual tests on PDS, the same way I tested my MuQSS patches, and see how it performs.
I need to throw a vanilla kernel into the mix as well.

BR, Eduardo

2019-09-30T07:03:50.964+10:00

Better.. hmm..
I used PDS mostly with the 4.x kernel branch, and most things worked very well. Then 5.x and BMQ came along, but there were quite a few "starting issues" with BMQ, so I ended up with MuQSS.

I feel PDS is working well for me, but to really compare I would need to test all 3 on 5.3, I guess... but I don't know if I have it in me to fiddle with it.

Perhaps it could be an idea for Phoronix to do some comparison tests with the Phoronix test suite? Would be a fun experiment to suggest. Possibly also comparing AMD/Intel and the different schedulers.

As with all things: "The absolutely best color is green!"

Meaning: It's all in the eye of the beholder :)

2019-09-30T06:58:31.300+10:00

Thank you very much, sir.

2019-09-30T06:55:44.159+10:00

https://github.com/SveSop/kernel_cybmod

0001 and 0002 are the PDS patches for 5.3, taken from the TK-Glitch git repo.

2019-09-30T06:55:22.481+10:00

As I mentioned, all of my patches are on my Google Drive, address as usual: https://drive.google.com/drive/folders/1MxUcptaOgPbPgJoUdeq0GkEuoeyaRHdG

Sveinar, does PDS work better than BMQ and MuQSS for you?

BR, Eduardo

2019-09-30T06:41:30.682+10:00

Interesting.
Care to share?

2019-09-30T05:38:03.478+10:00

Currently I am on 5.3 with a reworked PDS scheduler. This works quite well atm, but I will perhaps give it another go once -ck/MuQSS is out for 5.3.

Somewhat limited timewise due to some IRL stuff I am working on atm.

2019-09-29T04:56:14.258+10:00

So, I had some free time and made good progress on the Diablo 3 stutter on Ryzen when using smt. It seems that I have fixed it :)
There is a 0007 patch on Google Drive for anyone who wants to try it out.
With this patch I'm not exactly sure how it behaves on Intel. I have changed the bits that select the best CPU to schedule a task on, so it now selects the CPU a bit more accurately. I would like Con to look at it, as I don't know why it was the way it was before: CPU cache busyness was not checked in all cases, only for sibling locality, whereas thread busyness is always checked (my guess is that this is because a sibling thread is not exactly a full core, so a task would not run as fast on a sibling as on a normal core, if one is free).
In addition, this patch switches to using the llc CPU map to check whether CPU caches are busy in all rq sharing cases, which should not change anything on Intel.

I have performance comparisons as well: D3 behaves well; Valley and MHO show better results, especially with smt; compilation is down a fair bit compared with the previous patches and seems to be on par with the numbers from vanilla MuQSS; cs:go numbers are up with smt and a little down with anything else.
So after this patch smt seems to be best overall :)

If Sveinar and Anonymous are still around, please give this patch a bit of testing and report back how it behaves for you. Thanks. I have not booted this up on Intel, though :)

BR, Eduardo

2019-09-25T14:41:36.850+10:00

I had some time yesterday, so I installed cs:go and ran the "FPS benchmark" map; the results surprised me. This is one of the first times I have seen mc come out slowest (at least on Ryzen):
LLC: Average framerate: 238.53
MC: Average framerate: 229.29
SMT: Average framerate: 238.79
NONE: Average framerate: 236.86

Everything ran smoothly, with no stuttering and such; smt and llc were the same.
Settings were autodetected to max; I'll try lowering them the next time I run the tests.
Strange results, but they are repeatable.

BR, Eduardo
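
To put the four averages above in perspective, here is a small sketch that expresses each mode as a percentage gain over MC, the slowest result. The framerates are copied from the comment; the variable names are just for illustration.

```shell
# Average framerates from the comment above; MC (229.29) is the baseline.
baseline=229.29

# Gain over the baseline in percent: (fps - base) / base * 100.
gains=$(printf '%s\n' 'LLC 238.53' 'SMT 238.79' 'NONE 236.86' |
  awk -v base="$baseline" '{ printf "%s %.2f%%\n", $1, ($2 - base) / base * 100 }')

printf '%s\n' "$gains"
```

All three come out roughly 3 to 4 percent ahead of MC, so the gap is small but not negligible if the runs are as repeatable as reported.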

2019-09-24T18:08:19.654+10:00

Thanks, now it's clear about LLC. Interestingly, in this case the LLC numbers are 0 and 2, not 0 and 1 as on Ryzen...
RQ and CPU orders seem to be right, localities are slightly different, but everything sort of looks ok.
At least theoretically I can see how small improvements could be observed in this CPU topology.

BR, Eduardo

2019-09-24T07:36:28.433+10:00

I realize that I phrased "2 dies with split llc + single memory controller" poorly. I meant that it's 2 dies, each with its own shared LLC (in this case a large shared L2 cache), with an off-die memory controller.

The Core 2 Quad and its Xeon counterparts are advertised as having a 12 MB L2 cache, but in reality it is 2x6 MB.

2019-09-24T07:00:41.737+10:00

as requested

[ 0.519762] MuQSS possible/present/online CPUs: 4/4/4
[ 0.519769] MuQSS locality CPU 0 to 0: 0
[ 0.519769] MuQSS locality CPU 0 to 1: 3
[ 0.519770] MuQSS locality CPU 0 to 2: 3
[ 0.519771] MuQSS locality CPU 0 to 3: 2
[ 0.519771] MuQSS locality CPU 1 to 0: 3
[ 0.519772] MuQSS locality CPU 1 to 1: 0
[ 0.519772] MuQSS locality CPU 1 to 2: 2
[ 0.519773] MuQSS locality CPU 1 to 3: 3
[ 0.519773] MuQSS locality CPU 2 to 0: 3
[ 0.519774] MuQSS locality CPU 2 to 1: 2
[ 0.519774] MuQSS locality CPU 2 to 2: 0
[ 0.519775] MuQSS locality CPU 2 to 3: 3
[ 0.519775] MuQSS locality CPU 3 to 0: 2
[ 0.519776] MuQSS locality CPU 3 to 1: 3
[ 0.519777] MuQSS locality CPU 3 to 2: 3
[ 0.519777] MuQSS locality CPU 3 to 3: 0
[ 0.519778] MuQSS sharing MC runqueue from CPU 1 to CPU 2
[ 0.519780] MuQSS sharing MC runqueue from CPU 0 to CPU 3
[ 0.519788] MuQSS CPU 0 llc 0 RQ order 0 RQ 0 llc 0
[ 0.519789] MuQSS CPU 0 llc 0 RQ order 1 RQ 1 llc 2
[ 0.519790] MuQSS CPU 1 llc 2 RQ order 0 RQ 1 llc 2
[ 0.519790] MuQSS CPU 1 llc 2 RQ order 1 RQ 0 llc 0
[ 0.519791] MuQSS CPU 2 llc 2 RQ order 0 RQ 1 llc 2
[ 0.519792] MuQSS CPU 2 llc 2 RQ order 1 RQ 0 llc 0
[ 0.519792] MuQSS CPU 3 llc 0 RQ order 0 RQ 0 llc 0
[ 0.519793] MuQSS CPU 3 llc 0 RQ order 1 RQ 1 llc 2
[ 0.519794] MuQSS CPU 0 llc 0 CPU order 0 RQ 0 llc 0
[ 0.519794] MuQSS CPU 0 llc 0 CPU order 1 RQ 3 llc 0
[ 0.519795] MuQSS CPU 0 llc 0 CPU order 2 RQ 1 llc 2
[ 0.519796] MuQSS CPU 0 llc 0 CPU order 3 RQ 2 llc 2
[ 0.519797] MuQSS CPU 1 llc 2 CPU order 0 RQ 1 llc 2
[ 0.519797] MuQSS CPU 1 llc 2 CPU order 1 RQ 2 llc 2
[ 0.519798] MuQSS CPU 1 llc 2 CPU order 2 RQ 3 llc 0
[ 0.519799] MuQSS CPU 1 llc 2 CPU order 3 RQ 0 llc 0
[ 0.519799] MuQSS CPU 2 llc 2 CPU order 0 RQ 2 llc 2
[ 0.519800] MuQSS CPU 2 llc 2 CPU order 1 RQ 1 llc 2
[ 0.519801] MuQSS CPU 2 llc 2 CPU order 2 RQ 0 llc 0
[ 0.519801] MuQSS CPU 2 llc 2 CPU order 3 RQ 3 llc 0
[ 0.519802] MuQSS CPU 3 llc 0 CPU order 0 RQ 3 llc 0
[ 0.519803] MuQSS CPU 3 llc 0 CPU order 1 RQ 0 llc 0
[ 0.519803] MuQSS CPU 3 llc 0 CPU order 2 RQ 2 llc 2
[ 0.519804] MuQSS CPU 3 llc 0 CPU order 3 RQ 1 llc 2
[ 0.519804] MuQSS runqueue share type LLC total runqueues: 2
[ 1.500417] MuQSS CPU scheduler v0.193 by Con Kolivas.
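
A quick way to read the cache topology out of a log like the one above is to group CPUs by their llc id. The sketch below uses four "RQ order 0" lines copied from that log and assumes the line format stays exactly as shown; the variable names are illustrative only.

```shell
# Four lines copied from the log above (one "RQ order 0" line per CPU).
log='[ 0.519788] MuQSS CPU 0 llc 0 RQ order 0 RQ 0 llc 0
[ 0.519790] MuQSS CPU 1 llc 2 RQ order 0 RQ 1 llc 2
[ 0.519791] MuQSS CPU 2 llc 2 RQ order 0 RQ 1 llc 2
[ 0.519792] MuQSS CPU 3 llc 0 RQ order 0 RQ 0 llc 0'

# Extract "llc cpu" pairs, sort them numerically, then print one line per llc id.
llc_groups=$(printf '%s\n' "$log" |
  sed -n 's/.*MuQSS CPU \([0-9]*\) llc \([0-9]*\) RQ order 0 .*/\2 \1/p' |
  sort -k1,1n -k2,2n |
  awk '{
    if (NR == 1 || $1 != prev) {      # first pair, or a new llc id
      if (NR > 1) printf "\n"
      printf "llc %s: CPUs", $1
      prev = $1
    }
    printf " %s", $2
  } END { if (NR > 0) printf "\n" }')

printf '%s\n' "$llc_groups"
```

For this log it groups CPUs 0 and 3 under llc 0 and CPUs 1 and 2 under llc 2, which matches the two "sharing MC runqueue" lines and the total of 2 runqueues reported for the LLC share type. On a live system the same pipeline could be fed from "journalctl -b | grep -i muq" instead of the sample variable.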

2019-09-24T05:57:50.200+10:00

Just the reference about "slapping two dual cores together and call it a quad": https://www.extremetech.com/computing/49528-core-2-quad-q6600-four-cores-for-the-masses/2

BR, Eduardo

2019-09-24T03:34:37.445+10:00

If I remember correctly, there were times when Intel created quad-core CPUs by slapping two dual cores into the same package and calling it a quad core :)
I'm not exactly sure how they organized the LLC in that case, but it may well be that there were two LLCs.

Can Anonymous please pastebin the output of "journalctl -b | grep -i muq"? Then we'll be more sure how the kernel sees that particular CPU.

On Ryzen, LLC sharing (two queues, latest 0006 patch) did not give any measurable performance boost or degradation for compilation tasks on my Ryzen 1700, but maybe Threadripper CPUs would get a boost, because their cores / CCXs etc. are organized in a slightly different manner than on my Ryzen (and I don't know exactly how either). I have no access to a TR, so I cannot verify. TR even has two NUMA nodes.
I still need to test more stuff on that last LLC sharing patch.

If Anonymous shares the output, I could at least theoretically guess whether that may or may not help. We don't even know how the cores are organized in that CPU.

BR, Eduardo

2019-09-23T22:29:01.334+10:00

Yes, I'm not sure how performance improvements from those changes on that CPU are possible either.

2019-09-23T22:21:52.263+10:00

So is this with a "Core 2 Quad" processor? And this processor does not share any L2 cache?

According to https://ark.intel.com/content/www/us/en/ark/products/33924/intel-core-2-quad-processor-q9550-12m-cache-2-83-ghz-1333-mhz-fsb.html this processor has: "CPU Cache is an area of fast memory located on the processor. Intel® Smart Cache refers to the architecture that allows all cores to dynamically share access to the last level cache."

This is the same wording used for the i7 8700K as well. I don't know what this implies, or whether I am looking at the right processor. But does using the patches with "llc" show to any degree that it is actually differentiating this?

Should it show 2 runqueues when such a "separated" cache is used? (Because I am fairly sure that it showed 1 for my 8700K when I tried.)

I am just asking. I found little to no difference between llc and mc in the benchmarks I did, but I am willing to revisit this if it SHOULD behave differently.

2019-09-23T17:57:09.774+10:00

I did some benchmarking using hl2 lost coast (because it is cpu bound), average of 3 runs:

"stock" ck patches rqshare=mc avg fps 269.85

ck+eduardo's patches rqshare=mc avg fps 273.23 +1.2%

ck+eduardo's patches rqshare=llc avg fps 277.45 +2.8%

as you can see, eduardo's patches definitely improve performance on my ancient setup.
I haven't experienced any regressions or bugs.

2019-09-23T15:07:00.480+10:00

Specifically, between the stock multicoresiblings mode and the mc-llc mode added by the patch, which ends up creating 2 run queues instead of 1.

Increased frame rate in multiple OpenGL applications. It's not a lot, maybe 2-4%, but it makes sense, since the latency penalty between cores that don't share an L2 cache is quite high on this generation of CPU (as much as a 50% increase, according to this benchmark anyway: https://github.com/ajakubek/core-latency).

2019-09-23T00:24:49.041+10:00

I wonder how the 5.3 kernel with the new "utilization clamping support" turns out vs. needing to use things like MuQSS/-ck patches?

Sounds like it WILL give a performance boost to things like gaming and the like, though.