Showing posts with label -ck. Show all posts

Friday, 29 July 2016

BFS 472, linux-4.7-ck1

Announcing an updated BFS for linux-4.7 based kernels.

BFS by itself:
4.7-sched-bfs-472.patch

-ck branded linux-4.7-ck1 patches:
linux-4.7-ck1

This was quite a substantial merge effort this time around, with a fair number of changes in the mainline kernel that affected the patch. Nonetheless, everything appears to be working as planned in my limited testing. I'm unsure if the changes will fix the problems people had with suspend on the 4.6-bfs patches, but the new code does touch that area. I was never affected on any of my machines, so I was unable to reproduce the problem in the first place.

In addition to the resync, a few minor changes have made their way into this release with respect to the way tasks preempt other tasks. See bfs470-updates.patch for details.

One other fairly significant change was properly hooking into the new schedutil parameters that drive cpufreq scaling governors. What I committed into bfs470 would not have been working properly in choosing the correct CPU frequency to run at and may have led to slowdowns and/or more power usage. This should be fixed in 472.

I should also mention that if, like me, you use the evil proprietary nvidia driver, the latest will not build with the current kernel and you'll need a couple of patches to get it working.

Enjoy!
お楽しみ下さい
-ck

EDIT: This patch will fix crashes when configured without SMT_NICE enabled:
bfs472-fix_set_task_cpu.patch
And will be applied to the next BFS release.

Wednesday, 8 June 2016

BFS 470, linux-4.6-ck1

Announcing an updated BFS for linux-4.6 based kernels.

BFS by itself:
4.6-sched-bfs-470.patch

-ck branded linux-4.6-ck1 patches:
linux-4.6-ck1

Resync to 4.6. You know the drill.


Enjoy!
お楽しみ下さい
-ck

Friday, 25 March 2016

BFS 469, linux-4.4-ck1, linux-4.5-ck1

Announcing an updated BFS for linux-4.4 and 4.5 based kernels.

BFS by itself:
4.5-sched-bfs-469.patch

-ck branded linux-4.5-ck1 patches:
linux-4.5-ck1



This is purely a resync of BFS 467 from 4.3-ck3 to the current kernels. The only change is extra documentation of the interactive tunable in the scheduler documentation, and a build warning fix for uniprocessor builds.


While linux-4.5 is the latest kernel, as I had been slow in syncing up and missed 4.4, and given that 4.4 is deemed a Long Term Stable release, I've provided resyncs with both. Version number differences of 467/469 are only due to syncing with different kernels and otherwise they are only trivially different.


The patches are fairly new and have not had a great deal of testing, so the usual warnings apply, but given how long it took me to get around to catching up, I didn't want to delay releasing them.


Enjoy!
お楽しみ下さい
-ck

Tuesday, 15 December 2015

BFS 467, linux-4.3-ck3

Announcing an updated BFS for linux-4.3 based kernels.

BFS by itself:
4.3-sched-bfs-467.patch

-ck branded linux-4.3-ck3 patches:
4.3-ck3

After my initial enthusiasm regarding the improved throughput on the previous BFS release, I unfortunately had some reports of regressions in interactive behaviour for the first time in a while, both on this forum and through other channels. So I've released a -ck3, something I haven't needed to do for some time, with yet another updated BFS to address this, since I can't stand having a dodgy release out for any extended period.

With this release what I've done is reinstate an old tunable that used to be in my scheduler patches many years ago:

/proc/sys/kernel/interactive

This has two settings, 1 for on and 0 for off. By default - of course since this is BFS - it is set to 1. What it does is prioritise latency over throughput in mode 1 and vice versa in mode 0. In addition to addressing the latency issues in the previous kernel, mode 1 actually completely turns off all soft affinity scheduling in the kernel, for the lowest possible latencies all round, so this may be the first kernel with an improvement in latency in a while too.
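
As a minimal sketch of how the tunable described above could be inspected or changed from a script: the path and the meaning of 0 and 1 come from this post, while the function names are purely illustrative. The file only exists on a BFS-patched kernel, and writing to it would need root.

```python
from pathlib import Path

# Path as given in the post; absent on kernels without this BFS tunable.
INTERACTIVE = Path("/proc/sys/kernel/interactive")


def parse_mode(text: str) -> str:
    """Map the raw value of the tunable to the behaviour described in the post."""
    value = text.strip()
    if value == "1":
        return "latency"      # default: latency prioritised over throughput
    if value == "0":
        return "throughput"   # throughput prioritised over latency
    raise ValueError(f"unexpected value: {value!r}")


def current_mode(path: Path = INTERACTIVE):
    """Return the current mode, or None when the tunable does not exist."""
    try:
        return parse_mode(path.read_text())
    except FileNotFoundError:
        return None


def set_mode(interactive: bool, path: Path = INTERACTIVE) -> None:
    """Write 1 or 0 to the tunable (requires root on a real system)."""
    path.write_text("1\n" if interactive else "0\n")


if __name__ == "__main__":
    print(current_mode())
```

Running it unprivileged only reads the value; set_mode(False) would switch to the throughput-biased behaviour.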

Bear in mind none of these changes make any difference on uniprocessor kernels so there is no need for UP users to update unless they need the build fixes that came with BFS466+.

Amusingly enough, linux-4.3.2 wouldn't boot for me for unrelated reasons so I'm using 4.3.0-ck3 myself. So if you can't get 4.3.2 to boot, roll back.

Enjoy!
お楽しみください

Friday, 11 December 2015

BFS 466, linux-4.3-ck2

Announcing an updated BFS for linux-4.3 based kernels.

BFS by itself:
4.3-sched-bfs-466.patch

-ck branded linux-4.3-ck2 patches:
4.3-ck2

In addition to a build fix for the nohz compile issue with BFS 465, this is the first BFS in a very long time to have performance improvements. For some time now it's bugged me that tasks would have very poor affinity with CPUs on BFS, even if the performance was good. By this I mean if you fired up a fully CPU bound task and watched a CPU monitor/graph on a multicore machine, you'd see the one task would bounce around from CPU to CPU very frequently instead of occasionally. While this was great for latency purposes and interactivity, single threaded workloads would suffer as a result, and additionally it would represent a small amount of performance loss in multithreaded workloads too since CPU cache effects improving throughput would be diminished. Every time I'd previously tackled this issue, I found myself making some other workload worse.
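
For anyone who would rather count the bouncing than eyeball a CPU graph, here is a rough sketch that samples which CPU a process last ran on. It relies on field 39 ("processor") of /proc/<pid>/stat as documented in proc(5); the function names, sample count and interval are my own choices for illustration, and because it only samples, it will undercount migrations that happen between samples.

```python
import sys
import time


def cpu_from_stat(stat_line: str) -> int:
    """Return field 39 ("processor") from the contents of /proc/<pid>/stat.

    Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the last ')' before splitting on whitespace.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is at index 39 - 3 = 36.
    return int(rest[36])


def count_migrations(cpus) -> int:
    """Count how often consecutive samples were taken on different CPUs."""
    cpus = list(cpus)
    return sum(1 for a, b in zip(cpus, cpus[1:]) if a != b)


def sample(pid: int, samples: int = 100, interval: float = 0.05):
    """Sample the CPU that the process last ran on, `samples` times."""
    seen = []
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            seen.append(cpu_from_stat(f.read()))
        time.sleep(interval)
    return seen


if __name__ == "__main__":
    cpus = sample(int(sys.argv[1]))
    print(f"{count_migrations(cpus)} CPU changes seen in {len(cpus)} samples")
```

Pass it the PID of a fully CPU bound task and compare the count between kernels.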

After approximately 100 rebuilds of the kernel and benchmarking, I finally found where the problem lay, and it wasn't just trying to maintain bias against moving tasks from CPU to CPU, it was also that the code responsible is in the most frequently traversed code path in the schedule() call. Simplifying the code that biased against moving tasks in earliest_deadline_task, as well as calling on the bias for all tasks, not just fully CPU bound tasks, improved performance statistically significantly without detriment to latency or other workloads in my testing.

Apart from improving measurable throughput benchmarks, users may notice that some workloads that are single threaded (such as some video playback software, or even virtualisation with kvm etc.) may actually improve because of their ability to bind to one CPU better and not incur the wrath of being moved to a CPU speed throttled for power saving. Please give it a whirl and report back anything you find, positive or negative - though it should all be positive. If you have benchmarks you want to throw at it, even better.

EDIT: After the initial enthusiasm, it appears this DOES have a detrimental effect on interactivity so I will be looking for another change in the near future with yet another release.

Enjoy!
お楽しみください

Thursday, 12 November 2015

BFS 465, linux-4.3-ck1

Finally a resync to mainline linux, with linux-4.3.

BFS by itself:

4.3-sched-bfs-465.patch

-ck branded linux-4.3-ck1 patches:

4.3-ck1 patches

In addition to the usual collection of resyncs and minor updates, this includes 3 patches courtesy of Alfred Chen, who held the fort while I was too busy to work on a resync for linux-4.2. (THANKS!) His changes fix a warning that happens on some systems on startup, hopefully fix the long standing hang on heavy file access with truly unlocked block flush unplugging, and fix a build problem.

With such a large resync and update, the usual warnings apply regarding instability, file system corruption and unwanted impregnations.

EDIT: Build fix: bfs465-nohz-buildfix.patch

Enjoy!
お楽しみください

Sunday, 9 August 2015

BFS 464, linux-4.1-ck2

Here's an updated BFS/CK which includes the one test patch I put on this blog after 463 and another trivial fix for the previous release. The patch fixed a lot of regressions including hangs with BTRFS and panics on shutdown.

BFS by itself:

4.1-sched-bfs-464.patch

-ck branded linux-4.1-ck2 patches:

4.1-ck2 patches

Enjoy!
お楽しみください

Sunday, 2 August 2015

BFS 463, linux-4.1-ck1

Finally a resync to linux-4.1. Sorry, I was just too preoccupied to get around to doing this. I haven't directly addressed a few known problems that have workarounds, and it comes with a warning.

BFS by itself:

4.1-sched-bfs-463.patch

-ck branded linux-4.1-ck1 patches:

4.1-ck1 patches

The usual collection of resyncs and minor updates including pending fixes post 462.

This includes a fix for some uniprocessor build problems courtesy of Serge Belyshev. If you still have boot problems with uniprocessor builds the workaround is to create an SMP kernel.

I've finally bitten the bullet and removed the block flush code from within the main schedule() call, in keeping with how mainline does it. Every time I've removed this code from previous kernels, problems recurred and I had to re-add it. Complete hangs under particularly heavy IO used to be the problem, so please report back if these return with this kernel; hence the warning.

On the previous kernel, some had crashes unless they enabled NUMA. I have no idea what caused these and have done no specific changes to address it. I don't want people to enable NUMA unnecessarily but if you have crashes this is the first thing to try and please report back.

Enjoy!
お楽しみください

Thursday, 16 April 2015

BFS 462, linux-4.0-ck1

Announcing a resync and update of BFS for linux-4.0

BFS by itself:

4.0-sched-bfs-462.patch

-ck branded linux-4.0-ck1 patches:

4.0-ck1 patches

The usual collection of resyncs and minor updates only.

It includes the following changes:
- Minor tweaks to uniprocessor build (though enabling SMP will fix breakage if it still exists).
- Fix for tracing build failure
- SMT nice update to ignore kernel threads
- Decrease log level of locality information to debug

EDIT Fix for 4.0.2+: bfs462-rtmn-fix.patch

Enjoy!
お楽しみください

Friday, 27 February 2015

BFS 461, linux-3.19-ck1

Announcing a resync and update of BFS for linux-3.19

BFS by itself:

3.19-sched-bfs-461.patch

-ck branded linux-3.19-ck1 patches:

3.19-ck1 patches

Apart from a resync with mainline and merging of the pending patches that were around for BFS460, there are no new changes. Apologies if I've been unable to address any new issues posted here - as per usual lack of time is the reason. There are some pending changes to the scheduler for mainline (as pointed out by kernelOfTruth here: link) but they're not finalised so I won't be delaying this release to wait for them.

Enjoy!
お楽しみください

Thursday, 11 December 2014

BFS 460, linux-3.18-ck1

Announcing a resync and update of BFS for linux-3.18

BFS by itself:

3.18-sched-bfs-460.patch

-ck branded linux-3.18-ck1 patches:

3.18-ck1 patches

Uncharacteristically I found time to resync up quickly for this latest stable linux release. There are no new BFS features, but there have been a number of changes to stay in sync with mainline. Apart from keeping up with the usual churn in new releases, of which there was a modest amount this time, a number of other low level changes were committed making this much less of a trivial resync so some caution is warranted before blindly updating.

Hilf Danton pointed out a bug in the yield_to code (thanks!) which is now fixed. Since almost nothing uses this code you probably won't notice anything. He also pointed out some other now outdated components in BFS which are also updated. The above_background_load function has also been removed since the VM tweaks in older -cks no longer exist to use it. 

More substantially, I've reworked the plugged I/O code to match mainline. I had been reluctant to touch it previously because the unlocking and relocking in the scheduler code path introduced deadlocks when the first plugged I/O code made its way into BFS, and it took several iterations of fixes, so watch for any I/O misbehaviour/stalls. There are also some changes to how mainline responds to idle CPUs, so watch for any unusual behaviour there.

Having said that I've been using it for a while and not noticed anything out of the ordinary, but please report back if there are any issues.

Enjoy!
お楽しみください

Tuesday, 18 November 2014

BFS 458, linux-3.17-ck2

This is a bugfix release for the power usage regression as reported here with BFS 457.

BFS by itself:
3.17-sched-bfs-458.patch

CK branded BFS separate and combined patches:
3.17-ck2

Incremental change from BFS 457-458:
bfs457-458.patch

Enjoy!
お楽しみください

Tuesday, 11 November 2014

BFS 457, linux-3.17-ck1

Finally announcing a resync and minor update of BFS for the linux-3.17(.x) kernel releases.

Only minor updates have gone into this release apart from including one of the rework patches by Alfred Chen (thanks!) and the removal of the old KVM workaround that was no longer required with the bugfixes last release courtesy of Graysky (thanks!).

BFS by itself:
3.17-sched-bfs-457.patch

CK branded BFS separate and combined patches:
3.17-ck1

For those interested in the minor changes that made it up to 457, the incremental patches are available:
bfs457-incremental

Enjoy!
お楽しみください

Monday, 25 August 2014

BFS 453/454/455/456 and 3.16-ck2

Here is an updated set of BFS patches with the accumulated bugfixes as debugged on this blog for kernels 3.13 to 3.16 inclusive. The main obvious bug which affected people was the ath9k module, which would hang on suspend/resume. However, there were likely a number of subtle bugs across the board that most people would not be aware of; even I only noticed that kvm behaved much better after applying this bugfix, which applies to every BFS release after 3.12.

In order to make up for the fact that there are numerous kernels out there based on BFS across the different versions, I have updated BFS and numbered the versions according to which base kernel they are on. Note that there are no feature backports on the older kernels, only the bugfixes, so SMT nice is only on the 3.16 BFS.

3.13-sched-bfs-453.patch
3.14-sched-bfs-454.patch
3.15-sched-bfs-455.patch
3.16-sched-bfs-456.patch

And along with that an updated ck branded release for 3.16, 3.16-ck2:

3.16-ck2


Enjoy!
お楽しみ下さい

Saturday, 16 August 2014

BFS 450, 3.16-ck1

Announcing a resync and update of BFS for linux kernel 3.16.x. Coding has proven a nice distraction from unpleasant life events so I've been able to bring the patch up to date with the latest kernel.

A number of minor fixes as queued up post 3.15-ck1 made their way into this patchset, along with some changes inspired by the development work of Alfred Chen (thanks!).

The major feature upgrade in this one is the inclusion of SMT nice as discussed at length on this blog. This version of BFS includes an updated version of SMT nice beyond version 6 posted here with one change - 25% of the CPU time of any nice level of SCHED_NORMAL tasks can be shared with any other nice level over and above the nice-based CPU distribution. This is to capitalise on the slightly increased throughput that is available by using the sibling CPU concurrently without too dramatically affecting higher priority process CPU loss. In addition it dramatically reduces the massive latencies that can sometimes otherwise be seen by heavily niced tasks with SMT nice enabled by dithering the metering out of CPU instead of giving it all as a burst only when it's entitled to CPU.

Making SMT nice configurable means users can choose whether they still want the standard behaviour. The config option will recommend that users who enable the SMT scheduler option also enable the SMT nice option. I believe this to be a good default choice for virtually all desktop users, and selectively for server users if they depend heavily on the use of 'nice' or scheduling policies for their work cases (but otherwise it should be disabled).

BFS by itself:
3.16-sched-bfs-450.patch
3.16-ck1 branded BFS patchset directory:
3.16-ck1

EDIT: A build fix for non SMT enabled kernels to prevent it being possible to enable SMT nice is here:
bfs450-nosmt-buildfix.patch
Just disabling SMT nice will achieve the same thing for those affected.


Enjoy!
お楽しみください

Friday, 1 August 2014

SMT/Hyperthreading, nice and scheduling policies

The concept of symmetric multi-threading, which Intel called "Hyperthreading" and introduced into their commodity CPUs first around 2001, is not remotely a new one and goes back a long way before Intel introduced it into the mainstream market. I suspect the introduction of it back then by Intel was them easing the concept of increasing threads and cores for marketing reasons with the imminent walls they'd soon hit with CPU heat and power requirements that would stop the pursuit for higher and higher single CPU frequencies. The idea is that, since a lot of the CPU sits unused even when something is running as fast as it can on part of it, with a bit of extra logic and architecture, you could throw another "virtual core" at some of the unused execution units and behave like 2 (or more) CPUs, putting more of the CPU to good use. These days the vast majority of CPUs sold by Intel have hyperthreading on them, thus doubling the virtual or "logical" cores the CPU has, including even their low power atom offerings.

There have been numerous benchmarks, in-field tests, workloads etc., where people have tried to find whether hyperthreading is better or not. With a bit of knowledge of the workings of hyperthreading, it's pretty easy to know what the answer is, and not surprisingly, it's the frustrating answer of "it depends". And that's the most accurate answer by far, but I'd go further than that and say that if you have any kind of mixed workload, hyperthreading is always going to be better, whereas if you have precisely one workload, then you have to define exactly how it's going to work and whether hyperthreading will be better or not. Which means that in my opinion at least, hyperthreading is advantageous on a desktop, laptop, tablet and even phone since by design they're nothing but mixed workloads. I won't spend much longer on this discussion, but suffice to say that I think about 4 threads (at the moment) is about optimal for most real world desktop(y) workloads.

Imagine for a moment you have a single core CPU which you can run as is, or enable hyperthreading to run as a 2 thread CPU. If you were to run your CPU in single core only mode, then when you run one task at a time it will always use the full power of the CPU, but if you run two tasks, each task runs at 50% the speed and completes in double the time. If you enable hyperthreading, then if you have two mixed workloads that actually use different parts of the CPU, you can actually get effectively (at best) about 140% of the performance of running the CPU in single core mode. This means that instead of the two tasks running at 50% speed when run concurrently, they run at 70% speed. In practice, the actual performance benefit is rarely 40% but it is often on the order of 25%, so each task tends to run about 60% speed instead of 50% speed. Still a nice speedup for "free".

One thing has always troubled me about hyperthreading, though, and that is the way it tends to break priority support in the scheduler. By priority support, I refer to the use of 'nice' and other scheduling policies, such as realtime, sched idleprio etc.

If you have a single core CPU and run a nice 0 task concurrently with a nice +19 task, the nice 0 task will get about 98% of the CPU time and the nice +19 task only about 2%. The scheduler does this by serialising and metering out the time each task gets to spend on the CPU. Now if you enable hyperthreading on that CPU, the scheduler no longer serialises access to the CPU, but gives each of those tasks one logical "core" on the CPU, and you get an overall 25% increase in throughput. However of the total throughput, both the nice 0 and nice +19 task get precisely half. This would be fine if we had two real cores, but they're not, and the performance of both tasks is sacrificed to ~60% to achieve this. Which means that for this contrived but simple example, enabling hyperthreading slows down the overall execution speed of your nice 0 task when you run a nice +19 task much more than on a single core - it runs at 60% speed instead of 98%.
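
The arithmetic in the last few paragraphs can be checked with a few lines. This is only a sketch of the contrived example above: the 98%, 140% and 125% figures are the ones quoted in this post, and the function names are made up for illustration.

```python
def per_task_speed(total_throughput: float, tasks: int = 2) -> float:
    """Speed of each task when the total throughput is split evenly."""
    return total_throughput / tasks


def relative_runtime(speed: float) -> float:
    """How much longer a fixed amount of work takes at the given speed."""
    return 1.0 / speed


# Figures quoted in the post (1.0 = the full speed of the core running one task).
SERIALISED_NICE0_SHARE = 0.98  # nice 0 task against a nice +19 task, no SMT
SMT_BEST_CASE = 1.40           # best-case total throughput with SMT
SMT_TYPICAL = 1.25             # more typical total throughput with SMT

if __name__ == "__main__":
    best = per_task_speed(SMT_BEST_CASE)
    typical = per_task_speed(SMT_TYPICAL)
    print(f"SMT best case, per task: {best:.1%}")
    print(f"SMT typical, per task:   {typical:.1%}")
    print(f"nice 0 runtime, serialised: {relative_runtime(SERIALISED_NICE0_SHARE):.2f}x")
    print(f"nice 0 runtime, SMT:        {relative_runtime(typical):.2f}x")
```

The typical case works out to 62.5% per task, which the text rounds to about 60%.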

An even more dramatic example is what happens with realtime tasks, which these days most audio backends on linux use (usually through pulseaudio). Running a realtime task concurrently with a SCHED_NORMAL nice 0 task on a single core means the realtime task will get 100% CPU and the nice 0 task will get zero CPU time. Enable hyperthreading and suddenly the realtime task only runs at 60% of its normal speed even with a heavily niced +19 task running in the background.

Enter SMT-nice as I call it. This is not a new idea, and in fact my first iteration of it was for mainline 10(!) years ago. See here: SMT Nice 2.6.4-rc1-mm1

I actually had the patch removed from mainline myself after criticism regarding throughput, though I still argue that worrying about the last percentage points of throughput is not relevant if you break a mechanism as valuable as nice and scheduling policies. I had lost the energy for defending it, which is why I pushed for it to be removed. Note that although overall throughput may be slightly decreased, the throughput of higher priority tasks is not only fairer with respect to low priority tasks, but enhanced because the low priority tasks will cause less cache thrashing.

What this does is it examines all hyperthread "siblings" to see what is running on them, and then decides whether the currently running or next running task should actually have access to the sibling or allow the sibling to go idle completely, allowing a higher priority task to have the actual true core and all its execution units to itself. I'd been meaning to create an equivalent patch for BFS for the longest time but CPUs got faster, cheaper, more cores, I got lazy etc... though I recently found more enthusiasm for hacking.
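
To make the idea concrete, here is a toy model of the sibling check. It is not the BFS code: it assumes a simple ordering of policies (realtime, iso, normal, idleprio) and ignores nice levels within SCHED_NORMAL entirely, and every name in it is hypothetical.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Policy(IntEnum):
    """Scheduling policies, ordered from lowest to highest priority (assumed)."""
    IDLEPRIO = 0
    NORMAL = 1
    ISO = 2
    REALTIME = 3


def may_run_on_sibling(candidate: Policy,
                       siblings_running: Iterable[Optional[Policy]]) -> bool:
    """Decide whether `candidate` may use this hyperthread.

    `siblings_running` holds the policy of the task on each other sibling of
    the same physical core, or None for an idle sibling. In this toy rule the
    candidate runs only if no sibling is running a task of a strictly higher
    policy; otherwise this hyperthread is left idle so the higher priority
    task gets the core's execution units to itself.
    """
    busy = [p for p in siblings_running if p is not None]
    return all(candidate >= p for p in busy)


if __name__ == "__main__":
    # A normal task next to a realtime task: this sibling stays idle.
    print(may_run_on_sibling(Policy.NORMAL, [Policy.REALTIME]))
    # An idleprio task next to an idle sibling: free to run.
    print(may_run_on_sibling(Policy.IDLEPRIO, [None]))
```

The real patch is more nuanced than an all-or-nothing rule like this one, so treat it purely as an illustration of examining siblings before picking a task.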

So here is a reincarnation of the SMT-nice concept for BFS, improved to work across multiple scheduling policies from realtime, iso down to idleprio, and I've made it a compile time option in case people feel they don't wish to sacrifice any throughput:

Patch for BFS449 with pending patches:
bfs449-smtnice-2.patch

And to make life easy, here's an all inclusive ck1+pending+smtnice patch:
3.15-ck1-smtnice2.patch

The TL;DR is: On Intel hyperthreaded CPUs, 'nice', realtime and sched idleprio work better, and background tasks interfere much less with the foreground tasks. Note: This patch does nothing if you don't have a hyperthreaded CPU.

If you wish to do testing to see how this works, try running with and without the patch and running two benchmarks concurrently, one at nice 0 and one at nice +19 (such as 'make -j2' on one kernel and 'nice -19 make -j2' on another kernel on a machine with 2 cores/4 threads) and compare times. Or run some jackd benchmarks of your choice to see what it takes to get xruns etc.
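
A small harness along these lines could run the two builds concurrently and time them. It is only a sketch: it assumes the standard nice(1) utility is on the PATH, and the two kernel tree paths in the example are placeholders for your own already configured trees.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd, niceness):
    """Run cmd under nice(1) at the given niceness; return wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(["nice", "-n", str(niceness), *cmd], check=True)
    return time.monotonic() - start


def run_pair(foreground, background):
    """Run both commands concurrently: foreground at nice 0, background at +19."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fg = pool.submit(timed_run, foreground, 0)
        bg = pool.submit(timed_run, background, 19)
        return fg.result(), bg.result()


if __name__ == "__main__":
    # Placeholder paths: two separate, already configured kernel trees.
    fg_time, bg_time = run_pair(
        ["make", "-C", "linux-a", "-j2"],
        ["make", "-C", "linux-b", "-j2"],
    )
    print(f"nice 0: {fg_time:.1f}s   nice +19: {bg_time:.1f}s")
```

Run it once on a kernel with SMT nice enabled and once without, and compare how much the nice 0 build time changes.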

This patch will almost certainly make its way into the next BFS in some form.

EDIT: It seems people have missed the point of this patch. It improves the performance of foreground applications at the expense of background ones. So your desktop/gui/applications will remain fast even if you run folding@home, mprime, seti@home etc., but those background tasks will slow down more. If you don't want it doing that, disable it in your build config.

---
Enjoy!
お楽しみ下さい
-ck

Thursday, 3 July 2014

BFS 0.448, 3.15-ck1

Announcing a resync and update of BFS for linux kernel 3.15.x. I'm currently on vacation but fortunately had enough downtime to hack this together in the evenings and pinged a few people to do some testing for me before releasing it since I only have my laptop with me and could not do the usual set of build and run tests on multiple configurations (thanks!).

This is basically a resync of the last BFS along with trivial changes to stay in sync with the mainline kernel, along with some of the queued build fixes submitted by others on this blog (thanks!). Alas the users of ath9k with Tux On Ice that I pinged early on with a test patch have shown the same issue exists (which is not surprising since BFS has only been trivially changed in quite a few releases now) so I'm pretty sure whatever the interaction is was introduced somewhere between 3.13 and 3.14.

I have reviewed Alfred Chen's patches and for the time being have not included them in BFS, though I do like the direction his changes have taken. The first patch sets a flag that isn't used by BFS so it was not necessary. The other changes to resched_best_mask are sound and the only thing they're missing is an equivalent optimisation for compiled in support for MC and SMT schedulers on hardware that doesn't have one and/or the other.

So here it is:

BFS by itself:
3.15-sched-bfs-448.patch

3.15-ck1 patchset directory:
3.15-ck1

Enjoy!
お楽しみください

Tuesday, 6 May 2014

BFS 0.447, 3.14-ck1

Announcing a resync and update of BFS for linux kernel 3.14.x:

This is mainly a resync from BFS 0.446, but with the addition of the patches as offered by the generous users as seen in the comments here, Alfred Chen and Oleksandr Natalenko. The changes are to fix a circular locking issue on bootup that rarely hit some people, a fix for kvm soft lockups in SMP mode, and to remove some config options that should not be used with BFS.

What's interesting about working on this latest BFS is that I ran into all sorts of instability due to the new kernel that ironically worked out to be a very serious bug in 3.14.0 and was fixed in 3.14.1 with this patch:

commit 8e58cd80d042569da7af501de897c5e0538d99b0
futex: avoid race between requeue and wake

As is often the case, BFS is exceptional at bringing out race conditions and my machine was almost unusable with any significantly multithreaded application such as firefox which kept hanging. This was a scenario where my delay at syncing up the code worked to my advantage as 3.14.2 is working fine.

So here it is:

BFS by itself:
3.14-sched-bfs-447.patch

CK branded BFS:
3.14-ck1

Somehow I still forgot to include PF's patch for uniprocessor builds, though it's so uncommon to come across a uniprocessor these days! His patch is still valid and can be grabbed here to be applied on top if you need it:

0001-ck-3.12-fix-BFS-compiling-with-CONFIG_SMP-n.patch


Enjoy!
お楽しみください

Monday, 3 March 2014

BFS 0.446, 3.13-ck1

Announcing a resync and update of BFS for linux kernel 3.13.x:

Apart from build fixes and synchronisation with new kernel changes, this is only trivially different to BFS 444. A build failure on 445, along with a desire to release only even numbers, prompted version 446.

BFS by itself:
3.13-sched-bfs-446.patch

CK branded BFS:
3.13-ck1


Apologies for the delay, but I was simply swamped with my other projects, interests and work.


Enjoy!
お楽しみください

Tuesday, 3 December 2013

3.12-ck2, BFS 0.444

Here is an updated BFS patch, version 0.444:

3.12-sched-bfs-444.patch

And an updated ck tagged 3.12-ck2 patch:

3.12-ck2

The changes in this release, compared to version 0.443 and ck1 are the 2 extra patches I posted in my last announce which were designed to address various suspend to ram/disk and resume problems as discussed in previous posts. Thanks to the various people who posted bug reports and tested experimental patches along the way.

Being an even number, this is clearly a more stable patch than the last one ;)

Enjoy!
お楽しみください