Showing posts with label -ck. Show all posts

Monday, 25 August 2014

BFS 453/454/455/456 and 3.16-ck2

Here is an updated set of BFS patches with the accumulated bugfixes as debugged on this blog for kernels 3.13 to 3.16 inclusive. The main obvious bug which affected people was the ath9k module which would hang on suspend/resume. However there were likely a number of subtle bugs across the board that most people would not be aware of, and even I only noticed that kvm behaved much better once this bugfix was applied. The fix is relevant to every BFS after 3.12.

In order to make up for the fact that there are numerous kernels out there based on BFS across the different versions, I have updated BFS and numbered the versions according to which base kernel they are on. Note that there are no feature backports on the older kernels, only the bugfixes, so SMT nice is only on the 3.16 BFS.


And along with that an updated ck branded release for 3.16, 3.16-ck2:



Saturday, 16 August 2014

BFS 450, 3.16-ck1

Announcing a resync and update of BFS for linux kernel 3.16.x. Coding has proven a nice distraction from unpleasant life events so I've been able to bring the patch up to date with the latest kernel.

A number of minor fixes as queued up post 3.15-ck1 made their way into this patchset, along with some changes inspired by the development work of Alfred Chen (thanks!).

The major feature upgrade in this one is the inclusion of SMT nice as discussed at length on this blog. This version of BFS includes an updated version of SMT nice beyond version 6 posted here with one change - 25% of the CPU time of any nice level of SCHED_NORMAL tasks can be shared with any other nice level over and above the nice-based CPU distribution. This is to capitalise on the slightly increased throughput that is available by using the sibling CPU concurrently without too dramatically affecting higher priority process CPU loss. In addition it dramatically reduces the massive latencies that can sometimes otherwise be seen by heavily niced tasks with SMT nice enabled by dithering the metering out of CPU instead of giving it all as a burst only when it's entitled to CPU.

Making SMT nice configurable means users can get to choose if they still want the standard behaviour. The config option will recommend users who enable the SMT scheduler option also enable the SMT nice option. I believe this to be a good default choice for virtually all desktop users, and selectively for server users if they depend heavily on the use of 'nice' or scheduling policies for their work cases (but otherwise it should be disabled).

BFS by itself:
3.16-ck1 branded BFS patchset directory:

EDIT: A build fix for non SMT enabled kernels to prevent it being possible to enable SMT nice is here:
Just disabling SMT nice will achieve the same thing for those affected.


Friday, 1 August 2014

SMT/Hyperthreading, nice and scheduling policies

The concept of symmetric multi-threading, which Intel called "Hyperthreading" and introduced into their commodity CPUs first around 2001, is not remotely a new one and goes back a long way before Intel introduced it into the mainstream market. I suspect the introduction of it back then by Intel was them easing the concept of increasing threads and cores for marketing reasons with the imminent walls they'd soon hit with CPU heat and power requirements that would stop the pursuit for higher and higher single CPU frequencies. The idea is that, since a lot of the CPU sits unused even when something is running as fast as it can on part of it, with a bit of extra logic and architecture, you could throw another "virtual core" at some of the unused execution units and behave like 2 (or more) CPUs, putting more of the CPU to good use. These days the vast majority of CPUs sold by Intel have hyperthreading on them, thus doubling the virtual or "logical" cores the CPU has, including even their low power atom offerings.

There have been numerous benchmarks, in-field tests, workloads etc., where people have tried to find whether hyperthreading is better or not. With a bit of knowledge of the workings of hyperthreading, it's pretty easy to know what the answer is, and not surprisingly, it's the frustrating answer of "it depends". And that's the most accurate answer by far, but I'd go further than that and say that if you have any kind of mixed workload, hyperthreading is always going to be better, whereas if you have precisely one workload, then you have to define exactly how it's going to work and whether hyperthreading will be better or not. Which means that in my opinion at least, hyperthreading is advantageous on a desktop, laptop, tablet and even phone since by design they're nothing but mixed workloads. I won't spend much longer on this discussion, but suffice to say that I think about 4 threads (at the moment) is about optimal for most real world desktop(y) workloads.

Imagine for a moment you have a single core CPU which you can run as is, or enable hyperthreading to run as a 2 thread CPU. If you were to run your CPU in single core only mode, then when you run one task at a time it will always use the full power of the CPU, but if you run two tasks, each task runs at 50% the speed and completes in double the time. If you enable hyperthreading, then if you have two mixed workloads that actually use different parts of the CPU, you can actually get effectively (at best) about 140% of the performance of running the CPU in single core mode. This means that instead of the two tasks running at 50% speed when run concurrently, they run at 70% speed. In practice, the actual performance benefit is rarely 40% but it is often on the order of 25%, so each task tends to run about 60% speed instead of 50% speed. Still a nice speedup for "free".
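The arithmetic above can be written out as a tiny sketch. The function name and the gain figures passed to it are only the illustrative numbers from this post (140% best case, about 125% typical), not measurements.

```python
def per_task_speed(smt_gain: float, tasks: int = 2) -> float:
    """Speed of each task relative to having the whole core to itself.

    smt_gain is the combined throughput of the core relative to running it
    without hyperthreading (1.0 = no SMT, 1.4 = the best case quoted above).
    Assumes the tasks split that throughput evenly.
    """
    return smt_gain / tasks


# Illustrative figures taken from the text, not benchmark results.
for label, gain in [("single thread only", 1.0),
                    ("hyperthreading, best case", 1.4),
                    ("hyperthreading, typical", 1.25)]:
    print(f"{label}: each of 2 tasks runs at {per_task_speed(gain):.1%}")
```

With the typical 25% gain each task lands a little above 60% of full speed, which is where the "about 60%" figure comes from.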

One thing has always troubled me about hyperthreading, though, and that is the way it tends to break priority support in the scheduler. By priority support, I refer to the use of 'nice' and other scheduling policies, such as realtime, sched idleprio etc.

If you have a single core CPU and run a nice 0 task concurrently with a nice +19 task, the nice 0 task will get about 98% of the CPU time and the nice +19 task only about 2%. The scheduler does this by serialising and metering out the time each task gets to spend on the CPU. Now if you enable hyperthreading on that CPU, the scheduler no longer serialises access to the CPU, but gives each of those tasks one logical "core" on the CPU, and you get an overall 25% increase in throughput. However of the total throughput, both the nice 0 and nice +19 task get precisely half. This would be fine if we had two real cores, but they're not, and the performance of both tasks is sacrificed to ~60% to achieve this. Which means that for this contrived but simple example, enabling hyperthreading slows down the overall execution speed of your nice 0 task when you run a nice +19 task much more than on a single core - it runs at 60% speed instead of 98%.

An even more dramatic example is what happens with realtime tasks, which these days most audio backends on linux use (usually through pulseaudio). Running a realtime task concurrently with a SCHED_NORMAL nice 0 task on a single core means the realtime task will get 100% CPU and the nice 0 task will get zero CPU time. Enable hyperthreading and suddenly the realtime task only runs at 60% of its normal speed even with a heavily niced +19 task running in the background.
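To put the two examples side by side, here is a rough model of what the foreground task gets with and without hyperthreading when SMT nice is not in play. The 98%/2% and 100%/0% shares and the 25% SMT gain are the figures quoted in this post. The function and its names are made up for illustration.

```python
SMT_GAIN = 1.25  # assumed typical combined throughput with hyperthreading


def speeds(serial_shares, smt):
    """Speed of each task relative to having the physical core alone.

    serial_shares: CPU share each task gets when the scheduler serialises
    access to a single hardware thread, highest priority first.
    smt: True if each task gets its own hyperthread sibling instead.
    """
    if not smt:
        # One hardware thread: time is metered out according to priority.
        return tuple(serial_shares)
    # One sibling per task and no SMT nice: priority is ignored and the
    # combined throughput is split evenly.
    return tuple(SMT_GAIN / len(serial_shares) for _ in serial_shares)


cases = {
    "nice 0 vs nice +19": (0.98, 0.02),
    "realtime vs SCHED_NORMAL": (1.00, 0.00),
}
for name, shares in cases.items():
    for smt in (False, True):
        high, low = speeds(shares, smt)
        mode = "hyperthreading" if smt else "single thread"
        print(f"{name:26} {mode:15} high prio {high:.1%}, low prio {low:.1%}")
```

In both cases the high priority task drops to roughly 60% as soon as the low priority task is handed the sibling thread, which is exactly the behaviour SMT nice is meant to address.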

Enter SMT-nice as I call it. This is not a new idea, and in fact my first iteration of it was for mainline 10(!) years ago. See here: SMT Nice 2.6.4-rc1-mm1

I actually had the patch removed from mainline myself after criticism on throughput grounds, though I still argue that worrying about the last percentage points of throughput is not relevant if you break a mechanism as valuable as nice and scheduling policies. I had lost the energy for defending it, which is why I pushed for it to be removed myself. Note that although throughput overall may be slightly decreased, the throughput of higher priority tasks is not only fairer with respect to low priority tasks, but enhanced because the low priority tasks will have less of a cache thrashing effect.

What this does is it examines all hyperthread "siblings" to see what is running on them, and then decides whether the currently running or next running task should actually have access to the sibling or allow the sibling to go idle completely, allowing a higher priority task to have the actual true core and all its execution units to itself. I'd been meaning to create an equivalent patch for BFS for the longest time but CPUs got faster, cheaper, more cores, I got lazy etc... though I recently found more enthusiasm for hacking.
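As a very simplified illustration of that sibling check (this is not the BFS code, and every name in it is hypothetical), the decision can be modelled as "may this task occupy a hardware thread, given what the other threads of the same physical core are running?". The sketch assumes a strict ranking of realtime, iso, normal, idleprio, and compares nice levels only between SCHED_NORMAL tasks. It ignores any metering of CPU time between nice levels.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ranking for illustration: lower number = higher priority class.
POLICY_RANK = {"realtime": 0, "iso": 1, "normal": 2, "idleprio": 3}


@dataclass(frozen=True)
class Task:
    name: str
    policy: str
    nice: int = 0


def outranks(a: Task, b: Task) -> bool:
    """True if task a should take precedence over task b on a shared core."""
    rank_a, rank_b = POLICY_RANK[a.policy], POLICY_RANK[b.policy]
    if rank_a != rank_b:
        return rank_a < rank_b
    # Same class: in this sketch only SCHED_NORMAL tasks are compared by nice.
    return a.policy == "normal" and a.nice < b.nice


def may_run(candidate: Task, sibling_tasks: Iterable[Optional[Task]]) -> bool:
    """Decide whether candidate may use a hardware thread.

    sibling_tasks holds what the other threads of the same physical core are
    running (None for an idle sibling). If any of them outranks the
    candidate, the thread is left idle so the higher priority task keeps the
    whole core to itself.
    """
    return not any(s is not None and outranks(s, candidate)
                   for s in sibling_tasks)


audio = Task("audio", "realtime")
build = Task("kernel build", "normal", nice=19)
print(may_run(build, [audio]))  # background build next to a realtime task
print(may_run(audio, [build]))  # realtime task next to a background build
```

In this model the niced build is refused the sibling while the realtime task is running, and the realtime task is never held back by the build.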

So here is a reincarnation of the SMT-nice concept for BFS, improved to work across multiple scheduling policies from realtime, iso down to idleprio, and I've made it a compile time option in case people feel they don't wish to sacrifice any throughput:

Patch for BFS449 with pending patches:

And to make life easy, here's an all inclusive ck1+pending+smtnice patch:

The TL;DR is: On Intel hyperthreaded CPUs, 'nice', realtime and sched idleprio work better, and background tasks interfere much less with the foreground tasks. Note: This patch does nothing if you don't have a hyperthreaded CPU.

If you wish to do testing to see how this works, try running with and without the patch and running two benchmarks concurrently, one at nice 0 and one at nice +19 (such as 'make -j2' on one kernel and 'nice -19 make -j2' on another kernel on a machine with 2 cores/4 threads) and compare times. Or run some jackd benchmarks of your choice to see what it takes to get xruns etc.
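One possible way to script that comparison is sketched below. It starts every job at the same moment through the standard `nice` utility and reports how long each one took. The helper names are made up for this example, and the kernel tree directories in the usage note are placeholders.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(niceness, argv, cwd=None):
    """Run argv under `nice -n <niceness>` and return the elapsed seconds."""
    start = time.monotonic()
    subprocess.run(["nice", "-n", str(niceness), *argv], cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - start


def race(jobs):
    """Run all jobs concurrently.

    jobs maps a label to (niceness, argv, working directory or None).
    Returns {label: elapsed seconds}.
    """
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {label: pool.submit(timed_run, niceness, argv, cwd)
                   for label, (niceness, argv, cwd) in jobs.items()}
        return {label: future.result() for label, future in futures.items()}
```

For the kernel build test described above, a call could look like `race({"nice 0": (0, ["make", "-j2"], "linux-a"), "nice +19": (19, ["make", "-j2"], "linux-b")})`, where `linux-a` and `linux-b` stand in for two separately prepared kernel trees. Run it once on a kernel with SMT nice and once without, then compare the "nice 0" times.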

This patch will almost certainly make its way into the next BFS in some form.

EDIT: It seems people have missed the point of this patch. It improves the performance of foreground applications at the expense of background ones. So your desktop/gui/applications will remain fast even if you run folding@home, mprime, seti@home etc., but those background tasks will slow down more. If you don't want it doing that, disable it in your build config.


Thursday, 3 July 2014

BFS 0.448, 3.15-ck1

Announcing a resync and update of BFS for linux kernel 3.15.x. I'm currently on vacation but fortunately had enough downtime to hack this together in the evenings and pinged a few people to do some testing for me before releasing it since I only have my laptop with me and could not do the usual set of build and run tests on multiple configurations (thanks!).
 This is basically a resync of the last BFS along with trivial changes to stay in sync with the mainline kernel, along with some of the queued build fixes submitted by others on this blog (thanks!). Alas the users of ath9k with Tux On Ice that I pinged early on with a test patch have shown the same issue exists (which is not surprising since BFS has only been trivially changed in quite a few releases now) so I'm pretty sure whatever the interaction is was introduced somewhere between 3.13 and 3.14.
I have reviewed Alfred Chen's patches and for the time being have not included them in BFS, though I do like the direction his changes have taken. The first patch sets a flag that isn't used by BFS so it was not necessary. The other changes to resched_best_mask are sound and the only thing they're missing is an equivalent optimisation for compiled in support for MC and SMT schedulers on hardware that doesn't have one and/or the other.
So here it is:

BFS by itself:

3.15-ck1 patchset directory:


Tuesday, 6 May 2014

BFS 0.447, 3.14-ck1

Announcing a resync and update of BFS for linux kernel 3.14.x:

This is mainly a resync from BFS 0.446, but with the addition of the patches as offered by the generous users as seen in the comments here, Alfred Chen and Oleksandr Natalenko. The changes are to fix a circular locking issue on bootup that rarely hit some people, a fix for kvm soft lockups in SMP mode, and to remove some config options that should not be used with BFS.
What's interesting about working on this latest BFS is that I ran into all sorts of instability due to the new kernel that ironically worked out to be a very serious bug in 3.14.0 and was fixed in 3.14.1 with this patch:
commit 8e58cd80d042569da7af501de897c5e0538d99b0
futex: avoid race between requeue and wake
As is often the case, BFS is exceptional at bringing out race conditions and my machine was almost unusable with any significantly multithreaded application such as firefox which kept hanging. This was a scenario where my delay at syncing up the code worked to my advantage as 3.14.2 is working fine.
So here it is:
BFS by itself:

CK branded BFS:

Somehow I still forgot to include PF's patch for uniprocessor builds, though it's so uncommon to come across a uniprocessor these days! His patch is still valid and can be grabbed here to be applied on top if you need it:



Monday, 3 March 2014

BFS 0.446, 3.13-ck1

Announcing a resync and update of BFS for linux kernel 3.13.x:

Apart from build fixes and synchronisation with new kernel changes, this is only trivially different to BFS 444. A build failure on 445, along with a desire to release only even numbers, prompted version 446.

BFS by itself:

CK branded BFS:

Apologies for the delay, but I've simply been swamped with my other projects, interests and work.


Tuesday, 3 December 2013

3.12-ck2, BFS 0.444

Here is an updated BFS patch, version 0.444:


And an updated ck tagged 3.12-ck2 patch:


The changes in this release, compared to version 0.443 and ck1 are the 2 extra patches I posted in my last announce which were designed to address various suspend to ram/disk and resume problems as discussed in previous posts. Thanks to the various people who posted bug reports and tested experimental patches along the way.

Being an even number, this is clearly a more stable patch than the last one ;)


Monday, 18 November 2013

BFS 0.443, 3.12-ck1

Announcing a resync and update of the BFS CPU scheduler for linux-3.12

BFS by itself:

CK branded BFS:

Apologies for the delays. I've been swamped by other projects (o.k. I lie, mainly just cgminer). The changes in this new version, apart from the obvious resync with mainline, are some timing fixes courtesy of Olivier Langlois (Thanks!) and a concerted effort to make suspend to RAM/resume work properly.


Monday, 9 September 2013

BFS 0.441, 3.11-ck1

Announcing a resync and update of the BFS CPU scheduler for linux-3.11

BFS by itself:

Full -ck1 patchset including separate patches:

Apart from the usual resync to keep up with the mainline churn, there are a few additions from BFS 0.440. A number of changes dealing with wake lists as done by mainline were added that were missing from the previous code. There is a good chance that these were responsible for a large proportion of the suspend/resume issues people were having with BFS post linux 3.8. Of course I can't guarantee that all issues have been resolved, but it has been far more stable in my testing so far.

The other significant change is to check for throttled CPUs when choosing an idle CPU to move a process to, which should impact the behaviour and possibly throughput when using a scaling CPU governor, such as ondemand.

Those of you still using the evil proprietary Nvidia binary driver (as I still do) will encounter some issues and will need to use a patched pre-release driver from them if you build it yourself, until they release a new driver.

That is all for now.


Wednesday, 10 July 2013

BFS 0.440, -ck1 for linux-3.10

I finally managed to set up some 3g wireless internet in this remote mountain village I'm staying in (probably the first to ever do so). After a few revisions I was able to bring BFS into line with mainline. There are no significant changes to the design itself, but hopefully a few minor fixes have come along as a result of the resync as I also carved out bits of code not relevant to BFS and tinkered with the shutdown mechanism a bit more. As for the new tickless on busy CPU feature from mainline, it is not being offered in BFS as it is quite orthogonal to a design that so easily moves tasks from one CPU to another, and it provides no advantage for desktop/laptop/tablet/PDA/mobile device/phone/router etc. which BFS is targeted towards.

Some of the configuration code was also changed since the last version allowed you to generate an invalid configuration. You might get some strange warnings about the IRQ TIME ACCOUNTING configuration option but it should be harmless.

Get BFS by itself for 3.10.0 here:

 After careful consideration, I've decided to remove the remaining -ck patches and just make the -ck patchset BFS with some extra default config options and the -ck tag. As I've said previously, those other patches were from long ago, the kernel has changed a lot since then, and I've been unable to confirm they do anything useful any more, whereas there have been reports of regressions with them.

Get the -ck tagged patchset here:


Tuesday, 7 May 2013

BFS 0.430, -ck1 for linux-3.9.x

Announcing a resync/update of the BFS and -ck patchsets for linux-3.9

Full ck patch:

BFS only patch:

The full set of incremental patches is here:

The changes to BFS include a resync from BFS 0.428, updated to work with changes from the latest mainline kernel, and numerous CPU accounting improvements courtesy of Olivier Langlois (thanks again!).

For those who tried the -ck1 release candidate patch I posted, this patch is unchanged. The only issue that showed up was a mostly cosmetic quirk with not being able to change the CPU accounting type, even though it appears you should be able to. BFS mandates high res IRQ accounting so there is no point trying to change it.

Lately my VPS provider (rapidxen) has been nothing short of appalling with incredible amounts of downtime, packet loss and IP changes without notification. They also repeatedly send me abuse complaints that I have to respond to for my software being (falsely) tagged as viruses. Luckily I have a move planned in the near future - including where and how - when time permits, but if you find my server doesn't respond, apologies.


EDIT: There were some fairly dramatic CPU offline code changes to mainline (YET AGAIN!) and the changes to BFS to make it work were fairly significant so there may once again be issues with power off/reboot/suspend/hibernate. It gets tiresome watching the same code being rehashed in many different ways... "because this time we'll do it right".

Monday, 4 March 2013

BFS 0.428 for linux-3.8.x

Announcing a resync of the BFS and -ck patchsets for linux-3.8

Full ck patch:

BFS only patch:

The full set of incremental patches is here:

The only changes to BFS include a resync from BFS 0.427, and a micro-optimisation to the CPU accounting courtesy of Olivier Langlois (thanks!). See the incremental patch for details.

As for the -ck patchset, I am dropping the patches that no longer seem to reliably work that set sysctl values since distributions seem to change them, along with removing patches of dubious utility.


Saturday, 15 December 2012

3.7-ck1, BFS 426 for linux-3.7

Some degree of normality has returned to my life, so I bring to you a resync of the BFS cpu scheduler for 3.7, along with the -ck patches to date.

Apply to 3.7.x:

Broken out tarball:

Discrete patches:

Latest BFS by itself:

People often ask me why I don't maintain a git tree of my patches or at least BFS and make it easier on myself and those who download it. As it turns out, it is actually less work only for those who download it to have a git tree and would actually be more work for me to maintain a git tree.

While I'm sure most people are shaking their head and thinking I'm just some kind of git-phobe, I'll try to explain (note that I maintain git trees for lrzip and cgminer).

I do NOT keep track of the linux kernel patches as they come in during the development phase prior to the latest stable release. Unfortunately I simply do not have the time nor the inclination to care on that level any more about the linux kernel. However I still do believe quite a lot in what BFS has to offer. If I watched each patch as it came into git, I could simply keep my fork with BFS and merge the linux kernel patches as they came in, resyncing and modifying as it went along with the changes. When new patches go into the kernel, there is a common pattern of many changes occurring shortly after they're merged, with a few fixes going in, some files being moved around a few times, and occasionally the patch backed out when it's found the patch introduces some nasty regression that proves a showstopper to it being released. Each one of these changes (fixes, moves, renames, removals) requires a resync if you are maintaining a fork.

The way I've coded up the actual BFS patch itself is to be as unobtrusive as possible - it does not actually replace large chunks of code en bloc, just adding files and redirecting builds to use those new files instead of the mainline files. This is done to minimise how much effort it is to resync when new changes come. The vast majority of the time, only trivial changes need to be made for the patch to even apply. Thus applying an old patch to a new kernel just needs fixes to apply (even if it doesn't build). This is usually the first step I do in syncing BFS, and I end up with something like this after fixing the rejects:

This patch is only the 3.6 patch fixing any chunks that don't apply.

After that, I go through the incremental changes from mainline 3.6 to 3.7 to see any scheduler related changes that should be applied to BFS to 1. make it build with API changes in mainline and 2. benefit from any new features going into mainline that are relevant to the scheduler in general. I manually add the changes and end up with an incremental patch like this:
This patch is only merging 3.6->3.7 changes into BFS itself

Finally I actually apply any new changes to BFS since the last major release, bugfixes or improvements as the case may be, as per this patch here:

Git is an excellent source control tool, but provides me with almost nothing for this sort of process where a patch is synced up after 3 months of development. If I were to have my fork and then start merging all patches between 3.6 and 3.7, it would fail to merge new changes probably dozens and potentially hundreds of times along the way, each requiring manual correction. While merge conflicts are just as easy to resolve with git as they are with patch, they aren't actually easier, and instead of there being conflicts precisely once in the development process, there are likely many with this approach.

However git also does not provide me with any way to port new changes from mainline to the BFS patch itself. They still need to be applied manually, and if changes occur along the way between 3.6 stable through 3.7-rc unstable to 3.7 stable, each time a change occurs to mainline, the change needs to be done to BFS. Thus I end up reproducing all the bugfixes, moves, renames and back-outs that mainline does along the way, instead of just doing it once.

Hopefully this gives some insight into the process and why git is actually counter-productive to BFS syncing.

Enjoy 3.7 BFS.

Thursday, 16 August 2012

3.5-ck1, BFS 424 for linux-3.5

Thanks to those who have been providing interim patches porting BFS to linux 3.5 while I've been busy! Finally I found some downtime from my current coding contract work to port BFS and -ck to linux 3.5, and here is the announce below:
These are patches designed to improve system responsiveness and
interactivity with specific emphasis on the desktop, but suitable to
any commodity hardware workload.

Apply to 3.5.x:

Broken out tarball:

Discrete patches:

Latest BFS by itself:


Code blog when I feel like it:

This is a resync from 3.4-ck3. However, the broken out tarballs above also
include the upgradeable rwlocks patch, and a modification of the global
runqueue in BFS to use the urwlocks. These are NOT applied in the -ck1 patch,
but can be applied manually at the end of the series as indicated by the
series file. It is currently of no demonstrable performance advantage OR
detriment in its current state, but is code for future development.



Tuesday, 31 July 2012

BFS and -ck delays for linux-3.5.0

Once again I find myself writing a post saying there will be delays with the resync of BFS and -ck for the new linux kernel. This time the reason for most people would be a quite unexpected development. As you may have read on this blog last year, I got invited to interview with Google for a job as a software engineer and then in the end I got turned down due to lack of adequate breadth of knowledge. This was probably for the best for me anyway since I have a full time unrelated career and the jump would have been too great. Anyway a small company noticed the work I had done on cgminer with bitcoin and openCL work and asked if I was interested in writing some software for them. The work involves writing openCL frameworks so they can provide distributed computing capability to clients. They were quite happy to forego any of the regular interview details or pretty much anything that is normally involved in employing someone and before long we started talking contracts instead. Since the work itself actually looked like a lot of fun, I decided to go with the opportunity.

Anyway, long story short, I'm doing a little bit of contract work for them and my kernel work will take a slightly lower priority in the meantime. I'm not abandoning it, but it will be delayed some more before the next release. Apologies for any inconvenience this may cause in the interim.

Tuesday, 3 July 2012

BFS 424, linux-3.4-ck3

As seen on this blog previously, a bug showed up in 3.4-ck2/BFS 423 to do with unplugged I/O management that would lead to severe stalls/hangs. I'm releasing BFS 424 officially and upgrading 3.4-ck2 to 3.4-ck3, incorporating just this one change.

BFS 424:


Others on -ck2 can simply apply the incremental patch to be up to date.


Sunday, 1 July 2012

BFS 424 test

A couple of bug reports mostly related to disk I/O seem to have cropped up with BFS 423/3.4-ck2. The likely culprit seems to be the plugged I/O management within schedule() that I modified going from BFS 420 to BFS 423, when I adopted mainline's approach to managing the plugged I/O. It appears that the mechanism I had put in place for BFS was the correct one, and mainline's approach does not work (for BFS) so I've backed out that change and increased the version number. Here is the test patch:


Those with issues of any sort related to BFS 423 or ck2, please test this patch on top of the previous BFS patched kernel. Thanks!

Monday, 11 June 2012

BFS 0.423, 3.4-ck2

A couple of issues showed up with BFS 0.422, one being the "0 load" bug and the other being a build issue on non-hotplug releases. So here is BFS 0.423 and 3.4-ck2 (which is just ck1 with the BFS update) which should fix those:



and the increment only:



Saturday, 2 June 2012

BFS 0.422, 3.4.0-ck1

Announcing the release of BFS for 3.4, along with the complete -ck1 patch.

BFS alone:

Full 3.4-ck1 patches:

Alas I was unable to keep the 420 number for BFS due to a number of minor changes. I also incremented the number beyond the unofficial 421 patch put to lkml so there was no confusion. The only changes are that some trivial display accounting fixes were added, along with forcing SLUB in the config by default as other SLAB allocators crash with BFS (you should all be using SLUB anyway). The rest of the BFS changes are a resync with the new code going into linux 3.4, along with more merging of code from mainline into BFS where suitable. Note that I have adopted the mainline approach of dealing with unplugged I/O. Previously I had spent a lot of time making it work with BFS for those who remember that period of instability, so hopefully the mainline approach will work seamlessly now (since mainline ended up having the same bug but it was harder to reproduce).

3.4-ck1 is just a resync of the remainder of the patches from 3.3-ck1.


EDIT: If you build on SMP without enabling CPU hotplug you will need this patch on top for BFS to build:

Saturday, 24 March 2012


New -ck version for the latest mainline linux kernel, 3.3.0:


Changes since 3.2.0-ck1:

New BFS version 0.420 AKA smoking as discussed here:

Includes one build bugfix for UP compared to that first release candidate.

Other changes:
These patches have been dropped:

The Virtual Memory subsystem has changed so much it's hard to know if these patches do what they originally intended to do, nor if they are helpful any more. In the absence of being able to test their validity, it seems safer to just drop them.

The rest of the patchset is just a resync.