Thursday, 30 December 2010

BFS for 2.6.37-rc8

So we approach yet another "stable" release with 2.6.37 just around the corner now that -pre8 is out (yes I'm old fashioned, it is still a pre-release and not a release candidate in my eyes). I usually use pre8 as the flag to port BFS to the new kernel so that I have it ready in time for the "stable" 3 point release.

I've ported the current BFS release v0.360 to the latest kernel. Of significance in the new kernel are some changes yet again to the way CPU offline code occurs (which happens just before suspend to ram and disk), some new way to improve accounting of CPU attribution during interrupt handling, and improving the reporting of CPU load during idle periods with NOHZ enabled. There are also some other architectural changes that have no cosmetic effect. None of these will have any effect on "performance" for the end user so don't expect any surprises there.

So I've ported BFS with only the changes necessary to get it working on the new kernel. I've not yet added the support for interrupt accounting, and I haven't changed the way total load is calculated on NOHZ configured kernels. The only change of any significance is the way I offline CPUs in the new BFS. I have been fighting with that for a while now since there really is no right way to do it, and it changes so frequently in mainline that I often have trouble keeping up.

For those already on version 0.360 of BFS, the only significant changes can be had now in this patch:

For the dirty details in the CPU offline code, what happens is that a worker thread runs at ultra high priority on the CPU it's about to offline, and half way through its work it turns off that CPU and then needs to run on another CPU to die itself. Till now, BFS has looked for tasks that had affinity for CPUs that could no longer run on any alive CPUs and then would be happy to run them anywhere. With this latest bfs360-test.patch it does more what the mainline kernel does and formally breaks affinity for tasks that no longer have anywhere they're allowed to run. It only has a negligible improvement in overhead but the main reason for doing it is to make the offline code more robust.

For those looking for the first BFS release on 2.6.37-rc8, here's the test patch:

I've bumped the version number up simply because it includes the test changes above, and has had other architectural changes to be in sync with the mainline kernel. The main thing new in mainline is that there is a new "class" of realtime scheduling used only internally by the CPU offlining code called a stop class. Instead of using a new scheduling class, BFS simply adds one more realtime priority that can be used by kernel threads and is only ever applied to the CPU offline thread (kmigration). I did this to make it add zero overhead to the existing design, yet support the concept that it is a thread that nothing else can preempt.

Almost certainly I will be adding more code to the final version of BFS that I use when 2.6.37 comes out, to support the new interrupt accounting code and the global load reported, but these will only have cosmetic effects. This patch has only been lightly tested and not compile tested for multiple configurations just yet, but seems to be working nicely. For now you can grab it here:

Happy New Year everyone :)


  1. Thanks for this new version! And Happy New Year to you too!

  2. Thanks for this new version and thx for bfs in general! And Happy New Year too!

  3. Thanks for continued support in BFS (and -ck). And a Happy New Year!

    PS: I test at the moment the BFS on RC8. Question about make configuration. Shouldn't be the cgroups be disabled if BFS is enabled, because BFS doesn't support (at the moment ;) ) cgroups?

    CU sysitos

  4. The cgroup features that aren't available with BFS will not appear when BFS is enabled. The rest of the cgroup features still work.

  5. Thanks and a Happy New Year!