Friday 8 October 2010

Hierarchical tree-based penalty, interactivity at massive load, updated.

I've updated the patch for fork_depth based penalty with some minor tweaks. The default fork depth is now allowed to be zero for init, so the base system has no fork depth penalty. Userspace reporting of "PRI" to top, ps etc. was modified to make more sense when the penalty is enabled. Thread group accounting was fixed to offset only from the absolute deadline, instead of from a deadline already penalised by fork_depth. Updated changelog follows:
Make it possible to have interactivity and responsiveness at very high load
levels by making deadlines offset by the fork depth from init. This has a
similar effect to 'nice'ing loads that are fork heavy. 'make' is a perfect
example of this and will, with fork_depth_penalty enabled, be felt as much
at 'make -j24' as it normally would be with just 'make'.

Note that this drastically affects CPU distribution, and also has the
indirect side effect of partitioning CPU entitlement to different users as
well. No assumption as to CPU distribution should be made based on past
behaviour.

This is achieved by separating out forks to new processes vs new threads.
When a new process is detected, its fork depth is inherited from its parent
across fork() and then is incremented by one. That fork_depth is then used
to cause a relative offset of its deadline.
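
The inheritance rule above can be illustrated with a toy model. This is only a sketch and not the patch's code: the changelog does not say how the offset is scaled, so the linear scaling, the two constants, and the names Task, fork_process, spawn_thread and deadline are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical constants: the changelog does not give the real values
# or say how the offset scales with depth.
BASE_DEADLINE_MS = 6
OFFSET_PER_LEVEL_MS = 6


@dataclass
class Task:
    name: str
    fork_depth: int = 0  # init starts at depth zero, so it has no penalty

    def fork_process(self, name):
        """New process: inherit the parent's depth, then add one."""
        return Task(name, self.fork_depth + 1)

    def spawn_thread(self, name):
        """New thread: stay at the same depth as the parent process."""
        return Task(name, self.fork_depth)

    def deadline(self, now_ms, penalty_enabled=True):
        """Absolute deadline, plus an assumed linear offset per fork level."""
        base = now_ms + BASE_DEADLINE_MS
        if not penalty_enabled:
            return base
        return base + self.fork_depth * OFFSET_PER_LEVEL_MS


init = Task("init")
shell = init.fork_process("shell")
make = shell.fork_process("make")
cc = make.fork_process("cc")
worker = shell.spawn_thread("shell-thread")

for task in (init, shell, worker, make, cc):
    print(task.name, task.fork_depth, task.deadline(now_ms=0))
```

In this model a compiler forked by make sits three levels below init and gets a later deadline than the shell that started it, while a thread of the shell keeps the shell's depth.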

This feature is enabled in this patch by default and can be optionally
disabled.

Threads are kept at the same fork_depth as their parent process, and can
optionally have their CPU entitlement all managed as one process together
by enabling the group_thread_accounting feature. This feature is disabled
by default in this patch, as many desktop applications such as firefox,
amarok, etc are multithreaded. By disabling this feature and enabling the
fork_depth_penalty feature (default) it favours CPU towards desktop
applications.

Extensive testing is required to ensure this does not cause regressions in
common workloads.

There are two sysctls to enable/disable these features.

They are in /proc/sys/kernel/

group_thread_accounting - groups CPU accounting by threads
fork_depth_penalty - penalises according to depth of forking from init
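
As a rough sketch of how the two sysctls could be inspected or toggled from a script: the directory and file names come from the list above, but the assumption that each file holds 1 for enabled and 0 for disabled is mine, and the helper names read_feature and write_feature are made up for illustration. Writing needs root, and on a kernel without this patch the files will simply be absent.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")
FEATURES = ("group_thread_accounting", "fork_depth_penalty")


def read_feature(name, base=SYSCTL_DIR):
    """Return the sysctl value as an int, or None if it is absent or unreadable."""
    try:
        return int((base / name).read_text().strip())
    except (OSError, ValueError):
        return None


def write_feature(name, enabled, base=SYSCTL_DIR):
    """Write 1 or 0 to the sysctl file (assumed encoding; needs root)."""
    (base / name).write_text("1\n" if enabled else "0\n")


# Report the current state; prints None for files that do not exist.
for feature in FEATURES:
    print(feature, read_feature(feature))
```

For example, write_feature("fork_depth_penalty", False) run as root would be the scripted equivalent of echoing 0 into that file by hand.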

An updated patch for 2.6.36-rc7-ck1 follows, though it should apply to a BFS357 patched kernel with offsets:

EDIT: Here is a patch that can be applied to vanilla which gives you BFS + this change:

EDIT2: I notice some people are trying this patch and the earlier released group as entities patch and trying to compare them. Let me make it clear, this patch REPLACES the group as entities patch and does exactly the same thing, only updated.

I am still after feedback on this approach. For my workloads it's only advantageous, so I'd love it if more people would report back their experiences, either in the comments here or via email to me at . Thanks!


  1. Installed and working perfectly on Gentoo x64

    layman -a pross
    emerge pk-sources

  2. Can the last two patches be applied to 2.6.35 or are they only for 2.6.36?

  3. This patch will apply to 2.6.35 that has been patched with bfs357 as well (check the bfs/2.6.35 directory). There will be offsets when you patch but they're harmless.

  4. Ah yes, I was trying to apply the latest two after applying the "". Seems like it works when applying the "penalise..." after "...sched-bfs-357.patch". Cheers!
    Thanks for your amazing work, Con! :)

  5. You're welcome. To avoid confusion I've put a full patch in this blogpost as well that can be applied to I look forward to more feedback on this code as I'd love to include it in the next -ck release :)

  6. This testing may be over my head, but I built with today's patch:

    I used the pclinuxos .config for and accepted the defaults for all of the offered options in make oldconfig. Seems to be very smooth so far. If there is anything specific you would like a kernel newbie to try to test, just say so.


  7. Thanks Galen. Just use your machine the way you normally would and tell me if you notice anything different, good or bad :)

  8. The first test that I did was to play:

    Herbie Hancock - Jazz Fusion Cantelope Island

    I stopped the video then scrolled down to the bottom of the rss news feed for freshmeat and selected 'open all in tabs' which opened 100 pages all at once. While these tabs were starting, I opened and closed Synaptic, Thunderbird, Konqueror with no noticeable delays. I went back to Firefox and started deleting the tabs and each one closed in 1-2 seconds all the way to 1 remaining. I don't believe I was able to do this before. This is a laptop and I use the touchpad. I've always thought that the sensitivity was slightly 'off', but the mouse cursor slides smoothly at all times.

    Just to get a little crazy, I tried opening all 100 links while the jazz video was playing. Interestingly, when firefox opened the 100 tabs, the video stopped but the music continued to play without a single glitch for 30+ seconds. At this point the music had long 1-2 second pauses and eventually stopped. I then closed each of the tabs, one by one, and firefox is still running to post this response. When all of the tabs were closed, the video tab must have crashed because it was gone.

    I thought this would be a good test, as these links are separate threads, but with the standard scheduler and earlier bfs schedulers, interactivity became sluggish or completely froze for quite a while in this type of scenario. I think you've hit upon something very good.


  9. Just to clarify the previous post - the second test was a success! The mouse remained responsive, and other processes would start and stop quickly, even though firefox was not able to keep the video going. The video would not have continued with any other scheduler, either!


  10. I've been running this (including the updated patch) for a while without reboot now. I've been running two virtual machines (Windows 7 and Ubuntu) at the same time, and building some stuff on the host (cross-platform developing is the reason I need VMs) and listening to some tunes in Amarok (who can work without tunes :-P). I also have Mumble (VoIP) running since occasionally I need to talk to some other people involved. Everything works OK, no issues yet. KDE4's desktop effects are still smooth as if there was no load. (Though the mainline scheduler is also pretty good at keeping interactivity high, especially since .35).

  11. Thank you very much for your feedback. I'm assuming you both used this patch with the default settings. The test results are very promising. With 2.6.36 just around the corner, I'm tempted to include this feature in the release of 2.6.36-ck1. However it's too early to tell if there will be real regressions from having this enabled by default, so I guess I could include the feature but not have it enabled. With these features disabled, it should behave no differently to BFS 357.

    Thanks again, and I look forward to more feedback from others!

  12. Yes, I left everything as default with

    Tex is having alsa problems building the .34, .35 and .36 kernels. I didn't have that problem, but I built from the command line and not the rpm spec file. When this has been solved, I will request a kernel go into testing and will encourage others to try it out, to see if anyone encounters any problems.