Friday, 1 April 2011

BFS 0.372 test patch

Another day, another BFS test release. This one builds on the ideas in the 0.371-test3 patch I posted about, since they're proving very positive. No April fools here. This looks like it's kicking arse.

Apply to a BFS 0.363 patched kernel such as 2.6.38-ck1:

bfs363-372-test.patch

Changelog from the patch:
---
Add a "sticky" flag for one CPU bound task per runqueue which is used to flag
the last cache warm task per CPU. Use this flag to softly affine the task to
the CPU by not allowing it to move to another CPU when a scaling CPU frequency
governor is in use. This significantly improves throughput at lower loads by
allowing tasks to cluster on CPUs, thereby allowing the scaling governor to
speed up only that CPU. This should also save power. Use the sticky flag to
determine cache distance in earliest_deadline_task and abolish the
cache_distance function entirely. This has proven as effective, if not more so.

Add helpers to the 3 scaling governors to tell the scheduler when a CPU is
scaling.

Replace the frequent use of num_online_cpus() with a grq.noc variable that
is only updated when the number of online cpus changes.

Simplify resched_best_idle by removing the open coded for_each_cpu_mask as
it was not of proven benefit.

Remove warnings in try_to_wakeup_local that are harmless or never hit.

Clear the cpuidle_map bit only when edt doesn't return the idle task.

Abolish the scaled rr_interval by number of CPUs and now just use a fixed
nominal 6ms everywhere. The improved cache warmth of the sticky flag makes
this unnecessary and allows us to lower overall latencies on SMP by doing so.
---
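For anyone who wants the sticky idea in code form, here's a minimal sketch of the soft affinity rule from the first changelog entry. It is not the code from the patch: struct task, struct cpu_rq, set_sticky() and may_run_on() are hypothetical names made up for illustration, and I'm assuming the check uses the scaling state of the CPU that wants to pull the task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real kernel structures. */
struct task {
	int last_cpu;	/* CPU this task last ran on */
	bool sticky;	/* set while it is the cache warm task of that CPU */
};

struct cpu_rq {
	int cpu;
	bool scaling;			/* assumed: set by a governor helper while scaling */
	struct task *sticky_task;	/* at most one sticky task per runqueue */
};

/* Make p the single sticky task of rq, unsticking any previous one. */
static void set_sticky(struct cpu_rq *rq, struct task *p)
{
	assert(rq != NULL);
	if (rq->sticky_task && rq->sticky_task != p)
		rq->sticky_task->sticky = false;
	rq->sticky_task = p;
	if (p) {
		p->sticky = true;
		p->last_cpu = rq->cpu;
	}
}

/*
 * Soft affinity: a task may always run on the CPU it last ran on. A sticky
 * task is refused by any other CPU that is under a scaling governor.
 * Everything else is free to move.
 */
static bool may_run_on(const struct cpu_rq *rq, const struct task *p)
{
	if (p->last_cpu == rq->cpu)
		return true;
	if (p->sticky && rq->scaling)
		return false;
	return true;
}
```

The affinity is "soft" because nothing is pinned: once another task becomes the sticky one for that CPU, or when no scaling governor is in use, the task can be picked up by any CPU as before.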

Please test this one thoroughly. It's very stable and now heavily tested, but I won't announce any new "release" till it's been tested for maybe 5 days or more. It appears better in all workloads, both with and without cpu frequency governors. Again, only SMP will benefit from this patch, but it should change behaviour on all SMP machines now, not just with ondemand. However, the scaling governors should show the most improvement.

The best example (with ondemand) was a single threaded cpu bound workload that took 126 seconds to complete on a 2 core/4 thread i7 machine and now takes 91.5 seconds. The 2 threaded workload dropped from 66 seconds to 51.5 seconds. Note that this more or less addresses a regression in BFS behaviour with cpu frequency scaling on SMP, but it's also been an opportunity to improve behaviour elsewhere.
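To put those timings in relative terms, here's a small sketch of the arithmetic. The helper names percent_reduction() and speedup() are made up for this example; only the four timings come from the runs above.

```c
#include <assert.h>

/* Percentage of wall time saved going from before_s to after_s seconds. */
static double percent_reduction(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return 100.0 * (before_s - after_s) / before_s;
}

/* Throughput ratio: how many times faster the new run completes. */
static double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}
```

Plugging in the numbers, percent_reduction(126.0, 91.5) is about 27.4 (a speedup of roughly 1.38x) for the single threaded case, and percent_reduction(66.0, 51.5) is about 22.0 (roughly 1.28x) for the 2 threaded case.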

BFS 0.371 test3

TL;DR I'd like more testing.

Here's a new lightly tested patch trying another simpler and cheaper approach to improving throughput with scaling CPU frequency governors (like ondemand) without the flaws of the previous approach.

I've enabled the changes at all times, not just when the ondemand governor is run, but again this change only affects SMP users. This test patch is only lightly tested, but I'd appreciate it if people gave it a bit of a run. Apply to a BFS 363 based kernel such as 2.6.38-ck1.

bfs363-371-test3.patch

Too tired to describe what it does right now... Zzzz

Tuesday, 29 March 2011

2.6.38-ck1, BFS 0.363

So I screwed up. Sorry!

BFS 370 causes some strange regressions as per this blog and offlist. It appears that F&H doesn't, for example, scale to multiple CPUs under BFS 370. Also some latency regressions were reported here (and elsewhere). So I've decided to pull BFS 370 and 2.6.38-ck2, pending further investigation. There's only so much I can do without lots of people testing, I'm afraid, and often I get 1 maybe 2 people testing before a "stable" release, and then get about 10,000 downloads once the stable release comes out. So it was with test2/BFS370. Anyway the point is, go back to 2.6.38-ck1 or BFS 363 till I figure out what the problem was and decide whether it's worth pursuing this avenue or not.

2.6.38-ck2, BFS 0.370

EDIT EDIT EDIT: This patch causes bizarre regressions and has been backed out. Consider 2.6.38-ck1 and BFS 0.363 the stable releases, SORRY!

After more testing and cleaning up of the patch posted here earlier (test2), I've put it together as a new BFS release with almost trivial changes since that test2 patch. The changes are cosmetic only, apart from the removal of a warning which is hit occasionally and is now harmless on BFS.

Just to reiterate, unless you are on an SMP machine (2 or more threads or cores) AND are using a scaling CPU frequency governor (e.g. ondemand), there will be no significant performance advantage to upgrading to BFS 370 or ck2. For those with that combination, what you can expect to see is an increase in throughput on lightly loaded machines (single threaded apps most affected) and likely an increase in battery life. Overall latency is unlikely to be affected, keeping interactivity much the same, but responsiveness should increase. If you are unsure of the difference, read this summary I wrote for interbench:
readme.interactivity

When the kernel mirrors sync up, ck2 will be found here:
2.6.38-ck2
It applies with some minor offsets to 2.6.38.2 so you can safely apply it to that kernel if you like.

BFS is available here:
BFS

And Ubuntu packages of 2.6.35.11-ck2 and 2.6.38.2-ck1 which have the new BFS are now available here:
Ubuntu Packages

EDIT: People keep asking me why I've "optimised" only for SMP and ondemand. This is not the case at all. This patch addresses a performance regression that only affects that combination.

EDIT2: SEE ABOVE NOTICE! PATCH CONSIDERED BAD, GO BACK TO 2.6.38-ck1 and BFS 0.363 PLEASE!