-ck hacking: 3.7-ck1, BFS 426 for linux-3.7

Saturday, 15 December 2012

3.7-ck1, BFS 426 for linux-3.7

Some degree of normality has returned to my life, so I bring to you a resync of the BFS CPU scheduler for 3.7, along with the -ck patches to date.

Apply to 3.7.x:
patch-3.7-ck1.bz2
or
patch-3.7-ck1.lrz

Broken out tarball:
3.7-ck1-broken-out.tar.bz2
or
3.7-ck1-broken-out.tar.lrz

Discrete patches:
patches

Latest BFS by itself:
3.7-sched-bfs-426.patch
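
For anyone who has not applied one of these patches before, here is a minimal sketch of the "apply to 3.7.x" step. It assumes the bzip2 patch has been downloaded into the current directory, that `linux-3.7` is an unpacked 3.7 source tree (a hypothetical path), and that GNU `patch` is installed. The helper names are illustrative only.

```python
import bz2
import subprocess
from pathlib import Path


def decompress_patch(data):
    """Return the plain-text patch from bzip2-compressed bytes."""
    return bz2.decompress(data)


def apply_patch(patch_bytes, tree, dry_run=True):
    """Feed the patch to GNU patch on stdin, inside the kernel tree.

    With dry_run=True, patch only reports whether the hunks would apply;
    nothing in the tree is modified.
    """
    cmd = ["patch", "-p1"]
    if dry_run:
        cmd.append("--dry-run")
    return subprocess.run(cmd, input=patch_bytes, cwd=tree, check=False)


if __name__ == "__main__":
    # Hypothetical locations: adjust to wherever the files actually live.
    patch_file = Path("patch-3.7-ck1.bz2")
    tree = Path("linux-3.7")
    if patch_file.exists() and tree.is_dir():
        result = apply_patch(decompress_patch(patch_file.read_bytes()), tree)
        print("applies cleanly" if result.returncode == 0 else "has rejects")
```

Running it with the default `dry_run=True` first is a cheap way to see whether the tree matches before anything is written.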

People often ask me why I don't maintain a git tree of my patches, or at least of BFS, to make it easier on myself and those who download it. As it turns out, a git tree would mean less work only for those who download it; maintaining one would actually be more work for me.

While I'm sure most people are shaking their heads and thinking I'm just some kind of git-phobe, I'll try to explain (note that I do maintain git trees for lrzip https://github.com/ckolivas/lrzip and cgminer https://github.com/ckolivas/cgminer).

I do NOT keep track of the Linux kernel patches as they come in during the development phase prior to the latest stable release. Unfortunately I simply have neither the time nor the inclination to care on that level any more about the Linux kernel. However, I still believe quite a lot in what BFS has to offer. If I watched each patch as it came into git, I could simply keep my fork with BFS and merge the Linux kernel patches as they came in, resyncing and modifying BFS as I went along with the changes. When new patches go into the kernel, there is a common pattern of many changes occurring shortly after they're merged: a few fixes go in, some files are moved around a few times, and occasionally the patch is backed out when it turns out to introduce some nasty regression that proves a showstopper for release. Each one of these changes (fixes, moves, renames, removals) requires a resync if you are maintaining a fork.

The way I've coded up the actual BFS patch itself is to be as unobtrusive as possible: it does not replace large chunks of code en bloc, it just adds files and redirects builds to use those new files instead of the mainline files. This is done to minimise the effort needed to resync when new changes come. The vast majority of the time, only trivial changes need to be made for the patch to even apply. Thus an old patch just needs its rejects fixed to apply to a new kernel (even if the result doesn't build yet). This is usually the first step I do in syncing BFS, and I end up with something like this after fixing the rejects:

http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/3.7-sched-bfs-425.patch

This patch is just the 3.6 patch with the chunks that didn't apply fixed up.

After that, I go through the incremental changes from mainline 3.6 to 3.7 looking for scheduler-related changes that should be applied to BFS, either (1) to make it build against API changes in mainline or (2) to benefit from new features going into mainline that are relevant to the scheduler in general. I manually add the changes and end up with an incremental patch like this:

http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/bfs425-merge.patch
This patch only merges the 3.6->3.7 changes into BFS itself.

Finally I actually apply any new changes to BFS since the last major release, bugfixes or improvements as the case may be, as per this patch here:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/bfs425-updates.patch
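
The three steps above can be sketched as an ordered series of `patch` invocations. The file names are taken from the incremental URLs; the assumptions are that the patches are applied in exactly this order on top of a vanilla 3.7 tree, and that the patch files have been copied into that tree so the relative names resolve. The function names are made up for illustration.

```python
import subprocess

# File names come from the incremental URLs above; the order mirrors the
# three steps described in the text (an assumption about how they stack).
STEPS = [
    ("3.7-sched-bfs-425.patch", "old 3.6 patch with its rejects fixed"),
    ("bfs425-merge.patch", "mainline 3.6->3.7 changes merged into BFS"),
    ("bfs425-updates.patch", "new BFS bugfixes and improvements"),
]


def patch_commands(steps=STEPS):
    """Build one GNU patch command line per step, in order."""
    return [["patch", "-p1", "-i", name] for name, _why in steps]


def run_steps(tree, steps=STEPS):
    """Apply each step inside `tree`; return the first patch that fails."""
    for cmd, (name, why) in zip(patch_commands(steps), steps):
        print(f"{name}: {why}")
        if subprocess.run(cmd, cwd=tree, check=False).returncode != 0:
            return name  # this one left rejects to fix by hand
    return None
```

Calling `run_steps("linux-3.7")` on a hypothetical tree would stop at the first patch that does not apply, which is where the manual work starts.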

Git is an excellent source control tool, but provides me with almost nothing for this sort of process where a patch is synced up after 3 months of development. If I were to have my fork and then start merging all patches between 3.6 and 3.7, it would fail to merge new changes probably dozens and potentially hundreds of times along the way, each requiring manual correction. While merge conflicts are just as easy to resolve with git as they are with patch, they aren't actually easier, and instead of there being conflicts precisely once in the development process, there are likely many with this approach.

Git also does not give me any way to port new changes from mainline into the BFS patch itself. They still need to be applied manually, and if changes occur along the way from 3.6 stable through the unstable 3.7-rc releases to 3.7 stable, then each time mainline changes, the same change needs to be made to BFS. Thus I end up reproducing all the bugfixes, moves, renames and back-outs that mainline does along the way, instead of just doing it once.
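
A toy model may make the counting argument concrete. The event list below is invented purely for illustration and is not taken from the real 3.6 to 3.7 history; the two functions are hypothetical names. It compares how many times a continuously merged fork has to react with how many files a single resync at release time has to look at.

```python
# Each event is (kind, path). This history is made up for illustration only.
HISTORY = [
    ("add", "kernel/sched/feature.c"),
    ("fix", "kernel/sched/feature.c"),
    ("fix", "kernel/sched/feature.c"),
    ("move", "kernel/sched/feature.c"),
    ("revert", "kernel/sched/feature.c"),  # backed out before release
    ("fix", "kernel/sched/core.c"),
    ("fix", "kernel/sched/core.c"),
]


def resyncs_when_tracking(history):
    """A fork merged continuously has to react to every single event."""
    return len(history)


def resyncs_at_release(history):
    """A one-shot resync only sees files whose net state changed."""
    changed = set()
    for kind, path in history:
        if kind == "revert":
            changed.discard(path)  # the feature vanished again
        else:
            changed.add(path)
    return len(changed)


print(resyncs_when_tracking(HISTORY), "reactions while tracking")
print(resyncs_at_release(HISTORY), "file(s) to look at in a single resync")
```

In this made-up history the feature that was added, fixed, moved and finally backed out costs five reactions when tracking and none at release time.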

Hopefully this gives some insight into the process and why git is actually counter-productive to BFS syncing.

Enjoy 3.7 BFS.
お楽しみください (Please enjoy)

74 comments:

Anonymous, 16 December 2012 at 00:17
Thank you Con.
Btw, broken links: the "3.71" in the URLs should be "3.7-ck1" instead. The full address is http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/
ck, 16 December 2012 at 00:31
Thanks. I always seem to screw up links on blogspot. Links fixed.
graysky, 16 December 2012 at 12:51
Thanks as usual CK. For interested readers, here are the tests confirming the performance of bfs v0.425 to bfs v0.426 as I usually provide. A reminder that this is my `make bzImage` test on my workstation comparing 4 kernels: 3.6.9, 3.6.9-bfs, 3.7.0, and 3.7.0-bfs. The script is on my github linked below.

Data: http://s19.postimage.org/clypopxpv/bfs_v_cfs.png
Script: https://github.com/graysky2/bin/blob/master/make_bench
graysky, 16 December 2012 at 13:43
If you haven't seen this yet, a branch of BFS:

http://lkml.indiana.edu/hypermail/linux/kernel/1212.1/03729.html
Anonymous, 16 December 2012 at 13:55
o tanoshimi, I suppose? Should learn my kanji...
Anonymous, 16 December 2012 at 17:42
Thanks for your benchmark, graysky. For those of us (e.g. me) who can't read graphs and don't know what a quantile is, could you please sum it up in plain English? ;-) BFS vs CFS, who wins this round?
graysky, 17 December 2012 at 00:52
That graphic is overly complicated for people who don't do statistics. But since CK has a background in stats, I included them. In plain English, both versions of BFS gave statistically significant DECREASES in compile time compared to CFS. Less time = faster compile.

Look at the median for each group. BFS v0.426 (the current one that patches into the linux 3.7 series) was around 350 ms faster than the corresponding mainline scheduler (CFS).
Anonymous, 17 December 2012 at 12:56
actually using git is *much easier*

all you need to do is git rebase v3.7 and resolve a few little conflicts.
Anonymous, 17 December 2012 at 13:26
Con Kolivas please look at what this is doing.
http://lkml.indiana.edu/hypermail/linux/kernel/1212.1/03729.html

I do not mean to be mean here. But this person is at long last addressing the faults that were raised against BFS, which you did not want to hear a while back when you stormed away from the main Linux kernel developers and made an ass out of yourself.

As the maintainer of CFS said at the time, BFS was not scaling properly, and your response was that those systems were more complex than desktops. We are now getting desktops with 8 cores and more.

It would be good to get you back into mainline development, Con Kolivas. Hopefully this is a big enough lesson not to be pig-headed even if the other person appears to be. If it does not work on large machines, sooner or later those large machines will be the general desktop/mobile phone machines. Yes, there is an 8-core mobile phone due out as well. The sub-4-core gains are over.

So scaling well as number of cores increases has become critical.

The one thing CFS has always had over BFS is better handling of large systems, by avoiding as much as possible CPUs having to lock to access data.

http://cs.unm.edu/~eschulte/classes/cs587/data/bfs-v-cfs_groves-knockel-schulte.pdf
"BFS takes a different approach than both the O(1) scheduler and CFS. BFS uses runqueues like O(1); however, unlike O(1), which has both an active and an expired runqueue per CPU, BFS has only one system-wide runqueue containing all non-running tasks."

Yes, the place where BFS broke from the O(1) scheduler comes back and hurts you as you get more and more cores needing to talk to the one single system-wide runqueue.

Now a wise person would have investigated why. The worst locking, in terms of performance cost, is memory controller to memory controller. So use a runqueue per memory controller. Possibly, for a cgroup of processes assigned to the physical cores connected to that memory controller, the cost would be minimal, and you would avoid having to perform load balancing.

Now, you have also stated your hate for cgroups around processes. This forces you down the path of a runqueue per core, which leads back to costly load balancing.

Basically, the problem you put off a long time ago, Con, is back for revenge.
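
To make the contention argument in this comment easier to follow, here is a deliberately tiny model of "one system-wide runqueue". It is a sketch only: the class, the deadline ordering and the heap are assumptions chosen for illustration, and say nothing about the actual data structures in kernel/sched/bfs.c. The point is just that every CPU looking for work goes through the same single lock.

```python
import heapq
import threading


class GlobalRunqueue:
    """One system-wide queue of runnable tasks, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []              # entries are (deadline, task name)
        self.lock_acquisitions = 0   # how often the shared lock was taken

    def enqueue(self, deadline, task):
        with self._lock:
            self.lock_acquisitions += 1
            heapq.heappush(self._heap, (deadline, task))

    def pick_next(self):
        """Called by any CPU that needs work; returns None when idle."""
        with self._lock:
            self.lock_acquisitions += 1
            return heapq.heappop(self._heap)[1] if self._heap else None


rq = GlobalRunqueue()
for deadline, name in [(30, "a"), (10, "b"), (20, "c")]:
    rq.enqueue(deadline, name)

# Four CPUs ask for work in turn; the fourth finds the queue empty.
order = [rq.pick_next() for _cpu in range(4)]
print(order, rq.lock_acquisitions)
```

With per-CPU queues the same seven operations would be spread over several locks, at the price of needing some form of load balancing between the queues.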
Aaron Riekenberg, 17 December 2012 at 13:33
I read your post but I'm confused - why don't you just clone kernel.org's repository and create your own branch in your repo (call it "ck")?

Then when you want to merge with a new kernel from kernel.org, just do a pull on the master branch followed by a single "git merge" into your "ck" branch.

Commit the merge to "ck" and repeat as needed.
tvall, 17 December 2012 at 13:36
Thanks for the detailed explanation. I always wondered why you don't use git for this. I didn't consider how different it would be to do countless smaller changes throughout the process of a kernel release instead of one larger rebase of your code.

Time to put your new ck patch to work on my shiny new (to me) core 2 duo. It always helped dramatically on my old pentium 4 boxes
Anonymous, 19 December 2012 at 19:04
Hey Con,

Just wanted to say thank you, my machine feels so much snappier with the CK patch I actually thought I left it on the performance governor instead of ondemand :)

Gabor
Unknown, 22 December 2012 at 16:18
Thank you for the updated BFS, Con. I've been running it for a week now on a heavily used Core i5 with no issues whatsoever. Looks solid.
Anonymous, 24 December 2012 at 23:49
I've compiled (x86_64) 3.7.1 based on Ubuntu's kernel git repo both with CK1 and BFQ: http://narod.ru/disk/64736436001.f1d7509f20fd047f0c3be8d64190cdc9/linux-image-3.7.1-ck1s1_3.7.1-ck1s1-10.00.Custom_amd64.deb.html

-poige
Anonymous, 30 December 2012 at 02:13
Thanks for bfs for kernel 3.7. Hope you'll resume your work on it soon, hacking and improving bfs' features for even better desktop responsivity :)
Anonymous, 30 December 2012 at 07:37
I don't know about the ck patches: they are pointless or even harmful if you are using newer kernel features ...

3.7-sched-bfs-426.patch applies cleanly with linux-3.7.1
Yours isn't any different ...

Ciao from Hamburg, Ralph Ulrich
Anonymous, 1 January 2013 at 00:20
Unfortunately the new BFS accounts process times wrongly again :(
- I had this issue with linux-3.4.x-bfs
- I got around it with a higher RCU boost with linux-3.5.x-bfs
- I had no such issues with linux-3.6.x-bfs
- now changing the RCU settings in .config doesn't help with linux-3.7.1-bfs

As a simple user I depend on watching htop, ordered by time, to check that my system(d) is functioning. I also feel very uneasy seeing some 50 million hours on some tasks :(

Ciao from Hamburg, Ralph Ulrich
Anonymous, 2 January 2013 at 07:07
At this point I want to add some experiences I've had with the 3.6.x series + ck/bfs + bfq. At some point suspend-to-disk broke. I tried to investigate further and came to the conclusion that my switch from pure bfs+bfq to ck+bfq made the difference. So... long story short...
On my old machine with 1.4 GHz CPU, 2GB RAM, 3GB shm, 4 GB swap my normal operations AND suspend-to-disk+resume worked again, when
leaving swappiness = 60 (openSUSE kernel-source default; ck = 10 always distorted my desktop latency),
setting dirty_ratio = 6 (default = 20; ck = 1),
setting dirty_background_ratio = 2 (default = 10; ck = 1),
[the last two settings are the lowest values known to work with suspend, +1 to be on the safe side].

I haven't retested this on 3.7.1 as with 3.6.x so far, but it's working fine for 5 days of uptime with 3.7.1 and regular nightly suspends!
Is this difference due to my outdated computer?

Greets and my very best wishes for your 2013 to all of you,

Manuel Krause
Alfred Chen, 8 January 2013 at 14:03
Hi ck, just want to report back that the following issue still exists in 3.7 bfs; I reported it in 3.5 & 3.6:

[ 0.126666] kernel/sched/bfs.c:7171 suspicious rcu_dereference_check() usage!

It doesn't impact bfs functionality. You said you would fix it in the next release but may have forgotten to note it down.

Thanks again for the effort on the ck patch, it is running well in 3.7.1.
Anonymous, 10 January 2013 at 00:39
PS: I have perfected the low-jitter config on linux, for mainline kernel. Please see the jitter links on my blog. www.paradoxuncreated.com

I also tested BFS during this. It seems the general experience of jitter (lost frames, poor frame timing) is worse with BFS. So if I were you, I'd drop it. Get an Intel E5 workstation on top of this, and even Windows won't stutter.

Unless of course you have some idea you want to realize. But then some measure of fairness seems quite good.

Peace Be With You.
Anonymous, 23 January 2013 at 07:56
bad performance on video encoding with bfs (i have never converted a video before on linux)
(data0001.ts = 487,2MB)

time HandBrakeCLI -i "data0001.ts" -o "data0001.ts.mp4" -E copy:ac3 -e x264 -m -4 -f mp4 -q 20.0 -r 25 --decomb --loose-anamorphic -x ref=4:bframes=4:b-adapt=2:rc-lookahead=60:analyse=all

with bfs
real 9m20.533s
user 35m22.633s
sys 0m3.776s

without bfs
real 8m10.895s
user 30m47.550s
sys 0m13.350s
the cpu usage is higher and the cpu gets ~2°C warmer without bfs

can anybody confirm this? (sorry, bad english....)

regards
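
For readers comparing the two runs above, a small sketch that turns the quoted `time` output into percentages. Only the figures from the comment are used; the function names are made up for illustration.

```python
import re


def to_seconds(stamp):
    """Convert the `time` format, e.g. '9m20.533s', into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def percent_longer(slow, fast):
    """How much longer `slow` took than `fast`, in percent."""
    return (to_seconds(slow) / to_seconds(fast) - 1) * 100


real = percent_longer("9m20.533s", "8m10.895s")    # wall-clock time
user = percent_longer("35m22.633s", "30m47.550s")  # CPU time in user mode
print(f"real: {real:.1f}% longer, user: {user:.1f}% longer with bfs")
```

By this arithmetic the BFS run took about 14.2% longer in wall-clock time and about 14.9% more user CPU time, while the reported sys time was lower with BFS (0m3.776s against 0m13.350s). It is a single run of a single workload, so it says nothing about variance.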
Anonymous, 24 January 2013 at 01:35
Hello,
I'm testing the multithreading behaviour of BFS, especially its Hyperthreading awareness.
Here is a sample with the Whetstone MP Benchmark

normal run with 4 threads:
MWIPS 9775
with taskset -c 0,1,2,3:
MWIPS 11449

As you can see, the taskset variant is ~17% faster.
My question now is: is there any way to let BFS prioritize the physical cores until we have more than 4 active threads, and then use virtual cores 4,5,6,7 for each new thread?
So basically, handle my processor as a quad core until we have more than 4 active threads. A behaviour like this would be nice because most of the time we don't use more than 4 cores, and the normal BFS behaviour diminishes performance.
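
The policy asked for here can be approximated from user space. The sketch below assumes, as the comment does, that logical CPUs 0-3 are the four physical cores and 4-7 their Hyper-Threading siblings; that numbering differs between machines and should be checked with `lscpu` first. `os.sched_setaffinity` is Linux-only, and the helper names are hypothetical. It does not change BFS itself.

```python
import os

# Assumption taken from the comment above; verify the numbering with lscpu.
PHYSICAL = [0, 1, 2, 3]
SIBLINGS = [4, 5, 6, 7]


def cpus_for(n_threads):
    """Use physical cores first, spilling onto siblings only when needed."""
    order = PHYSICAL + SIBLINGS
    return set(order[:max(1, min(n_threads, len(order)))])


def pin_current_process(n_threads):
    """Rough equivalent of `taskset -c ...` for the calling process."""
    os.sched_setaffinity(0, cpus_for(n_threads))


speedup = (11449 / 9775 - 1) * 100  # MWIPS figures quoted above
print(f"taskset run is {speedup:.1f}% faster")
```

Calling `pin_current_process(4)` before starting the worker threads would restrict them to CPUs 0-3, which is what `taskset -c 0,1,2,3` does in the benchmark above.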
ck, 24 January 2013 at 16:25
I have seen this benchmark a long time ago (note he calls CFS CFQ by mistake) and it is a one off test for one particular workload. "Wrong" is too strong a word to describe this behaviour, because it depends largely on what endpoint you're measuring. BFS prioritises latency over throughput and in the relatively unloaded CPU case, BFS shines at its ability to find the earliest available CPU to minimise latency. Doing this sacrifices throughput -slightly- of cache bound throughput intensive workloads. BFS is a scheduler designed to optimise interactivity and responsiveness primarily and to maintain good throughput secondarily. The fact that BFS does better at any throughput benchmark compared to the mainline scheduler is a bonus.
Anonymous, 26 January 2013 at 03:19
Here is a lengthy article of what I actually meant with "real" cores.
http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology
mrasero, 26 January 2013 at 04:19
In recent versions, from 3.6.x upwards I think, I have been having problems making backups with rsnapshot (an rsync-based backup solution) to my mdadm software RAID 5 XFS filesystem. Here you can see a thread where I reported the problem to the XFS mailing list, but it seems to be related to the kernel. Could this be a BFS problem?

http://oss.sgi.com/archives/xfs/2013-01/msg00081.html
Anonymous, 28 January 2013 at 03:34
Normally I can choose between TICK_CPU_ACCOUNTING or IRQ_TIME_ACCOUNTING.

With bfs, both (!) will be enabled in .config:

CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_IRQ_TIME_ACCOUNTING=y

Anonymous, 29 January 2013 at 08:40
ck, thank you for your continued contributions.
kjpetrie, 17 May 2013 at 21:45
I tend to find BFS doesn't play nice with nice. When I run a processor-intensive task such as a backup or compile in the background niced so I can keep working on a responsive machine, BFS seems to thrash, pushing the load up, grinding the desktop to a halt, and taking around 3 times as long to complete the task as the normal kernel scheduler.

For this reason I have always avoided BFS-enabled kernels, but the distro I use (PCLinuxOS) has now removed the non-BFS kernels from its repository, so it looks as if I will be faced with a choice between BFS kernels or compiling my own in future.

One of the kernel packagers has suggested I report the problem here.