-ck hacking: BFS 0.403 test for 2.6.39-rc7

Saturday, 14 May 2011

BFS 0.402 test has proven very stable on 2.6.39-rc7, but a minor issue came up with the new accurate IRQ accounting: some CPU time was not being accounted. So I went in and revised the way it works to be cheaper and more accurate. There was also a problem in the accounting where the total CPU did not always add up to 100%. The reason is that the small inaccuracies in each CPU usage component (user, system, wait etc.) were compounded when added together. I've put in a total CPU percentage counter that checks whether the total adds up to 100 and, if not, rounds the values up so that they should add up to 100%.
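To make the rounding problem concrete, here is a toy model in Python. It is not the kernel code: the function name `top_up_to_100`, the component names, and the policy of handing the shortfall to the largest components first are all assumptions made for illustration. It only shows how per-component rounding can leave a tick short of 100% and how a total counter can correct it.

```python
def top_up_to_100(parts):
    """Return a copy of `parts` (name -> integer percent) that sums to 100.

    Assumption: the shortfall is handed out one point at a time, largest
    component first. This is an illustrative policy, not necessarily
    what BFS does.
    """
    total = sum(parts.values())
    if total == 0 or total >= 100:
        return dict(parts)  # nothing accounted, or nothing missing
    fixed = dict(parts)
    shortfall = 100 - total
    # Visit components from largest to smallest, cycling if necessary.
    order = sorted(parts, key=parts.get, reverse=True)
    for i in range(shortfall):
        fixed[order[i % len(order)]] += 1
    return fixed


# Each component was rounded down, so this tick only adds up to 98%.
sample = {"user": 61, "system": 24, "iowait": 9, "idle": 4}
print(sum(sample.values()), "->", sum(top_up_to_100(sample).values()))
```

A proportional rescale would be another reasonable policy; the point is only that the correction is driven by the total rather than by any single component.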

There was also a change I had been considering for the sticky flag, which is used to minimise task movement between CPUs, and I've committed it to 403 test. Instead of a binary on/off flag, it is now a stepped flag going from CACHE_COLD through CACHE_WARM to CACHE_HOT. Basically, any task that is knocked off a CPU but is still waiting for more CPU time is immediately labelled hot. Only one task is considered hot, and previously the sticky flag was cleared as soon as a new cache hot task appeared. Now, instead of being cleared, it is set to warm, and it is only cleared to cold when the task sleeps. Forked child processes are now also labelled cache warm, since they share many structures with their parent process. Any task that is cache warm or cache hot is biased against moving to another CPU by offsetting its relative deadline. Any task that is cache hot will not move to a different CPU if that CPU is scaled down in speed (for example, when the ondemand CPU frequency governor slows it down). This change should improve throughput in the overloaded case (when jobs > CPUs), but that's just a general expectation as I haven't benchmarked it yet.
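The state transitions are easier to follow as a small model. The sketch below is Python, not the BFS source: the class and function names, the bias values in `DEADLINE_BIAS`, the one-hot-task-per-CPU bookkeeping, and the child inheriting the parent's deadline are all assumptions made for illustration. Only the three states and the transitions between them come from the description above.

```python
from enum import IntEnum


class Cache(IntEnum):
    COLD = 0
    WARM = 1
    HOT = 2


# Hypothetical bias added to the relative deadline when a task is
# considered for a CPU other than the one it last ran on.
DEADLINE_BIAS = {Cache.COLD: 0, Cache.WARM: 1, Cache.HOT: 2}


class Task:
    def __init__(self, name, deadline, cache=Cache.COLD):
        self.name = name
        self.deadline = deadline
        self.cache = cache


class Cpu:
    def __init__(self):
        self.hot_task = None  # assumption: at most one hot task per CPU

    def preempt(self, task):
        """`task` was knocked off this CPU but still wants CPU time."""
        if self.hot_task is not None and self.hot_task is not task:
            self.hot_task.cache = Cache.WARM  # demoted to warm, not cleared
        task.cache = Cache.HOT
        self.hot_task = task

    def sleep(self, task):
        """Only sleeping clears the flag all the way to cold."""
        task.cache = Cache.COLD
        if self.hot_task is task:
            self.hot_task = None


def fork(parent, name):
    """Children start warm; copying the parent's deadline is an assumption."""
    return Task(name, parent.deadline, Cache.WARM)


def effective_deadline(task, remote_cpu):
    """Deadline seen by a CPU; warm and hot tasks look later to remote CPUs."""
    if not remote_cpu:
        return task.deadline
    return task.deadline + DEADLINE_BIAS[task.cache]


def may_migrate(task, target_scaled_down):
    """A hot task refuses to move to a CPU running at reduced speed."""
    return not (task.cache is Cache.HOT and target_scaled_down)
```

With this model, preempting task A and then task B on the same CPU leaves A warm and B hot, and A only returns to cold once it sleeps.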

Anyway, give the new BFS a try. Everything appears to be running nice and stable, and as a bonus, my feel-good-o-meter is reading quite high with the upcoming 2.6.39! The magnitude of changes going into it seemed a lot smaller than in previous kernels, and I've had no issues with the -rc7 version so far.

As before, I've compressed the patch with lrzip as part of my evil plot to force you all to use it. Get it here:
2.6.39-rc7-sched-bfs-403-test.patch.lrz

Enjoy, and please report back if you try it!

17 comments:

Anonymous 14 May 2011 at 13:38
Here on my Core 2 Duo system with only 1GB RAM, everything has been working fine so far!

Anonymous 14 May 2011 at 23:57
some gcc 4.6 messages:
In file included from kernel/sched.c:2:0:
kernel/sched_bfs.c: In function ‘sched_getaffinity’:
kernel/sched_bfs.c:4371:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]
kernel/sched_bfs.c: In function ‘sys_sched_yield’:
kernel/sched_bfs.c:4441:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]
kernel/sched_bfs.c: In function ‘sys_sched_rr_get_interval’:
kernel/sched_bfs.c:4686:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]

Unknown 15 May 2011 at 01:11
(Not sure if this is a BFS vs mainline thing.) I noticed that 'top' in my system reports CPU% as an integer number (1, 2, etc) while on a system running mainline it reports 0.3, 1.5, etc. Is this normal?

ck 15 May 2011 at 08:13
@RealNC : I don't see what you're reporting. I get floats on my machine?

ck 15 May 2011 at 22:51
It seems this one causes a regression with the whole cache warm thing. Hang in there and prepare for a 404 that actually exists.

Ralph Ulrich 16 May 2011 at 00:14
403 works here without issues:
2.6.39-rc7-git8-bfs403

Ralph Ulrich 16 May 2011 at 00:31
Can't find your patch 404. Probably your lrzip compressor is too slow :/

ck 16 May 2011 at 00:33
Lol, I haven't even started making the 404. You'll likely hit a 404 if you try clicking on the non-existent 404. The 404 is just a concept for now, and is nothing but a 404 till then.

Ralph Ulrich 16 May 2011 at 00:41
Obviously my German translator couldn't cope with your sentence:
"and prepare for a 404 that actually exists."

Con, another question, if you'd like to talk:
You mentioned in your blog post above the slowing down of Linux kernel development. Do you think Linux has reached a point of maturity where nothing exciting will happen any more?

Perhaps, if we get some new hardware like quantum processors ...

ck 16 May 2011 at 00:44
The "404 that actually exists" was a joke about the 404 that is normally "this page doesn't exist", sorry.

No, I don't think the Linux kernel has matured. We are just in a relatively quiet period, and I expect the usual frantic pace of development to return in the near future.

Ralph Ulrich 16 May 2011 at 00:53
I don't think so. It is all done: BKL, drm, CFS, etc ...

But you could try again to get a scheduler plugin infrastructure included in mainline. Despite your bad feelings about the matter, now is the time to do it.

Unknown 16 May 2011 at 12:52
@ck
> I don't see what you're reporting. I
> get floats on my machine?

Weird. Here, I get integers :-P

http://i56.tinypic.com/b7fcj8.png

Since you get floats, I guess it's a userland configuration thing. I'll need to build a mainline kernel and boot with it to verify.

ck 16 May 2011 at 13:11
Try pressing H to enable thread view.

Anonymous 17 May 2011 at 03:30
@ck
> the total cpu did not always add up to 100%.

I suspect that you are hiding a bug. I hope you are not just brushing it under the rug:

wait = 100% - system - user - etc, by definition. No?

I mean, do you really need to count total time?
Can't you just look at the clock, so to speak?

ck 17 May 2011 at 08:20
Mainline certainly works by adding up what's left to get the rest of the CPU on each tick. However, what BFS does is add CPU time to each respective component every time some accumulates. Then, on every tick, it looks at the current running total. Thus, due to rounding down and slight discrepancies between the total and when the actual tick fires, it often adds up to 1 or 2% less than 100% of that tick.

Anonymous 18 May 2011 at 00:20
Still, scaling is a brute force approach. It will always "work" to hide any bug.

Are you against adding the unaccounted time to the largest task or to the idle time? A task that is already using a lot will not suffer much from 1-2% on top. It could be simpler to implement too.

ck 18 May 2011 at 00:49
I am just adding it.