-ck hacking: BFS 0.413, BFS benchmarks, (and more cksort stuff)

Monday, 17 October 2011

BFS 0.413, BFS benchmarks, (and more cksort stuff)

Finally some BFS news, yay \o/

TL;DR: Minor unnoticeable tweaks for BFS.

Incremental:
3.0-bfs-406-413.patch

Incremental for 3.0.x-ck1:
3.0-ck1-bfs-406-413.patch

Full patch for 3.0.x:
3.0-sched-bfs-413.patch

Alas there isn't anything earth shattering here. The skiplist implementation experiment failed to be of any benefit despite my best intentions, so I've abandoned the idea for the time being. Meanwhile, there were a number of minor improvements that I had considered for BFS (and I mean really really minor) and a few cleanups that had been waiting. So here's the changelog for the new BFS for 3.0:

Improve the preempt call by explicitly setting a task that will be used on
lookup in earliest_deadline_task to avoid iterating over all tasks. Set the
effective runqueue priority for further calls to try_preempt based on the
preempting task's priority. Thus on the small chance that something else
tries to preempt the runqueue before the new tasks gets the CPU, it will
compare to the previously successful preempting task. This should avoid a
preempting task from rescheduling a runqueue and then the CPU deciding to
take a different task instead of the preempting one.

Clean up a number of variables from unnecessary longs to ints and ints to
bools.

Microoptimise around try preempt by settings highest_prio to that of the
task that's trying to preempt to avoid unnecessary comparisons.

Break sole affinity on cpu offline -after- the cpu has been set to be offline.

I'd be surprised if anyone noticed anything different with these changes, but then, there have been some surprises from apparently innocuous changes in the past, so I'm keen to get feedback from testers. There is a small chance that the preempt changes can improve throughput by avoiding unnecessary rescheduling, and that the cpu offline changes might help rare scenarios where a machine will fail to suspend to ram.

Meanwhile, graysky who has been maintaining the unofficial ck repos for archlinux has very kindly put together a number of throughput benchmarks comparing BFS to mainline on machines from 2x up to 16x in a very nicely put together PDF which you can download and read here:

repo-ck.com/bench/benchmark.pdf

On the cksort front, it's clear to me that sorting algorithms are pretty boring to most people, despite me finding transient interest in them and developing my own. Nevertheless, the cksort algorithm has been further improved. After playing with it some more, and learning from the legendary work of the past, I modified the merge sort that cksort "tips" to to include an insertion sort as well, making the algorithm now a blended insertion+merge sort once it tips away from the presort+nodemerge algorithm which is basically the only unique part of cksort. This capitalises on the fact that insertion sort is pretty much the fastest way to sort up to 32 different entries by splitting up the merge into up-to-32 entry sized groups and doing insertion sort on all of them, followed by using a non-recursive merge sort on each group.

This has further improved cksort to now be faster than qsort for up to 1500 entries, and now equally fast with library qsort() up to 10000 (uniquely random)entries. Beyond that it is still slower than qsort, but it has -no- corner cases where it will ever become O(n^2) and will always be O(n log n) bound, unlike qsort.

Version 0.44 of cksort is here:
http://ck.kolivas.org/apps/cksort/cksort.c

And in response to requests for the benchmarking code vs qsort it is here:
http://ck.kolivas.org/apps/cksort/benchsort.c

Now Aldo Cortesi has a very interesting static view of comparison sort algorithms here:
http://sortvis.org/

After contacting him, he was kind enough to modify his code such that it would take output from something like the cksort to generate a graph from any sorting algorithm. Alas cksort uses a separate array for its work and looking at the array of the final data does not tell the entire story of what's going on, but it does help further to visualise how cksort works.

Here is a view of 20 random variables being sorted and you can see the 2 phases of cksort where it builds up the nodes first, and the 2nd phase where it merges the nodes (click to enlarge):

It's important to note that the 2nd stage graph is fairly arbitrary since the data from the unmerged nodes is not actually written back to the original array until it's merged, but you should be able to make out the final sorted data appearing from the outside in.

Sorting non-random entries is really what makes cksort interesting and here is a (rather complex) visualisation of the 50 mostly-sorted example in the animated form I showed on my blog previously:

Anyway it's been a lot of fun playing with this stuff, and maybe one day I'll come up with something more useful from the algorithm. I played with an in-place version of the code which didn't need a second array and the amount of memory moving required made the algorithm slower and more complex so I soon lost interest in developing that for the time being. I'd like to find a way to make the presort+node merging aspect scale to any size instead of tipping to a different algorithm but the lack of any pattern in large random data sets means you can't really find a pattern in it. I still think there may be something like a heap structure that would help it but haven't found anything that works in practice. Maybe I'll come back to this in the future. Meanwhile it works extremely well for the smaller data sets, partially ordered data, reverse ordered data, and few unique data sets. Interestingly it coincidentally shares a lot in its ideas with Timsort. All I can say after all this is that the work of the legends of the past, and coming up with something like Smoothsort is truly amazing.

39 comments:

Anonymous18 October 2011 at 03:08
Yay! new bfs patch!
ReplyDelete
Replies
Anonymous19 October 2011 at 07:34
Ralph Ulrich

bfs-413 fails in line 250 of posix-cpu-timers.c

due to linux-3.0.7 series
posix-cpu-timers-cure-smp-wobbles.patch
ReplyDelete
Replies
Anonymous19 October 2011 at 10:27
Ralph Ulrich

Bfs-413 runs perfectly with linux-3.0.7 (if BFS patch is adjusted, see above)! Thank you Con!
ReplyDelete
Replies
graysky19 October 2011 at 11:05
Here is the 3.0 --> 3.0.7 patch:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff_plain;h=v3.0.7;hp=v3.0

Problem is that ck1 patchset doesn't apply to it so I can't even try to patch 0.406 --> 0.413.

patching file arch/powerpc/platforms/cell/spufs/sched.c
patching file Documentation/scheduler/sched-BFS.txt
patching file Documentation/sysctl/kernel.txt
patching file fs/proc/base.c
patching file include/linux/init_task.h
patching file include/linux/ioprio.h
patching file include/linux/sched.h
Hunk #7 succeeded at 2035 with fuzz 1 (offset -1 lines).
Hunk #8 succeeded at 2792 (offset -1 lines).
patching file init/Kconfig
patching file init/main.c
patching file kernel/delayacct.c
patching file kernel/exit.c
patching file kernel/posix-cpu-timers.c
Hunk #1 FAILED at 250.
Hunk #2 succeeded at 511 (offset 1 line).
Hunk #3 succeeded at 521 (offset 1 line).
Hunk #4 succeeded at 952 (offset 1 line).
Hunk #5 succeeded at 969 (offset 1 line).
Hunk #6 succeeded at 977 (offset 1 line).
Hunk #7 succeeded at 1279 (offset 1 line).
1 out of 7 hunks FAILED -- saving rejects to file kernel/posix-cpu-timers.c.rej
patching file kernel/sched_bfs.c
patching file kernel/sched.c
Hunk #2 succeeded at 9323 (offset -13 lines).
patching file kernel/sysctl.c
patching file lib/Kconfig.debug
patching file include/linux/jiffies.h
patching file drivers/cpufreq/cpufreq.c
patching file mm/vmscan.c
Hunk #4 succeeded at 1987 (offset 6 lines).
Hunk #5 succeeded at 2736 (offset 6 lines).
Hunk #6 succeeded at 2789 (offset 6 lines).
Hunk #7 succeeded at 2841 (offset 6 lines).
Hunk #8 succeeded at 2853 (offset 6 lines).
Hunk #9 succeeded at 2968 (offset 6 lines).
patching file include/linux/swap.h
patching file mm/memory.c
Hunk #1 succeeded at 2995 (offset 56 lines).
patching file mm/swapfile.c
patching file include/linux/mmzone.h
patching file include/linux/mm_inline.h
patching file mm/filemap.c
patching file mm/swap.c
patching file mm/readahead.c
patching file include/linux/pagemap.h
patching file mm/page-writeback.c
patching file arch/x86/Kconfig
patching file kernel/Kconfig.hz
patching file arch/x86/kernel/cpu/proc.c
patching file arch/x86/kernel/smpboot.c
patching file include/linux/nfsd/stats.h
patching file include/net/inet_timewait_sock.h
patching file init/calibrate.c
patching file kernel/Kconfig.preempt
patching file drivers/cpufreq/cpufreq_ondemand.c
patching file drivers/cpufreq/cpufreq_conservative.c
patching file Makefile
ReplyDelete
Replies
ck19 October 2011 at 11:13
There are limits to how much time I can spend syncing up with every tree.
ReplyDelete
Replies
graysky19 October 2011 at 11:19
CK - aside from those errors, I ran the same make benchmark I described in my pdf you linked on v0.406 and v0.413 on two machines (quad core = 4 cores and dual quad board = 16 cores). With n=27, there is no statistically significant difference between the two. If you power the analysis with higher n's (90) the means become a close to significant but still not yet. I'll bet if I run it out to like 400-500 n's they will separate....

Differences are on the order of 60 msec slower with 0.413 on the 16 core and 125 msec slower with the 4 core.

Quad:
http://s1.postimage.org/bo8yxdsyt/image.png

Hexadeca:
http://s1.postimage.org/bo9c5okx1/image.png
ReplyDelete
Replies
ck19 October 2011 at 14:22
That's great, thanks. That's good enough for me to be reassured there's no drastic problem in 413.
ReplyDelete
Replies
Anonymous19 October 2011 at 19:32
Ralph Ulrich

graysky,
I just patch to stable-queue using:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git
consecutive: cd stable-queue && git pull

You'll get the singled patches,
anything against, not good?

ck,
there are distributions using your patch. Perhaps down stream there could be a maintainer adjusting your patches and care?
ReplyDelete
Replies
Anonymous19 October 2011 at 20:06
Ralph Ulrich
My comments on testing performance, 4 points important for me why I am using BFS:

1 minimal latency
2 minimal power consumption
3 good throughput performance
4 no cornercase where anything above breaks

If minimal less throughput performance but power or latency better, this is a step in good direction for me.
ReplyDelete
Replies
graysky20 October 2011 at 01:06
@CK - It has to be a time drain, that's for sure. I'd like to learn how to do it myself, but I can only guess. In this case, the offending line seems to be in 3.0.7's kernel/posix-cpu-timers.c

Seems like ck1-3.0.0 expects a line to be there but it is not. It wants the following to be removed:
- times->sum_exec_runtime += t->se.sum_exec_runtime;

But in 3.0.7 this line has been changed to:
times->utime = cputime_add(times->utime, t->utime);

I don't know if it's okay to make the substitution in the ck1 patch. Nor do I know if the line you want to replace it with is valid. In ck1-3.0.0 this line is:
+ tsk->utime, tsk->stime, tsk_seruntime(tsk));
ReplyDelete
Replies
graysky20 October 2011 at 01:14
Proposed patch to patch-3.0.0-ck1

--- patch-3.0.0-ck1 2011-10-19 15:44:26.792901029 +0200
+++ patch-3.0.0-ck1 2011-10-19 15:54:02.544703139 +0200
@@ -804,7 +804,7 @@ Index: linux-3.0.0-ck1/kernel/posix-cpu-
do {
times->utime = cputime_add(times->utime, t->utime);
times->stime = cputime_add(times->stime, t->stime);
-- times->sum_exec_runtime += t->se.sum_exec_runtime;
+- times->sum_exec_runtime += task_sched_runtime(t);
+ times->sum_exec_runtime += tsk_seruntime(t);
} while_each_thread(tsk, t);
out:
ReplyDelete
Replies
graysky20 October 2011 at 06:34
Ack... I just realized this problem affects the bfs patch of ck1, in other words, 3.0-sched-bfs-413.patch as well.
ReplyDelete
Replies
Unknown20 October 2011 at 07:44
The patch that affects BFS is quite small. Here is its description:

http://www.spinics.net/lists/stable-commits/msg13688.html
ReplyDelete
Replies
graysky20 October 2011 at 08:32
@RealNC - Thanks for that. I want to hear from CK to be sure it's "right."
ReplyDelete
Replies
Anonymous20 October 2011 at 20:29
Is anyone willing to put up .debs for Ubuntu 11.10?
ReplyDelete
Replies
graysky21 October 2011 at 06:58
Hi Con - can you make a ruling for me? There are two different patches that seem to work for linux-3.0.6 + bfs v0.413 and I am wondering which is more appropriate given up stream's change and your original bfs v0.314.

Here is upstream with the bolded line being what's new from v3.0.6
http://www.spinics.net/lists/stable-commits/msg13688.html

kernel/posix-cpu-timers.c
...around line 244...

rcu_read_lock();
/* make sure we can trust tsk->thread_group list */
if (!likely(pid_alive(tsk)))
goto out;

t = tsk;
do {
times->utime = cputime_add(times->utime, t->utime);
times->stime = cputime_add(times->stime, t->stime);
times->sum_exec_runtime += task_sched_runtime(t);
} while_each_thread(tsk, t);
out:
rcu_read_unlock();

Your bfs wants to remove the italic line and replace is with "times->sum_exec_runtime += tsk_seruntime(t);"

So my question for you is, which is the correct line?

Upstream: times->sum_exec_runtime += task_sched_runtime(t);
BFS v0.413: times->sum_exec_runtime += tsk_seruntime(t);

Intentionally cross-posted to your blog so others and learn from your recommendation.
ReplyDelete
Replies
graysky21 October 2011 at 07:51
Dammit, I messed up my previous post. To boil it down. Which of the following 2 potential patches to bfs is correct given upstream's change to this file?

http://www.spinics.net/lists/stable-commits/msg13688.html

Here is patched version #1 of your bfs v0.413, HUNK #1

Index: linux-3.0.0-bfs/kernel/posix-cpu-timers.c
===================================================================
--- linux-3.0.0-bfs.orig/kernel/posix-cpu-timers.c 2011-09-24 13:23:32.801284829 +1000
+++ linux-3.0.0-bfs/kernel/posix-cpu-timers.c 2011-09-24 13:29:06.497284777 +1000
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_st
do {
times->utime = cputime_add(times->utime, t->utime);
times->stime = cputime_add(times->stime, t->stime);
- times->sum_exec_runtime += task_sched_runtime(t);
+ times->sum_exec_runtime += tsk_seruntime(t);
} while_each_thread(tsk, t);
out:
rcu_read_unlock();

Here is patched version #2 of your bfs v0.413, HUNK #1

Index: linux-3.0.0-bfs/kernel/posix-cpu-timers.c
===================================================================
--- linux-3.0.0-bfs.orig/kernel/posix-cpu-timers.c 2011-09-24 13:23:32.801284829 +1000
+++ linux-3.0.0-bfs/kernel/posix-cpu-timers.c 2011-09-24 13:29:06.497284777 +1000
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_st
do {
times->utime = cputime_add(times->utime, t->utime);
times->stime = cputime_add(times->stime, t->stime);
- times->sum_exec_runtime += task_sched_runtime(t);
+ times->sum_exec_runtime += task_sched_runtime(t);
} while_each_thread(tsk, t);
out:
rcu_read_unlock();
ReplyDelete
Replies
Unknown21 October 2011 at 11:18
IMO, the correct (but probably still quick'n'dirty) way is:

(Damn, blogspot doesn't allow the CODE HTML tag.)

#ifdef CONFIG_SCHED_BFS
times->sum_exec_runtime += tsk_seruntime(t);
#else
times->sum_exec_runtime += task_sched_runtime(t);
#endif

That's because tsk_seruntime() is a macro defined as ((t)->sched_time) on BFS and ((t)->se.sum_exec_runtime) on CFS.
ReplyDelete
Replies
Anonymous21 October 2011 at 19:38
What are you suggesting the line gets change to?
ReplyDelete
Replies
Unknown21 October 2011 at 20:24
BFS is using t->sched_time() instead of t->se.sum_exec_runtime(), so whatever bug there was in sum_exec_runtime() was probably not affecting BFS to begin with. Changing it to use task_sched_runtime() doesn't look right to me, since BFS does sub-tick accounting of tasks and it looks like this is one of the places where this happens.

The patch against 3.0-sched-bfs-413.patch:

http://pastebin.com/raw.php?i=LBx1r4fX
ReplyDelete
Replies
graysky22 October 2011 at 05:56
Thanks for the patch and for the explanation, RealNC. I'm no programmer so the extra commented lines in your patch confuse me. What is their significance?

CK - can you "sign off" on RNC's patch as kosher for the bfs?
ReplyDelete
Replies
Unknown22 October 2011 at 16:01
If CONFIG_SCHED_BFS is enabled in the kernel config, tsk_seruntime() will be used. Otherwise, task_sched_runtime().
ReplyDelete
Replies
graysky22 October 2011 at 20:02
RealNC- I understand the construct of the statement, but didn't think that it would be applied since it was commented out.
ReplyDelete
Replies
Unknown22 October 2011 at 22:21
These are C preprocessor directives, not comments. In C, lines beginning with # aren't comments. This isn't a shell script :-)
ReplyDelete
Replies
graysky22 October 2011 at 23:35
Thanks for the explanation.
ReplyDelete
Replies
Mau23 October 2011 at 12:09
is there any kernel patched with bfs for Ubuntu 11.10?
ReplyDelete
Replies
Ren23 October 2011 at 16:32
@ Mau~x:
There are many floating around... e.g. you can find them in user-created Ubuntu PPAs, but since configs are based on their author's preferences / (in)experience, I'd consider compiling your own.
Anyway if you want to try you can find one for kernel 3.0 at https://launchpad.net/~chogydan/+archive/ppa
or you can try PCLinuxOS, a distro that has bfs already built in the kernel.
ReplyDelete
Replies
graysky23 October 2011 at 23:09
I ran ubuntu/debian a few years ago and compiled up my own kernels following this guide: http://technowizah.com/2005/12/debian-how-to-custom-kernel-compile.html
ReplyDelete
Replies
Anonymous25 October 2011 at 22:45
3.1 is out... ;)
ReplyDelete
Replies
Anonymous29 October 2011 at 00:10
When we are gonna see BFS for 3.1
ReplyDelete
Replies
graysky1 November 2011 at 06:03
Tried to update 3.0-sched-bfs-413.patch myself and noticed that if you remove the section of the patch that corrects the documentation, the only part that gets rejected is the same bit that stopped 3.0.7 from working, namely:

remove: times->sum_exec_runtime += t->se.sum_exec_runtime;
replace with: times->sum_exec_runtime += tsk_seruntime(t);

When I made this modification to the 3.0-sched-bfs-413.patch and tried to build against the patched 3.1 tree, I got some errors upon my "make -j4 bzImage"

...

CC arch/x86/kernel/mmconf-fam10h_64.o
CC arch/x86/kernel/vsmp_64.o
AS arch/x86/kernel/head_64.o
CC mm/page_io.o
CC arch/x86/kernel/head64.o
CC arch/x86/kernel/head.o
CC arch/x86/kernel/init_task.o
LDS arch/x86/kernel/vmlinux.lds
CC mm/swap_state.o
LD arch/x86/kernel/built-in.o
CC mm/swapfile.o
CC mm/thrash.o
arch/x86/kernel/init_task.c:31:8: error: unknown field ‘fs_excl’ specified in initializer
arch/x86/kernel/init_task.c:31:8: warning: braces around scalar initializer [enabled by default]
arch/x86/kernel/init_task.c:31:8: warning: (near initialization for ‘init_task.real_cred’) [enabled by default]
make[2]: *** [arch/x86/kernel/init_task.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2
CC mm/dmapool.o
CC mm/hugetlb.o
CC mm/mempolicy.o
CC mm/sparse.o
CC mm/sparse-vmemmap.o
CC mm/compaction.o
CC mm/mmu_notifier.o
CC mm/ksm.o
CC mm/slub.o
CC mm/memory_hotplug.o
CC mm/migrate.o
CC mm/huge_memory.o
CC mm/memcontrol.o
CC mm/page_cgroup.o
CC mm/memory-failure.o
CC mm/cleancache.o
LD mm/built-in.o

I'm guessing this isn't a quick fix for ck? Perhaps a new revision of bfs?
ReplyDelete
Replies
codestation2 November 2011 at 02:37
@graysky

edit include/linux/init_task.h and remove the:
".fs_excl = ATOMIC_INIT(0),"

after patching. Looks like this part of the struct was removed on 3.1. I am using bfs on 3.1.0-rc9 since 2 weeks ago and haven't been hit with a kernel bug yet.
ReplyDelete
Replies
graysky2 November 2011 at 07:12
@codestation - can't find that line in the file you mentioned. I can't find it in the 3.1 tree anywhere!

$ find include/linux -type f -exec grep "fs_excl" {} \; -print

or

$ find include/linux -type f -exec grep "ATOMIC_INIT(0)," {} \; -print
.flushing = ATOMIC_INIT(0), \
include/linux/kthread.h
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
include/linux/interrupt.h
ReplyDelete
Replies
graysky2 November 2011 at 07:15
Jeeze - I'm so stupid, it's in ck's patch itself!
ReplyDelete
Replies
graysky2 November 2011 at 07:47
Errors persist upon a make -j4 bzImage

CC init/do_mounts_initrd.o
CC init/initramfs.o
In file included from kernel/sched.c:2:0:
kernel/sched_bfs.c: In function ‘set_task_cpu’:
kernel/sched_bfs.c:1014:3: warning: passing argument 3 of ‘perf_sw_event’ makes pointer from integer without a cast [enabled by default]
include/linux/perf_event.h:1049:1: note: expected ‘struct pt_regs *’ but argument is of type ‘int’
kernel/sched_bfs.c:1014:3: warning: passing argument 4 of ‘perf_sw_event’ makes integer from pointer without a cast [enabled by default]
include/linux/perf_event.h:1049:1: note: expected ‘u64’ but argument is of type ‘void *’
kernel/sched_bfs.c:1014:3: error: too many arguments to function ‘perf_sw_event’
include/linux/perf_event.h:1049:1: note: declared here
kernel/sched_bfs.c: In function ‘finish_task_switch’:
kernel/sched_bfs.c:1876:2: error: too few arguments to function ‘perf_event_task_sched_in’
include/linux/perf_event.h:1064:20: note: declared here
kernel/sched_bfs.c: In function ‘sched_init’:
kernel/sched_bfs.c:6900:2: error: implicit declaration of function ‘plist_head_init_raw’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors

make[1]: *** [kernel/sched.o] Error 1
make: *** [kernel] Error 2
make: *** Waiting for unfinished jobs....
ReplyDelete
Replies
codestation3 November 2011 at 05:40
sorry, i thought that you had fixed those too, here is my patch to be applied to 3.0-sched-bfs-413.patch, although i recommend to wait for ck to post a proper patch.

http://pastebin.com/0yBW0k13
ReplyDelete
Replies
graysky3 November 2011 at 07:08
I didn't fix shit apparently :) Thanks for your patch. It compiled up just fine. You said that you have been running with this patch for several weeks. Any ill effects?

You manage to patch ck1 to work with 3.1 by chance?
ReplyDelete
Replies
graysky3 November 2011 at 07:55
Nevermind. Only a few files needed some minor tweaks. This allowed me to build up a kernel just fine.

Apply to patch-3.0.0-ck1 - http://pastebin.com/Wy7T8CSg
ReplyDelete
Replies
Anonymous28 May 2013 at 17:39
Hello, I try to compile kernel 3.9.4 in ubuntu 12.10 with ck patchset and BFQ but get this error http://pastebin.com/rJSitgZb. Someonbe can help me? Thanks
ReplyDelete
Replies

Add comment