TL;DR: Minor unnoticeable tweaks for BFS.
Incremental:
3.0-bfs-406-413.patch
Incremental for 3.0.x-ck1:
3.0-ck1-bfs-406-413.patch
Full patch for 3.0.x:
3.0-sched-bfs-413.patch
Alas there isn't anything earth shattering here. The skiplist implementation experiment failed to be of any benefit despite my best intentions, so I've abandoned the idea for the time being. Meanwhile, there were a number of minor improvements that I had considered for BFS (and I mean really really minor) and a few cleanups that had been waiting. So here's the changelog for the new BFS for 3.0:
Improve the preempt call by explicitly setting a task that will be used on lookup in earliest_deadline_task to avoid iterating over all tasks. Set the effective runqueue priority for further calls to try_preempt based on the preempting task's priority. Thus on the small chance that something else tries to preempt the runqueue before the new tasks gets the CPU, it will compare to the previously successful preempting task. This should avoid a preempting task from rescheduling a runqueue and then the CPU deciding to take a different task instead of the preempting one. Clean up a number of variables from unnecessary longs to ints and ints to bools. Microoptimise around try preempt by settings highest_prio to that of the task that's trying to preempt to avoid unnecessary comparisons. Break sole affinity on cpu offline -after- the cpu has been set to be offline.
I'd be surprised if anyone noticed anything different with these changes, but then, there have been some surprises from apparently innocuous changes in the past, so I'm keen to get feedback from testers. There is a small chance that the preempt changes can improve throughput by avoiding unnecessary rescheduling, and that the cpu offline changes might help rare scenarios where a machine will fail to suspend to ram.
Meanwhile, graysky who has been maintaining the unofficial ck repos for archlinux has very kindly put together a number of throughput benchmarks comparing BFS to mainline on machines from 2x up to 16x in a very nicely put together PDF which you can download and read here:
repo-ck.com/bench/benchmark.pdf
On the cksort front, it's clear to me that sorting algorithms are pretty boring to most people, despite me finding transient interest in them and developing my own. Nevertheless, the cksort algorithm has been further improved. After playing with it some more, and learning from the legendary work of the past, I modified the merge sort that cksort "tips" to to include an insertion sort as well, making the algorithm now a blended insertion+merge sort once it tips away from the presort+nodemerge algorithm which is basically the only unique part of cksort. This capitalises on the fact that insertion sort is pretty much the fastest way to sort up to 32 different entries by splitting up the merge into up-to-32 entry sized groups and doing insertion sort on all of them, followed by using a non-recursive merge sort on each group.
This has further improved cksort to now be faster than qsort for up to 1500 entries, and now equally fast with library qsort() up to 10000 (uniquely random)entries. Beyond that it is still slower than qsort, but it has -no- corner cases where it will ever become O(n^2) and will always be O(n log n) bound, unlike qsort.
Version 0.44 of cksort is here:
http://ck.kolivas.org/apps/cksort/cksort.c
And in response to requests for the benchmarking code vs qsort it is here:
http://ck.kolivas.org/apps/cksort/benchsort.c
Now Aldo Cortesi has a very interesting static view of comparison sort algorithms here:
http://sortvis.org/
After contacting him, he was kind enough to modify his code such that it would take output from something like the cksort to generate a graph from any sorting algorithm. Alas cksort uses a separate array for its work and looking at the array of the final data does not tell the entire story of what's going on, but it does help further to visualise how cksort works.
Here is a view of 20 random variables being sorted and you can see the 2 phases of cksort where it builds up the nodes first, and the 2nd phase where it merges the nodes (click to enlarge):
It's important to note that the 2nd stage graph is fairly arbitrary since the data from the unmerged nodes is not actually written back to the original array until it's merged, but you should be able to make out the final sorted data appearing from the outside in.
Sorting non-random entries is really what makes cksort interesting and here is a (rather complex) visualisation of the 50 mostly-sorted example in the animated form I showed on my blog previously:
Anyway it's been a lot of fun playing with this stuff, and maybe one day I'll come up with something more useful from the algorithm. I played with an in-place version of the code which didn't need a second array and the amount of memory moving required made the algorithm slower and more complex so I soon lost interest in developing that for the time being. I'd like to find a way to make the presort+node merging aspect scale to any size instead of tipping to a different algorithm but the lack of any pattern in large random data sets means you can't really find a pattern in it. I still think there may be something like a heap structure that would help it but haven't found anything that works in practice. Maybe I'll come back to this in the future. Meanwhile it works extremely well for the smaller data sets, partially ordered data, reverse ordered data, and few unique data sets. Interestingly it coincidentally shares a lot in its ideas with Timsort. All I can say after all this is that the work of the legends of the past, and coming up with something like Smoothsort is truly amazing.