Just a quick post to tell you all what I've been up to with BFS.
Some of you may know about Skip lists as an alternative to balanced binary search trees. They feature O(log n) insertion, lookup and removal of table entries. Anyway I've been looking for some time at the O(n) lookup of BFS (which is O(1) insertion and removal) to try and find a solution that didn't cost us at the desktop level since O(n) of small numbers of n is very fast. The problem is of course at higher numbers of n (or server type loads), where it gets linearly slower, and the cache trashing aspect of scanning linked lists becomes expensive.
So to cut a long story short, I've implemented the first draft of a custom version of skip lists for BFS in place of the O(n) lookup. The insertion remains O(log n), but by sorting all tasks realtime, iso, normal, batch and idleprio in a way they all end up on the same table, I was able to do away with the regular linked lists and the bitmap priority lookup. Then I was able to utilise one of the features of the skip lists in that the first task on the "bottom" list is always sorted to be the highest priority. This means the lookup is O(1). Then I put an extra pointer into each entry to the previous entry (the original design normally only points to the next entry). Finally I placed a pointer to the entry in the task struct as a way of reverse looking up the entry without any search.
So what I'm left with is an O(log n) insertion, O(1) lookup, and O(k) removal (k is the "height" of the node in question, up to 16, but usually only 1-4).
I've implemented the sticky task used for CPU affinity by simply comparing the last sticky task to the first entry returned from the skip list. Alas I have not yet provided a good version of the sticky task being used to improve scaling governor behaviour. This means that this will likely perform worse with the ondemand governor at this stage. On the other hand, the performance governor seems to be working very nicely in my preliminary tests.
Anyway I'd love to post more, but I need to sleep, so here's some code instead (for a BFS patched 3.0.x kernel):
Full bfs406 + skiplists patch:
Try it out, see what you think. It seems to be running safely here but as always, there are no guarantees. All going well, you should notice pretty much no difference on a desktop. If you do any throughput benchmarks comparing before/after, I'd love to hear about them.
Here's the thread about it on lkml: https://lkml.org/lkml/2011/9/23/371
Note the benchmarks showing substantial speedup in the highly contended make -j case that I mention in the thread.
Elapsed Time 28.7
Elapsed Time 28.5
Elapsed Time 27.0