Saturday, 24 September 2011

BFS and skip lists

Hi all.

Just a quick post to tell you all what I've been up to with BFS.

Some of you may know about Skip lists as an alternative to balanced binary search trees. They feature O(log n) insertion, lookup and removal of table entries. Anyway I've been looking for some time at the O(n) lookup of BFS (which is O(1) insertion and removal) to try and find a solution that didn't cost us at the desktop level since O(n) of small numbers of n is very fast. The problem is of course at higher numbers of n (or server type loads), where it gets linearly slower, and the cache trashing aspect of scanning linked lists becomes expensive.

So to cut a long story short, I've implemented the first draft of a custom version of skip lists for BFS in place of the O(n) lookup. The insertion remains O(log n), but by sorting all tasks realtime, iso, normal, batch and idleprio in a way they all end up on the same table, I was able to do away with the regular linked lists and the bitmap priority lookup. Then I was able to utilise one of the features of the skip lists in that the first task on the "bottom" list is always sorted to be the highest priority. This means the lookup is O(1). Then I put an extra pointer into each entry to the previous entry (the original design normally only points to the next entry). Finally I placed a pointer to the entry in the task struct as a way of reverse looking up the entry without any search.

So what I'm left with is an O(log n) insertion, O(1) lookup, and O(k) removal (k is the "height" of the node in question, up to 16, but usually only 1-4).

I've implemented the sticky task used for CPU affinity by simply comparing the last sticky task to the first entry returned from the skip list. Alas I have not yet provided a good version of the sticky task being used to improve scaling governor behaviour. This means that this will likely perform worse with the ondemand governor at this stage. On the other hand, the performance governor seems to be working very nicely in my preliminary tests.

Anyway I'd love to post more, but I need to sleep, so here's some code instead (for a BFS patched 3.0.x kernel):

Full bfs406 + skiplists patch:

Try it out, see what you think. It seems to be running safely here but as always, there are no guarantees. All going well, you should notice pretty much no difference on a desktop. If you do any throughput benchmarks comparing before/after, I'd love to hear about them.

Here's the thread about it on lkml:

Note the benchmarks showing substantial speedup in the highly contended make -j case that I mention in the thread.

Elapsed Time 28.7

Elapsed Time 28.5

Elapsed Time 27.0


  1. Quite elegant :) And interesting. Thanks. Thank you for taking the spirit of open source in the admirable direction you're traveling, Con.

  2. I fear I don't know much about binary search trees, but you say skip lists are faster for small numbers of O(n), but linearly slower for higher numbers, and with about no difference on a standard desktop. I wonder what would be the benefits for bfs?

  3. The reason I settled on skip lists after much investigation, is that I can get the best of both worlds. At low numbers queued it is as fast as the current system for insertion, lookup and removal. As load grows, the insertion becomes slower than the current system (but at O(logn)) however this is offset by the fact that lookup remains O(1) and removal remains fast too. Even at desktop loads, this means equally fast at very low loads, and faster as the load gets higher.

  4. Con I had a problem with skiplist path!
    Kernel 3.0.4 patched ck1, bfq3 stable.
    With skiplist patch during an make -j2 compilation i have a lot of stuttering. In the kernel without skiplist (with the same .config) all works very fast as normal with bfs...
    If you need my .config or i have to do some other test ask without any problem. RESPECT!!! :)

    ps:my laptop:hp g62, turion2, 4 gb ram,ati hd4200

  5. Thanks. Were you running make as SCHED_IDLEPRIO? There is a bug in the skip list implementation that would make it misbehave.

  6. I dont know what is SCHED_IDLEPRIO but i dont find it in my kernel configuration.
    I try even with bfs410 but i have always great stuttering problems during compilation.
    Have mail with my .config

  7. PI : a Parallel in-memory skip list , has some repful discussion on optimization of skip list.

    1. Thanks for that interesting link. However what they're trying to optimise are look ups in the skip list. In my implementation for BFS and MuQSS, there is no look up overhead at all as we're only ever interested in the highest priority entry which means we always pick off the very first entry in an O(1) manner. There also is no need for balancing, tearing down and building up the lists since they're static in array size. These optimisations mean my skip list implementation can basically only be used by the scheduler.