Tuesday, 11 October 2011

cksort and O(n^2) tipping point

I've spent some time modelling the behaviour of the last cksort I posted here to investigate where and why it becomes O(n^2). It turns out the prescan which creates the nodes ends up scanning a large amount of the data repeatedly when the data is purely random. As mentioned here previously, it is faster than qsort when there are up to ~1500 unique random entries and then gets slower. When there is already some sorted data or repetition, it remains faster even when the number of entries becomes very large. So I tried to model where the tipping point was to determine when to change the sort mode, and it turns out that if it is unable to generate a node which is a decent proportion of the original data, it is likely to only generate ever smaller nodes on each pass. With the help of a bit of trial and error and some rough modelling, I found that it would stay ahead of a classic O(n log n) sort when it was able to reliably sort the data into 32 nodes or fewer. There are occasions where it is faster with more nodes, but it is always faster with fewer. So I integrated a classic merge sort into the sort algorithm and made it switch to classic merge sort should it fall below this threshold. This does not significantly increase the total number of compares/swaps, and it means the algorithm is always O(n log n) bound.
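
To make that concrete, here is a minimal sketch of the mode switch. The helper names and the run-counting logic here are mine, not the actual code from cksort.c (the real prescan also handles descending runs and repeated values), but the shape of the decision is the same: one O(n) pass to see how fragmented the data is, then fall back if it crosses the threshold.

#include <stddef.h>

#define NODE_THRESHOLD 32	/* the empirical tipping point described above */

/* Count ascending runs ("nodes") in a single O(n) pass. */
static size_t count_runs(const int *a, size_t n)
{
	size_t runs = n ? 1 : 0;
	size_t i;

	for (i = 1; i < n; i++)
		if (a[i] < a[i - 1])
			runs++;
	return runs;
}

/* Too fragmented: hand over to a guaranteed O(n log n) merge sort. */
static int should_fall_back(const int *a, size_t n)
{
	return count_runs(a, n) > NODE_THRESHOLD;
}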

So the latest version of cksort now has some nice properties: it is O(n) in certain ideal scenarios, it is very adaptive to patterns in either direction and to lots of repeated data, yet it is always bound by O(n log n) complexity.

Just about any O(n log n) algorithm could be used beyond the tipping point, but given that the intrinsic nature of cksort is a merge, it seemed intuitive to extend that to a classic merge sort as well. I experimented with using a heap sort once the data was pre-sorted, since it almost becomes a heap structure naturally after the pre-sort, but two things became clear. First, heap sort is actually slower than merge sort on average. Second, and more importantly, the pre-sort phase was actually the O(n^2) component in the original sort, so it didn't make sense to complete that phase at all.
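
For reference, this is the textbook bottom-up merge sort being referred to; it is not the code from cksort.c, just the classic algorithm with its O(n) auxiliary buffer. Note the <= in the merge step, which is what makes classic merge sort stable.

#include <stdlib.h>
#include <string.h>

/* Stable merge of src[lo..mid) and src[mid..hi) into dst. */
static void merge(int *dst, const int *src, size_t lo, size_t mid, size_t hi)
{
	size_t i = lo, j = mid, k;

	for (k = lo; k < hi; k++) {
		if (i < mid && (j >= hi || src[i] <= src[j]))
			dst[k] = src[i++];
		else
			dst[k] = src[j++];
	}
}

/* Classic bottom-up merge sort with an O(n) auxiliary buffer,
 * ping-ponging between the array and the buffer on each pass. */
static void classic_merge_sort(int *a, size_t n)
{
	int *buf = malloc(n * sizeof(*buf));
	int *src = a, *dst = buf, *tmp;
	size_t width, lo;

	if (!buf)
		return;
	for (width = 1; width < n; width *= 2) {
		for (lo = 0; lo < n; lo += 2 * width) {
			size_t mid = lo + width < n ? lo + width : n;
			size_t hi = lo + 2 * width < n ? lo + 2 * width : n;

			merge(dst, src, lo, mid, hi);
		}
		tmp = src;
		src = dst;
		dst = tmp;
	}
	if (src != a)
		memcpy(a, src, n * sizeof(*a));
	free(buf);
}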

(EDIT: updated version) Version 0.43:
http://ck.kolivas.org/apps/cksort/cksort.c

10k samples compared to the library qsort (I checked the actual code and it's a highly optimised combination of quicksort and insertion sort). Again, bear in mind there is little to no optimisation in the cksort code. Times are in microseconds:

fewunique:
qsort:  452
cksort: 114

random:
qsort:  573
cksort: 883

sorted:
qsort:  449
cksort: 34

reversed:
qsort:  451
cksort: 41

mostly sorted:
qsort:  459
cksort: 58

As you can see above, random data is still slower with cksort (but a heck of a lot faster than with earlier implementations), and it seems to stay at about half the speed of qsort no matter how many samples, which is good because it means it is bound by O(n log n) complexity. 10,000 samples was chosen because it accurately represents a size where the polynomial nature of an algorithm would be obvious. The reason it's half the speed of qsort is that an unoptimised merge sort does on average about twice as much work as an optimised quicksort. I could reasonably just switch to a quicksort at the tipping point, but there are still some corner cases where quicksort becomes O(n^2), which mergesort never suffers from, and mergesort is very parallelisable if desired. I was tempted to thread the mergesort component to improve the benchmark numbers, but it would have skewed the results inappropriately.
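
As an aside on the parallelisable point: because the two halves of a merge sort are completely independent, threading it is almost trivial. A rough sketch using the classic_merge_sort() and merge() helpers from the earlier sketch (again, this is not in cksort.c, and deliberately not in the benchmarks):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct half {
	int *a;
	size_t n;
};

static void *sort_half(void *arg)
{
	struct half *h = arg;

	classic_merge_sort(h->a, h->n);
	return NULL;
}

/* Sort each half in parallel, then do one sequential merge. */
static void threaded_merge_sort(int *a, size_t n)
{
	size_t mid = n / 2;
	struct half lo = { a, mid }, hi = { a + mid, n - mid };
	int *buf = malloc(n * sizeof(*buf));
	pthread_t thr;

	if (!buf) {
		classic_merge_sort(a, n);
		return;
	}
	pthread_create(&thr, NULL, sort_half, &lo);
	sort_half(&hi);
	pthread_join(thr, NULL);
	merge(buf, a, 0, mid, n);
	memcpy(a, buf, n * sizeof(*buf));
	free(buf);
}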

As a generic sort there are still two issues with cksort. One is the extra space required: at O(n) auxiliary space it is about as bad as it gets. An in-place algorithm could certainly be developed, but it would be tricky and would require a lot more swaps during the presort phase; alternatively, the post-tipping-point algorithm could be switched to a heap sort, which works in place. The second disadvantage is that it is not stable (i.e. it does not keep the original order of equal-valued entries). I'm not sure that's easy to improve on.
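
For what it's worth, one standard way stability can be bolted onto any unstable sort is to tag each element with its original index and break ties on it. It costs extra space and widens every comparison, so it's not free; just a sketch of the usual trick, written here with a qsort-compatible comparator:

#include <stddef.h>

struct tagged {
	int key;
	size_t idx;	/* original position, used to break ties */
};

/* Equal keys compare by original index, so even an unstable sort
 * of struct tagged elements produces a stable ordering. Usable as
 * qsort(arr, n, sizeof(struct tagged), tagged_cmp). */
static int tagged_cmp(const void *pa, const void *pb)
{
	const struct tagged *x = pa, *y = pb;

	if (x->key != y->key)
		return x->key < y->key ? -1 : 1;
	if (x->idx != y->idx)
		return x->idx < y->idx ? -1 : 1;
	return 0;
}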

Again, I don't expect this will dramatically change the way anyone sorts anything but it's certainly been interesting to play with. I'll keep playing some more :)

EDIT: In response to requests for a range of numbers, here's a set of them for different types of data and number of samples, 10-100 million samples: random (full 2^32 range), fewunique1 (4 different values), fewunique2 (25 different values), fewunique3 (100 different values, above the tipping point), mostly sorted (every 5th and 10th value swapped), already sorted, and reverse sorted.

[Graphs for each data type and sample size appeared here.]
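
For anyone wanting to reproduce these input classes, something along these lines generates them. This is my guess at the shapes, not code from benchsort.c: rand() is a stand-in (the random data above spans the full 2^32 range), and the mostly-sorted loop is one reading of "every 5th and 10th value swapped".

#include <stdlib.h>

static void fill_random(int *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		a[i] = rand();	/* stand-in; the post uses the full 2^32 range */
}

static void fill_fewunique(int *a, size_t n, int values)
{
	size_t i;

	for (i = 0; i < n; i++)
		a[i] = rand() % values;	/* e.g. 4, 25 or 100 distinct values */
}

static void fill_mostly_sorted(int *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		a[i] = (int)i;
	/* one reading of "every 5th and 10th value swapped" */
	for (i = 10; i < n; i += 10) {
		int tmp = a[i - 5];

		a[i - 5] = a[i];
		a[i] = tmp;
	}
}
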
EDIT: In response to requests, this is the benchmark code, based on 0.43, which is an improvement on this original post. Just edit the TOSORT value to try different sizes:
http://ck.kolivas.org/apps/cksort/benchsort.c
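
The timings themselves are simple wall-clock measurements; something like this is all that's needed, though benchsort.c may differ in detail:

#include <stddef.h>
#include <sys/time.h>

/* Time a sort call in microseconds using wall-clock time. */
static long elapsed_us(void (*sortfn)(int *, size_t), int *a, size_t n)
{
	struct timeval start, end;

	gettimeofday(&start, NULL);
	sortfn(a, n);
	gettimeofday(&end, NULL);
	return (end.tv_sec - start.tv_sec) * 1000000L +
	       (end.tv_usec - start.tv_usec);
}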

8 comments:

  1. Have you done any benchmarks comparing with timsort? http://en.wikipedia.org/wiki/Timsort

  2. Not yet, but I was actually considering adding something like the runs of data aspect from timsort. This is by no means a completed project :)

  3. This is great stuff. Keep sharing, it is appreciated. I've been interested in programming for a long time. This may just inspire me to get started!

    Galen

  4. Would you mind sharing the benchmark code you use to compare cksort with other sorting variants? Regards

  5. Sure. Link will be placed in the main post as well. Note it's based on cksort 0.43, which has since been further improved:

    http://ck.kolivas.org/apps/cksort/benchsort.c

  6. Some good points about sorts here, especially the comments http://seven-degrees-of-freedom.blogspot.com/2010/07/question-of-sorts.html
