Sunday, 9 October 2011

cksort evolution

I've been tinkering with the cksort algorithm to try and improve its performance and worst case behaviour, while working on the existing design. Spending some time visualising what happens with the 3 bucket model I described, I experimented with using the high/low buckets as a pre-pass instead of being the actual sorting part. This turns a random array of data like this:

into what looks almost like a heap array in the following diagram:

Looking carefully at this, you can see that the range (lowest to highest values) of each "node" as I'm calling them, gets smaller as you progress from the bottom to the top. Also each node is a discrete sorted entity, and by combining the increasing and decreasing values into one node it halves the number of nodes generated, while being able to benefit from any existing partially arranged data in either direction.

After lots of tinkering at this point, I settled on using a merge from one node to the next, and used the narrower range of each successive node to my advantage to avoid re-scanning areas that are known to be in their final positions.

All this has done wonders to the performance of cksort in all areas, not only speeding up the sorts it was already fast at, but dramatically speeding up the random data example (20x faster) and fixing the corner case of the sawtooth overlapping data. A reminder where the inspiration for these sample data sets and graphs came from:

So here are the latest versions of the 50 value graphs that improved further:

Mostly sorted

Few unique

The code as is is faster than qsort up to 1500 uniquely random variables and then gets slower beyond that. It is faster than qsort for pretty much any number of variables for data that has any significant existing pattern/sorting or repeated data.

Again, I'm doing this purely as an intellectual exercise, but there may be some valid usage for such an algorithm. As for my comment about making the sort parallelisable, the merge component between nodes could easily be parallelised, beginning the merging before the presort component is completed, as each node comes in.

Current version (0.3) is available here:

Probably the next step is to find the upper limit on number of nodes for when it turns O(n^2) and then use the intrinsic heap like shape of it to actually generate a heap structure and use a classic heap sort.


  1. Really interesting algorithm. Keep up the good work my friend :)

  2. did you just say 'heap' or 'heapsort'?

  3. Both. You heapsort after generating a heap. I found the tipping point to be 32 nodes beyond which an O(n log n) sort is faster. Even with the heap like shape of the presort, it turns out that speeding up the generation of the heap itself does not significantly speed up heapsort, and it's better to abort out of the presort when it's going to add a lot of time. More soon. I've experimented with tipping from this algorithm into a mergesort instead.