Tuesday, 11 October 2011

cksort and O(n^2) tipping point

I've spent some time modelling the behaviour of the last cksort I posted here to investigate where and why it becomes O(n^2). It turns out the prescan which creates the nodes ends up scanning a large amount of the data repeatedly when the data is purely random. As mentioned here previously, it is faster than qsort with up to ~1500 unique random entries and then gets slower. When there is already some sorted data or repetition, it remains faster even when the number of entries becomes very large. So I tried to model where the tipping point was, to determine when to change the sort mode, and it turns out that if the prescan is unable to generate a node which is a decent proportion of the original data, it is likely to generate only ever smaller nodes on each pass. With the help of a bit of trial and error and some rough modelling, I found that it would stay ahead of a classic O(n log n) sort when it was able to reliably sort the data into 32 nodes or fewer. There are occasions where it is faster with more nodes, but it is always faster with fewer. So I integrated a classic merge sort into the sort algorithm and made it switch to classic merge sort should it fall below this threshold. This does not significantly increase the total number of compares/swaps, and it means the algorithm is always O(n log n) bound.
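For the curious, here's a rough sketch of the kind of mode switch I'm describing. This is not the actual cksort.c code: the node counting is simplified to plain ascending runs, and names like cksort_mode are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NODE_THRESHOLD 32	/* from the text: 32 nodes or fewer */

/* Count "nodes" as plain maximal ascending runs. This is a
 * simplification of the real prescan (which also folds in
 * descending values), used only to illustrate the decision. */
static size_t count_runs(const int *a, size_t n)
{
	size_t runs = n ? 1 : 0;

	for (size_t i = 1; i < n; i++)
		if (a[i] < a[i - 1])
			runs++;
	return runs;
}

/* Classic top-down merge sort: the O(n log n) fallback. */
static void msort(int *a, int *tmp, size_t n)
{
	size_t mid = n / 2, i = 0, j = mid, k = 0;

	if (n < 2)
		return;
	msort(a, tmp, mid);
	msort(a + mid, tmp, n - mid);
	while (i < mid && j < n)
		tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
	while (i < mid)
		tmp[k++] = a[i++];
	while (j < n)
		tmp[k++] = a[j++];
	memcpy(a, tmp, n * sizeof(int));
}

/* The mode switch: if the data fragments into too many nodes,
 * abandon the node approach and use the classic merge sort.
 * Returns 1 if it fell back; the node path is elided here. */
int cksort_mode(int *a, size_t n)
{
	if (count_runs(a, n) > NODE_THRESHOLD) {
		int *tmp = malloc(n * sizeof(int));

		msort(a, tmp, n);
		free(tmp);
		return 1;
	}
	return 0;
}
```

Whatever the exact node-counting scheme, the point is the same: the fragmentation is measurable cheaply up front, so the fallback decision costs almost nothing.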

So the latest version of cksort now has some nice properties: it is O(n) in certain ideal scenarios, very adaptive to patterns in either direction or to lots of repeated data, yet always bound by O(n log n) in complexity.

Just about any O(n log n) algorithm could be used beyond the tipping point, but given that cksort intrinsically uses a merge, it seemed intuitive to extend that to a classic merge sort as well. I experimented with using a heap sort once the data was pre-sorted, since it naturally ends up close to a heap structure after the pre-sort, but two things became clear. First, the heap sort is actually slower than merge sort on average. And second, and more importantly, the pre-sort phase was actually the O(n^2) component in the original sort, so it made no sense to complete that phase at all.

(EDIT updated version) Version 0.43:
http://ck.kolivas.org/apps/cksort/cksort.c

10k samples compared to the library qsort (I checked the actual code and it's a highly optimised combination of quicksort and insertion sort). Again, bear in mind there is little to no optimisation in the cksort code. Times are in microseconds:

fewunique:
qsort:  452
cksort: 114

random:
qsort:  573
cksort: 883

sorted:
qsort:  449
cksort: 34

reversed:
qsort:  451
cksort: 41

mostly sorted:
qsort:  459
cksort: 58

As you can see above, random data is still slower with cksort (though a heck of a lot faster than earlier implementations), and it stays at about half the speed of qsort no matter how many samples are used, which is good because it means it is bound by O(n log n) complexity. 10,000 samples was chosen as a size where any polynomial behaviour in an algorithm would be obvious. The reason it's half the speed of qsort is that an unoptimised merge sort does on average about twice as much work as an optimised quicksort. I could reasonably just switch to a quicksort at the tipping point, but there are still corner cases where quicksort becomes O(n^2), which mergesort never suffers from, and mergesort is very parallelisable if desired. I was tempted to thread the mergesort component to improve the benchmark numbers, but that would have skewed the results inappropriately.

As a generic sort there are still two issues with cksort. One is the extra space required, and at O(n) auxiliary space it is about as bad as it gets. An in-place algorithm could certainly be developed, but it would be tricky and require a lot more swaps during the presort phase. The fallback beyond the tipping point could be switched to a heap sort instead, which works in place. The second disadvantage is that it is not stable (i.e. it does not keep the original order of equal-valued entries). I'm not sure that's easy to improve on.

Again, I don't expect this will dramatically change the way anyone sorts anything but it's certainly been interesting to play with. I'll keep playing some more :)

EDIT: In response to requests for a range of numbers, here's a set of them for different types of data and number of samples, 10-100 million samples. Random (2^32), fewunique1 (4 different values), fewunique2 (25 different values), fewunique3 (100 different values and > tipping point), mostly sorted (every 5th and 10th value swapped), already sorted and reverse sorted.




EDIT: In response to requests, this is the benchmark code, based on 0.43, which is an improvement on what this original post described. Just edit the TOSORT value to try different sizes.
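For reference, a harness along these lines is what produces numbers like the table above. This is a minimal sketch, not the actual benchmark code: cmp_int, bench_qsort and the use of clock_gettime are my illustrative choices here, with TOSORT as the sample count as in the post.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define TOSORT 10000	/* number of samples; edit to try other sizes */

static int cmp_int(const void *pa, const void *pb)
{
	int a = *(const int *)pa, b = *(const int *)pb;

	return (a > b) - (a < b);
}

/* Microseconds elapsed on the monotonic clock. */
static long elapsed_us(const struct timespec *t0, const struct timespec *t1)
{
	return (t1->tv_sec - t0->tv_sec) * 1000000L +
	       (t1->tv_nsec - t0->tv_nsec) / 1000L;
}

/* Time a single qsort run in microseconds; substitute cksort here
 * to compare the two on identical data. */
long bench_qsort(int *data, size_t n)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	qsort(data, n, sizeof(int), cmp_int);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return elapsed_us(&t0, &t1);
}
```

The important part when benchmarking sorts is to regenerate the same input pattern (random, sorted, few-unique, etc.) before each timed run, since sorting is destructive.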

Sunday, 9 October 2011

cksort evolution

I've been tinkering with the cksort algorithm to try and improve its performance and worst-case behaviour while retaining the existing design. After spending some time visualising what happens with the three-bucket model I described, I experimented with using the high/low buckets as a pre-pass instead of as the actual sorting part. This turns a random array of data like this:


into what looks almost like a heap array in the following diagram:


Looking carefully at this, you can see that the range (lowest to highest values) of each "node" as I'm calling them, gets smaller as you progress from the bottom to the top. Also each node is a discrete sorted entity, and by combining the increasing and decreasing values into one node it halves the number of nodes generated, while being able to benefit from any existing partially arranged data in either direction.
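To illustrate the idea, here is a simplified standalone sketch (not the code from cksort.c) of how a single pass can absorb values in either direction into one node. Reading a node's low side backwards and then its high side forwards yields a sorted sequence, which is why each node is a discrete sorted entity.

```c
#include <assert.h>
#include <stddef.h>

/* Count the "nodes" a single prescan pass would produce. A node
 * absorbs each value that extends its range either upward (>= its
 * current highest) or downward (<= its current lowest); a value
 * falling strictly inside the range closes the node and starts
 * the next one. */
size_t count_nodes(const int *a, size_t n)
{
	size_t nodes = 0, i = 0;

	while (i < n) {
		int lo = a[i], hi = a[i];

		nodes++;
		for (i++; i < n; i++) {
			if (a[i] >= hi)
				hi = a[i];
			else if (a[i] <= lo)
				lo = a[i];
			else
				break;	/* inside (lo, hi): next node */
		}
	}
	return nodes;
}
```

Note how both fully sorted and fully reversed input collapse into a single node, which is why partially arranged data in either direction benefits equally.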

After lots of tinkering at this point, I settled on using a merge from one node to the next, and used the narrower range of each successive node to my advantage to avoid re-scanning areas that are known to be in their final positions.
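A simplified sketch of that kind of merge (again illustrative only, not the cksort.c implementation): because everything below the incoming node's lowest value is already in its final position, the merge can skip straight past that prefix instead of re-scanning it.

```c
#include <assert.h>
#include <stddef.h>

/* Merge sorted node b (length m) into sorted array a (length n);
 * a must have room for n + m elements. Every element of a below
 * b[0] is already final, so the merge skips that prefix, then
 * merges the tails right-to-left so it can work in place. */
void merge_node(int *a, size_t n, const int *b, size_t m)
{
	size_t start = 0, i = n, j = m, k = n + m;

	if (!m)
		return;
	while (start < n && a[start] < b[0])
		start++;	/* already final: never rescanned */

	while (j > 0) {
		if (i > start && a[i - 1] > b[j - 1])
			a[--k] = a[--i];
		else
			a[--k] = b[--j];
	}
}
```

The narrower the range of each successive node, the larger the skipped prefix, which is where the saving over a naive repeated merge comes from.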

All this has done wonders for the performance of cksort in all areas, not only speeding up the sorts it was already fast at, but dramatically speeding up the random data example (20x faster) and fixing the corner case of the sawtooth overlapping data. A reminder where the inspiration for these sample data sets and graphs came from: http://sorting-algorithms.com

So here are the latest versions of the 50-value graphs, which have improved further:
Random


Mostly sorted


Few unique

The code as it stands is faster than qsort for up to ~1500 unique random values and then gets slower beyond that. It is faster than qsort for pretty much any number of values when the data has any significant existing pattern, sorting, or repetition.

Again, I'm doing this purely as an intellectual exercise, but there may be some valid usage for such an algorithm. As for my comment about making the sort parallelisable, the merge component between nodes could easily be parallelised, with merging beginning before the presort component has completed, as each node comes in.

Current version (0.3) is available here:  http://ck.kolivas.org/apps/cksort/cksort.c

Probably the next step is to find the upper limit on the number of nodes at which it turns O(n^2), and then use the intrinsic heap-like shape of the presorted data to actually generate a heap structure and use a classic heap sort.

Wednesday, 5 October 2011

Anatomy of a software engineer reject

As most of you would be aware, I'm a doctor, a specialist in anaesthesia, who only moonlights with code in my spare time. The only reason I have time to moonlight at all is that I take great care with my work/life balance, to make sure I have enough time to tinker with the things I enjoy doing. Around August each year it must be recruitment time at the big software companies or something, because each year for the last few years a couple of different companies have approached me about considering a career with them. Of course, with a stable career, income, work-life balance, proximity to work and so on, you can imagine I would never have considered changing careers, especially after the investment I put into specialising. The fact is I had pretty much enjoyed my career and the intellectual stimulation it provided. But intellectual stimulation fades with time because, as one of my teachers once told me, "if it's still exciting in 10 years' time it's because you don't know what you're doing".

Anyway, this year was different because I had decided I needed an extended break from work for personal development reasons, and I figured I'd go through the interview process and see where things ended up. I knew full well my knowledge base was sparse, to say the least: great detail about minute things, with vast amounts of other general information and experience missing. Sure, I explained that to the recruiters, and in the past they've either baulked on finding it out or wanted to put me through the process and send me to $faraway_location. This time the recruiter was positive and considered all the options that might actually suit me, including the tempting carrot of working remotely (suggested but not confirmed), so I went through the interview routine with said $software_company expecting my lack of general knowledge to be my downfall. This might help explain my sudden interest in algorithmic complexity and sorting algorithms...

And indeed I did get rejected. This is, of course, not surprising in the slightest. Now I have to go away and decide whether I want to spend lots of spare time studying all that general material, then uproot house and home, family and work security, to work full time for much less income in another state or country on general software. Or I could stick to working part-time as an anaesthetist, hack on whatever I like in my spare time, and keep all those other things. Change always has something that attracts us in perverse ways...

Tuesday, 4 October 2011

Visualising cksort

As part of the development of my first draft of the sorting algorithm, I spent a lot of time visualising existing sorting algorithms at the excellent website http://www.sorting-algorithms.com/ . Looking at the different algorithms allowed me to think about how I might develop my own sort to try and improve performance in some of the common cases listed there, and to do well in areas where most do badly. Until now I was unable to generate pretty graphs like those ones, but with Serge Belyshev's help I managed to get some rudimentary versions (without the nice arrows and different shading that the website above has) to see how the swaps occur.

The following graphs visualise only the writes to the main array; an update occurs only whenever a value is changed in the main array. This tends to correlate quite well with the overall number of comparisons, so the time taken to complete in these examples should give you an idea of how cksort performs.
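As a trivial standalone illustration of what the graphs count (this is not the visualisation code itself, and the insertion sort is just a stand-in for any sort), here's a write counter that only ticks when the value in the main array actually changes:

```c
#include <assert.h>
#include <stddef.h>

long write_count;	/* what the graphs track: main-array updates */

/* Write v to a[i], counting it only if the stored value actually
 * changes, matching the "update only when a value changes" rule. */
static void put(int *a, size_t i, int v)
{
	if (a[i] != v) {
		a[i] = v;
		write_count++;
	}
}

/* Any sort routed through put() can be visualised this way; an
 * insertion sort is used here purely as an example. */
void insertion_sort_counted(int *a, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		int v = a[i];
		size_t j = i;

		while (j > 0 && a[j - 1] > v) {
			put(a, j, a[j - 1]);
			j--;
		}
		put(a, j, v);
	}
}
```

On already sorted input the counter stays at zero, which is exactly why there is nothing to show for sorted data below.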

I would have liked to have visualised completely sorted data, but since cksort moves no data in that case, there is nothing to show.

These graphs correspond to 50 entries of values 0-50, the equivalent of clicking the 50 button on the sorting algorithms site above. Putting them in the same order as the site, here they are (note I do NOT have the same data so you cannot compare them directly, though the ballpark performance scales should be comparable):

EDIT: NOTE these graphs show samples from the latest version of cksort, which performs better than this original post described.

Random data

Nearly sorted




Reversed



Few Unique

Part of the reason I wanted to visualise these sorts was to demonstrate how cksort works visually, but another reason was to examine it myself and see where its weaknesses are and how I might improve it. Interestingly, in my testing, the worst data for cksort was a sawtooth pattern, as I mentioned in my last post. Furthermore, if I "heapify" the unsorted data on every pass, it gets much slower. Looking at the random data example above, it seems that with multiple passes it intrinsically tends to heapify the data anyway, which is counterproductive. So assuming I can improve on it, this is the area I need to concentrate on. At least it's clear in the above examples where cksort does perform well!