
Tuesday, 4 October 2011

Visualising cksort

As part of developing my first draft of the sorting algorithm, I spent a lot of time visualising existing sorting algorithms at the excellent website http://www.sorting-algorithms.com/ . Looking at the different algorithms let me think about how I might design my own sort to improve performance in some of the common cases listed there, and to do well in the areas where most algorithms do badly. Until now I was unable to generate pretty graphs like those ones, but with Serge Belyshev's help I managed to get some rudimentary versions (without the nice arrows and different shading that the website above has) to see how the swaps occur.

The following graphs visualise only the writes to the main array; an update occurs only when a value in the main array changes. This tends to correlate quite well with the overall number of comparisons too, so the time taken to complete these examples should give you an idea of how cksort performs.

I would have liked to visualise completely sorted data as well, but since cksort moves no data when the input is already sorted, there is nothing to show.

These graphs correspond to 50 entries with values 0-50, the equivalent of clicking the 50 button on the sorting-algorithms site above. Here they are, in the same order as that site (note I do NOT have the same data, so you cannot compare them directly, though the ballpark performance should be comparable):

EDIT: NOTE these graphs link to samples from the latest version of cksort, which performs better than this post originally described.

Random data

Nearly sorted




Reversed



Few Unique

Part of the reason I wanted to visualise these sorts was to demonstrate how cksort worked visually, but another reason was for me to examine it and see where its weakness is and how I might improve it. Interestingly, in my testing, the worst data for cksort was a sawtooth pattern as I mentioned in my last post. Furthermore, if I "heapify" the unsorted data on every pass, it gets much slower. Looking at the random data example above, it seems that with multiple passes, it intrinsically tends to heapify the data anyway, which is counterproductive. So assuming I can improve on it, this is the area I need to concentrate on. Perhaps after a few passes if the number of clustered "highs" drops to minimal, it may be better to switch to a heap sort for the remaining data. At least it's clear in the above examples where cksort does perform well!

Sunday, 2 October 2011

cksort follow up.

I've spent a few days playing with cksort, trying to improve it with behaviour I'd been planning, and trying to categorise exactly what sort of sort (heh) it is and to analyse its algorithmic complexity. I know people haven't found this as interesting as my usual stuff, but I feel like blogging it anyway, and maybe some use will come of this code.

cksort v0.21:
http://ck.kolivas.org/apps/cksort/cksort.c
To describe the sort: it's an extension of a selection sort that uses external storage to substantially improve sorting speed on certain forms of data and in the worst case.

I'll use this sample of 20 for simplicity:


 63 98 22 30 84 73 23 51 10 57 04 73 74 11 16 27 65 22 25 25

It starts by scanning over the data linearly, sorting data into 3 buckets:
The first bucket contains the first entry, and then collects any entries equal to or greater than the last entry put into it.


63 98

The second bucket contains the first item rejected by the 1st bucket and then collects any entries equal to or smaller than the last entry put into it.


22 10 04


The last bucket stores any data not falling into either of the previous categories (unsorted).

30 84 73 23 51 57 73 74 11 16 27 65 22 25 25

To minimise the amount of extra storage required, only one extra buffer equal in size to the original data is used; the data from buckets 1 and 2 is placed at the beginning and end of the same buffer, while the unsorted data is compacted in the original storage.

The highest and lowest values in the unsorted bucket are stored for comparison.

Then the main data is reconstructed as follows:

Any values from bucket 2 below the lowest unsorted value are placed at the beginning of the main buffer; these are in their final positions and will not be scanned again.
Then all the unsorted values follow.
Then the remainder of the bucket 2 data is placed in the main buffer in ascending order.
Then bucket 1 is placed in ascending order, and any entries larger than the highest unsorted value are considered in their final positions.

 04 10 73 23 51 57 73 74 11 16 27 65 22 25 25 30 84 22 63 98

Then the scan is repeated on ever-decreasing sections of the middle until none are left.

04 10 11 57 16 27 65 22 25 25 30 22 63 51 23 73 73 74 84 98
04 10 11 16 22 25 25 30 22 63 51 23 27 57 65 73 73 74 84 98
04 10 11 16 51 23 27 57 22 22 25 25 30 63 65 73 73 74 84 98
04 10 11 16 22 22 23 30 27 25 25 51 57 63 65 73 73 74 84 98
04 10 11 16 22 22 23 25 25 27 30 51 57 63 65 73 73 74 84 98


Trying to categorise this was difficult for a while but fortunately I discussed it with a few friends online and we were able to come up with something.


So in this overcrowded world of >9000 sorts, there needs to be some advantage or else it's a useless addition. Now I'm not claiming this is going to be something special; I'm doing this to fulfil some intellectual curiosity. And please, I'm not that good with the algorithmic complexity details, so if I've described something wrongly, please feel free to correct me (because no one ever does that on the internet!)

The advantages are in how it manages partially sorted data and repeated values. A set of data already fully sorted in ascending order takes one pass: after the data is placed into the backup array, the sort can abort without any further swaps, so the comparisons and swaps both equal n. O(n).

When data is already sorted in reverse order, the 2nd bucket picks it all up and places it in the correct order. This is one comparison per entry and 2 swaps per entry: O(n). This is an area where very few sorting algorithms do well.

When the entries can take only a limited number of distinct values, the number of passes cksort needs is bounded by half that number, no matter how many entries there are. E.g. if you have 1 million entries of values 0-100, it will only ever take up to 50 passes, needing ~25 million comparisons. EDIT: This means the number of passes is actually O(k), where k is the number of unique values, so the total work is O(n·k).

When data is completely random, the absolute number of operations is significantly less than n^2, but the algorithmic complexity is O(n^2). For example, 100 unique random entries take on average ~1800 comparisons, a lot less than 10,000, but the count does scale polynomially as n increases.

The exact worst case is actually well defined: a sawtooth pattern of input values with lots of crossover but no repeated values (e.g. 100, 97, 98, 95, 96...). It is also O(n^2); 100 unique values take 2424 comparisons.

I find the average quite hard to describe, because real data usually has either repeats or sections of partially ordered values. So depending on how random the data is, it will tend anywhere from O(n) to O(n^2). Strictly speaking, most documented "averages" are based on random data.

So, finally, how do the benchmarks stack up against the industry-standard qsort? It's important to note that no real effort was made to optimise the code, but the coarse scales should give you an idea of performance. Note that qsort is often thought of as quicksort, but the implementation is entirely library dependent. This one may be a heap sort, given it takes the same time no matter what is fed to it; a quicksort would easily have hit O(n^2) with some of these examples:


Time is in microseconds on a 3 GHz Core 2 with the work bound to one CPU:

All are 10,000 entries:
Data already sorted 0 - 10000
qsort: 659
cksort: 40

Data sorted in reverse order 10000 - 0
qsort: 646
cksort: 61

Data sorted interleaved rising and falling: (0, 100, 1, 99, 2, 98...)
qsort: 651
cksort: 96

Data random with repeats every 2nd: (e.g. 54, 54, 99, 99, 33, 33...)
qsort: 643
cksort: 37

Data with 2 rising values and then a random: (e.g.: 1, 2, 54, 4, 5, 65...)
qsort: 653
cksort: 102

Random data, bound to 100 different values:
qsort: 642
cksort: 781

And now to real suckage, purely random unique data:
qsort: 658
cksort: 51770

For completeness, I tried to find the crossover point in n at which truly random data performed the same; it was ~475 entries.
qsort: 153
cksort: 155

So it's been a fun experiment, and there's no doubt that with certain kinds of data it could actually be quite useful: small data sets, partially ordered data sets, or data with numerous repeats. But the worst case is clearly unacceptable for a general-purpose algorithm if truly random data is possible. In summary, its major advantage is its extreme adaptability (it is an adaptive algorithm); the disadvantages are that it is not stable, does not operate in place, and may perform many swaps. cksort, like BFS, is a tool that can be useful for specific tasks. As I saw with BFS, the right algorithmic complexity in the right place is better than something that just "sounds better".

Now I wonder if I'll be inspired to try and design an O(n log n) sorting algorithm that can be parallelised...

EDIT: In response to questions about the license in the code, the GPL v3 license in the sample code is just for the sample code, not the algorithm.

EDIT2: It's worth noting that I'm still experimenting with this algorithm and hopefully I'll be able to improve it further.

Friday, 30 September 2011

cksort

I've been playing with sorting algorithms lately and wondered if the landscape of a billion out there had room for one more. I tried implementing many of the better algorithms in code and found it very hard to actually produce working code for anything beyond the simplest of selection sorts. So I set out to create my own sorting algorithm, one that sort of kind of worked half decently but wasn't too hard to implement. Now I'm sure some of the more complex algorithms do significantly better, and this one has some disadvantages, like its space requirements and its many writes, but its best-case performance is very good and its average is decent. I have yet to find any particular data it performs badly on. However, I'm not very good with the whole mathematical-analysis side, so I figured I'd post it here and hopefully get comments from people who are.

I don't really know how to describe it, but it's sort of a selection sort that generates two sorted lists at the end, part of the last list being the completely sorted component.


The original design did not have the bidirectional component, just putting the "high" list at the end, but this performed poorly when the data was in reverse order, whereas this version performs just as well with data already mostly sorted in either direction.

Note that almost no effort has gone into optimising the code; it's just there to make sure the algorithm works and to count iterations, swaps and so on.

http://ck.kolivas.org/apps/cksort/cksort.c


EDIT: Note v0.21. Working bidirectionally, this is significantly faster than the original version I posted, requiring fewer comparisons and swaps on average and in the worst case.

/* Copyright 2011, Con Kolivas
 *
 * cksort v0.21
 *
 * Gnu Public License V3.
 *
 * cksort: A very simple sort algorithm based on selection sort but sorts
 * successive chunks of entries and scans less on each pass based on
 * knowledge of how much has been successfully sorted, and does a low and high
 * directional search to avoid worst case of reversed data. Uses an extra array
 * equal in size to the original data to do this.
 * Adaptive, not stable, best performance O(n), worst better than O(n^2) but
 * unknown. The average is unknown but seems to be ~ O(n^(3/2)) for random data.
 * It becomes faster the more sorted the original data is - even if it's
 * backwards, and the smaller the spread of data.
 *
 * gcc -o cksort -Wall -O2 cksort.c
 */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define TOSORT 50

typedef int32_t s32;

s32 rands[TOSORT], a[TOSORT];

#define s32min -2147483647
#define s32max 2147483647

void show(void)
{
    int i;

    for (i = 0; i < TOSORT; i++)
        printf("%02d ", rands[i]);
    printf("\n");
}

int total_n, total_passes, swaps, fromsort, tosort;

int sort(void)
{
    int i, j, ai = 0, bi = 0, unsorted_end;
    s32 max = s32min, minmax = s32min;
    s32 min = s32max, maxmin = s32max;

    j = fromsort;
    /* Do a sequential sort, finding ever increasing values and
     * store them in a[] */
    for (i = fromsort; i < tosort; i++) {
        total_n++;
        if (rands[i] >= max) {
            /* Store the grouped maximums at the bottom of the a
             * array */
            max = a[ai++] = rands[i];
            if (max < maxmin)
                maxmin = max;
            swaps++;
        } else if (rands[i] <= min) {
            /* Store the grouped minimums at the top of the a
             * array */
            min = a[TOSORT - 1 - bi++] = rands[i];
            swaps++;
        } else {
            /* Group any unsorted values at the low end */
            rands[j++] = rands[i];
            swaps++;
            /* Find the largest value beyond which all data will
             * be sorted on the next pass */
            if (rands[i] > minmax)
                minmax = rands[i];
            if (rands[i] < maxmin)
                maxmin = rands[i];
        }
    }

    /* The only entries left are all sorted already */
    if (ai == tosort - fromsort)
        return 0;

    if (maxmin == s32max)
        maxmin = s32min;
    if (minmax == s32min)
        minmax = s32max;

    unsorted_end = j;
    /* Place the sorted group of low values immediately after the unsorted
     * group. This avoids small values being picked up as maximums on the
     * next pass. */
    for (i = TOSORT - bi; i < TOSORT; i++) {
        swaps++;
        /* Everything less than maxmin will be the lowest sorted values
         * and can be put in their final positions */
        if (a[i] < maxmin) {
            if (fromsort < unsorted_end) {
                s32 swap = rands[fromsort];

                swaps++;
                rands[j] = swap;
                unsorted_end++;
            }
            rands[fromsort++] = a[i];
        } else
            rands[j] = a[i];
        j++;
    }

    /* Place the sorted group of high values after the low values. After
     * the minmax value, these are now in their final positions. */
    for (i = 0; i < ai; i++) {
        swaps++;
        rands[j] = a[i];
        /* Everything greater than minmax will be the highest sorted
         * values and can be put in their final positions */
        if (a[i] >= minmax) {
            tosort = j;
            minmax = s32max;
        }
        j++;
    }
    total_passes++;
    return tosort - fromsort;
}

int main(void)
{
    struct timeval tv;
    int i;

    gettimeofday(&tv, NULL);
    srand((unsigned int)tv.tv_sec);
    for (i = 0; i < TOSORT; i++)
        rands[i] = rand() % 100;

    show();
    tosort = TOSORT;

    while (sort())
        show();

    show();

    printf("%d passes, %d comparisons, %d swaps\n", total_passes, total_n, swaps);

    return 0;
}


Sample output from 50 random values 0-100:
07 70 66 93 18 71 59 82 30 17 25 15 24 87 66 13 29 10 05 19 34 20 75 53 33 88 79 19 99 40 94 59 62 60 52 80 32 11 14 62 81 40 30 05 79 96 70 09 59 75
 

[snip]

05 05 07 09 10 11 13 14 15 17 18 19 19 20 24 25 29 30 30 32 33 34 40 40 52 53 59 59 59 60 62 62 66 66 70 70 71 75 75 79 79 80 81 82 87 88 93 94 96 99
13 passes, 360 comparisons, 503 swaps



EDIT: Further testing suggests the worst case is likely O(n log n) performance, and that worst case is exceedingly rare with any real data set. I still haven't accurately categorised the average performance, but it's a lot better even with random data, and of course it depends very much on what sort of data it sorts. Any existing order is seized upon and benefited from, even if there are, for example, interleaved increasing and decreasing runs. I have not found evidence that I have simply "reinvented" an existing algorithm.