I don't really know how to describe it, but it's sort of a selection sort that generates two sorted lists at the end, and part of the last list is the completely sorted component.

The original design did not have the bidirectional component, just putting the "high" list at the end, but this performed poorly when the data was in reverse order, whereas this version performs just as well when the data is already mostly sorted in either direction.

Note that almost no effort has gone into code optimisation; the aim so far has just been to make sure it works and to count how many comparisons and swaps it performs and so on.

http://ck.kolivas.org/apps/cksort/cksort.c

EDIT: Note v0.21. Working bidirectionally, this is significantly faster than the original version I posted, requiring fewer comparisons and swaps on average and in the worst case.

/* Copyright 2011, Con Kolivas
 *
 * cksort v0.21
 *
 * Gnu Public License V3.
 *
 * cksort: A very simple sort algorithm based on selection sort but sorts
 * successive chunks of entries and scans less on each pass based on
 * knowledge of how much has been successfully sorted, and does a low and high
 * directional search to avoid worst case of reversed data. Uses an extra array
 * equal in size to the original data to do this.
 * Adaptive, not stable, best performance O(n), worst better than O(n^2) but
 * unknown. The average is unknown but seems to be ~ O(n^(3/2)) for random data.
 * It becomes faster the more sorted the original data is - even if it's
 * backwards, and the smaller the spread of data.
 *
 * gcc -o cksort -Wall -O2 cksort.c
 */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define TOSORT 50

typedef int32_t s32;

s32 rands[TOSORT], a[TOSORT];

#define s32min -2147483647
#define s32max 2147483647

void show(void)
{
        int i;

        for (i = 0; i < TOSORT; i++)
                printf("%02d ", rands[i]);
        printf("\n");
}

int total_n, total_passes, swaps, fromsort, tosort;

int sort(void)
{
        int i, j, ai = 0, bi = 0, unsorted_end;
        s32 max = s32min, minmax = s32min;
        s32 min = s32max, maxmin = s32max;

        j = fromsort;
        /* Do a sequential sort, finding ever increasing values and
         * store them in a[] */
        for (i = fromsort; i < tosort; i++) {
                total_n++;
                if (rands[i] >= max) {
                        /* Store the grouped maximums at the bottom of the a
                         * array */
                        max = a[ai++] = rands[i];
                        if (max < maxmin)
                                maxmin = max;
                        swaps++;
                } else if (rands[i] <= min) {
                        /* Store the grouped minimums at the top of the a
                         * array */
                        min = a[TOSORT - 1 - bi++] = rands[i];
                        swaps++;
                } else {
                        /* Group any unsorted values at the low end */
                        rands[j++] = rands[i];
                        swaps++;
                        /* Find the largest value beyond which all data will
                         * be sorted on the next pass */
                        if (rands[i] > minmax)
                                minmax = rands[i];
                        if (rands[i] < maxmin)
                                maxmin = rands[i];
                }
        }

        /* The only entries left are all sorted already */
        if (ai == tosort - fromsort)
                return 0;
        if (maxmin == s32max)
                maxmin = s32min;
        if (minmax == s32min)
                minmax = s32max;
        unsorted_end = j;

        /* Place the sorted group of low values immediately after the unsorted
         * group. This avoids small values being picked up as maximums on the
         * next pass. */
        for (i = TOSORT - bi; i < TOSORT; i++) {
                swaps++;
                /* Everything less than maxmin will be the lowest sorted values
                 * and can be put in their final positions */
                if (a[i] < maxmin) {
                        if (fromsort < unsorted_end) {
                                s32 swap = rands[fromsort];

                                swaps++;
                                rands[j] = swap;
                                unsorted_end++;
                        }
                        rands[fromsort++] = a[i];
                } else
                        rands[j] = a[i];
                j++;
        }

        /* Place the sorted group of high values after the low values. After
         * the minmax value, these are now in their final positions. */
        for (i = 0; i < ai; i++) {
                swaps++;
                rands[j] = a[i];
                /* Everything greater than minmax will be the highest sorted
                 * values and can be put in their final positions */
                if (a[i] >= minmax) {
                        tosort = j;
                        minmax = s32max;
                }
                j++;
        }
        total_passes++;

        return tosort - fromsort;
}

int main(void)
{
        struct timeval tv;
        int i;

        gettimeofday(&tv, NULL);
        srand((unsigned int)tv.tv_sec);
        for (i = 0; i < TOSORT; i++)
                rands[i] = rand() % 100;
        show();
        tosort = TOSORT;
        while (sort())
                show();
        show();
        printf("%d passes, %d comparisons, %d swaps\n",
               total_passes, total_n, swaps);
        return 0;
}

Sample output from 50 random values in the range 0-99:

07 70 66 93 18 71 59 82 30 17 25 15 24 87 66 13 29 10 05 19 34 20 75 53 33 88 79 19 99 40 94 59 62 60 52 80 32 11 14 62 81 40 30 05 79 96 70 09 59 75

[snip]

05 05 07 09 10 11 13 14 15 17 18 19 19 20 24 25 29 30 30 32 33 34 40 40 52 53 59 59 59 60 62 62 66 66 70 70 71 75 75 79 79 80 81 82 87 88 93 94 96 99

13 passes, 360 comparisons, 503 swaps

EDIT: Further testing suggests worst case performance is likely O(n log n), and that worst case is exceedingly rare with any real data set. I still haven't accurately characterised the average performance, but it's a lot better than I first estimated, even with random data, and of course it depends very much on what sort of data it sorts. Any existing order is seized upon and exploited, even if there are, for example, interleaved increasing and decreasing subsequences. I have not found evidence that I have simply "reinvented" an existing algorithm.

(Ralph Ulrich) I am not into it, but I'm pretty sure you didn't find any new algorithm:

For 50 years there have been millions of students whose first academic exercise is to program their "own" sorting application. Every one of them hungry for fame ...

Now the readers challenge of this blog: What is the real academic name of cksort?

That's a very good question and probably just as important as trying to categorise it: WTF is it?
