Combsort: Simple but fast and small sorting

Naive sorting and its pitfalls

Most young programmers of today never have to learn anything about sorting, since there is always library or class functions available for sorting just about anything. This means that whenever they find themselves actually needing to implement a sort themselves, they invariably end up inventing the worst one of them all (and among the worst of all conceivable algorithms of any kind), Bubble sort. It's an elegant algorithm, however, and newbie programmers may be reluctant to concede its uselessness.

Not being used to evaluating sorting algorithms can be a serious handicap. Programmers who actually do check the performance of their shiny new Bubble sort will soon "improve" on it by inventing Cocktail Shaker sort, which looks like a good idea on paper but ends up being just a different version of the same painfully slow algorithm, albeit a lot more... um... vibrant? At this point, the improvements will typically cease, unless there's a genuine interest in algorithms.

In fact, I recommend implementing visual tools to examine how different algorithms work. I've done this in DOS text mode a long time ago, on the PalmPilot and in a Java application. It's always very illuminating. Seeing Quicksort whip past so quickly that you have to add delays to the code to be able to see anything, and then seeing Bubble sort trudge along for an eternity even without the delays, is a true eye-opener for most people.

How it's made: Quicksort

Quicksort is by far the most popular sorting algorithm, mainly because it is one of the fastest, and uses only a stack for auxiliary storage. (Merge sort is probably better in general, speedwise, but needs a work area on the same order as the data area being sorted, so it is often impractical)

Real-life Quicksort isn't very elegant, however. It's an engineer's algorithm. Apart from having to make a guess what to use for a pivot element, it needs a stack to implement recursion (or for storage, if the recursion is worked around), and it performs relatively badly on small arrays, so it needs to be complemented with another sorting algorithm for the final sort of sub-arrays of less than, say, four elements. Quicksort can also be very unstable, performance-wise, depending on the state of the input data and the choice of pivot element. The pure, recursive little quicksort used in student assignments is useless in real life.

Heapsort

Heapsort is a marvellously elegant sorting algorithm, but it suffers from being rather complex and tricky to implement, and it's generally slightly slower than Quicksort. It doesn't demand auxiliary storage, however.

Here's a simple Heapsort implementation:

void sift(void *base, size_t size, int l, int r,
          int (*compar)(const void *, const void *))
{
  int root=l;
  while (root * 2 + 1 <= r) {
    int child = 2 * root + 1;
    if ((child < r) &&
        (compar((char *)base + size * child,
                (char *)base + size * (child + 1)) < 0))
       child++;
 
    if (compar((char *)base + size * root, (char *)base + size * child) < 0) {
      SWAP((char *)base + size * root, (char *)base + size * child, size);
      root=child;
    } else break;
  }
}
 
void heapsort(void *base, size_t nmemb, size_t size,
              int (*compar)(const void *, const void *)) {
  if (!base || !size || nmemb <= 1) return;
  int l=nmemb/2-1;
  int r=nmemb-1;
  while (l>=0) {
    sift(base, size, l--, r, compar);
  }
 
  while(r>0)
    {
      SWAP((char *)base, (char *)base + size * r, size);
      sift(base, size, 0, --r, compar);
    }
}

The undiscovered champion: Combsort

In 1980, Wlodek Dobosiewicz invented Combsort as an improved version of Bubble sort. Later, Stephen Lacey and Richard Box rediscovered the algorithm and published an article about it in Byte magazine. Strangely enough, it's still not well-known among programmers.

Unlike other "improvements" on Bubble sort, Combsort is actuallly a vast improvement without actually being very much more complex. As a side note, the same basic principle can be employed in tidying your house, and makes for a very efficient and rewarding method.

The idea is to quickly move everything to the correct general area. which is what gives it a strong advantage over Bubble sort. By implementing the "bubbling" over larger gaps, and letting those gaps shrink by a certain amount for every iteration of the outer loop, elements in the array being sorted with quickly gravitate to the vicinity of their final position. Compared to Bubble sort's pitiful performance of O(n2), this gives Combsort the same O(n log n) characteristics as Quicksort, with the simple addition of an extra, outer loop.

The adjustment of the gap is a crucial part of the algorithm. The gap will be shortened every iteration by dividing it with 1.2473 or thereabouts, or even 1.3. As a special case, gaps of 9 and 10 will be replaced by a gap of 11 (informally named Combsort11). There was even a fixed gap table published by Jim Veale, but I'm not using that here. Other than this, the algorithm is very simple.

Note that Combsort will end up running as Bubble sort with a gap of 1 until the array is sorted. By this time, everything should be roughly in order, so it shouldn't take more than one or two runs.

Performance is quite close to that of qsort() but as you can see, this is a very simple algorithm to implement and it doesn't need any auxiliary storage except for a couple of local variables. Combsort should be very suitable for constrained environments like embedded systems.

(here implemented as a drop-in replacement of libc's qsort())

void combsort11(void *base, size_t nmemb, size_t size,
                int (*compar)(const void *, const void *)) {
  if (!base || !size || nmemb <= 1) return;
  int gap = nmemb, changed;
  do {
    gap = 93 * gap / 116;
    if (gap == 0) gap = 1;
    if (gap == 10 || gap == 9) gap = 11;
    int i;
    changed=0;
    for (i=0; i + gap < nmemb; i++) {
      if (compar((char *)base + size * i, (char *)base + size * (i + gap)) > 0) {
        SWAP((char *)base + size * i, (char *)base + size * (i + gap), size);
        changed=1;
      }
    }
  } while (changed || gap > 1);
}

Concerning references

I read about Combsort in Byte Magazine when it was published there. The reference to Wlodek Dobosiewicz comes from Wikipedia, but sadly I haven't been able to find the original publication to confirm it. The Jim Veale reference is all over the Internet and all the links go to a dead page, so I couldn't confirm that either.

There are many many more variations of this algorithm around, so there is definitely an interest in it. It's just not actually used a lot in real life.

Here is the definition of SWAP() from glibc's qsort.c:

/* Byte-wise swap two items of size SIZE. */
#define SWAP(a, b, size)                                                      \
  do                                                                          \
    {                                                                         \
      register size_t __size = (size);                                        \
      register char *__a = (a), *__b = (b);                                   \
      do                                                                      \
        {                                                                     \
          char __tmp = *__a;                                                  \
          *__a++ = *__b;                                                      \
          *__b++ = __tmp;                                                     \
        } while (--__size > 0);                                               \
    } while (0)