Will's Blog...Much Ado About Nothing: October 2010

I like to create, analyze, test, and tinker with algorithms, especially sorting algorithms. Lately, sorting has pursued hybridized sorts, like the Introsort in C++, and lately the TimSort. The TimSort is slated to become the default sort algorithm in the JDK. TimSort is a hybridized merge sort with insertion sort, that first emerged in the Python realm. Another hybrid algorithm is the FlashSort, which uses the insertion sort to fully sort after partial sorting. Another hybridized algorithm is the Dual Pivot Quicksort. It seems elegance is lost in sorting, and algorithms are spliced together.

The TimSort requires a Comparator to be implemented, a specific class with a function to compare any to possible objects that can be defined in Java. No more magic comparison defined on integers, floats, doubles, or on characters from the encoding, and likewise a string (array of characters from a C perspective).

One sorting algorithm I created and tested was the Binar Sort, a sorting algorithm that uses the binary digits to partition for each binary digit position. Further contemplation, tinkering, and experimenting led to another sorting algorithm, the WillSort. I won't explain how the algorithm works, that's for a research monograph, but my testing has compared it to the TimSort, QuickSort, MergeSort, FlashSort, and Dual Pivot Quicksort. The test was on 32-bit integers in a random permutation to avoid the worst case of QuickSort, and to force all the algorithms to work without a special case of partly or nearly sorted set of data.

To push the performance comparison, the test was with large data sets of 100-million to 500-million integers. After each sort a static method then checked the array of integers was sorted independently of the sorting algorithm as a cross-check for any anomalous or unexpected results. The same data set was used, and randomly shuffled with the same seed to the pseudo-random number generator each time. Thus, each sorting algorithm worked on the same size, same permutation data set of 32-bit integers. All things were equal, except on the sorting algorithm used.

The platform was a JVM with a 6-gigabyte heap, on an Windows PC running at 2.7GHz. The results measured in the milliseconds. Here is some raw data from a random permutation:


Size        Quick    Merge   Flash    Tim   2Pivot   Will

100-million 16926    24008   32105   54709   55116   9706    
200-million 35849    53352   82336  126391  126329  19594   
300-million 55661    77454  108218  238945  386350  32308
400-million 73289   149573  182769  426597  407020  40482
500-million 97515   307663  224577  990444  984796  51761

Interestingly, but not surprisingly, QuickSort (and not the 2-pivot QuickSort), maintains a speed advantage. WillSort is the best, and does not have any sudden spike or performance quirks. The MergeSort and FlashSort are in the middle. The TimSort and 2-pivot QuickSort lag behind, but they are both hybridized versions of the QuickSort and MergeSort. The FlashSort partially sorts, but then uses the Insertion Sort, so is not a composite of two sorts, using the flash sort to partly sort.

The ratio between the QuickSort to the WillSort is 1.7 to 1.8, and fluctuates to 1.88 at 510-million (the increment between each interval was 10-million integers). Hence the time processing (processing for the QuickSort is selecting a pivot element, partitioning using comparison) each element is 1.7 to 1.8 between the QuickSort and the WillSort.

It seems perplexing for the focus on TimSort and 2-Pivot Quicksort, when both QuickSort remains more efficient, and there are other potential algorithms (of course, the WillSort).

In the worst case, ignoring other possibilities, given the Introsort is time-tested by a wider group in the Standard Template Library of C++ developers, a port from C++ to Java of on the Introsort seems more plausible.

But have data, will sort using WillSort.

Saturday, October 9, 2010

Have Data, Will Sort

Will's Blog...Much Ado About Nothing

Followers

Blog Archive

About Me