Thursday, November 13, 2014

Treapy Sort --Getting a Bit Treapy with a Treap

Treap

A treap is a randomized data structure that is hybrid of heap and a tree structure. The creators are Cecilia R. Aragon and Raimund Seidel in 1989. What is fascinating is the use of random priority for maintaining the heap structure or heap proprety. Randomness is used to bring order, structure. As Mr. Spock would say, "Fascinating..."

The basic operations of access, insert, remove all have a Big-Oh of O(lg N). Hence for 2**32 = 4,294,967,296 or 4-billion elements in a treap will require lg(4294967296) or lg(2**32) or 32-operations. Hence a treap, while a hybrid of a tree and heap, is very efficient in performance time.

Infimum and Supremum

One operation on a treap is to get the minimal or maximal elements--the mathematical term is infimum and supremum. I use the term extrema to mean either maximum or minimum element.

Not all treap data structure implementations have an extrema methods such as getMin() or getMax(), but such methods are necessary for the treapy sort.

Sorting by Treap

The treap as a method of sorting is discussed in A Modular Calculus for the Average Cost of Data Structuring by Michel Schellekens in 2008. Sorting using the extrema methods the sorting process is simply to getMin() or getMax() and delete the element.

But, after deletion, the treap requires rebalancing of elements to maintain the heap property. Hence deletion requires the expense of rebalancing the treap.

Wikipedia explains, "To delete a node x from the treap, if x is a leaf of the tree, simply remove it. If x has a single child z, remove x from the tree and make z be the child of the parent of x (or make z the root of the tree if x had no parent). Finally, if x has two children, swap its position in the tree with the position of its immediate successor z in the sorted order, resulting in one of the previous cases. In this final case, the swap may violate the heap-ordering property for z, so additional rotations may need to be performed to restore this property."

Getting Treapy

The treapy sort works by getting the extreme element, but avoids the cost of rebalancing the treap--however small the Big-Oh cost. The treapy sort avoids this, while preserving the structure of a treap while sorting. How does it do this? That's why its not a treap sort but a treapy sort.

One important feature of the treapy sort (and likewise the treap sort) is for duplicate elements each node in the treap maintain a count for duplicate elements (and to simplify the treap operation overall). Hence for a list of elements O(N) the space is O(M) where M <= N.

The treapy sort is linearithmic or O(N lg N)

For a list of data elements 2**32 = 4,294,967,296 the Big-Oh time is O(2**32 lg(2**32)) which simplifies to O(2**32 * 32) or O(2**32 * 2**5) or O(2**37).

One feature is that if the elements are in any partial ordering, or totally sorted, the treapy sort is immutable, it remains linearithmic.

The treapy sort is akin to the heap sort, but uses a treap, a hybrid of a heap and tree. One thesis examines the heap sort to treap sort in greater detail. The abstract states, "It was discovered that treap sort is much more faster than heap sort in sorting and searching for elements."

Source Code for Treapy Sort

The idea for the treapy sort came to me while examining diagrams, figures, and code for a treap. One of the best things about being a computer scientist is that to test an idea is to code, execute, test, and refine or optimize. I've implemented the treapy sort in Java to see it works.

Writing a research monograph, and doing a performance comparison against other linearithmic sorting algorithms--merge sort, quick sort, heap sort is an experiment for the future. Perhaps even a comparison with a treap sort with the cost of deletion and rebalancing a treap.

What is fascinating is that randomness to maintain a heap property is used but yet order emerges...the ordering of sort property.

The source code in Java I *might* post it, depending upon the level of interest. I *might* write a research monograph, again depending upon the level of interest. The downside of being a computer scientist is publishing results is involved, requiring writing a research monograph whereas writing source code is simpler and quicker.

Advantages of Treapy Sort

The two primary advantages of the treapy sort are:

  1. The treapy sort maintains the treap structure (hence immutability).
  2. No need or expense to delete and rebalance treap.
  3. Linearithmic performance time independent of source data

Another advantage is that if a treap is used in other applications (for example compiler symbol table, or I considered using it in SEQu-RAmA) the treapy sort is useful to extract all the data in ascending or descending sorted order. Such as a debug operation, or to simply dump the information in the treap data structure.