Saturday, October 9, 2010

Have Data, Will Sort

I like to create, analyze, test, and tinker with algorithms, especially sorting algorithms. Lately, sorting has pursued hybridized sorts, like the Introsort in C++, and lately the TimSort. The TimSort is slated to become the default sort algorithm in the JDK. TimSort is a hybridized merge sort with insertion sort, that first emerged in the Python realm. Another hybrid algorithm is the FlashSort, which uses the insertion sort to fully sort after partial sorting. Another hybridized algorithm is the Dual Pivot Quicksort. It seems elegance is lost in sorting, and algorithms are spliced together.

The TimSort requires a Comparator to be implemented, a specific class with a function to compare any to possible objects that can be defined in Java. No more magic comparison defined on integers, floats, doubles, or on characters from the encoding, and likewise a string (array of characters from a C perspective).

One sorting algorithm I created and tested was the Binar Sort, a sorting algorithm that uses the binary digits to partition for each binary digit position. Further contemplation, tinkering, and experimenting led to another sorting algorithm, the WillSort. I won't explain how the algorithm works, that's for a research monograph, but my testing has compared it to the TimSort, QuickSort, MergeSort, FlashSort, and Dual Pivot Quicksort. The test was on 32-bit integers in a random permutation to avoid the worst case of QuickSort, and to force all the algorithms to work without a special case of partly or nearly sorted set of data.

To push the performance comparison, the test was with large data sets of 100-million to 500-million integers. After each sort a static method then checked the array of integers was sorted independently of the sorting algorithm as a cross-check for any anomalous or unexpected results. The same data set was used, and randomly shuffled with the same seed to the pseudo-random number generator each time. Thus, each sorting algorithm worked on the same size, same permutation data set of 32-bit integers. All things were equal, except on the sorting algorithm used.

The platform was a JVM with a 6-gigabyte heap, on an Windows PC running at 2.7GHz. The results measured in the milliseconds. Here is some raw data from a random permutation:

Size Quick Merge Flash Tim 2Pivot Will

100-million 16926 24008 32105 54709 55116 9706
200-million 35849 53352 82336 126391 126329 19594
300-million 55661 77454 108218 238945 386350 32308
400-million 73289 149573 182769 426597 407020 40482
500-million 97515 307663 224577 990444 984796 51761

Interestingly, but not surprisingly, QuickSort (and not the 2-pivot QuickSort), maintains a speed advantage. WillSort is the best, and does not have any sudden spike or performance quirks. The MergeSort and FlashSort are in the middle. The TimSort and 2-pivot QuickSort lag behind, but they are both hybridized versions of the QuickSort and MergeSort. The FlashSort partially sorts, but then uses the Insertion Sort, so is not a composite of two sorts, using the flash sort to partly sort.

The ratio between the QuickSort to the WillSort is 1.7 to 1.8, and fluctuates to 1.88 at 510-million (the increment between each interval was 10-million integers). Hence the time processing (processing for the QuickSort is selecting a pivot element, partitioning using comparison) each element is 1.7 to 1.8 between the QuickSort and the WillSort.

It seems perplexing for the focus on TimSort and 2-Pivot Quicksort, when both QuickSort remains more efficient, and there are other potential algorithms (of course, the WillSort).

In the worst case, ignoring other possibilities, given the Introsort is time-tested by a wider group in the Standard Template Library of C++ developers, a port from C++ to Java of on the Introsort seems more plausible.

But have data, will sort using WillSort.

Sunday, March 14, 2010

Binar Sort--the One Thing

I put the draft monograph for a new sorting algorithm, the binar sort online. The response from readers was varied in the specifics, but invariably the same was of the form "impossible to sort in linear time" or "no, it's not" (double negative), or "because of this it proves your algorithm is not linear."

All negative statements, proving a point by making impossible to prove negative assertions (remember basic logic, you cannot prove a negative, something is not). One individual with incredibly stunning powers of observation said (to paraphrase): it will only work with binary data. I didn't bother to explain digital logic, bits, or what is under the hood inside memory chips, coprocessors, and a microprocessor, or other computer hardware. A criticism by making an assertion of obviousness.

I consider myself a scientist, in particular a computer scientist. If I can be proven wrong, so be it. The broadening of knowledge, and greater understanding is of importance. But the critics seem to prove by making unprovable or obvious statements which only demonstrate one-dimensional thinking, lacking any depth or understanding. Even worse, all the critics did one thing consistently--never asked for the source code in Java, C#, or C to try the algorithm for themselves.

Algorithms have a theoretical basis, analysis, but unlike other scientific domains, computer science is applied for an algorithm--it can be run, tested, and evaluated. But not one person did that--it was all assertions and commentary.

Of course, many technical papers and research monographs are theoretical, and often practical implementation is non-existent. Once I read about a data structure, I e-mailed the authors about an implementation. They used the abstract pseudo-code notation, popular in algorithms books, but did not implement the data structure. The response to my query was "it was trivial to implement" in the usual dismissive condescension. The response was so trivial as to be banal and predictable. In this case, theory did not transfer into the praxis of application--applied example.

But in the case of the binar sort, I have a Java implementation. So it would only take the chutzpah to codify the binar sort, create a bytecode class, and run it with different test sets. Any incongruous results would pique my interest--there is a case where theory and practice do not correspond, hence an unknown factor. Learning and discovering the unknown factor is increasing knowledge and experience--adding to the scientific knowledge.

Unfortunately, verbiage only demonstrates that new things are easier dismissed as impossible, rather than an individual to have re-learn, and re-thinking what they have known.

Computer science, in that sense of testable and running immediately makes it one of the better sciences. An intrepid person can tinker with an algorithm, data structure, or library and refine and improve it, find anomalous behavior, and rewrite it in another programming language for a different platform. Hence the science is tangible, immediate, and testable.

The critics are demonstrating intellectual and scientific sloth and laziness--idleness by negative criticism. Dismiss something as impossible and it removes the need to reconsider existing knowledge, and integrate the new fact.

The wonderful book "The Programmer's Stone" discusses this in detail, and the different mindsets--packers and mappers.

But the binar sort is an algorithm, but another kind of experiment was implicitly conducted--the behavior of those who comment. It is somewhat interesting (although too familiar) to see that everyone did the same thing--inaction, never bothering to test the code and evaluate for themselves first hand.

Questioning and challenging a scientific experiment is part of science, but so is replicating an experiment to verify results. That's pretty obvious, particularly in high school where you replicate the experiments in physics and chemistry to verify well established scientific laws and constants.

But when the scientific process is only through rebuttal and making intangible, unprovable assertions--that's the realm of politics, word games, and inane chatter--political science (talk about an oxymoron) and sophistry. In a nutshell, the binar sort is an applied example illustrating the adage--you can lead a horse/jackass/donkey to water, but you cannot make it drink. Or as Mark Twain once said, "Don't let education get in the way of your learning." Too late.

Saturday, February 27, 2010

Schadenfreude of False Friends or Schaden-Freunde

The German word "schadenfreude" means a shameful joy, but life is replete with what I have coined the term schaden-Freund or "shameful friend". These are the friends that are there in passing, or worse, the ones that put a person down if only to feel superior. As Frank Sinatra croons in his song "That's Life" to quote "Some people get their kicks, Stomping on a dream" in this case "dream" is the aspirations of a friend.

Such friends are kept at a distance, as they are far, far, worse than an enemy.

These "friends" reveal themselves, often at a critical point where friendship is needed.

For example, I had a friend that I met in a previous job called Don. Looking back, he was a scadenfreund as it was always conversational one-ups, one better, always besting whatever I said. At the time, I naively took such talk as bravado, someone trying to impress. I was a friend to Don, helping him, and being generous in sharing books, papers, even copying some free material and Linux software so he did not have to find it, download it, and get it to work.

Later, when I was layed-off, and asked Don for a reference, the true colors of the schadenfreunde were revealed. In an e-mail, Don gave me his "I only give references to people I've worked with at least three years." How very considerate of him. Not!

Later when I began work on my computer architecture book, Don in an e-mail said "Why would you want to write a book with so many out there?" Stomping on a friend's dream. Simple--I want to, same reason people run, or climb mountains, or compose music, or play the piano. But I owed this false friend no explanations.

Later, after a few years, I got an e-mail from him, and again he said "I remember you were writing a book...". I did not respond. Don would want something, after all he suddenly contacts me after several years, and when I would not have it, would put down or make some nasty comment, then disappear again feeling superior. A schaden-freunde that needs someone to vomit their human excrement upon.

When he tried to contact me over LinkedIn, I blocked him. I also added his name to my e-mail blacklist. Don was a schadenfreunde, and the only way to deal with them is to avoid them. Having no reason to contact or communicate I severed all ties. A person that questions anything you do, does not share any sense of accomplishment, and does not help out of their own need for self-importance and self-aggrandizement is a person you do not need. I would be glad for a friend that was writing a book to be published, and if I could, would give a reference for a job to help them. The complete opposite applied to Don, so I need that like I need the swine flu.

Another schadenfreunde was Joe, whom again I met through work in a previous job. We were friends after both of us had left the company. But I noticed Joe always took a condescending view of patronizing superiority. When I considered going back to college once, Joe immediately started telling me about deadlines, forms, and documents, things I was well aware of. I made a point of asking Joe if he was my guidance counselor, telling him having been to college and some graduate school I was well aware of admission requirements and deadlines. That was a first, but not last time.

Later I had moved away, but Joe in e-mails was always dispensing his pithy advice, like to join the Peace Corps and move to a third world country. Unfortunately, my background is computer science, not much use in the bush of Africa. Also, I like air conditioning, hamburgers, milk shakes, Internet access, all the things of the United States--a first world nation. I pointed out to Joe that a quote of mine is "free advice is often worth the price you pay for it." It was as if Joe did not know me even though he talked to me, worked with me, and interacted with me--like I did not exist.

His response was "I was only trying to help" to which I responded trying to at least keep a dialog. But no response...can't dispense his pithy wisdom so does not communicate. A friend who constantly offers advice but only because it makes them feel self-important, and the worst kind of advice based on nothing about you. One reason I NEVER offer advice, even if directly asked by a friend. I can empathize and relate my situation of a similar event, perhaps state what I did, but not say what you should do, or how you should do it simply because it feels my empty, rotten insides for a feeling of self-importance.

Some schadenfreunde are former coworkers Vasudevan, Ravin, and Nathan, who after hurricane Katrina sent the token "I'm glad you survived." but no other e-mails again. Not interested in communicating, and having left work, you are no longer a lemming marching to the cliff in the same mindless shit job with the same equally mindless shit company (the movie Office Space comes to mind).

Questions to ask to determine if a "friend" is a schadenfreunde:


  1. For anything you do, is this friend in competition--one better or out doing you?

  2. Is this friend happy for your achievements and encourage you?

  3. Does this friend support your efforts by vouching for you?

  4. Will this friend empathize with you, or simply tell you what to do?



Life can be miserable, and a schadenfreunde only adds to the misery by being a traitor in your own ranks, a stabber in your back when you least expect it, and certainly when you do not need it. Usually with a big smile of friendship, and a carving knife ready to strike downward. Francis Bacon said, "If a man have a true friend, he may rest almost secure that the care of those things (things of great importance such as an unfinished work) will continue after him."

Dealing with such people is usually the more difficult thing, but the first step is to recognize them. I have true friends, but I can count them on one hand. I'd call a schadenfreude (borrowing from Francis Bacon's definition of a true friend) a share-not anything, whereas Francis Bacon in his book "Essays" Essayes: Religious Meditations. Places of Perswasion and Disswasion. Seene and Allowed states "I have given the rule, where a man cannot fitly play his own part; if he have not a friend, he may quit the stage." True friends make life worth living, schadenfreunde make life unlivable.