Thursday, March 24, 2011

A Scan in Time Saves a Lexeme per Nanosecond in this Rhyme

I have implemented a scanner/lexical analyzer for a compiler of a programming language (the language itself is not relevant; the discussion of implementation and language design issues is for a later web log entry or monograph). Originally, I implemented the compiler using JavaCC, but I have now written the scanner (and the rest of the compiler) from scratch. After some tweaking and improving of the hand-implemented scanner, I decided to test its performance by comparing the two scanners on a test file.

Performance Tests



The test file contains 10,000 lexemes or tokens, a random pastiche of gibberish built from the words of the programming language. Each test run scans the file to find an average time per lexeme for each scanner. JavaCC is efficient and pretty cool in that the scanner is included with the parser in the generated source code (wish I'd had JavaCC when I took a compiler course in college using the infamous/famous "Dragon Book" with Lex/Yacc).

Each test was repeated many times, keeping the minimum of those per-lexeme times in nanoseconds.
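
As a rough illustration of the measurement, the timing loop amounts to something like the sketch below (the ScanRun interface and minNanosPerLexeme method are stand-in names of my own, not the actual test code; the run wraps whichever scanner is under test and returns the number of lexemes it scanned):

    public class ScanTimer {
        // Stand-in for whichever scanner is under test: scan the whole
        // 10,000-lexeme test file and return the number of lexemes read.
        interface ScanRun { long scanFile(); }

        // Repeat the scan many times and keep the minimum nanoseconds per lexeme.
        static long minNanosPerLexeme(ScanRun run, int runs) {
            long best = Long.MAX_VALUE;
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                long lexemes = run.scanFile();
                long elapsed = System.nanoTime() - start;
                best = Math.min(best, elapsed / lexemes);
            }
            return best;
        }
    }

Taking the minimum rather than the mean helps filter out JIT warm-up and garbage-collection noise between runs.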


  1. JavaCC scanner: 7,873 nanoseconds per lexeme

  2. Custom scanner: 537 nanoseconds per lexeme



The ratio is 14.66, or approximately 15-to-1, on a Windows 7 AMD64 desktop.


  1. JavaCC scanner: 9,246 nanoseconds per lexeme

  2. Custom scanner: 733 nanoseconds per lexeme



The ratio is 12.614, or approximately 13-to-1, on a Mac OS X 10.6.7 Intel notebook.

I expected the custom scanner to have somewhat better performance, but a ratio of approximately 14:1 was surprising. In retrospect, the JavaCC compiler generator creates Java source code, but for the most general case of any potential programming language; hence it is not the most optimal.
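
For contrast, the hot loop of a hand-written scanner can dispatch directly on the current character for just the lexemes the language actually has. The fragment below is only a minimal sketch over a toy token set (identifiers, integers, and a few single-character operators); the class and token names are illustrative, not the actual compiler's code:

    public class TinyScanner {
        enum Kind { IDENT, NUMBER, PLUS, LPAREN, RPAREN, EOF }

        private final String src;
        private int pos = 0;

        TinyScanner(String src) { this.src = src; }

        private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

        Kind nextToken() {
            while (Character.isWhitespace(peek())) pos++;        // skip whitespace
            char ch = peek();
            if (ch == '\0') return Kind.EOF;                      // end of input
            if (Character.isLetter(ch)) {                         // identifier or keyword
                while (Character.isLetterOrDigit(peek())) pos++;
                return Kind.IDENT;
            }
            if (Character.isDigit(ch)) {                          // integer literal
                while (Character.isDigit(peek())) pos++;
                return Kind.NUMBER;
            }
            pos++;                                                // single-character lexemes
            switch (ch) {
                case '+': return Kind.PLUS;
                case '(': return Kind.LPAREN;
                case ')': return Kind.RPAREN;
                default:  throw new IllegalStateException("unexpected character: " + ch);
            }
        }
    }

Scanning a whole file is then just a loop of nextToken() until Kind.EOF, with no generated tables or general-purpose state machine in between.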

Consider a program of 100,000 lines of source code, with 10 lexemes per line. The source code is then 1 million lexemes (a small worked calculation follows the two lists below). Total time to scan the file on Windows 7:


  1. JavaCC: 7,873 milliseconds or 7.873 seconds, approximately 8 seconds.

  2. Custom: 537 milliseconds or 0.537 seconds, approximately half a second.



Total time to scan the file on Mac:


  1. JavaCC: 9,246 milliseconds or 9.246 seconds, approximately 9.25 seconds.

  2. Custom: 733 milliseconds or 0.733 seconds, approximately three-quarters of a second.
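
The figures above follow directly from the measured per-lexeme times; a small throwaway calculation (using the values from the lists above) reproduces them:

    public class ScanEstimate {
        public static void main(String[] args) {
            long lexemes = 100000L * 10;   // 100,000 lines x 10 lexemes per line
            long[] nsPerLexeme = { 7873, 537, 9246, 733 };
            String[] label = { "JavaCC (Windows)", "Custom (Windows)",
                               "JavaCC (Mac)", "Custom (Mac)" };
            for (int i = 0; i < nsPerLexeme.length; i++) {
                double seconds = lexemes * nsPerLexeme[i] / 1e9;   // nanoseconds to seconds
                System.out.printf("%-17s %.3f seconds%n", label[i], seconds);
            }
        }
    }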



Optimize Bit by Bit



I had a work colleague chide me for optimizing code (including legacy code ten years old, a mix of C and C++ for Windows, Solaris, Linux, and some other archaic systems) as I fixed legacy bugs and other glitches. Eventually a team lead from another group stopped by, asking for performance nearly three times that of the older versions of the software package.

I made a joke about the Beta version leaking out, but later informed the team lead that it would require a reinstall of the software on systems on-site and at overseas sites. About a week later, I sat down with the team lead and worked out a schedule and plan to install the latest, optimized build on temporary systems, then reinstall and replace the older, less optimal systems.

But the secret was to optimize and improve code whenever a bug, defect, or glitch needed to be corrected. I can remember that in some places the comments and the reported error messages were cryptic at best and misleading at worst, so those got optimized as well. I left the old comments in place (at the time, a popular software methodology called for purging all comments to wipe the slate clean, but that is like learning from a mistake by getting amnesia) and added new comments to explain each change. Over several months, as I fixed problems in the legacy code, its performance also improved bit by bit.

Optimize Gradually in Design and Implementation



For the language compiler, I'm trying to get each stage as optimal as possible. Each slight increase in performance buys time for other, more complex compiler functionality, such as semantic checks or optimizing code generation, etcetera. Optimize software, but a piece at a time.

An old rule of thumb or heuristic from the structured software paradigm was not to try to optimize performance for one particular thing; it is lots of little things that get optimized. Optimize by the ingredients, not the overall recipe after it has been baked/boiled/fried (to use a cooking metaphor, as I love to cook...).