Tuesday, January 4, 2011

Customizing the new FitNesse parser

FitNesse began its life using the 'simplest thing that works' to parse wiki markup and render it as HTML: a set of regular expressions. Over the years, FitNesse functionality has grown, and the regex-based parser has had to support increasingly complex tasks. Hacks have accumulated to work around regex limitations, and performance has suffered as bigger and more complex wiki pages have been thrown at it. So last spring, I started a project to rewrite the entire FitNesse parser using classic grammar theory approaches.

Although the realities of earning a living slowed progress at times, and the challenges of replicating all the quirks of the original parser tested my resolve, we have finally merged the new parser into the main code base. Thanks to a number of beta users, it has been tested on some major FitNesse test suites, and the next FitNesse release, coming soon, will use the new parser.

One of the features of the original parser was the ability to extend the wiki syntax by plugging in your own custom wiki 'widgets'. This is described here: you write a class that extends WikiWidget and you add a line to a plugins.properties file.

WikiWidgets=className

James Carr wrote a nice post describing a detailed example of this.

The new parser also has this feature, but the plug-in class that you write is different. I'm going to show a very simple example here. Let's imagine we want to write !pi in our wiki pages and have it rendered with the value of pi in the HTML.


The plug-in class must extend SymbolType. A plug-in class can specify up to four pieces of information for the parser. Our simple example only needs to supply three of these.

The first is a name, specified in the super constructor. The name is just used for error reporting and debugging and so it can be any descriptive string.

The second is the wikiMatcher. This is an object that knows how to identify the symbol type in the source string. The Matcher class provides a lot of common matching behavior, so we can just tell it that our symbol type is recognized by the string "!pi". You can look at the Matcher source to find other matching behavior.

The third is the wikiRule, which our symbol type doesn't require. This is an object that implements a grammar production rule if our symbol type is composed of other symbol types (a non-terminal, in grammar-speak). Our symbol type is a terminal so we don't need a production rule. Look at the fitnesse.wikitext.parser package to see examples of how production rule classes are written.

The fourth is the htmlTranslation. This is an object that renders the symbol type as a string in the HTML output. Our plug-in class can implement the Translation interface itself and pass itself (this) as the translation object. The toTarget method renders our output, a string containing the value of pi.
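To make the pieces above concrete, here is a minimal, self-contained sketch of the !pi plug-in. The SymbolType, Matcher and Translation types below are simplified stand-ins written for this illustration so that the example compiles on its own. They are not the real classes from the fitnesse.wikitext.parser package, and the real signatures (in particular the arguments of toTarget) may differ, so check the FitNesse source before copying anything. TinyRenderer is likewise a hypothetical helper that only exists to show the matcher and translation working together.

```java
import java.util.List;

// Stand-in (assumption): renders a matched symbol as output text.
interface Translation {
    String toTarget(String matchedText);
}

// Stand-in (assumption): recognizes a fixed literal string in the wiki source.
class Matcher {
    private String literal = "";

    Matcher string(String literal) {
        this.literal = literal;
        return this;
    }

    // Returns the length of the match at offset, or -1 if there is no match.
    int matchLength(String input, int offset) {
        boolean found = !literal.isEmpty() && input.startsWith(literal, offset);
        return found ? literal.length() : -1;
    }
}

// Stand-in (assumption): holds the name, matcher and translation of a symbol type.
class SymbolType {
    private final String name;
    private Matcher matcher = new Matcher();
    private Translation translation;

    SymbolType(String name) { this.name = name; }

    protected void wikiMatcher(Matcher matcher) { this.matcher = matcher; }
    protected void htmlTranslation(Translation translation) { this.translation = translation; }

    String getName() { return name; }
    Matcher getWikiMatcher() { return matcher; }
    Translation getHtmlTranslation() { return translation; }
}

// The plug-in: a terminal symbol, so it supplies no wikiRule.
class PiSymbolType extends SymbolType implements Translation {
    PiSymbolType() {
        super("Pi");                               // name, used for debugging only
        wikiMatcher(new Matcher().string("!pi"));  // recognized by the literal "!pi"
        htmlTranslation(this);                     // this class renders itself
    }

    @Override
    public String toTarget(String matchedText) {
        return Double.toString(Math.PI);
    }
}

// Hypothetical helper: scans the source and replaces every recognized symbol.
class TinyRenderer {
    static String render(String source, List<? extends SymbolType> types) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < source.length()) {
            boolean matched = false;
            for (SymbolType type : types) {
                int length = type.getWikiMatcher().matchLength(source, i);
                if (length > 0 && type.getHtmlTranslation() != null) {
                    String text = source.substring(i, i + length);
                    out.append(type.getHtmlTranslation().toTarget(text));
                    i += length;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(source.charAt(i)); // plain text is copied through unchanged
                i++;
            }
        }
        return out.toString();
    }
}
```

With these stand-ins, rendering the text "area = !pi * r * r" through TinyRenderer with a single PiSymbolType replaces !pi with the value of Math.PI and leaves the rest of the text untouched. In a real plug-in, the class would extend the SymbolType shipped with FitNesse instead of the stand-in shown here.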

Finally, we add a line to the plugins.properties file:

SymbolTypes=PiSymbolType


That's it!