Tuesday, March 16, 2010

Fun with null objects

I'm doing some heavy lifting in Java for the first time in a few years, as I'm writing a wiki parser for FitNesse. I came across the good old Null Object pattern as I was rendering the HTML output from the parser:

String translation = currentToken.render(scanner);
Since I'm using a backtracking parser design, this operation may fail. So how do I communicate that to the caller? Of course, I can return null and check for that. But passing nulls around is something I try to avoid at any cost. Well, any reasonable cost. I could throw an exception, but this isn't really an unexpected situation or an illegal operation, it's part of normal operation. I could also add a method to check if I need to backtrack:

if (currentToken.canRender(scanner)) { String translation = currentToken.render(scanner); ... }
However, canRender() could be very expensive as it might have to do all the work that render() does. Often, I wouldn't worry about this, but one of the goals of this parser is to be high-performance. I could cache the results of the work that canRender() does so that render() wouldn't have to duplicate it. But then my design is getting more stateful and more complicated, and my experience tells me to avoid that if possible.

So I decided to return an object that can tell the caller if the operation worked, and if so, what the result is. There's probably a Java idiom for doing this, but not knowing what that might be, I came up with my own:

Maybe<String> translation = currentToken.render(scanner); if (!translation.isNothing()) { result.append(translation.getValue()); ... }
Here's my implementation of the Maybe class:

public class Maybe<T> { public static final Maybe<String> noString = new Maybe<String>("*nothing*", true); private final T value; private final boolean isNothing; public Maybe(T value) { this(value, false); } private Maybe(T value, boolean isNothing) { this.value = value; this.isNothing = isNothing; } public T getValue() { return value; } public boolean isNothing() { return isNothing; } }

Update: Nat Pryce, whose Java skills are infinitely greater than mine, has posted his implementation.

Monday, March 1, 2010

Things I Ponder About

I've been working on a new parser for fitSharp and one of the things I've been kicking around is how to model a tree structure. This is what I'm currently looking at:

public interface Tree<T> { T Value { get; } IEnumerable<Tree<T>> Branches { get; } }
This says that a tree node has a collection of tree nodes that are its branches.

The existing Fit code base takes a different approach, something like this:

public interface Tree<T> { T Value { get; } Tree<T> Child { get; } Tree<T> Sibling { get; } }
This says that a tree node is related to two other nodes, a child and a sibling. (Fit calls them Parts and More). This model actually gives us more information, since we can get a node's sibling, which the first doesn't provide.

But when I started writing implementing classes, I ran into a tangle. I wanted a method to add a branch:

public void AddBranch(Tree<T> newBranch) { ... }
With the first model, this was simple. But with the second model, I need to be able to modify a branch to link in the new branch. So I need to know more about the branches than the basic Tree interface tells me.  The additional information it provides comes with a cost.

Do I really want to pay for the extra benefit of knowing a node's sibling? I don't think so. One thing I find confusing in the Fit code base is that a tree node has two uses. Sometimes it's used as a single node (a table, row or cell) and sometimes as a collection (tables, rows or cells) that can be iterated over using the sibling link. If I mean a collection, I think I should use a collection, like the Branches in the first model.

These are the things I ponder.