Should a programming language be lazy by default? Robert Harper says no. Lennart Augustsson says yes. No matter who is right, I say all computer scientists should become fluent in a lazy language, whether or not they speak it in daily life.
My evidence is a post by Russ Cox on parsing with derivatives: a very experienced programmer very convincingly argues why a parsing algorithm has exponential time complexity. But the claims are very wrong; Adams, Hollenbeck, and Might proved the algorithm is cubic.
How did he err so badly? Did he underestimate the power of lazy evaluation?
I once exclusively wrote eager code, and I imagine my younger self would have agreed with his analysis without a second thought. Today I know better. Marvel at these lines by Doug McIlroy:
int fs = 0 : zipWith (/) fs [1..]  integral from 0 to x sins = int coss coss = 1  int sins
It seems too good to be true. Indistinguishable from magic perhaps. But somehow it all works when lazily evaluated. Beware of summarily dismissing lazy code because it looks implausibly amazing.
Also consider an earlier article by Cox on regular expressions. Again, a very experienced programmer very convincingly argues why a parsing algorithm has exponential time complexity. In this post, however, the claims are solid, and backed up by graphs of running times. (It’s worth reading by the way: it tells the tragedy of how popular regular expression implementations became sluggish twisted mockeries of true regular expressions, while offering hope for the future. My only criticism is it fails to mention regular expression derivatives.)
Why does the erroneous post lack similar graphs? Why didn’t the author throw some code together and benchmark it to produce damning evidence?
Perhaps he thought it was too tedious. This would imply unfamiliarity with lazy languages, because prototyping parsing with derivatives in Haskell is easier than criticizing it.
Preliminaries
We define a Pe
data structure to represent parsing expressions, that is, the
righthand side of the production rules of a grammar.
import Control.Arrow
import Control.Monad.State
import qualified Data.Map as M
import qualified Data.Set as S
 NT = nonterminal. (:.) = concatenation.
data Pe = NT String  Eps Char  Nul  Ch Char  Or [Pe]  Pe :. Pe  Del Pe
Although it represents the empty string, the Eps
(for epsilon) expression
holds a character that winds up in the abstract syntax tree (AST) returned by
the parser. Similarly, the Del
(for delta) expression, which is only generated
internally, holds an expression which later helps build an AST.
A contextfree grammar maps nonterminal symbols to parsing expressions:
type Grammar = M.Map String Pe
Our ASTs are full binary trees whose leaf nodes are characters (the free magma on the alphabet). The tree structure captures the order the production rules are applied.
data Ast = Bad  Lf Char  Ast :@ Ast deriving Show
isBad :: Ast > Bool
isBad Bad = True
isBad _ = False
The Bad
AST is returned for unparseable strings. An alternative is to drop
Bad
and replace Ast
with Maybe Ast
throughout our code.
A fancier parser might return a parse forest, that is, all parse trees for a given input. Ours simply settles on one parse tree.
Parsing with derivatives
To parse an input string, we first take successive derivatives of the start
symbol with respect to each character of the input, taking care to leave bread
crumbs in the Eps
and Del
expressions to record consumed characters. (The
Del
constructor is named for the delta symbol from the paper, but I also
think of it as "deleted", because it remembers what has just been deleted from
the input.)
Then the string is accepted if and only if the resulting expression is nullable, that is, accepts the empty string. As we traverse the expression to determine nullability, we also build an AST to return.
We memoize derivatives by adding entries to a state of type Grammar
.
Initially, this cache contains only the input grammar, mapping nonterminal
symbols to Pe
values. Later, we place a derivative at the key formed by
concatenating the characters involved in the derivative with the
nonterminal symbol being derived.
For example, if S
is a nonterminal in the input grammar, then
abS
maps to derive 'a' (derive 'b' (NT "S"))
. We assume no nonterminal
symbol in the input grammar is a suffix of any other nonterminal symbol, which
is fine for a prototype.
It may help to imagine the grammar growing over time, gaining new production rules as we process input characters. Indeed, we consider nonterminals to refer to both nonterminals in the input grammar as well as their derivatives.
parse :: Grammar > String > String > Ast
parse g start s = evalState (parseNull $ NT $ reverse s ++ start) g
Computing nullability requires finding a least fixed point. I found this the toughest part of the algorithm, partly because they never taught fixed point theory when I was in school. For some reason, the method reminds me of Hopcroft’s algorithm to minimize a DFA, where we repeatedly refine a partition until we reach a stable answer.
We initially guess each nonterminal is not nullable, which means it
corresponds to the Bad
AST. On encountering a nonterminal, if we’ve already
seen it, then return our guess for that nonterminal. Otherwise, it’s the first
time we’ve seen it and instead of guessing, we recursively traverse its
corresponding expression. In doing so, we may discover our guess is wrong, so
we correct it if necessary before returning an AST.
We repeat until our guesses stabilize.
Guesses never change from a good AST to Bad
, and the map of all
guesses only changes if a guess is revised from Bad
to a good AST.
We exploit these facts to simplify our code slightly.
parseNull :: Pe > State Grammar Ast
parseNull pe = leastFix M.empty where
leastFix guessed = do
(b, (_, guessed')) < runStateT (visit pe) (S.empty, guessed)
if M.size guessed == M.size guessed' then pure b else leastFix guessed'
visit :: Pe > StateT (S.Set String, M.Map String Ast) (State Grammar) Ast
visit pe = case pe of
Eps x > pure $ Lf x
Del x > visit x
Nul > pure Bad
Ch _ > pure Bad
Or xs > chainsaw <$> mapM visit xs
x :. y > mul <$> visit x <*> visit y
NT s > do
(seen, guessed) < get
case () of
()  Just x < M.lookup s guessed > pure x
 S.member s seen > pure Bad
 otherwise > do
modify $ first $ S.insert s
b < visit =<< lift (memoDerive s)
unless (isBad b) $ modify $ second $ M.insert s b
pure b
mul :: Ast > Ast > Ast
mul Bad _ = Bad
mul _ Bad = Bad
mul x y = x :@ y
  Helps cut a nonempty parse forest down to one tree.
chainsaw :: [Ast] > Ast
chainsaw xs  null xs' = Bad
 otherwise = head xs'
where xs' = filter (not . isBad) xs
Memoized derivatives are straightforward. For computing derivatives, we translate the rules given in the paper, and for memoization, on discovering a missing entry, we insert a knottying value before recursing, and replace it with the result of the recursion afteward.
memoDerive :: String > State Grammar Pe
memoDerive cs@(c:s) = do
m < get
unless (M.member cs m) $ do
modify $ M.insert cs $ NT cs
d < derive c =<< memoDerive s
modify $ M.insert cs d
gets (M.! cs)
memoDerive _ = error "unreachable"
derive :: Char > Pe > State Grammar Pe
derive c pe = case pe of
NT s > pure $ NT $ c:s
Ch x  x == c > pure $ Eps x
Or xs > Or <$> mapM (derive c) xs
Del x :. y > (Del x :.) <$> derive c y
x :. y > do
b < parseNull x
dx < derive c x
if isBad b then pure $ dx :. y else do
dy < derive c y
pure $ Or [dx :. y, Del x :. dy]
_ > pure Nul
Here’s the grammar that Cox claims will grind our parser to a halt:
cox :: Grammar
cox = M.fromList
[ ("S", NT "T")
, ("T", Or [NT "T" :. (Ch '+' :. NT "T"), NT "N"])
, ("N", Ch '1')
]
Let’s try it on a small input in an interactive interpreter:
parse cox "S" "1+1+1"
The parser picks a particular parse tree:
(Lf '1' :@ (Lf '+' :@ Lf '1')) :@ (Lf '+' :@ Lf '1')
How about all strings of length 7 consisting of 1
or +
?
filter (not . isBad . parse cox "S") $ replicateM 7 "+1"
Thankfully, we get:
["1+1+1+1"]
At last, it’s time to demolish Cox’s claims. We parse an 80character input with a typo near the end:
main :: IO ()
main = print $ parse cox "S" $ concat (replicate 39 "1+") ++ "+1"
Our prototype is awful. We really should:

Add a slimmed down version of
parseNull
that returns a boolean instead of an AST, and call this inderive
. We only want to recover the AST once the whole string has been parsed; the rest of the time, we only care whether an expression is nullable. 
Use a better algorithm for finding the least fixed point. We’ve perhaps chosen the clunkiest and most obvious method.

Remove a layer of indirection when tying the knot. That is, point to a node directly rather than a string (which requires another lookup to get at the node).

Apply algebraic identities to reduce the number of nodes in parsing expressions and abstract syntax trees.
And yet, on my laptop:
Bad real 0m0.220s user 0m0.215s sys 0m0.005s
Clearly, parsing with derivatives is efficient when run on the allegedly exponentialrunningtime example given by Cox.
The moral of the story
It’s best to test drive an algorithm before condemning it. If we see hilariously bad running times, then we can include them to hammer our points home. If we see surprisingly good running times, then there’s a mistake in our reasoning and we should keep quiet until we successfully attack the algorithm from another angle. (Cox rightly notes parsing with derivatives forgoes two key properties of yacc: linear running time and ambiguity detection. If only he had focused on these tradeoffs.)
Is this practicable for parsing with derivatives? Well, we have presented an entire program, yet we have written less code than appears in Cox’s excellent article on regular expressions, which quotes just a few choice cuts from a presumably complete program. Indeed, with a splash of HTML, we can easily build an interactive online demo of parsing with derivatives.
The existence of the flawed post indicates no such sanity check was done. This was caused by poor understanding of lazy evaluation, or because it was deemed too troublesome to implement a lazy algorithm. Both problems are solved by learning a lazy language.
In sum, insufficient experience with lazy evaluation leads to faulty time complexity analysis. Therefore we should all be comfortable with lazy languages so computer science can progress unimpeded.