Should a programming language be lazy by default? Robert Harper says no. Lennart Augustsson says yes. No matter who is right, I say all computer scientists should become fluent in a lazy language, whether or not they speak it in daily life.
My evidence is a post by Russ Cox on parsing with derivatives: a very experienced programmer very convincingly argues why a parsing algorithm has exponential time complexity. But the claims are very wrong; Adams, Hollenbeck, and Might proved the algorithm is cubic.
How did he err so badly? Did he underestimate the power of lazy evaluation?
I once exclusively wrote eager code, and I imagine my younger self would have agreed with his analysis without a second thought. Today I know better. Marvel at these lines by Doug McIlroy:
int fs = 0 : zipWith (/) fs [1..] -- integral from 0 to x sins = int coss coss = 1 - int sins
It seems too good to be true. Indistinguishable from magic perhaps. But somehow it all works when lazily evaluated. Beware of summarily dismissing lazy code because it looks implausibly amazing.
Also consider an earlier article by Cox on regular expressions. Again, a very experienced programmer very convincingly argues why a parsing algorithm has exponential time complexity. In this post, however, the claims are solid, and backed up by graphs of running times. (It’s worth reading by the way: it tells the tragedy of how popular regular expression implementations became sluggish twisted mockeries of true regular expressions, while offering hope for the future. My only criticism is it fails to mention regular expression derivatives.)
Why does the erroneous post lack similar graphs? Why didn’t the author throw some code together and benchmark it to produce damning evidence?
Perhaps he thought it was too tedious. This would imply unfamiliarity with lazy languages, because prototyping parsing with derivatives in Haskell is easier than criticizing it.
We define a
Pe data structure to represent parsing expressions, that is, the
right-hand side of the production rules of a grammar.
import Control.Arrow import Control.Monad.State import qualified Data.Map as M import qualified Data.Set as S -- NT = non-terminal. (:.) = concatenation. data Pe = NT String | Eps Char | Nul | Ch Char | Or [Pe] | Pe :. Pe | Del Pe
Although it represents the empty string, the
Eps (for epsilon) expression
holds a character that winds up in the abstract syntax tree (AST) returned by
the parser. Similarly, the
Del (for delta) expression, which is only generated
internally, holds an expression which later helps build an AST.
A context-free grammar maps non-terminal symbols to parsing expressions:
type Grammar = M.Map String Pe
Our ASTs are full binary trees whose leaf nodes are characters (the free magma on the alphabet). The tree structure captures the order the production rules are applied.
data Ast = Bad | Lf Char | Ast :@ Ast deriving Show isBad :: Ast -> Bool isBad Bad = True isBad _ = False
Bad AST is returned for unparseable strings. An alternative is to drop
Bad and replace
Maybe Ast throughout our code.
A fancier parser might return a parse forest, that is, all parse trees for a given input. Ours simply settles on one parse tree.
Parsing with derivatives
To parse an input string, we first take successive derivatives of the start
symbol with respect to each character of the input, taking care to leave bread
crumbs in the
Del expressions to record consumed characters. (The
Del constructor is named for the delta symbol from the paper, but I also
think of it as "deleted", because it remembers what has just been deleted from
Then the string is accepted if and only if the resulting expression is nullable, that is, accepts the empty string. As we traverse the expression to determine nullability, we also build an AST to return.
We memoize derivatives by adding entries to a state of type
Initially, this cache contains only the input grammar, mapping nonterminal
Pe values. Later, we place a derivative at the key formed by
concatenating the characters involved in the derivative with the
nonterminal symbol being derived.
For example, if
S is a nonterminal in the input grammar, then
abS maps to
derive 'a' (derive 'b' (NT "S")). We assume no nonterminal
symbol in the input grammar is a suffix of any other nonterminal symbol, which
is fine for a prototype.
It may help to imagine the grammar growing over time, gaining new production rules as we process input characters. Indeed, we consider nonterminals to refer to both nonterminals in the input grammar as well as their derivatives.
parse :: Grammar -> String -> String -> Ast parse g start s = evalState (parseNull $ NT $ reverse s ++ start) g
Computing nullability requires finding a least fixed point. I found this the toughest part of the algorithm, partly because they never taught fixed point theory when I was in school. For some reason, the method reminds me of Hopcroft’s algorithm to minimize a DFA, where we repeatedly refine a partition until we reach a stable answer.
We initially guess each nonterminal is not nullable, which means it
corresponds to the
Bad AST. On encountering a nonterminal, if we’ve already
seen it, then return our guess for that nonterminal. Otherwise, it’s the first
time we’ve seen it and instead of guessing, we recursively traverse its
corresponding expression. In doing so, we may discover our guess is wrong, so
we correct it if necessary before returning an AST.
We repeat until our guesses stabilize.
Guesses never change from a good AST to
Bad, and the map of all
guesses only changes if a guess is revised from
Bad to a good AST.
We exploit these facts to simplify our code slightly.
parseNull :: Pe -> State Grammar Ast parseNull pe = leastFix M.empty where leastFix guessed = do (b, (_, guessed')) <- runStateT (visit pe) (S.empty, guessed) if M.size guessed == M.size guessed' then pure b else leastFix guessed' visit :: Pe -> StateT (S.Set String, M.Map String Ast) (State Grammar) Ast visit pe = case pe of Eps x -> pure $ Lf x Del x -> visit x Nul -> pure Bad Ch _ -> pure Bad Or xs -> chainsaw <$> mapM visit xs x :. y -> mul <$> visit x <*> visit y NT s -> do (seen, guessed) <- get case () of () | Just x <- M.lookup s guessed -> pure x | S.member s seen -> pure Bad | otherwise -> do modify $ first $ S.insert s b <- visit =<< lift (memoDerive s) unless (isBad b) $ modify $ second $ M.insert s b pure b mul :: Ast -> Ast -> Ast mul Bad _ = Bad mul _ Bad = Bad mul x y = x :@ y -- | Helps cut a non-empty parse forest down to one tree. chainsaw :: [Ast] -> Ast chainsaw xs | null xs' = Bad | otherwise = head xs' where xs' = filter (not . isBad) xs
Memoized derivatives are straightforward. For computing derivatives, we translate the rules given in the paper, and for memoization, on discovering a missing entry, we insert a knot-tying value before recursing, and replace it with the result of the recursion afteward.
memoDerive :: String -> State Grammar Pe memoDerive cs@(c:s) = do m <- get unless (M.member cs m) $ do modify $ M.insert cs $ NT cs d <- derive c =<< memoDerive s modify $ M.insert cs d gets (M.! cs) memoDerive _ = error "unreachable" derive :: Char -> Pe -> State Grammar Pe derive c pe = case pe of NT s -> pure $ NT $ c:s Ch x | x == c -> pure $ Eps x Or xs -> Or <$> mapM (derive c) xs Del x :. y -> (Del x :.) <$> derive c y x :. y -> do b <- parseNull x dx <- derive c x if isBad b then pure $ dx :. y else do dy <- derive c y pure $ Or [dx :. y, Del x :. dy] _ -> pure Nul
Here’s the grammar that Cox claims will grind our parser to a halt:
cox :: Grammar cox = M.fromList [ ("S", NT "T") , ("T", Or [NT "T" :. (Ch '+' :. NT "T"), NT "N"]) , ("N", Ch '1') ]
Let’s try it on a small input in an interactive interpreter:
parse cox "S" "1+1+1"
The parser picks a particular parse tree:
(Lf '1' :@ (Lf '+' :@ Lf '1')) :@ (Lf '+' :@ Lf '1')
How about all strings of length 7 consisting of
filter (not . isBad . parse cox "S") $ replicateM 7 "+1"
Thankfully, we get:
At last, it’s time to demolish Cox’s claims. We parse an 80-character input with a typo near the end:
main :: IO () main = print $ parse cox "S" $ concat (replicate 39 "1+") ++ "+1"
Our prototype is awful. We really should:
Add a slimmed down version of
parseNullthat returns a boolean instead of an AST, and call this in
derive. We only want to recover the AST once the whole string has been parsed; the rest of the time, we only care whether an expression is nullable.
Use a better algorithm for finding the least fixed point. We’ve perhaps chosen the clunkiest and most obvious method.
Remove a layer of indirection when tying the knot. That is, point to a node directly rather than a string (which requires another lookup to get at the node).
Apply algebraic identities to reduce the number of nodes in parsing expressions and abstract syntax trees.
And yet, on my laptop:
Bad real 0m0.220s user 0m0.215s sys 0m0.005s
Clearly, parsing with derivatives is efficient when run on the allegedly exponential-running-time example given by Cox.
The moral of the story
It’s best to test drive an algorithm before condemning it. If we see hilariously bad running times, then we can include them to hammer our points home. If we see surprisingly good running times, then there’s a mistake in our reasoning and we should keep quiet until we successfully attack the algorithm from another angle. (Cox rightly notes parsing with derivatives forgoes two key properties of yacc: linear running time and ambiguity detection. If only he had focused on these trade-offs.)
Is this practicable for parsing with derivatives? Well, we have presented an entire program, yet we have written less code than appears in Cox’s excellent article on regular expressions, which quotes just a few choice cuts from a presumably complete program. Indeed, with a splash of HTML, we can easily build an interactive online demo of parsing with derivatives.
The existence of the flawed post indicates no such sanity check was done. This was caused by poor understanding of lazy evaluation, or because it was deemed too troublesome to implement a lazy algorithm. Both problems are solved by learning a lazy language.
In sum, insufficient experience with lazy evaluation leads to faulty time complexity analysis. Therefore we should all be comfortable with lazy languages so computer science can progress unimpeded.