Tuesday, August 12, 2014

Let's Code!

A recent article in Dr. Dobb’s Journal bemoans the complexity of today’s development toolchains: “it’s hard to get any real programming done”. However, my own experience suggests the opposite: I find programming is now easier than ever, partly due to better tools.

I say “partly” because when I was a kid, it was difficult to obtain code, compilers, and documentation, let alone luxuries like an SCM. I scoured public libraries for books on programming and checked out what they had, which meant I studied languages which I could never use because I lacked the right compiler, or even the right computer. I nagged my parents to buy me expensive books, and occasionally they’d succumb. Perhaps the most cost-efficient were magazines containing program listings which of course had to be keyed in by hand. (One of my most treasured was an issue of Dr. Dobb’s Journal, back when it was in print, and only in print.)

Nowadays, a kid can get free high-quality compilers, code, tutorials, and more at the click of a button. But I believe even without this freer flow of information, programming would still be easier than ever because our tools have improved greatly.


got git?

The author singles out Git as a source of trouble, but the reasoning is suspect. For example, we’re told that with respect to other “SCMs you’ve used…Git almost certainly does those same actions differently.”

This suggests that the author used other SCMs, then tried Git and found it confusing. In contrast, I used Git, then tried other SCMs and found them confusing. I predict as time passes, more and more developers will learn Git first, and their opinions of SCMs will mirror mine.

Nevertheless, I’m leery of ranking the friendliness of tools by the order you picked them up. I hereby propose a different yardstick. Take Git, and a traditional SCM. Implement, or at least think about implementing, a clone of each from scratch; just enough so it is self-hosting. Then the one that takes less time to implement is simpler.

I wrote a self-hosting Git clone in a few hours: longer than expected, because I spent an inordinate amount of time debugging silly mistakes. Though I haven’t attempted it, I would need more time to write a clone of Perforce or Subversion (pretty much the only other SCMs I have used). With Git, there are no transactions, revision numbers, rename tracking, central servers, and so on; Git is essentially SHA-1 hashes all the way down.
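The “hashes all the way down” claim is easy to make concrete. A Git blob’s identifier is simply the SHA-1 of a short header followed by the file contents; here’s a sketch in Go (this follows Git’s documented object format, and is not code from my clone):

```go
package main

import (
  "crypto/sha1"
  "fmt"
)

// blobID computes a Git blob object ID: the SHA-1 of
// "blob <size>\x00" followed by the file contents.
func blobID(data []byte) string {
  h := sha1.New()
  fmt.Fprintf(h, "blob %d\x00", len(data))
  h.Write(data)
  return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
  // Matches `echo hello | git hash-object --stdin`.
  fmt.Println(blobID([]byte("hello\n")))
}
```

Run `echo hello | git hash-object --stdin` and you get the same hash, with no Git-specific machinery beyond that header.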

But let’s humour the author and suppose Git is complex. Then why not use tarballs and patches? This was precisely how Linux was managed for 10 years, so should surely suffice for a budding developer. In fact, I say you should only bother with Git once you realize, firstly, you’re addicted to coding, and secondly, how annoying it is to manage source with tarballs and patches!

In other words, although Git is handy, you only really need it when your project grows beyond a certain point, by which time you’ve already had plenty of fun coding. Same goes for tools like defect trackers.


Apps and Oranges

I agree that developing for mobiles is painful. However, comparing this against those “simple programs of a few hundred lines of C++ long ago” is unfair. With mobile apps, the program usually runs on a system different to the one used to write the code.

It might be fairer to compare writing a mobile app with, say, programming a dot matrix printer of yesteryear, as in both cases the target differs from the system used to write the code. I once did the latter, for the venerable Epson MX-80: after struggling with a ton of hardware-specific low-level nonsense, I was rewarded with a handful of crummy pictures. I would say it involved less “real programming” than writing an Android app.

All the same, I concede that writing Android software is harder than it should be, largely due to Java. But firstly, a mobile phone involves security and privacy issues that would never arise with a dot matrix printer, which necessarily implies more bookkeeping, and secondly, the Java problem can be worked around: either via native code, or a non-Java compiler that generates Dalvik bytecode. [I’ve only mentioned Android throughout because it’s the only mobile platform I’ve developed on.]

Comparing server-side web apps with the good old days is similarly unfair unless the good old days also involved networks, in which case they were really the bad old days. PC gamers of a certain age may remember a myriad of mysterious network options for configuring multiplayer mode; imagine the even more mysterious code behind it. As for cloud apps, I would rather work on a cloud app than on an old-school equivalent: BBS software, which involves renting out extra phone lines if you want high availability.

What about client-side web apps? Since they can run on the same system used to develop them, it is fair to compare developing them against writing equivalent code in those halcyon days of yore. Let’s look at a couple of examples.


Tic-tac-toe

I wrote a tic-tac-toe web app with an AI that plays perfectly because it searches the entire game tree; modern hardware and browsers are so fast that this is bearable (though we’re spared one ply because the human goes first). It works on desktops, laptops, tablets, phones: anything with a browser.

Here’s the minimax game tree search, based on code from John Hughes’s “Why Functional Programming Matters”:

score (Game _ Won 'X') = -1
score (Game _ Won 'O') = 1
score _ = 0

maximize (Node leaf []) = score leaf
maximize (Node _ kids) = maximum (map minimize kids)

minimize (Node leaf []) = score leaf
minimize (Node _ kids) = minimum (map maximize kids)

Despite my scant Haskell knowledge and experience, the source consists of a single file containing less than 150 lines like the above, plus a small HTML file: hardly a “multiplicity of languages”. Writing it was enjoyable, and I did so with a text editor in a window 80 characters wide.

Let’s rewind ten to twenty years. I’d have a hard time achieving the brevity and clarity of the above code. The compiler I used didn’t exist, and depending on how far back we go, neither did the language. Not that I’d consider compiling to JavaScript in the first place: depending on how far back we go, it was too slow or didn’t exist.


Netwalk

In my student days, I developed a clone of a Windows puzzle game named Netwalk. I chose C, so users either ran untrusted binaries I supplied (one for each architecture), or built their own binaries from scratch. Forget about running it on phones and PDAs.

I managed my files with tarballs and patches. The source consisted of a few thousand lines, though admittedly much of it was GUI cruft: menus, buttons, textboxes, and so on. Recently, I hacked up a web version of Netwalk. The line count? About 150.

Thanks to Git, you can view the entire source right now on Google Code or GitHub, all dolled up with syntax highlighting and line numbers.

Building native binaries has a certain charm, but I have to admit that a client-side web app has less overhead for developers and users alike. I only need to build the JavaScript once, then anyone with a browser can play.

Thus in this case, my new tools are better than my old tools in every way.


Choose Wisely

The real problem perhaps is the sheer number of choices. Tools have multiplied and diversified, and some indeed impede creativity and productivity. But others are a boon for programmers: they truly just let you code.

Which tools are the best ones? The answer probably depends on the person as well as the application, but I will say for basic client-side web apps and native binaries, I heartily recommend my choices: Haskell, Haste, Git.

I’m confident the above would perform admirably for other kinds of projects. I intend to find out, but at the moment I’m having too much fun coding games.

Play Now!

Tuesday, July 29, 2014

15 Shades of Grey

John Carmack indirectly controlled significant chunks of my life. For hours at a time, I would fight in desperate gun battles in beautiful and terrifying worlds he helped create. On top of this, the technical wizardry of id Software’s games inspired me to spend yet more hours learning how they managed to run Wolfenstein 3D and Doom on PCs, in an era when clockspeeds were measured in megahertz and dedicated graphics cards were rare.

I read about cool tricks like binary space partitioning, and eventually wrote a toy 3D engine of my own. The process increased my respect for the programmers: it’s incredibly difficult to get all the finicky details right while sustaining good frame rates.

Accordingly, I paid close attention when John Carmack spoke about programming languages in his QuakeCon 2013 keynote. Many people, myself included, have strong opinions on programming languages, but few have a track record as impressive as his.


Carmack’s Sneaky Plan

I was flabbergasted by Carmack’s thoughts on the Haskell language. He starts by saying: “My big software evolution over the last, certainly three years and stretching back tendrils a little bit further than that, has been this move towards functional programming style and pure functions.”

He then states that not only is Haskell suitable for programming games; he thinks Haskell may beat typical languages by roughly “a factor of two”, which “would be monumental” and “a really powerful thing for game development”. He has even begun reimplementing Wolfenstein 3D in Haskell as part of a “sneaky plan” to convince others.

Wow! I had always thought Haskell was a pretty but impractical language. I loved composing elegant Haskell snippets to solve problems that one might encounter in interviews and programming contests, but for real stuff I resorted to C.

Among my concerns is garbage collection: I have bad memories of unexpected frequent pauses in Java programs. But Carmack notes that Haskell’s almost uncompromising emphasis on purity simplifies garbage collection to the point where it is a predictable fixed overhead.

A second concern is lazy evaluation. It’s easy to write clear and simple but inefficient Haskell: computing the average of a list of numbers comes to mind. Carmack is also “still not completely sold on the value of laziness”, but evidently it’s not a showstopper for him. I suppose it’s all good so long as there are ways of forcing strict evaluation.

A third concern (but probably not for Carmack) is that I don’t know how to write a Haskell compiler; I’m more at ease with languages when I know how their compilers work. I can ignore this discomfort, though I intend to overcome my ignorance one day. I’m hoping it’s mostly a matter of understanding Hindley-Milner type inference.

Speaking of types, Carmack is a fan of strong static typing because, in his experience, “if it’s syntactically legal, it will make it into the codebase”. He notes that during his recent foray into Haskell, the one time he was horribly confused was due to untyped data from the original Wolfenstein 3D.


My Obvious Plan

Once again, I’m inspired by Carmack. I plan to take Haskell more seriously to see if it really is twice as good. Although I lack the resources to develop a complex game, I may be able to slap together a few prototypes from time to time.

First up is the 15-Puzzle by Noyes Palmer Chapman with a cosmetic change: to avoid loading fonts and rendering text, I replaced the numbers 1 to 15 with increasingly darker shades of grey.

I began with a program depending on SDL. The result was surprisingly playable, and I found the source code surprisingly short in spite of my scant knowledge of Haskell. To better show off my work, I made a few edits to produce a version of my program suitable for the Haste compiler, which compiles Haskell to JavaScript. I added mouse support and tweaked the HTML so the game is tolerable on tablets and phones.

Play now!

Sunday, May 25, 2014

Straw Men in Black

There’s a phrase used to praise a book: “you can’t put it down”. Unfortunately, I felt the opposite while reading The Black Swan by Nassim N. Taleb.

I’ll admit some prejudice. We’re told not to judge a book by its cover, but review quotes in the blurb ought to be exempt. One such quote originated from Peter L. Bernstein, the author of Against the Gods. While I enjoyed reading it, his book contained a litany of elementary mathematical mistakes. Did this mean The Black Swan was similarly full of errors?

All the same, the book began well. Ideas were clear and well-expressed. The writing was confident: perhaps overly so, but who wants to read text that lacks conviction? It promised wonders: we would learn how statisticians have been fooling us, and then learn the right way to deal with uncertainty, with potentially enormous life-changing payoffs.

I failed to reach this part because several chapters in, I was exhausted by a multitude of issues. I had to put the book down. I intend to read further once I’ve recovered, and hopefully the book will redeem itself. Until then, here are a few observations.


One Weird Trick

What’s on the other end of those “one weird trick” online ads? You won’t find out easily. Click one, and you’re forced to sit through a video that:

  • makes impressive claims about a product

  • takes pains to keep the product a secret

  • urges the viewer to wait until the end, when they will finally learn the secret

This recipe must be effective, because I couldn’t help feeling the book was similar. It took me on a long path, meandering from anecdote to anecdote, spiced with poorly constructed arguments and sprinkled with assurances that the best was yet to come.

Perhaps this sales tactic has become a necessary evil. With so much competition, how can a book distinguish itself? Additionally, I’m guessing fattening the book for any reason has a positive effect on sales.

Even so, the main idea of the book could be worth reading. I’ll post an update if I find out.


Lay Off Laplace

Chapter 4 features a story about a turkey. As the days pass, the turkey’s belief in a proposition such as “I will be cared for tomorrow” grows ever stronger, right until the day of its execution, when the belief turns out to be false. This retelling of Bertrand Russell’s parable about a chicken is supposed to warn us about inferring knowledge from observations, a recurring theme in the book.

But what about Laplace’s sunrise problem? By the Rule of Succession, if the sun rose every day for 5000 years, that is, for 5000 × 365.2426 days, the odds it will rise tomorrow are only 1826214 to 1. Ever since Laplace wrote about this, he has been mercilessly mocked because of this ludicrously small probability.

So which is it? Do repeated observations make our degrees of belief too strong (chicken) or too weak (sunrise)?

Live long and prosper

Much of this material is discussed in Chapter 18 of Probability Theory: The Logic of Science by Edwin T. Jaynes, which also contains the following story.

A boy turns 10 years old. The Rule of Succession implies the probability that he lives one more year is (10 + 1) / (10 + 2), which is 11/12. A similar computation shows his 70-year-old grandfather will live one more year with probability 71/72.

I like this example, because it contains both the chicken and the sunrise problem. Two for the price of one. Shouldn’t the old man’s number be lower than the young boy’s? One number seems too big and the other too small. How can the same rule be wrong in two different ways?
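The Rule of Succession behind all these numbers is a one-liner: after n consecutive successes, the probability of another is (n + 1)/(n + 2). A quick sketch reproducing the figures above:

```go
package main

import (
  "fmt"
  "math"
)

// succession returns the Rule of Succession probability (n+1)/(n+2)
// of another success after n consecutive successes, as a fraction.
func succession(n int) (num, den int) {
  return n + 1, n + 2
}

func main() {
  // 5000 years of sunrises, as in the text.
  days := int(math.Round(5000 * 365.2426))
  num, den := succession(days)
  fmt.Printf("sunrise: odds %d to %d\n", num, den-num)

  num, den = succession(10) // the 10-year-old boy
  fmt.Printf("boy: %d/%d\n", num, den)

  num, den = succession(70) // his grandfather
  fmt.Printf("grandfather: %d/%d\n", num, den)
}
```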

Ignorance is strength?

What should we do to avoid these ridiculous results?

Well, if the sun rose every day for 5000 years and that is all you know, then 1826214 to 1 is correct. The only reason we think this is too low is that we know far more than the number of consecutive sunrises: we know about stars, planets, orbits, gravity, and so on. If we take all this into account, our degree of belief that the sun will rise tomorrow grows much stronger.

The same goes for the other examples. In each one:

  1. We ignored what we know about the real world.

  2. We calculated based on what little data was left.

  3. We un-ignored the real world so we could laugh at the results.

In other words, we have merely shown that ignoring data leads to bad results. It’s as obvious as noting that if you shut your eyes while driving a car, you’ll end up crashing.

Sadly, Laplace became a victim of this folly despite having pointed it out himself. Immediately after describing the sunrise problem, Laplace explains that the unacceptable answer arises from wilfully neglected data. For some reason, his critics take his sunrise problem, ignore his explanation of the hilarious result, then savage his ideas.

The Black Swan joins the peanut gallery in condemning Laplace. However, its conclusion differs from those of most detractors. The true problem is that most of the data is ignored when computing probabilities, yet Taleb considers addressing this by ignoring even more data! This raises the question: why not toss out more? Why not throw away most of mathematics and assign arbitrary probabilities to arbitrary assertions?

Orthodox statistics is indeed broken, but not because more data should be ignored. It’s broken for the opposite reason: too much data is being ignored.

Poor Laplace. Give the guy a break.


Hempel’s Joke

Stop me if you’ve heard this one: 2 + 2 = 5 for sufficiently large values of 2. This is obviously a joke (though sometimes told so convincingly that the audience is unsure).

Hempel’s Paradox is a similar but less obvious joke that proceeds as follows. Consider the hypothesis: all ravens are black. This is logically equivalent to saying all non-black things are non-ravens. Therefore seeing a white shoe is evidence supporting the hypothesis.

The following Go program makes the attempted humour abundantly clear:

package main

import "fmt"

func main() {
  state := true
  for {
    var colour, thing string
    if _, e := fmt.Scan(&colour, &thing); e != nil {
      break
    }
    if thing == "raven" && colour != "black" {
      state = false
    }
    fmt.Println("hypothesis:", state)
  }
}

A sample run:

black raven
hypothesis: true
white shoe
hypothesis: true
red raven
hypothesis: false
black raven
hypothesis: false
white shoe
hypothesis: false

The state of the hypothesis is represented by a boolean variable. Initially the boolean is true, and it remains true until we encounter a non-black raven. This is the only way to change the state of the program: neither "black raven" nor "white shoe" has any effect.

Saying we have "evidence supporting the hypothesis" is saying there are truer values of true. It’s like saying there are larger values of 2.

The original joke exploits the mathematical concept “sufficiently large” which has applications, but is absurd when applied to constants.

Similarly, Hempel’s joke exploits the concept "supporting evidence", which has applications, but is absurd when applied to a lone hypothesis.

Off by one

If we want to talk about evidence supporting or undermining a hypothesis mathematically, we’ll need to advance beyond boolean logic. Conventionally we represent degrees of belief with numbers between 0 and 1. The higher the number, the stronger the belief. We call these probabilities.

Next, we propose some mutually exclusive hypotheses and assign probabilities between 0 and 1 to each one. The sum of the probabilities must be 1.

If we take a single proposition by itself, such as "all ravens are black", then we’re forced to give it a probability of 1. We’re reduced to the situation above, where the only interesting thing that can happen is that we see a non-black raven and we realize we must restart with a different hypothesis. (In general, probability theory taken to extremes devolves into plain logic.)

We need at least two propositions with nonzero probabilties for the phrase "supporting evidence" to make sense. For example, we might have two propositions A and B, with probabilities of 0.2 and 0.8 respectively. If we find evidence supporting A, then its probability increases and the probability of B decreases accordingly, for their sum must always be 1. Naturally, as before, we may encounter evidence that implies all our propositions are wrong, in which case we must restart with a fresh set of hypotheses.

To avoid nonsense, we require at least two mutually exclusive propositions, such as A: “all ravens are black”, and B: “there exists a non-black raven”, each with nonzero probability. Now it makes sense to ask if a white shoe is supporting evidence. Does it support A at B’s expense? Or B at A’s expense? Or neither?

The propositions as stated are too vague to answer one way or another. We can make the propositions more specific, but there are infinitely many ways to do so, and the choices we make change the answer. See Chapter 5 of Jaynes.

One Card Trick

Instead of trying to flesh out hypotheses involving ravens, let us content ourselves with a simpler scenario. Suppose a manufacturer of playing cards has a faulty process that sometimes uses black ink instead of red ink to print the entire suit of hearts. We estimate that one pack in ten has black hearts instead of red hearts but is otherwise normal, while the other nine are perfectly fine.

We’re given a pack of cards from this manufacturer. Thus we believe the hypothesis A: "all hearts are red" with probability 0.9, and B: "there exists a non-red heart" with probability 0.1. We draw a card. It’s the four of clubs. What does this do to our beliefs?

Nothing. Neither hypothesis is affected by this irrelevant evidence. I believe this is at least intuitively clear to most people; furthermore, had Hempel spoken of hearts and clubs instead of ravens and shoes, his joke would have been more obvious.
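This intuition is just Bayes’ rule: the four of clubs is equally likely under either hypothesis, so the posterior equals the prior. A sketch, using the 0.9/0.1 split from the story and assuming a uniform draw from a standard 52-card deck:

```go
package main

import "fmt"

// posterior applies Bayes' rule to two mutually exclusive
// hypotheses with the given priors and likelihoods.
func posterior(priorA, priorB, likeA, likeB float64) (float64, float64) {
  pA := priorA * likeA
  pB := priorB * likeB
  total := pA + pB
  return pA / total, pB / total
}

func main() {
  // A: all hearts are red (0.9). B: the hearts were printed black (0.1).
  // The four of clubs has probability 1/52 under either hypothesis,
  // so the posterior equals the prior (up to rounding).
  a, b := posterior(0.9, 0.1, 1.0/52, 1.0/52)
  fmt.Println(a, b)

  // A red heart is impossible under B, so B collapses to zero.
  a, b = posterior(0.9, 0.1, 13.0/52, 0)
  fmt.Println(a, b)
}
```

Only when the likelihoods differ does the evidence shift belief between A and B, which is the whole point of the two-hypothesis setup.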

Great Idea, Poor Execution

The Black Swan attacks orthodox statistics using Hempel’s paradox, alleging that it shows we should beware of evidence supporting a hypothesis.

It turns out orthodox statistics can be attacked with Hempel’s paradox, but not by claiming "supporting evidence" is meaningless. That would be like claiming "sufficiently large" is meaningless.

Instead, Hempel’s joke reminds us we must consider more than one hypothesis if we want to talk about supporting evidence. This may seem obvious; assigning a degree of belief in a lone proposition is like awarding points in a competition with only one contestant.

However, apparently it is not obvious enough. The Black Swan misses the point, and so did my university professors. My probability and statistics textbook instructs us to consider only one hypothesis. (Actually, it’s worse: one of the steps is to devise an alternate hypothesis, but this second hypothesis is never used in the procedure!)


Mathematics Versus Society

In an off-hand comment, Taleb begins a sentence with “Mathematicians will try to convince you that their science is useful to society by…”

Before I had even finished the sentence, I had found faults. First and foremost: how often do mathematicians talk about their usefulness to society? There are many jokes about mathematicians and real life, such as:

Engineers believe their equations approximate reality. Physicists believe reality approximates their equations. Mathematicians don’t care.

The truth is being exaggerated for humour, but asserting their work is useful in the real world is evidently a low priority for mathematicians. It is almost a point of pride. In fact, Taleb himself later quotes Hardy:

The “real” mathematics of the “real” mathematicians…is almost wholly “useless”.

This outlook is not new. Gauss called number theory “the queen of mathematics”, because it was pure and beautiful and had no applications in real life. (He had no way of foreseeing that number theory would one day be widely used in real life for secure communication!)

But sure, whatever, let’s suppose mathematicians go around trying to convince others that their field is useful to society. [Presumably Hardy would call such a mathematician “imaginary” or “complex”.] They are trivially right. If you try to talk about how useful things are to society, then you’ll want to measure and compare usefulness of things, all the while justifying your statements with sound logical arguments. Measuring and comparing and logic all lie squarely in the domain of mathematics.


Jumping to Conclusions

So far, I feel the author’s heart is in the right place but his reasoning is flawed. Confirmation bias is indeed pernicious, and orthodox statistics is indeed erroneous. However, The Black Swan knocks down straw men instead of hitting these juicy targets.

The above are but a few examples of the difficulties I ran into while reading the book. I had meant to pick apart more specious arguments but I’ve already written more than I had intended.

Again, I stress I have not read the whole work, and it may improve in the second half.

Friday, April 18, 2014

Crit-bit trees yet again

I’ve been pleased with my recent implementation of crit-bit trees based on Adam Langley’s code, but in the back of my mind, I’ve been bothered by Bernstein’s statement that the overhead on top of each string stored need only be two control words: one pointer and one integer.

To obtain relief, I updated my library so that internal nodes only have a single child pointer instead of one pointer each for the left and right child nodes. In a crit-bit tree, an internal node always has two children, so we may as well allocate sibling nodes in one contiguous chunk of memory.

Bernstein suggests storing strings directly in the crit-bit tree, which can be done by storing the left string backwards: it resides to the left of the integer control word. I’ve chosen an alternate scheme which he also mentions: expressing the strings as pointers in a separate storage pool.

Removing one pointer per internal node has several benefits on a 64-bit machine. Firstly, malloc is typically 16-byte aligned, which means the 24-byte internal nodes of the original version actually took up 32 bytes. Now we need just one pointer and one integer, so an internal node fits in 16 bytes.

Secondly, we no longer have to tag pointers. Instead, 8 bytes hold an unadulterated child pointer, and the other 8 bytes store the crit-bit position as well as one bit that indicates whether the current node is internal or not.
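For concreteness, here is a sketch of packing the crit-bit position and the internal flag into a single control word (in Go rather than the C of my library; the field layout is illustrative only):

```go
package main

import "fmt"

// A sketch of the 16-byte internal node described above:
// one child pointer plus one packed control word.
type node struct {
  kid  *node  // points at two sibling nodes allocated together
  ctrl uint64 // crit-bit position << 1, ORed with the internal flag
}

// pack stores the crit-bit position and the internal/leaf flag
// in one 64-bit word.
func pack(critbit uint64, internal bool) uint64 {
  w := critbit << 1
  if internal {
    w |= 1
  }
  return w
}

// unpack recovers both fields from the control word.
func unpack(w uint64) (critbit uint64, internal bool) {
  return w >> 1, w&1 == 1
}

func main() {
  w := pack(12345, true)
  bit, internal := unpack(w)
  fmt.Println(bit, internal)
}
```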

Thirdly, allocating two nodes at once only requires a single malloc() call. Before, we would call malloc() to allocate a new leaf node, and again to allocate a new internal node.

Again, “unrolling” the child pointer lookup produced better benchmarks on my system: instead of p = p->kid + predicate(), we write p = predicate() ? p->kid + 1 : p->kid. This obviated the need for some fun but complex micro-optimizations. While I was at it, I unrolled a few other routines.

Benchmarks:

Table 1. Keys: /usr/share/dict/words

              old        new
insert        0.066135   0.056116
get           0.043584   0.040845
iterate       0.031851   0.015135
allprefixed   0.013285   0.011924
delete        0.048030   0.041754
overhead      3966824    3173464

Table 2. Keys: output of seq 2000000

              old        new
insert        2.700949   2.359999
get           2.304070   2.214699
iterate       0.912950   0.472594
allprefixed   0.295322   0.264326
overhead      79999984   63999992
delete        2.339102   2.177294

The insert benchmark of my old library is slightly worse than the one I originally posted because I have since tweaked the code to make it easy to carry out operations such as "insert or replace" or "insert if absent" in a single step.

https://github.com/blynn/blt

Thursday, December 19, 2013

Reentrant parsers with Flex and Bison

By default, Flex and Bison generate old-school code with global variables. Trawling the manuals to find the options that generate re-entrant code is tedious, so I’m recording a small example that works on my system (which has Bison 2.7.12 and Flex 2.5.35).

Flex preamble

With these options, yylval becomes a pointer. When converting existing Flex source, we mostly replace yylval with *yylval.

%option outfile="flex.c" header-file="flex.h"
%option reentrant bison-bridge
%option noyywrap nounput noinput

%{
#include "bison.h"
%}

Bison preamble

%output  "bison.c"
%defines "bison.h"
%define api.pure full
%lex-param { yyscan_t scanner }
%parse-param { yyscan_t scanner }
%parse-param { val_callback_t callback }

%code requires {
#include "val.h"
#define YYSTYPE val_ptr
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif
}

%code {
#include "flex.h"
int yyerror(yyscan_t scanner, val_callback_t callback, const char *msg) {
return 0;
}
}

val.h: semantic values

Rather than use the %union Bison declaration or similar, I prefer to define the type that holds the semantic values in a C source file. In general, I like to minimize the amount of C in the Bison and Flex source.

enum {
  T_INT,
  T_STRING,
};

struct val_s {
  int type;
  struct {
    char *s;
    struct val_s **kid;
    int nkid;
  };
};
typedef struct val_s *val_ptr;
typedef int (*val_callback_t)(val_ptr);

Calling the parser

Because the parser is no longer global, we must initialize and pass a yyscan_t variable to Bison and Flex.

yyscan_t scanner;
if (yylex_init(&scanner)) exit(1);
YY_BUFFER_STATE buf = NULL;
// Uncomment to parse from a string instead of standard input.
// buf = yy_scan_string("input string", scanner);
int f(struct val_s *v) {
  val_print_pre(v);
  putchar('\n');
  val_print_tree("", v);
  val_free(v);
  return 0;
}
if (yyparse(scanner, f)) exit(1);
yy_delete_buffer(buf, scanner);
yylex_destroy(scanner);

Complete example

See https://github.com/blynn/symple/, which reads an expression and pretty-prints it:

$ ./main 'sin(x)*cos(y) + e^x'
+(*(sin(x), cos(y)), ^(e, x))
+─┬─*─┬─sin───x
  │   └─cos───y
  └─^─┬─e
      └─x

Thursday, November 21, 2013

To Brute-Force A Mockingbird

To Mock a Mockingbird by Raymond M. Smullyan should be required reading for any fan of the programming language Haskell. We learn combinatory logic through a series of delightful puzzles, almost without realizing.

We’re asked to imagine a forest populated by talking birds. On encountering one of these birds, we may call out the name of any bird. In response, the bird will say the name of some bird in the forest. (The reply could be the same bird we named, or the bird’s own name, or any other bird.)

An enchanted forest populated by birds is disarmingly endearing. We’re almost unaware we’re actually dealing with a set of functions that take a function as input and return a function. The evocative backdrop also pays homage to Haskell Curry, who was an avid birdwatcher.

One puzzle challenges us to find an egocentric bird given that a lark lives in the forest. Or, using mundane terminology, given a combinator L such that (Lx)y = x(yy) for all x and y, construct a combinator E such that EE = E.

The author states his solution is “a bit tricky” and consists of 12 correctly parenthesized Ls. Furthermore, the author states he doesn’t know if a shorter solution exists.

To maximize the likelihood of solving this puzzle, the reader should take advantage of facts learned from previous puzzles, and build up to the solution in stages. But that’s only if you’re playing fair and using pencil and paper! Instead, I saw an opportunity to bust out one of my favourite snippets of code.

Seeing the forest for the trees

Let us first recast the problem in terms of trees. Instead of Ls and parentheses, we work with syntax trees. In other words, we work with full binary trees where each leaf node corresponds to an L, and to evaluate an internal node, we recursively evaluate its child nodes, then apply the left child to the right child. (In Smullyan’s terminology, we call out the name of the bird represented by the right child to the bird represented by the left child.)

In this setting, the puzzle is to find a full binary tree such that repeatedly transforming parts of the tree according to the rule (Lx)y = x(yy) produces a tree where both of the root node’s children are identical to the original tree.

Hence to solve with brute force, we need only generate all full binary trees containing up to 12 leaf nodes, and for each one see if we can transform the tree into two copies of itself.

Here’s where my treasured routine comes in. The following roughly describes how to call a function on every full binary tree with exactly n leaf nodes:

  • Allocate a node x.

  • If n is 1, then mark x as a leaf, call the function, then return.

  • Otherwise mark x as an internal node, and for every 0 < k < n:

    • For every full binary tree y with k leaf nodes:

      • Set the left child of x to y.

      • For every full binary tree z with n - k leaf nodes:

        • Set the right child of x to z.

        • Call the function.

We generate the left and right subtrees by calling this algorithm recursively. More precisely, in Go:

type node struct {
	kind        int // 0 = leaf, 1 = branch.
	left, right *node
}

// For each full binary tree with n leaf nodes,
// sets '*p' to a pointer to the tree and calls the given function.
func forall_tree(p **node, n int, fun func()) {
	var t node
	*p = &t
	if n == 1 {
		t.kind = 0
		fun()
		return
	}
	t.kind = 1
	for k := 1; k < n; k++ {
		forall_tree(&t.left, k, func() {
			forall_tree(&t.right, n-k, fun)
		})
	}
}

I was proud when I found this gem a few years ago while working on a Project Euler problem, though I’d be shocked if it were original. [Actually, my first version preallocated an array of 2n - 1 nodes and used indices instead of pointers to save a bit of time and space, but this is less elegant.]

For example, we can print the first 10 Catalan numbers:

func main() {
	for k := 1; k <= 10; k++ {
		var root *node
		n := 0
		forall_tree(&root, k, func() { n++ })
		println(n)
	}
}

Or print all full binary trees with exactly 6 leaf nodes, as parenthesized expressions:

func tree_print(p *node) {
	if p.kind == 1 {
		print("(")
		tree_print(p.left)
		tree_print(p.right)
		print(")")
		return
	}
	print("L")
}

func main() {
	var root *node
	forall_tree(&root, 6, func() { tree_print(root); println() })
}

With a little more effort, we can write a program that solves the puzzle. However, some care is needed: if we replace subtrees of the form (Lx)y with x(yy) and vice versa without rhyme or reason, we’ll have no idea when we’ll finish, and we’ll only stumble across a solution by chance.

Instead, we observe that (Lx)y is either strictly smaller than x(yy), or has the form (Lx)L. Let us say that we are reducing the tree when we replace x(yy) by (Lx)y, and expanding when we perform the reverse. Thus rather than starting from a tree t and repeatedly applying the rule to obtain the tree tt, we work backwards: we start from tt and consider reductions only. The above observation implies that every sequence of reductions must terminate eventually.
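To make one reduction step concrete, here is a minimal self-contained sketch (not the post’s actual solver; `eq`, `str`, and `reduce` are my own hypothetical helper names). It rewrites the first subtree of the form x(yy) it finds, in preorder, into (Lx)y:

```go
package main

import "fmt"

type node struct {
	kind        int // 0 = leaf, 1 = branch.
	left, right *node
}

// eq reports whether two trees have identical shape.
func eq(a, b *node) bool {
	if a.kind != b.kind {
		return false
	}
	return a.kind == 0 || eq(a.left, b.left) && eq(a.right, b.right)
}

// str renders a tree as a parenthesized expression of Ls.
func str(t *node) string {
	if t.kind == 0 {
		return "L"
	}
	return "(" + str(t.left) + str(t.right) + ")"
}

// reduce rewrites the first subtree (in preorder) of the form x(yy)
// to (Lx)y, returning the new tree and whether a reduction happened.
func reduce(t *node) (*node, bool) {
	if t.kind == 0 {
		return t, false
	}
	if r := t.right; r.kind == 1 && eq(r.left, r.right) {
		// t = x(yy): build (Lx)y.
		lx := &node{kind: 1, left: &node{}, right: t.left}
		return &node{kind: 1, left: lx, right: r.left}, true
	}
	if u, ok := reduce(t.left); ok {
		return &node{kind: 1, left: u, right: t.right}, true
	}
	if u, ok := reduce(t.right); ok {
		return &node{kind: 1, left: t.left, right: u}, true
	}
	return t, false
}

func main() {
	// L(LL) matches the rule with x = y = L, so it reduces to (LL)L.
	t := &node{kind: 1, left: &node{}, right: &node{kind: 1, left: &node{}, right: &node{}}}
	u, _ := reduce(t)
	fmt.Println(str(t), "->", str(u)) // (L(LL)) -> ((LL)L)
}
```

A full search would apply every possible reduction at each step, not just the first, as described below.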

But what if we need to temporarily expand tt before reducing it in order to reach t? Let’s optimistically hope that Smullyan’s 12-L solution was sufficiently expanded; that is, only reductions are needed to go from tt to t, where t is his solution.

Multiple subtrees may be reducible, and choosing the wrong one prevents future reductions necessary to reach the goal. We therefore try every path: for each possible reduction, we apply it and recurse. This leads to a problem of wastefully repeating many computations because there can be several ways to arrive at the same tree. We tackle this in an obvious manner: by remembering the trees we’ve seen so far.
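One way to remember trees seen so far, sketched here with my own illustrative names (`key`, `visit`): serialize each tree to its parenthesized string and use that as a map key, so structurally equal trees reached by different reduction orders are recognized as duplicates.

```go
package main

import "fmt"

type node struct {
	kind        int // 0 = leaf, 1 = branch.
	left, right *node
}

// key serializes a tree to its parenthesized form, e.g. "((LL)L)".
// Distinct trees yield distinct strings, so the string works as a map key.
func key(t *node) string {
	if t.kind == 0 {
		return "L"
	}
	return "(" + key(t.left) + key(t.right) + ")"
}

var seen = map[string]bool{}

// visit returns true the first time a tree shape is encountered.
func visit(t *node) bool {
	k := key(t)
	if seen[k] {
		return false
	}
	seen[k] = true
	return true
}

func main() {
	a := &node{kind: 1, left: &node{}, right: &node{}}
	b := &node{kind: 1, left: &node{}, right: &node{}} // same shape, fresh allocation
	fmt.Println(visit(a), visit(b)) // true false
}
```

The search then recurses only when visit() reports a new tree, so each tree is expanded at most once.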

I wrote solutions in GNU C and Go. The Go solution is a bit too slow for comfort. The C code is slightly clumsier mainly because I had to name the anonymous functions (though one can define a macro to work around this). C also lacks a map data structure, but this was no problem thanks to my recently released BLT library.

Results (Spoiler Alert)

Optimism paid off. On my laptop, my C program took well under a minute to find 4 solutions:

(((L(LL))(L(LL)))((L(LL))(L(LL))))
(((L(LL))((LL)L))((L(LL))((LL)L)))
(((L(LL))((LL)L))(((LL)L)((LL)L)))
((((LL)L)((LL)L))(((LL)L)((LL)L)))

The Go version took about 10 minutes.

These all contain 12 Ls, so in a sense, Smullyan’s solution is minimal. Since no other strings are printed, these four 12-L solutions are minimal when only reductions are permitted.

If we allow expansions (that is, (Lx)y → x(yy)), then firstly, we have at least 2^4 = 16 solutions of length 12, since in this case (L(LL)) and ((LL)L) are interchangeable, and secondly, we can reduce the above strings to find shorter solutions. For example, the solution:

(((L(LL))(L(LL)))((L(LL))(L(LL))))

reduces to:

(L((L(LL))(L(LL))))(L(LL))

which further reduces to:

((LL)(L(LL)))(L(LL))

which only has 8 Ls. I doubt Smullyan missed this. My guess is he meant that if you solve the problem the clever way, then you arrive at an expression with 12 Ls; reductions should be ignored because they only obscure the ingenuity of the solution.

Is there, say, a 7-L expression that expands to some expression (that is necessarily longer than 24 Ls) which can be reduced to half of itself? I think not, but I have no proof.

Exercise: Four Fours

I wanted to do something with the Go version of the forall_tree() routine, so I tweaked it to solve the four fours puzzle. I just ploughed through all possible trees and evaluated each one; there are only 320 to do. For larger variants of the puzzle, I’d use dynamic programming; that is, memoize subtrees and their values. Division by zero is annoying, but I managed to keep the tree evaluation function short and sweet by using Go’s panic-recover mechanism.
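A sketch of that panic-recover trick (my own hypothetical names `expr`, `eval`, and `value`, and integer arithmetic only, which sidesteps the fractional intermediate values a full solver may want): Go panics on integer division by zero all by itself, so a deferred recover turns a bad tree into an “invalid expression” result without cluttering eval() with checks.

```go
package main

import "fmt"

// expr is a full binary tree: a leaf means the literal 4,
// and a branch applies an operator to its children's values.
type expr struct {
	op          byte // 0 for a leaf; otherwise '+', '-', '*' or '/'.
	left, right *expr
}

func eval(e *expr) int {
	if e.op == 0 {
		return 4
	}
	a, b := eval(e.left), eval(e.right)
	switch e.op {
	case '+':
		return a + b
	case '-':
		return a - b
	case '*':
		return a * b
	}
	return a / b // panics if b is 0; value() recovers.
}

// value evaluates e, reporting ok = false if a division by zero occurred.
func value(e *expr) (n int, ok bool) {
	defer func() {
		if recover() != nil {
			n, ok = 0, false
		}
	}()
	return eval(e), true
}

func main() {
	four := &expr{}
	good := &expr{op: '*', left: &expr{op: '+', left: four, right: four}, right: four} // (4+4)*4
	bad := &expr{op: '/', left: four, right: &expr{op: '-', left: four, right: four}}  // 4/(4-4)
	fmt.Println(value(good)) // 32 true
	fmt.Println(value(bad))  // 0 false
}
```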

Recursion: the best thing since recursion

The forall_tree() function is but one example of the eloquence of anonymous functions and recursion. For similar reasons, nested functions are also indispensable. We attain economy of expression by letting the stack automatically take care of the heavy lifting.

Curiously, early structured languages including ALGOL, Simula, and Pascal supported nested functions, but C shied away from this beautiful feature.

Its absence is sorely missed, as can be seen in C-style callbacks. An ugly void * argument is inevitably passed and cast. For instance, take the udata parameter in the SDL audio API.

Its absence is also egregiously gratuitous. GCC supports nested functions by using this one weird trick [compiler writers hate him!]. One might complain this weird trick conflicts with executable space protection, but executable space protection is worse than pointless thanks to return-oriented programming.

Code Complete

Wednesday, November 13, 2013

Chocolate, Logic Puzzles, and Dancing Links

Early last year, I visited a cafe that awards chocolate for solving logic puzzles. Naturally, I couldn’t resist free chocolate, and afterwards, just as naturally, I couldn’t resist thinking about programs that solve logic puzzles.

I’ve had to write such programs before for homework assignments or to test out frameworks. But oddly, I had never put much effort into it. I loved logic puzzles as a kid, even going so far as to buy a magazine or two that were filled with grids and clues. Why hadn’t I already written a decent tool to help?

Better late than never. After a spate of intense coding, I had a program that read clues in a terse text format and used brute force to find the solution. I spent most of my time devising the mini-language for the clues rather than the algorithm, as I figured the bottleneck would be the human entering the clues.

My solver worked well enough on a few examples, including the puzzle I solved to get a chocolate. But then I tried my program on the Zebra Puzzle. I was too impatient to let it finish. After a little thought, it was clear why.

On the grid

Logic grid puzzles can be abstracted as follows. We are given a table with M rows and N columns, and each cell contains a unique symbol. Our goal is to rearrange the symbols within each row except for those in the first row so that given constraints are satisfied. To be clear, symbols must stay in the row they started in, but apart from the first row, they can change places with other symbols in their row.

Some examples of constraints:

  • symbols A and B must be in the same column.

  • symbols A, B, and C must be in distinct columns.

  • symbol A’s column must be exactly one to the left of symbol B’s column.

We fix the first row because of constraints such as the last example: clues often refer to the order of the elements in the first row of the input table. Let us call these order constraints. This inspires the following convenient relabeling. Without loss of generality, let the symbols of the first row be 1, …, N from left to right.

My brute force solver generates every possible table and prints the ones that satisfy all given constraints. That means it has to examine up to (N!)^(M-1) cases: there are N! permutations of the symbols in each row, and we have M-1 rows to rearrange. For the Zebra Puzzle, this is 120^5.

Got it covered

I needed a smarter approach. Since I had already coded a sudoku solver, I chose the same approach, namely, represent a puzzle as an instance of the exact cover problem, then use the dancing links algorithm.

Firstly, we populate a set X with all the symbols in the table. Next, instead of generating every table, generate every possible column. Each such column corresponds to the subset of X consisting of the symbols in the column. Generating every possible column means brute force is still present, but to a lesser degree.

An exact cover of X corresponds to a collection of columns such that no two columns contain the same symbol, and furthermore, each symbol appears in one of the columns. Thus these columns can be joined together to form a candidate solution to the logic puzzle: we merely order them so that the first row reads 1, …, N.

It remains to disallow covers that violate the constraints. For some constraints, we achieve this by omitting certain columns. For example, suppose A and B must be in the same column. When we generate a column that only contains one of A or B, we omit it from the exact cover instance. Similarly, if a constraint requires that A and B lie in distinct columns, we omit subsets that contain both symbols from the exact cover instance.
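As a sketch of this filtering step (illustrative code of mine, with hypothetical names `columns` and `keep`, not the post’s actual solver), we can generate every candidate column by picking one symbol per row and drop the ones that violate a same-column constraint:

```go
package main

import "fmt"

// columns generates every possible column of the table, where table[i]
// holds the symbols of row i, and keeps only those passing the filter.
func columns(table [][]string, keep func([]string) bool) [][]string {
	var out [][]string
	var col []string
	var rec func(i int)
	rec = func(i int) {
		if i == len(table) {
			if keep(col) {
				out = append(out, append([]string(nil), col...))
			}
			return
		}
		for _, s := range table[i] {
			col = append(col, s)
			rec(i + 1)
			col = col[:len(col)-1]
		}
	}
	rec(0)
	return out
}

func main() {
	table := [][]string{
		{"1", "2"}, // the fixed first row
		{"A", "B"},
		{"X", "Y"},
	}
	// Constraint: A and X must share a column, so drop any candidate
	// column containing exactly one of them.
	keep := func(col []string) bool {
		hasA, hasX := false, false
		for _, s := range col {
			hasA = hasA || s == "A"
			hasX = hasX || s == "X"
		}
		return hasA == hasX
	}
	for _, c := range columns(table, keep) {
		fmt.Println(c) // [1 A X], [1 B Y], [2 A X], [2 B Y]
	}
}
```

The surviving columns become the subsets fed to the exact cover solver.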

Out of order

The above suffices for many puzzles, but falls short for those with order constraints such as "A and B lie in adjacent columns". For this particular constraint, we add N elements to X. Let us call them x[1], …, x[N]. Given a generated column whose first row contains the number n (recall we have relabeled so that the first row of the input table is 1, …, N), if the column contains:

  • both A and B: eliminate the column from consideration.

  • A and not B: we add x[n] to the column’s corresponding subset.

  • B and not A: we add x[k] to the column’s corresponding subset for all k in 1..N where |n-k| is not 1.

Lastly, we mark x[1], …, x[N] as optional, meaning that they need not be covered by our solution. (We could instead augment our collection of subsets with singleton subsets {x[n]} for each n, but with dancing links there’s a faster way.)

Thus any exact cover must have A and B in adjacent columns to avoid conflicts in the x[1], …, x[N] elements. We can handle other order constraints in a similar fashion.
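The case analysis above can be sketched as a small helper (a hypothetical name, `augment`; my solver’s real code may differ). Given a candidate column whose first row holds n, it either eliminates the column or adds the appropriate x[k] elements for the constraint “A and B lie in adjacent columns”:

```go
package main

import "fmt"

// augment applies the adjacency constraint on symbols A and B to a column
// whose first row holds n, in a table with N columns. It returns the
// column's subset with any x[k] elements added, or false if the column
// must be eliminated.
func augment(n, N int, hasA, hasB bool, subset []string) ([]string, bool) {
	switch {
	case hasA && hasB:
		return nil, false // A and B in one column can't be adjacent.
	case hasA:
		subset = append(subset, fmt.Sprintf("x[%d]", n))
	case hasB:
		for k := 1; k <= N; k++ {
			if k != n-1 && k != n+1 { // |n-k| is not 1
				subset = append(subset, fmt.Sprintf("x[%d]", k))
			}
		}
	}
	return subset, true
}

func main() {
	// A column in position 2 of 5 containing B but not A collides with
	// every x[k] except x[1] and x[3], forcing A into column 1 or 3.
	s, ok := augment(2, 5, false, true, nil)
	fmt.Println(s, ok) // [x[2] x[4] x[5]] true
}
```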

The size of X is the number of symbols, MN, plus N for each order constraint. The number of subsets is the number of possible columns, which is N^M, because a column has one cell in each of the M rows, and each cell can hold one of N different symbols. Each subset has size M, one for each row, plus up to N more elements for each order constraint that involves it.

That’ll do

Though N^M may be exponential in input size, typical logic grid puzzles are so small that my program solves them in a few milliseconds on my laptop. The bottleneck is now indeed the human entering the clues. [I briefly thought about writing a program to automate this by searching for keywords in each sentence.]

I was annoyed that the above algorithm is adequate. Firstly, trying entire columns in a single step bears no resemblance to the actions of a human solver. Secondly, it was too easy: I wish I had been forced to think harder about the algorithm! Perhaps I’ll devise a larger puzzle that demands a better approach.

Around this time I lost focus because I felt my old code could use a few edits. I got sidetracked rewriting my dancing-links sudoku solver and comparing it against other ways of solving sudoku, and soon after that I moved on.

Luckily for my code, I recently felt compelled to clean it up enough for public release. It seemed a pity to let it languish in a forgotten corner until rendered useless from bit rot and lack of documentation.

The DLX library is now available at a git repository near you: