## Wednesday, April 11, 2012

### Sudoku: Knuth vs Norvig vs Cohen

For reasons I may explain eventually, I’m looking up Sudoku algorithms again. Long ago, I wrote a solver after I read about Knuth applying the “dancing links” algorithm to Sudoku. This ingenious method worked so beautifully that I left the problem alone, save for a failed attempt to solve Sudoku with ZDDs.

Because of dancing links' effectiveness, I had thought there would be little else to say about Sudoku algorithms. How wrong I was! Wikipedia now sports an article dedicated to Sudoku algorithms, and the web is inundated with Sudoku programs in many different languages using many different algorithms.

Peter Norvig and Bram Cohen both wrote articles about Sudoku programs. Neither followed Knuth. Intriguing: are they instead using a method even more ingenious than encoding a Sudoku puzzle as an exact cover problem then deftly manipulating linked lists to solve it?

On closer inspection, it appeared not. Norvig’s code uses the same constraints as Knuth’s to eliminate particular digits from particular squares. However, its search heuristic is inferior, as it only looks at one type of constraint (that each box contains exactly one digit) when deciding where to branch next; Knuth’s code instead implicitly examines all constraints.

Cohen’s code encoded the same constraints as a Boolean satisfiability problem. When branching in the search, we choose the clause with the fewest remaining literals, so like Knuth, all constraints play a part in the decision. Thus:

1. Norvig’s program should be slowest because its search heuristic only uses one type of constraint.

2. Cohen’s program should search for solutions somewhat like Knuth’s. Both Cohen and Knuth use all constraints when branching: Knuth picks the column with fewest 1s, while Cohen picks the clause with fewest literals.

3. Cohen’s program should be slower than Knuth’s. Dancing links follows a few pointers when eliminating competing digits, while the simple SAT solver performs many linear scans for a certain literal. Even if we used something fancier than lists to represent clauses, when eliminating competing digits in a box, row, column, or 3x3 subregion, we’d still visit on the order of (9 choose 2) = 36 clauses as opposed to about 9 links in a sparse matrix.

On the other hand, although slower, Norvig’s and Cohen’s code looked more succinct than my old dancing links solver. Maybe because they’re using a scripting language. Or perhaps my implementation unnecessarily verbose? Or perhaps dancing links is intrinsically more complex?

To compare, I coded all 3 approaches in C; in particular, I rewrote my old dancing links solver. As I suspected, though I compressed my old code, it was still the longest solution.

However, I was surprised when I compared performance. As I predicted, dancing links was fastest. But I had overlooked an important detail; an error lurks in my reasoning above.

Where was I wrong, and why? I’ll answer this another time because I want to show off something else for now.

## A one-liner Sudoku solver?

While researching, I stumbled across a 3-line Sudoku solver in Perl, and a similar solver in Python, both of which have similarities with a C program by Aidan Thornton.

Why hadn’t I tried writing a terse solver yet? After all, I appreciate minimalist code the same way normal people appreciate poetry.

Without delay, I wrote a solver using GCC’s nested functions.

```#include <ctype.h>
#include <stdio.h>
#define F(n) for(int i=0;i<n;i++)
int main() {
char s = {0}, c;
F(81) while(isdigit(c = getchar()) ? s[i] = c, 0 : c != '.');
void f(char *p) {
int g(char d, int r, int c) {
int h(int r, int c) { return s[9*r+c] == d; }
F(9) if (h(r, i) || h(i, c) || h(r/3*3+i/3, c/3*3+i%3)) return 0;
return 1;
}
if (p == s+81) puts(s); else if (*p>'0') f(p+1);
else F(9) if (g(i+'1', (p-s)/9, (p-s)%9)) *p = i+'1', f(p+1), *p = 0;
}
return f(s), 0;
}```

When comparing notes, I was happy to find my approach differed from the others. My inner loop iterates 9 times and checks 3 boxes at once; the others loop 81 items and check the box when the index meets one of 3 crtieria.

This suggests my code should be faster. A few puzzles I tried supported this. In particular, my program took around 8.5s on my laptop to solve Wikipedia’s "near worst case" puzzle:

```.........
.....3.85
..1.2....
...5.7...
..4...1..
.9.......
5......73
..2.1....
....4...9```

while Thornton’s brute force solver took about 2m 45s. Both programs were compiled with the -O2 option. (With -O3, my program took about 5s, while the other program seemed unchanged.)

All the same, I’d prefer a shorter solution. Haskell might be a better medium, due to many space-saving features. Also, I tend to avoid Perl, Python, and friends because I like static typing.

Translating my above code yielded:

```module Main where
f x s@('.':y)=[a|z<-['1'..'9'],notElem z(map((x++s)!!)(foldr1(++)[[9*r+i,9*i+c,9*(r`div`3*3+i`div`3)+c`div`3*3+i`mod`3]|i<-[0..8]])),a<-f(x++[z])y]where(r,c)=divMod(length x)9
f x(z:y)=f(x++[z])y
f x[]=[x]

-- The puzzle from the Wikipedia article on Sudoku.
main=print\$f[] "53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79"```

This wasn’t quite as short as I’d like, due to peculiarities of Haskell. The div and mod operators take 5 characters each, while in most languages these are single non-alphanumeric chracters. I spent a couple of lines on the trivial cases of the recursion. More space is taken by folding a list concatenation over the 4 terms for each i in [0..8].

We can eliminate most divs and mods by looping over all 81 boxes as in Thornton’s code, except with two nested loops going up to 9 rather than a single loop iterating 81 times. After a few more edits, we have:

```module Main where
f x s@(h:y)=let(r,c)=divMod(length x)9;m#n=m`div`3==n`div`3;e=[0..8]in[a|z<-['1'..'9'],h==z||h=='.'&&notElem z(map((x++s)!!)[i*9+j|i<-e,j<-e,i==r||j==c||i#r&&j#c]),a<-f(x++[z])y]
f x[]=[x]

main=print\$f[] "53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79"```

It’s still a bit longer than the Python and Perl solutions I saw. It could be my approach fundamentally takes more space. Or maybe Haskell itself is harder to shorten, due to syntax or features like static typing. Or maybe I just need to learn more Haskell.