Thursday, April 12, 2012

Sudoku: Cohen vs my reading ability

I was wrong about being wrong in my last post. My original predictions were correct after all.

I had skimmed Bram Cohen’s post too quickly, and when I tried following his approach, my branching heuristic was simply to take the clause with the fewest literals.

What happens if we do this? Initially, there are many clauses consisting of 2 negative literals, and a few consisting of 9 positive literals. Thus most of the time we pick a negative clause. We’ll only ever pick a positive clause if only 2 literals are left. By this stage, it’s no different from a negative clause because assigning to true to one of the variables is the same as assigning false to the other.

In other words, apart from unit propagation, each step simply eliminates a particular digit from a particular cell, depending on the the order of the clauses. That is, apart from unit propagation, this is no different to brute force!

I failed to see this when I tried writing a SAT solver, and wound up surprised at the results. It was only later I realized I should have been ignoring negative clauses to make Cohen’s program behave like like Knuth’s: the positive clauses correspond exactly to the columns in the exact cover problem, and the literals within a positive clause correspond to 1s in a column.

The only difference is performance: manipulating lists or arrays to find which negative clauses affect which positive clauses is much slower than manipulating a few links to find which rows affect which columns.

When I returned to Cohen’s post, I realized he had explicitly mentioned skipping negative clauses during branching.

Below is my C translation of Cohen’s program. It is more efficient than the original because of in-place deletion and undeletion of clauses and literals. (I’ve used “dancing arrays” instead of Python lists.) As expected, this solver orders of magnitude slower than my dancing links solver, but handily beats Norvig’s program.

#include <ctype.h>
#include <limits.h>
#include <stdio.h>

#define F(i,n) for(int i = 0; i < n; i++)

int main() {
  int a[9][9] = {{0}}, x[4*9*9*(1+9*8/2)][10] = {{0}}, n = 0, m;
  // Read a sudoku.
  F(i, 9) F(j, 9) do if (EOF == (m = getchar())) return 1; while(
      isdigit(m) ? a[i][j] = m - '0', 0 : m != '.');
  int enc(int a, int b, int c) {return 9*9*a + 9*b + c + 1;}
  void add(int n, int a, int b, int c) {x[n][++*x[n]] = enc(a, b, c);}
  F(i, 9) F(j, 9) F(k, 9 || (n += 4, 0)) {  // At least one digit per:
    add(n  , k, i, j);  //
    add(n+1, i, k, j);  // ...column
    add(n+2, i, j, k);  // ...row
    add(n+3, i, j/3*3 + k/3, j%3*3 + k%3);  // ...3x3 region.
  for(int i = n-1; i >= 0; i--) F(j, x[i][0]) F(k, j) {
    x[n][1] = -x[i][j+1];  // At most one digit per positive clause.
    x[n][2] = -x[i][k+1];  // (Hence the 9 choose 2 factor above.)
    x[n++][0] = 2;
  int y[n], out[9*9*9];
  F(i, n) y[i] = i;
  int assign(int n, int v) {
    F(i, n) {
      int k = y[i];
      F(j, x[k][0]) {
        if (x[k][j+1] == v) {  // Satisfied clause:
          y[i--] = y[--n];     // Swap with last one
          y[n] = k;            // and decrement array count.
        } else if (x[k][j+1] == -v) {  // False literal:
          x[k][j+1] = x[k][x[k][0]];   // Swap with last one
          x[k][x[k][0]--] = -v;        // and decrement clause size.
          break;  // Assume literals are unique in a clause.
    return n;
  void solve(int n) {
    int s = INT_MAX, t = 0;
    if (!n) {  // Print solution.
      F(i, m) if ((t = out[i] - 1) >= 0) a[t/9%9][t%9] = t/9/9 + 1;
      F(r, 9) F(c, 9 || 0*puts("")) putchar('0'+a[r][c]);
    F(i, n) if (x[y[i]][0] < s) {  // Find smallest positive clause.
      if (x[y[i]][0] > 1 && x[y[i]][1] < 0) continue;
      if (!(s = x[y[i]][0])) return;  // Empty clause: no solution.
      t = y[i];
    void try(int v) {
      solve(assign(n, out[m++] = v));
      F(i, n) {  // Undo any clause deletions.
        int k = y[i];
        if (x[k][0] < 9 && x[k][x[k][0]+1] == -v) x[k][0]++;
    if (s > 1) try(-x[t][1]);
  // Fill in the given digits, and go!
  F(r, 9) F(c, 9) if (a[r][c]) n = assign(n, enc(a[r][c]-1, r, c));
  m = 0, solve(n);  // Reuse m as count of 'out' array.
  return 0;

Wednesday, April 11, 2012

Sudoku: Knuth vs Norvig vs Cohen

For reasons I may explain eventually, I’m looking up Sudoku algorithms again. Long ago, I wrote a solver after I read about Knuth applying the “dancing links” algorithm to Sudoku. This ingenious method worked so beautifully that I left the problem alone, save for a failed attempt to solve Sudoku with ZDDs.

Because of dancing links' effectiveness, I had thought there would be little else to say about Sudoku algorithms. How wrong I was! Wikipedia now sports an article dedicated to Sudoku algorithms, and the web is inundated with Sudoku programs in many different languages using many different algorithms.

Peter Norvig and Bram Cohen both wrote articles about Sudoku programs. Neither followed Knuth. Intriguing: are they instead using a method even more ingenious than encoding a Sudoku puzzle as an exact cover problem then deftly manipulating linked lists to solve it?

On closer inspection, it appeared not. Norvig’s code uses the same constraints as Knuth’s to eliminate particular digits from particular squares. However, its search heuristic is inferior, as it only looks at one type of constraint (that each box contains exactly one digit) when deciding where to branch next; Knuth’s code instead implicitly examines all constraints.

Cohen’s code encoded the same constraints as a Boolean satisfiability problem. When branching in the search, we choose the clause with the fewest remaining literals, so like Knuth, all constraints play a part in the decision. Thus:

  1. Norvig’s program should be slowest because its search heuristic only uses one type of constraint.

  2. Cohen’s program should search for solutions somewhat like Knuth’s. Both Cohen and Knuth use all constraints when branching: Knuth picks the column with fewest 1s, while Cohen picks the clause with fewest literals.

  3. Cohen’s program should be slower than Knuth’s. Dancing links follows a few pointers when eliminating competing digits, while the simple SAT solver performs many linear scans for a certain literal. Even if we used something fancier than lists to represent clauses, when eliminating competing digits in a box, row, column, or 3x3 subregion, we’d still visit on the order of (9 choose 2) = 36 clauses as opposed to about 9 links in a sparse matrix.

On the other hand, although slower, Norvig’s and Cohen’s code looked more succinct than my old dancing links solver. Maybe because they’re using a scripting language. Or perhaps my implementation unnecessarily verbose? Or perhaps dancing links is intrinsically more complex?

To compare, I coded all 3 approaches in C; in particular, I rewrote my old dancing links solver. As I suspected, though I compressed my old code, it was still the longest solution.

However, I was surprised when I compared performance. As I predicted, dancing links was fastest. But I had overlooked an important detail; an error lurks in my reasoning above.

Where was I wrong, and why? I’ll answer this another time because I want to show off something else for now.

A one-liner Sudoku solver?

While researching, I stumbled across a 3-line Sudoku solver in Perl, and a similar solver in Python, both of which have similarities with a C program by Aidan Thornton.

Why hadn’t I tried writing a terse solver yet? After all, I appreciate minimalist code the same way normal people appreciate poetry.

Without delay, I wrote a solver using GCC’s nested functions.

#include <ctype.h>
#include <stdio.h>
#define F(n) for(int i=0;i<n;i++)
int main() {
  char s[82] = {0}, c;
  F(81) while(isdigit(c = getchar()) ? s[i] = c, 0 : c != '.');
  void f(char *p) {
    int g(char d, int r, int c) {
      int h(int r, int c) { return s[9*r+c] == d; }
      F(9) if (h(r, i) || h(i, c) || h(r/3*3+i/3, c/3*3+i%3)) return 0;
      return 1;
    if (p == s+81) puts(s); else if (*p>'0') f(p+1);
    else F(9) if (g(i+'1', (p-s)/9, (p-s)%9)) *p = i+'1', f(p+1), *p = 0;
  return f(s), 0;

When comparing notes, I was happy to find my approach differed from the others. My inner loop iterates 9 times and checks 3 boxes at once; the others loop 81 items and check the box when the index meets one of 3 crtieria.

This suggests my code should be faster. A few puzzles I tried supported this. In particular, my program took around 8.5s on my laptop to solve Wikipedia’s "near worst case" puzzle:


while Thornton’s brute force solver took about 2m 45s. Both programs were compiled with the -O2 option. (With -O3, my program took about 5s, while the other program seemed unchanged.)

All the same, I’d prefer a shorter solution. Haskell might be a better medium, due to many space-saving features. Also, I tend to avoid Perl, Python, and friends because I like static typing.

Translating my above code yielded:

module Main where
f x s@('.':y)=[a|z<-['1'..'9'],notElem z(map((x++s)!!)(foldr1(++)[[9*r+i,9*i+c,9*(r`div`3*3+i`div`3)+c`div`3*3+i`mod`3]|i<-[0..8]])),a<-f(x++[z])y]where(r,c)=divMod(length x)9
f x(z:y)=f(x++[z])y
f x[]=[x]

-- The puzzle from the Wikipedia article on Sudoku.
main=print$f[] "53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79"

This wasn’t quite as short as I’d like, due to peculiarities of Haskell. The div and mod operators take 5 characters each, while in most languages these are single non-alphanumeric chracters. I spent a couple of lines on the trivial cases of the recursion. More space is taken by folding a list concatenation over the 4 terms for each i in [0..8].

We can eliminate most divs and mods by looping over all 81 boxes as in Thornton’s code, except with two nested loops going up to 9 rather than a single loop iterating 81 times. After a few more edits, we have:

module Main where
f x s@(h:y)=let(r,c)=divMod(length x)9;m#n=m`div`3==n`div`3;e=[0..8]in[a|z<-['1'..'9'],h==z||h=='.'&&notElem z(map((x++s)!!)[i*9+j|i<-e,j<-e,i==r||j==c||i#r&&j#c]),a<-f(x++[z])y]
f x[]=[x]

main=print$f[] "53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79"

It’s still a bit longer than the Python and Perl solutions I saw. It could be my approach fundamentally takes more space. Or maybe Haskell itself is harder to shorten, due to syntax or features like static typing. Or maybe I just need to learn more Haskell.