Sudoku

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Donald Knuth, Structured Programming With Go To Statements

Sudoku is a game played on 9 by 9 grid such as the one below. The goal is to fill the spots marked with dots with the digits 1–9 such that each row, column, and 3 by 3 box, contains the digits 1–9 exactly once.

2	.	.	.	.	1	.	3	8
.	.	.	.	.	.	.	.	5
.	7	.	.	.	6	.	.	.
.	.	.	.	.	.	.	1	3
.	9	8	1	.	.	2	5	7
3	1	.	.	.	.	8	.	.
9	.	.	8	.	.	.	2	.
.	5	.	.	6	9	7	8	4
4	.	.	2	5	.	.	.	.

We will write a program that can solve Sudoku puzzles. Our first attempt will be straightforward, but highly inefficient for actual boards. We will refine this initial solution to eventually develop a program that is practical, but retains its simplicity.

Data Definitions

Our first step is to identify what sorts of information must be representated and chose how to represent it.This is the first step of the Design Recipe, a systematic program design process.

The most obvious representation of the Sudoku grid is as a matrix, a list of lists. The type of the elements of the rows must also be specified. Here we chose the Char type so we can represent empty spaces as dots. Are there other reasonable choices? Maybe Int is possible, where Nothing represents a blank space. Think about the advantages and disadvantages of this approach.

type Grid = Matrix Value
type Matrix a = [Row a]
type Row a = [a]
type Value = Char

It will also be valuable to specify some constants and conventions up front. We want 9 by 9 grids, so the boxes will be 3 by 3. Our cells can have the digits 1–9 and the special value . indicating an empty cell.

box­Size :: Int
box­Size = 3

val­ues :: [Value]
val­ues = ['1' .. '9']

empty :: Value -> Bool
empty = (== '.')

sin­gle :: [a] -> Bool
sin­gle [_] = True
sin­gle _   = False

The last definition, single, is a helper function that we’ll use later. Notice all bindings have explicit type annotations and variables have camel-cased names. This is good style. For future testing, we can define an example puzzle.

puz­zle :: Grid
puz­zle = ["2....1.38",
          "........5",
          ".7...6...",
          ".......13",
          ".981..257",
          "31....8..",
          "9..8...2.",
          ".5..69784",
          "4..25...."]

But wait, where are our rows? Remember, in Haskell String is an alias for [Char] so we can use string literals instead of the list construction syntax. This is a nice convenience and one that comes as a consequence of our choice of Value as Char.

Checking Correctness

You can imagine the first step to solving Sudoku puzzles is determining whether a grid is solved or not. What does it mean to be solved? Our rows, columns, and boxes must have all the right digits without duplicates. If we have functions that extract these components, validity is just checking this criteria over all the components.

valid :: Grid -> Bool
valid g = all no­Dups (rows g) &&
          all no­Dups (cols g) &&
          all no­Dups (boxes g)

no­Dups :: Eq a => [a] -> Bool
no­Dups [] = True
no­Dups (x : xt) = not (elem x xt) && no­Dups xt

Don’t worry too much about the Eq a business, we will see what this means next week. For now, just think of it as a requirement to use the elem function.

Now we need the extraction functions. Rows are simple, we already have them!This representation of matrices is called row-major order. If we preferred columns, we could have column-major order. Then our implementations of rows and cols would be swapped. Columns require transposition, flipping the rows and columns, an operation Haskell implements already. Boxes are more complicated. You should try to implement it yourself and see how it compares to the implementation I will provide.

rows :: Matrix a -> [Row a]
rows =  id

cols :: Matrix a -> [Row a]
cols = trans­pose

boxes :: Matrix a -> [Row a]
boxes = unpack . map cols . pack
        where
          pack   = split . map split
          split  = chop box­Size
          unpack = map con­cat . con­cat

chop :: Int -> [a] -> [[a]]
chop n [] = []
chop n xs = take n xs : chop n (drop n xs)

Intermezzo: What’s a list?

You’ve used lists a lot during your time programming. You probably feel pretty comfortable with them. So then, what is a list?

One way to think of a list is as a container with a bunch of elements inside it. So a list of numbers if a like a box with numbers contained within. This I’ll call the container interpretation of lists.

However, this is not the only way to think of a list.

You can view lists as non-deterministic computations. A value like 100 or “what” can be viewed as a deterministic computation that has only one result, whereas a list like [1, 2, 3] can be viewed as a computation that can’t decide on which result it wants to have, so it presents us with all of the possible results.
Miran Lipovača, Learn You a Haskell for Great Good!

I’ll call this the computation interpretation of lists. What does this perspective buy us? Well, suppose we think about adding two lists of numbers [1, 2] and [3, 4]. Under the container view, this doesn’t really make sense. What does it mean to add a box to a box? However this has a clear meaning under the computation perspective, because [1, 2] is thought of as a single non-deterministic value. As a number, it makes sense to speak of adding it to another number. The sum of these two lists should be [4, 5, 5, 6] corresponding to every possible sum of the numbers.

Brute First

Let’s model the entire space of solutions to a Sudoku puzzle. Any pre-filled cell is a deterministic entry. Any empty cell is a non-deterministic entry over all digits. Using a list as a representation of a non-deterministic computation, we can implement this.

type Choices = [Value]

choices :: Grid -> Matrix Choices
choices g = map (map choice) g
            where
              choice v = if empty v then val­ues else [v]

Using choices we get a fixed matrix with non-deterministic values of type Matrix [a]. Instead, we need a non-deterministic matrix with fixed values of type [Matrix a]. The sequence function can do this for a list.

sequence [[1, 2], [3, 4], [5, 6]] =
   [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6],
    [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6]]

We can extend this to matrices by mapping.

col­lapse :: Matrix [a] -> [Matrix a]
col­lapse =  sequence . map sequence

In our example puzzle there were 51 empty squares. So, collapse (choices puzzle) yields a list of grids with every possibility for the empty squares filled in. Finding all solutions is just a matter of filtering the valid solutions.

solve­Brute :: Grid -> [Grid]
solve­Brute =  fil­ter valid . col­lapse . choices

How large is the output list? There are 51 empty spaces each with 9 possible digits. That’s 9 to the 51 grids, a number of the same order of magnitude as the number of molecules on Earth. This is intractable to say the least, but our effort wasn’t for naught.

Instead of starting from scratch, think about how you would solve a Sudoku and how we could apply that reasoning to shrink the search space.

Prune Second

Enumerating all digits for empty spaces doesn’t take into account the constraints of the puzzle itself. When you solve a puzzle, you don’t just list off every possibility. You restrict your search based on already filled-in squares. We can do this too.

For each non-deterministic cell, we can throw away any inconsistent choice. In other words, any value that already occurs as a fixed entry in the same row, column, or box.

prune :: Matrix Choices -> Matrix Choices
prune =  prune­By boxes . prune­By cols . prune­By rows
         where prune­By f = f . map reduce . f

reduce :: Row Choices -> Row Choices
reduce xss =  [xs `minus` sin­gles | xs <- xss]
              where sin­gles = con­cat (fil­ter sin­gle xss)

minus :: Choices -> Choices -> Choices
xs `minus` ys = if sin­gle xs then xs else xs \\ ys

Now we can prune the search space before we collapse it.

solve­Prune :: Grid -> [Grid]
solve­Prune =  fil­ter valid . col­lapse . prune . choices

This is better, but still infeasible. Think though, why did we just prune once? In the process of pruning, we may have generated more fixed positions. A subsequent prune could use this new information to eliminate even more options. Indeed, we should be prune until we can’t prune anymore. When is that? When pruning stops eliminating choices. This is called a fixed point.

solve­Fix­Prune :: Grid -> [Grid]
solve­Fix­Prune = fil­ter valid . col­lapse . fix prune . choices

fix :: Eq a => (a -> a) -> a -> a
fix f x =  if x == x' then x else fix f x'
           where x' = f x

Better, but still not good enough.

Refine Third

A hypothetical choice affects all subsequent choices. When you solve a puzzle, there are points where you may have to make an assumption, a guess. This guess impacts every subsequent move you make. If your guess was wrong, you backtrack and make a different one.

At the moment we consider every all hypothetical choices independently. Instead, we should make one guess at a time and see how this affects the rest of the puzzle.

The solve function will act as before, but instead of generating the entire search space with collapse and filtering over it, we will make guesses one-by-one with the search function. We can then prune the search space based on our guess and continue searching.You may wonder why we don’t use fix prune. We could, but it turns out not to help that much in practice.

solve :: Grid -> [Grid]
solve = search . prune . choices

search :: Matrix Choices -> [Grid]
search m
  | blocked m  = []
  | com­plete m = col­lapse m
  | oth­er­wise  = [g | m' <- guesses m
                    , g  <- search (prune m')]

There’s a number of helpers that have to be written for this to work. We need a guesses function to pick the first non-deterministic entry and return a list of grids fixing every possible value.Try out a version of guesses that instead picks the cell with the fewest choices and expands those. Is the performance any better? You can think of this as the one cell equivalent of collapse.

guesses :: Matrix Choices -> [Matrix Choices]
guesses m =
  [rows1 ++ [row1 ++ [c] : row2] ++ rows2 | c <- cs]
  where
    (rows1, row : rows2) = break (any (not . sin­gle)) m
    (row1, cs : row2)    = break (not . sin­gle) row

We also need to know when we have a fully filled in grid or a bad grid that’s impossible to make more progress on. This is complete and blocked respectively.

com­plete :: Matrix Choices -> Bool
com­plete = all (all sin­gle)

blocked :: Matrix Choices -> Bool
blocked m = void m || not (safe m)

void :: Matrix Choices -> Bool
void = any (any null)

safe :: Matrix Choices -> Bool
safe cm = all con­sis­tent (rows cm) &&
          all con­sis­tent (cols cm) &&
          all con­sis­tent (boxes cm)

con­sis­tent :: Row Choices -> Bool
con­sis­tent = no­Dups . con­cat . fil­ter sin­gle

Notice a grid can be “bad” in two ways. Either there’s a choice with no more valid possibilities left, we call this void, or there is some inconsistency in the grid, we’ll say this isn’t safe.

Conclusion

main :: IO ()
main = (put­Str­Ln . unlines . head . solve) puz­zle

Compile and watch this churn out a solution to our example instantaneously.

The upshot here is that you shouldn’t be afraid of the bruteforce solution. We started with that straightforward approach and it was predictable bad. However, we used it as a foundation to build optimizations on top of it. These were guided by commonsense and intuitive techniques that anyone who has done Sudoku puzzles follows. What resulted was a program that works and is easy to understand.

Additional Reading

You can download the complete program.
This lecture is based heavily on Graham Hutton’s excellent note on Sudoku in Haskell. In fact, almost all the code is exactly the same as in that note.
Graham Hutton’s note is in turn based on Richard Bird’s original functional pearl on the subject.

2	.	.	.	.	1	.	3	8
.	.	.	.	.	.	.	.	5
.	7	.	.	.	6	.	.	.
.	.	.	.	.	.	.	1	3
.	9	8	1	.	.	2	5	7
3	1	.	.	.	.	8	.	.
9	.	.	8	.	.	.	2	.
.	5	.	.	6	9	7	8	4
4	.	.	2	5	.	.	.	.

2	.	.	.	.	1	.	3	8
.	.	.	.	.	.	.	.	5
.	7	.	.	.	6	.	.	.
.	.	.	.	.	.	.	1	3
.	9	8	1	.	.	2	5	7
3	1	.	.	.	.	8	.	.
9	.	.	8	.	.	.	2	.
.	5	.	.	6	9	7	8	4
4	.	.	2	5	.	.	.	.

2	.	.	.	.	1	.	3	8
.	.	.	.	.	.	.	.	5
.	7	.	.	.	6	.	.	.
.	.	.	.	.	.	.	1	3
.	9	8	1	.	.	2	5	7
3	1	.	.	.	.	8	.	.
9	.	.	8	.	.	.	2	.
.	5	.	.	6	9	7	8	4
4	.	.	2	5	.	.	.	.