Sudoku
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Sudoku is a game played on 9 by 9 grid such as the one below. The goal is to fill the spots marked with dots with the digits 1–9 such that each row, column, and 3 by 3 box, contains the digits 1–9 exactly once.
2 | . | . | . | . | 1 | . | 3 | 8 |
. | . | . | . | . | . | . | . | 5 |
. | 7 | . | . | . | 6 | . | . | . |
. | . | . | . | . | . | . | 1 | 3 |
. | 9 | 8 | 1 | . | . | 2 | 5 | 7 |
3 | 1 | . | . | . | . | 8 | . | . |
9 | . | . | 8 | . | . | . | 2 | . |
. | 5 | . | . | 6 | 9 | 7 | 8 | 4 |
4 | . | . | 2 | 5 | . | . | . | . |
We will write a program that can solve Sudoku puzzles. Our first attempt will be straightforward, but highly inefficient for actual boards. We will refine this initial solution to eventually develop a program that is practical, but retains its simplicity.
Data Definitions
Our first step is to identify what sorts of information must be representated and chose how to represent it.This is the first step of the Design Recipe, a systematic program design process.
The most obvious representation of the Sudoku grid is as a matrix, a list of lists. The type of the elements of the rows must also be specified. Here we chose the Char
type so we can represent empty spaces as dots. Are there other reasonable choices? Maybe Int
is possible, where Nothing
represents a blank space. Think about the advantages and disadvantages of this approach.
type Grid = Matrix Value type Matrix a = [Row a] type Row a = [a] type Value = Char
It will also be valuable to specify some constants and conventions up front. We want 9 by 9 grids, so the boxes will be 3 by 3. Our cells can have the digits 1–9 and the special value .
indicating an empty cell.
boxSize :: Int boxSize = 3 values :: [Value] values = ['1' .. '9'] empty :: Value -> Bool empty = (== '.') single :: [a] -> Bool single [_] = True single _ = False
The last definition, single
, is a helper function that we’ll use later. Notice all bindings have explicit type annotations and variables have camel-cased names. This is good style. For future testing, we can define an example puzzle.
puzzle :: Grid puzzle = ["2....1.38", "........5", ".7...6...", ".......13", ".981..257", "31....8..", "9..8...2.", ".5..69784", "4..25...."]
But wait, where are our rows? Remember, in Haskell String
is an alias for [Char]
so we can use string literals instead of the list construction syntax. This is a nice convenience and one that comes as a consequence of our choice of Value
as Char
.
Checking Correctness
You can imagine the first step to solving Sudoku puzzles is determining whether a grid is solved or not. What does it mean to be solved? Our rows, columns, and boxes must have all the right digits without duplicates. If we have functions that extract these components, validity is just checking this criteria over all the components.
valid :: Grid -> Bool valid g = all noDups (rows g) && all noDups (cols g) && all noDups (boxes g) noDups :: Eq a => [a] -> Bool noDups [] = True noDups (x : xt) = not (elem x xt) && noDups xt
Don’t worry too much about the Eq a
business, we will see what this means next week. For now, just think of it as a requirement to use the elem
function.
Now we need the extraction functions. Rows are simple, we already have them!This representation of matrices is called row-major order. If we preferred columns, we could have column-major order. Then our implementations of rows
and cols
would be swapped. Columns require transposition, flipping the rows and columns, an operation Haskell implements already. Boxes are more complicated. You should try to implement it yourself and see how it compares to the implementation I will provide.
rows :: Matrix a -> [Row a] rows = id cols :: Matrix a -> [Row a] cols = transpose boxes :: Matrix a -> [Row a] boxes = unpack . map cols . pack where pack = split . map split split = chop boxSize unpack = map concat . concat chop :: Int -> [a] -> [[a]] chop n [] = [] chop n xs = take n xs : chop n (drop n xs)
Intermezzo: What’s a list?
You’ve used lists a lot during your time programming. You probably feel pretty comfortable with them. So then, what is a list?
One way to think of a list is as a container with a bunch of elements inside it. So a list of numbers if a like a box with numbers contained within. This I’ll call the container interpretation of lists.
However, this is not the only way to think of a list.
You can view lists as non-deterministic computations. A value like
100
or “what” can be viewed as a deterministic computation that has only one result, whereas a list like[1, 2, 3]
can be viewed as a computation that can’t decide on which result it wants to have, so it presents us with all of the possible results.
I’ll call this the computation interpretation of lists. What does this perspective buy us? Well, suppose we think about adding two lists of numbers [1, 2]
and [3, 4]
. Under the container view, this doesn’t really make sense. What does it mean to add a box to a box? However this has a clear meaning under the computation perspective, because [1, 2]
is thought of as a single non-deterministic value. As a number, it makes sense to speak of adding it to another number. The sum of these two lists should be [4, 5, 5, 6]
corresponding to every possible sum of the numbers.
Brute First
Let’s model the entire space of solutions to a Sudoku puzzle. Any pre-filled cell is a deterministic entry. Any empty cell is a non-deterministic entry over all digits. Using a list as a representation of a non-deterministic computation, we can implement this.
type Choices = [Value] choices :: Grid -> Matrix Choices choices g = map (map choice) g where choice v = if empty v then values else [v]
Using choices
we get a fixed matrix with non-deterministic values of type Matrix [a]
. Instead, we need a non-deterministic matrix with fixed values of type [Matrix a]
. The sequence
function can do this for a list.
sequence [[1, 2], [3, 4], [5, 6]] = [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6]]
We can extend this to matrices by mapping.
collapse :: Matrix [a] -> [Matrix a] collapse = sequence . map sequence
In our example puzzle
there were 51 empty squares. So, collapse (choices puzzle)
yields a list of grids with every possibility for the empty squares filled in. Finding all solutions is just a matter of filtering the valid solutions.
solveBrute :: Grid -> [Grid] solveBrute = filter valid . collapse . choices
How large is the output list? There are 51 empty spaces each with 9 possible digits. That’s 9 to the 51 grids, a number of the same order of magnitude as the number of molecules on Earth. This is intractable to say the least, but our effort wasn’t for naught.
Instead of starting from scratch, think about how you would solve a Sudoku and how we could apply that reasoning to shrink the search space.
Prune Second
Enumerating all digits for empty spaces doesn’t take into account the constraints of the puzzle itself. When you solve a puzzle, you don’t just list off every possibility. You restrict your search based on already filled-in squares. We can do this too.
For each non-deterministic cell, we can throw away any inconsistent choice. In other words, any value that already occurs as a fixed entry in the same row, column, or box.
prune :: Matrix Choices -> Matrix Choices prune = pruneBy boxes . pruneBy cols . pruneBy rows where pruneBy f = f . map reduce . f reduce :: Row Choices -> Row Choices reduce xss = [xs `minus` singles | xs <- xss] where singles = concat (filter single xss) minus :: Choices -> Choices -> Choices xs `minus` ys = if single xs then xs else xs \\ ys
Now we can prune
the search space before we collapse it.
solvePrune :: Grid -> [Grid] solvePrune = filter valid . collapse . prune . choices
This is better, but still infeasible. Think though, why did we just prune
once? In the process of pruning, we may have generated more fixed positions. A subsequent prune
could use this new information to eliminate even more options. Indeed, we should be prune until we can’t prune anymore. When is that? When pruning stops eliminating choices. This is called a fixed point.
solveFixPrune :: Grid -> [Grid] solveFixPrune = filter valid . collapse . fix prune . choices fix :: Eq a => (a -> a) -> a -> a fix f x = if x == x' then x else fix f x' where x' = f x
Better, but still not good enough.
Refine Third
A hypothetical choice affects all subsequent choices. When you solve a puzzle, there are points where you may have to make an assumption, a guess. This guess impacts every subsequent move you make. If your guess was wrong, you backtrack and make a different one.
At the moment we consider every all hypothetical choices independently. Instead, we should make one guess at a time and see how this affects the rest of the puzzle.
The solve
function will act as before, but instead of generating the entire search space with collapse
and filtering over it, we will make guesses one-by-one with the search
function. We can then prune the search space based on our guess and continue searching.You may wonder why we don’t use fix prune
. We could, but it turns out not to help that much in practice.
solve :: Grid -> [Grid] solve = search . prune . choices search :: Matrix Choices -> [Grid] search m | blocked m = [] | complete m = collapse m | otherwise = [g | m' <- guesses m , g <- search (prune m')]
There’s a number of helpers that have to be written for this to work. We need a guesses
function to pick the first non-deterministic entry and return a list of grids fixing every possible value.Try out a version of guesses
that instead picks the cell with the fewest choices and expands those. Is the performance any better? You can think of this as the one cell equivalent of collapse
.
guesses :: Matrix Choices -> [Matrix Choices] guesses m = [rows1 ++ [row1 ++ [c] : row2] ++ rows2 | c <- cs] where (rows1, row : rows2) = break (any (not . single)) m (row1, cs : row2) = break (not . single) row
We also need to know when we have a fully filled in grid or a bad grid that’s impossible to make more progress on. This is complete
and blocked
respectively.
complete :: Matrix Choices -> Bool complete = all (all single) blocked :: Matrix Choices -> Bool blocked m = void m || not (safe m) void :: Matrix Choices -> Bool void = any (any null) safe :: Matrix Choices -> Bool safe cm = all consistent (rows cm) && all consistent (cols cm) && all consistent (boxes cm) consistent :: Row Choices -> Bool consistent = noDups . concat . filter single
Notice a grid can be “bad” in two ways. Either there’s a choice with no more valid possibilities left, we call this void
, or there is some inconsistency in the grid, we’ll say this isn’t safe
.
Conclusion
main :: IO () main = (putStrLn . unlines . head . solve) puzzle
Compile and watch this churn out a solution to our example instantaneously.
The upshot here is that you shouldn’t be afraid of the bruteforce solution. We started with that straightforward approach and it was predictable bad. However, we used it as a foundation to build optimizations on top of it. These were guided by commonsense and intuitive techniques that anyone who has done Sudoku puzzles follows. What resulted was a program that works and is easy to understand.
Additional Reading
You can download the complete program.
This lecture is based heavily on Graham Hutton’s excellent note on Sudoku in Haskell. In fact, almost all the code is exactly the same as in that note.
Graham Hutton’s note is in turn based on Richard Bird’s original functional pearl on the subject.