Zippers

Let’s imagine we’re implementng a text input. What kind of a data definition should we use? Here’s one option.

type Bad­Input = (String, Int)

Here we interpret String as the content of the input. The Int is the cursor’s position in the input. Suppose now we wish to write an insert function that takes a text input and inserts a character before the cursor.

bad­Put­Char :: Bad­Input -> Char -> Bad­Input
bad­Put­Char (xs, 0) c = (c : xs, 1)
bad­Put­Char (x : xt, k) c =
  (x : xt', k' + 1)
  where (xt', k') = bad­Put­Char (xt, k - 1) c

Consider a function that inserts a whole String instead of just a Char.

bad­Put­Str :: Bad­Input -> String -> Bad­Input
bad­Put­Str input s = foldl bad­Put­Char input s

Maybe you can see why this is not a great representation for the text input. Suppose we’re editing the end of our input. Putting a single character will traverse the entire input to add a character. For badPutStr we traverse the entire input n times where n is the length of the string. That’s terrible!

Can you think of a better representation?

The List Zipper

type Input = (String, String)

Here the first String will be text before the cursor reversed. Slightly weird, but we’ll see why in a minute. The second String is the text after the cursor. To make a connection with our previous structure, we can define a function that can take an Input to a BadInput.

good­To­Bad :: Input -> Bad­Input
good­To­Bad (pre, post) = ((reverse pre) ++ post, length pre)

Now we can write putChar and putStr.

put­Char :: Input -> Char -> Input
put­Char (pre, post) c = (c : pre, post)

put­Str :: Input -> String -> Input
put­Str input s = foldl put­Char input s

Notice how much simpler putChar is! The performance of our program is much better now too. A call to putChar is constant-time. For editing operations, this representation is far superior.

While we used the example of a text input to motivate this representation, this technique is totally general. This would speed up any situation where we have a list of elements that we repeatedly make local edits.

type List­Zip­per a = ([a], [a])

We call the first component of the tuple the context and the second component the control. Think of the context as the original data structure with a hole in it. It’s made of a list of frames where each frame is a Cons with a hole in it. The control is the thing needed to plug the whole to get the original structure. It turns out that this zipper transformation can work over more types than just lists. You can have a zipper for a tree that can make constant-time local edits.

Zipping Binary Trees

Here is your run-of-the-mill binary tree.

data Tree a = Empty | Node a (Tree a) (Tree a) deriv­ing (Show)

To get a zipper we need a representation for the context.

data Holy­Node a = Left­Hole a (Tree a) | Right­Hole a (Tree a) deriv­ing (Show)
type Con­text a = [Holy­Node a]
type Tree­Zip­per a = (Con­text a, Tree a)

We can now traverse this structure and make local edits.

go­Left :: Tree­Zip­per a -> Tree­Zip­per a
go­Left (ctx, Node x l r) = (Left­Hole x r : ctx, l)

go­Right :: Tree­Zip­per a -> Tree­Zip­per a
go­Right (ctx, Node x l r) = (Right­Hole x l : ctx, r)

go­Up :: Tree­Zip­per a -> Tree­Zip­per a
go­Up (Left­Hole x r : ctx, t) = (ctx, Node x t r)
go­Up (Right­Hole x l : ctx, t) = (ctx, Node x l t)

Data Type Algebra

And now for something completely different.

The unit value () has type (). I will write unit types as 1 instead.
Booleans have type Bool. I will write them as 2 instead.
The pair value ((), ()) has type ((), ()). I will write pairs types with * instead. So, that value now has type 1 * 1.
The Either value Left () has type Either () b. I will denote Either types with +. So, that value now has type 1 + b.

In our new notation, this function f has what return type?

f True = Left ()
f False = Right ()

This should be 1 + 1. We’ll also write this as 2. In general 1 + ... + 1 will be written as the number it adds up to. Same for products. One more exercise, what’s the return type of g in our new notation?

g x y = (f x, f y)

This should be 2 * 2 = 4. You should verify that if we have a type n, then there are n values of that type. We can see this pretty explicitly with the above example.

g True True = (Left (), Left ())
g True False = (Left (), Right ())
g False True = (Right (), Left ())
g False False = (Right (), Right ())

This seems like a cute notation, but is it more than just cute? Is there something deeper going on here?

The List Type

Recall the definition of a list type.

data List a = Nil | Cons a (List a)

I’m going to write this in a slightly more primitive way.

data List a = Either () (a, List a)

This is the same type as before except we’re not using Haskell’s nice syntax for defining types. Instead everything is just in terms of the more primitive notions. Here we represent the list [1, 2, 3] as Right (1, Right (2, Right (3, Left ()))).

Remember that list isn’t a type, but a whole family of types. Given an element type, we can give back a particular list type specialized to that list. We can model this as a function over types. In our new notation a list of a has type L(a) where L(t) = 1 + t * L(t).

Doing Algebra

Forget we’re talking about types. Let’s do some math.

What is the derivative of 1 + ax with respect to x? Should be a.
How about 1 + ax^2 with respect to x? That’s 2ax.

Do these look familiar? They’re the types of our context frames! This works in general. Let’s try something else.

L(t) = 1 + t * L(t). Solving for L we get L(t) = 1 / (1 - t). Does anyone remember the Taylor expansion of this? Yes, L(t) = 1 + t + t^2 + t^3 + ....

So we’ve derived an alternative definition of L(t). Turn on your type interpretation again. This is either 1 which is an empty list, t which is a singleton list, t^2 a list with two elements, ... Looks like a list to me!

Try the same thing for binary trees. What do you get? The Catalan numbers for a good reason. Cn is the number of full binary trees with n + 1 leaves.

Additional Reading

For the practical parts about using Haskell for zippers, see LYaH.
Gerard Huet wrote the first zipper paper.
Conor McBride discovered that the type of a zipper is the derivative of its data type.
Erik Hinton presented that paper at the Papers We Love meetup.
There is a nice series on zippers by Pavel Panchekha with even more interesting connections.
Finally, more of the mathematics can be found in a lecture note presented by Mathias Pedersen.