Monoids
-- strings with concatenation ("foo" ++ "bar") ++ "baz" -- foobarbaz "foo" ++ ("bar" ++ "baz") -- foobarbaz "" ++ "foo" -- foo -- numbers with addition (2 + 3) + 4 -- 9 2 + (3 + 4) -- 9 0 + 100 -- 100 -- numbers with multiplication (2 * 3) * 4 -- 24 2 * (3 * 4) -- 24 1 * 100 -- 100 -- lists with append ([1, 2] ++ [3, 4]) ++ [5, 6] -- [1, 2, 3, 4, 5, 6] [1, 2] ++ ([3, 4] ++ [5, 6]) -- [1, 2, 3, 4, 5, 6] [] ++ [1, 2, 3] -- [1, 2, 3] -- booleans with or (True or True) or False -- True True or (True or False) -- True False or True -- True -- booleans with and (True and True) and False -- False True and (True and False) -- False True and False -- False
The Laws
If you were to distill down the similarities between all those examples, what would we have? For all x
, y
, z
of the same type:
There exists an associative operator we’ll call
mappend
.If you surround a function name with ` it becomes an infix operator. These are backticks, not single quotes!(x `mappend` y) `mappend` z = x `mappend` (y `mappend` z)
There exists an identity element we’ll call
mempty
.mempty `mappend` x = x `mappend` mempty = x
All of the examples above satisfy these two laws. Any data that satisfies the above laws is callend a monoid.This is the Official Math Name™, but it’s the exact same thing as combinable from your homework. When we speak of a type being a monoid, it must be with respect to a particular operator. For example, integers are a monoid with respect to addition, but not subtraction. A type can be a monoid with respect to many different operators, as booleans above are with both conjunction and disjunction.
The Typeclass
We can define a typeclass for monoids.
class Monoid m where mempty :: m mappend :: m -> m -> m mconcat :: [m] -> m mconcat = foldr mappend mempty
So, in order to make a type an instance of Monoid
you have to provide an mempty
value and an mappend
You can use the infix operator <>
instead of mappend
. function. You get mconcat
for free. The following is an instance of Monoid
for lists.
instance Monoid [a] where mempty = [] mappend = (++)
Haskell does not automatically check that the monoid laws hold. It is the responsibility of the programmer to verify that every instance of Monoid
satisfies the laws.
Fast Indexing
With finger trees you can write fast purely functional implementations of many data structures including sequences, priority queues, and search trees. They fundamentally rely on monoids.
Consider the humble linked list. While accessing the head is fast, the tail requires touching all the elements of the list. Can we do better? Suppose we have the following data structure.
data Tree v a = Leaf v a | Branch v (Tree v a) (Tree v a)
This is a binary tree such that all the nodes are annotated with a value v
, while all the elements of type a
are stored at the leaves.
The leaves store elements from left to right. The following will construct the original list stored in a tree.
toList :: Tree v a -> [a] toList (Leaf _ a) = [a] toList (Branch _ x y) = toList x ++ toList y
Given a Tree
we can retrieve its annotation easily by pattern matching.
tag :: Tree v a -> v tag (Leaf v _) = v tag (Branch v _ _) = v
Given a tree, we want to index elements from it. Getting the head
is easy. We can just traverse to the left-most element.
head :: Tree v a -> a head (Leaf _ a) = a head (Branch _ x _) = head x
What about the others? This is where our annotations can help. What happens if every subtree is annotated with its size?
We can use these annotations to direct our search. If the left subtree is larger than our index, then we know the element must be on the left side. Otherwise, we know the element is in the right subtree. We can recurse and adjust the index as necessary.
Let’s implement this idea.
type Size = Int
This type will remind us what our annotations represent. We want the following invariant to hold across all our nodes.
tag (Leaf ...) = 1 tag (Branch ... x y) = (tag x) + (tag y)
Instead of constructing trees with their actual constructors, we can build our own smart constructors which will automatically annotate nodes with their proper values.
leaf :: a -> Tree Size a leaf a = Leaf 1 a branch :: Tree Size a -> Tree Size a -> Tree Size a branch x y = Branch ((tag x) + (tag y)) x y
So long as we only create trees using leaf
and branch
, our invariant will be guaranteed. Now we can write our indexing function !!
.
(!!) :: Tree Size a -> Int -> a (Leaf _ a) !! 0 = a (Branch _ x y) !! n | n < tag x = x !! n | otherwise = y !! (n - tag x)
A Priority Queue
A priority is a different data structure than returns the most “urgent” element according to its priority. We will represent priorities as integers, where smaller ones are more urgent.
type Priority = Int
We recycle the binary tree from earlier. We can view this as a tournament, so every subtree is annotated with the smallest priority it contains.
Like before, we need an invariant about our annotations.
tag (Leaf ...) = priority a tag (Branch ... x y) = (tag x) `min` (tag y)
Given a tournament tree, we could retrieve the element with smallest priority in log n
time, assuming the tree is balanced, using a similar technique as above.
winner :: Tree Priority a -> a winner t = go t where go (Leaf _ a) = a go (Branch _ x y) | tag x == tag t = go x -- winner on left | tag y == tag t = go y -- winner on right
The Monoid Connection
One and the same tree structure can be used for two quite different purposes, just by using different annotations. Recognizing that the tags form a monoid, we can unify both implementations. Notice !!
and winner
are actually special cases of the same function.
Viewing the previous examples through the lense of a monoid, we generalize our invariant.
tag (Branch ... x y) = (tag x) <> (tag y)
The previous examples can be recreated by making each annotation an instance of Monoid
.
instance Monoid Size where mempty = 0 mappend = (+) instance Monoid Priority where mempty = maxBound mappend = min
Hence, we can write a unified smart constructor.
branch :: Monoid v => Tree v a -> Tree v a -> Tree v a branch x y = Branch ((tag x) <> (tag y)) x y
For leaves, the tag is obtained from the element. We can capture this in a typeclass.
class Monoid v => Measured a v where measure :: a -> v
The corresponding smart constructor reads as follows.
leaf :: Measured a v => a -> Tree v a leaf a = Leaf (measure a) a
For our examples, we need the appropriate instances again.
instance Measured a Size where measure _ = 1 -- one element = size 1 instance Measured Foo Priority where measure a = priority a -- urgency of the element
How does the annotation at the top of a tree relate to the elements at the leaves? In our two examples, it was the total number of leaves and the least priority respectively. These values are independent of the actual shape of the tree. Thanks to the associativity of <>
, this is true for any monoid.
By associativity, these two trees have the same annotations.
(v1<>v2)<>(v3<>v4) = v1<>(v2<>(v3<>v4))
In fact, as long as the sequences of leaves are the same, this quantity will be the same. The tag at the root of a tree with n
elements is
measure a1 <> measure a2 <> measure a3 <> ... <> measure an
While independent of the shape of the branching, i.e. on the placement of parenthesis, this may of course depend on the order of elements. It makes sense to refer to this combination of measures of all elements as the measure of the tree.
instance Measured a v => Measured (Tree a v) v where measure = tag
Thus, every tree is annotated with its measure.
Search
Our efforts culminate in the unification of the two search algorithms !!
and winner
. They are certainly similar; at each node, they descend into one of the subtrees which is chosen depending on the annotations. But to see their exact equivalence, we have to ignore branches and grouping for now because this is exactly what associativity “abstracts away.”
In a sequence of elements a1, a2, ..., an
how does one find say the third one? Well, we scan the list from left to right and add 1 for each element encountered. As soon as the count exceeds 3, we have found the third element.
1 -- is not > 3 1 + 1 -- is not > 3 1 + 1 + 1 -- is not > 3 1 + 1 + 1 + 1 -- is > 3 ...
Similarly, how to find the element of a least priority say v
? Well, we can scan the list from left to right and keep track of the minimum priority so far. We have completed our search once it becomes equal to v
.
v1 -- still bigger than v v1 `min` v2 -- still bigger than v v1 `min` v2 `min` v3 -- still bigger than v v1 `min` v2 `min` v3 `min` v4 -- equal to v! ...
In general terms, we are looking for the position where a predicate p
switches from False
to True
.
measure a1 -- not p measure a1 <> measure a2 -- not p measure a1 <> measure a2 <> measure a3 -- not p measure a1 <> measure a2 <> measure a3 <> measure a4 -- p ... -- p
In other words, we are looking for the position k where
p (measure a1 <> ... <> measure ak) = False p (measure a1 <> ... <> measure ak <> measure a(k + 1)) = True
The key point is that p
does not test single elements but combinations of them, and this allows us to do binary search! Namely, how to find the element where p
flips? Divide the total measure into two halves.
x <> y x = measure a1 <> ... <> measure a(n/2) y = measure a(n/2 + 1) <> ... <> measure an
If p
is True
on the first half, then we have to look there for the flip, otherwise we have to search the second half. In the latter case, we would have to split y = y1 <> y2
and test p (x <> y1)
.
In the case of our data structures, the tree shape determines how the measure is split into parts at each step.
search :: Measured a v => (v -> Bool) -> Tree v a -> Maybe a search p t | p (measure t) = Just (go mempty p t) | otherwise = Nothing where go i p (Leaf _ a) = a go i p (Branch _ l r) | p (i <> measure l) = go i p l | otherwise = go (i <> measure l) p r
Since we have annotated each branch with its measure, testing p
takes no time at all. Of course, this algorithm only works if p
really does flip from False
to True
exactly once. This is the case we say that p
is a monotonic predicate. Our two examples (> 3)
and (== minimum)
have this property and thus, we can finally conclude.
t !! k = search (> k) winner t = search (== measure t)