16 Knock: pattern matching
16.1 Matching
One feature we’ve taken advantage extensively in the writing of our compilers is the Racket’s match facility for pattern matching.
Let’s add a similar feature to our own language.
We’ll call it Knock!
In Knock, we will support a limited form of pattern matching of the form:
(match e [p0 e0] ...)
A pattern matching expression is used to perform case-analysis, deconstruction, and binding on the value produced by e. A match consists of any number of clauses, where each clause consists of a pattern pi and expression ei. Each pi is a pattern, which can include literal booleans, characters, integers, and the empty list, or can be a pattern variable, a wildcard, a cons-pattern, or a and-pattern. Clauses are matched in the order in which they appear and if a pattern matches the value of e, then the corresponding expression is evaluated in an environment that binds any pattern variables to the matching parts of e’s value. If no patterns match e’s value, an error is signalled.
The syntax is extended as follows:
#lang racket (provide (all-defined-out)) ;; type Prog = (Prog (Listof Defn) Expr) (struct Prog (ds e) #:prefab) ;; type Defn = (Defn Id (Listof Id) Expr) (struct Defn (f xs e) #:prefab) ;; type Expr = (Eof) ;; | (Empty) ;; | (Int Integer) ;; | (Bool Boolean) ;; | (Char Character) ;; | (Str String) ;; | (Prim0 Op0) ;; | (Prim1 Op1 Expr) ;; | (Prim2 Op2 Expr Expr) ;; | (Prim3 Op3 Expr Expr Expr) ;; | (If Expr Expr Expr) ;; | (Begin Expr Expr) ;; | (Let Id Expr Expr) ;; | (Var Id) ;; | (App Id (Listof Expr)) ;; | (Match Expr (Listof Pat) (Listof Expr)) ;; type Id = Symbol ;; type Op0 = 'read-byte ;; type Op1 = 'add1 | 'sub1 | 'zero? ;; | 'char? | 'integer->char | 'char->integer ;; | 'write-byte | 'eof-object? ;; | 'box | 'car | 'cdr | 'unbox ;; | 'empty? | 'cons? | 'box? ;; | 'vector? | vector-length ;; | 'string? | string-length ;; type Op2 = '+ | '- | '< | '= ;; | 'cons | 'eq? ;; | 'make-vector | 'vector-ref ;; | 'make-string | 'string-ref ;; type Op3 = 'vector-set! ;; type Pat = (PVar Id) ;; | (PWild) ;; | (PLit Lit) ;; | (PBox Pat) ;; | (PCons Pat Pat) ;; | (PAnd Pat Pat) ;; type Lit = Boolean ;; | Character ;; | Integer ;; | '() (struct Eof () #:prefab) (struct Empty () #:prefab) (struct Int (i) #:prefab) (struct Bool (b) #:prefab) (struct Char (c) #:prefab) (struct Str (s) #:prefab) (struct Prim0 (p) #:prefab) (struct Prim1 (p e) #:prefab) (struct Prim2 (p e1 e2) #:prefab) (struct Prim3 (p e1 e2 e3) #:prefab) (struct If (e1 e2 e3) #:prefab) (struct Begin (e1 e2) #:prefab) (struct Let (x e1 e2) #:prefab) (struct Var (x) #:prefab) (struct App (f es) #:prefab) (struct Match (e ps es) #:prefab) (struct PVar (x) #:prefab) (struct PWild () #:prefab) (struct PLit (x) #:prefab) (struct PBox (p) #:prefab) (struct PCons (p1 p2) #:prefab) (struct PAnd (p1 p2) #:prefab)
16.2 Match by Example
Since we’ve been using pattern matching throughout the course, it probably is pretty natural at this point, but let’s quickly walk through some examples to try and disentangle the different aspects of match.
Perhaps the simplest form of a match-expression uses a pattern that just consists of a variable, e.g.
(match e [x e0])
This expression is equivalent to (let ((x e)) e0) because a pattern variable matches any value and binds that name in the scope of its right-hand expression. We can see from this example that match is doing variable binding.
Relatedly, a “wildcard” pattern can be used to match anything without binding the value to a name:
(match e [_ e0])
This expression is equivalent to (begin e e0).
Another simple form of pattern is to use a literal such as an integer, character, etc. which matches when the value is the same as the literal. This form of pattern doesn’t bind any names, but is used to discriminate between different cases of what the value may be. For example:
(match e [#f e1] [_ e2])
This expression is equivalent to (if e e2 e1). Here we can see that match is doing conditional evaluation, selecting e1 if e produces #f, and selecting e2 otherwise.
A more complicated pattern involves a constructor-style pattern like cons:
(match e [(cons x y) e1] [_ e2])
Here, the cons pattern is both discriminating between cons and non-cons values, matching only when e is a pair, but also binding the names x and y to the components of the pair when the value is in fact a pair; these names are bound in the scope of e1. In this way, the pattern is used to destructure compound values such as pair.
The x and y in this example are actually just instances of patterns themselves, and patterns can be nested arbitrarily deep. So for example, if we wanted only to match a pair containing 1 and 2, we could write:
(match e [(cons 1 2) e1] [_ e2])
The and-pattern is used to match the conjunction of two patterns, so (and p1 p2) matches whenever p1 and p2 both match and binds all of the names in p1 and p2. For example,
(match e [(and (cons 1 x) (cons y 2)) e1] [_ e2])
The first clause matches when e evaluates to (cons 1 2) and binds the name x to 2 and y to 1 in the scope of e1.
Here are some complete examples and how they are parsed:
Examples
> (parse-e '(match z [x x])) '#s(Match #s(Var z) (#s(PVar x)) (#s(Var x)))
> (parse-e '(match z [_ #t])) '#s(Match #s(Var z) (#s(PWild)) (#s(Bool #t)))
> (parse-e '(match z [1 #t])) '#s(Match #s(Var z) (#s(PLit 1)) (#s(Bool #t)))
> (parse-e '(match z [1 #t] [2 #f]))
'#s(Match
#s(Var z)
(#s(PLit 1) #s(PLit 2))
(#s(Bool #t) #s(Bool #f)))
> (parse-e '(match z [(cons x y) #t]))
'#s(Match
#s(Var z)
(#s(PCons #s(PVar x) #s(PVar y)))
(#s(Bool #t)))
> (parse-e '(match z [(cons 1 2) #t]))
'#s(Match
#s(Var z)
(#s(PCons #s(PLit 1) #s(PLit 2)))
(#s(Bool #t)))
> (parse-e '(match z [(and (cons x 2) (cons 1 y)) #t]))
'#s(Match
#s(Var z)
(#s(PAnd
#s(PCons #s(PVar x) #s(PLit 2))
#s(PCons #s(PLit 1) #s(PVar y))))
(#s(Bool #t)))
> (parse-define '(define (length xs) (match xs ['() 0] [(cons x xs) (add1 (length xs))])))
'#s(Defn
length
(xs)
#s(Match
#s(Var xs)
(#s(PLit ())
#s(PCons #s(PVar x) #s(PVar xs)))
(#s(Int 0)
#s(Prim1
add1
#s(App length (#s(Var xs)))))))
16.3 An Interpreter for Pattern Matching
At the heart of interpreter for Knock is the function:
; Pat Value Env -> [Maybe Env] (define (interp-match-pat p v r) ...)
This function takes a single pattern and value, along with an environment, and determines whether the pattern matches the value, and if so, an environment that binds the variables in the pattern to the sub-parts of the value that match. If the pattern doesn’t match, #f is produced.
So for example, if the pattern is simply a variable x, the function produces r extended to bind x to v. If the pattern is a wildcard, it produces r, indicating a match, but with no new bindings. Likewise, if the pattern is a literal, it produces r when the value is the same as the literal. The more interesting cases are of cons- and and-patterns which recursively match the sub-patterns.
It’s important to see that this function’s return type is communicating multiple things at the same time. If the pattern doesn’t match, it produces #f. If it produces an environment, it means the pattern matched and the environment communicates the binding of the pattern variables to values.
Examples
> (interp-match-pat (PWild) 99 '()) '()
Here the pattern matches, but binds no variables so the result is the same environment as given.
Examples
> (interp-match-pat (PVar 'x) 99 '()) '((x 99))
Here the pattern matches and binds x to 99, which is reflected in the output environment.
Examples
> (interp-match-pat (PLit 99) 99 '()) '()
Here the pattern matches but binds nothing.
Examples
> (interp-match-pat (PLit 100) 99 '()) #f
Here the pattern doesn’t match.
Examples
> (interp-match-pat (PAnd (PLit 99) (PVar 'x)) 99 '()) '((x 99))
Here the pattern matches and binds x to 99.
Examples
> (interp-match-pat (PAnd (PLit 100) (PVar 'x)) 99 '()) #f
Here the pattern doesn’t match.
Examples
> (interp-match-pat (PCons (PVar 'x) (PVar 'y)) 99 '()) #f
Here the pattern doesn’t match.
Examples
> (interp-match-pat (PCons (PVar 'x) (PVar 'y)) (cons 99 100) '()) '((y 100) (x 99))
Here the pattern matches and binds x to 99 and y to 100.
As you can see, the patterns can be nested arbitrarily deep but the environment produced will bind each variable to the appropriate sub-part of the given value:
Examples
> (interp-match-pat (PCons (PCons (PVar 'x) (PVar 'y)) (PCons (PVar 'p) (PVar 'q))) (cons (cons 99 100) (cons #t #f)) '()) '((q #f) (p #t) (y 100) (x 99))
The complete code for interp-match-pat is:
; Pat Value Env -> [Maybe Env] (define (interp-match-pat p v r) (match p [(PWild) r] [(PVar x) (ext r x v)] [(PLit l) (and (eqv? l v) r)] [(PBox p) (match v [(box v) (interp-match-pat p v r)] [_ #f])] [(PCons p1 p2) (match v [(cons v1 v2) (match (interp-match-pat p1 v1 r) [#f #f] [r1 (interp-match-pat p2 v2 r1)])] [_ #f])] [(PAnd p1 p2) (match (interp-match-pat p1 v r) [#f #f] [r1 (interp-match-pat p2 v r1)])]))
With interp-match-pat, we can then build up the function for interpreting a match expression:
; Value [Listof Pat] [Listof Expr] Env Defns -> Answer (define (interp-match v ps es r ds) ...)
This function traverses the patterns in order until finding one that matches (using interp-match-pat) and then evaluating the corresponding right-hand expression in the environment that interp-match-pat produced. If it runs out of clauses without finding a matching, it produces an error.
It’s fairly straightforward:
; Value [Listof Pat] [Listof Expr] Env Defns -> Answer (define (interp-match v ps es r ds) (match* (ps es) [('() '()) 'err] [((cons p ps) (cons e es)) (match (interp-match-pat p v r) [#f (interp-match v ps es r ds)] [r (interp-env e r ds)])]))
The complete interpreter:
#lang racket (provide interp interp-env interp-match-pat interp-match) (require "ast.rkt" "env.rkt" "interp-prims.rkt") ;; type Answer = Value | 'err ;; type Value = ;; | Integer ;; | Boolean ;; | Character ;; | Eof ;; | Void ;; | '() ;; | (cons Value Value) ;; | (box Value) ;; | (vector Value ...) ;; | (string Char ...) ;; type REnv = (Listof (List Id Value)) ;; type Defns = (Listof Defn) ;; Prog -> Answer (define (interp p) (match p [(Prog ds e) (interp-env e '() ds)])) ;; Expr Env Defns -> Answer (define (interp-env e r ds) (match e [(Int i) i] [(Bool b) b] [(Char c) c] [(Eof) eof] [(Empty) '()] [(Var x) (lookup r x)] [(Str s) s] [(Prim0 'void) (void)] [(Prim0 'read-byte) (read-byte)] [(Prim0 'peek-byte) (peek-byte)] [(Prim1 p e) (match (interp-env e r ds) ['err 'err] [v (interp-prim1 p v)])] [(Prim2 p e1 e2) (match (interp-env e1 r ds) ['err 'err] [v1 (match (interp-env e2 r ds) ['err 'err] [v2 (interp-prim2 p v1 v2)])])] [(Prim3 p e1 e2 e3) (match (interp-env e1 r ds) ['err 'err] [v1 (match (interp-env e2 r ds) ['err 'err] [v2 (match (interp-env e3 r ds) ['err 'err] [v3 (interp-prim3 p v1 v2 v3)])])])] [(If p e1 e2) (match (interp-env p r ds) ['err 'err] [v (if v (interp-env e1 r ds) (interp-env e2 r ds))])] [(Begin e1 e2) (match (interp-env e1 r ds) ['err 'err] [_ (interp-env e2 r ds)])] [(Let x e1 e2) (match (interp-env e1 r ds) ['err 'err] [v (interp-env e2 (ext r x v) ds)])] [(App f es) (match (interp-env* es r ds) ['err 'err] [vs (match (defns-lookup ds f) [(Defn f xs e) ; check arity matches (if (= (length xs) (length vs)) (interp-env e (zip xs vs) ds) 'err)])])] [(Match e ps es) (match (interp-env e r ds) ['err 'err] [v (interp-match v ps es r ds)])])) ;; Value [Listof Pat] [Listof Expr] Env Defns -> Answer (define (interp-match v ps es r ds) (match* (ps es) [('() '()) 'err] [((cons p ps) (cons e es)) (match (interp-match-pat p v r) [#f (interp-match v ps es r ds)] [r (interp-env e r ds)])])) ;; Pat Value Env -> [Maybe Env] (define (interp-match-pat p v r) (match p [(PWild) r] [(PVar x) (ext r x v)] [(PLit l) (and (eqv? l v) r)] [(PBox p) (match v [(box v) (interp-match-pat p v r)] [_ #f])] [(PCons p1 p2) (match v [(cons v1 v2) (match (interp-match-pat p1 v1 r) [#f #f] [r1 (interp-match-pat p2 v2 r1)])] [_ #f])] [(PAnd p1 p2) (match (interp-match-pat p1 v r) [#f #f] [r1 (interp-match-pat p2 v r1)])])) ;; (Listof Expr) REnv Defns -> (Listof Value) | 'err (define (interp-env* es r ds) (match es ['() '()] [(cons e es) (match (interp-env e r ds) ['err 'err] [v (match (interp-env* es r ds) ['err 'err] [vs (cons v vs)])])])) ;; Defns Symbol -> Defn (define (defns-lookup ds f) (findf (match-lambda [(Defn g _ _) (eq? f g)]) ds)) (define (zip xs ys) (match* (xs ys) [('() '()) '()] [((cons x xs) (cons y ys)) (cons (list x y) (zip xs ys))]))
We can now see it in action:
Examples
> (define (run e) (interp-env (parse-e e) '() '())) > (run '(match 1 [1 #t] [_ #f])) #t
> (run '(match 2 [1 #t] [_ #f])) #f
> (run '(match 2 [x x] [_ #f])) 2
> (run '(match (cons 1 2) [(cons x y) x] [_ #f])) 1
> (run '(match (box 1) [(box x) x] [_ #f])) 1
> (run '(match (box 1) [(box 2) #t] [_ #f])) #f
And we can use pattern matching to define functions in style similar to what we’ve been using all semester:
Examples
> (interp (parse '[(define (length xs) (match xs ['() 0] [(cons x xs) (add1 (length xs))])) (length (cons 7 (cons 8 (cons 9 '()))))])) 3
16.4 A Compiler for Pattern Matching
The compilation of pattern matching expression is significantly more complicated compared to interpretation.
Most of the complication is due to the fact that the computation of the binding structure in the interpreter must be split and mirrored across compile-time and run-time in the compiler. Each right-hand-side of a clause must be compiled in a static environment that is dependent on the variables occurring in the left-hand-side pattern. At run-time, these variables will be bound by pushing parts of the matched value on the stack.
To make matters worse, the stack will also be needed to save intermediate results for later processing. For example, in matching a cons-pattern, we must push the cdr of the pair on the stack while pattern-matching the car.
The function compile-pat has the following signature:
; Pat CEnv Symbol -> (list Asm Asm CEnv) (define (compile-pat p cm next) ...)
It consumes a single pattern, which it is compiling, a static environment that describes the bindings that have occurred so far, and label name which denotes where to jump to in order to try matching the next pattern.
It produces three things:
a sequence of instructions which determine whether the value in 'rax match the pattern p and bind any variables that may occur in p,
a sequence of instructions which handle what to do if p doesn’t match such as restoring the stack to its state before the match started and jumping to next, and
a static environment that describes the bindings of the pattern in case it matches.
Let’s look at some examples. First, consider the wildcard pattern:
Examples
> (compile-pattern (PWild) '() 'next) '(() () ())
When the pattern is a wildcard, it produces an empty sequence of instructions for the “determine if the pattern matches” part. This is because the pattern always matches. There’s nothing to do. Similarly, it produces an empty sequence of instructions for the “what to do if it doesn’t match” part because that’s impossible; this pattern always matches. Finally, it produces the environment it was given because it doesn’t bind anything.
Now pattern variables:
Examples
> (compile-pattern (PVar 'x) '() 'next) (list (list (Push 'rax)) '() '(x))
A pattern variable always matches and binds the value to x, so in the “determine and bind” part it simply pushes 'rax on to the stack to bind the value.
It has empty sequences of instructions for the “failing” part because it always matches just like a wildcard. Finally the static environment part adds x to the environment because this pattern binds x when it matches.
Pattern literals:
Examples
> (compile-pattern (PLit 0) '() 'next)
(list
(list (Cmp 'rax 0) (Jne 'g6791))
(list (Label 'g6791) (Add 'rsp 0) (Jmp 'next))
'())
In the “determine and bind” part, we compare the value in 'rax to the literal. If they are not equal, the pattern doesn’t match so control jumps a generated label that is defined in the “fail” part. The instructions in the “fail” part pop off all of the current bindings in the pattern (in this example there are none) and then jumps to next.
The environment stays the same because a literal doesn’t bind anything.
Supposing we had changed the example to:
Examples
> (compile-pattern (PLit 0) '(x y z) 'next)
(list
(list (Cmp 'rax 0) (Jne 'g6792))
(list (Label 'g6792) (Add 'rsp 24) (Jmp 'next))
'(x y z))
This is essentially saying “compile the pattern (PLit 0) assuming it occurs in the context of a surrounding pattern that binds x, y, and z before getting to this point.” If it fails, it needs to pop all three bindings of the stack, hence the “fail” code adds 24 to 'rsp before jumping to 'next.
Now we get to the inductive patterns, which will be more interesting. Let’s start with the box-pattern.
Examples
> (compile-pattern (PBox (PWild)) '() 'next)
(list
(list
(Mov 'r8 'rax)
(And 'r8 7)
(Cmp 'r8 1)
(Jne 'g6793)
(Xor 'rax 1)
(Mov 'rax (Offset 'rax 0)))
(list (Label 'g6793) (Add 'rsp 0) (Jmp 'next))
'())
This “determine and bind” part moves the value to a temporary register and masks the final three bits then compares the result to the type tag for boxes. If they are not equal, the value in 'rax is not a box, so it jumps to the generated label for the “fail” part, which pops all bound pattern variables before jumping to 'next. If the value is a box, it is untagged and the value inside the box is fetched to 'rax for the subsequent pattern to match against, in this case the wildcard. Nothing is bound so no changes in the output environment.
Let’s change the wild card to a literal:
Examples
> (compile-pattern (PBox (PLit 0)) '() 'next)
(list
(list
(Mov 'r8 'rax)
(And 'r8 7)
(Cmp 'r8 1)
(Jne 'g6795)
(Xor 'rax 1)
(Mov 'rax (Offset 'rax 0))
(Cmp 'rax 0)
(Jne 'g6794))
(list
(Label 'g6794)
(Add 'rsp 0)
(Jmp 'next)
(Label 'g6795)
(Add 'rsp 0)
(Jmp 'next))
'())
This works just like before but now in the “determine and bind” instructions, it compares the unboxed value to 0.
Notice that the code here is modifying 'rax. As it descends into the box and tries to match the inner pattern, it moves the value inside the box into 'rax. This is important because it maintains the invariant that the pattern is being matched against the value in 'rax, but it also means that in compound patterns, we may have to do more work to ensure the right value is in 'rax.
Let’s consider a cons-pattern. A cons-pattern is similar to a box pattern in that the first thing it needs to do is determine if the value is a pointer tagged with the appropriate type, in this case the cons tag. Then it needs to move a value into 'rax and check if a subpattern matches. In particular, it needs to move the car value into 'rax and check if the first subpattern matches.
Assuming it does match, what happens next? We need to move the cdr value in to 'rax and check it matches the second subpattern. But where can we get the cdr? The moment we overwrite 'rax with the car, we’ve lost a handle on the pair and thus access to the cdr.
The solution is to use the same mechanism we’ve always used to save values: push it on the stack and fetch it later. With this in mind, consider the following example for matching (cons 0 0):
Examples
> (compile-pattern (PCons (PLit 0) (PLit 0)) '() 'next)
(list
(list
(Mov 'r8 'rax)
(And 'r8 7)
(Cmp 'r8 2)
(Jne 'g6798)
(Xor 'rax 2)
(Mov 'r8 (Offset 'rax 0))
(Push 'r8)
(Mov 'rax (Offset 'rax 8))
(Cmp 'rax 0)
(Jne 'g6796)
(Mov 'rax (Offset 'rsp 0))
(Cmp 'rax 0)
(Jne 'g6797))
(list
(Label 'g6796)
(Add 'rsp 8)
(Jmp 'next)
(Label 'g6797)
(Add 'rsp 8)
(Jmp 'next)
(Label 'g6798)
(Add 'rsp 0)
(Jmp 'next))
'(#f))
This starts off like the box pattern checking the tag bits of the value. But then, before moving the car into 'rax, it pushes the cdr on the stack. It then installs the car and checks if it matches 0. If it does, it then installs the cdr off the stack and into 'rax to check if it too is 0. Note that if either subpatterns fail to match, they both jump to code that pops a single element off the stack, which is the stashed away cdr value that was pushed.
Also note that the static environment produced is '(#f) to account for the cdr value that was pushed.
The and-pattern is a bit like cons in that it has to push a value on the stack in order to restore it after matching the first subpattern:
Examples
> (compile-pattern (PAnd (PLit 0) (PLit 0)) '() 'next)
(list
(list
(Push 'rax)
(Cmp 'rax 0)
(Jne 'g6799)
(Mov 'rax (Offset 'rsp 0))
(Cmp 'rax 0)
(Jne 'g6800))
(list
(Label 'g6799)
(Add 'rsp 8)
(Jmp 'next)
(Label 'g6800)
(Add 'rsp 8)
(Jmp 'next))
'(#f))
The compile-pattern function is used by compile-match-clause which takes care of compiling a single match clause. It is given a pattern patterns and a right-hand-side expression to execute should the pattern match, an environment that describes the current bindings, a label to jump to when the code is done, i.e. the correct result is in 'rax, and finally a boolean indicating if this match expression is in tail position.
; Pat Expr CEnv Symbol Bool -> Asm (define (compile-match-clause p e c done t?) ...)
This function stitches together the parts returned by compile-pattern to implement a clause. This function assumes the value to be matched is the top element of the stack, so the first thing is does is fetch the value and install it in 'rax. It then executes the “determine if the pattern matches and bind” code followed by the right hand side expression, then pops all the pattern-bound values off the stack and jumps to done. After this it emits the code for what to do if the pattern doesn’t fail (thus jumping to done will jump past this code).
Consider a match clause like [_ #t]:
Examples
> (compile-match-clause (PWild) (Bool #t) '() 'done #f)
(list
(Mov 'rax (Offset 'rsp 0))
(Mov 'rax 24)
(Add 'rsp 0)
(Jmp 'done)
(Label 'g6801))
Here we can see the value being matched is fetched from the top of the stack. Since this pattern always matches, it next executes the right-hand-side by moving the bit-representation of #t into 'rax. It pops everything matching the pattern pushed on the stack (in this case nothing), then jumps to done. The final label, which is never reached, is where control should jump to in order to try matching the next clause.
Let’s look at a literal; consider a clause [0 #t]:
Examples
> (compile-match-clause (PLit 0) (Bool #t) '() 'done #f)
(list
(Mov 'rax (Offset 'rsp 0))
(Cmp 'rax 0)
(Jne 'g6803)
(Mov 'rax 24)
(Add 'rsp 0)
(Jmp 'done)
(Label 'g6803)
(Add 'rsp 0)
(Jmp 'g6802)
(Label 'g6802))
As always, it starts by fetching the top of the stack and putting the value in 'rax. It then does the “determine if matches and bind” instructions followed by the right-hand-side. If the value in 'rax is not 0 it will jump to code that handles the failure to match by popping of anything pushed to the stack (in this case nothing) and then jumping to the next clause (in this case, that’s the next label, but this isn’t the case in general). If the value in 'rax is 0, #t is moved into 'rax, the stack is popped, and control jumps to done.
Let’s see what a clause involving a pattern variable looks like, e.g. [x x]. Here we’re going to reference the variable bound in the pattern in the right-hand-side:
Examples
> (compile-match-clause (PVar 'x) (Var 'x) '() 'done #f)
(list
(Mov 'rax (Offset 'rsp 0))
(Push 'rax)
(Mov 'rax (Offset 'rsp 0))
(Add 'rsp 8)
(Jmp 'done)
(Label 'g6804))
The value being matched is fetched from the stack. It’s immediately pushed (again) to the stack because the variable pattern always matches and binds. We then execute the right hand side, which is just a reference to x, hence it fetches the top element of the stack, then pops this off and jumps to done.
OK, now let’s try something like [(box x) x]:
Examples
> (compile-match-clause (PBox (PVar 'x)) (Var 'x) '() 'done #f)
(list
(Mov 'rax (Offset 'rsp 0))
(Mov 'r8 'rax)
(And 'r8 7)
(Cmp 'r8 1)
(Jne 'g6806)
(Xor 'rax 1)
(Mov 'rax (Offset 'rax 0))
(Push 'rax)
(Mov 'rax (Offset 'rsp 0))
(Add 'rsp 8)
(Jmp 'done)
(Label 'g6806)
(Add 'rsp 0)
(Jmp 'g6805)
(Label 'g6805))
The value being matched is fetched from the stack. It’s checked for whether it is a box, jump away when it isn’t. Otherwise it unboxes the value and pushes it on the stack to bind to x, the executes the RHS, which fetches x into 'rax, pops, and jumps to done.
Here is the complete code for compile-match-clause:
; Pat Expr CEnv Symbol Bool -> Asm (define (compile-match-clause p e c done t?) (let ((next (gensym))) (match (compile-pattern p '() next) [(list i f cm) (seq (Mov rax (Offset rsp 0)) ; restore value being matched i (compile-e e (append cm c) t?) (Add rsp (* 8 (length cm))) (Jmp done) f (Label next))])))
Generating code for a sequence of match clauses is as simple as generate the code for each clause in sequence:
; [Listof Pat] [Listof Expr] CEnv Symbol Bool -> Asm (define (compile-match-clauses ps es c done t?) (match* (ps es) [('() '()) (seq)] [((cons p ps) (cons e es)) (seq (compile-match-clause p e c done t?) (compile-match-clauses ps es c done t?))]))
Finally, we have a function for compiling a complete match expression:
; Expr [Listof Pat] [Listof Expr] CEnv Bool -> Asm (define (compile-match e ps es c t?) (let ((done (gensym))) (seq (compile-e e c t?) (Push rax) ; save away to be restored by each clause (compile-match-clauses ps es (cons #f c) done t?) (Jmp 'raise_error_align) (Label done) (Add rsp 8)))) ; pop the saved value being matched
We can check that the compiler works for a complete example:
Examples
> (define (run p) (unload/free (asm-interp (compile (parse p)))))
> (run '[(define (length xs) (match xs ['() 0] [(cons x xs) (add1 (length xs))])) (length (cons 7 (cons 8 (cons 9 '()))))]) 3
With these pieces in place, here’s the complete compiler:
#lang racket (provide (all-defined-out)) (require "ast.rkt" "types.rkt" "compile-ops.rkt" a86/ast) ;; Registers used (define rax 'rax) ; return (define rbx 'rbx) ; heap (define rsp 'rsp) ; stack (define rdi 'rdi) ; arg ;; type CEnv = [Listof Variable] ;; Prog -> Asm (define (compile p) (match p [(Prog ds e) (prog (externs) (Global 'entry) (Label 'entry) (Mov rbx rdi) ; recv heap pointer (compile-e e '() #t) (Ret) (compile-defines ds) (Label 'raise_error_align) pad-stack (Call 'raise_error))])) (define (externs) (seq (Extern 'peek_byte) (Extern 'read_byte) (Extern 'write_byte) (Extern 'raise_error))) ;; [Listof Defn] -> Asm (define (compile-defines ds) (match ds ['() (seq)] [(cons d ds) (seq (compile-define d) (compile-defines ds))])) ;; Defn -> Asm (define (compile-define d) (match d [(Defn f xs e) (seq (Label (symbol->label f)) (compile-e e (reverse xs) #t) (Add rsp (* 8 (length xs))) ; pop args (Ret))])) ;; Expr CEnv Bool -> Asm (define (compile-e e c t?) (match e [(Int i) (compile-value i)] [(Bool b) (compile-value b)] [(Char c) (compile-value c)] [(Eof) (compile-value eof)] [(Empty) (compile-value '())] [(Var x) (compile-variable x c)] [(Str s) (compile-string s)] [(Prim0 p) (compile-prim0 p c)] [(Prim1 p e) (compile-prim1 p e c)] [(Prim2 p e1 e2) (compile-prim2 p e1 e2 c)] [(Prim3 p e1 e2 e3) (compile-prim3 p e1 e2 e3 c)] [(If e1 e2 e3) (compile-if e1 e2 e3 c t?)] [(Begin e1 e2) (compile-begin e1 e2 c t?)] [(Let x e1 e2) (compile-let x e1 e2 c t?)] [(App f es) (compile-app f es c t?)] [(Match e ps es) (compile-match e ps es c t?)])) ;; Value -> Asm (define (compile-value v) (seq (Mov rax (imm->bits v)))) ;; Id CEnv -> Asm (define (compile-variable x c) (let ((i (lookup x c))) (seq (Mov rax (Offset rsp i))))) ;; String -> Asm (define (compile-string s) (let ((len (string-length s))) (if (zero? len) (seq (Mov rax type-str)) (seq (Mov rax len) (Mov (Offset rbx 0) rax) (compile-string-chars (string->list s) 8) (Mov rax rbx) (Or rax type-str) (Add rbx (+ 8 (* 4 (if (odd? len) (add1 len) len)))))))) ;; [Listof Char] Integer -> Asm (define (compile-string-chars cs i) (match cs ['() (seq)] [(cons c cs) (seq (Mov rax (char->integer c)) (Mov (Offset rbx i) 'eax) (compile-string-chars cs (+ 4 i)))])) ;; Op0 CEnv -> Asm (define (compile-prim0 p c) (compile-op0 p)) ;; Op1 Expr CEnv -> Asm (define (compile-prim1 p e c) (seq (compile-e e c #f) (compile-op1 p))) ;; Op2 Expr Expr CEnv -> Asm (define (compile-prim2 p e1 e2 c) (seq (compile-e e1 c #f) (Push rax) (compile-e e2 (cons #f c) #f) (compile-op2 p))) ;; Op3 Expr Expr Expr CEnv -> Asm (define (compile-prim3 p e1 e2 e3 c) (seq (compile-e e1 c #f) (Push rax) (compile-e e2 (cons #f c) #f) (Push rax) (compile-e e3 (cons #f (cons #f c)) #f) (compile-op3 p))) ;; Expr Expr Expr CEnv Bool -> Asm (define (compile-if e1 e2 e3 c t?) (let ((l1 (gensym 'if)) (l2 (gensym 'if))) (seq (compile-e e1 c #f) (Cmp rax val-false) (Je l1) (compile-e e2 c t?) (Jmp l2) (Label l1) (compile-e e3 c t?) (Label l2)))) ;; Expr Expr CEnv Bool -> Asm (define (compile-begin e1 e2 c t?) (seq (compile-e e1 c #f) (compile-e e2 c t?))) ;; Id Expr Expr CEnv Bool -> Asm (define (compile-let x e1 e2 c t?) (seq (compile-e e1 c #f) (Push rax) (compile-e e2 (cons x c) t?) (Add rsp 8))) ;; Id [Listof Expr] CEnv Bool -> Asm (define (compile-app f es c t?) (if t? (compile-app-tail f es c) (compile-app-nontail f es c))) ;; Id [Listof Expr] CEnv -> Asm (define (compile-app-tail f es c) (seq (compile-es es c) (move-args (length es) (length c)) (Add rsp (* 8 (length c))) (Jmp (symbol->label f)))) ;; Integer Integer -> Asm (define (move-args i off) (cond [(zero? off) (seq)] [(zero? i) (seq)] [else (seq (Mov r8 (Offset rsp (* 8 (sub1 i)))) (Mov (Offset rsp (* 8 (+ off (sub1 i)))) r8) (move-args (sub1 i) off))])) ;; Id [Listof Expr] CEnv -> Asm ;; The return address is placed above the arguments, so callee pops ;; arguments and return address is next frame (define (compile-app-nontail f es c) (let ((r (gensym 'ret))) (seq (Lea rax r) (Push rax) (compile-es es (cons #f c)) (Jmp (symbol->label f)) (Label r)))) ;; [Listof Expr] CEnv -> Asm (define (compile-es es c) (match es ['() '()] [(cons e es) (seq (compile-e e c #f) (Push rax) (compile-es es (cons #f c)))])) ;; Expr [Listof Pat] [Listof Expr] CEnv Bool -> Asm (define (compile-match e ps es c t?) (let ((done (gensym))) (seq (compile-e e c #f) (Push rax) ; save away to be restored by each clause (compile-match-clauses ps es (cons #f c) done t?) (Jmp 'raise_error_align) (Label done) (Add rsp 8)))) ; pop the saved value being matched ;; [Listof Pat] [Listof Expr] CEnv Symbol Bool -> Asm (define (compile-match-clauses ps es c done t?) (match* (ps es) [('() '()) (seq)] [((cons p ps) (cons e es)) (seq (compile-match-clause p e c done t?) (compile-match-clauses ps es c done t?))])) ;; Pat Expr CEnv Symbol Bool -> Asm (define (compile-match-clause p e c done t?) (let ((next (gensym))) (match (compile-pattern p '() next) [(list i f cm) (seq (Mov rax (Offset rsp 0)) ; restore value being matched i (compile-e e (append cm c) t?) (Add rsp (* 8 (length cm))) (Jmp done) f (Label next))]))) ;; Pat CEnv Symbol -> (list Asm Asm CEnv) (define (compile-pattern p cm next) (match p [(PWild) (list (seq) (seq) cm)] [(PVar x) (list (seq (Push rax)) (seq) (cons x cm))] [(PLit l) (let ((fail (gensym))) (list (seq (Cmp rax (imm->bits l)) (Jne fail)) (seq (Label fail) (Add rsp (* 8 (length cm))) (Jmp next)) cm))] [(PAnd p1 p2) (match (compile-pattern p1 (cons #f cm) next) [(list i1 f1 cm1) (match (compile-pattern p2 cm1 next) [(list i2 f2 cm2) (list (seq (Push rax) i1 (Mov rax (Offset rsp (* 8 (- (sub1 (length cm1)) (length cm))))) i2) (seq f1 f2) cm2)])])] [(PBox p) (match (compile-pattern p cm next) [(list i1 f1 cm1) (let ((fail (gensym))) (list (seq (Mov r8 rax) (And r8 ptr-mask) (Cmp r8 type-box) (Jne fail) (Xor rax type-box) (Mov rax (Offset rax 0)) i1) (seq f1 (Label fail) (Add rsp (* 8 (length cm))) ; haven't pushed anything yet (Jmp next)) cm1))])] [(PCons p1 p2) (match (compile-pattern p1 (cons #f cm) next) [(list i1 f1 cm1) (match (compile-pattern p2 cm1 next) [(list i2 f2 cm2) (let ((fail (gensym))) (list (seq (Mov r8 rax) (And r8 ptr-mask) (Cmp r8 type-cons) (Jne fail) (Xor rax type-cons) (Mov r8 (Offset rax 0)) (Push r8) ; push cdr (Mov rax (Offset rax 8)) ; mov rax car i1 (Mov rax (Offset rsp (* 8 (- (sub1 (length cm1)) (length cm))))) i2) (seq f1 f2 (Label fail) (Add rsp (* 8 (length cm))) ; haven't pushed anything yet (Jmp next)) cm2))])])])) ;; Id CEnv -> Integer (define (lookup x cenv) (match cenv ['() (error "undefined variable:" x)] [(cons y rest) (match (eq? x y) [#t 0] [#f (+ 8 (lookup x rest))])])) ;; Symbol -> Label ;; Produce a symbol that is a valid Nasm label (define (symbol->label s) (string->symbol (string-append "label_" (list->string (map (λ (c) (if (or (char<=? #\a c #\z) (char<=? #\A c #\Z) (char<=? #\0 c #\9) (memq c '(#\_ #\$ #\# #\@ #\~ #\. #\?))) c #\_)) (string->list (symbol->string s)))) "_" (number->string (eq-hash-code s) 16))))