8 Fraud: local binding and variables
To be is to be the value of a variable.
8.1 Variables
Let’s now consider add a notion of local binding to our target language.
We’ll call it Fraud.
We will use the following syntax to bind local variables:
(let ((id0 e0)) e)
This form binds the identifier i0 to value of e0 within the scope of e.
This is a specialization of Racket’s own local binding form, which allows for any number of bindings to be made with let:
(let ((id0 e0) ...) e)
We adopt this specialization of Racket’s let syntax so that you can always take a Fraud program and run it in Racket to confirm what it should produce.
Adding a notion of variable binding also means we need to add variables to the syntax of expressions.
Together this leads to the following grammar for Fraud:
Which can be modeled with the following data type definition:
#lang racket ;; type Expr = ;; | Integer ;; | Boolean ;; | Variable ;; | `(add1 ,Expr) ;; | `(sub1 ,Expr) ;; | `(zero? ,Expr) ;; | `(if ,Expr ,Expr ,Expr) ;; | `(let ((,Variable ,Expr)) ,Expr) ;; type Variable = Symbol (except 'add1 'sub1 'if 'let)
We will also need a predicate for well-formed Fraud expressions, but let’s return to this after considering the semantics and interpreter.
8.2 Meaning of Fraud programs
The meaning of Fraud programs depends on the form of the expression and in the case of integers, increments, and decrements, the meaning is the same as in the prior languages.
The two new forms are let-expressions and variables.
the meaning of a let expression (let ((x e0)) e) is the meaning of e (the body of the let) when x means the value of e0 (the right hand side of the let),
the meaning of a variable x depends on the context in which it is bound. It means the value of the right-hand side of the nearest enclosing let expression that binds x. If there is no such enclosing let expression, the variable is meaningless.
Let’s consider some examples:
x: this expression is meaningless on its own.
(let ((x 7)) x): this means 7, since the body expression, x, means 7 because the nearest enclosing binding for x is to 7, which means 7.
(let ((x 7)) 2): this means 2 since the body expression, 2, means 2.
(let ((x 7)) (add1 x)): this means 8 since the body expression, (add1 x), means one more than x and x means 7 because the nearest enclosing binding for x is to 7.
(let ((x (add1 7))) x): this means 8 since the body expression, x, means 8 because the nearest enclosing binding for x is to (add1 7) which means 8.
(let ((x 7)) (let ((y 2)) x)): this means 7 since the body expression, (let ((y 2)) x), means 2 since the body expression, x, means 7 since the nearest enclosing binding for x is to 7.
(let ((x 7)) (let ((x 2)) x)): this means 2 since the body expression, (let ((x 2)) x), means 2 since the body expression, x, means 7 since the nearest enclosing binding for x is to 2.
(let ((x (add1 x))) x): this is meaningless, since the right-hand side expression, (add1 x) is meaningless because x has no enclosing let that binds it.
(let ((x 7)) (let ((x (add1 x))) x)): this means 8 because the body expression (let ((x (add1 x))) x) means 8 because the body expression, x, is bound to (add1 x) is in the nearest enclosing let expression that binds x and (add1 x) means 8 because it is one more than x where x is bound to 7 in the nearest enclosing let that binds it.
Make sure you have a good understanding of how binding work in these examples before moving on. Remember: you can always check your understanding by pasting expressions into Racket and seeing what it produces.
One thing that should be clear from these examples is that the meaning of a sub-expression is not determined by the form of that expression alone. For example, x could mean 7, or it could mean 8, or it could be meaningless, or it could mean 22, etc. It depends on the context in which it occurs. So in formulating the meaning of an expression, we will have to have take this context into account.
Thinking more about what information we need to keep track of reveals that when considering the meaning of a let’s body, we need to know that the variable it’s binding means the value of the right hand expression. Since a program potentially consists of nested let expressions, we will need to keep track of some number of pairs of variables and their meaning. We will refer to this contextual information as an environment.
The meaning of a variable is resolved by looking up its meaning in the environment. The meaning of a let will depend on the meaning of its body with an extended environment that associates its variable binding to the value of the right hand side.
The heart of the semantics is an auxiliary relation, , which relates an expression and an environement to the integer the expression evaluates to (in the given environment):
These rely on two functions: one for extending an environment with a variable binding and one for lookup up a variable binding in an environment:
The operational semantics for Fraud is then defined as a binary relation , which says that (e,i) in , only when e evaluates to i in the empty environment according to :
The interpreter closely mirrors the semantics. The top-level interp function relies on a helper function interp-env that takes an expression and environment and computes the result. It is defined by structural recursion on the expression. Environments are represented as lists of associations between variables and integers. There are two helper functions for ext and lookup:
#lang racket (provide (all-defined-out)) ;; type REnv = (Listof (List Variable Value)) ;; Expr -> Answer (define (interp e) (interp-env e '())) ;; Expr REnv -> Answer (define (interp-env e r) (match e [(? value? v) v] [(list (? prim? p) e) (let ((a (interp-env e r))) (interp-prim p a))] [`(if ,e0 ,e1 ,e2) (match (interp-env e0 r) ['err 'err] [v (if v (interp-env e1 r) (interp-env e2 r))])] [(? symbol? x) (lookup r x)] [`(let ((,x ,e0)) ,e1) (match (interp-env e0 r) ['err 'err] [v (interp-env e1 (ext r x v))])])) ;; Any -> Boolean (define (prim? x) (and (symbol? x) (memq x '(add1 sub1 zero?)))) ;; Any -> Boolean (define (value? x) (or (integer? x) (boolean? x))) ;; Prim Answer -> Answer (define (interp-prim p a) (match (list p a) [(list p 'err) 'err] [(list 'add1 (? integer? i0)) (+ i0 1)] [(list 'sub1 (? integer? i0)) (- i0 1)] [(list 'zero? (? integer? i0)) (zero? i0)] [_ 'err])) ;; REnv Variable -> Answer (define (lookup env x) (match env ['() 'err] [(cons (list y v) env) (match (symbol=? x y) [#t v] [#f (lookup env x)])])) ;; REnv Variable Value -> Value (define (ext r x v) (cons (list x v) r))
We can confirm the interpreter computes the right result for the examples given earlier:
Examples
> (interp 'x) 'err
> (interp '(let ((x 7)) x)) 7
> (interp '(let ((x 7)) 2)) 2
> (interp '(let ((x 7)) (add1 x))) 8
> (interp '(let ((x (add1 7))) x)) 8
> (interp '(let ((x 7)) (let ((y 2)) x))) 7
> (interp '(let ((x 7)) (let ((x 2)) x))) 2
> (interp '(let ((x (add1 x))) x)) 'err
> (interp '(let ((x 7)) (let ((x (add1 x))) x))) 8
Interpreter Correctness: For all Fraud expressions e and values v, if (e,v) in , then (interp e) equals v.
8.3 Lexical Addressing
Just as we did with Dupe: a duplicity of types, the best way of understanding the forthcoming compiler is to write a “low-level” interpreter that explains some of the ideas used in the compiler without getting bogged down in code generation details.
At first glance at interp, it would seem we will need to generate code for implementing the REnv data structure and its associated operations: lookup and ext. REnv is an inductively defined data type, represented in the interpreter as a list of lists. Interpreting a variable involves recursively scanning the environment until the appropriate binding is found. This would take some doing to accomplish in x86.
However...
It is worth noting some invariants about environments created during the running of interp. Consider some subexpression e of the program. What environment will be used whenever e is interpreted? Well, it will be a mapping of every variable bound outside of e. It’s not so easy to figure out what these variables will be bound to, but the skeleton of the environment can be read off from the program structure.
For example:
(let ((x ...)) (let ((y ...)) (let ((z ...)) e)))
'((z ...) (y ...) (x ...))
Moreover, every free occurrence of x in e will resolve to the value in the third element of the environment; every free occurrence of y in e will resolve to the second element; etc.
This suggests that variable locations can be resolved statically using lexical addresses. The lexical address of a variable is the number of let-bindings between the variable occurrence and the let that binds it.
So for example:
(let ((x ...)) x): the occurrence of x has a lexical address of 0; there are no bindings between the let that binds x and its occurrence.
(let ((x ...)) (let ((y ...)) x)): the occurrence of x has a lexical address of 1 since the let-binding of y sits between the let-binding of x and its occurrence.
We can view a variable name as just a placeholder for the lexical address; it tells us which binder the variable comes from.
Using this idea, let’s build an alternative interpreter that operates over an intermediate form of expression that has no notion of variables, but just lexical addresses:
; type IExpr = ; | Integer ; | Boolean ; | ‘(address ,Natural) ; | ‘(add1 ,Expr) ; | ‘(sub1 ,Expr) ; | ‘(zero? ,Expr) ; | ‘(if ,Expr ,Expr ,Expr) ; | ‘(let ((_ ,Expr)) ,Expr)
Notice that variables have gone away, replaced by a `(address ,Natural) form. The let binding form no longer binds a variable name either.
The idea is that we will translate expression (Expr) like:
(let ((x ...)) x)
into intermediate expressions (IExpr) like:
And:
into:
Similar to how interp is defined in terms of a helper function that takes an environment mapping variables to their value, the translate function will be defined in terms of a helper function that takes an environment mapping variables to their lexical address.
The lexical environment will just be a list of variable names. The address of a variable occurrence is count of variable names that occur before it in the list. When a variable is bound (via-let) the list grows:
#lang racket (provide (all-defined-out)) (require (only-in "interp.rkt" prim? value? interp-prim)) ;; type LEnv = (Listof Variable) ;; Expr -> IExpr (define (translate e) (translate-e e '())) ;; Expr LEnv -> IExpr (define (translate-e e r) (match e [(? value? v) v] [(list (? prim? p) e) (list p (translate-e e r))] [`(if ,e0 ,e1 ,e2) `(if ,(translate-e e0) ,(translate-e e1) ,(translate-e e2))] [(? symbol? x) `(address ,(lexical-address x r))] [`(let ((,x ,e0)) ,e1) `(let ((_ ,(translate-e e0 r))) ,(translate-e e1 (cons x r)))])) ;; Variable LEnv -> Natural (define (lexical-address x r) (match r ['() (error "unbound variable")] [(cons y r) (match (symbol=? x y) [#t 0] [#f (add1 (lexical-address x r))])]))
Notice that translate is a kind of mini-compiler that compiles Exprs to IExprs. It’s only job is to eliminate variable names by replacing variable occurrences with their lexical addresses. It does a minor amount of syntax checking while it’s at it by raising a (compile-time) error in the case of unbound variables.
The interpreter for IExprs will still have an environment data structure, however it will be simpler the association list we started with. The run-time environment will consist only of a list of values; the lexical address of (what used to be a) variable indicates the position in this list. When a value is bound by a let, the list grows:
#lang racket (provide (all-defined-out)) (require (only-in "interp.rkt" prim? value? interp-prim) "translate.rkt") ;; type VEnv = (Listof Value) ;; Expr -> Answer (define (interp e) (interp-env (translate e) '())) ;; IExpr VEnv -> Answer (define (interp-env e r) (match e [(? value? v) v] [(list (? prim? p) e) (let ((a (interp-env e r))) (interp-prim p a))] [`(if ,e0 ,e1 ,e2) (match (interp-env e0 r) ['err 'err] [v (if v (interp-env e1 r) (interp-env e2 r))])] [`(address ,i) (list-ref r i)] [`(let ((_ ,e0)) ,e1) (match (interp-env e0 r) ['err 'err] [v (interp-env e1 (cons v r))])]))
Try to convince yourself that the two version of interp compute the same function.
8.4 An Example of Fraud compilation
Suppose we want to compile (let ((x 7)) (add1 x)). There are two new forms we need to compile: the (let ((x ...)) ...) part and the x part in the body.
We already know how to compile the (add1 ...) part and the 7 part.
What needs to happen? Compiling the 7 part will emit instructions that, when run, leave 7 in the 'rax register. Compiling the (add1 ...) part relies on the result of evaluating it’s subexpression to be in 'rax when it increments it. So, compile the variable binding needs to stash the 7 somewhere and compiling the variable occurrence needs to retrieve that stashed value. After the let expression has been run, the stashed value should go away since the variable is no longer in scope.
This “stashing” of values follows a stack discipline. When entering a let, after the right-hand side has been run, the result should be pushed. When evaluating a variable occurrence, the bound value is on the stack. After exiting the let, the stack can be popped.
Suppose we want to compile (let ((x 7)) (let ((y 2)) (add1 x))). Using the intuition developed so far, we should push 7, push 8, and then run the body. But notice that the value of 'x is no longer on the top of the stack; y is. So to retrieve the value of x we need jump past the y. But calculating these offsets is pretty straightforward. In this example there is one binding between the binding of x and this occurrence. Since we push every time we enter a let and pop every time we leave, the number of bindings between an occurrence and its binder is exactly the offset from the top of the stack we need use.
#lang racket ;; type Arg = ;; ... ;; | `(offset ,Reg ,Integer) ;; type Reg = ;; ... ;; | `rsp
#lang racket (provide (all-defined-out)) ;; type CEnv = [Listof Variable] ;; Expr -> Asm (define (compile e) `(entry ,@(compile-e e '()) ret err (push rbp) (call error) ret)) ;; Expr CEnv -> Asm (define (compile-e e c) (match e [(? integer? i) `((mov rax ,(* i 2)))] [(? boolean? b) `((mov rax ,(if b #b11 #b01)))] [`(add1 ,e0) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-integer (add rax 2)))] [`(sub1 ,e0) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-integer (sub rax 2)))] [`(zero? ,e0) (let ((c0 (compile-e e0 c)) (l0 (gensym)) (l1 (gensym))) `(,@c0 ,@assert-integer (cmp rax 0) (mov rax #b01) ; #f (jne ,l0) (mov rax #b11) ; #t ,l0))] [`(if ,e0 ,e1 ,e2) (let ((c0 (compile-e e0 c)) (c1 (compile-e e1 c)) (c2 (compile-e e2 c)) (l0 (gensym)) (l1 (gensym))) `(,@c0 (cmp rax #b01) ; compare to #f (je ,l0) ; jump to c2 if #f ,@c1 (jmp ,l1) ; jump past c2 ,l0 ,@c2 ,l1))] [(? symbol? x) (let ((i (lookup x c))) `((mov rax (offset rsp ,(- (add1 i))))))] [`(let ((,x ,e0)) ,e1) (let ((c0 (compile-e e0 c)) (c1 (compile-e e1 (cons x c)))) `(,@c0 (mov (offset rsp ,(- (add1 (length c)))) rax) ,@c1))])) ;; Variable CEnv -> Natural (define (lookup x cenv) (match cenv ['() (error "undefined variable:" x)] [(cons y cenv) (match (symbol=? x y) [#t (length cenv)] [#f (lookup x cenv)])])) (define assert-integer `((mov rbx rax) (and rbx 1) (cmp rbx 0) (jne err)))
Examples
> (asm-display (compile 'x)) undefined variable: x
> (asm-display (compile '(let ((x 7)) x)))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, [rsp + -8]
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x 7)) 2)))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, 4
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x 7)) (add1 x))))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, [rsp + -8]
mov rbx, rax
and rbx, 1
cmp rbx, 0
jne err
add rax, 2
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x (add1 7))) x)))
global entry
extern error
section .text
entry:
mov rax, 14
mov rbx, rax
and rbx, 1
cmp rbx, 0
jne err
add rax, 2
mov [rsp + -8], rax
mov rax, [rsp + -8]
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x 7)) (let ((y 2)) x))))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, 4
mov [rsp + -16], rax
mov rax, [rsp + -8]
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x 7)) (let ((x 2)) x))))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, 4
mov [rsp + -16], rax
mov rax, [rsp + -16]
ret
err:
push rbp
call error
ret
> (asm-display (compile '(let ((x (add1 x))) x))) undefined variable: x
> (asm-display (compile '(let ((x 7)) (let ((x (add1 x))) x))))
global entry
extern error
section .text
entry:
mov rax, 14
mov [rsp + -8], rax
mov rax, [rsp + -8]
mov rbx, rax
and rbx, 1
cmp rbx, 0
jne err
add rax, 2
mov [rsp + -16], rax
mov rax, [rsp + -16]
ret
err:
push rbp
call error
ret
Examples
> (asm-interp (compile 'x)) undefined variable: x
> (asm-interp (compile '(let ((x 7)) x))) 7
> (asm-interp (compile '(let ((x 7)) 2))) 2
> (asm-interp (compile '(let ((x 7)) (add1 x)))) 8
> (asm-interp (compile '(let ((x (add1 7))) x))) 8
> (asm-interp (compile '(let ((x 7)) (let ((y 2)) x)))) 7
> (asm-interp (compile '(let ((x 7)) (let ((x 2)) x)))) 2
> (asm-interp (compile '(let ((x (add1 x))) x))) undefined variable: x
> (asm-interp (compile '(let ((x 7)) (let ((x (add1 x))) x)))) 8