On this page:
8.1 Variables
8.2 Meaning of Fraud programs
8.3 Lexical Addressing
8.4 An Example of Fraud compilation
7.4

8 Fraud: local binding and variables

To be is to be the value of a variable.

    8.1 Variables

    8.2 Meaning of Fraud programs

    8.3 Lexical Addressing

    8.4 An Example of Fraud compilation

8.1 Variables

Let’s now consider add a notion of local binding to our target language.

We’ll call it Fraud.

We will use the following syntax to bind local variables:

(let ((id0 e0))
  e)

This form binds the identifier i0 to value of e0 within the scope of e.

This is a specialization of Racket’s own local binding form, which allows for any number of bindings to be made with let:

(let ((id0 e0) ...)
  e)

We adopt this specialization of Racket’s let syntax so that you can always take a Fraud program and run it in Racket to confirm what it should produce.

Adding a notion of variable binding also means we need to add variables to the syntax of expressions.

Together this leads to the following grammar for Fraud:

image

Which can be modeled with the following data type definition:

fraud/ast.rkt

  #lang racket
  ;; type Expr =
  ;; | Integer
  ;; | Boolean
  ;; | Variable
  ;; | `(add1 ,Expr)
  ;; | `(sub1 ,Expr)
  ;; | `(zero? ,Expr)
  ;; | `(if ,Expr ,Expr ,Expr)
  ;; | `(let ((,Variable ,Expr)) ,Expr)
   
  ;; type Variable = Symbol (except 'add1 'sub1 'if 'let)
   

We will also need a predicate for well-formed Fraud expressions, but let’s return to this after considering the semantics and interpreter.

8.2 Meaning of Fraud programs

The meaning of Fraud programs depends on the form of the expression and in the case of integers, increments, and decrements, the meaning is the same as in the prior languages.

The two new forms are let-expressions and variables.

Let’s consider some examples:

Make sure you have a good understanding of how binding work in these examples before moving on. Remember: you can always check your understanding by pasting expressions into Racket and seeing what it produces.

One thing that should be clear from these examples is that the meaning of a sub-expression is not determined by the form of that expression alone. For example, x could mean 7, or it could mean 8, or it could be meaningless, or it could mean 22, etc. It depends on the context in which it occurs. So in formulating the meaning of an expression, we will have to have take this context into account.

Thinking more about what information we need to keep track of reveals that when considering the meaning of a let’s body, we need to know that the variable it’s binding means the value of the right hand expression. Since a program potentially consists of nested let expressions, we will need to keep track of some number of pairs of variables and their meaning. We will refer to this contextual information as an environment.

The meaning of a variable is resolved by looking up its meaning in the environment. The meaning of a let will depend on the meaning of its body with an extended environment that associates its variable binding to the value of the right hand side.

The heart of the semantics is an auxiliary relation, image, which relates an expression and an environement to the integer the expression evaluates to (in the given environment):

The rules for dealing with the new forms (variables and lets) are:

image    image

These rely on two functions: one for extending an environment with a variable binding and one for lookup up a variable binding in an environment:

image

image

The remaining rules are just an adaptation of the existing rules from Extort to thread the environment through. For example, here are just a couple:

image

image    image

And rules for propagating errors through let:

image

The operational semantics for Fraud is then defined as a binary relation image, which says that (e,i) in image, only when e evaluates to i in the empty environment according to image:

image

The interpreter closely mirrors the semantics. The top-level interp function relies on a helper function interp-env that takes an expression and environment and computes the result. It is defined by structural recursion on the expression. Environments are represented as lists of associations between variables and integers. There are two helper functions for ext and lookup:

fraud/interp.rkt

  #lang racket
  (provide (all-defined-out))
   
  ;; type REnv = (Listof (List Variable Value))
   
  ;; Expr -> Answer
  (define (interp e)
    (interp-env e '()))
   
  ;; Expr REnv -> Answer
  (define (interp-env e r)
    (match e
      [(? value? v) v]
      [(list (? prim? p) e)
       (let ((a (interp-env e r)))
         (interp-prim p a))]
      [`(if ,e0 ,e1 ,e2)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (if v
              (interp-env e1 r)
              (interp-env e2 r))])]    
      [(? symbol? x)
       (lookup r x)]
      [`(let ((,x ,e0)) ,e1)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (interp-env e1 (ext r x v))])]))
   
  ;; Any -> Boolean
  (define (prim? x)
    (and (symbol? x)
         (memq x '(add1 sub1 zero?))))
   
  ;; Any -> Boolean
  (define (value? x)
    (or (integer? x)
        (boolean? x)))
   
  ;; Prim Answer -> Answer
  (define (interp-prim p a)
    (match (list p a)
      [(list p 'err) 'err]
      [(list 'add1 (? integer? i0)) (+ i0 1)]
      [(list 'sub1 (? integer? i0)) (- i0 1)]
      [(list 'zero? (? integer? i0)) (zero? i0)]
      [_ 'err]))
   
  ;; REnv Variable -> Answer
  (define (lookup env x)
    (match env
      ['() 'err]
      [(cons (list y v) env)
       (match (symbol=? x y)
         [#t v]
         [#f (lookup env x)])]))
   
  ;; REnv Variable Value -> Value
  (define (ext r x v)
    (cons (list x v) r))
   

We can confirm the interpreter computes the right result for the examples given earlier:

Examples

> (interp 'x)

'err

> (interp '(let ((x 7)) x))

7

> (interp '(let ((x 7)) 2))

2

> (interp '(let ((x 7)) (add1 x)))

8

> (interp '(let ((x (add1 7))) x))

8

> (interp '(let ((x 7)) (let ((y 2)) x)))

7

> (interp '(let ((x 7)) (let ((x 2)) x)))

2

> (interp '(let ((x (add1 x))) x))

'err

> (interp '(let ((x 7)) (let ((x (add1 x))) x)))

8

Interpreter Correctness: For all Fraud expressions e and values v, if (e,v) in image, then (interp e) equals v.

8.3 Lexical Addressing

Just as we did with Dupe: a duplicity of types, the best way of understanding the forthcoming compiler is to write a “low-level” interpreter that explains some of the ideas used in the compiler without getting bogged down in code generation details.

At first glance at interp, it would seem we will need to generate code for implementing the REnv data structure and its associated operations: lookup and ext. REnv is an inductively defined data type, represented in the interpreter as a list of lists. Interpreting a variable involves recursively scanning the environment until the appropriate binding is found. This would take some doing to accomplish in x86.

However...

It is worth noting some invariants about environments created during the running of interp. Consider some subexpression e of the program. What environment will be used whenever e is interpreted? Well, it will be a mapping of every variable bound outside of e. It’s not so easy to figure out what these variables will be bound to, but the skeleton of the environment can be read off from the program structure.

For example:

(let ((x ...))
  (let ((y ...))
    (let ((z ...))
      e)))

The subexpression e will always be evaluated in an environment that looks like:

'((z ...) (y ...) (x ...))

Moreover, every free occurrence of x in e will resolve to the value in the third element of the environment; every free occurrence of y in e will resolve to the second element; etc.

This suggests that variable locations can be resolved statically using lexical addresses. The lexical address of a variable is the number of let-bindings between the variable occurrence and the let that binds it.

So for example:

We can view a variable name as just a placeholder for the lexical address; it tells us which binder the variable comes from.

Using this idea, let’s build an alternative interpreter that operates over an intermediate form of expression that has no notion of variables, but just lexical addresses:

; type IExpr =
; | Integer
; | Boolean
; | (address ,Natural)
; | (add1 ,Expr)
; | (sub1 ,Expr)
; | (zero? ,Expr)
; | (if ,Expr ,Expr ,Expr)
; | (let ((_ ,Expr)) ,Expr)

Notice that variables have gone away, replaced by a `(address ,Natural) form. The let binding form no longer binds a variable name either.

The idea is that we will translate expression (Expr) like:

(let ((x ...)) x)

into intermediate expressions (IExpr) like:

(let ((_ ...)) (address 0))

And:

(let ((x ...)) (let ((y ...)) x))

into:

(let ((_ ...)) (let ((_ ...)) (address 1)))

Similar to how interp is defined in terms of a helper function that takes an environment mapping variables to their value, the translate function will be defined in terms of a helper function that takes an environment mapping variables to their lexical address.

The lexical environment will just be a list of variable names. The address of a variable occurrence is count of variable names that occur before it in the list. When a variable is bound (via-let) the list grows:

fraud/translate.rkt

  #lang racket
  (provide (all-defined-out))
  (require (only-in "interp.rkt" prim? value? interp-prim))
   
  ;; type LEnv = (Listof Variable)
   
  ;; Expr -> IExpr
  (define (translate e)
    (translate-e e '()))
   
  ;; Expr LEnv -> IExpr
  (define (translate-e e r)
    (match e
      [(? value? v) v]
      [(list (? prim? p) e)
       (list p (translate-e e r))]
      [`(if ,e0 ,e1 ,e2)
       `(if ,(translate-e e0)
            ,(translate-e e1)
            ,(translate-e e2))]
      [(? symbol? x)
       `(address ,(lexical-address x r))]
      [`(let ((,x ,e0)) ,e1)
       `(let ((_ ,(translate-e e0 r)))
         ,(translate-e e1 (cons x r)))]))
   
  ;; Variable LEnv -> Natural
  (define (lexical-address x r)
    (match r
      ['() (error "unbound variable")]
      [(cons y r)
       (match (symbol=? x y)
         [#t 0]
         [#f (add1 (lexical-address x r))])]))
   

Notice that translate is a kind of mini-compiler that compiles Exprs to IExprs. It’s only job is to eliminate variable names by replacing variable occurrences with their lexical addresses. It does a minor amount of syntax checking while it’s at it by raising a (compile-time) error in the case of unbound variables.

The interpreter for IExprs will still have an environment data structure, however it will be simpler the association list we started with. The run-time environment will consist only of a list of values; the lexical address of (what used to be a) variable indicates the position in this list. When a value is bound by a let, the list grows:

fraud/interp-lexical.rkt

  #lang racket
  (provide (all-defined-out))
  (require (only-in "interp.rkt" prim? value? interp-prim)
           "translate.rkt")
   
  ;; type VEnv = (Listof Value)
   
  ;; Expr -> Answer
  (define (interp e)
    (interp-env (translate e) '()))
   
  ;; IExpr VEnv -> Answer
  (define (interp-env e r)
    (match e
      [(? value? v) v]
      [(list (? prim? p) e)
       (let ((a (interp-env e r)))
         (interp-prim p a))]
      [`(if ,e0 ,e1 ,e2)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (if v
              (interp-env e1 r)
              (interp-env e2 r))])]
      [`(address ,i)
       (list-ref r i)]
      [`(let ((_ ,e0)) ,e1)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (interp-env e1 (cons v r))])]))
   

Try to convince yourself that the two version of interp compute the same function.

8.4 An Example of Fraud compilation

Suppose we want to compile (let ((x 7)) (add1 x)). There are two new forms we need to compile: the (let ((x ...)) ...) part and the x part in the body.

We already know how to compile the (add1 ...) part and the 7 part.

What needs to happen? Compiling the 7 part will emit instructions that, when run, leave 7 in the 'rax register. Compiling the (add1 ...) part relies on the result of evaluating it’s subexpression to be in 'rax when it increments it. So, compile the variable binding needs to stash the 7 somewhere and compiling the variable occurrence needs to retrieve that stashed value. After the let expression has been run, the stashed value should go away since the variable is no longer in scope.

This “stashing” of values follows a stack discipline. When entering a let, after the right-hand side has been run, the result should be pushed. When evaluating a variable occurrence, the bound value is on the stack. After exiting the let, the stack can be popped.

Suppose we want to compile (let ((x 7)) (let ((y 2)) (add1 x))). Using the intuition developed so far, we should push 7, push 8, and then run the body. But notice that the value of 'x is no longer on the top of the stack; y is. So to retrieve the value of x we need jump past the y. But calculating these offsets is pretty straightforward. In this example there is one binding between the binding of x and this occurrence. Since we push every time we enter a let and pop every time we leave, the number of bindings between an occurrence and its binder is exactly the offset from the top of the stack we need use.

fraud/asm/ast.rkt

#lang racket
;; type Arg =
;; ...
;; | `(offset ,Reg ,Integer)
 
;; type Reg =
;; ...
;; | `rsp

fraud/compile.rkt

  #lang racket
  (provide (all-defined-out))
   
  ;; type CEnv = [Listof Variable]
   
  ;; Expr -> Asm
  (define (compile e)
    `(entry
      ,@(compile-e e '())
      ret
      err
      (push rbp)
      (call error)
      ret))
   
  ;; Expr CEnv -> Asm
  (define (compile-e e c)
    (match e
      [(? integer? i)
       `((mov rax ,(* i 2)))]
      [(? boolean? b)
       `((mov rax ,(if b #b11 #b01)))]
      [`(add1 ,e0)
       (let ((c0 (compile-e e0 c)))
         `(,@c0
           ,@assert-integer
           (add rax 2)))]
      [`(sub1 ,e0)
       (let ((c0 (compile-e e0 c)))
         `(,@c0
           ,@assert-integer
           (sub rax 2)))]
      [`(zero? ,e0)
       (let ((c0 (compile-e e0 c))
             (l0 (gensym))
             (l1 (gensym)))
         `(,@c0
           ,@assert-integer
           (cmp rax 0)
           (mov rax #b01) ; #f
           (jne ,l0)
           (mov rax #b11) ; #t
           ,l0))]
      [`(if ,e0 ,e1 ,e2)
       (let ((c0 (compile-e e0 c))
             (c1 (compile-e e1 c))
             (c2 (compile-e e2 c))
             (l0 (gensym))
             (l1 (gensym)))
         `(,@c0
           (cmp rax #b01) ; compare to #f
           (je ,l0)       ; jump to c2 if #f
           ,@c1
           (jmp ,l1)      ; jump past c2
           ,l0
           ,@c2
           ,l1))]
      [(? symbol? x)
       (let ((i (lookup x c)))
         `((mov rax (offset rsp ,(- (add1 i))))))]
      [`(let ((,x ,e0)) ,e1)
       (let ((c0 (compile-e e0 c))
             (c1 (compile-e e1 (cons x c))))
         `(,@c0
           (mov (offset rsp ,(- (add1 (length c)))) rax)
           ,@c1))]))
   
  ;; Variable CEnv -> Natural
  (define (lookup x cenv)
    (match cenv
      ['() (error "undefined variable:" x)]
      [(cons y cenv)
       (match (symbol=? x y)
         [#t (length cenv)]
         [#f (lookup x cenv)])]))
   
  (define assert-integer
    `((mov rbx rax)
      (and rbx 1)
      (cmp rbx 0)
      (jne err)))
   

Examples

> (asm-display (compile 'x))

undefined variable: x

> (asm-display (compile '(let ((x 7)) x)))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, [rsp + -8]

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x 7)) 2)))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, 4

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x 7)) (add1 x))))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, [rsp + -8]

mov rbx, rax

and rbx, 1

cmp rbx, 0

jne err

add rax, 2

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x (add1 7))) x)))

global entry

extern error

section .text

entry:

mov rax, 14

mov rbx, rax

and rbx, 1

cmp rbx, 0

jne err

add rax, 2

mov [rsp + -8], rax

mov rax, [rsp + -8]

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x 7)) (let ((y 2)) x))))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, 4

mov [rsp + -16], rax

mov rax, [rsp + -8]

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x 7)) (let ((x 2)) x))))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, 4

mov [rsp + -16], rax

mov rax, [rsp + -16]

ret

err:

push rbp

call error

ret

> (asm-display (compile '(let ((x (add1 x))) x)))

undefined variable: x

> (asm-display (compile '(let ((x 7)) (let ((x (add1 x))) x))))

global entry

extern error

section .text

entry:

mov rax, 14

mov [rsp + -8], rax

mov rax, [rsp + -8]

mov rbx, rax

and rbx, 1

cmp rbx, 0

jne err

add rax, 2

mov [rsp + -16], rax

mov rax, [rsp + -16]

ret

err:

push rbp

call error

ret

And running the examples:

Examples

> (asm-interp (compile 'x))

undefined variable: x

> (asm-interp (compile '(let ((x 7)) x)))

7

> (asm-interp (compile '(let ((x 7)) 2)))

2

> (asm-interp (compile '(let ((x 7)) (add1 x))))

8

> (asm-interp (compile '(let ((x (add1 7))) x)))

8

> (asm-interp (compile '(let ((x 7)) (let ((y 2)) x))))

7

> (asm-interp (compile '(let ((x 7)) (let ((x 2)) x))))

2

> (asm-interp (compile '(let ((x (add1 x))) x)))

undefined variable: x

> (asm-interp (compile '(let ((x 7)) (let ((x (add1 x))) x))))

8