On this page:
10.1 Meaning of Grift programs
10.2 A Compile for Grift
7.4

10 Grift: binary operations

If you have to eat two frogs, eat the ugliest one first.

You may have noticed that up until this point, evaluating compound expressions in our language always depend upon the result of a single subexpression. For example, (add1 e) depends upon the result of e, (zero? e) depends upon e, and so on. Even expressions that involve multiple subexpressions such as (if e0 e1 e2) really only depende on e0 to determine which of e1 or e2 to evaluate.

Let’s now consider what happens when we have multiple subexpressions whose results must be combined in order to evaluate an expression. As an example, consider (+ e0 e1). We must evaluate both e0 and e1 and sum their results.

We’ll call this language Grift.

What’s new are the following binary operations:

(+ e0 e1)
(- e0 e1)

This leads to the following grammar for Grift:

image

We can model it as a datatype as usual:

grift/ast.rkt

  #lang racket
  ;; type Expr =
  ;; | Integer
  ;; | Boolean
  ;; | Variable
  ;; | (list Prim1 Expr)
  ;; | (list Prim2 Expr Expr)
  ;; | `(if ,Expr ,Expr ,Expr)
  ;; | `(let ((,Variable ,Expr)) ,Expr)
   
  ;; type Prim1 = 'add1 | 'sub1 | 'zero?
  ;; type Prim2 = '+ | '-
   
  ;; type Variable = Symbol (except 'add1 'sub1 'if, etc.)
   
10.1 Meaning of Grift programs

The meaning of Grift programs is pretty straightfoward. For (+ e0 e1), the meaning is the sum of the meanings of e0 and e1, when they mean integers, otherwise the meaning is an error.

The handling of primitives occurs in the following rule:

image

It makes use of an auxiliary judgment for interpreting primitives:

image

The interpreter is likewise straightforward:

grift/interp.rkt

  #lang racket
  (provide (all-defined-out))
   
  ;; Expr -> Answer
  (define (interp e)
    (interp-env e '()))
   
  ;; Expr REnv -> Integer
  (define (interp-env e r)
    (match e
      [(? value? v) v]
      [(list (? prim? p) es ...)
       (let ((as (interp-env* es r)))
         (interp-prim p as))]
      [`(if ,e0 ,e1 ,e2)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (if v
              (interp-env e1 r)
              (interp-env e2 r))])]    
      [(? symbol? x)
       (lookup r x)]
      [`(let ((,x ,e0)) ,e1)
       (match (interp-env e0 r)
         ['err 'err]
         [v
          (interp-env e1 (ext r x v))])]))
   
  ;; [Listof Expr] REnv -> [Listof Answer]
  (define (interp-env* es r)
    (match es
      ['() '()]
      [(cons e es)
       (cons (interp-env e r)
             (interp-env* es r))]))
   
  ;; Any -> Boolean
  (define (prim? x)
    (and (symbol? x)
         (memq x '(add1 sub1 + - zero?))))
   
  ;; Any -> Boolean
  (define (value? x)
    (or (integer? x)
        (boolean? x)))
   
  ;; Prim [Listof Answer] -> Answer
  (define (interp-prim p as)
    (match (cons p as)
      [(list p (? value?) ... 'err _ ...) 'err]
      [(list 'add1 (? integer? i0)) (+ i0 1)]
      [(list 'sub1 (? integer? i0)) (- i0 1)]
      [(list 'zero? (? integer? i0)) (zero? i0)]
      [(list '+ (? integer? i0) (? integer? i1)) (+ i0 i1)]
      [(list '- (? integer? i0) (? integer? i1)) (- i0 i1)]
      [_ 'err]))
   
  ;; Env Variable -> Answer
  (define (lookup env x)
    (match env
      ['() 'err]
      [(cons (list y i) env)
       (match (symbol=? x y)
         [#t i]
         [#f (lookup env x)])]))
   
  ;; Env Variable Value -> Value
  (define (ext r x i)
    (cons (list x i) r))
   

We can see that it works as expected:

Examples

> (interp '(+ 3 4))

7

> (interp '(+ 3 (+ 2 2)))

7

> (interp '(+ #f 8))

'err

10.2 A Compile for Grift

Binary expressions are easy to deal with at the level of the semantics and interpreter. However things are more complicated at the level of the compiler.

To see the problem consider blindly following the pattern we used (and ignoring type errors for the moment):

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (let ((c0 (compile-e e0 c))
        (c1 (compile-e e1 c)))
    `(,@c0 ; result in rax
      ,@c1 ; result in rax
      (add rax ???))))

The problem here is that executing c0 places its result in register 'rax, but then executing c1 places its result in 'rax, overwriting the value of e0.

It may be tempting to use another register to stash away the result of the first subexpression:

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (let ((c0 (compile-e e0 c))
        (c1 (compile-e e1 c)))
    `(,@c0 ; result in rax
      (mov rbx rax)
      ,@c1 ; result in rax
      (add rax rbx))))

Can you think of how this could go wrong?

To come up with a general solution to this problem, we need to save the result of e0 and then retrieve it after computing e1 and it’s time to sum.

Note that this issue only comes up when e0 is a serious expression, i.e. an expression that must do some computation. If e0 were a literal integer or a variable, we could emit working code. For example:

; Integer Expr CEnv -> Asm
; A special case for compiling (+ i0 e1)
(define (compile-+-int i0 e1 c)
  (let ((c1 (compile-e e1 c)))
    `(,@c1 ; result in rax
      (add rax ,(arithmetic-shift i0 imm-shift)))))
 
; Variable Expr CEnv -> Asm
; A special case for compiling (+ x0 e1)
(define (compile-+-var x0 e1)
  (let ((c1 (compile-e e1 c))
        (i (lookup x0 c)))
    `(,@c1
      (add rax (offset rsp ,(- (add1 i)))))))

The latter suggests a general solution could be to transform binary primitive applications into a let form that binds the first subexpression to a variable and then uses the compile-+-var function above. The idea is that every time the compiler encounters (+ e0 e1), we transform it to (let ((x e0)) (+ x e1)). For this to work out, x needs to be some variable that doesn’t appear free in e1. This transfomration is what’s called ANF (adminitrative normal form) and is a widely used intermediate representation for compilers.

But, we can also solve the problem more directly by considering the code that is generated for the ANF style expression above.

Consider the lexical address of x in the transformed code above. It is always 0 becuase the transformation puts the let immediately around the occurrence of x. So if we’re compiling (+ e0 e1) in environment c using this approach, we know the value of e0 will live at `(offset rsp ,(- (add1 (length c)))). There’s no need for a let binding or a fresh variable name. And this observation enables us to write a general purpose compiler for binary primitives that doesn’t require any program transformation: we simply push the value of e0 on the top of the stack and retrieve it later.

Here is a first cut:

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (let ((x (gensym))) ; generate a fresh variable
    (let ((c0 (compile-e e0 c))
          (c1 (compile-e e1 (cons x c))))
      `(,@c0
        (mov (offset rsp ,(add1 (- (length c)))) rax)
        ,@c1
        (add rax (offset rsp ,(- (add1 (lookup x (cons x c))))))))))

There are a couple things to notice. First: the (lookup x (cons x c)) just produces (length c). Second, when compiling e1 in environment (cons x c), we know that no variable in e1 resolves to x because x is a freshly gensym’d symbol. Putting (an unreferenced) x in the environment serves only to “bump up” by one the offset of any variable bound after x so as to not override the spot where e0’s values lives. We can acomplish the same thing by sticking in something that no variable is equal to: #f:

; Expr Expr CEnv -> Asm
(define (compile-+ e0 e1 c)
  (let ((c0 (compile-e e0 c))
        (c1 (compile-e e1 (cons x c))))
    `(,@c0
      (mov (offset rsp ,(add1 (- (length c)))) rax)
      ,@c1
      (add rax (offset rsp ,(- (add1 (length c))))))))

The commplete code for the compiler is:

grift/compile.rkt

  #lang racket
   
  (define imm-shift        3)
  (define imm-type-mask    (sub1 (arithmetic-shift 1 imm-shift)))
  (define imm-type-int     #b000)
  (define imm-type-true    #b001)
  (define imm-type-false   #b010)
  (define imm-type-empty   #b011)
  (define imm-type-char    #b100)
   
  ;; Expr -> Asm
  (define (compile e)
    `(entry
      ,@(compile-e e '())
      ret
      err
      (push rbp)
      (call error)
      ret))
   
  ;; Expr CEnv -> Asm
  (define (compile-e e c)
    (match e
      [(? integer? i)        (compile-integer i)]
      [(? boolean? b)        (compile-boolean b)]
      [(? symbol? x)         (compile-variable x c)]    
      [`(add1 ,e0)           (compile-add1 e0 c)]
      [`(sub1 ,e0)           (compile-sub1 e0 c)]
      [`(zero? ,e0)          (compile-zero? e0 c)]
      [`(if ,e0 ,e1 ,e2)     (compile-if e0 e1 e2 c)]
      [`(let ((,x ,e0)) ,e1) (compile-let x e0 e1 c)]
      [`(+ ,e0 ,e1)          (compile-+ e0 e1 c)]
      [`(- ,e0 ,e1)          (compile-- e0 e1 c)]))
   
  ;; Integer -> Asm
  (define (compile-integer i)
    `((mov rax ,(arithmetic-shift i imm-shift))))
   
  ;; Boolean -> Asm
  (define (compile-boolean b)
    `((mov rax ,(if b imm-type-true imm-type-false))))
   
  ;; Expr CEnv -> Asm
  (define (compile-add1 e0 c)
    (let ((c0 (compile-e e0 c)))
      `(,@c0
        ,@assert-integer
        (add rax ,(arithmetic-shift 1 imm-shift)))))
   
  ;; Expr CEnv -> Asm
  (define (compile-sub1 e0 c)
    (let ((c0 (compile-e e0 c)))
      `(,@c0
        ,@assert-integer
        (sub rax ,(arithmetic-shift 1 imm-shift)))))
   
  ;; Expr CEnv -> Asm
  (define (compile-zero? e0 c)
    (let ((c0 (compile-e e0 c))
          (l0 (gensym))
          (l1 (gensym)))
      `(,@c0
        ,@assert-integer
        (cmp rax 0)
        (mov rax ,imm-type-false)
        (jne ,l0)
        (mov rax ,imm-type-true)
        ,l0)))
   
  ;; Expr Expr Expr CEnv -> Asm
  (define (compile-if e0 e1 e2 c)
    (let ((c0 (compile-e e0 c))
          (c1 (compile-e e1 c))
          (c2 (compile-e e2 c))
          (l0 (gensym))
          (l1 (gensym)))
      `(,@c0
        (cmp rax ,imm-type-false)
        (je ,l0)       ; jump to c2 if #f
        ,@c1
        (jmp ,l1)      ; jump past c2
        ,l0
        ,@c2
        ,l1)))
   
  ;; Variable CEnv -> Asm
  (define (compile-variable x c)
    (let ((i (lookup x c)))
      `((mov rax (offset rsp ,(- (add1 i)))))))
   
  ;; Variable Expr Expr CEnv -> Asm
  (define (compile-let x e0 e1 c)
    (let ((c0 (compile-e e0 c))
          (c1 (compile-e e1 (cons x c))))
      `(,@c0
        (mov (offset rsp ,(- (add1 (length c)))) rax)
        ,@c1)))
   
  ;; Expr Expr CEnv -> Asm
  (define (compile-+ e0 e1 c)
    (let ((c1 (compile-e e1 c))
          (c0 (compile-e e0 (cons #f c))))
      `(,@c1
        ,@assert-integer
        (mov (offset rsp ,(- (add1 (length c)))) rax)
        ,@c0
        ,@assert-integer
        (add rax (offset rsp ,(- (add1 (length c))))))))
   
  ;; Expr Expr CEnv -> Asm
  (define (compile-- e0 e1 c)
    (let ((c1 (compile-e e1 c))
          (c0 (compile-e e0 (cons #f c))))
      `(,@c1
        ,@assert-integer
        (mov (offset rsp ,(- (add1 (length c)))) rax)
        ,@c0
        ,@assert-integer
        (sub rax (offset rsp ,(- (add1 (length c))))))))
   
  ;; Variable CEnv -> Natural
  (define (lookup x cenv)
    (match cenv
      ['() (error "undefined variable:" x)]
      [(cons y cenv)
       (match (eq? x y)
         [#t (length cenv)]
         [#f (lookup x cenv)])]))
   
  (define assert-integer
    `((mov rbx rax)
      (and rbx ,imm-type-mask)
      (cmp rbx ,imm-type-int)
      (jne err)))