13 Iniquity: function definitions and calls
13.1 Functions
Our programming languages so far have been impoverished in the following sense: in order to process arbitrarily large data, the programs themselves must be proportionally as large. Want to compute something over a billion element list? You’ll need a billion expressions. Consequently, the expressiveness of our language is severely restricted.
Let’s now remove that restriction by incorporating functions, and in particular, recursive functions, which will allow us to compute over arbitrarily large data with finite-sized programs.
Let’s call it Iniquity.
We will extend the syntax by introducing a new syntactic category of programs, which have the shape:
(begin (define (f0 x0 ...) e0) (define (f1 x1 ...) e1) ... e)
And the syntax of expressions will be extended to include function calls:
(fi e0 ...)
where fi is one of the function names defined in the program.
Note that functions can have any number of parameters and, symmetrically, calls can have any number of arguments. A program consists of zero or more function definitions followed by an expression.
13.2 An Interpreter for Functions
Writing an interpreter for Inquity is not too hard. The main idea is that the interpretation of expression is now parameterized by a set of function definitions from the program. It serves as a second kind of environment that gets passed around and is used to resolve function definitions when interpreting function calls.
The way a function call is interpreted is to first interpret all of the arguments, building up a list of results. Then the definition of the function being called is looked up. If the function has the same number of parameters as there are arguments in the call, the body of the function is interpreted in an enviorment that maps each parameter to to the corresponding argument. That’s it.
#lang racket (provide (all-defined-out)) (require "ast.rkt") ;; type Prog = ;; | `(begin ,@(Listof Defn) ,Expr) ;; | Expr ;; type Defn = `(define (,Variable ,@(Listof Variable)) ,Expr) ;; Prog -> Answer (define (interp p) (match p [(prog ds e) (interp-env e '() ds)])) ;; Expr REnv (Listof Defn) -> Answer (define (interp-env e r ds) (match e [(var-e v) (lookup r v)] [(int-e i) i] [(bool-e b) b] [(nil-e) '()] [(prim-e (? prim? p) es) (let ((as (interp-env* es r ds))) (interp-prim p as))] [(if-e p e1 e2) (match (interp-env p r ds) ['err 'err] [v (if v (interp-env e1 r ds) (interp-env e2 r ds))])] [(let-e (list (binding x def)) body) (match (interp-env def r ds) ['err 'err] [v (interp-env body (ext r x v) ds)])] [(app-e f es) (match (interp-env* es r ds) [(list vs ...) (match (defns-lookup ds f) [(fundef f xs body) ; check arity matches (if (= (length xs) (length vs)) (interp-env body (zip xs vs) ds) 'err)])] [_ 'err])])) ;; (Listof Defn) Symbol -> Defn (define (defns-lookup ds f) (findf (match-lambda [(fundef g _ _) (eq? f g)]) ds)) ;; (Listof Expr) REnv -> (Listof Value) | 'err (define (interp-env* es r ds) (match es ['() '()] [(cons e es) (match (interp-env e r ds) ['err 'err] [v (cons v (interp-env* es r ds))])])) ;; Any -> Boolean (define (prim? x) (and (symbol? x) (memq x '(add1 sub1 + - zero? box unbox empty? cons car cdr)))) ;; Any -> Boolean (define (value? x) (or (integer? x) (boolean? x) (null? x) (and (pair? x) (value? (car x)) (value? (cdr x))))) ;; Prim (Listof Value) -> Answer (define (interp-prim p vs) (match (cons p vs) [(list 'add1 (? integer? i0)) (add1 i0)] [(list 'sub1 (? integer? i0)) (sub1 i0)] [(list 'zero? (? integer? i0)) (zero? i0)] [(list 'box v0) (box v0)] [(list 'unbox (? box? v0)) (unbox v0)] [(list 'empty? v0) (empty? v0)] [(list 'cons v0 v1) (cons v0 v1)] [(list 'car (cons v0 v1)) v0] [(list 'cdr (cons v0 v1)) v1] [(list '+ (? integer? i0) (? integer? i1)) (+ i0 i1)] [(list '- (? integer? i0) (? integer? i1)) (- i0 i1)] [_ 'err])) ;; Env Variable -> Answer (define (lookup env x) (match env ['() 'err] [(cons (list y i) env) (match (symbol=? x y) [#t i] [#f (lookup env x)])])) ;; Env Variable Value -> Value (define (ext r x i) (cons (list x i) r)) (define (zip xs ys) (match* (xs ys) [('() '()) '()] [((cons x xs) (cons y ys)) (cons (list x y) (zip xs ys))]))
A couple of things to note:
since the function definition environment is passed along even when interpreting the body of function definitions, this interpretation supports recursion, and even mutual recursion.
functions are not values (yet). We cannot bind a variable to a function. We cannot make a list of functions. We cannot compute a function. The first position of a function call is a function name, not an arbitrary expression. Nevertheless, we have significantly increased the expressivity of our language.
We can try it out:
Examples
> (interp (sexpr->prog '(begin (define (double x) (+ x x)) (double 5)))) 10
We can see it works with recursive functions, too. Here’s a recursive function for computing triangular numbers:
Examples
> (interp (sexpr->prog '(begin (define (tri x) (if (zero? x) 0 (+ x (tri (sub1 x))))) (tri 9)))) 45
We can even define mutually recursive functions such as even? and odd?:
Examples
> (interp (sexpr->prog '(begin (define (even? x) (if (zero? x) #t (odd? (sub1 x)))) (define (odd? x) (if (zero? x) #f (even? (sub1 x)))) (even? 101)))) #f
13.3 Compiling a Call
Turning to compilation, let’s start small by supposing we have a single, pre-defined function and we add to the language the ability to call this function.
A function in assembly has an entry point (a label), followed by a sequence of instruction, ending with the ’ret instruction. As a convention, we will pass all arguments to a function on the stack.
So here is Asm representing a single function named double
'(double (mov rax (offset rsp -1)) (add rax rax) ret)
This function takes one argument from the stack, adds it to itself, leaving the result in 'rax when it returns.
The 'ret instruction works in concert with the 'call instruction, which can be given a label, designating which function to call.
So if we wanted to call double with an argument of 5, we’d first need to write 5 in to the approrpriate spot in the stack, then issue the '(call double) instruction.
Since the double code is reading from offset -1 from 'rsp, it is tempting to assume this is where you should write the argument:
'((mov (offset rsp -1) 5) (call double) (add rax 1)) ; rax now holds 11
The problem is here is that the 'call instruction works by modifying the 'rsp register.
Remember how 'rsp points to an “occupied” memory location and we said we just leave whatever is there alone? We can now explain what’s going on.
The 'call instruction advances 'rsp to the next word of memory and writes the location of the instruction that occurs after the 'call instruction. This is a return pointer. It then jumps to the beginning of the instruction sequence after the label that is the argument of 'call. Those instruction execute and when we get to 'ret, the return instruction reads that address stored in '(offset rsp 0), moves 'rsp back one word, and jumps to the instruction pointed to by the return pointer.
pushing an address (where to return) on the stack
jumping to a label
executing some code
poping the return point off the stack and jumping to it
The problem with the function call we wrote above is that we put the argument in '(offset rsp -1), but then the 'call advances (by decrementing) the 'rsp register and writes the return point in '(offset rsp 0), but that’s exactly where we had put the argument!
The solution then, is to put the argument at index -2 from the caller’s perspective. When the call is made, it will be at index -1 from the function’s perspective:
'((mov (offset rsp -2) 5) (call double) (add rax 1)) ; rax now holds 11
Now that we have seen how to make a call and return in assembly, we can tackle code generation for a function call (double e) in our language.
(define (compile-call-double e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 (mov (offset rsp -2) rax) (call double))))
This will work if the program consists only of a call to double, however it doesn’t work in general.
To see the problem, notice how the call code always uses the index -2 for the first argument and index -1 will hold the return pointer when the call is made. But what if those spots are occuppied on the stack!? The problem is that we’ve always calculated stack offsets statically and never mutated 'rsp. But 'call expects 'rsp to be pointing to the top of the stack.
The solution is to emit code that will adjust 'rsp to the top of (our statically calculated) stack. How much does 'rsp need to change? It needs to be decremented by the number of items in the static environment, c. We can adjust 'rsp, make the call, but after the call returns, we can adjust 'rsp back to where it was before the call.
The code is:
; Expr CEnv -> Asm (define (compile-call-double e0 c) (let ((c0 (compile-e e0 c)) (h (* 8 (length c)))) `(,@c0 (sub rsp ,h) (mov (offset rsp -2) rax) ; place result of e0 in stack (call double) (add rsp ,h))))
This makes calls work in any stack context.
It’s easy to generalize this code to call any given function name:
; Variable Expr CEnv -> Asm (define (compile-call f e0 c) (let ((c0 (compile-e e0 c)) (h (* 8 (length c)))) `(,@c0 (sub rsp ,h) (mov (offset rsp -2) rax) (call ,f) (add rsp ,h))))
If we want accept any number of arguments, we have to do a little more work.
We rely on the following helpful function for compiling a list of expressions and saving the results on the stack:
; (Listof Expr) CEnv -> Asm (define (compile-es es c) (match es ['() '()] [(cons e es) (let ((c0 (compile-e e c)) (cs (compile-es es (cons #f c)))) `(,@c0 (mov (offset rsp ,(- (add1 (length c)))) rax) ,@cs))]))
So to compile a call with any number of arguments:
; Variable (Listof Expr) CEnv -> Asm (define (compile-call f es c) (let ((cs (compile-es es (cons #f c))) (h (* 8 (length c)))) `(,@cs (sub rsp ,h) (call ,f) (add rsp ,h))))
Notice that we call compile-es in an extended static environment, that has one addition slot used. This will bump the location of all the argument results by one, leaving the first slot available for the return pointer!
13.4 Compiling a Function Definition
Now that we can compile calls, we just need to be able to compile function definitions such as:
(define (double x) (+ x x))
The idea here is pretty simple. The compiler needs to emit a label for the function, such as 'double, followed by the instructions for the body of the function.
The body of the function has a single free variable, x. We can compile the expression in a static environement '(x) so that it resolves this variable to the first position on the stack, which, thanks to the code we emit for calls, will hold the argument value.
After the instructions for the body, a 'ret instruction is emitted so that control transfers back to the caller.
So the code for compiling a function definition is:
; Variable Variable Expr -> Asm (define (compile-define f x e0) (let ((c0 (compile-e e0 (list x)))) `(,f ,@c0 ret)))
What about functions that take zero or more arguments? That’s easy, just compile the body in an appropriate static environment.
; Variable (Listof Variable) Expr -> Asm (define (compile-define f xs e0) (let ((c0 (compile-e e0 (reverse xs)))) `(,f ,@c0 ret)))
(Note that we reverse the parameter list due to the order in which arguments are added to the stack.)
13.5 On Names and Labels
There is one final wrinkle, which is that identifiers in our language include many things which are not valid labels for the Nasm assembler. Hence compiling a function like:
(define (^weird% x) x)
will cause the assembler to reject the emitted code since '^weird% is not a valid label name. Labels must consist only of letters, numbers, _, $, ?, @, ~, and ?.
We solve this problem by using a function that maps arbitrary Racket symbols to valid Nasm labels (represented as symbols). The function has the property distinct symbols always map to distinct labels.
Using this function, we can touch up our code:
; Variable (Listof Expr) CEnv -> Asm (define (compile-call f es c) (let ((cs (compile-es es (cons #f c))) (h (* 8 (length c)))) `(,@cs (sub rsp ,h) (call ,(symbol->label f)) (add rsp ,h)))) ; Variable (Listof Variable) Expr -> Asm (define (compile-define f xs e0) (let ((c0 (compile-e e0 (reverse xs)))) `(,(symbol->label f) ,@c0 ret)))
13.6 A Compiler for Iniquity
The last piece of the puzzle is the function for emitting code for a complete program:
; Prog -> Asm (define (compile p) (match p [(list 'begin `(define (,fs . ,xss) ,es) ... e0) (let ((ds (compile-defines fs xss es)) (c0 (compile-l e0))) `(,@c0 ,@ds))] [e (compile-l e)]))
It relies on a helper compile-defines for compiling each function definition and flattening the assembly instructions into a single list:
; (Listof Variable) (Listof (Listof Variable)) (Listof Expr) -> Asm (define (compile-defines fs xss es) (append-map compile-define fs xss es))
Here’s an example of the code this compiler emits:
Examples
> (asm-display (compile (sexpr->prog '(begin (define (double x) (+ x x)) (double 5)))))
global entry
default rel
extern error
section .text
entry:
mov rax, 160
mov [rsp + -16], rax
sub rsp, 0
call label_double_bd8f36496bdeb8
add rsp, 0
ret
err:
push rbp
call error
ret
label_double_bd8f36496bdeb8:
mov rax, [rsp + -8]
mov rbx, rax
and rbx, 31
cmp rbx, 0
jne err
mov [rsp + -16], rax
mov rax, [rsp + -8]
mov rbx, rax
and rbx, 31
cmp rbx, 0
jne err
add rax, [rsp + -16]
ret
And we can confirm running the code produces results consistent with the interpreter:
Examples
> (asm-interp (compile (sexpr->prog '(begin (define (double x) (+ x x)) (double 5))))) 10
> (asm-interp (compile (sexpr->prog '(begin (define (tri x) (if (zero? x) 0 (+ x (tri (sub1 x))))) (tri 9))))) 45
> (asm-interp (compile (sexpr->prog '(begin (define (even? x) (if (zero? x) #t (odd? (sub1 x)))) (define (odd? x) (if (zero? x) #f (even? (sub1 x)))) (even? 101))))) #f
The complete compiler code:
#lang racket (provide (all-defined-out)) (require "ast.rkt") ;; An immediate is anything ending in #b000 ;; All other tags in mask #b111 are pointers (define result-shift 3) (define result-type-mask (sub1 (arithmetic-shift 1 result-shift))) (define type-imm #b000) (define type-box #b001) (define type-pair #b010) (define type-string #b011) (define imm-shift (+ 2 result-shift)) (define imm-type-mask (sub1 (arithmetic-shift 1 imm-shift))) (define imm-type-int (arithmetic-shift #b00 result-shift)) (define imm-type-bool (arithmetic-shift #b01 result-shift)) (define imm-type-char (arithmetic-shift #b10 result-shift)) (define imm-type-empty (arithmetic-shift #b11 result-shift)) (define imm-val-false imm-type-bool) (define imm-val-true (bitwise-ior (arithmetic-shift 1 (add1 imm-shift)) imm-type-bool)) ;; Allocate in 64-bit (8-byte) increments, so pointers ;; end in #b000 and we tag with #b001 for boxes, etc. ;; type CEnv = (Listof (Maybe Variable)) ;; type Imm = Integer | Boolean | Char | ''() ;; Prog -> Asm (define (compile p) (match p [(prog defs e) (let ((ds (compile-defines defs)) (c0 (compile-entry e))) `(,@c0 ,@ds))])) ;; Expr -> Asm ;; Compile e as the entry point (define (compile-entry e) `(entry ,@(compile-e e '()) ret err (push rbp) (call error) ret)) ;; Expr CEnv -> Asm (define (compile-e e c) (match e [(? imm? i) (compile-imm i)] [(var-e v) (compile-var v c)] [(prim-e (? prim? p) es) (compile-prim p es c)] [(if-e p t f) (compile-if p t f c)] [(let-e (list b) body) (compile-let b body c)] [(app-e f es) (compile-call f es c)])) (define (compile-prim p es c) (match (cons p es) [`(add1 ,e0) (compile-add1 e0 c)] [`(sub1 ,e0) (compile-sub1 e0 c)] [`(zero? ,e0) (compile-zero? e0 c)] [`(+ ,e0 ,e1) (compile-+ e0 e1 c)] [`(- ,e0 ,e1) (compile-- e0 e1 c)] [`(box ,e0) (compile-box e0 c)] [`(unbox ,e0) (compile-unbox e0 c)] [`(empty? ,e0) (compile-empty? e0 c)] [`(cons ,e0 ,e1) (compile-cons e0 e1 c)] [`(car ,e0) (compile-car e0 c)] [`(cdr ,e0) (compile-cdr e0 c)] [_ (error (format "prim applied to wrong number of args: ~a ~a" p es))])) ;; Variable (Listof Expr) CEnv -> Asm ;; Statically know the function we're calling (define (compile-call f es c) (let ((cs (compile-es es (cons #f c))) (stack-size (* 8 (length c)))) `(,@cs (sub rsp ,stack-size) (call ,(symbol->label f)) (add rsp ,stack-size)))) ;; (Listof Expr) CEnv -> Asm (define (compile-es es c) (match es ['() '()] [(cons e es) (let ((c0 (compile-e e c)) (cs (compile-es es (cons #f c)))) `(,@c0 (mov (offset rsp ,(- (add1 (length c)))) rax) ,@cs))])) ;; Variable (Listof Variable) Expr -> Asm (define (compile-define def) (match def [(fundef name args body) (let ((c0 (compile-e body (reverse args)))) `(,(symbol->label name) ,@c0 ret))])) ;; (Listof Variable) (Listof (Listof Variable)) (Listof Expr) -> Asm (define (compile-defines defs) (append-map compile-define defs)) ;; Any -> Boolean (define (imm? x) (or (int-e? x) (bool-e? x) ;; TODO (char? x) (nil-e? x))) ;; Imm -> Asm (define (compile-imm i) `((mov rax ,(imm->bits i)))) ;; Imm -> Integer (define (imm->bits i) (match i [(int-e i) (arithmetic-shift i imm-shift)] ; TODO [(char-e c) (+ (arithmetic-shift (char->integer c) imm-shift) imm-type-char)] [(bool-e b) (if b imm-val-true imm-val-false)] [(nil-e) imm-type-empty])) ;; Variable CEnv -> Asm (define (compile-var x c) (let ((i (lookup x c))) `((mov rax (offset rsp ,(- (add1 i))))))) ;; Expr CEnv -> Asm (define (compile-box e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 (mov (offset rdi 0) rax) (mov rax rdi) (or rax ,type-box) (add rdi 8)))) ; allocate 8 bytes ;; Expr CEnv -> Asm (define (compile-unbox e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-box (xor rax ,type-box) (mov rax (offset rax 0))))) ;; Expr Expr CEnv -> Asm (define (compile-cons e0 e1 c) (let ((c0 (compile-e e0 c)) (c1 (compile-e e1 (cons #f c)))) `(,@c0 (mov (offset rsp ,(- (add1 (length c)))) rax) ,@c1 (mov (offset rdi 0) rax) (mov rax (offset rsp ,(- (add1 (length c))))) (mov (offset rdi 1) rax) (mov rax rdi) (or rax ,type-pair) (add rdi 16)))) ;; Expr CEnv -> Asm (define (compile-car e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-pair (xor rax ,type-pair) (mov rax (offset rax 1))))) ;; Expr CEnv -> Asm (define (compile-cdr e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-pair (xor rax ,type-pair) (mov rax (offset rax 0))))) ;; Expr CEnv -> Asm (define (compile-empty? e0 c) (let ((c0 (compile-e e0 c)) (l0 (gensym))) `(,@c0 (and rax ,imm-type-mask) (cmp rax ,imm-type-empty) (mov rax ,imm-val-false) (jne ,l0) (mov rax ,imm-val-true) ,l0))) ;; Expr CEnv -> Asm (define (compile-add1 e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-integer (add rax ,(arithmetic-shift 1 imm-shift))))) ;; Expr CEnv -> Asm (define (compile-sub1 e0 c) (let ((c0 (compile-e e0 c))) `(,@c0 ,@assert-integer (sub rax ,(arithmetic-shift 1 imm-shift))))) ;; Expr CEnv -> Asm (define (compile-zero? e0 c) (let ((c0 (compile-e e0 c)) (l0 (gensym)) (l1 (gensym))) `(,@c0 ,@assert-integer (cmp rax 0) (mov rax ,imm-val-false) (jne ,l0) (mov rax ,imm-val-true) ,l0))) ;; Expr Expr Expr CEnv -> Asm (define (compile-if e0 e1 e2 c) (let ((c0 (compile-e e0 c)) (c1 (compile-e e1 c)) (c2 (compile-e e2 c)) (l0 (gensym)) (l1 (gensym))) `(,@c0 (cmp rax ,imm-val-false) (je ,l0) ,@c1 (jmp ,l1) ,l0 ,@c2 ,l1))) ;; Variable Expr Expr CEnv -> Asm (define (compile-let b e1 c) (match b [(binding v def) (let ((c0 (compile-e def c)) (c1 (compile-e e1 (cons v c)))) `(,@c0 (mov (offset rsp ,(- (add1 (length c)))) rax) ,@c1))] [_ (error "Compile-let can only handle bindings")])) ;; Expr Expr CEnv -> Asm (define (compile-+ e0 e1 c) (let ((c1 (compile-e e1 c)) (c0 (compile-e e0 (cons #f c)))) `(,@c1 ,@assert-integer (mov (offset rsp ,(- (add1 (length c)))) rax) ,@c0 ,@assert-integer (add rax (offset rsp ,(- (add1 (length c)))))))) ;; Expr Expr CEnv -> Asm (define (compile-- e0 e1 c) (let ((c1 (compile-e e1 c)) (c0 (compile-e e0 (cons #f c)))) `(,@c1 ,@assert-integer (mov (offset rsp ,(- (add1 (length c)))) rax) ,@c0 ,@assert-integer (sub rax (offset rsp ,(- (add1 (length c)))))))) (define (type-pred->mask p) (match p [(or 'box? 'cons? 'string?) result-type-mask] [_ imm-type-mask])) (define (type-pred->tag p) (match p ['box? type-box] ['cons? type-pair] ['string? type-string] ['integer? imm-type-int] ['empty? imm-type-empty] ['char? imm-type-char] ['boolean? imm-type-bool])) ;; Variable CEnv -> Natural (define (lookup x cenv) (match cenv ['() (error "undefined variable:" x)] [(cons y cenv) (match (eq? x y) [#t (length cenv)] [#f (lookup x cenv)])])) (define (assert-type p) `((mov rbx rax) (and rbx ,(type-pred->mask p)) (cmp rbx ,(type-pred->tag p)) (jne err))) (define assert-integer (assert-type 'integer?)) (define assert-box (assert-type 'box?)) (define assert-pair (assert-type 'cons?)) (define assert-string (assert-type 'string?)) (define assert-char (assert-type 'char?)) ;; Asm (define assert-natural `(,@assert-integer (cmp rax -1) (jle err))) ;; Asm (define assert-integer-codepoint `((mov rbx rax) (and rbx ,imm-type-mask) (cmp rbx 0) (jne err) (cmp rax ,(arithmetic-shift -1 imm-shift)) (jle err) (cmp rax ,(arithmetic-shift #x10FFFF imm-shift)) (mov rbx rax) (sar rbx ,(+ 11 imm-shift)) (cmp rbx #b11011) (je err))) ;; Symbol -> Label ;; Produce a symbol that is a valid Nasm label (define (symbol->label s) (string->symbol (string-append "label_" (list->string (map (λ (c) (if (or (char<=? #\a c #\z) (char<=? #\A c #\Z) (char<=? #\0 c #\9) (memq c '(#\_ #\$ #\# #\@ #\~ #\. #\?))) c #\_)) (string->list (symbol->string s)))) "_" (number->string (eq-hash-code s) 16))))