5 Blackmail: incrementing and decrementing
Let’s Do It Again!
5.1 Refinement, take one
We’ve seen all the essential peices (a grammar, an AST data type definition, an operational semantics, an interpreter, a compiler, etc.) for implementing a programming language, albeit for an amazingly simple language.
We will now, through a process of iterative refinement, grow the language to have an interesting set of features.
Our second language, which subsumes Abscond, is Blackmail. Expressions in Blackmail include integer literals and increment and decrement operations. It’s still a dead simple language, but at least programs do something.
5.2 Abstract syntax for Blackmail
A Blackmail program consists of a single expression, and the grammar of expressions is:
So, 0, 120, and -42 are Blackmail programs, but so are '(add1 0), '(sub1 120), '(add1 (add1 (add1 -42))).
A datatype for representing expressions can be defined as:
; type Expr = ; | Integer ; | ‘(add1 ,Expr) ; | ‘(sub1 ,Expr)
A predicate for recognizing well-formed expressions is more involved than Abscond, but still straightforward:
#lang racket (provide (all-defined-out)) (require "ast.rkt") ;; Any -> Boolean ;; Is x a well-formed expression? (define (expr? x) (match x [(? integer? i) #t] [`(add1 ,x) (expr? x)] [`(sub1 ,x) (expr? x)] [_ #f])) ; SExpr -> AST ; convert the s-expr into our AST ; This should be a one-to-one mapping for now. (define (sexpr->ast s) (match s [(? integer? s) (int-e s)] [`(add1 ,e) (add1-e (sexpr->ast e))] [`(sub1 ,e) (sub1-e (sexpr->ast e))] ['get-int (get-i)] [_ (error "operation not supported")]))
5.3 Meaning of Blackmail programs
The meaning of a Blackmail program depends on the form of the expression:
the meaning of an integer literal is just the integer itself,
the meaning of an increment expression is one more than the meaning of its subexpression, and
the meaning of a decrement expression is one less than the meaning of its subexpression.
The operational semantics reflects this dependence on the form of the expression by having three rules, one for each kind of expression:
The first rule looks familiar; it’s exactly the semantics of integers from Abscond. The second and third rule are more involved. In particular, they have premises above the line. If the premises are true, the conclusion below the line is true as well. These rules are conditional on the premises being true. This is in contrast to the first rule, which applies unconditionally.
For all integers i, (i,i) is in .
For expressions e0 and all integers i0 and i1, if (e0,i0) is in and i1 = i0 + 1, then ('(add1 e0), i1) is in .
For expressions e0 and all integers i0 and i1, if (e0,i0) is in and i1 = i0 - 1, then ('(sub1 e0), i1) is in .
These rules are inductive. We start from the meaning of integers and if we have the meaning of an expression, we can construct the meaning of a larger expression.
This may seem a bit strange at the moment, but it helps to view the semantics through its correspondence with an interpreter, which given an expression e, computes an integer i, such that (e,i) is in .
Just as there are three rules, there will be three cases to the interpreter, one for each form of expression:
#lang racket (provide (all-defined-out)) (require "ast.rkt") (require "primitives.rkt") ;; Expr -> Integer (define (interp e) (match e [(int-e i) i] [(add1-e e1) (add1 (interp e1))] [(sub1-e e1) (sub1 (interp e1))] [(get-i) (get-int)]))
> (interp (sexpr->ast 42)) 42
> (interp (sexpr->ast -7)) -7
> (interp (sexpr->ast '(add1 42))) 43
> (interp (sexpr->ast '(sub1 8))) 7
> (interp (sexpr->ast '(add1 (add1 (add1 8))))) 11
Here’s how to connect the dots between the semantics and interpreter: the interpreter is computing, for a given expression e, the integer i, such that (e,i) is in . The interpreter uses pattern matching to determine the form of the expression, which determines which rule of the semantics applies.
if e is an integer i, then we’re done: this is the right-hand-side of the pair (e,i) in .
if e is an expression '(add1 e0), then we recursively use the interpreter to compute i0 such that (e0,i0) is in . But now we can compute the right-hand-side by adding 1 to i0.
if e is an expression '(sub1 e0), then we recursively use the interpreter to compute i0 such that (e0,i0) is in . But now we can compute the right-hand-side by substracting 1 from i0.
This explaination of the correspondence is essentially a proof (by induction) of the interpreter’s correctness:
Interpreter Correctness: For all Blackmail expressions e and integers i, if (e,i) in , then (interp e) equals i.
5.4 An Example of Blackmail compilation
Just as we did with Abscond, let’s approach writing the compiler by first writing an example.
Suppose we want to compile '(add1 (add1 40)). We already know how to compile the 40: '(mov rax 40). To do the increment (and decrement) we need to know a bit more x86-64. In particular, the add (and sub) instruction is relevant. It increments the contents of a register by some given amount.
Concretely, the program that adds 1 twice to 40 looks like:
The runtime stays exactly the same as before.
shell
5.5 A Compiler for Blackmail
To represent these new instructions, we extend the Asm AST data type:
#lang racket ;; type Instruction = ;; ... ;; | `(add ,Arg ,Arg) ;; | `(sub ,Arg ,Arg)
And correspondingly update the printer:
#lang racket ;; Instruction -> String (define (instr->string i) (match i ... [`(add ,a1 ,a2) (string-append "\tadd " (arg->string a1) ", " (arg->string a2) "\n")] [`(sub ,a1 ,a2) (string-append "\tsub " (arg->string a1) ", " (arg->string a2) "\n")]))
We can now print the assembly of our example from an AST:
Examples
> (asm-display '(entry (mov rax 40) (add rax 1) (add rax 1) ret))
global entry
extern get_int
section .text
entry:
mov rax, 40
add rax, 1
add rax, 1
ret
The compiler consists of two functions: the first, which is given a program, emits the entry point and return instructions, invoking another function to compile the expression:
#lang racket (provide (all-defined-out)) (require "ast.rkt") (require "primitives.rkt") ;; Expr -> Asm (define (compile e) (append '(entry) (compile-e e) '(ret))) ;; Expr -> Asm (define (compile-e e) (match e [(int-e i) `((mov rax ,i))] [(add1-e e1) (let ((c1 (compile-e e1))) ; c1 : [ASM] `(,@c1 (add rax 1)))] [(sub1-e e1) (let ((c1 (compile-e e1))) `(,@c1 (sub rax 1)))] [(get-i) get-int-asm]))
Notice that compile-e is defined by structural recursion, much like the interpreter.
We can now try out a few examples:
Examples
> (compile (sexpr->ast '(add1 (add1 40)))) '(entry (mov rax 40) (add rax 1) (add rax 1) ret)
> (compile (sexpr->ast '(sub1 8))) '(entry (mov rax 8) (sub rax 1) ret)
> (compile (sexpr->ast '(add1 (add1 (sub1 (add1 -8)))))) '(entry (mov rax -8) (add rax 1) (sub rax 1) (add rax 1) (add rax 1) ret)
And give a command line wrapper for parsing, checking, and compiling files:
#lang racket (provide (all-defined-out)) (require "compile.rkt" "syntax.rkt" "asm/printer.rkt") ;; String -> Void ;; Compile contents of given file name, ;; emit asm code on stdout (define (main fn) (with-input-from-file fn (λ () (let ((c (read-line))) ; ignore #lang racket (let ((p (read))) (asm-display (compile (sexpr->ast p))))))))
Here it is in action:
shell
And using the same Makefile setup as in Abscond, we capture the whole compilation process with a single command:
shell
> make add1-add1-40.run make[1]: Entering directory `/home/travis/build/cmsc430/www/www/notes/blackmail' racket -t compile-file.rkt -m add1-add1-40.rkt > add1-add1-40.s nasm -f elf64 -o add1-add1-40.o add1-add1-40.s gcc main.o add1-add1-40.o -o add1-add1-40.run rm add1-add1-40.o make[1]: Leaving directory `/home/travis/build/cmsc430/www/www/notes/blackmail' > ./add1-add1-40.run 42
Likewise, to test the compiler from within Racket, we use the same asm/interp.rkt code to encapsulate running assembly code:
Examples
> (asm-interp (compile (sexpr->ast '(add1 (add1 40))))) 42
> (asm-interp (compile (sexpr->ast '(sub1 8)))) 7
> (asm-interp (compile (sexpr->ast '(add1 (add1 (sub1 (add1 -8))))))) -6
5.6 Correctness and random testing
We can state correctness similarly to how it was stated for Abscond:
Compiler Correctness: For all expressions e and integers i, if (e,i) in , then (asm-interp (compile e)) equals i.
And we can test this claim by comparing the results of running compiled and interpreted programs, leading to the following property, which hopefully holds:
Examples
> (define (check-compiler e) (check-eqv? (interp (sexpr->ast e)) (asm-interp (compile (sexpr->ast e)))))
The problem, however, is that generating random Blackmail programs is less obvious compared to generating random Abscond programs (i.e. random integers). Randomly generating programs for testing is its own well studied and active research area. To side-step this wrinkle, we have provided a small utility for generating random Blackmail programs (random.rkt), which you can use, without needing the understand how it was implemented.
Examples
> (require "random.rkt") > (random-expr) 32
> (random-expr) -1
> (random-expr) '(add1 (add1 1))
> (random-expr) '(sub1 -1)
> (random-expr) '(sub1 (sub1 (sub1 (sub1 -1))))
> (asm-display (compile (sexpr->ast (random-expr))))
global entry
extern get_int
section .text
entry:
mov rax, -1
ret
> (for ([i (in-range 10)]) (check-compiler (random-expr)))
5.7 Looking back, looking forward
We’ve now built two compilers; enough to start observing a pattern.
Recall the phases of a compiler described in What does a Compiler look like?. Let’s identify these peices in the two compilers we’ve written:
Parsed into a data structure called an Abstract Syntax Tree
we use read to parse text into a s-expression
we use sexpr->ast to convert an s-expression into our AST
Checked to make sure code is well-formed (and well-typed)
we use a predicate, integer? for Abscond and expr? for Blackmail, to check whether an s-expression is a well-formed AST
Simplified into some convenient Intermediate Representation
we don’t do any; the AST is the IR
Optimized into (equivalent) but faster program
we don’t do any
Generated into assembly x86
we use abscond-compile and compile to generate assembly (in AST form), and use asm-display to print concrete X86-64
Linked against a run-time (usually written in C)
we link against our run-time written in main.c
Our recipe for building compiler involves:
Build intuition with examples,
Model problem with data types,
Implement compiler via type-transforming-functions,
Validate compiler via tests.
As we move forward, the language we are compiling will grow. As the language grows, you should apply this recipe to grow the compiler along with the language.