10 Extort: when errors exist
The greatest mistake is to imagine that we never err.
10.1 Errors
We have added multiple, disjoint types, but mostly swept issues of errors under the rug by considering type mismatches as meaningless. Now let’s redesign the semantics to specify the error behavior of such programs.
We’ll call it Extort.
Nothing changes in the syntax of Extort from the previous language, although we will need to talk about two kinds of results from evaluating programs: values and errors. We will say that evaluation produces an answer, which is either a value or error:
10.2 Meaning of Extort programs
Languages adopt several approaches to type mismatches:
Prohibit such programs statically with a type system (e.g. OCaml, Java)
Coerce values to different types (e.g. JavaScript)
Signal a run-time error (e.g. Racket)
Leave the behavior unspecified (e.g. Scheme, C)
We’ve previously seen the last approach. Now let’s do what Racket does and signal an error.
The meaning of Extort programs that have type errors will now be defined as 'err:
Now what does the semantics say about (add1 #f)? What about (if 7 #t -2)?
The signature of the interpreter is extended to produce answers. Each use of a Racket primitive is guarded by checking the type of the arguments and an error is produced if the check fails. Errors are also propagated when a subexpression produces an error:
#lang racket (provide interp) (require "ast.rkt") (require "interp-prim.rkt") ;; type Value = ;; | Integer ;; | Boolean ;; | Character ;; | Eof ;; | Void ;; Expr -> Answer (define (interp e) (match e [(Lit d) d] [(Eof) eof] [(Prim0 p) (interp-prim0 p)] [(Prim1 p e) (match (interp e) ['err 'err] [v (interp-prim1 p v)])] [(If e1 e2 e3) (match (interp e1) ['err 'err] [v (if v (interp e2) (interp e3))])] [(Begin e1 e2) (match (interp e1) ['err 'err] [v (interp e2)])]))
#lang racket (provide interp-prim0 interp-prim1) ;; Op0 -> Value (define (interp-prim0 op) (match op ['read-byte (read-byte)] ['peek-byte (peek-byte)] ['void (void)])) ;; Op1 Value -> Answer (define (interp-prim1 op v) (match (list op v) [(list 'add1 (? integer?)) (add1 v)] [(list 'sub1 (? integer?)) (sub1 v)] [(list 'zero? (? integer?)) (zero? v)] [(list 'char? v) (char? v)] [(list 'integer->char (? codepoint?)) (integer->char v)] [(list 'char->integer (? char?)) (char->integer v)] [(list 'write-byte (? byte?)) (write-byte v)] [(list 'eof-object? v) (eof-object? v)] [_ 'err])) ;; Any -> Boolean (define (codepoint? v) (and (integer? v) (or (<= 0 v 55295) (<= 57344 v 1114111))))
We can confirm the interpreter computes the right result for the examples given earlier:
Examples
> (interp (Prim1 'add1 (Lit #f))) 'err
> (interp (Prim1 'zero? (Lit #t))) 'err
> (interp (If (Prim1 'zero? (Lit #f)) (Lit 1) (Lit 2))) 'err
The statement of correctness stays the same, but now observe that there is no way to crash the interpreter with any Expr value.
10.3 A Compiler for Extort
Suppose we want to compile (add1 #f), what needs to happen? Just as in the interpreter, we need to check the integerness of the argument’s value before doing the addition operation.
The runtime system is written with a level of indirection between raise_error and the code that prints and exits in error_exit; this is done so that the testing framework can intercede and replace the error function, but it can be ignored.
#include <stdio.h> #include <stdlib.h> #include "values.h" #include "print.h" #include "runtime.h" FILE* in; FILE* out; void (*error_handler)(); void error_exit() { ("err\n"); printf(1); exit} void raise_error() { return error_handler(); } int main(int argc, char** argv) { = stdin; in = stdout; out = &error_exit; error_handler ; val_t result = entry(); result (result); print_resultif (val_typeof(result) != T_VOID) ('\n'); putchar return 0; }
Most of the work of error checking happens in the code emitted for primitive operations. Whenever an error is detected, control jumps to a label called 'err that immediately calls raise_error:
#lang racket (provide compile-op0 compile-op1) (require "ast.rkt") (require "types.rkt") (require a86/ast) (define rax 'rax) (define rdi 'rdi) ; arg (define r9 'r9) ; scratch ;; Op0 -> Asm (define (compile-op0 p) (match p ['void (seq (Mov rax (value->bits (void))))] ['read-byte (seq (Call 'read_byte))] ['peek-byte (seq (Call 'peek_byte))])) ;; Op1 -> Asm (define (compile-op1 p) (match p ['add1 (seq (assert-integer rax) (Add rax (value->bits 1)))] ['sub1 (seq (assert-integer rax) (Sub rax (value->bits 1)))] ['zero? (seq (assert-integer rax) (Cmp rax 0) if-equal)] ['char? (seq (And rax mask-char) (Cmp rax type-char) if-equal)] ['char->integer (seq (assert-char rax) (Sar rax char-shift) (Sal rax int-shift))] ['integer->char (seq (assert-codepoint) (Sar rax int-shift) (Sal rax char-shift) (Xor rax type-char))] ['eof-object? (seq (Cmp rax (value->bits eof)) if-equal)] ['write-byte (seq assert-byte (Mov rdi rax) (Call 'write_byte))])) ;; -> Asm ;; set rax to #t or #f if comparison flag is equal (define if-equal (seq (Mov rax (value->bits #f)) (Mov r9 (value->bits #t)) (Cmove rax r9))) (define (assert-type mask type) (λ (arg) (seq (Mov r9 arg) (And r9 mask) (Cmp r9 type) (Jne 'err)))) (define (type-pred mask type) (seq (And rax mask) (Cmp rax type) if-equal)) (define assert-integer (assert-type mask-int type-int)) (define assert-char (assert-type mask-char type-char)) (define (assert-codepoint) (let ((ok (gensym))) (seq (assert-integer rax) (Cmp rax (value->bits 0)) (Jl 'err) (Cmp rax (value->bits 1114111)) (Jg 'err) (Cmp rax (value->bits 55295)) (Jl ok) (Cmp rax (value->bits 57344)) (Jg ok) (Jmp 'err) (Label ok)))) (define assert-byte (seq (assert-integer rax) (Cmp rax (value->bits 0)) (Jl 'err) (Cmp rax (value->bits 255)) (Jg 'err)))
All that’s left for the top-level compile function to declare an external label 'raise_error that will be defined by the run-time system and to emit a label called 'err that calls raise_error, otherwise this part of the compiler doesn’t change:
#lang racket (provide (all-defined-out)) (require "ast.rkt") (require "compile-ops.rkt") (require "types.rkt") (require a86/ast) (define rax 'rax) (define rsp 'rsp) ; stack ;; Expr -> Asm (define (compile e) (prog (Global 'entry) (Extern 'peek_byte) (Extern 'read_byte) (Extern 'write_byte) (Extern 'raise_error) (Label 'entry) (Sub rsp 8) (compile-e e) (Add rsp 8) (Ret) ;; Error handler (Label 'err) (Call 'raise_error))) ;; Expr -> Asm (define (compile-e e) (match e [(Lit d) (compile-value d)] [(Eof) (compile-value eof)] [(Prim0 p) (compile-prim0 p)] [(Prim1 p e) (compile-prim1 p e)] [(If e1 e2 e3) (compile-if e1 e2 e3)] [(Begin e1 e2) (compile-begin e1 e2)])) ;; Value -> Asm (define (compile-value v) (seq (Mov rax (value->bits v)))) ;; Op0 -> Asm (define (compile-prim0 p) (compile-op0 p)) ;; Op1 Expr -> Asm (define (compile-prim1 p e) (seq (compile-e e) (compile-op1 p))) ;; Expr Expr Expr -> Asm (define (compile-if e1 e2 e3) (let ((l1 (gensym 'if)) (l2 (gensym 'if))) (seq (compile-e e1) (Cmp rax (value->bits #f)) (Je l1) (compile-e e2) (Jmp l2) (Label l1) (compile-e e3) (Label l2)))) ;; Expr Expr -> Asm (define (compile-begin e1 e2) (seq (compile-e e1) (compile-e e2)))
Examples
> (define (show e) (displayln (asm-string (compile-e (parse e))))) > (show '(add1 #f))
mov rax, 7
mov r9, rax
and r9, 1
cmp r9, 0
jne _err
add rax, 2
Examples
> (define (tell e) (match (asm-interp (compile (parse e))) ['err 'err] [b (bits->value b)])) > (tell #t) #t
> (tell #f) #f
> (tell '(zero? 0)) #t
> (tell '(zero? -7)) #f
> (tell '(if #t 1 2)) 1
> (tell '(if #f 1 2)) 2
> (tell '(if (zero? 0) (if (zero? 0) 8 9) 2)) 8
> (tell '(if (zero? (if (zero? 2) 1 0)) 4 5)) 4
> (tell '(add1 #t)) 'err
> (tell '(sub1 (add1 #f))) 'err
> (tell '(if (zero? #t) 1 2)) 'err
Since the interpreter and compiler have well defined specifications for what should happen when type errors occur, we can test in the usual way again:
Examples
> (define (check-correctness e) (check-equal? (match (asm-interp (compile e)) ['err 'err] [b (bits->value b)]) (interp e) e)) > (check-correctness (Prim1 'add1 (Lit 7)))
--------------------
ERROR
name: check-equal?
location: eval:406:0
check-equal?: contract violation
expected: (or/c #f string?)
given: '#s(Prim1 add1 #s(Lit 7))
--------------------
> (check-correctness (Prim1 'add1 (Lit #f)))
--------------------
ERROR
name: check-equal?
location: eval:406:0
check-equal?: contract violation
expected: (or/c #f string?)
given: '#s(Prim1 add1 #s(Lit #f))
--------------------