20 Neerdowell: structures
Structures don’t march in the streets.
20.1 Defining structs
So far we’ve built up a number of important datatypes in our language: integers, booleans, characters, pairs, lists, boxes, vectors, strings, symbols, and functions, to be precise. But despite all of this, programmers may still want more. In fact, we probably can’t anticipate all of the possible datatypes they might want, so let’s instead provide a mechanism for defining new kinds of datatypes. We’ll add the same struct mechanism for user-defined structure types that we’ve been using all semester.
We’ll call it Neerdowell.
20.2 Syntax
We will add the following concrete syntax to Neerdowell programs:
(struct struct-type (field0 ...))
Structure type definitions will only be allowed at the top-level along with other definitions, so a Neerdowell program might look like:
(struct posn (x y)) (posn 1 2)
Each structure definition creates four kinds of things:
A constructor, e.g. posn.
A predicate, e.g. posn?.
A set of accessor functions, e.g. posn-x and posn-y.
A pattern constructor, e.g. posn.
The constructor is a function that takes as many arguments as fields in the structure type definition and creates an instance of that structure type with the given arguments populating the fields.
The predicate is a unary function that takes any value and returns #t is an instance of that structure type.
The accessor functions are each unary functions that take an instance of the structure type and retrieve the value of the corresponding field.
We’re going to take a slightly different approach to parsing and interpreting/compiling structure definitions. Rather than implementing the structure operations directly, we will instead consider a set of generic structure operations that can be used to implement the constructor, predicate, and accessor operations.
A structure definition will be translated into a set of source-level function definitions that use these generic operations to implement the structure-specific operations. That translation will be carried-out by the parser.
Here are the generic operations:
; make-struct : Symbol Value ... -> StructVal ; struct? : Symbol Value -> Boolean ; struct-ref : Symbol Int StructVal -> Value
Here the Symbol argument represents the name of the structure type. So for example, to create an instance of the posn structure type, you’d call (make-struct 'posn 1 2). To check if x is an instance of a posn structure, you’d call (struct? 'posn x). To access the first field of an instance of a posn structure type x, you’d call (struct-ref 'posn 0 x).
With these operations in mind, you can see struct as a kind of shorthand:
(struct posn (x y)) ; means: (define (posn x y) (make-struct 'posn x y)) (define (posn? x) (struct? 'posn x)) (define (posn-x x) (struct-ref 'posn 0 x)) (define (posn-y x) (struct-ref 'posn 1 x))
To accomplish this, we add the following function to the parser:
; parse-struct : S-Expr -> [Listof Defn]
Here’s an example:
Examples
> (parse-struct '(struct posn (x y)))
'(#s(Defn
posn
(x y)
#s(Prim
make-struct
(#s(Quote posn) #s(Var x) #s(Var y))))
#s(Defn
posn?
(x)
#s(Prim struct? (#s(Quote posn) #s(Var x))))
#s(Defn
posn-y
(x)
#s(Prim
struct-ref
(#s(Quote posn) #s(Quote 1) #s(Var x))))
#s(Defn
posn-x
(x)
#s(Prim
struct-ref
(#s(Quote posn) #s(Quote 0) #s(Var x)))))
And here’s an example of an expression that makes use of some of the operations:
Examples
> (parse-e '(posn-x (posn 3 4)))
'#s(App
#s(Var posn-x)
(#s(App
#s(Var posn)
(#s(Quote 3) #s(Quote 4)))))
The generic struct primitives are only used by the code generated by the struct definitions, a property which is achieved by the parser by not treating 'make-struct, 'struct?, and 'struct-ref as keywords. If you write a program that uses these names, they will be treated as variables, not primitives:
Examples
> (parse-e '(struct? 'posn x))
'#s(App
#s(Var struct?)
(#s(Quote posn) #s(Var x)))
The parse-struct function is defined as follows:
;; S-Expr -> [Listof Defn] (define (parse-struct s) (match s [(list 'struct (? symbol? n) flds) (if (andmap symbol? flds) (list* (make-struct-defn-construct n flds) (make-struct-defn-predicate n) (make-struct-defn-accessors n (reverse flds))) (error "parse struct definition error"))] [_ (error "parse struct definition error")])) ;; Id [Listof Id] -> [Listof Defn] (define (make-struct-defn-construct n flds) (Defn n flds (Prim 'make-struct (cons (Quote n) (map Var flds))))) ;; Id -> [Listof Defn] (define (make-struct-defn-predicate n) (Defn (symbol-append n '?) (list 'x) (Prim 'struct? (list (Quote n) (Var 'x))))) ;; Id [Listof Id] -> [Listof Defn] (define (make-struct-defn-accessors n flds) (match flds ['() '()] [(cons f flds) (cons (Defn (symbol-append n '- f) (list 'x) (Prim 'struct-ref (list (Quote n) (Quote (length flds)) (Var 'x)))) (make-struct-defn-accessors n flds))])) ;; Symbol ... -> Symbol (define (symbol-append . ss) (string->symbol (apply string-append (map symbol->string ss))))
20.3 Structure and Interpretation
In the interpreter, we choose to represent a structure instance as an instance of a StructVal structure, which will contain the structure type name (represented as a symbol) and a vector, which holds the values of the fields:
; type Value = ... ; | (StructVal Symbol (Vectorof Val))
So for example the value produced by (posn 3 4) would be (StructVal 'posn #(3 4)). With this representation, implementing the generic operations is pretty simple. Here’s the complete interp-prims function.
;; type Struct = (StructVal Symbol (Vectorof Value)) (struct StructVal (name vals)) ;; Op [Listof Value] -> Answer (define (interp-prim p vs) (match (cons p vs) ;; ... [(list 'struct? s v) (match v [(StructVal n _) (eq? s n)] [_ #f])] [(list 'struct-ref s i (StructVal n vs)) (if (and (eq? s n) (<= 0 i (sub1 (vector-length vs)))) (vector-ref vs i) 'err)] ;; OpN [(cons 'make-struct (cons (? symbol? n) vs)) (StructVal n (list->vector vs))]))
And we can try it out:
Examples
> (define (run . p) (interp (parse p)))
> (run '(struct posn (x y)) '(posn? (posn 3 4))) #t
> (run '(struct posn (x y)) '(posn-x (posn 3 4))) 3
> (run '(struct posn (x y)) '(posn-y (posn 3 4))) 4
20.4 Compiling Structures
Providing support for structures in the compiler largely follows the same outline of previous additions. In particular, we add yet another tagged pointer type to represent structure instances. The runtime is extended to add support for printing structures.
Structure instance are represented as an array of values. The first element is a symbol, which names the structure type and the remaining elements hold the values of the fields for the structure. The length of the structure is not stored in memory since the width of structure instances is statically determined, just like closures.
Compiling the struct? and struct-ref operations are fairly straightforward, requiring just a slight bit more than say vector? and vector-ref. The slight be more being the checking that the structure type symbols match between the (automatically inserted) argument and the structure instance value. The struct-ref operation actually requires slightly less in that there’s no need for bounds checking since it is the compiler itself that inserted indices and we can trust the code we wrote is correct. For a similar reason, there’s no need to do type tag checking on the structure type and index arguments, but there still needs to be checking of the structure instance argument since that comes from the user.
One consequence of the source transformation approach to implementing structures is that, although the type structure symbol and the integer index have known, fixed types (a symbol and an integer, respectively), they will still be represented as arbitrary values, i.e. as tagged pointers or bit-shifted numbers. That means the operations must compute accordingly and there’s a small performance hit because of it. (It wouldn’t be difficult to specialize the compilation of these primitives so these arguments are treated distinctly, thus avoiding the performance hit, but making the compiler itself more complicated.)
Here is the implementation of compile-op for struct? and struct-ref:
;; Op -> Asm (define (compile-op p) (match p ;; ... ['struct? (let ((f (gensym)) (t (gensym))) (seq (Pop r8) ; don't need to do this we generated the code ; (assert-symbol r8) (Mov r9 rax) ; check the tag bits (And r9 ptr-mask) (Cmp r9 type-struct) ; compare tag to the structure tag (Jne f) ; not a structure (Xor rax type-struct) ; untag the structure pointer (Mov rax (Offset rax 0)) ; get the structure type symbol (Cmp r8 rax) ; compare it to the type argument (Mov rax (imm->bits #t)) (Jne f) ; a structure, but not this kind (Jmp t) ; a structure of the same kind (Label f) (Mov rax (imm->bits #f)) (Label t)))] ['struct-ref (seq (Pop r8) (Pop r11) ; don't need to check ; (assert-symbol r11) ; (assert-integer r8) (assert-struct rax) (Xor rax type-struct) (Mov r10 (Offset rax 0)) ; get the structure type symbol (Cmp r11 r10) ; compare it to the type argument (Jne 'raise_error_align) ; a structure, but not this kind (Sar r8 int-shift) ; convert to raw int (Add r8 1) ; +1 to skip symbol element (Sal r8 3) ; *4 for byte offset (Add rax r8) (Mov rax (Offset rax 0)))]))
Handling make-struct is slightly more complicated. The complication comes from the fact that make-struct needs to be a primitive that accepts an arbitrary number of arguments. All other primitives have a fixed arity so they know if and how many arguments to pop from the stack, whereas for make-struct it depends on the kind of structure being created.
We assume the make-struct primitive is given the appropriate number of arguments. (This is consistent with many other assumptions we make about that well-formedness of programs that are not explicitly checked by the compiler.) Under this assumption, the number of arguments indicates how many values to pop, so we have a special case for make-struct in compile-prim to compute the length and call a separate function for handling the make-struct operation:
;; Op (Listof Expr) CEnv -> Asm (define (compile-prim p es c) (seq (compile-es* es c) (match p ['make-struct (compile-make-struct (length es))] [_ (compile-op p)])))
The compile-make-struct function is defined as follows:
;; Nat -> Asm ;; Emit instructions for creating a structure of length n ;; using values on top of stack (define (compile-make-struct n) (seq (compile-make-struct/a n 1) (Mov rax rbx) (Or rax type-struct) (Add rbx (* 8 n)))) ;; Nat Nat -> Asm ;; Pop elements off stack, writing them to heap (define (compile-make-struct/a n i) (if (= n i) (seq (Mov (Offset rbx (* 8 (- n i))) rax)) (seq (Mov (Offset rbx (* 8 (- n i))) rax) (Pop rax) (compile-make-struct/a n (add1 i)))))
We can now see structures in action:
Examples
> (define (run . p) (unload/free (asm-interp (compile (parse p)))))
> (run '(struct posn (x y)) '(posn? (posn 3 4))) #t
> (run '(struct posn (x y)) '(let ((p (posn 3 4))) (cons (posn-x p) (posn-y p)))) '(3 . 4)
> (run '(struct leaf ()) '(struct node (v l r)) '(define (count bt) (if (leaf? bt) 0 (+ 1 (+ (count (node-l bt)) (count (node-r bt)))))) '(count (node 8 (node 3 (leaf) (leaf)) (leaf)))) 2