Assignment 3: Assignment, ANF, and CPS conversion (project)
Your task on assignment 3 is broken up into three distinct phases. Each of these phases will be tested separately and worth a similar number of points.
The first phase runs the output from assignment 2 through two functions: assignment-convert and alphatize. The first removes set! from the language by boxing all mutable local variables (using make-vector, vector-set!, and vector-ref prims). The second function (alphatize) makes all names unique to a single binding point (this removes any shadowing by making names distinct) and need not alter the grammar.
Input language (output from assignment 2; satisfies ir-exp?):
e ::= (let ([x e] ...) e) |
| (lambda (x ...) e) |
| (lambda x e) |
| (apply e e) |
| (e e ...) |
| (prim op e ...) |
| (apply-prim op e) |
| (if e e e) |
| (set! x e) |
| (call/cc e) |
| x |
| (quote dat) |
dat is a datum satisfying datum? from utils.rkt |
x is a variable (satisfies symbol?) |
op is a symbol satisfying prim? from utils.rkt (if not otherwise in scope) |
op ::= promise? | null? | cons | car | + | ... (see utils.rkt) |
Language after assignment-convert (and alphatize):
e ::= (let ([x e] ...) e) |
| (lambda (x ...) e) |
| (lambda x e) |
| (apply e e) |
| (e e ...) |
| (prim op e ...) |
| (apply-prim op e) |
| (if e e e) |
| (call/cc e) |
| x |
| (quote dat) |
The second phase converts to administrative normal form using a function anf-convert. This gives an explicit order of evaluation to evaluation of any subexpressions by administratively let-binding them to a temporary name. Let forms with multiple right-hand sides are also flattened into multiple let forms.
This partitions the grammar into atomic expressions (ae), which may be evaluated immediately (datums, variables, and lambdas), and complex expressions (e), which are not known trivially to terminate. After this phase, only if and let contain more than one true subexpression (if contains two complex expressoins in tail position and let forms extend the stack); other forms such as primitive operations and applications now contain only atomic sub-expressions.
Language after ANF conversion:
e ::= (let ([x e]) e) |
| (apply ae ae) |
| (ae ae ...) |
| (prim op ae ...) |
| (apply-prim op ae) |
| (if ae e e) |
| (call/cc ae) |
| ae |
ae ::= (lambda (x ...) e) |
| (lambda x e) |
| x |
| (quote dat) |
The third phase calls a function cps-convert that will convert to continuation-passing style. This means a continuation is explicitly passed (as stacks were in our CEK interpreter) and no function call ever returns, instead the current continuation is invoked at return points. At this phase, call/cc can effectively be removed. We leave prim and apply-prim as special forms that can (and now must) be on the right-hand side of a let because they need not extend the continuation.
Language after CPS conversion:
e ::= (let ([x (apply-prim op ae)]) e) |
| (let ([x (prim op ae ...)]) e) |
| (let ([x (lambda (x ...) e)]) e) |
| (let ([x (lambda x e)]) e) |
| (let ([x (quote dat)]) e) |
| (apply ae ae) |
| (ae ae ...) |
| (if ae e e) |
ae ::= (lambda (x ...) e) |
| (lambda x e) |
| x |
| (quote dat) |
The assignment 3 starter contains a file utils.rkt which provides useful functions and a specification for the input and output languages in the form of predicates ir-exp?, alphatized-exp?, anf-exp?, and cps-exp?. You can also use the function eval-ir to evaluate programs that are in any of these languages.
Your job is to write your own set of unit tests and define translation functions assignment-convert, alphatize, anf-convert, cps-convert provided in a file cps.rkt, that (put together) simplify programs satisfying ir-exp? into programs satisfying cps-exp? that are (if they don’t error) semantically equivalent according to eval-ir.
Tests (files in tests/*/) that end in .ir, .alpha, and .anf test specific phases for assignment conversion & alphatization, ANF conversion, and CPS conversion, respectively. Tests that end in .scm will be desugared and then run through each phase in turn (and must pass all three phases).
Some unsorted advice for assignment 3:
For assignment conversion, you might write a helper function that returns a set of all mutated variables (i.e., all variables x for which there exists a subexpression (set! x e)). Then, you can use the translation shown in the SSA set of slides. Mutable variable references turn into a use of vector-ref, set! turns into a use of vector-set! and make-vector is used to initialize these "boxes" (single-cell vectors that store mutable values). Pay special attention to how you initialize a mutable variable x in an expression like (lambda (w x y z) ...).
Your alphatize function can pass down a substitution environment mapping old names to new names. When a symbol is encountered, use hash-ref to return its new name. When a lambda or let form is encountered, generate a new name (or list of names) and extend the environment used to alphatize subexpressions. It’s perfectly fine to rename all variables in the program to avoid tracking which variables are actually shadowed.
[New] The cps-convert output grammar has been expanded to allow datums and lambdas to be let-bound, even though they are already atomic expressions. This won’t stop any already-working code but is more permissive in a potentially useful way.
Extra credit opportunities: There are a few extra-credit tests which check the efficiency of the above transformations. Any code that is syntactically and semantically valid will be accepted by non-extra-credit tests, regardless of output efficiency (within reason).