Assignment 3: Conditional forms and parsing
Due: Tues, Sept 17, 11:59PM
The goal of this assignment is to extend a compiler with some simple unary numeric operations and conditional expressions, and to write a parser.
You are given a repository with a starter compiler for the Con language we studied in class. You are tasked with:
extending the language in a number of ways and
implementing a parser for the language
More primitives
Add the following forms of expression to the Con language:
Here’s one possible way to compute the absolute value of the value in rax:
mov rbx, rax |
neg rax |
cmovl rax, rbx |
Update ast.rkt to include these new forms of expression.
Update syntax.rkt to recognize s-expressions that represent valid programs.
Update interp.rkt to correctly interpret these expressions.
Update compile.rkt to correctly compile these expressions.
The neg and cmovl instructions have been included in the given asm code. If you need other x86 instructions, you will need to modify the asm/* code.
Con with Cond
The Con language we studied added a simple form of performing conditional evaluation of sub-expressions:
However, in the original paper on Lisp, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, John McCarthy introduced a generalization of if called “conditional expressions,” which we could add to our language with the following syntax:
(cond [(zero? e-p1) e-a1] ... [else e-an])
A cond expression has any number of clauses [(zero? e-pi) e-ai] ..., followed by an “else” clause [else en]. (We are using zero? here to avoid having to introduce booleans as a distinct data type; a restriction we will remove later.)
The meaning of a cond expression is computed by evaluating each (zero? e-pi) in order until the first one that is true is found, in which case, the corresponding e-ai is evaluated and its value is the value of the cond expression. If no such e-pi exists, e-an’s value is the value of the cond.
The formal semantics can be defined as:
Your task is to extend Con with this (restricted) form of cond.
To do this, you should:
Update ast.rkt to include cond expressions.
Update syntax.rkt to recognize s-expressions that represent valid programs.
Update interp.rkt to correctly interpret cond expressions.
Update compile.rkt to correctly compile cond expressions.
The subset of x86 needed to compile this extension of Con should not require anything more than what was used for Con without cond, so you should not need make changes to asm/*.
Reading is Overrated
We have so far side-stepped the issue of parsing by (1) relying on s-expression notation for the concrete syntax of programs and (2) using the built-in read function for parsing s-expressions.
Your task is to design and implement a parser for the extended Con language based on the following grammar:
<expr> ::= integer |
| boolean |
| ( <compound> ) |
| [ <compound> ] |
|
<compound> ::= <prim> <expr> |
| if <question> <expr> <expr> |
| cond <clause>* <else> |
|
<prim> ::= add1 | sub1 | abs | - |
|
<clause> ::= ( <question> <expr> ) |
| [ <question> <expr> ] |
|
<question> ::= ( zero? <expr> ) |
| [ zero? <expr> ] |
|
<else> ::= ( else <expr> ) |
| [ else <expr> ] |
There is a lexer given to you in lex.rkt, which provides two functions: lex-string and lex-port, which consume a string or an input port, respectively, and produce a list of tokens, which are defined as:
; type Token = ; | Integer ; | Boolean ; | 'add1 ; | 'sub1 ; | 'zero? ; | 'if ; | 'cond ; | 'else ; | 'abs ; | '- ; | 'lparen ;; ( ; | 'rparen ;; ) ; | 'lsquare ;; [ ; | 'rsquare ;; ] ; | 'eof ;; end of file
The lexer will take care of reading the #lang racket header and remove any whitespace.
You must complete the code in parse.rkt to implement the parser which constructs an s-expression representing a valid (extended) Con expression, if possible, from a list of tokens. The parse function should have the following signature and must be provided by the module:
; parse : [Listof Token] -> Expr
As an example, parse should produce '(add1 (sub1 7)) if given '(lparen add1 lparen sub1 7 rparen rparen eof).
You should not need to make any changes to lex.rkt.
You may use any approach you’d like to write the parser, but following the recursive descent predictive parsing as studied in CMSC 330 is recommended. See the slides if you need a refresher.
If you want to set things up as done in 330, you can do the following:
(define *input* (box '())) ; [Listof Token] -> Expr (define (parse lot) (set-box! *input* lot) (let ((e (parse-expr!)) (_ (match-tok! 'eof))) e)) ; -> Expr ; EFFECT: consume one expression's worth of tokens (define (parse-expr!) (match (look-ahead) [... ...])) ; -> Token ; Produce (but don't consume) the next token (define (look-ahead) (match (unbox *input*) ['() (error "no look ahead available")] [(cons t _) t])) ; Token -> Token ; EFFECT: consumes one token of input (define (match-tok! t) (match (unbox *input*) ['() (error "no token available")] [(cons next ts) (set-box! *input* ts) (unless (equal? t next) (error "parse error")) t]))
The box, unbox, and set-box! functions correspond to OCaml’s ref, !, and := operators, respectively.
The bang! naming convention is a Scheme convention for marking effectful functions (but it’s just a naming convention).
This construction closely follows the 330 notes.
There is one complication, which is that the grammar requires 2 tokens of look-ahead when parsing a cond in order to determine if the next thing to parse is a <clause> or an <else>.
The simplest solution is just to add a look-ahead2 function that let’s you peek at the second token in the input stream.
As an alternative to the 330 design, you could try to do things functionally with the following set-up:
; [Listof Token] -> Expr (define (parse lot) (match (parse-expr lot) [(cons '(eof) e) e] [_ (error "parse error")])) ; [Listof Token] -> (Pairof [Listof Token] Expr) (define (parse-expr lot) (match lot [... ...]))
Here the idea is that each function that corresponds to a non-terminal is given the list of tokens to parse. It produces a pair of things: the remaining tokens after parsing and the thing it parsed. (The functional approach is much easier to test, IMO.)
Once your parser is complete, you can make the noted changes in compile-file.rkt and interp-file.rkt to make use of your own parser and remove the dependence on Racket’s read function.
Testing
You can test your code in several ways:
Using the command line raco test . from the directory containing the repository to test everything.
Using the command line raco test <file> to test only <file>.
- Pushing to github. You can see test reports at:
(You will need to be signed in in order see results for your private repo.)
Note that only a small number of tests are given to you, so you should write additional test cases.
Ther random.rkt module provides a random-expr function for generating random (extended) Con expressions. It is used in the test/compile-rand.rkt file to randomly test compiler correctness.
There is a property-based random tester for the compiler in test/compile-rand.rkt that compiles and runs 500 random programs.
Submitting
Pushing your local repository to github “submits” your work. We will grade the latest submission that occurs before the deadline.