2 From OCaml to Racket
Racket = OCaml with uniform syntax and no types (for now)
2.1 Basic values
Let’s start by looking at something you know: OCaml. In OCaml, expressions can include literals for numbers, strings, booleans. Here we are using the OCaml read-eval-print-loop (REPL) to type in examples and evaluate their results:
OCaml REPL
8;; # int = 8 - : "ocaml";; # string = "ocaml" - : true;; # bool = true - : false;; # bool = false - :
When the REPL is given an expression, it is evaluated to a value. In the case of OCaml, it prints both the value and its type. OCaml is able to determine the type of the value before evaluating the expression, thanks to its static type system.
Note that the ;; is not part of the expression syntax, but is a terminator token, signalling to the REPL that the expression is complete and ready to be evaluated.
The Racket REPL operates similarly, but doesn’t require a terminator:
Racket REPL
> 8 8
> "racket" "racket"
> #t #t
> #f #f
OCaml prints out the type of each expression, in addition to its value, while Racket only prints the value. The notation for booleans is slightly different, but both languages agree on numbers, strings, and booleans. OCaml uses a # prompt, while Racket uses >, but these differences are immaterial. The languages are essentially the same so far.
Racket doesn’t print the type because it does not have a static type system like OCaml’s and it has no way of predicting the type of value an expression will produce before it’s evaluated.
2.2 Basic operations
OCaml uses an infix notation for writing operations.
OCaml REPL
1 + 2 * 2;; # int = 5 - :
The order of operations follows the usual mathematical precendence rules (which you must memorize), or you can use parentheses to indicate grouping:
OCaml REPL
1 + (2 * 2);; # int = 5 - : 1 + 2) * 2;; # (int = 6 - :
Extraneous parenthesis are fine:
OCaml REPL
1))) + ((2 * 2));; # (((int = 5 - :
Compared to many languages you may know, including OCaml, Racket employs a uniform, minimalistic concrete syntax based on the concept of parenthesized, prefix notation.
In this notation, parentheses play a much more central role. They are not optional and they signal the form of the expression.
Languages, like people, descend from their ancestors and inherit some of their properties. In the case of notation, Racket inherits the Lisp (and Scheme) notation for programs. It takes a bit of getting used to, but once aclimated, the notation should feel lightweight and consistent; there is very little to memorize when it comes to syntax.
So in Racket, we would write:
Racket REPL
> (+ 1 (* 2 2)) 5
> (* (+ 1 2) 2) 6
Note that there are no precendence rules for addition and multiplication: the form of the expression makes it unambiguous.
Parenthesis indicate function applications, so adding extraneous parens means something different than in OCaml:
Racket REPL
> (1) application: not a procedure;
expected a procedure that can be applied to arguments
given: 1
Here the parens are indicating a function application. The “function” is the first subexpression within the parens, i.e. 1. Of course, 1 isn’t a function and can’t be applied, hence the error.
2.3 Numbers
Integers in OCaml and Racket look pretty similar, but the two languages have differing approaches to numbers overall. In OCaml, the int type can represent only fixed bit-width integers. Hence there is a maximal (and minimal) integer. The variables max_int and min_int are bound to these values, respectively:
OCaml REPL
max_int;; # int = 4611686018427387903 - : min_int;; # int = -4611686018427387904 - :
What happens when you do something like add 1 to max_int? Mess around and find out.
In Racket, integers behave like the integers you learned about in math class. There’s no largest or smallest one:
Racket REPL
> (add1 4611686018427387903) 4611686018427387904
In principle, you can represent arbitrarily large (or small) integers. In practice, you are bounded by the amount of memory available that can be used to represent those integers.
Another difference is that in OCaml, integers are disjoint from floating point numbers. They have different literal syntaxes, different types, and different operations. If you want to add a floating point number and an integer together, you’ll have to explicitly convert one of them.
OCaml REPL
1 +. 3.14;; # type int but an expression was expected of type Error: This expression has float float_of_int 1) +. 3.14;; # (float = 4.14000000000000057 - :
In Racket, operations work on different kinds of numbers and can be used without conversions:
Racket REPL
> (+ 1 3.14) 4.140000000000001
Moreover, Racket has certain kinds of numbers that are not supported (without using libraries) in OCaml. For example, you can write 2/3 to mean the rational number two-thirds in Racket:
Racket REPL
> 2/3 2/3
It’s worth noting that while this may look like division, it’s not: we are writing the literal number 2/3. The division operator, like every other operation, would have to be written using prefix notation with parentheses:
Racket REPL
> (/ 2 3) 2/3
But notice that division produces exact rational results, not a floating point approximation as in OCaml:
OCaml REPL
2. /. 3.;; # float = 0.66666666666666663 - :
It’s also possible to use complex numbers in Racket. The sqrt operation computes the square root of its argument:
Racket REPL
> (sqrt 16) 4
> (sqrt 25) 5
But when given a negative number, it computes a complex result:
Racket REPL
> (sqrt -1) 0+1i
> (sqrt -100) 0+10i
Mostly we will be sticking to using integers and will not taking advantage of the advanced numeric system of Racket, but it’s worth knowing it’s there.
2.4 Functions
OCaml also has a notation for writing functions:
OCaml REPL
fun x y -> x + y;; # int -> int -> int = <fun> - :
This make an anonymous function that consumes two integers and produces their sum.
To apply it, we can write it justapoxed with arguments:
OCaml REPL
fun x y -> x + y) 3 4;; # (int = 7 - :
Note that in OCaml, every function is a function of exactly one argument. Therefore fun x y -> x + y is actuallty shorthand for fun x -> fun y -> x + y.
Applying such a function to fewer than 2 arguments will do a partial function application, which will produce a function that take the remaining arguments:
OCaml REPL
fun x y -> x + y) 3;; # (int -> int = <fun> - :
To encode functions that must be given two arguments, a tuple can be used:
OCaml REPL
fun (x, y) -> x + y;; # int * int -> int = <fun> - :
To apply such a function, it must be given a pair of integers:
OCaml REPL
fun (x, y) -> x + y) (3, 4);; # (int = 7 - :
The use of (x, y) here in the function parameters is actually a pattern. This can be understood as shorthand for:
OCaml REPL
fun p -> match p with (x, y) -> x + y;; # int * int -> int = <fun> - :
So even this function is actually taking a single argument (which must be a pair of numbers).
Racket has a similar notation for writing functions:
Racket REPL
> (λ (x) (λ (y) (+ x y))) #<procedure>
You can also write this without the fancy λ by spelling it lambda:
Racket REPL
> (lambda (x) (lambda (y) (+ x y))) #<procedure>
(In DrRacket, to insert a “λ” press Cmd+\.)
To apply it, it must be written in parens, juxtaposed with arguments:
Racket REPL
> (((λ (x) (λ (y) (+ x y))) 3) 4) 7
Functions in Racket do not always consume a single argument. They can consume 0, 1, or more arguments.
Racket REPL
> (λ (x y) (+ x y)) #<procedure>
This is not a shorthand for the function above it; rather it is a function that expects two arguments:
Racket REPL
> ((λ (x y) (+ x y)) 3 4) 7
Applying a function to the wrong number of arguments will result in an error (and not perform partial function application):
Racket REPL
> ((λ (x y) (+ x y)) 3) arity mismatch;
the expected number of arguments does not match the given
number
expected: 2
given: 1
2.5 Definitions
At the top-level in OCaml, variables can be defined with let and let rec:
OCaml REPL
let x = 3;; # val x : int = 3 let y = 4;; # val y : int = 4 # x + y;;int = 7 - : let rec fact = fun n -> # match n with 0 -> 1 | 1));; | n -> n * (fact (n - val fact : int -> int = <fun> 5;; # fact int = 120 - :
In Racket, variables are defined with the define form:
Racket REPL
> (define x 3) > (define y 4) > (+ x y) 7
> (define fact (λ (n) (match n [0 1] [n (* n (fact (- n 1)))]))) > (fact 5) 120
(Note that the use of square brackets here is stylistic: from Racket’s point of view as long as “parentheses” (e.g. ({[) match, any kind is acceptable.)
In OCaml, function definitions can be written as:
OCaml REPL
let rec fact n = # match n with 0 -> 1 | 1));; | n -> n * (fact (n - val fact : int -> int = <fun>
This is just a shorthand for the definition written above in terms of fun.
Similarly in Racket, function definitions can be written as:
Racket REPL
> (define (fact n) (match n [0 1] [n (* n (fact (- n 1)))]))
which is shorthand for the definition above using λ.
Notice both OCaml and Racket have pattern matching forms, which are quite useful for writing function in terms of a number of "cases." More on this in a minute.
2.6 Lists
OCaml has a built-in list datatype. The empty list is written [] and :: is an operation for “consing” an element on to a list. So to build a list with three integer elements, 1, 2, and 3, you’d write:
OCaml REPL
1 :: 2 :: 3 :: [];; # int list = [1; 2; 3] - :
The notation [1; 2; 3] is just a shorthand for the above.
The notation (list 1 2 3) is shorthand for the above.
There is a slight difference here. For one, OCaml lists must be homogeneous. You can have a list of strings or a list of numbers, but you can’t have a list of strings and numbers.
OCaml REPL
"a"; 3];; # [type int but an expression was expected of type Error: This expression has string
In Racket, there is no such restriction:
Racket REPL
> (list "a" 3) '("a" 3)
Also, in Racket, cons plays the role of both tupling (making pairs) and making lists (making a pair of an element and another list).
So in OCaml, you could make a pair ("a", 3). In Racket, you’d write (cons "a" 3). Note this is a pair and not a proper list. In OCaml, tuples and lists are disjoint things. In Racket, lists and tuples (pairs) are made out of the same stuff.
This can be confusing the first time you encounter it, so let’s go over it a bit more.
In Racket (or any Lisp), cons plays the role of both the pair constructor and the list constructor. Non-empty lists are a subset of pairs: they are pairs whose second component is a list (either the empty list or another pair whose second component is a list, etc.).
You can make pairs out of any kind of element and you can make lists out of any kind of elements.
The functions first and rest operate on non-empty lists, producing the first element of the list and the tail of the list, respectively.
Racket REPL
> (first (cons 3 (cons 4 '()))) 3
> (rest (cons 3 (cons 4 '()))) '(4)
These function will produce errors if given something that is a pair but not a list:
Racket REPL
> (first (cons 3 4)) first: contract violation
expected: (and/c list? (not/c empty?))
given: '(3 . 4)
> (rest (cons 3 4)) rest: contract violation
expected: (and/c list? (not/c empty?))
given: '(3 . 4)
On the other hand, the functions car and cdr access the left and right components of a pair (the names are admittedly awful and an artifact of Lisp history):
Racket REPL
> (car (cons 3 4)) 3
> (cdr (cons 3 4)) 4
When given pairs that are also lists, they behave just like first and rest:
Racket REPL
> (car (cons 3 (cons 4 '()))) 3
> (cdr (cons 3 (cons 4 '()))) '(4)
2.7 Pattern matching
OCaml has a very nice pattern matching for letting you express case analysis and decomposition in a concise way.
Each pattern maching expression has a sub-expression that produce a value to be matched against and a number of clauses. Each clause has a pattern and an expression. The pattern potentially consists of data constructors, variables, and literals. If the value matches the first pattern, meaning the value and the template match up on constructors and literals, then the variables are bound to the correspond parts of the value, and the right-hand side expression is evaluated. If the value doesn’t match, the next pattern is tried, and so on. It’s an error if none of the patterns match.
So for example, we can write a function that recognize even digits as:
OCaml REPL
let even_digit n = # match n with 0 -> true | 2 -> true | 4 -> true | 6 -> true | 8 -> true | false;; | _ -> val even_digit : int -> bool = <fun>
The patterns here, save the last one, are just integer literals. If n is the same as any of these integers, the value true is produced. The last case uses a "wildcard," which matches anything and produces false.
Here’s an example that matches a tuple, binding each part of the tuple to a name and then using those names to construct a different tuple:
OCaml REPL
let swap p = # match p with | (x, y) -> (y, x);;val swap : 'a * 'b -> 'b * 'a = <fun>
Here the pattern uses a data constructor (the tuple constructor). It matches any value that is made with the same constructor.
Here is a recursive function for computing the sum of a list of numbers, defined with pattern matching:
OCaml REPL
let rec sum xs = # match xs with 0 | [] -> | x :: xs -> x + (sum xs);;val sum : int list -> int = <fun> 4; 5; 6];; # sum [int = 15 - :
We can do the same in Racket:
Racket REPL
> (define (even-digit n) (match n [0 #t] [2 #t] [4 #t] [6 #t] [8 #t] [_ #f]))
> (define (swap p) (match p [(cons x y) (cons y x)]))
> (define (sum xs) (match xs ['() 0] [(cons x xs) (+ x (sum xs))])) > (sum (list 4 5 6)) 15
2.8 Structures
OCaml has the ability to declare new datatypes using records and variants. For example, we can define type for binary trees of integers:
OCaml REPL
type bt = # | Leafof int * bt * bt;; | Node type bt = Leaf | Node of int * bt * bt
This declares a new type, named bt. There are two variants of the bt type, each with their own constructor: Leaf and Node. The Leaf constructor takes no arguments, so just writing Leaf creates a (empty) binary tree:
OCaml REPL
# Leaf;; - : bt = Leaf
The Node constructor takes three arguments: an integer and two binary trees. Applying the constructor to a tuple of three things, makes a (non-empty) binary tree:
OCaml REPL
3, Leaf, Leaf);; # Node (3, Leaf, Leaf) - : bt = Node (
Binary trees are an example of a recursive datatype, since one of the variants contains binary trees. This means we can build up arbitrarily large binary trees by nesting nodes within nodes:
OCaml REPL
3, Node (4, Leaf, Leaf), Node (7, Leaf, Leaf));; # Node (3, Node (4, Leaf, Leaf), Node (7, Leaf, Leaf)) - : bt = Node (
Pattern matching is used to do case analysis and deconstruct values. So for example, a function that determines if a binary tree is empty can be written as:
OCaml REPL
let bt_is_empty bt = # match bt with true | Leaf -> false;; | Node _ -> val bt_is_empty : bt -> bool = <fun> # bt_is_empty Leaf;;bool = true - : 3, Leaf, Leaf));; # bt_is_empty (Node (bool = false - :
The patterns use the constructor names to discriminate on which constructor was used for a given binary tree. The use of the wildcard here is just saying it doesn’t matter what’s inside a node; if you’re a node, you’re not empty.
Recursive functions work similarly, but use variables inside patterns to bind names to the binary trees contained inside a node:
OCaml REPL
let rec bt_height bt = # match bt with 0 | Leaf -> | Node (_, left, right) ->1 + max (bt_height left) (bt_height right);; val bt_height : bt -> int = <fun> # bt_height Leaf;;int = 0 - : 4, Node (2, Leaf, Leaf), Leaf));; # bt_height (Node (int = 2 - :
We do something very similar in Racket using structures. A structure type is like a (single) variant of a data type in OCaml: it’s a way of combining several things into one new kind of value.
Racket REPL
> (struct leaf ()) > (struct node (i left right))
This declares two new kinds of values: leaf structures and node structures. For each, we get a constructor, which is a function named after the structure type. The leaf constructor takes no arguments. The node constructor takes 3 arguments.
Racket REPL
> (leaf) (leaf)
> (node 5 (leaf) (leaf)) (node 5 (leaf) (leaf))
> (node 3 (node 2 (leaf) (leaf)) (leaf)) (node 3 (node 2 (leaf) (leaf)) (leaf))
With these structure definitions in place, we can represent binary trees of integers just as in OCaml, and functions that process binary trees look very similar:
Racket REPL
> (define (bt-height bt) (match bt [(leaf) 0] [(node _ left right) (+ 1 (max (bt-height left) (bt-height right)))])) > (bt-height (leaf)) 0
> (bt-height (node 4 (node 2 (leaf) (leaf)) (leaf))) 2
One thing to note here is that in OCaml, the Node and Leaf constructors are part of the bt type. You can’t construct a node that doesn’t conform to the bt type definition and you can’t re-use these constructors in the definition of some other type.
This isn’t the case in Racket. Structures, like pairs and lists, can contain any kind of value. So while it doesn’t conform to our idea of what a binary tree is: (node #t "fred" 9) is a value in Racket.
2.9 Symbols
One of the built-in datatypes we will use often in Racket is that of a symbol. A symbol is just an atomic peice of data. A symbol is written using the quote notation (quote symbol-name), which is abbreviated 'symbol-name. What’s allowable as a symbol name follows the same rules as what’s allowable as a Racket identifier.
Symbols don’t have a whole lot of operations. The main thing you do with symbols is tell them apart from eachother:
Racket REPL
> (equal? 'fred 'fred) #t
> (equal? 'fred 'wilma) #f
It is possible to convert between symbols and strings:
Racket REPL
> (symbol->string 'fred) "fred"
> (string->symbol "fred") 'fred
There’s also a convient function that produces a symbol that is guaranteed to have not been used so far each time you call it:
Racket REPL
> (gensym) 'g3032
> (gensym) 'g3033
> (gensym) 'g3034
You can use pattern matching to match symbols:
Racket REPL
> (define (flintstone? x) (match x ['fred #t] ['wilma #t] ['pebbles #t] [_ #f])) > (flintstone? 'fred) #t
> (flintstone? 'barney) #f
There’s really not a precise analog to symbols in OCaml. (There’s something called a polymorphic variant, which plays a similar role to symbols in OCaml, but it wasn’t covered in CMSC 330.)
2.10 Quote
One of the distinguishing features of languages in the Lisp family (such as Scheme and Racket) is the quote form.
The “tick” character 'd is used as a shorthand for (quote d).
You’ve already seen it show up with symbols: 'x is the symbol x. It also shows up in the notation for the empty list: '().
But you can also write quote around non-empty lists like '(x y z). This makes a list of symbols. It is equivalent to saying (list 'x 'y 'z).
In fact, you can nest lists within the quoted list: '((x) y (q r)). This is equivalent to (list (list 'x) 'y (list 'q 'r)).
So, anything you can write with quoted lists, you can write without quoted lists by pushing the quote inward until reaching a symbol or an empty set of parenthesis.
You can also put strings, booleans, and numbers inside of a quote. As you push the quote inward, it simply disappears when reaching a string, boolean or number. So '5 is just 5. Likewise '#t is #t and '"Fred" is "Fred".
You can also write pairs with quote, which uses the . notation for separating the left and right part of the pair. For example, '(1 . 2) is equivalent to (cons 1 2). If you write something like '(1 2 3 . 4), what you are in effect saying is (cons 1 (cons 2 (cons 3 4))), an improper list that ends in 4.
In essence, quote is a shorthand for conveniently constructing data and is a very concise notation for writing down ad-hoc data. It serves much the same purpose as formats like JSON and XML, except there’s even less noise.
To summarize, with quote, you can construct
strings
booleans
numbers
symbols
and... pairs (or lists) of those things (including this one)
The kind of things you can construct with the quote form are often called s-expressions, short for symbolic expressions.
The reason for this name is because anything you can write down as an expression, you can write down inside a quote to obtain a data representation of that expression. You can render an expression as a symbolic representation of itself.
For example, (+ 1 2) is an expression. When run, it applies the function bound to the variable + to the arguments 1 and 2 and produces 3. On the other hand: '(+ 1 2) constructs a peice of data, namely, a list of three elements. The first element is the symbol +, the second element is 2, the third element is 3.
We will be using (subsets of) s-expressions extensively as our data representation of AST and IR expressions, so it’s important to gain a level of fluency with them now.
2.11 Testing, modules, submodules
We will take testing seriously in this class. Primarily this will take the form of unit tests, for which we will use the rackunit library. To use the library, you must require it.
Here is a simple example:
Racket REPL
> (require rackunit) > (check-equal? (add1 4) 5) > (check-equal? (* 2 3) 7)
--------------------
FAILURE
name: check-equal?
location: eval:63:0
actual: 6
expected: 7
--------------------
The check-equal? function takes two arguments (and an optional third for a message to display should the test fail) and checks that the first argument produces something that is equal? to the expected outcome given as the second argument.
There are many other forms of checks and utilities for building up larger test suites, but check-equal? will get us a long way.
As a matter of coding style, we will place tests nearby the function they are testing and locate them within their own module. Let’s talk about modules for a minute.
In Racket, a module is the basic unit of code organization. Every file is a module whose name is derived from the filename, but you can also write modules without saving them in a file. For example:
Racket REPL
> (module bt racket (provide (all-defined-out)) (struct leaf () #:prefab) (struct node (v l r) #:prefab) (define (bt-height bt) (match bt [(leaf) 0] [(node _ left right) (+ 1 (max (bt-height left) (bt-height right)))])))
This declares a module named bt. It provides a single value named bt-height.
We can require the module from the REPL to gain access to the modules provided values:
Racket REPL
> (require 'bt) > (bt-height (leaf)) 0
We could have also used the #lang racket shorthand for (module bt racket ...) and saved this in a file called bt.rkt. To import from a file in the current directory, you’d write (require "bt.rkt"). But this doesn’t work well in REPL.
For the most part we will organize our programs into single module files using the #lang racket shorthand. But we will place tests within a “sub”-module, i.e. a module nested inside of the module that contains the code it tests. We will use a special form called module+ which declares a submodule that has access to the enclosing module. Moreover, repeated uses of module+ will add content to the submodule. By convention, we will name the testing submodule test.
So here’s a second version of the bt module with unit tests included (and more code). Note the use of all-defined-out to provide everything:
Racket REPL
> (module bt2 racket ; provides everything defined in module (provide (all-defined-out)) (module+ test (require rackunit)) (struct leaf () #:prefab) (struct node (v l r) #:prefab) (define (bt-empty? bt) (match bt [(leaf) #t] [_ #f])) (module+ test (check-equal? (bt-empty? (leaf)) #t) (check-equal? (bt-empty? (node 3 (node 7 (leaf) (leaf)) (node 9 (leaf) (leaf)))) #f)) (define (bt-height bt) (match bt [(leaf) 0] [(node _ left right) (+ 1 (max (bt-height left) (bt-height right)))])) (module+ test (check-equal? (bt-height (leaf)) 0) ; intentionally wrong test: (check-equal? (bt-height (node 3 (leaf) (leaf))) 2)))
Requiring this module with make bt-height, but it will not run the tests:
Racket REPL
> (require 'bt2)
Running the tests only happens when the test submodule is required:
Racket REPL
> (require (submod 'bt2 test))
--------------------
FAILURE
name: check-equal?
location: eval:67:0
actual: 1
expected: 2
--------------------
Putting it all together, we can write the following code and save it in a file called bt.rkt. (You can right-click the file name and save the code to try it out.)
it’s organized in a module,
data type definitions occur at the top of the file,
it uses a test submodule to group unit tests,
tests occur immediately after the functions they test,
functions are annotated with type signatures and short purpose statements, and
indentation follows standard conventions (which DrRacket can apply for you).
From the command line, you can run a module’s tests using the Racket command line testing tool raco test:
raco test bt.rkt
Or simply give a directory name and test everything within that directory:
raco test .