Lexing (Tokenizing): converting a string to a list of tokens
Token: A meaningful string
Typically: keywords, identifiers, numbers, etc.
"The short Wizard" \(\Rightarrow\) [Det; Adj; Noun]
type token = Int of int | Add | Sub | LParen | RParen;;
tokenize "2 + ( 4 - 5)";;
=> [Int(2); Add; LParen; Int(4); Sub; Int(5); RParen]
How to Tokenize?
One way: regular expressions (RE) and boring repetition
(* take a regexp; needs the Str library, e.g. #load "str.cma" in the toplevel *)
let re_num = Str.regexp "[0-9]+" in
let re_add = Str.regexp "\\+" in  (* escaped: + is also the repetition operator *)
let re_sub = Str.regexp "-" in
let rec mklst text =
  if text = "" then [] else
  if Str.string_match re_num text 0 then
    let matched = Str.matched_string text in
    let n = String.length matched in
    (* drop the whole matched number, not just its first character *)
    Int (int_of_string matched) :: mklst (String.sub text n (String.length text - n))
  else if Str.string_match re_add text 0 then
    Add :: mklst (String.sub text 1 (String.length text - 1))
  else if Str.string_match re_sub text 0 then
    Sub :: mklst (String.sub text 1 (String.length text - 1))
  else
    (* no token matched (e.g. whitespace): skip one character *)
    mklst (String.sub text 1 (String.length text - 1)) in
mklst "2 + 3";;
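Another way, sketched here as an alternative and not as the method from the slides: scan the string character by character without Str. The helper names is_digit, go, and stop are made up for this example, and raising on unknown characters is an assumption (the regexp version silently skips them). This version also produces LParen and RParen, which the regexp version does not handle.

```ocaml
(* Same token type as in the notes. *)
type token = Int of int | Add | Sub | LParen | RParen

let is_digit c = '0' <= c && c <= '9'

(* tokenize : string -> token list
   Walks the string by index instead of re-slicing it at every step. *)
let tokenize (text : string) : token list =
  let len = String.length text in
  let rec go i =
    if i >= len then []
    else
      match text.[i] with
      | '+' -> Add :: go (i + 1)
      | '-' -> Sub :: go (i + 1)
      | '(' -> LParen :: go (i + 1)
      | ')' -> RParen :: go (i + 1)
      | ' ' | '\t' | '\n' -> go (i + 1)  (* skip whitespace *)
      | c when is_digit c ->
        (* find the index just past the run of digits starting at i *)
        let rec stop j =
          if j < len && is_digit text.[j] then stop (j + 1) else j in
        let j = stop i in
        Int (int_of_string (String.sub text i (j - i))) :: go j
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0
```

Calling tokenize "2 + ( 4 - 5)" on this sketch yields the token list shown at the start of the lexing notes.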
Parsing: token list to AST
Can check whether the text is grammatically correct
Many types of parsers: we will use recursive descent
Recursive descent parsing (RDP) is top-down; the grammar slides showed bottom-up
Consider the basic grammar for Polish notation
\(E \rightarrow A\vert + A\ E \vert - A\ E\)
\(A \rightarrow 0\vert 1\vert \dots\vert 9\)
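As a worked example, here is one leftmost derivation under this grammar. The input string + 1 - 2 3 is chosen purely for illustration:
\(E \Rightarrow +\ A\ E \Rightarrow +\ 1\ E \Rightarrow +\ 1\ -\ A\ E \Rightarrow +\ 1\ -\ 2\ E \Rightarrow +\ 1\ -\ 2\ A \Rightarrow +\ 1\ -\ 2\ 3\)
Each step expands the leftmost nonterminal, which is the order in which a top-down parser consumes the input.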
let parse_toks tokens =
  (* A -> 0 | 1 | ... | 9 : consume one digit token, return the rest *)
  let parse_num tokens =
    match tokens with
    | Int n :: t when 0 <= n && n <= 9 -> t
    | _ -> failwith "error" in
  (* E -> A | + A E | - A E : return the tokens left after one E *)
  let rec parse_expr tokens =
    match tokens with
    | [] -> failwith "error"
    | Add :: t -> parse_expr (parse_num t)
    | Sub :: t -> parse_expr (parse_num t)
    | _ -> parse_num tokens
  (* the input is valid only if parsing consumes every token *)
  in parse_expr tokens = [];;
Important: knowing which branch you are looking for
Backtracking vs Predictive
Predictive: what's the next symbol? Use it to choose the production
First(nt): the set of terminals that can begin a string derived from the nonterminal nt
Only so good: fails when two productions of the same nonterminal have conflicting (overlapping) First sets
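A minimal sketch of predictive choice for the grammar above, assuming one token of lookahead. The production type and the predict function are names invented for this example. For this grammar, First(A) = {0, ..., 9}, First(+ A E) = {+}, and First(- A E) = {-}, so the sets are disjoint and the next token alone selects the branch.

```ocaml
type token = Int of int | Add | Sub | LParen | RParen

(* Hypothetical labels for the three productions of E. *)
type production = E_to_A | E_to_Add_A_E | E_to_Sub_A_E

(* predict : token list -> production option
   Picks a production by checking which First set contains the lookahead. *)
let predict (tokens : token list) : production option =
  match tokens with
  | Int n :: _ when 0 <= n && n <= 9 -> Some E_to_A        (* First(A) *)
  | Add :: _ -> Some E_to_Add_A_E                          (* First(+ A E) *)
  | Sub :: _ -> Some E_to_Sub_A_E                          (* First(- A E) *)
  | _ -> None  (* lookahead is in no First set: syntax error *)
```

By contrast, a grammar such as \(E \rightarrow A \vert A + E\) has two productions that both begin with First(A), so a single token of lookahead cannot choose between them. That is the kind of conflict where a parser must backtrack or the grammar must be rewritten.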
Converting to AST
Recall a Tree in OCaml
type tree = Leaf | Node of int * tree * tree;;
Node(2,Node(0,Leaf,Leaf),Leaf);;
Modify for Tokens
type expr = Num of int|Plus of expr * expr|Minus of expr * expr;;
Plus (Num 1, Num 2);;
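Putting the pieces together, the recognizer can be changed to return a tree instead of a boolean. This is a sketch under assumptions, not the definitive version from the course: each parsing function returns a pair of the subtree built so far and the tokens not yet consumed, and the wrapper name parse and the error messages are made up for illustration.

```ocaml
type token = Int of int | Add | Sub | LParen | RParen
type expr = Num of int | Plus of expr * expr | Minus of expr * expr

(* A -> 0 | ... | 9 : build a leaf, return it with the remaining tokens *)
let parse_num (tokens : token list) : expr * token list =
  match tokens with
  | Int n :: rest when 0 <= n && n <= 9 -> (Num n, rest)
  | _ -> failwith "expected a digit"

(* E -> A | + A E | - A E *)
let rec parse_expr (tokens : token list) : expr * token list =
  match tokens with
  | Add :: rest ->
    let (a, rest) = parse_num rest in
    let (e, rest) = parse_expr rest in
    (Plus (a, e), rest)
  | Sub :: rest ->
    let (a, rest) = parse_num rest in
    let (e, rest) = parse_expr rest in
    (Minus (a, e), rest)
  | _ -> parse_num tokens

(* The whole input must be consumed for the parse to succeed. *)
let parse (tokens : token list) : expr =
  match parse_expr tokens with
  | (ast, []) -> ast
  | _ -> failwith "trailing tokens"
```

For example, parse [Add; Int 1; Int 2] builds Plus (Num 1, Num 2), the value shown above.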