Learning Compiler 5
Note
The Functionality of the Parser
- Input: sequence of tokens from lexer
- Output: parse tree of the program
Comparison with Lexical Analysis
Phase | Input | Output |
---|---|---|
Lexer | String of characters | String of tokens |
Parser | String of token | Parse tree |
The Role of the Parser
- Not all strings of tokens are programs
- Parser must distinguish between valid and invalid strings of tokens
So we need:
- A language for describing valid strings of tokens
- A method for distinguishing valid from invalid strings of tokens
Context-Free Grammars(CFG)
- Programming language constructs have recursive structure
- Context-free grammars are a natural notation for the recursive structure
A CFG consists of:
- A set of terminals $T$
- A set of non-terminals $N$
- A start symbol $S$ (a non-terminal)
- A set of productions $X \rightarrow Y_1Y_2\dots Y_n$, where $X \in N$ and $Y_i \in T \cup N \cup \{\epsilon\}$.
Examples of CFGs
E \rightarrow E * E
| E + E
| (E)
| id
Key Idea
- Begin with a string consisting of the start symbol “S”
- Replace any non-terminal $X$ in the string by a the right-hand side of some production $X \rightarrow Y_1Y_2\dots Y_n$
- Repeat (2) until there are no non-terminals in the string
The Language of a CFG
Read productions as rules:
$X \rightarrow Y_1Y_2\dots Y_n$means $X$ can be replaced by $Y_1Y_2\dots Y_n$
More formally, write
$X_1\dots X_{i-1}X_i X_{i+1}\dots X_n \rightarrow X_1\dots X_{i-1} Y_1\dots Y_m X_{i+1}\dots X_n$if there is a production
$X_i \rightarrow Y_1Y_2\dots Y_m$Write
$X_1\dots X_n \rightarrow *Y_1\dots Y_m$if
$X_1\dots X_n \rightarrow \dots \rightarrow \dots \rightarrow Y_1\dots Y_m$in 0 or more steps
Let $G$ be a context-free grammer with start symbol $S$. Then the language of $G$ is
$\{a_1\dots a_n | S \rightarrow *a_1\dots a_n \text{and every} a_i \text{is a terminal}\}$Terminals
- Terminals are so-called because there are no rules for replacing them
- Once generated, terminals are permanent
- Terminals ought to be tokens of the language
Notes
The idea of a CFG is a big step. But:
- Membership in a language is “yes” or “no”
- We also need a parse tree of the input
- Must handle errors gracefully
- Need an implementation of CFG’s (e.g. bison)
- Form of the grammar is important
- Many grammars generate the same language
- Tools are sensitive to the grammar
(Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice)
Derivations and Parse Trees
A derivation is a sequence of productions
$S \rightarrow \dots \rightarrow \dots \dots$A derivation can be drawn as a tree
- Start symbol is the tree’s root
- For a production $X \rightarrow Y_1 \dots Y_n$ add children $Y_1 \dots Y_n$ to node $X$
Letf-most and right-most derivations have the same parse tree, the difference is the order in which branches are added.
Summary of Derivations
We are not just interested in whether s c L(G)
- We need a parse tree for $s \in L(G)$
- A derivation defines a parse tree
- But one parse tree may have many derivations
- Left-most and right-most derivations are important in parser implementation
Ambiguity
- A grammar is ambiguous if it has more than one parse tree for some string
- Equivalently, there is more than one right-most or left-most derivation for some string
- Ambiguity is BAD
- Leaves meaning of some programs ill-defined
Dealing with Ambiguity
- There are several ways to handle ambiguity
- Most direct method is to rewrite grammar unambiguously
- Enforces precedence of * over +
cases:
Ambiguity in Arithmetic Expressions: E -> E + E | E * E | ( E ) | int
The Dangling Else: E -> if E then E | if E then E else E | OTHER
- Fix: else matches the closest unmatched then
No general techniques for handling ambiguity
Impossible to convert automatically an ambiguous grammar to an unambiguous one
- Used with care, ambiguity can simplify the grammar
- Sometimes allows more natural definitions – We need disambiguation mechanisms
Precedence and Associativity Declarations
- Instead of rewriting the grammar
- Use the more natural (ambiguous) grammar
- Along with disambiguating declarations
- Most tools allow precedence and associativity declarations to disambiguate grammars
- Examples:
- Left associativity declaration: %left +
- Precedence declarations: %left +, %left *
- Examples:
Reference
Slides: Introduction to Parsing
Video: Compiler Stanford 2014