# Learning Compiler 5

## Note

### The Functionality of the Parser

• Input: sequence of tokens from lexer
• Output: parse tree of the program

### Comparison with Lexical Analysis

Phase Input Output
Lexer String of characters String of tokens
Parser String of token Parse tree

### The Role of the Parser

1. Not all strings of tokens are programs
2. Parser must distinguish between valid and invalid strings of tokens

So we need:

• A language for describing valid strings of tokens
• A method for distinguishing valid from invalid strings of tokens

### Context-Free Grammars(CFG)

• Programming language constructs have recursive structure
• Context-free grammars are a natural notation for the recursive structure

A CFG consists of:

• A set of terminals $T$
• A set of non-terminals $N$
• A start symbol $S$ (a non-terminal)
• A set of productions $X \rightarrow Y_1Y_2\dots Y_n$, where $X \in N$ and $Y_i \in T \cup N \cup \{\epsilon\}$.

### Examples of CFGs

E \rightarrow E * E
| E + E
| (E)
| id

### Key Idea

1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal $X$ in the string by a the right-hand side of some production $X \rightarrow Y_1Y_2\dots Y_n$
3. Repeat (2) until there are no non-terminals in the string

### The Language of a CFG

$X \rightarrow Y_1Y_2\dots Y_n$

means $X$ can be replaced by $Y_1Y_2\dots Y_n$

More formally, write

$X_1\dots X_{i-1}X_i X_{i+1}\dots X_n \rightarrow X_1\dots X_{i-1} Y_1\dots Y_m X_{i+1}\dots X_n$

if there is a production

$X_i \rightarrow Y_1Y_2\dots Y_m$

Write

$X_1\dots X_n \rightarrow *Y_1\dots Y_m$

if

$X_1\dots X_n \rightarrow \dots \rightarrow \dots \rightarrow Y_1\dots Y_m$

in 0 or more steps

Let $G$ be a context-free grammer with start symbol $S$. Then the language of $G$ is

$\{a_1\dots a_n | S \rightarrow *a_1\dots a_n \text{and every} a_i \text{is a terminal}\}$

### Terminals

• Terminals are so-called because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language

### Notes

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”
• We also need a parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFG’s (e.g. bison)
• Form of the grammar is important
• Many grammars generate the same language
• Tools are sensitive to the grammar
(Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice)

### Derivations and Parse Trees

A derivation is a sequence of productions

$S \rightarrow \dots \rightarrow \dots \dots$

A derivation can be drawn as a tree

• Start symbol is the tree’s root
• For a production $X \rightarrow Y_1 \dots Y_n$ add children $Y_1 \dots Y_n$ to node $X$

Letf-most and right-most derivations have the same parse tree, the difference is the order in which branches are added.

### Summary of Derivations

We are not just interested in whether s c L(G)

• We need a parse tree for $s \in L(G)$
• A derivation defines a parse tree
• But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation

### Ambiguity

• A grammar is ambiguous if it has more than one parse tree for some string
• Equivalently, there is more than one right-most or left-most derivation for some string
• Leaves meaning of some programs ill-defined

### Dealing with Ambiguity

• There are several ways to handle ambiguity
• Most direct method is to rewrite grammar unambiguously
• Enforces precedence of * over +

cases:

• Ambiguity in Arithmetic Expressions: E -> E + E | E * E | ( E ) | int

• The Dangling Else: E -> if E then E | if E then E else E | OTHER

• Fix: else matches the closest unmatched then
• No general techniques for handling ambiguity

• Impossible to convert automatically an ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the grammar
• Sometimes allows more natural definitions – We need disambiguation mechanisms

### Precedence and Associativity Declarations

• Instead of rewriting the grammar
• Use the more natural (ambiguous) grammar
• Along with disambiguating declarations
• Most tools allow precedence and associativity declarations to disambiguate grammars
• Examples:
• Left associativity declaration: %left +
• Precedence declarations: %left +, %left *

## Reference

Slides: Introduction to Parsing

Video: Compiler Stanford 2014