Lecture 10 Syntax-Directed Translation grammar disambiguation, Earley parser, syntax-directed translation Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2012 UC Berkeley 1

Hidden slides This slide deck contains hidden slides that may help in studying the material. These slides show up in the exported pdf file but when you view the ppt file in Slide Show mode. 2 Today Ambiguous grammars pros and cons Grammar disambiguation select desired parse trees without rewriting the grammar Grammar rewriting remove undesirable parse trees by changing the gramamar

Syntax-directed translation its an evaluation / rewrite of the parse tree 3 Ambiguous Grammars One parse tree only! The role of the grammar distinguish between syntactically legal and illegal programs But thats not enough: it must also define a parse tree the parse tree conveys the meaning of the program associativity: left or right precedence: * before +

What if a string is parseable with multiple 5 How many parse trees are in this CYK graph? grammar: E id | E + E | E * E input: id+id*id E11 E9 * E8 E11 E6 + E10 E9 E 6 + E7 E6 id1 id1 ambiguous E10 E7 * E8 E8 id5

E7 id3 + id3 * id5 6 We can see the two trees better here E11 E9 * E8 E9 E 6 + E7 E6 id1 id1 E8 id5 E7 id3

+ id3 * id5 E10 E7 * E8 E11 E6 + E10 7 Ambiguity (Cont.) Ambiguity is bad Leaves meaning of some programs ill-defined Ambiguity is common in programming languages Arithmetic expressions

if E then if E then E else E 9 Ambiguity in expression grammars Grammar E E + E | E * E | ( E ) | int Strings int + int + int int * int + int 10 Ambiguity. Example This string has two parse trees E

E int E E + E E + E in t in E + E t

int int int + is left-associative + E 11 Ambiguity. Example This string has two parse trees E E int E

E + E E * E in t in E + E t int int int

* E * has higher precedence than + 12 Dealing with Ambiguity No general (automatic) way to handle ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one (we must state which tree desired) Used with care, ambiguity can simplify the grammar Sometimes allows more natural definitions We need disambiguation mechanisms

There are two ways to remove ambiguity: 1) Declare to the parser which productions to prefer works on most but not all ambiguities 13 Summary G is ambiguous iff there is any s in L(G) such that s has multiple parse trees In expression grammars (such as E+E | E*E), ambiguity shows up as - precedence: does + or * have higher priority? - associaitivity: is 1-2-3 parsed as (1-2)-3 or 1-(23)? 14

Test yourself This question will come up in PA5. Work out the CYK graph for the input id+id*id+id Notice there are multiple ambiguous edges ie, edges inserted by/due to multiple productions How many edges are in the CYK graph? - polynomial? N2 or N3? Exponential? How many parse trees are represented by CYK graph? 15 Disambiguation with precedence

and associativity declarations 16 Precedence and Associativity Declarations Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Bottom-up parsers like CYK and Earley allow declaration to disambiguate grammars you will implement those in PA5 Examples 17 Associativity Declarations Consider the grammar

E E + E | int Ambiguous: two parse trees of int + int + int E E E int + + E int E E int E +

int E int E + E int Left-associativity declaration: %left + 18 Precedence Declarations Consider the grammar E E + E | E * E | int And the string int + int * int E E E int

* + E E int E E + E int E * int

Precedence declarations: int E int %left + %left * Implementing disambiguation declarations To disambiguate, we need to answer these questions: Assume we reduced the input to E+E*E. Now do we want parse tree (E+E)*E or E+ (E*E)?

Similarly, given E+E+E, do we want parse tree (E+E)+E or E+(E+E)? 21 Example id1 + id3 * id5 22 22 Implementing the declarations in CYK/

Earley precedence declarations when multiple productions compete for being a child in the parse tree, select the one with least precedence left associativity when multiple productions compete for being a child in the parse tree, select the one with largest left subtree Where is ambiguity manifested in CYK? for i=0,N-1 do enqueue( (i,i+1,input[i]) ) -- create terminal edges while queue not empty do (j,k,B)=dequeue() for each edge (i,j,A) do -- for each edge left-adjacent to

(j,k,B) for each rule TAB do if edge (i,k,T) does not exists then add (i,k,T) to graph enqueue( (i,k,T) ) else -- Edge (i,k,T) already exists, hence potential ambiguity: -- Edges (i,j,A)(j,k,B) may be another way to reduce to (i,k,T). -- That is, they may be the desired child of (i,k,T) 26 in the parse tree. %dprec: another declaration Another disambiguating declaration (see bison) E if E then E

| if E then E | OTHER else E % dprec 1 % dprec 2 Without %dprec, wed have to rewrite the grammar: E MIF | UIF -- all then are matched -- some then are unmatched

MIF if E then MIF else MIF | OTHER UIF if E then E | if E then MIF else UIF 28 Rewriting Rewrite the grammar into a unambiguous grammar While describing the same language and eliminating undesirable prase trees Example: Rewrite E E + E | E * E | ( E ) | int

into EE+T | T T T * int | int | ( E ) Draw a few parse trees and you will see that new grammar enforces precedence of * over + enforces left-associativity of + and * 31 Parse tree with the new grammar The int * int + int has ony one parse tree now E E + T

T * int E T E in t in E + E t int int *

E int 32 Syntax-directed translation evaluate the parse (to produce a value, AST, ) 40 Example grammar in CS164 E -> E '+' T | T ; T -> T '*' F | F ; F -> /[0-9]+/

| '(' E ')' ; 41 Build a parse tree for 10+2*3, and evaluate 42 Same SDT in the notation of the cs164 parser Syntax-directed translation for evaluating an expression %% E -> %} | ;

T -> }% | ; F -> E '+' T %{ return n1.val + n3.val T %{ return n1.val %} T '*' F %{ return n1.val * n3.val F

/[0-9]+/ %{ return int(n1.val) }% 43 Build AST for a regular expression %ignore /\n+/ %% // A regular expression grammar in the 164 parser R -> 'a' | R '*' | R R n2.val) %} | R '|' R n3.val) %} | '(' R ')' ;

%{ return n1.val %} %{ return ('*', n1.val) %} %{ return ('.', n1.val, %{ return ('|', n1.val, %{ return n2.val %} 44