Revert to the (unambiguous) natural grammar for our example
E -> T + E | T
T -> int * T | int | (E)

Consider the string
int * int + int
Bottom-up parsing reduces a string to the start symbol by inverting productions:
int * int + int
int * T + int        (T -> int)
T + int              (T -> int * T)
T + T                (T -> int)
T + E                (E -> T)
E                    (E -> T + E)

Read bottom-to-top, the productions used form a derivation of the string.
What's the point? Read backwards, the reductions trace a rightmost derivation, i.e., at each step the rightmost nonterminal is the one being expanded.
Important fact #1 about bottom-up parsing:
Bottom-up parsing traces a rightmost derivation in reverse.
Picture
int  *  int     +     int
         |             |
         T             T
  \______|             |
     T                 E
      \_______________/
              E
Let \alpha\beta\omega be a step of the bottom-up parse, and assume the next reduction is by the production X -> \beta. Then \omega is a string of terminals. Why? Because \alpha X \omega -> \alpha\beta\omega is a step in a rightmost derivation, and in a rightmost derivation everything to the right of the expanded nonterminal is already terminal.
Idea: split the string into two substrings, written \alpha | \omega:
- The right substring \omega (all terminals) has not yet been examined by the parser.
- The left substring \alpha contains terminals and nonterminals; the dividing point is marked by |.
Reduce applies an inverse production at the right end of the left substring: if A -> xy is a production, then
Cbxy | ijk => CbA | ijk
Example: int * int + int
Assume an oracle tells us whether to shift or reduce:
| int * int + int      shift
int | * int + int      shift
int * | int + int      shift
int * int | + int      reduce T -> int
int * T | + int        reduce T -> int * T
T | + int              shift
T + | int              shift
T + int |              reduce T -> int
T + T |                reduce E -> T
T + E |                reduce E -> T + E
E |

The left string can be implemented by a stack, because we only reduce at the suffix of the left string. The top of the stack is the dividing point |.
- Shift: pushes a terminal onto the stack.
- Reduce: pops symbols off the stack (the production's RHS) and pushes a nonterminal onto the stack (the production's LHS).
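The stack discipline above can be replayed in a short sketch. This is not a parser: the oracle's choices are hard-coded as a script of moves, and the encoding of productions as (lhs, rhs) tuples is my own, not from the notes.

```python
def shift_reduce(tokens, moves):
    """Replay a shift-reduce parse. `moves` is the oracle's script:
    either "shift" or a production (lhs, rhs) to reduce by (rhs non-empty)."""
    stack, rest = [], list(tokens)
    for move in moves:
        if move == "shift":
            stack.append(rest.pop(0))          # shift: push a terminal
        else:
            lhs, rhs = move                    # reduce: pop the RHS, push the LHS
            assert tuple(stack[-len(rhs):]) == rhs, "handle must be on top of stack"
            del stack[-len(rhs):]
            stack.append(lhs)
    return stack, rest

# The oracle's script for int * int + int (matching the trace above).
moves = ["shift", "shift", "shift",
         ("T", ("int",)), ("T", ("int", "*", "T")),
         "shift", "shift",
         ("T", ("int",)), ("E", ("T",)), ("E", ("T", "+", "E"))]
stack, rest = shift_reduce(["int", "*", "int", "+", "int"], moves)
# stack is ["E"], rest is []
```

Note that every reduction only ever touches the top of the stack, which is exactly the point of the stack implementation.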
In a given state, more than one action (shift or reduce) may lead to a valid parse.
If it is legal to both shift and reduce, there is a shift-reduce conflict; we need techniques to remove them.
If it is legal to reduce by two different productions, there is a reduce-reduce conflict. These are always bad and usually indicate a serious problem with the grammar.
In either case, the parser does not know what to do, and we either need to rewrite the grammar or give the parser a hint about what to do in this situation.
Example: int * int + int
Assume an oracle tells us whether to shift or reduce:
| int * int + int      shift
int | * int + int
At this point we could reduce by T -> int:
T | * int + int        reduce
But this would be a fatal mistake: there is no way to then reduce to E, because no production RHS begins with T *.
Handles are never to the left of the rightmost nonterminal, and are always at the top of the stack.
Intuition: we want to reduce only if the result can still be reduced to the start symbol (E).
Assume a rightmost derivation:
S ->* \alpha X \omega -> \alpha \beta \omega
Then \beta, in the position right after \alpha, is a handle of \alpha\beta\omega.
Handles formalize the intuition: A handle is a reduction that allows further reductions back to the start symbol.
We only want to reduce at handles.
Note: we have said what a handle is, not how to find handles.
Important fact #2: in shift-reduce parsing, handles appear only at the top of the stack, never inside. Proof sketch, by induction on the number of reduce moves:
- True initially: the stack is empty.
- Immediately after reducing a handle, the rightmost nonterminal is on top of the stack.
- The next handle must be at or to the right of this rightmost nonterminal, because the parse traces a rightmost derivation in reverse.
- Hence a sequence of shift moves reaches the next handle.
Hence shift-reduce parsers can only shift or reduce at the top of the stack. But how do we recognize handles, i.e., decide when to shift and when to reduce?
We use heuristics to identify handles; on certain classes of CFGs, the heuristics are always correct.
Venn diagram: All CFGs \supset Unambiguous CFGs \supset LR(k) CFGs \supset LALR(k) CFGs \supset SLR(k) CFGs \supset LR(0) CFGs
LR(k) grammars are quite general, but most practical grammars are LALR(k). SLR(k) ["simple LR" grammars] are a simplification of LALR(k). There are grammars that are LALR(k) but not SLR(k), and so on.
\alpha is a viable prefix if there is a valid rightmost derivation of the string: S' -> ... -> \alpha\omega -> ... -> string. E.g., \epsilon is a viable prefix.
In other words, \alpha is a viable prefix if there is an \omega such that \alpha | \omega is a state of a shift-reduce parser. Here \alpha is the stack and \omega is the rest of the input. The parser can look ahead into \omega, but it does not know the whole input; it does know the whole stack.
What does this mean? A viable prefix does not extend past the right end of the handle: it is "viable" because it is a prefix of the handle. As long as the parser has viable prefixes on the stack, no parsing error has been detected.
An item is a production with a "." somewhere in its RHS. E.g., the items for T -> (E) are:
T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).
The only item for an \epsilon-production X -> \epsilon is X -> .
Items are often called "LR(0) items".
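Enumerating items is mechanical; a minimal sketch (my own encoding, not from the notes: a production is an LHS plus a tuple of RHS symbols, and the dot is a literal "." symbol):

```python
def items(lhs, rhs):
    """All LR(0) items for the production lhs -> rhs:
    one item per dot position, including before and after the whole RHS."""
    return [(lhs, rhs[:i] + (".",) + rhs[i:]) for i in range(len(rhs) + 1)]

items("T", ("(", "E", ")"))   # 4 items: T -> .(E), (.E), (E.), (E).
items("X", ())                # epsilon production: the single item X -> .
```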
The problem in recognizing viable prefixes is that the stack contains only bits and pieces of the RHSs of productions.
Consider the input (int) for the grammar:
E -> T + E | T
T -> int * T | int | (E)
Then (E | ) is a state of a shift-reduce parse, and (E is a prefix of the RHS of T -> (E).
The item T -> (E.) says that so far we have seen (E of this production and hope to see ).
The stack may have many prefixes of RHSs:
Prefix_1 Prefix_2 Prefix_3 ... Prefix_(n-1) Prefix_n
Let Prefix_i be a prefix of the RHS of X_i -> \alpha_i. Then:
- Prefix_i will eventually reduce to X_i.
- The missing part of \alpha_(i-1) starts with X_i, i.e., there is a production X_(i-1) -> Prefix_(i-1) X_i \beta.
- Likewise there is a production X_(i-2) -> Prefix_(i-2) X_(i-1) \gamma, and so on.
- Recursively, Prefix_(k+1) ... Prefix_(n-1) Prefix_n eventually reduces to the missing part of \alpha_k.
Important fact #3 about bottom-up parsing: for any grammar, the set of viable prefixes is a regular language. This regular language is formed by concatenating zero or more prefixes of production RHSs (items).
For example, the language of viable prefixes for the grammar
S -> \epsilon | [S]
is
"["* | "["* S | "["* S "]"
(\epsilon is covered by "["* with zero brackets).
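That claim can be spot-checked with an ordinary regular expression. A sketch: the pattern below is my transcription of "["* | "["* S | "["* S "]" over single-character symbols.

```python
import re

# Viable prefixes of S -> epsilon | [S], per the regular expression above.
VIABLE = re.compile(r"^\[*(S\]?)?$")

for p in ["", "[", "[[", "[[S", "[[S]"]:
    assert VIABLE.match(p), p        # these do occur as parser stacks
for p in ["]", "][", "[]S", "SS"]:
    assert not VIABLE.match(p), p    # these can never be on the stack
```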
The language of viable prefixes for the grammar
S -> \epsilon | [S] | S.S
is
X = \epsilon | "["* | "["* S | "["* S "]"
Y = "["* S . (X | Y)*
Z = X | Y
Conversely, if a string is parseable by the bottom-up parser, then every state of the stack during the parse is a viable prefix.
Consider the string (int * int):
(int * | int) is a state of a shift-reduce parse.
- ( is a prefix of the RHS of T -> (E)
- \epsilon is a prefix of the RHS of E -> T (an interesting case: \epsilon counts as a prefix too!)
- int * is a prefix of the RHS of T -> int * T
The stack of items
T -> (.E)
E -> .T
T -> int * .T
says that:
- we have seen ( of T -> (E)
- we have seen \epsilon of E -> T
- we have seen int * of T -> int * T
In other words, every viable prefix can be represented as a stack of items, where the (i+1)-th item is an item for a production of the nonterminal that follows the "." in the i-th item.
To recognize viable prefixes, we build an NFA:
- Add a dummy production S' -> S to G. The new start symbol S' appears in only one place: the LHS of this production.
- NFA(stack) = yes if the stack is a viable prefix, no otherwise.
- The states of the NFA are the items of G; the start state is S' -> .S, and every state is accepting.
- For each item E -> \alpha.X\beta, add the transition
  (E -> \alpha.X\beta) --X--> (E -> \alpha X.\beta)
- For each item E -> \alpha.X\beta and production X -> \gamma, add the \epsilon-transition
  (E -> \alpha.X\beta) --\epsilon--> (X -> .\gamma)
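The two transition rules can be written down directly. A sketch, using my own encoding (an item is a (lhs, rhs, dot) triple; None labels \epsilon-moves) and the bracket grammar S -> \epsilon | [S] as the example:

```python
from collections import defaultdict

def item_nfa(prods):
    """Build the viable-prefix NFA's transitions over LR(0) items.
    prods: dict mapping each nonterminal to a list of RHS tuples."""
    trans = defaultdict(set)             # (item, symbol or None) -> set of items
    for lhs, alts in prods.items():
        for rhs in alts:
            for dot, X in enumerate(rhs):
                src = (lhs, rhs, dot)
                # (E -> a.Xb) --X--> (E -> aX.b)
                trans[(src, X)].add((lhs, rhs, dot + 1))
                if X in prods:           # X is a nonterminal: epsilon-moves
                    for gamma in prods[X]:
                        trans[(src, None)].add((X, gamma, 0))
    return trans

PRODS = {"S'": [("S",)], "S": [("[", "S", "]"), ()]}
nfa = item_nfa(PRODS)
start = ("S'", ("S",), 0)                # the item S' -> .S
```

From the start item, the symbol transition on S advances the dot, and the \epsilon-moves fan out to both productions of S with the dot at the left end.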
The item X -> \beta.\gamma is valid for a viable prefix \alpha\beta if
S' ->* \alpha X \omega -> \alpha \beta \gamma \omega
by a rightmost derivation.
After parsing \alpha\beta, the valid items are the possible tops of the stack of items.
An item I is valid for a viable prefix \alpha if the DFA recognizing viable prefixes terminates on input \alpha in a state s containing I.
The items in s describe what the top of the item stack might be after reading input \alpha.
An item is often valid for many prefixes. E.g., the item T -> (.E) is valid for the prefixes
( (( ((( (((( ...
We can see this by looking at the DFA, which loops back into the same state on each open paren.
Need to show the NFA and DFA construction for our example grammar, and the valid items for these string prefixes. Will need a laptop and a projector!
LR(0) parsing: assume the next input is t and the DFA halts on the current stack in state s.
- Reduce by X -> \beta if s contains the item X -> \beta.
- Shift if s contains an item X -> \beta.t\omega, i.e., s has a transition labeled t.
LR(0) has a reduce-reduce conflict if some state contains two reduce items X -> \beta. and Y -> \omega.
LR(0) has a shift-reduce conflict if some state contains a reduce item X -> \beta. and a shift item Y -> \omega.t\delta.
SLR = "Simple LR": improves on the LR(0) shift/reduce heuristics so that fewer states have conflicts.
Idea: assume the next input is t and the DFA halts on the stack in state s.
- Reduce by X -> \beta if s contains the item X -> \beta. and t \in Follow(X) [the only change relative to LR(0)]
- Shift if s contains an item X -> \beta.t\omega
If there are conflicts under these rules, the grammar is not SLR
 In other words, SLR grammars are those where the heuristics detect exactly the handles
 SLR(1) grammars work with the SLR algorithm and one lookahead
 Lots of grammars are not SLR
 including all ambiguous grammars
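The SLR reduce rule needs Follow sets. These come from the standard nullable/First/Follow fixpoint iteration, sketched here for the running expression grammar (the encoding of productions is my own):

```python
def first_follow(prods, start):
    """Fixpoint computation of nullable, First, and Follow sets.
    prods: dict lhs -> list of RHS tuples; a symbol is a terminal
    iff it never appears as a key of prods."""
    nts = set(prods)
    nullable = set()
    first = {A: set() for A in nts}
    follow = {A: set() for A in nts}
    follow[start].add("$")                       # end-of-input follows the start
    changed = True
    while changed:
        changed = False
        for A, alts in prods.items():
            for rhs in alts:
                # A is nullable if some RHS is entirely nullable
                if all(s in nullable for s in rhs) and A not in nullable:
                    nullable.add(A); changed = True
                # First(A): union First of the leading nullable run of the RHS
                for s in rhs:
                    add = first[s] if s in nts else {s}
                    if not add <= first[A]:
                        first[A] |= add; changed = True
                    if s not in nullable:
                        break
                # Follow(B): First of the suffix after B, plus Follow(A)
                # when that suffix is nullable
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    suffix_nullable = True
                    for s in rhs[i + 1:]:
                        add = first[s] if s in nts else {s}
                        if not add <= follow[B]:
                            follow[B] |= add; changed = True
                        if s not in nullable:
                            suffix_nullable = False
                            break
                    if suffix_nullable and not follow[A] <= follow[B]:
                        follow[B] |= follow[A]; changed = True
    return nullable, first, follow

PRODS = {"S'": [("E",)],
         "E": [("T", "+", "E"), ("T",)],
         "T": [("int", "*", "T"), ("int",), ("(", "E", ")")]}
nullable, first, follow = first_follow(PRODS, "S'")
# follow["T"] is {"+", "$", ")"}; follow["E"] is {"$", ")"}
```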
 We can parse more grammars by using precedence declarations
 Instructions for resolving conflicts
 Consider the ambiguous grammar:
E -> E + E | E * E | (E) | int
The DFA for this grammar contains a state with the following items:
E -> E * E.
E -> E . + E
- a shift/reduce conflict!
Declaring "* has higher precedence than +" resolves this conflict in favor of reducing
 The term "precedence declaration" is misleading
 These declarations do not define precedence; they define conflict resolutions
 Not quite the same thing!
 Tools allow you to print out the parsing automaton, and you will be able to see the conflict resolution in the automaton (recommended)
SLR Parsing algorithm
- Let M be the DFA for viable prefixes of G
- Let |x1...xn$ be the initial configuration
- Repeat until the configuration is S|$
  - Let \alpha|\omega be the current configuration
  - Run M on the current stack \alpha
  - If M rejects \alpha, report a parsing error
    - the stack \alpha is not a viable prefix
  - If M accepts \alpha with items I, let a be the next input symbol
    - Shift if X -> \beta.a\gamma \in I
    - Reduce if X -> \beta. \in I and a \in Follow(X)
    - Report a parsing error if neither applies
If there is a conflict in the last step, the grammar is not SLR(k), where k is the amount of lookahead (in practice, k = 1).
SLR Improvements
- Rerunning the viable-prefix automaton on the stack at each step is wasteful
  - Most of the work is repeated
- Remember the state of the automaton on each prefix of the stack
- Change the stack to contain pairs <Symbol, DFA state>
- For a stack <sym1, state1> ... <symn, staten>, staten is the final state of the DFA on sym1...symn
- Detail: the bottom of the stack is <any, start>, where
  - any is any dummy symbol
  - start is the start state of the DFA
- Define goto[i, A] = j if state_i --A--> state_j
- goto is just the transition function of the DFA
Shift x
- Push <a, x> on the stack, where a is the current input and x is a DFA state
Reduce X -> \alpha
- As before, popping one pair per symbol of \alpha
Accept
Error
Action table: for each state s_i and terminal a
- If s_i has item X -> \alpha.a\beta and goto[i, a] = j, then action[i, a] = shift j
- If s_i has item X -> \alpha. and a \in Follow(X) and X != S', then action[i, a] = reduce X -> \alpha
- If s_i has item S' -> S., then action[i, $] = accept
- Otherwise, action[i, a] = error
The improved SLR parsing algorithm
- Let I = w$ be the initial input
- Let j = 0
- Let DFA state 1 have item S' -> .S
- Let stack = <dummy, 1>
- repeat
    case action[top_state(stack), I[j]] of
      shift k: push <I[j++], k>
      reduce X -> A:
        pop |A| pairs
        push <X, goto[top_state(stack), X]>
      accept: halt normally
      error: halt and report error
 Note that the algorithm uses only the DFA states and the input
 The stack symbols are never used!
 However, we still need the symbols for semantic actions (e.g., code generation)
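The loop above is easy to make concrete once the tables exist. A sketch with hand-written action and goto tables for the small grammar S' -> S, S -> Sa | b (the first SLR example later in these notes; state numbers follow that analysis, everything else is my own encoding):

```python
def slr_parse(tokens):
    """Table-driven SLR driver for S' -> S, S -> Sa | b.
    States: 1 = {S'->.S, S->.Sa, S->.b}, 2 = {S->b.},
            3 = {S'->S., S->S.a}, 4 = {S->Sa.}."""
    action = {
        (1, "b"): ("shift", 2),
        (2, "a"): ("reduce", "S", 1),   # S -> b, on a in Follow(S)
        (2, "$"): ("reduce", "S", 1),   # S -> b, on $ in Follow(S)
        (3, "a"): ("shift", 4),
        (3, "$"): ("accept",),
        (4, "a"): ("reduce", "S", 2),   # S -> Sa
        (4, "$"): ("reduce", "S", 2),
    }
    goto = {(1, "S"): 3}
    stack = [("dummy", 1)]              # bottom of stack: <any, start>
    inp = list(tokens) + ["$"]
    j = 0
    while True:
        state = stack[-1][1]
        act = action.get((state, inp[j]), ("error",))
        if act[0] == "shift":
            stack.append((inp[j], act[1])); j += 1
        elif act[0] == "reduce":
            _, lhs, n = act             # pop n pairs, push LHS with goto state
            del stack[len(stack) - n:]
            stack.append((lhs, goto[(stack[-1][1], lhs)]))
        elif act[0] == "accept":
            return True
        else:
            return False

slr_parse("baa")   # a valid sentence ((b a) a)
slr_parse("a")     # rejected: no action in state 1 on a
```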
- Some common constructs are not SLR(1)
- LR(1) is more powerful
  - Build lookahead into the items (fine-grained disambiguation per item, rather than just checking the follow set)
  - An LR(1) item is a pair: an LR(0) item plus a lookahead token
  - [T -> .int * T, $] means: after seeing T -> int * T, reduce if the lookahead is $
  - More accurate than using follow sets alone
- Take a look at the LALR(1) automaton for your parser! (it uses these items, with a slight optimization)
- The LALR automaton consists of states of LR(1) items, with the added optimization that states differing only in their lookaheads are merged. If the resulting states (after this merging) have no conflicts, then the grammar is LALR as well.
SLR Examples
S' -> S
S -> Sa
S -> b
SLR parsers do not mind left-recursive grammars.
The first state in the corresponding DFA (the \epsilon-closure of the NFA start state) is: (state 1)
S' -> .S
S -> .Sa
S -> .b
If we see a b in this state, we reach another DFA state: (state 2)
S -> b.
Alternatively, if we see an S in this state, we reach: (state 3)
S' -> S.
S -> S.a
From this state, if we see an a, we reach: (state 4)
S -> Sa.
The only state with a potential shift-reduce conflict is state 3. But Follow(S') = {$}: we reduce (accept) only on $, and shift only on a, so one token of lookahead resolves it. Hence this is an SLR grammar.
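The four states above can be computed mechanically: take the \epsilon-closure of S' -> .S, then repeatedly move the dot over each grammar symbol (subset construction on the item NFA). A sketch, reusing a (lhs, rhs, dot) item encoding that is my own, not from the notes:

```python
def closure(items, prods):
    """Epsilon-closure of a set of LR(0) items (lhs, rhs, dot)."""
    out = set(items)
    while True:
        new = {(X, gamma, 0)
               for (lhs, rhs, dot) in out
               if dot < len(rhs) and rhs[dot] in prods
               for X in [rhs[dot]]
               for gamma in prods[X]}
        if new <= out:
            return frozenset(out)
        out |= new

def goto(state, X, prods):
    """DFA transition: advance the dot over X, then take the closure."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == X}
    return closure(moved, prods) if moved else None

def lr0_states(prods, start_item):
    """All DFA states reachable from closure({start_item})."""
    start = closure({start_item}, prods)
    states, work = [start], [start]
    symbols = {s for alts in prods.values() for rhs in alts for s in rhs}
    while work:
        st = work.pop()
        for X in symbols:
            nxt = goto(st, X, prods)
            if nxt is not None and nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states

PRODS = {"S'": [("S",)], "S": [("S", "a"), ("b",)]}
STATES = lr0_states(PRODS, ("S'", ("S",), 0))
# 4 states, matching states 1-4 above
```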
Another example grammar
S' -> S
S -> SaS
S -> b
Looking at the corresponding DFA: (state 1)
S' -> .S
S -> .SaS
S -> .b
One possibility is that we see a b in this state, reaching: (state 2)
S -> b.
Another possibility is that we see an S, reaching: (state 3)
S' -> S.
S -> S.aS
If we see an a in this state, we reach: (state 4)
S -> Sa.S
S -> .SaS
S -> .b
(notice that we formed an \epsilon-closure of the first item to add the last two items)
From here, if we see an S, we reach: (state 5)
S -> SaS.
S -> S.aS
From here, if we see an a again, we go back to state 4! If we see a b, we go to state 2.
The only states with conflicts are state 3 (resolved, since Follow(S') = {$}) and state 5 (a genuine shift-reduce conflict, because a \in Follow(S)).
Thus this is not an SLR(1) grammar.