### Lazy code motion

Loop invariant operations

for (i = 0; i < n; i++) {
A[i] = m + n;
}

to (loop invariant code motion)
t = m + n;
for (i = 0; i < n; i++) {
A[i] = t;
}


Partial Redundancy Elimination (aka lazy code motion)

Subsumes

• Common subexpression elimination
• Loop invariant code motion
• plus new capability
BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4

BB2: a = b + c
BB3: b = 7
BB4: d = b + c

can be changed to
BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4

BB2: t = b + c
a = t
BB3: b = 7
t = b + c
BB3: d = t

Can we place calculations of b+c such that no path re-executes the same expression? Algorithm based on PLDI92 paper by Knoop et. al. on "Lazy code motion"
• Propose separate forward and backward passes.
• Problem: critical edges
BB1 --> BB3
BB2 --> BB3
BB2 --> BB4

BB1: c = a+b
BB3: d = a+b

A critical edge:
• Source basic block has multiple successors
• Destination basic block has multiple predecessors
• Introduce a new basic block to eliminate critical edges:
BB1 --> BB3
BB2 --> newBB
BB2 --> BB4
newBB --> BB4

BB1: t = a+b
c = t
newBB: t = a+b
BB3: d = t


Algorithm Prelims:

• Assume every statement is a basic block
• We will only place statements at the beginning of a basic block
• Add a basic block for every edge that leads to a basic block with multiple predecessors

Algorithm Overview:

• Determine where the expressions are to be computed
• Consists of four data-flow problems:
• Optimal computation (passes 1 and 2)
• No calculation of expression that is not used, along any execution path.
• No redundancy along any execution path.
• Operations are executed at the latest possible time to minimize register usage (pass 3)
• No unnecessary copy assignments (pass 4)

### Some examples

Example1:

BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: x = a + b
BB5: y = a + b
BB4: a = 10
BB6: z = a + b

change to
BB1-->BB2
BB1-->BB3
BB1-->BB4
BB2-->BB5
BB3-->BB5
BB4-->BB6

BB2: x = a + b
BB3: t = a + b
BB5: y = t
BB4: a = 10
t = a + b
BB6: z = t

Notice that we put t = a+b where a+b is "needed", i.e., it is needed on all paths in future

Example 2

BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB2: x = a + b

change to
BB1-->BB2
BB2-->BB3
BB3-->BB2
BB3-->BB4

BB1: t = a+b
BB2: x = t

Again, t=a+b is "needed" on all paths in future

Example 3

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->BB3

BB1: x = a+b
BB4: y=a+b
BB6: a=10

change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB4-->BB6
BB5-->BB3
BB6-->newBB
newBB-->BB3

BB1: t = a+b
x=t
BB4: y=t
BB6: a=10
newBB: t=a+b


Example 4 (not all redundancy can be eliminated)

BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->exit

BB1: x=a+b
BB5: y=a+b
BB6: a=10

change to
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB3-->BB6
BB6-->newBB
newBB-->BB3
BB6-->exit

BB1: t=a+b
x=t
BB5: y=t
BB6: a=10
newBB: t=a+b

There is some redundancy in the transformed code which is unavoidable. Notice that the path from newBB to exit computes a+b even though it was not required. The other option is:
BB1-->BB2
BB2-->BB3
BB3-->BB4
BB4-->BB5
BB5-->BB3
BB3-->BB6
BB6-->exit

BB1: t=a+b
x=t
BB5: t = a+b
y=t
BB6: a=10

But now a+b is computed twice on the path B2->B3->B4->B5 (this is the solution that our analysis will yield).

### Pass 1: Needed expressions

Data-flow analysis
• Domain of values: sets of expressions
• Direction: backward
• Meet operator: intersection. Dictates the transfer function on successor-predecessor edges
• Transfer function across an instruction: F(x) = EUse \union (x - EKill). Euse is the set of expressions used by the instruction (e.g., {a+b} for x=a+b). Ekill is the set of expressions killed by this expression, e.g., all expressions containing x are killed by x=a+b.
• Boundary condition: Needed(exit,in)=empty.
• Initialization: for all b (not exit), in[b] = set of all expressions
Notice that while we are doing all expressions at once, we could also have done one expression at a time. In which case, my domain of values would simply be "needed" (definitely-needed) or "not-needed" (can't be sure if needed or not).

Proposal 1:

• Place the expression at the frontier of "needed expressions" (between "not needed" and "needed")
• But does it eliminate all redundancies? (e.g., may retain some redundancies)
• And is that the best placement? (e.g., may be too early)
Some examples where proposal1 will fail:

Example1

BB1->BB2
BB2->BB3
BB3->BB4
BB3->exit

BB1: x=a+b
BB4: y=a+b

Now, a+b is needed at BB4 entry but not needed at BB3 exit. Yet computing t=a+b on the edge BB3->BB4 is redundant because the expression a+b is already available (or not missing) at BB4.

Example2

BB1->BB2
BB1->BB3
BB1->BB4
BB2->BB5
BB3->BB5

BB4: a=10
BB5: y=a+b

Here, the frontier occurs at the end of BB1 but that is too early to compute t=a+b as we will take up an extra register for a long time unnecessarily!

Example3 (a more convincing example of why you should not compute a value too early)

BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c

The frontier is at the end of BB1 but that would eat-up a register for the entire duration of the BB2-BB4 loop which seems too wasteful.

### Pass 2: Missing Expressions

• Assume: an expression will be calculated at the earliest point it is needed.
• Forward pass: Missing expression
• An expression is missing at p if its value has not been "needed" by any basic block along SOME path reaching p. Will solve the redundancy problem with the first example.
• Domain of values: sets of expressions
• Direction: forward
• Transfer function at instruction b: f(x) = EKill \union (x - Needed.in[b]). It is missing if it has been killed. If it is needed then it is not missing.
• Meet = union
• Boundary condition: out[entry] = set of all expressions.
• Initialization for all other b (not entry): out[b] = empty set.

### Proposal 2

Place the expression at the earliest point we need it and is missing

• earliest(b)=needed.in[b] \intersect missing.in[b]
• Algorithm: for all basic block b, if x+y \in earliest[b], at beginning of b, create a new variable t, such that t=x+y, and replace every original x+y by t.
• But it may be too early! Solves example1 but the problems with example2 and example3 remain.

### Postponed expressions

An expression e is postponed at a program point p if: for all paths leading to p, we have seen the earliest placement of e but not a subsequent use. Take the third example again:
BB1->BB2
BB1->BB3
BB2->BB4
BB4->BB2
BB4->BB5
BB5->BB6
BB3->BB7
BB7->BB6

BB3: a=b+c
BB6: y=b+c

Here, b+c is postponed at the beginning of BB3 and on the edge BB5->BB6.
• Domain of values: sets of expressions
• Direction: forward
• Transfer function for instruction b: f(x)=(earliest[b] \union x) - EUse
• Meet : Intersection
• Boundary condition: out[entry] = empty set
• Initialization for all b (not entry): out[b] = set of all expressions
Latest: frontier at the end of "postponed" wave.
• latest[b] = (earliest[b] \union postponed.in[b]) \intersection (EUse \union ExprsForWhichSomeSuccessorIsNotPostponedOrEarliest)
• ExprsForWhichSomeSuccessorIsNotThePostponedOrEarliest = \union_over_successors (\not earliest[b] \and \not postponed.in[b]).
This will solve the issues with Example2 and Example3

### Pass 4: Cleaning up

BB1->BB2
BB2->BB3

BB1: x=a+b

No need to generate temporaries in this case. But our algorithm will. Do a liveness analysis to see if the expression is used later (without an intervening latest as a latest indicates recomputation).
• Domain of values: set of expressions
• Direction: backward
• Transfer function: f(x) = (EUse \union x) - latest[b]
• Meet operator: union
• Boundary condition: in[exit] = empty set
• Initialization: in[b] = empty set

### Lazy code motion (or partial-redundancy elimination) algorithm

• Backward pass to compute needed expressions
• Forward pass to compute missing expressions
• Earliest = intersection of needed and missing
• Forward pass using earliest placement to compute (can be)-postponed expressions
• Latest = frontier of (can be)-postponed expressions
• Backward pass to eliminate unused temporary variable assignments
• For all basic blocks b: if x+y \in (latest[b] \intersect used.out[b]):
• at beginning of b: add new t=x+y
• replace original x+y by t only if:
• x+y \in (EUse \intersect \not(latest[b] \insersect \not(used.out[b])))
• If it is in latest[b] and not in used.out[b], this use of the expression does not need to be (and has not been) replaced by t, so let it be as it is.