An example loop transformation: loop inversion
t = m + n;
for (i = 0; i < n; i++) {
A[i] = t;
}
to
t = m + n;
if (n > 0) {
i = 0;
do {
A[i] = t;
i++;
} while (i < n);
}
This can be thought-of as a special case of loop peeling (peeling of the first few iterations in the loop header). Here, we peel-off the check of the first iteration. It happens to be a very common optimization in modern compilers as it can be applied to any loop.
Optimizations that make a loop faster are usually very consequential and so much effort has gone into this.
What is a loop?
Goal:
- Define a loop in graph-theoretic terms (control-flow graph)
- Not sensitive to input syntax: a uniform treatment of all loops (e.g., for, while, repeat-until, do-while, etc.).
Informally, a natural loop has
- edges that form at least a cycle.
- a single entry point.
Example:
BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4
BB4->BB5
BB5->BB4
BB5->BB2
While BB4-BB5 are a natural loop, BB2-BB4-BB5 are not a natural loop (BB3 can jump into the middle of this loop).
Most loops in common programs are natural loops.
Dominators
- Node
d
dominates node n
in a graph (d dom n
) if:
- Every path from the start node to
n
goes through d
.
- A node dominates itself.
- Immediate dominance:
d idom n
: d dom n
, d != n
, \not_exists m, s.t., d dom m and m dom n
.
- Immediate dominance relationships form a tree.
Definition of Natural loops
- A single entry point: header(d): a header dominates all nodes in the loop.
- A back edge (n -> d) in a flow graph is an edge whose destination dominates its source (
d dom n
)
- The natural loop of a back edge (n->d) is
d
+ {nodes that can reach n
without going through d
}
.
Algorithm to find natural loops
- Find the dominator relations in a flow graph.
- Identify the back edges.
- Find the natural loop associated with the back edge.
Finding dominators
A node d
dominates n
in a graph (d dom n
) if every path from the start node to
n
goes through d
.
- Domain of values: all sets of basic blocks.
- Direction: forward
- Meet operator: intersection
- Transfer function: f(x) = x \union b
- Boundary condition: out[entry] = {start}
- Initialization to Top for all non-entry blocks: out[b] = set of all blocks
- Bottom: empty set
Can do in one reverse postorder pass.
Finding back edges
Any edge whose destination dominate source.
Constructing natural loops
Remove d
from the graph, find all predecessors of n
.
Inner loops
- If two loops do not have the same header:
- They are either disjoint, or
- One is entirely contained (nested within) the other
- Inner loop: the one that contains no other loop.
- If two loops share the same header:
- Hard to tell which is the inner loop
- Combine as one.
Induction variables
Example
BB1->BB2
BB2->BB2
BB2->BB3
BB1: a=0
BB2: a = a+1
- Summarize the loop body with a transfer function fb: m' = fb(m)
m'(a)=m(a)+1
; m'(x)=m(x)
for x != a
- Summarize the effect after i iterations: m' = fb^i(m)
m'(a)=m(a)+i
; m'(x)=m(x)
for x != a
- Can now execute each iteration independently (e.g., in parallel).
- Summarize the effect of the entire loop with n iterations: m' = fb^n(m)
- If n is a constant:
m'(a)=m(a)+n
; m'(x)=m(x)
for x != a
- If n is unknown:
m'(a)=unknown
; m'(x)=m(x)
for x != a
- Can now execute each iteration independently (e.g., in parallel).
- Closure of a function: direct evaluation of i applications of a function.
Region-based analysis
- A region: a group of nodes with a single entry and possibly multiple exits
- Assume a reducible flow graph
- Find natural loops: each defines:
- loop body (acyclic graph of sub-regions)
- loop (one sub-region and back edges)
- Bottom-up analysis: compute transfer function from entry to all exits for each sub-region.
- Top-down analysis: Apply the summary transfer functions (for each sub-region) to compute data flow values of the entire region with its loop's back-edges.
Example: a semi-lattice of transfer functions
- Example: reaching definition:
fi(x) = geni \union (x - killi)
.
- o : composition of transfer functions (for paths):
(f2 o f1)(x) = gen2 \union ((gen1 \union (x-kill1)) - kill2)
.
- ^ : meet of transfer functions (for confluence points): union:
(f1 ^ f2)(x) = (gen1 \union (x-kill1)) \union (gen2 \union (x-kill2))
- *(kleene star) operator: closure of transfer functions (for cycles): f* = meet over f^0, f^1, f^2, ..., f^n for n->\infinity. f^0 is the identity function.
- In this example: f^2(x) = f(x)
- f*(x) = x \union f(x) = gen \union x
Bottom-up analysis
- Acyclic region: find transfer function to end of every subregion. Apply composition and meet of transfer functions based on paths.
- Cyclic region:
- After (i-1)st complete iterations
- fj: transfer function from loop header to back-edge source.
- fj^(i-1)
- For each loop exit
e
:
- fe: transfer function from loop head to exit
- After i iterations: fe o fj^(i-1)
- After an unknown number of iterations: fe o fj*
Top-down analysis
- Given boundary value at the beginning of program: in[entry]
- Acyclic region R, for each subregion S:
- out[S] = f(in[R]), where f is the transfer function from in[R] to the exit of S
- in[S] = in[R] if S is the header of region R
- in[S] = \meet (out[p]) for all predecessors p of S (if S is not the header of region R)
- Cyclic region R, with one subregion S:
- in[S] = fj*(in[R]), where fj is the transfer function from in[R] to back-edge source
Region-based analysis broken down into bottom-up and top-down analysis is useful for more
precise analysis by passing appropriate summaries from bottom-up analysis to top-down analysis.
For example, if we have two functions f and g (callers), calling the same function h (callee).
Using a summary callee function for h allows us to avoid flow of information from f to g and g to f.
In this case, the call to h can be summarized as a transfer function with one entry and one exit.
Induction variables
Example1
for (j = ...) {
//A[j] = 0
t1 = 4*j
t2 = &A
t3 = t1+t2
*t3 = 0
}
//t1 = 4j; t3 = &A+4j in iteration j
to
t3 = &A
for (j = ...) {
*t3=0
t3 = t3+4
}
Example2
loop:
b = a
a = a + 1
c = a + b
d = r + b
e = f
f = read()
g = 2 * f
a = a + 1
p = p + 1
q = p + a
Let m
be a map of var->val at start of loop body.
At the end of each statement:
b = m(a)
a = m(a) + 1
c = 2*m(a) + 1
d = m(r) + m(a)
e = m(f)
f = unknown
g = unknown
a = m(a) + 2
p = m(p) + 1
q = m(p) + m(a) + 3
Let m'
be a map of var->val at start of loop at end of i-1 iteration
m'(r)=m(r)
m'(a)=m(a)+2i-2
m'(p)=m(p)+i-1
m'(f)=unknown
At end of each statement in i'th iter:
b=m(a)+2i-2
a=m(a)+2i-1
c=2m(a)+4i-3
d=m(r)+m(a)+2i-2
e=unknown
f=unknown
g=unknown
a=m(a)+2i
p=m(p)+i
q=m(p)+m(a)+3i
Wherever possible, represent values as affine expressions on the loop iteration index.
Strength reduction
Replace multiplications with additions
for (i = ....) {
j = c0 + c1*i
... j ...
}
to
j = c0 + c1;
for (i = ...) {
... j ...
j += c1
}
Induction Variable
Variables
- All the variables in the program
- A normalized index
i
for each loop, with values 1, 2, ... for 1st, 2nd iterations etc.
Find variables that are afffine expressions of loop indices and loop-invariant variables. Formulate as a bottom-up region based analysis.
Transfer function of a statement
- Let
f
be the transfer function of statement: x = c0+c1*y + c2*z
- m' = f(m)
- For each variable v, m'(v) =
- m(v) if v != x
- c0 + c1*m(y) + c2*m(z) if v = x
- unknown (bottom) otherwise
Composition of transfer functions
- Composition of transfer function: f2 o f1 : --> f1 --> f2 -->
- Let m1=f1(m) and m'=f2of1(m)
- Let f2 be the transfer function of the statement:
x = c0 + c1*y + c2*z
- For each variable v, m'(v) =
- m1(v) if v != x
- c0+c1*m1(y)+c2*m2(z) if v=x and m1(y),m1(z) is not unknown
- unknown otherwise
Meet of transfer function
- Meet of transfer function: f2 ^ f1
- Let m1 = f1(m), m2=f2(m), m' = f1^f2(m)
- For each variable v,
m'(v) =
- m1(v) if m1(v)=m2(v)
- unknown otherwise
Closure
- Closure of transffer function: f^i
- Let m1=f(m0), m(i-1) = f^(i-1)(m0), m(i) = fof^(i-1)(m0)
- After i-1 iterations
- Loop invariant: m1(v) = m0(v) => m(i-1)(v) = m0(v)
- Basic induction: m1(v) = m0(v) + c => m(i-1)(v) = m0(v) + c(i-1)
- After the ith iteration:
- m1(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*m0(vj)} where vk is a loop invariant variable, and vj is a basic induction variable incremented by Cj (capital C is different from small c)
- => mi(v) = c0 + \sum_k{ck*m(i-1)(vk)} + \sum_j{cj*m(i-1)(vj)}
- => mi(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*(m0(vj) + Cj(i-1))}
- otherwise => mi(v) = unknown
- After unknown number of iterations: Let m* = f*(m)
- m1(v) = c0 + \sum_k{ck*m0(vk)} where vk is a loop invariant variable
- => m*(v) = c0 + \sum_k{ck*m0(vk)}
- otherwise => m*(v) = unknown
Summary: region-based analysis is an alternate algorithm for iterative data-flow. It works in transfer function space with composition (o), meet (^) and kleene start (*) operators. Bottom-up analysis summarizes the effect of regions with transfer functions.
Induction variable analysis is useful for strength reduction and for symbolic analysis for parallelization.