An example loop transformation: loop inversion
t = m + n;
for (i = 0; i < n; i++) {
A[i] = t;
}
to
t = m + n;
if (n > 0) {
i = 0;
do {
A[i] = t;
i++;
} while (i < n);
}
This can be thought-of as a special case of loop peeling (peeling of the first few iterations in the loop header). Here, we peel-off the check of the first iteration. It happens to be a very common optimization in modern compilers as it can be applied to any loop.
Optimizations that make a loop faster are usually very consequential and so much effort has gone into this.
What is a loop?
Goal:
- Define a loop in graph-theoretic terms (control-flow graph)
- Not sensitive to input syntax: a uniform treatment of all loops (e.g., for, while, repeat-until, do-while, etc.).
Informally, a natural loop has
- edges that form at least a cycle.
- a single entry point.
Example:
BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4
BB4->BB5
BB5->BB4
BB5->BB2
While BB4-BB5 are a natural loop, BB2-BB4-BB5 are not a natural loop (BB3 can jump into the middle of this loop).
Most loops in common programs are natural loops.
Dominators
- Node
d
dominates node n
in a graph (d dom n
) if:
- Every path from the start node to
n
goes through d
.
- A node dominates itself.
- Immediate dominance:
d idom n
: d dom n
, d != n
, \not_exists m, s.t., d dom m and m dom n
.
- Immediate dominance relationships form a tree.
Definition of Natural loops
- A single entry point: header(d): a header dominates all nodes in the loop.
- A back edge (n -> d) in a flow graph is an edge whose destination dominates its source (
d dom n
)
- The natural loop of a back edge (n->d) is
d
+ {nodes that can reach n
without going through d
}
.
Algorithm to find natural loops
- Find the dominator relations in a flow graph.
- Identify the back edges.
- Find the natural loop associated with the back edge.
Finding dominators
A node d
dominates n
in a graph (d dom n
) if every path from the start node to
n
goes through d
.
- Domain of values: all sets of basic blocks.
- Direction: forward
- Meet operator: intersection
- Transfer function: f(x) = x \union b
- Boundary condition: out[entry] = {start}
- Initialization to Top for all non-entry blocks: out[b] = set of all blocks
- Bottom: empty set
Can do in one reverse postorder pass (can also do in arbitrary order until we reach fixpoint but that would be slower). Reverse postorder is also called depth-first ordering.
Depth-first (or reverse postorder) ordering
Here is an algorithm for computing the reverse-postorder:
void main()
{
foreach (node n : graph) {
visited[n] = false;
}
c = number of nodes in G;
search(entrynode);
}
void search(node n)
{
visited[n] = true;
for (s : successors(n)) {
if (!visited[s]) {
search(s);
}
}
dfn[n] = c;
c = c - 1;
}
//Output: dfn[] contains the depth-first (aka reverse postorder) numbering of each node
An edge m->n is a retreating edge iff dfn[m] > dfn[n] (because this means that during the DFS traversal as in the algorithm above, m was reached through n starting from entry, or time taken by search(m) was a sub-interval of the time taken by search(n)).
Notice that there are multiple depth-first orders possible (depending on the order in which the successors are chosen), and each depth-first order may result in a different set of retreating edges.
Finding back edges
Any edge whose destination dominate source. For any flow graph, every back edge is retreating (why?), but not every retreating edge is a back edge.
A flow graph is said to be reducible if all its retreating edges
in any depth-first spanning tree are also back edges. In other words, all
depth-first orders have the same set of retreating edges, and all of them
are back edges.
Example of a non-reducible graph:
BB1->BB2
BB1->BB3
BB2->BB3
BB3->BB2
Example of non-reducible graph in general: often if we reverse
the edges of a regular CFG, we may often end up with a non-reducible
flow graph. Intuitively: while typical programs have loops with single
entries, those loops sometimes have several exits.
Depth of a reducible graph
The depth is the largest number of retreating edges on any
cycle-free path. For reducible graphs, all retreating edges are back edges.
Intutively: this is the same as the maximum loop nesting depth of a graph.
Constructing natural loops
Remove d
from the graph, find all predecessors of n
.
Inner loops
- If two loops do not have the same header:
- They are either disjoint, or
- One is entirely contained (nested within) the other
- Inner loop: the one that contains no other loop.
- If two loops share the same header:
- Hard to tell which is the inner loop
- Combine as one.
Induction variables
Example
BB1->BB2
BB2->BB2
BB2->BB3
BB1: a=0
BB2: a = a+1
- Summarize the loop body with a transfer function fb: m' = fb(m)
m'(a)=m(a)+1
; m'(x)=m(x)
for x != a
- Summarize the effect after i iterations: m' = fb^i(m)
m'(a)=m(a)+i
; m'(x)=m(x)
for x != a
- Can now execute each iteration independently (e.g., in parallel).
- Summarize the effect of the entire loop with n iterations: m' = fb^n(m)
- If n is a constant:
m'(a)=m(a)+n
; m'(x)=m(x)
for x != a
- If n is unknown:
m'(a)=unknown
; m'(x)=m(x)
for x != a
- Can now execute each iteration independently (e.g., in parallel).
- Closure of a function: direct evaluation of i applications of a function.
Region-based analysis
The iterative data-flow analysis algorithm we have discussed so far is just one
approach to solving data-flow problems. We discuss another approach
called region-based analysis. In the iterative-analysis approach, we
create transfer functions for basic blocks and then find the fixedpoint solution by
repeated passes over the blocks. Instead of creating transfer functions just for
individual blocks, a region-based analysis finds transfer functions that summarize the execution of progressively larger regions of the program. Ultimately,
transfer functions for entire procedures are constructed and then applied, to get
the desired data-flow values directly.
While a data-flow framework using an iterative algorithm is specified by
a semilattice of data-flow values and a family of transfer functions closed under composition, region-based analysis requires more elements. A region-based
framework includes both a semilattice of data-flow values and a semilattice
of transfer functions that must possess a meet operator,
a composition operator, and a closure operator.
A region-based analysis is particularly useful for data-flow problems where
paths that have cycles may change the data-flow values. The closure operator
allows the effect of a loop to be summarized more effectively than does
iterative
analysis. The technique is also useful for interprocedural analysis, where
transfer functions associated with a procedure call may be treated
like the transfer
functions associated with basic blocks.
For simplicity, we shall consider only forward data-flow problems in this
section. We first illustrate how region-based analysis works by
using the familiar example of reaching definitions.
In region-based analysis, a program is viewed as a hierarchy of regions, which
are (roughly) portions of a flow graph that have only one point of entry.
We
should find this concept of viewing code as a hierarchy of regions intuitive,
because a block-structured procedure is naturally organized as a hierarchy of
regions. Each statement in a block-structured program is a region, as control
flow can only enter at the beginning of a statement. Each level of statement
nesting corresponds to a level in the region hierarchy.
Formally, a region of a flow graph is a collection of nodes N and edges E
such that
- There is a header h in N that dominates all the nodes in N.
- If some node m can reach a node n in N without going through h, then
m is also in N.
- E is the set of all the control flow edges between nodes n1 and n2
in N,
except (possibly) for some that enter h.
Clearly, a natural loop represents a region. Also, a basic block represents
a region; similarly a single statement represents a region.
Example of non-region:
B1->B2
B2->B3
B3->B4
B1->B3
B2->B4
The subgraph formed by B2-B3-B4 is not a region, but the entire graph
is a region.
For the rest of the discussion, let's assume that the CFG is reducible.
Every non-reducible graph can be converted to a reducible graph by
duplicating subgraphs, but this may increase the code-size exponentially.
For our example of a non-reducible graph above
- A region: a group of nodes with a single entry and possibly multiple exits
- Assume a reducible flow graph
- Find natural loops: each defines:
- loop body (acyclic graph of sub-regions)
- loop (one sub-region and back edges)
- Bottom-up analysis: compute transfer function from entry to all exits for each sub-region.
- Top-down analysis: Apply the summary transfer functions (for each sub-region) to compute data flow values of the entire region with its loop's back-edges.
Region hierarchies for reducible flow graphs
To construct a hierarchy of regions, we identify the natural loops.
The process of "parsing" a
reducible flow graph into its hierarchy of loops begins with every block as a
region by itself. We call these regions leaf regions. Then, we order the natural
loops from the inside out, i.e., starting with the innermost loops. To process a
loop, we replace the entire loop by a node in two steps:
- First, the body of the loop L (all nodes and edges except the back edges to
the header) is replaced by a node representing a region R. Edges to the
header of L now enter the node for R. An edge from any exit of loop L is
replaced by an edge from R to the same destination. However, if the edge
is a back edge, then it becomes a loop on R. We call R a body region.
- Next, we construct a region R' that represents the entire natural loop L.
We call R' a loop region. The only difference between R and R' is that
the latter includes the back edges to the header of loop L. Put another
way, when R' replaces R in the flow graph, all we have to do is remove
the edge from R to itself
Eventually, all natural loops get reduced to single nodes; the top-level
program is also a single region at a higher-level of hierarchy.
Example: a semi-lattice of transfer functions
- Example: reaching definition:
fi(x) = geni \union (x - killi)
.
- o : composition of transfer functions (for paths):
(f2 o f1)(x) = gen2 \union ((gen1 \union (x-kill1)) - kill2)
.
- ^ : meet of transfer functions (for confluence points): union:
(f1 ^ f2)(x) = (gen1 \union (x-kill1)) \union (gen2 \union (x-kill2))
- *(kleene star) operator: closure of transfer functions (for cycles): f* = meet over f^0, f^1, f^2, ..., f^n for n->\infinity. f^0 is the identity function.
- In this example: f^2(x) = f(x)
- f*(x) = x \union f(x) = gen \union x
Bottom-up analysis
- Acyclic region: find transfer function to end of every subregion. Apply composition and meet of transfer functions based on paths.
- Cyclic region:
- After (i-1)st complete iterations
- fj: transfer function from loop header to back-edge source.
- fj^(i-1)
- For each loop exit
e
:
- fe: transfer function from loop head to exit
- After i iterations: fe o fj^(i-1)
- After an unknown number of iterations: fe o fj*
Top-down analysis
- Given boundary value at the beginning of program: in[entry]
- Acyclic region R, for each subregion S:
- out[S] = f(in[R]), where f is the transfer function from in[R] to the exit of S
- in[S] = in[R] if S is the header of region R
- in[S] = \meet (out[p]) for all predecessors p of S (if S is not the header of region R)
- Cyclic region R, with one subregion S:
- in[S] = fj*(in[R]), where fj is the transfer function from in[R] to back-edge source
Region-based analysis broken down into bottom-up and top-down analysis is useful for more
precise analysis by passing appropriate summaries from bottom-up analysis to top-down analysis.
For example, if we have two functions f and g (callers), calling the same function h (callee).
Using a summary callee function for h allows us to avoid flow of information from f to g and g to f.
In this case, the call to h can be summarized as a transfer function with one entry and one exit.
Induction variables
Example1
for (j = ...) {
//A[j] = 0
t1 = 4*j
t2 = &A
t3 = t1+t2
*t3 = 0
}
//t1 = 4j; t3 = &A+4j in iteration j
to
t3 = &A
for (j = ...) {
*t3=0
t3 = t3+4
}
Example2
loop:
b = a
a = a + 1
c = a + b
d = r + b
e = f
f = read()
g = 2 * f
a = a + 1
p = p + 1
q = p + a
Let m
be a map of var->val at start of loop body.
At the end of each statement:
b = m(a)
a = m(a) + 1
c = 2*m(a) + 1
d = m(r) + m(a)
e = m(f)
f = unknown
g = unknown
a = m(a) + 2
p = m(p) + 1
q = m(p) + m(a) + 3
Let m'
be a map of var->val at start of loop at end of i-1 iteration
m'(r)=m(r)
m'(a)=m(a)+2i-2
m'(p)=m(p)+i-1
m'(f)=unknown
At end of each statement in i'th iter:
b=m(a)+2i-2
a=m(a)+2i-1
c=2m(a)+4i-3
d=m(r)+m(a)+2i-2
e=unknown
f=unknown
g=unknown
a=m(a)+2i
p=m(p)+i
q=m(p)+m(a)+3i
Wherever possible, represent values as affine expressions on the loop iteration index.
Strength reduction
Replace multiplications with additions
for (i = ....) {
j = c0 + c1*i
... j ...
}
to
j = c0 + c1;
for (i = ...) {
... j ...
j += c1
}
Induction Variable
Variables
- All the variables in the program
- A normalized index
i
for each loop, with values 1, 2, ... for 1st, 2nd iterations etc.
Find variables that are afffine expressions of loop indices and loop-invariant variables. Formulate as a bottom-up region based analysis.
Transfer function of a statement
- Let
f
be the transfer function of statement: x = c0+c1*y + c2*z
- m' = f(m)
- For each variable v, m'(v) =
- m(v) if v != x
- c0 + c1*m(y) + c2*m(z) if v = x
- unknown (bottom) otherwise
Composition of transfer functions
- Composition of transfer function: f2 o f1 : --> f1 --> f2 -->
- Let m1=f1(m) and m'=f2of1(m)
- Let f2 be the transfer function of the statement:
x = c0 + c1*y + c2*z
- For each variable v, m'(v) =
- m1(v) if v != x
- c0+c1*m1(y)+c2*m2(z) if v=x and m1(y),m1(z) is not unknown
- unknown otherwise
Meet of transfer function
- Meet of transfer function: f2 ^ f1
- Let m1 = f1(m), m2=f2(m), m' = f1^f2(m)
- For each variable v,
m'(v) =
- m1(v) if m1(v)=m2(v)
- unknown otherwise
Closure
- Closure of transffer function: f^i
- Let m1=f(m0), m(i-1) = f^(i-1)(m0), m(i) = fof^(i-1)(m0)
- After i-1 iterations
- Loop invariant: m1(v) = m0(v) => m(i-1)(v) = m0(v)
- Basic induction: m1(v) = m0(v) + c => m(i-1)(v) = m0(v) + c(i-1)
- After the ith iteration:
- m1(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*m0(vj)} where vk is a loop invariant variable, and vj is a basic induction variable incremented by Cj (capital C is different from small c)
- => mi(v) = c0 + \sum_k{ck*m(i-1)(vk)} + \sum_j{cj*m(i-1)(vj)}
- => mi(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*(m0(vj) + Cj(i-1))}
- otherwise => mi(v) = unknown
- After unknown number of iterations: Let m* = f*(m)
- m1(v) = c0 + \sum_k{ck*m0(vk)} where vk is a loop invariant variable
- => m*(v) = c0 + \sum_k{ck*m0(vk)}
- otherwise => m*(v) = unknown
Summary: region-based analysis is an alternate algorithm for iterative data-flow. It works in transfer function space with composition (o), meet (^) and kleene start (*) operators. Bottom-up analysis summarizes the effect of regions with transfer functions.
Induction variable analysis is useful for strength reduction and for symbolic analysis for parallelization.