An example loop transformation: loop inversion

t = m + n;
for (i = 0; i < n; i++) {
  A[i] = t;
}

t = m + n;
if (n > 0) {
  i = 0;
  do {
    A[i] = t;
    i++;
  } while (i < n);
}

This can be thought-of as a special case of loop peeling (peeling of the first few iterations in the loop header). Here, we peel-off the check of the first iteration. It happens to be a very common optimization in modern compilers as it can be applied to any loop.

Optimizations that make a loop faster are usually very consequential and so much effort has gone into this.

What is a loop?

Goal:

Define a loop in graph-theoretic terms (control-flow graph)
Not sensitive to input syntax: a uniform treatment of all loops (e.g., for, while, repeat-until, do-while, etc.).

Informally, a natural loop has

edges that form at least a cycle.
a single entry point.

Example:

BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4
BB4->BB5
BB5->BB4
BB5->BB2

While BB4-BB5 are a natural loop, BB2-BB4-BB5 are not a natural loop (BB3 can jump into the middle of this loop).

Most loops in common programs are natural loops.

Dominators

Node d dominates node n in a graph (d dom n) if:
- Every path from the start node to n goes through d.
- A node dominates itself.
Immediate dominance: d idom n: d dom n, d != n, \not_exists m, s.t., d dom m and m dom n.
Immediate dominance relationships form a tree.

Definition of Natural loops

A single entry point: header(d): a header dominates all nodes in the loop.
A back edge (n -> d) in a flow graph is an edge whose destination dominates its source (d dom n)
The natural loop of a back edge (n->d) is d+ {nodes that can reach n without going through d}.

Algorithm to find natural loops

Find the dominator relations in a flow graph.
Identify the back edges.
Find the natural loop associated with the back edge.

Finding dominators

A node d dominates n in a graph (d dom n) if every path from the start node to n goes through d.

Domain of values: all sets of basic blocks.
Direction: forward
Meet operator: intersection
Transfer function: f(x) = x \union b
Boundary condition: out[entry] = {start}
Initialization to Top for all non-entry blocks: out[b] = set of all blocks
Bottom: empty set

Can do in one reverse postorder pass (can also do in arbitrary order until we reach fixpoint but that would be slower). Reverse postorder is also called depth-first ordering.

Depth-first (or reverse postorder) ordering

Here is an algorithm for computing the reverse-postorder:

void main()
{
  foreach (node n : graph) {
    visited[n] = false;
  }
  c = number of nodes in G;
  search(entrynode);
}

void search(node n)
{
  visited[n] = true;
  for (s : successors(n)) {
    if (!visited[s]) {
      search(s);
    }
  }
  dfn[n] = c;
  c = c - 1;
}

//Output: dfn[] contains the depth-first (aka reverse postorder) numbering of each node

An edge m->n is a retreating edge iff dfn[m] > dfn[n] (because this means that during the DFS traversal as in the algorithm above, m was reached through n starting from entry, or time taken by search(m) was a sub-interval of the time taken by search(n)). Notice that there are multiple depth-first orders possible (depending on the order in which the successors are chosen), and each depth-first order may result in a different set of retreating edges.

Finding back edges

Any edge whose destination dominate source. For any flow graph, every back edge is retreating (why?), but not every retreating edge is a back edge. A flow graph is said to be reducible if all its retreating edges in any depth-first spanning tree are also back edges. In other words, all depth-first orders have the same set of retreating edges, and all of them are back edges.

Example of a non-reducible graph:

BB1->BB2
BB1->BB3
BB2->BB3
BB3->BB2

Example of non-reducible graph in general: often if we reverse the edges of a regular CFG, we may often end up with a non-reducible flow graph. Intuitively: while typical programs have loops with single entries, those loops sometimes have several exits.

Depth of a reducible graph

The depth is the largest number of retreating edges on any cycle-free path. For reducible graphs, all retreating edges are back edges. Intutively: this is the same as the maximum loop nesting depth of a graph.

Constructing natural loops

Remove d from the graph, find all predecessors of n.

Inner loops

If two loops do not have the same header:
- They are either disjoint, or
- One is entirely contained (nested within) the other
  - Inner loop: the one that contains no other loop.
If two loops share the same header:
- Hard to tell which is the inner loop
- Combine as one.

Induction variables

Example

BB1->BB2
BB2->BB2
BB2->BB3

BB1: a=0
BB2: a = a+1

Summarize the loop body with a transfer function fb: m' = fb(m)
- m'(a)=m(a)+1; m'(x)=m(x) for x != a
Summarize the effect after i iterations: m' = fb^i(m)
- m'(a)=m(a)+i; m'(x)=m(x) for x != a
- Can now execute each iteration independently (e.g., in parallel).
Summarize the effect of the entire loop with n iterations: m' = fb^n(m)
- If n is a constant: m'(a)=m(a)+n; m'(x)=m(x) for x != a
- If n is unknown: m'(a)=unknown; m'(x)=m(x) for x != a
- Can now execute each iteration independently (e.g., in parallel).
Closure of a function: direct evaluation of i applications of a function.

Region-based analysis

The iterative data-flow analysis algorithm we have discussed so far is just one approach to solving data-flow problems. We discuss another approach called region-based analysis. In the iterative-analysis approach, we create transfer functions for basic blocks and then find the fixedpoint solution by repeated passes over the blocks. Instead of creating transfer functions just for individual blocks, a region-based analysis finds transfer functions that summarize the execution of progressively larger regions of the program. Ultimately, transfer functions for entire procedures are constructed and then applied, to get the desired data-flow values directly. While a data-flow framework using an iterative algorithm is specified by a semilattice of data-flow values and a family of transfer functions closed under composition, region-based analysis requires more elements. A region-based framework includes both a semilattice of data-flow values and a semilattice of transfer functions that must possess a meet operator, a composition operator, and a closure operator. A region-based analysis is particularly useful for data-flow problems where paths that have cycles may change the data-flow values. The closure operator allows the effect of a loop to be summarized more effectively than does iterative analysis. The technique is also useful for interprocedural analysis, where transfer functions associated with a procedure call may be treated like the transfer functions associated with basic blocks. For simplicity, we shall consider only forward data-flow problems in this section. We first illustrate how region-based analysis works by using the familiar example of reaching definitions. In region-based analysis, a program is viewed as a hierarchy of regions, which are (roughly) portions of a flow graph that have only one point of entry. We should find this concept of viewing code as a hierarchy of regions intuitive, because a block-structured procedure is naturally organized as a hierarchy of regions. Each statement in a block-structured program is a region, as control flow can only enter at the beginning of a statement. Each level of statement nesting corresponds to a level in the region hierarchy. Formally, a region of a flow graph is a collection of nodes N and edges E such that

There is a header h in N that dominates all the nodes in N.
If some node m can reach a node n in N without going through h, then m is also in N.
E is the set of all the control flow edges between nodes n1 and n2 in N, except (possibly) for some that enter h.

Clearly, a natural loop represents a region. Also, a basic block represents a region; similarly a single statement represents a region.

Example of non-region:

B1->B2
B2->B3
B3->B4
B1->B3
B2->B4

The subgraph formed by B2-B3-B4 is not a region, but the entire graph is a region.

For the rest of the discussion, let's assume that the CFG is reducible. Every non-reducible graph can be converted to a reducible graph by duplicating subgraphs, but this may increase the code-size exponentially. For our example of a non-reducible graph above

A region: a group of nodes with a single entry and possibly multiple exits
Assume a reducible flow graph
Find natural loops: each defines:
- loop body (acyclic graph of sub-regions)
- loop (one sub-region and back edges)
Bottom-up analysis: compute transfer function from entry to all exits for each sub-region.
Top-down analysis: Apply the summary transfer functions (for each sub-region) to compute data flow values of the entire region with its loop's back-edges.

Region hierarchies for reducible flow graphs

To construct a hierarchy of regions, we identify the natural loops. The process of "parsing" a reducible flow graph into its hierarchy of loops begins with every block as a region by itself. We call these regions leaf regions. Then, we order the natural loops from the inside out, i.e., starting with the innermost loops. To process a loop, we replace the entire loop by a node in two steps:

First, the body of the loop L (all nodes and edges except the back edges to the header) is replaced by a node representing a region R. Edges to the header of L now enter the node for R. An edge from any exit of loop L is replaced by an edge from R to the same destination. However, if the edge is a back edge, then it becomes a loop on R. We call R a body region.
Next, we construct a region R' that represents the entire natural loop L. We call R' a loop region. The only difference between R and R' is that the latter includes the back edges to the header of loop L. Put another way, when R' replaces R in the flow graph, all we have to do is remove the edge from R to itself

Eventually, all natural loops get reduced to single nodes; the top-level program is also a single region at a higher-level of hierarchy.

Example: a semi-lattice of transfer functions

Example: reaching definition: fi(x) = geni \union (x - killi).
o : composition of transfer functions (for paths): (f2 o f1)(x) = gen2 \union ((gen1 \union (x-kill1)) - kill2).
^ : meet of transfer functions (for confluence points): union: (f1 ^ f2)(x) = (gen1 \union (x-kill1)) \union (gen2 \union (x-kill2))
*(kleene star) operator: closure of transfer functions (for cycles): f* = meet over f^0, f^1, f^2, ..., f^n for n->\infinity. f^0 is the identity function.
In this example: f^2(x) = f(x)
f*(x) = x \union f(x) = gen \union x

Bottom-up analysis

Acyclic region: find transfer function to end of every subregion. Apply composition and meet of transfer functions based on paths.
Cyclic region:
- After (i-1)st complete iterations
  - fj: transfer function from loop header to back-edge source.
  - fj^(i-1)
- For each loop exit e:
  - fe: transfer function from loop head to exit
  - After i iterations: fe o fj^(i-1)
  - After an unknown number of iterations: fe o fj*

Top-down analysis

Given boundary value at the beginning of program: in[entry]
Acyclic region R, for each subregion S:
- out[S] = f(in[R]), where f is the transfer function from in[R] to the exit of S
- in[S] = in[R] if S is the header of region R
- in[S] = \meet (out[p]) for all predecessors p of S (if S is not the header of region R)
Cyclic region R, with one subregion S:
- in[S] = fj*(in[R]), where fj is the transfer function from in[R] to back-edge source

Region-based analysis broken down into bottom-up and top-down analysis is useful for more precise analysis by passing appropriate summaries from bottom-up analysis to top-down analysis. For example, if we have two functions f and g (callers), calling the same function h (callee). Using a summary callee function for h allows us to avoid flow of information from f to g and g to f. In this case, the call to h can be summarized as a transfer function with one entry and one exit.

Induction variables

Example1

for (j = ...) {
  //A[j] = 0
  t1 = 4*j
  t2 = &A
  t3 = t1+t2
  *t3 = 0
}
//t1 = 4j; t3 = &A+4j in iteration j

t3 = &A
for (j = ...) {
  *t3=0
  t3 = t3+4
}

Example2

loop:
b = a
a = a + 1
c = a + b
d = r + b
e = f
f = read()
g = 2 * f
a = a + 1
p = p + 1
q = p + a

Let m be a map of var->val at start of loop body.

At the end of each statement:

b = m(a)
a = m(a) + 1
c = 2*m(a) + 1
d = m(r) + m(a)
e = m(f)
f = unknown
g = unknown
a = m(a) + 2
p = m(p) + 1
q = m(p) + m(a) + 3

Let m' be a map of var->val at start of loop at end of i-1 iteration

m'(r)=m(r)
m'(a)=m(a)+2i-2
m'(p)=m(p)+i-1
m'(f)=unknown

At end of each statement in i'th iter:

b=m(a)+2i-2
a=m(a)+2i-1
c=2m(a)+4i-3
d=m(r)+m(a)+2i-2
e=unknown
f=unknown
g=unknown
a=m(a)+2i
p=m(p)+i
q=m(p)+m(a)+3i

Wherever possible, represent values as affine expressions on the loop iteration index.

Strength reduction

Replace multiplications with additions

for (i = ....) {
  j = c0 + c1*i
  ... j ...
}

j = c0 + c1;
for (i = ...) {
  ... j ...
  j += c1
}

Induction Variable

Variables

All the variables in the program
A normalized index i for each loop, with values 1, 2, ... for 1st, 2nd iterations etc.

Find variables that are afffine expressions of loop indices and loop-invariant variables. Formulate as a bottom-up region based analysis.

Transfer function of a statement

Let f be the transfer function of statement: x = c0+c1*y + c2*z
- m' = f(m)
- For each variable v, m'(v) =
  - m(v) if v != x
  - c0 + c1*m(y) + c2*m(z) if v = x
  - unknown (bottom) otherwise

Composition of transfer functions

Composition of transfer function: f2 o f1 : --> f1 --> f2 -->
Let m1=f1(m) and m'=f2of1(m)
Let f2 be the transfer function of the statement: x = c0 + c1*y + c2*z
For each variable v, m'(v) =
- m1(v) if v != x
- c0+c1*m1(y)+c2*m2(z) if v=x and m1(y),m1(z) is not unknown
- unknown otherwise

Meet of transfer function

Meet of transfer function: f2 ^ f1
Let m1 = f1(m), m2=f2(m), m' = f1^f2(m)
For each variable v, m'(v) =
- m1(v) if m1(v)=m2(v)
- unknown otherwise

Closure

Closure of transffer function: f^i
Let m1=f(m0), m(i-1) = f^(i-1)(m0), m(i) = fof^(i-1)(m0)
After i-1 iterations
- Loop invariant: m1(v) = m0(v) => m(i-1)(v) = m0(v)
- Basic induction: m1(v) = m0(v) + c => m(i-1)(v) = m0(v) + c(i-1)
After the ith iteration:
- m1(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*m0(vj)} where vk is a loop invariant variable, and vj is a basic induction variable incremented by Cj (capital C is different from small c)
- => mi(v) = c0 + \sum_k{ck*m(i-1)(vk)} + \sum_j{cj*m(i-1)(vj)}
- => mi(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*(m0(vj) + Cj(i-1))}
- otherwise => mi(v) = unknown
After unknown number of iterations: Let m* = f*(m)
- m1(v) = c0 + \sum_k{ck*m0(vk)} where vk is a loop invariant variable
- => m*(v) = c0 + \sum_k{ck*m0(vk)}
- otherwise => m*(v) = unknown

Summary: region-based analysis is an alternate algorithm for iterative data-flow. It works in transfer function space with composition (o), meet (^) and kleene start (*) operators. Bottom-up analysis summarizes the effect of regions with transfer functions.

Induction variable analysis is useful for strength reduction and for symbolic analysis for parallelization.