An example loop transformation: loop inversion

t = m + n;
for (i = 0; i < n; i++) {
  A[i] = t;
}
to
t = m + n;
if (n > 0) {
  i = 0;
  do {
    A[i] = t;
    i++;
  } while (i < n);
}
This can be thought-of as a special case of loop peeling (peeling of the first few iterations in the loop header). Here, we peel-off the check of the first iteration. It happens to be a very common optimization in modern compilers as it can be applied to any loop.

Optimizations that make a loop faster are usually very consequential and so much effort has gone into this.

What is a loop?

Goal: Informally, a natural loop has Example:
BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4
BB4->BB5
BB5->BB4
BB5->BB2
While BB4-BB5 are a natural loop, BB2-BB4-BB5 are not a natural loop (BB3 can jump into the middle of this loop).

Most loops in common programs are natural loops.

Dominators

Definition of Natural loops

Algorithm to find natural loops

  1. Find the dominator relations in a flow graph.
  2. Identify the back edges.
  3. Find the natural loop associated with the back edge.

Finding dominators

A node d dominates n in a graph (d dom n) if every path from the start node to n goes through d. Can do in one reverse postorder pass (can also do in arbitrary order until we reach fixpoint but that would be slower). Reverse postorder is also called depth-first ordering.

Depth-first (or reverse postorder) ordering

Here is an algorithm for computing the reverse-postorder:
void main()
{
  foreach (node n : graph) {
    visited[n] = false;
  }
  c = number of nodes in G;
  search(entrynode);
}

void search(node n)
{
  visited[n] = true;
  for (s : successors(n)) {
    if (!visited[s]) {
      search(s);
    }
  }
  dfn[n] = c;
  c = c - 1;
}

//Output: dfn[] contains the depth-first (aka reverse postorder) numbering of each node
An edge m->n is a retreating edge iff dfn[m] > dfn[n] (because this means that during the DFS traversal as in the algorithm above, m was reached through n starting from entry, or time taken by search(m) was a sub-interval of the time taken by search(n)). Notice that there are multiple depth-first orders possible (depending on the order in which the successors are chosen), and each depth-first order may result in a different set of retreating edges.

Finding back edges

Any edge whose destination dominate source. For any flow graph, every back edge is retreating (why?), but not every retreating edge is a back edge. A flow graph is said to be reducible if all its retreating edges in any depth-first spanning tree are also back edges. In other words, all depth-first orders have the same set of retreating edges, and all of them are back edges.

Example of a non-reducible graph:

BB1->BB2
BB1->BB3
BB2->BB3
BB3->BB2

Example of non-reducible graph in general: often if we reverse the edges of a regular CFG, we may often end up with a non-reducible flow graph. Intuitively: while typical programs have loops with single entries, those loops sometimes have several exits.

Depth of a reducible graph

The depth is the largest number of retreating edges on any cycle-free path. For reducible graphs, all retreating edges are back edges. Intutively: this is the same as the maximum loop nesting depth of a graph.

Constructing natural loops

Remove d from the graph, find all predecessors of n.

Inner loops

Induction variables

Example
BB1->BB2
BB2->BB2
BB2->BB3

BB1: a=0
BB2: a = a+1

Region-based analysis

The iterative data-flow analysis algorithm we have discussed so far is just one approach to solving data-flow problems. We discuss another approach called region-based analysis. In the iterative-analysis approach, we create transfer functions for basic blocks and then find the fixedpoint solution by repeated passes over the blocks. Instead of creating transfer functions just for individual blocks, a region-based analysis finds transfer functions that summarize the execution of progressively larger regions of the program. Ultimately, transfer functions for entire procedures are constructed and then applied, to get the desired data-flow values directly. While a data-flow framework using an iterative algorithm is specified by a semilattice of data-flow values and a family of transfer functions closed under composition, region-based analysis requires more elements. A region-based framework includes both a semilattice of data-flow values and a semilattice of transfer functions that must possess a meet operator, a composition operator, and a closure operator. A region-based analysis is particularly useful for data-flow problems where paths that have cycles may change the data-flow values. The closure operator allows the effect of a loop to be summarized more effectively than does iterative analysis. The technique is also useful for interprocedural analysis, where transfer functions associated with a procedure call may be treated like the transfer functions associated with basic blocks. For simplicity, we shall consider only forward data-flow problems in this section. We first illustrate how region-based analysis works by using the familiar example of reaching definitions. In region-based analysis, a program is viewed as a hierarchy of regions, which are (roughly) portions of a flow graph that have only one point of entry. We should find this concept of viewing code as a hierarchy of regions intuitive, because a block-structured procedure is naturally organized as a hierarchy of regions. Each statement in a block-structured program is a region, as control flow can only enter at the beginning of a statement. Each level of statement nesting corresponds to a level in the region hierarchy. Formally, a region of a flow graph is a collection of nodes N and edges E such that
  1. There is a header h in N that dominates all the nodes in N.
  2. If some node m can reach a node n in N without going through h, then m is also in N.
  3. E is the set of all the control flow edges between nodes n1 and n2 in N, except (possibly) for some that enter h.
Clearly, a natural loop represents a region. Also, a basic block represents a region; similarly a single statement represents a region.

Example of non-region:

B1->B2
B2->B3
B3->B4
B1->B3
B2->B4
The subgraph formed by B2-B3-B4 is not a region, but the entire graph is a region.

For the rest of the discussion, let's assume that the CFG is reducible. Every non-reducible graph can be converted to a reducible graph by duplicating subgraphs, but this may increase the code-size exponentially. For our example of a non-reducible graph above

Region hierarchies for reducible flow graphs

To construct a hierarchy of regions, we identify the natural loops. The process of "parsing" a reducible flow graph into its hierarchy of loops begins with every block as a region by itself. We call these regions leaf regions. Then, we order the natural loops from the inside out, i.e., starting with the innermost loops. To process a loop, we replace the entire loop by a node in two steps:
  1. First, the body of the loop L (all nodes and edges except the back edges to the header) is replaced by a node representing a region R. Edges to the header of L now enter the node for R. An edge from any exit of loop L is replaced by an edge from R to the same destination. However, if the edge is a back edge, then it becomes a loop on R. We call R a body region.
  2. Next, we construct a region R' that represents the entire natural loop L. We call R' a loop region. The only difference between R and R' is that the latter includes the back edges to the header of loop L. Put another way, when R' replaces R in the flow graph, all we have to do is remove the edge from R to itself
Eventually, all natural loops get reduced to single nodes; the top-level program is also a single region at a higher-level of hierarchy.

Example: a semi-lattice of transfer functions

Bottom-up analysis

Top-down analysis

Region-based analysis broken down into bottom-up and top-down analysis is useful for more precise analysis by passing appropriate summaries from bottom-up analysis to top-down analysis. For example, if we have two functions f and g (callers), calling the same function h (callee). Using a summary callee function for h allows us to avoid flow of information from f to g and g to f. In this case, the call to h can be summarized as a transfer function with one entry and one exit.

Induction variables

Example1
for (j = ...) {
  //A[j] = 0
  t1 = 4*j
  t2 = &A
  t3 = t1+t2
  *t3 = 0
}
//t1 = 4j; t3 = &A+4j in iteration j
to
t3 = &A
for (j = ...) {
  *t3=0
  t3 = t3+4
}

Example2

loop:
b = a
a = a + 1
c = a + b
d = r + b
e = f
f = read()
g = 2 * f
a = a + 1
p = p + 1
q = p + a
Let m be a map of var->val at start of loop body.

At the end of each statement:

b = m(a)
a = m(a) + 1
c = 2*m(a) + 1
d = m(r) + m(a)
e = m(f)
f = unknown
g = unknown
a = m(a) + 2
p = m(p) + 1
q = m(p) + m(a) + 3

Let m' be a map of var->val at start of loop at end of i-1 iteration

m'(r)=m(r)
m'(a)=m(a)+2i-2
m'(p)=m(p)+i-1
m'(f)=unknown

At end of each statement in i'th iter:

b=m(a)+2i-2
a=m(a)+2i-1
c=2m(a)+4i-3
d=m(r)+m(a)+2i-2
e=unknown
f=unknown
g=unknown
a=m(a)+2i
p=m(p)+i
q=m(p)+m(a)+3i
Wherever possible, represent values as affine expressions on the loop iteration index.

Strength reduction

Replace multiplications with additions
for (i = ....) {
  j = c0 + c1*i
  ... j ...
}
to
j = c0 + c1;
for (i = ...) {
  ... j ...
  j += c1
}

Induction Variable

Variables Find variables that are afffine expressions of loop indices and loop-invariant variables. Formulate as a bottom-up region based analysis.

Transfer function of a statement

Composition of transfer functions

Meet of transfer function

Closure

Summary: region-based analysis is an alternate algorithm for iterative data-flow. It works in transfer function space with composition (o), meet (^) and kleene start (*) operators. Bottom-up analysis summarizes the effect of regions with transfer functions.

Induction variable analysis is useful for strength reduction and for symbolic analysis for parallelization.