An example loop transformation: loop inversion

t = m + n;
for (i = 0; i < n; i++) {
  A[i] = t;
}

t = m + n;
if (n > 0) {
  i = 0;
  do {
    A[i] = t;
    i++;
  } while (i < n);
}

This can be thought-of as a special case of loop peeling (peeling of the first few iterations in the loop header). Here, we peel-off the check of the first iteration. It happens to be a very common optimization in modern compilers as it can be applied to any loop.

Optimizations that make a loop faster are usually very consequential and so much effort has gone into this.

What is a loop?

Goal:

Define a loop in graph-theoretic terms (control-flow graph)
Not sensitive to input syntax: a uniform treatment of all loops (e.g., for, while, repeat-until, do-while, etc.).

Informally, a natural loop has

edges that form at least a cycle.
a single entry point.

Example:

BB1->BB2
BB1->BB3
BB2->BB4
BB3->BB4
BB4->BB5
BB5->BB4
BB5->BB2

While BB4-BB5 are a natural loop, BB2-BB4-BB5 are not a natural loop (BB3 can jump into the middle of this loop).

Most loops in common programs are natural loops.

Dominators

Node d dominates node n in a graph (d dom n) if:
- Every path from the start node to n goes through d.
- A node dominates itself.
Immediate dominance: d idom n: d dom n, d != n, \not_exists m, s.t., d dom m and m dom n.
Immediate dominance relationships form a tree.

Definition of Natural loops

A single entry point: header(d): a header dominates all nodes in the loop.
A back edge (n -> d) in a flow graph is an edge whose destination dominates its source (d dom n)
The natural loop of a back edge (n->d) is d+ {nodes that can reach n without going through d}.

Algorithm to find natural loops

Find the dominator relations in a flow graph.
Identify the back edges.
Find the natural loop associated with the back edge.

Finding dominators

A node d dominates n in a graph (d dom n) if every path from the start node to n goes through d.

Domain of values: all sets of basic blocks.
Direction: forward
Meet operator: intersection
Transfer function: f(x) = x \union b
Boundary condition: out[entry] = {start}
Initialization to Top for all non-entry blocks: out[b] = set of all blocks
Bottom: empty set

Can do in one reverse postorder pass.

Finding back edges

Any edge whose destination dominate source.

Constructing natural loops

Remove d from the graph, find all predecessors of n.

Inner loops

If two loops do not have the same header:
- They are either disjoint, or
- One is entirely contained (nested within) the other
  - Inner loop: the one that contains no other loop.
If two loops share the same header:
- Hard to tell which is the inner loop
- Combine as one.

Induction variables

Example

BB1->BB2
BB2->BB2
BB2->BB3

BB1: a=0
BB2: a = a+1

Summarize the loop body with a transfer function fb: m' = fb(m)
- m'(a)=m(a)+1; m'(x)=m(x) for x != a
Summarize the effect after i iterations: m' = fb^i(m)
- m'(a)=m(a)+i; m'(x)=m(x) for x != a
- Can now execute each iteration independently (e.g., in parallel).
Summarize the effect of the entire loop with n iterations: m' = fb^n(m)
- If n is a constant: m'(a)=m(a)+n; m'(x)=m(x) for x != a
- If n is unknown: m'(a)=unknown; m'(x)=m(x) for x != a
- Can now execute each iteration independently (e.g., in parallel).
Closure of a function: direct evaluation of i applications of a function.

Region-based analysis

A region: a group of nodes with a single entry and possibly multiple exits
Assume a reducible flow graph
Find natural loops: each defines:
- loop body (acyclic graph of sub-regions)
- loop (one sub-region and back edges)
Bottom-up analysis: compute transfer function from entry to all exits for each sub-region.
Top-down analysis: Apply the summary transfer functions (for each sub-region) to compute data flow values of the entire region with its loop's back-edges.

Example: a semi-lattice of transfer functions

Example: reaching definition: fi(x) = geni \union (x - killi).
o : composition of transfer functions (for paths): (f2 o f1)(x) = gen2 \union ((gen1 \union (x-kill1)) - kill2).
^ : meet of transfer functions (for confluence points): union: (f1 ^ f2)(x) = (gen1 \union (x-kill1)) \union (gen2 \union (x-kill2))
*(kleene star) operator: closure of transfer functions (for cycles): f* = meet over f^0, f^1, f^2, ..., f^n for n->\infinity. f^0 is the identity function.
In this example: f^2(x) = f(x)
f*(x) = x \union f(x) = gen \union x

Bottom-up analysis

Acyclic region: find transfer function to end of every subregion. Apply composition and meet of transfer functions based on paths.
Cyclic region:
- After (i-1)st complete iterations
  - fj: transfer function from loop header to back-edge source.
  - fj^(i-1)
- For each loop exit e:
  - fe: transfer function from loop head to exit
  - After i iterations: fe o fj^(i-1)
  - After an unknown number of iterations: fe o fj*

Top-down analysis

Given boundary value at the beginning of program: in[entry]
Acyclic region R, for each subregion S:
- out[S] = f(in[R]), where f is the transfer function from in[R] to the exit of S
- in[S] = in[R] if S is the header of region R
- in[S] = \meet (out[p]) for all predecessors p of S (if S is not the header of region R)
Cyclic region R, with one subregion S:
- in[S] = fj*(in[R]), where fj is the transfer function from in[R] to back-edge source

Region-based analysis broken down into bottom-up and top-down analysis is useful for more precise analysis by passing appropriate summaries from bottom-up analysis to top-down analysis. For example, if we have two functions f and g (callers), calling the same function h (callee). Using a summary callee function for h allows us to avoid flow of information from f to g and g to f. In this case, the call to h can be summarized as a transfer function with one entry and one exit.

Induction variables

Example1

for (j = ...) {
  //A[j] = 0
  t1 = 4*j
  t2 = &A
  t3 = t1+t2
  *t3 = 0
}
//t1 = 4j; t3 = &A+4j in iteration j

t3 = &A
for (j = ...) {
  *t3=0
  t3 = t3+4
}

Example2

loop:
b = a
a = a + 1
c = a + b
d = r + b
e = f
f = read()
g = 2 * f
a = a + 1
p = p + 1
q = p + a

Let m be a map of var->val at start of loop body.

At the end of each statement:

b = m(a)
a = m(a) + 1
c = 2*m(a) + 1
d = m(r) + m(a)
e = m(f)
f = unknown
g = unknown
a = m(a) + 2
p = m(p) + 1
q = m(p) + m(a) + 3

Let m' be a map of var->val at start of loop at end of i-1 iteration

m'(r)=m(r)
m'(a)=m(a)+2i-2
m'(p)=m(p)+i-1
m'(f)=unknown

At end of each statement in i'th iter:

b=m(a)+2i-2
a=m(a)+2i-1
c=2m(a)+4i-3
d=m(r)+m(a)+2i-2
e=unknown
f=unknown
g=unknown
a=m(a)+2i
p=m(p)+i
q=m(p)+m(a)+3i

Wherever possible, represent values as affine expressions on the loop iteration index.

Strength reduction

Replace multiplications with additions

for (i = ....) {
  j = c0 + c1*i
  ... j ...
}

j = c0 + c1;
for (i = ...) {
  ... j ...
  j += c1
}

Induction Variable

Variables

All the variables in the program
A normalized index i for each loop, with values 1, 2, ... for 1st, 2nd iterations etc.

Find variables that are afffine expressions of loop indices and loop-invariant variables. Formulate as a bottom-up region based analysis.

Transfer function of a statement

Let f be the transfer function of statement: x = c0+c1*y + c2*z
- m' = f(m)
- For each variable v, m'(v) =
  - m(v) if v != x
  - c0 + c1*m(y) + c2*m(z) if v = x
  - unknown (bottom) otherwise

Composition of transfer functions

Composition of transfer function: f2 o f1 : --> f1 --> f2 -->
Let m1=f1(m) and m'=f2of1(m)
Let f2 be the transfer function of the statement: x = c0 + c1*y + c2*z
For each variable v, m'(v) =
- m1(v) if v != x
- c0+c1*m1(y)+c2*m2(z) if v=x and m1(y),m1(z) is not unknown
- unknown otherwise

Meet of transfer function

Meet of transfer function: f2 ^ f1
Let m1 = f1(m), m2=f2(m), m' = f1^f2(m)
For each variable v, m'(v) =
- m1(v) if m1(v)=m2(v)
- unknown otherwise

Closure

Closure of transffer function: f^i
Let m1=f(m0), m(i-1) = f^(i-1)(m0), m(i) = fof^(i-1)(m0)
After i-1 iterations
- Loop invariant: m1(v) = m0(v) => m(i-1)(v) = m0(v)
- Basic induction: m1(v) = m0(v) + c => m(i-1)(v) = m0(v) + c(i-1)
After the ith iteration:
- m1(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*m0(vj)} where vk is a loop invariant variable, and vj is a basic induction variable incremented by Cj (capital C is different from small c)
- => mi(v) = c0 + \sum_k{ck*m(i-1)(vk)} + \sum_j{cj*m(i-1)(vj)}
- => mi(v) = c0 + \sum_k{ck*m0(vk)} + \sum_j{cj*(m0(vj) + Cj(i-1))}
- otherwise => mi(v) = unknown
After unknown number of iterations: Let m* = f*(m)
- m1(v) = c0 + \sum_k{ck*m0(vk)} where vk is a loop invariant variable
- => m*(v) = c0 + \sum_k{ck*m0(vk)}
- otherwise => m*(v) = unknown

Summary: region-based analysis is an alternate algorithm for iterative data-flow. It works in transfer function space with composition (o), meet (^) and kleene start (*) operators. Bottom-up analysis summarizes the effect of regions with transfer functions.

Induction variable analysis is useful for strength reduction and for symbolic analysis for parallelization.