Foundations of Dataflow Analysis

Global constant propagation algorithm

Consider the following example:

d0: y = 3
d1: x = y + 4
d2: y = 11
d3: if e ...
If we do CP for each variable separately, then the order of picking the variables has an effect on the precision of the final solution. In the example above, if we pick y before x, we get a more precise solution.

Consider an alternate representation of constant information using sets where all variables are tracked simultaneously. This representation can provide both better precision and better efficiency, than running the single-variable implementation multiple times.

There are two options for the transfer function for d1: either use the in value to determine that x is a constant too, or simply represent the transfer function as a GEN/KILL set (as a function of the statement, but independent of the input value).

This representation allows transfer functions to be written using generate (Gen[s]) and propagate (in[s] - Kill[s]) definitions.

Can summarize a basic block through its own composite transfer function as follows:

out[B] = fd2(fd1(fd0(in[B[))) = Gen[d2] U (Gen[d1] U (Gen[d0] U (in[B] - Kill[d0]) - Kill[d1]) - Kill[d2]) = Gen[d2] U (Gen[d1] U (Gen[d0] - Kill[d1]) - Kill[d2]) U (in[B] - (Kill[d0] U Kill[d1] U Kill[d2])) = Gen[B] U (in[B] - Kill[B])

Gen[B]: locally exposed constant definitions, definitions available at the end of the basic block B

Kill[B]: the set of constant definitions killed by B

Effects of edges: nodes with multiple predecessor: meet operator is Intersection: in[b] = out[p1] n out[p2] ... n out[pn], for p1..pn predecessors of b

Cyclic graphs: equations still hold:

Find fixed point solution (actually maximum fixed point, as we will discuss later).
input: control flow graph CFG = (N, E, Entry, Exit)

//Boundary condition
OUT[Entry] = empty set

//Initialization for iterative algorithm
For each basic block B other than Entry
  OUT[B] = {(x,\Top) for all x}

//Iterate
While (changes to any OUT occur) {
    For each basic block B other than Entry {
      in[B] = intersection over (out[p]), for all preds p of B
      out[B] = fB(in[B]) //out[B] = gen[B] u (in[B] - kill[B])
    }
}

Summary of reaching definitions
Reaching Definitions
DomainSets of constant definitions
Transfer function fb(x)forward:
out[b] = fb(in[b])
out[b] = Gen[b] U (x - Kill[b])
Meet operationin[b] = n out[predecessors]
intersection
Boundary conditionout[entry] = empty
Initial interior pointsout[b] = {(x,\top): for all variables x}

Live Variable analysis

Liveness : Iterative algorithm

input: control flow graph CFG = (N, E, Entry, Exit)

//Boundary condition
In[Exit] = empty

//Initialization for iterative algorithm
For each basic block B other than Exit
    In[B] = empty

//Iterate
While (changes to any IN occur) {
  For each basic block B other than Exit {
    out[B] = U (in[s]) for all successors s of B
    in[B] = fB(out[B]) //in[B]=Use[B] U (out[B] - Def[B])
  }
}

Framework
Reaching DefinitionsLive variables
DomainSets of constant definitionsSets of variables
Direction and Transfer function fb(x)forward:
out[b] = fb(in[b])
out[b] = Gen[b] U (x - Kill[b])
in[b] = n out[predecessors]
backward:
in[B] = fb(out[B])
in[b] = Use[b] U (x - Def[b])
out[B] = U in[succ[B]]
Meet operationintersectionunion
Boundary conditionout[entry] = emptyin[exit] = empty
Initial interior pointsout[b] = {(x,\top): for all variables x}in[b] = empty

Exercise: must-reach definitions

How do we formulate the data flow algorithm for this problem?

Exercise: reaching definitions

What would be the data flow algorithm formulation for this problem?

Would the following be a legal solution to reaching definitions?

Entry --> BB1

BB1 --> BB2

BB2 --> BB2

BB2 --> BB3

BB3 contains (d1: b = 1) definition

BB3 --> exit
Candidate solution:
in[BB1] = empty
out[BB1] = empty
in[BB2] = {d1}
out[BB2] = {d1}
in[BB3] = {d1}
out[BB3] = {d1}
in[exit] = {d1}
Will our iterative worklist algorithm generate this answer?

Answer: this is a legal fixed-point solution to the system of equations. However this is not a maximum fixed-point (later).

Framework

Why do we need a framework:
  1. Prove properties of entire family of problems once and for all, e.g., will the program converge? what do the solution to the set of equations mean? how do we ensure the best solution?
  2. Aid in software engineering: re-use code

Data-flow problems (F, V, ^) are defined by

Example of a semi-lattice diagram: V = { x | x is a subset of {d1,d2,d3}}, ^ = set-union

Draw the semi-lattice with arrows pointing downwards (>= relation)

Some interesting properties

A meet-operator defines a partial-order and vice-versa

Summary:

Another example: semi-lattice with V = {x | such that x is a subset of {d1, d2, d3}}, ^ = \intersection

One element at a time

Descending chain

Transfer functions

Monotonicity

Example: reaching definitions: f(x) = Gen \union (x - Kill), ^ = \union

Alternatively,

Important: monotone framework does not mean that f(x) ≤ x.

Distributivity: A framework (F, V, ^) is distributive if and only if:

Meet input, then apply f is equal to apply the transfer function individually and then merge result.

Distributivity implies monotonicity, but not vice-versa.

Example: the framework for constant propagation is not distributive: example: if (...) { x = 2; y = 3; } else { x = 3; y = 2; }; z = x + y; f(x)^f(y) = 5; f(x^y) = \Bottom.

Properties of Iterative Algorithm

Behaviour of iterative algorithm (intuitively): For each IN/OUT of an interior program point:

What does the solution mean?

IDEAL data flow solution

Meet over paths MOP solution:

What is the difference between MOP and MFP of data flow equations? Consider two paths F1.F3 and F2.F3. MOP = F3F1(x) ^ F3F2(x). MFP (through iterative algo) = F3(F1(x) ^ F2(x)). Therefore