Global constant propagation

As an example of a general global transformation that requires global dataflow analysis, we consider global constant propagation.

Recall: to replace a use of x by a constant k, we must know that on every path to the use of x, the last assignment to x is x := k. Let's call this condition AA.

Global constant propagation can be performed at any point where AA holds.

Let's first consider the case of computing AA for a single variable X at all program points. One can then repeat this procedure for each variable in turn (inefficient, but correct; we discuss the improvement of handling all variables at once later).

To make the problem precise, we associate one of the following values with X at every program point:
value   interpretation
\top    this statement is unreachable
c       X = constant c on all incoming paths
\bot    X may not be a constant
\bot is the safe value: we can always say that X may not be a constant (i.e., we do not know whether it is a constant or not). In other words, \bot carries the least amount of information --- even if we compute nothing, we can always safely use the \bot value.

\top has the most amount of information. Notice that if a statement s is unreachable, then X = constant c holds at the beginning (and end) of s vacuously.

Said differently, we can arrange the values so that a higher value carries more information than a lower one, and a higher value implies every lower value. For example, \top implies c (for any c), and c implies \bot. Thus, if we determine a value v for X at a program point, we can always soundly use any value w <= v at that point.

The algorithm optimistically assumes the maximum amount of information (\top) in our solution initially. However this assumption may be refuted by the possible executions of a program. The \bot solution can never be refuted (because it imposes no restrictions). Whenever we have evidence that the assumption is refutable, we decrease the amount of information in our solution. We keep doing this until no refutations are possible. The act of "decreasing the amount of information" in a solution is also called weakening a solution. In the worst case, the solution will get weakened to \bot at all program points.

This can be modeled as a fixed-point algorithm. We first formulate a set of constraints that must hold for a solution to be irrefutable. We start with the most optimistic solution, and if a constraint is violated, we weaken it the minimum amount such that the constraint is now satisfied --- recall that it is always possible to weaken a solution such that a constraint gets satisfied because \bot will always satisfy all the constraints. The weakening by a "minimum amount" is the key to obtain useful (non-\bot) solutions.
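As an illustration (a sketch I am adding, not part of the original development), the overall shape of such a fixed-point computation is chaotic iteration over a set of rules. The interface below is hypothetical: each rule is a pair of a program point and a function that, given the current solution, returns the minimal weakening of the value at that point that satisfies the rule:

def solve(solution, rules):
    # solution: dict mapping program points to abstract values.
    # rules: list of (point, weaken) pairs; weaken(solution) returns the
    # minimal weakening of solution[point] that satisfies the rule under
    # the current values at the other points (never a larger value).
    changed = True
    while changed:
        changed = False
        for point, weaken in rules:
            new_value = weaken(solution)
            if new_value != solution[point]:
                solution[point] = new_value   # weaken by the minimum amount
                changed = True
    return solution

The constant propagation and liveness analyses developed below are instances of this loop, with specific value sets, rules, and initializations.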

BB1:
               === X = \bot (at this program point)
X := 3
               === X = 3
B > 0 then goto BB2
         else goto BB3
              === X = 3 on both branches

BB2:
             ==== X = 3
Y := Z + W
             ==== X = 3
X := 4
             ==== X = 4

BB3:
             ==== X = 3
Y := 0
             ==== X = 3

BB4:
             ==== X = \bot
A := 2 * X

Given global constant information, it is easy to perform the optimization.

The analysis of a complicated program can be expressed as a combination of simple rules relating the change in information between adjacent statements.

The idea is to "push" or "transfer" information from one statement to the next.

For each statement s, we compute information about the value of x immediately before and after s.

We use <= to denote the "less information" relation. A lower value has less information than a higher value.

In the following rules, let statement s have immediate predecessor statements p1, ..., pn.

 p1   ...   pn
   \   |   /
    v  v  v
       s
  1. If C(x, pi, out) = \bot for any i, then C(x, s, in) = \bot. For all we know, the execution may come down that predecessor, and in that case we can make no prediction about the value of x.
  2. If C(x, pi, out) = c and C(x, pj, out) = d with c <> d, then C(x, s, in) = \bot. We saw this in the example that we did by hand.
  3. If C(x, pi, out) = c or \top for all i (with at least one c), then C(x, s, in) <= c. If we come along a path where x = c, this is trivial to see. A path whose predecessor value is \top never executes, and so can be ignored.
  4. If C(x, pi, out) = \top for all i, then C(x, s, in) <= \top. Every predecessor is unreachable, and so s itself is unreachable.
  5. If C(x, s, in) = \top then C(x, s, out) <= \top, for all s (an unreachable statement stays unreachable).
  6. If C(x, x:=c, in) <> \top then C(x, x:=c, out) <= c (where c is a constant).
  7. If C(x, x:=f(...), in) <> \top then C(x, x:=f(...), out) = \bot if the value of x cannot be determined to be a constant after this statement.
  8. If C(x, y:=..., in) <> \top then C(x, y:=..., out) <= C(x, y:=..., in) if x <> y.

Rules 1-4 relate the out of one statement to the in of the next statement. Rules 5-8 relate the in of a statement to the out of the same statement.
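For illustration (an assumed Python encoding, not from the notes), rules 1-4 can be implemented as a single function combining the predecessors' out-values. Here TOP and BOT are sentinels for \top and \bot, and any other value is an integer constant:

# Abstract values for the single tracked variable: TOP stands for \top
# (unreachable), BOT for \bot (may not be a constant); any other value
# is an integer constant.
TOP, BOT = "top", "bot"

def meet_of_predecessors(pred_out_values):
    # Combine C(x, pi, out) over all predecessors pi into C(x, s, in).
    result = TOP
    for v in pred_out_values:
        if v == TOP:
            continue                      # rules 3, 4: unreachable paths are ignored
        if v == BOT or (result != TOP and result != v):
            return BOT                    # rule 1 (any \bot) or rule 2 (c <> d)
        result = v                        # rule 3: the unique constant seen so far
    return result                         # \top only if every predecessor was \top (rule 4)

# e.g. meet_of_predecessors([3, "top"]) == 3, meet_of_predecessors([3, 4]) == "bot"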

Notice that the rules are sound: they determine something to be a constant only if it is definitely a constant. They are, however, incomplete: they miss certain cases where an unreachable statement or a constant value could have been detected, falling back to a bottom value instead. This can happen due to a coarse-grained interpretation of f, for example; or if the program was written as x := 2; x := x + 1; or if there was a conditional branch of the form if x = 3 goto BB, etc.

Algorithm

  1. For the program entry point, set C(x, entry, in) = \bot (we know nothing about x on entry). Initialize all other C(...) values to \top.
  2. Repeat until all statements satisfy rules 1-8: pick a violated rule and weaken the corresponding value by the minimum amount needed to satisfy it.

We expect that at some point all rules will be satisfied at all points. This is called the fixed point and the corresponding solution, the fixed-point solution.

When the constraints across a statement (rules 5-8) are checked, the "out" value is minimally updated to satisfy the constraint. This computation of the maximum allowed "out" value from an "in" value (which is what lets us update "minimally") can be represented as a function, which we call a transfer function.
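Continuing the same sketch, rules 5-8 can be written as one transfer function mapping the in-value to the maximum allowed out-value. The statement encoding below is an assumption made for illustration: ('assign_const', c) for x := c, ('assign_expr',) for x := f(...) whose value cannot be determined, and ('other',) for statements that do not assign to x.

TOP, BOT = "top", "bot"   # as in the previous sketch

def transfer(stmt, in_value):
    # Maximum allowed C(x, s, out), given C(x, s, in).
    if in_value == TOP:
        return TOP              # rule 5: an unreachable statement stays unreachable
    if stmt[0] == "assign_const":
        return stmt[1]          # rule 6: after x := c, x is the constant c
    if stmt[0] == "assign_expr":
        return BOT              # rule 7: after x := f(...), x is not known to be constant
    return in_value             # rule 8: x is untouched, so the information is copied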

Let's run the algorithm on this example. The values shown are the initial ones: \top everywhere, except \bot at the program entry:

BB1:
               === X = \bot (at this program point)
X := 3
               === X = \top
B > 0 then goto BB2
         else goto BB3
              === X = \top

BB2:
             ==== X = \top
Y := Z + W
             ==== X = \top
X := 4
             ==== X = \top

BB3:
             ==== X = \top
Y := 0
             ==== X = \top

BB4:
             ==== X = \top
A := 2 * X

Some guarantees: the fixed-point solution will satisfy the equations at each program point.

Some unanswered questions: what guarantees that this analysis converges? And what guarantees that this algorithm results in the best possible solution for this system of constraints?

Analysis of loops

BB1:
X := 3
B > 0 then goto BB2
         else goto BB3

BB2:
Y := Z + W
X := 4

BB3:
Y := 0

BB4:
A := 2 * X
Assume there is an edge from BB4 to BB3.

Now, when we are computing C(x, BB3, in), we need to know the value of C(x, BB4, out), and in turn C(x, BB4, in), and so on... and we are in a loop (we eventually get back to C(x, BB3, in)).

Because of cycles, all points must have values at all times.

Intuitively, assigning some initial value allows the analysis to break cycles.

The initial value \top means "So far as we know, control never reaches this point". The initial value is not what is expected at the end but allows us to get going (by breaking the cycle). Moreover, it is the most optimistic value --- if it does not satisfy a constraint, it will get weakened anyway. If we start with a less optimistic value, then we will still obtain a fixed-point solution that will satisfy all the constraints, but the solution obtained may have less information than what was actually possible.

Analyzing the example: if we run the algorithm starting with C(x, BB4, out) = \top, then we will be able to reach a fixed-point solution.
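To make this concrete, here is a small self-contained Python sketch (my code, with an assumed encoding) that runs the optimistic fixed-point iteration for the single variable X on the example above, including the BB4 -> BB3 back edge. Each block is reduced to the one statement that matters for X:

TOP, BOT = "top", "bot"

def meet(values):
    # glb of the predecessors' out-values (rules 1-4).
    result = TOP
    for v in values:
        if v == TOP:
            continue
        if v == BOT or (result != TOP and result != v):
            return BOT
        result = v
    return result

def transfer(stmt, in_value):
    # rules 5-8, as in the earlier sketch.
    if in_value == TOP:
        return TOP
    if stmt[0] == "assign_const":
        return stmt[1]
    if stmt[0] == "assign_expr":
        return BOT
    return in_value

# Only assignments to X matter here: BB1 sets X := 3, BB2 sets X := 4.
stmts = {"BB1": ("assign_const", 3), "BB2": ("assign_const", 4),
         "BB3": ("other",), "BB4": ("other",)}
preds = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB1", "BB4"],
         "BB4": ["BB2", "BB3"]}

# Optimistic initialization: \top everywhere, \bot at the entry.
in_val = {b: TOP for b in stmts}
out_val = {b: TOP for b in stmts}
in_val["BB1"] = BOT

changed = True
while changed:
    changed = False
    for b in stmts:
        new_in = meet(out_val[p] for p in preds[b]) if preds[b] else in_val[b]
        new_out = transfer(stmts[b], new_in)
        if (new_in, new_out) != (in_val[b], out_val[b]):
            in_val[b], out_val[b] = new_in, new_out
            changed = True

print(in_val)    # {'BB1': 'bot', 'BB2': 3, 'BB3': 'bot', 'BB4': 'bot'}
print(out_val)   # {'BB1': 3, 'BB2': 4, 'BB3': 'bot', 'BB4': 'bot'}

Note how C(x, BB3, in) is first lowered from \top to 3 and then, once the back edge propagates BB4's \bot, weakened again to \bot; each value changes at most twice.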

Orderings

We can simplify the presentation of the analysis by ordering the (abstract) values.

\bot < c < \top
Drawing a picture with "lower" (having less information) values drawn lower, we get
      \top
  /   / | \  \
... -1  0  1 ...
  \   \ | /  /
      \bot
Notice that this is a partial order; not all elements are comparable to each other. e.g., 0 and 1 are not comparable.
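In code (same assumed TOP/BOT encoding as before), this partial order can be tested directly:

TOP, BOT = "top", "bot"

def leq(a, b):
    # True iff a <= b, i.e., a carries no more information than b.
    return a == BOT or b == TOP or a == b

# Constants are pairwise incomparable: leq(0, 1) == leq(1, 0) == False.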

With orderings defined, rules 1-4 can be written compactly as a single rule: C(x, s, in) = glb { C(x, pi, out) | pi is a predecessor of s }, where glb is the greatest lower bound (the largest value that is <= every value in the set).

Simply saying "repeat until nothing changes" does not by itself guarantee that eventually nothing changes (the values could oscillate forever, for example). The use of glb explains why the algorithm terminates: values only ever decrease (we only weaken), and along \bot < c < \top a value can decrease at most twice (\top to some c, then c to \bot). Hence each C(...) value changes at most twice, and the algorithm must reach a fixed point.

Also, we maintain the invariant that the value at any intermediate point of the algorithm is always greater than or equal to the corresponding value in the best (greatest) solution. This holds initially because we start all values at \top, and it is preserved because we weaken minimally: assuming the invariant holds for the predecessors, it continues to hold for the successor. Thus the fixed point we reach is itself the best solution.

Thus the constant propagation algorithm is linear in program size: the number of steps per variable = (number of C(...) values) * 2 = (number of program statements) * 4, since there are two C values per statement (in and out) and each value can change at most twice.

Generalization to multiple variables

As the number of variables in a program is proportional to the size of the program, computing the constant information for each variable separately would have quadratic time complexity. We discussed the single-variable case only for ease of exposition. In practice, the constant information is computed simultaneously for all variables.

For example, if there are two variables, say x and y, then we could have a constant value Cx for x and a constant value Cy for y. The partial order is defined pointwise: (Cx1, Cy1) <= (Cx2, Cy2) if and only if Cx1 <= Cx2 and Cy1 <= Cy2. This can also be drawn pictorially, and generalizes to n variables. Further, we do not need separate \top values for x and y; a common \top value (indicating unreachability) suffices, because if a program point is unreachable for one variable, it is unreachable for the other as well. (A code sketch of this representation appears after the list of advantages below.)

Advantages of computing constant-information for multiple variables in one go:

  1. The total number of possible values decreases (because \top does not need to be maintained separately for each variable).
  2. We can maintain a map from variables to their constant-values, with the optimization that if a value is \bot (common case), then we don't have an entry for that. This can significantly reduce the iteration and copying work.
  3. When applying the transfer function across a statement s, we only need to potentially change the mapping for the variable x assigned-to in s; for all other variables, the constant-information can simply be copied.
  4. The analysis can be made more precise by leveraging the constant values of other variables. For example, if (x,y) map to (0,0) at the beginning of the program x := y + 1; y := x + 2, then transfer functions that use the constant value of the other variable can obtain the precise solution (1,3) at the end of this program. If instead we run the analysis separately for x and y, the result may or may not be precise depending on the order in which the analyses are run (e.g., imprecise if the x analysis is run before the y analysis), even if we substitute the discovered constants after each analysis. Further, here is a program where the separate-analysis approach is imprecise irrespective of the order of the analyses: if (...) x := y + 1 else y := x + 1; if (...) x := y + 2 else y := x + 2;.
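As a sketch of this representation (an assumed encoding): a single TOP sentinel marks unreachable points; everywhere else the value is a dict that maps only the variables currently known to be constant, so absent variables are \bot, per the optimization in point 2 above:

TOP = "top"   # one shared unreachability marker for all variables

def meet_env(a, b):
    # Pointwise meet of two variable -> constant maps (absent means \bot).
    if a == TOP:
        return b          # an unreachable path is ignored
    if b == TOP:
        return a
    # Only variables mapped to the same constant in both maps survive.
    return {x: c for x, c in a.items() if b.get(x) == c}

# e.g. meet_env({"x": 3, "y": 0}, {"x": 3, "y": 1}) == {"x": 3}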

Liveness analysis

Once constants have been globally propagated, we would like to eliminate dead code.
BB1:
X := 3
B > 0 then goto BB2
         else goto BB3

BB2:
Y := Z + W

BB3:
Y := 0

BB4:
A := 2 * 3
After constant-propagating X at BB4, the assignment to X in BB1 is no longer useful. In other words, X:=3 is dead (assuming X not used elsewhere).

Defining what is used (live) and what is not used (dead). Consider:

X := 3
X := 4
Y := X

Here the value written by the first assignment X := 3 is overwritten by X := 4 before it can be used, so X is dead immediately after X := 3; in contrast, X is live after X := 4 because Y := X uses it.

A precise definition: A variable x is live at statement s if its value at s can have an effect on the observable behaviour of the program's execution in the future. Liveness in this sense is undecidable, as the halting problem can be reduced to it: consider the program x := e; f(); print x, where x is live at the end of x := e only if f() terminates.

An over-approximate but computable definition: A variable x is live at statement s if there exists some path from s to a use of x along which x is not redefined; otherwise x is dead at s.

A statement x := ... is dead code if x is dead after the assignment. Dead statements can be eliminated from the program. But we need liveness information first...

We can express liveness in terms of information transferred between adjacent statements, just as in constant propagation.

Liveness is simpler than constant propagation, since it is a boolean property (true or false).

Here the set of values is False (definitely not live) and True (may be live), with False > True: False carries more information. The glb function on this set of values is boolean OR.

        p
      / | \
     v  v  v
   s1  ...  sn

Rules:

  1. L(x, p, out) = OR { L(x, s, in) | s is a successor of p }. x is live at the exit of p if it is live at the entry of some successor of p.
  2. For s: ... := f(x), L(x, s, in) = true. s refers to x on the RHS, so the current value of x is used.
  3. For s: x := e where e does not refer to x, L(x, s, in) <= false. x is dead before an assignment to x, because the current value of x at that point will not be used in the future.
  4. L(x, s, in) <= L(x, s, out) if s does not refer to x at all.

This can be made more precise with a sharper transfer function, e.g., L(x, x:=f(x), in) <= false if L(x, x:=f(x), out) <= false: if x is dead after the assignment, the assignment itself is dead, and so its use of x does not matter.
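Under an assumed encoding where a statement is a pair (variable written or None, set of variables read), rules 2-4 become a short backward transfer function (my sketch):

def live_before(x, stmt, live_after):
    # L(x, s, in), computed from L(x, s, out) and the statement s itself.
    # stmt = (variable written or None, set of variables read).
    defined, used = stmt
    if x in used:
        return True          # rule 2: s reads x
    if defined == x:
        return False         # rule 3: s overwrites x before any use
    return live_after        # rule 4: s does not mention x

# The refinement mentioned above (for s: x := f(x)) would first check
# liveness after s, and return False when live_after is False.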

Algorithm

  1. Let all L(...) = false initially.
  2. Repeat until all statements s satisfy rules 1-4

Example:

    ==== L(x) = false
x := 0
    ==== L(x) = false
while (x!= 10) {
		==== L(x) = false -> true (step 1)
  x = x + 1;
    ==== L(x) = false
}
    ==== L(x) = false
return;
    ==== L(x) = false

Applying the rules further, we reach the fixed point:

    ==== L(x) = false
x := 0
		==== L(x) = false -> true (step 3)
while (x!= 10) {
		==== L(x) = false -> true (step 1)
  x = x + 1;
    ==== L(x) = false -> true (step 2)
}
    ==== L(x) = false
return;
    ==== L(x) = false
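The run above can be reproduced with a small self-contained sketch (my code, using the same assumed statement encoding as before). The while loop becomes four statements with explicit successor edges; the iteration applies rule 1 and the transfer function until nothing changes:

# s0: x := 0   s1: while (x != 10)   s2: x = x + 1   s3: return
stmts = {"s0": ("x", set()), "s1": (None, {"x"}),
         "s2": ("x", {"x"}), "s3": (None, set())}
succs = {"s0": ["s1"], "s1": ["s2", "s3"], "s2": ["s1"], "s3": []}

live_in = {s: False for s in stmts}    # step 1: all L(...) = false
live_out = {s: False for s in stmts}

changed = True
while changed:                          # step 2: repeat until rules 1-4 hold
    changed = False
    for s in stmts:
        out = any(live_in[t] for t in succs[s])   # rule 1 (glb = boolean OR)
        defined, used = stmts[s]
        inn = True if "x" in used else (False if defined == "x" else out)
        if (inn, out) != (live_in[s], live_out[s]):
            live_in[s], live_out[s] = inn, out
            changed = True

print(live_in)    # x is live into s1 and s2 (the test and the body), not s0, s3
print(live_out)   # x is live out of s0, s1, s2; dead once the loop exits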

Notice that information flowed in the forward direction (the direction of program execution) for constant propagation, but in the reverse direction (against the direction of program execution) for liveness analysis. The former kind of analysis is called a forward dataflow analysis; the latter, a backward dataflow analysis.

Liveness of multiple variables

We can maintain a set of live variables at each program point. Just like constant-propagation, we can derive the efficiency benefits of maintaining liveness for all variables in one go. Further, we can also derive precision benefits by making our transfer functions more precise (by relying on liveness of other variables). For example, L(y,x:=y+1,in) can be false if L(x,x:=y+1,out) is false.
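A sketch of the set-of-live-variables formulation (a standard gen/kill form; the encoding here is my assumption): the transfer function removes the defined variable and adds the used ones. The refinement mentioned above would additionally skip adding the uses when the assigned variable is itself dead afterwards.

def live_before_set(stmt, live_after):
    # stmt = (variable written or None, set of variables read).
    defined, used = stmt
    result = set(live_after)
    if defined is not None:
        result.discard(defined)     # the definition kills liveness of `defined`
    return result | used            # each use generates liveness

# e.g. live_before_set(("x", {"y"}), {"x", "z"}) == {"y", "z"}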

Some other analyses that can be modeled as dataflow analyses