Affine loop transformations through a polyhedral framework

We use linear algebra and integer linear programming techniques to reason about these things and optimize for parallelism and locality.

We assume in this discussion that all loop iteration variables have a lower and an upper bound. Also, we assume that the iteration variables increment by 1 in each iteration (if the variable iterates by more than 1, we can introduce a new iteration variable, and write the old iteration variable in terms of the new variable, such that we satisfy this property). The space of the loop iteration variables can be represented as a convex polyhedron (in d-dimensional space, where d is the loop-nest depth and the number of loop iteration variables). We can reason about the relationships between different iterations by reasoning about relationships between points in the polyhedral space.

Two categories of transformations: affine and blocking.

Linear algebra techniques help determine best affine partitions and blocking.

Loop-level parallelism example:

for (i = 0; i < n; i++) {
  Z[i] = X[i] * 2;
}
Each iteration is logically independent of other iterations. If there are M processors, each processor p can execute the following code:
b = ceil(n/M);
for (i = b*p; i > min(n,b*(p+1)); i++) {
  Z[i] = X[i] * 2;
}
This type of a program is also called SPMD (single program multiple data) program, i.e., the same code is executed by all the processors but it is parameterized by an identifier (p) unique to each processor. Typically, one processor, known as the master executes all the serial part of the computation, including synchronizing with all the other processors (including itself). In this example, we need barrier synchronization.

Affine Transform Theory

Three spaces

Iteration spaces

Often, iteration spaces are rectangular, e.g., matrix multiply. For d-depth loop next, each iteration variable has a lower bound and upper bound (potentially as functions of iteration variables at lower depths); we can identify these bounds, and represent the space as a convex polyhedron. Convex polyhedron: if two points are in the polyhedron, then all points on the line connecting the two points are also in the polyhedron. Example:
for (i = 0; i < 9; i++) {
  for (j = i; j < 7 && j < i + 4; j++) {
    A[i,j] = 0; //(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(2,5),(3,3),(3,4),(3,5),(3,6),(4,4),(4,5),(4,6),(5,5),(5,6),(6,6)
  }
}
Draw the polygon that represents the iteration space.

Execution order of loop nests: A sequential execution of the loop nest executes the points in the space in ascending lexicographic order. A vector i=[i1,..in] is < i'=[i'1,..i'n'] iff \exists{m} s.t. m >= 0 and m<min(n,n') and [i1..im]=[i'1..i'm] and i_(m+1)<i'_(m+1).

Matrix formulation: The iteration space can be formulated in matrix form as:

{i \in Z^d | Bi + b >= 0}

Loop invariant variables (aka symbolic constants). Example:

for (i = 0; i <= n; i++) {
  ...
}
We need to consider n as another loop iteration variable:
[ -1 1 ] [ i ]  >= [ 0 ]
[ 1  0 ] [ n ]        [ 0 ]
Notice that while n is a loop iteration variable, its only constraint is that it is greater than or equal to i.

This representation of the iteration space has no notion of the order in which the iteration space needs to be sweeped. If we can determine that the loop iterations are independent (for example), we can completely alter the order of sweeping, e.g., by swapping the inner and outer loops of the previous example:

for (j = 0; j <= 6; i++) {
  for (i = max(j-3, 0); i <= j; j++) {
    A[i,j] = 0; //(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(2,5),(3,3),(3,4),(3,5),(3,6),(4,4),(4,5),(4,6),(5,5),(5,6),(6,6)
  }
}
To do this, we need to first identify the potential values that can be taken by j, and then identify the potential values that can be taken by i for a given value of j. The is done by projecting the convex polyhedron onto the outer dimension of the space. e.g., the projection of the 2D space in the example above onto the j axis is the line segment from 0 to 7. When we project a 3-dimensional polyhedron to a 2D space, we get a polygon in the 2D space. The concept of projection is quite useful in many of our analyses, including loop bound generation.

Fourier-Motzkin elimination algorithm:

Discuss the example above with this algorithm.

Loop bounds generation: All the inequalities involving the innermost loop index variables are written as the variable's lower and upper bounds. Then project away the dimension representing the innermost loop. In the example above (where i is the outer loop index), by considering i as the innermost loop index, we get:

Li = j-3, 0
Ui = j
Lj = 0
Uj = 7
Now we can visit the iteration space either "horizontally" (lexicographic in (i, j)) or "vertically" (lexicographic in (j, i)).

Changing axes: we can also define a new variable k = j - i and sweep the iteration space in lexicographic order with respect to k and j.

j-k >= j-3
j-k >= 0
j-k <= j
j >= 0
j <= 7
This can be simplified to:
k <= 3,j
k >= 0
j >= 0
j <= 7
This can easily be expressed with j as the outer loop index. If we now want to make k as the outer loop index, we can first specify j's bounds in terms of k, and then project out j:
Lj = 0,k
Uj = 7
Lk = 0
Uk = 3
We get:
for (k = 0; k < 3; k++) {
  for (j = k; j <= 7; j++) {
    A[j-k,j] = 0;
  }
}

Affine array accesses

Array access in a loop is affine if

Formally, we represent an array access in a loop nest that uses a vector of index variables i by the four-tuple T = <F,f,B,b>; it maps a vector i within the bounds

Bi + b > 0
to the array element location
Fi + f
What are F,f (variables i,j) for In linearized-array representation, each multi-dimensional array access is represented in linearized form, e.g. A[i,j] becomes A[n*i+j] which is non-affine (quadratic). To deal with these cases, we can convert the linear access into a multidimensional access if every access can be decomposed into separate dimensions with the guarantee that none of the components exceeds its bound.