int A[200]; for (i = 0; i < 100; i++) { A[2*i] = 4; }Here the array address is an affine function of the loop iteration variable
i
.
A[2*i] = (A[2*i - 1] * 31) % 23;We cannot parallelize this loop either (even though all accesses are affine) because there are data-dependencies across different iterations of the loop.
We assume in this discussion that all loop iteration variables have a lower and an upper bound. Also, we assume that the iteration variables increment by 1 in each iteration (if the variable iterates by more than 1, we can introduce a new iteration variable, and write the old iteration variable in terms of the new variable, such that we satisfy this property). The space of the loop iteration variables can be represented as a convex polyhedron (in d-dimensional space, where d is the loop-nest depth and the number of loop iteration variables). We can reason about the relationships between different iterations by reasoning about relationships between points in the polyhedral space.
Two categories of transformations: affine and blocking.
Loop-level parallelism example:
for (i = 0; i < n; i++) { Z[i] = X[i] * 2; }Each iteration is logically independent of other iterations. If there are M processors, each processor
p
can execute the following code:
b = ceil(n/M); for (i = b*p; i > min(n,b*(p+1)); i++) { Z[i] = X[i] * 2; }This type of a program is also called SPMD (single program multiple data) program, i.e., the same code is executed by all the processors but it is parameterized by an identifier (
p
) unique to each processor. Typically, one processor, known as the master executes all the serial part of the computation, including synchronizing with all the other processors (including itself). In this example, we need barrier synchronization.
for (i = 0; i < 9; i++) { for (j = i; j < 7 && j < i + 4; j++) { A[i,j] = 0; //(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(2,5),(3,3),(3,4),(3,5),(3,6),(4,4),(4,5),(4,6),(5,5),(5,6),(6,6) } }Draw the polygon that represents the iteration space.
Execution order of loop nests: A sequential execution of the loop nest executes the points in the space in ascending lexicographic order. A vector i=[i1,..in] is < i'=[i'1,..i'n'] iff \exists{m} s.t. m >= 0 and m<min(n,n') and [i1..im]=[i'1..i'm] and i_(m+1)<i'_(m+1).
Matrix formulation: The iteration space can be formulated in matrix form as:
{i \in Z^d | Bi + b >= 0}
Loop invariant variables (aka symbolic constants). Example:
for (i = 0; i <= n; i++) { ... }We need to consider
n
as another loop iteration variable:
[ -1 1 ] [ i ] >= [ 0 ] [ 1 0 ] [ n ] [ 0 ]Notice that while
n
is a loop iteration variable, its only constraint is that it is greater than or equal to i
.
This representation of the iteration space has no notion of the order in which the iteration space needs to be sweeped. If we can determine that the loop iterations are independent (for example), we can completely alter the order of sweeping, e.g., by swapping the inner and outer loops of the previous example:
for (j = 0; j <= 6; i++) { for (i = max(j-3, 0); i <= j; j++) { A[i,j] = 0; //(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(2,5),(3,3),(3,4),(3,5),(3,6),(4,4),(4,5),(4,6),(5,5),(5,6),(6,6) } }To do this, we need to first identify the potential values that can be taken by
j
, and then identify the potential values that can be taken by
i
for a given value of j
. The is done by projecting the convex polyhedron onto the outer dimension of the space. e.g., the
projection of the 2D space in the example above onto the j
axis
is the line segment from 0 to 7. When we project a 3-dimensional polyhedron
to a 2D space, we get a polygon in the 2D space. The concept of projection
is quite useful in many of our analyses, including loop bound generation.
Fourier-Motzkin elimination algorithm:
L <= c1*xm c2*xm <= Ucreate the new constraint:
c2*L < c1*U
Loop bounds generation: All the inequalities involving the innermost loop index variables are written as the variable's lower and upper bounds. Then project away the dimension representing the innermost loop. In the example above (where i is the outer loop index), by considering i as the innermost loop index, we get:
Li = j-3, 0 Ui = j Lj = 0 Uj = 7Now we can visit the iteration space either "horizontally" (lexicographic in (i, j)) or "vertically" (lexicographic in (j, i)).
Changing axes: we can also define a new variable k = j - i
and
sweep the iteration space in lexicographic order with respect to k
and j
.
j-k >= j-3 j-k >= 0 j-k <= j j >= 0 j <= 7This can be simplified to:
k <= 3,j k >= 0 j >= 0 j <= 7This can easily be expressed with j as the outer loop index. If we now want to make k as the outer loop index, we can first specify j's bounds in terms of k, and then project out j:
Lj = 0,k Uj = 7 Lk = 0 Uk = 3We get:
for (k = 0; k < 3; k++) { for (j = k; j <= 7; j++) { A[j-k,j] = 0; } }
Formally, we represent an array access in a loop nest that uses a vector of index variables i by the four-tuple T = <F,f,B,b>
; it maps a vector i within the bounds
Bi + b > 0to the array element location
Fi + fWhat are F,f (variables i,j) for
[1 0] [0 1]f = [0 0]^T