The topic of this page is to determine whether or not a Loop
can be run in parallel. This is determined by looking at the
structure of the loop and figuring out what the data dependancies
are, and whether they will allow the compiler to Unroll the loop.
The primary data dependancies we need to look at are between
two instructions INSIDE the loop, and between two instructions
accross iterations, or an inter-iteration dependance.
The first dependance will not cause a problem for unrolling,
save that the instructions must be kept in the same order, when
the loop is unrolled. The second dependance may or maynot cause
the loop to be Non-Parallel. If the a statement requires a value
it produces in a previous iteration, then thatstatement is said
to have a Circular Dependance. The occurance of a Circular
Dependance, means the loop cannot be made Parallel.
Just because a loop has an inter-iteration dependance does not
mean that the loop cannot be unrolled, as long as the dependance
is NOT Circualr, than it can. The only thing that is neccessary
is that the inter-iteration dependance needs to be removed by
moving peices of the code around.
Example 1
Can the following Loop be made Parallel?
for (i=1;i<=100;i=i+1){
A[i+1] = A[i] + C[i]; /*S1*/
B[i+1] = B[i] + A[i+1]; /*S2*/
}
To find the solution, we first need to look at the dependancies.
Inside one iteration:
S2 is dependant on S1 for the value of A[i+1].
Between iterations:
S1 depends on S1 for
the value of A[i] computed in iteration i-1.
S2 depends on S2 for
the value of B[i] computed in iteration i-1.
Note that S1 depends on S1, and S2 depends on S2 inbetween loop
iterations. This is the RED flag we discussed earlier. Because
the loop containscircular data dependancies, this loop can not
be unrolled.
Example 2
Lets look at another loop.
for (i=1;i<=100;i=i+1)
{
A[i] = A[i] + B[i]; /*S1*/
B[i+1] = C[i] + D[i]; /*S2/
}
First start at the dependancies inside one interation of the loop. The
only dependance inside the loop is A[i] as a destination and source, but
this is not really an issue.
Ok, since that is not an issue, lets take a look at the dependancies
between iterations:
B[i+1] is computed, so for iteration i, S1 depends on S2 in iteration i-1.
Since no line contains a circular dependance, the loop can be unrolled.
However, it will need a little arranging to remove the one dependance that
is still there.
Since S1 does not depend on S2, the instructions do not need to be
executed in the order that they are above. This is important to note, since
we want to remove the inter-iteration dependance. To remove this dependance
we need to shift the loop a little.
If we execute S2 for iteration i, then S1 for iteration i+1 as the
loop. Doing this will cause S1 to be 'missed' on the first iteration, so
to fix that, we will need to run it once before entering the loop, with an
index of 1. In effect this executes half of the first iteration of the loop
and so we need to run S2 once the loop is complete to finish the other half
of it. (Remember here that we are actually executing half of each iteration
in our new loop) With those modifications the loop now looks like this:
A[1] = A[1] + B[1];
for (i=1;i<=99;i=i+1)
{
B[i+1] = C[i] +D[i];
A[i+1] = A[i+1] + B[i+1];
}
B[101] = C[100] + D[100];
This loop can be unrolled, and then run multiple iterations with fewer
branches, which, of course, eliminates stalls.
A quick summary of the material presented here:
If there is a circular dependance in between loop iterations, Loop can not
be unrolled. End of discussion.
If there are dependancies inbetween too iterations, but they are not
circular, you may be able to rewrite the code so that the dependancies
are located in a inside a single iteration, and then the loop can be
unrolled. If you can not rewrite the loop to remove the inter-iteration
dependancies then the loop is not parrallel.
If there are no data dependancies, then the loop can be unrolled, as it
is.