Many loops execute only a few instructions per iteration, or have a body whose instructions must run in a strict order. For example, this loop in C:
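As an illustration, a loop of this kind might look like the sketch below. It assumes the loop adds a scalar to every element of an array, which matches the load, add, and store pattern discussed in this section; the names `add_scalar`, `x`, `n`, and `s` are made up for the example.

```c
#include <stddef.h>

/* Illustrative only: add the scalar s to each of the n elements of x.
   Each iteration does one load, one add, and one store of real work. */
void add_scalar(int *x, size_t n, int s) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] + s;
    }
}
```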
might be compiled into this code in DLX:
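The instruction-level shape of such a loop can be sketched in C, with each statement standing in for roughly one machine instruction. This is a rough model rather than actual DLX output: it assumes a loop that adds a scalar to each array element, and the function name `add_scalar_lowered` and its variables are hypothetical.

```c
#include <stddef.h>

/* Illustrative lowering: one C statement per (approximate) machine instruction.
   Three statements do the work; three exist only to run the loop. */
void add_scalar_lowered(int *x, size_t n, int s) {
    int *p = x;          /* pointer to the current element */
    int *end = x + n;    /* one past the last element */
    int t, more;
    if (n == 0) return;  /* nothing to do for an empty array */
loop:
    t = *p;              /* work: load */
    t = t + s;           /* work: add (needs the loaded value) */
    *p = t;              /* work: store (needs the sum) */
    p = p + 1;           /* overhead: advance to the next element */
    more = (p != end);   /* overhead: compare against the end */
    if (more) goto loop; /* overhead: conditional branch */
}
```

In this model the add cannot start until the load has produced `t`, and the store cannot start until the add has, which is the ordering that causes the data stalls described next.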
In that DLX fragment, fully half of the instructions in the loop are merely loop overhead: they are needed to control the flow of the loop and to step through the data, but they add instructions without doing any of the loop's actual work. Also, in a simple DLX-like pipeline, you must stall for the branch control hazard every ninth instruction. Furthermore, because the load, add, and store form a partial ordering, each instruction must wait for the one before it to reach a certain pipeline stage (with forwarding) or to finish (without forwarding) before it can execute.
The goal, then, is to reduce loop overhead as a proportion of the instructions executed, to decrease the number of control hazards over the total run of the loop, and to fill the unavoidable stall slots with independent instructions. The first two can be achieved to some extent through loop unrolling, and the last through rescheduling.
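Both ideas can be sketched at the source level in C. The example below assumes a loop that adds a scalar to each array element and an unroll factor of four, both chosen only for illustration; `add_scalar_unrolled4` is a hypothetical name. The body is also written with the loads grouped first, then the adds, then the stores, to suggest how rescheduling places independent instructions between dependent ones. A real compiler is free to reorder these statements, so this shows the intent rather than a guaranteed instruction order.

```c
#include <stddef.h>

/* Illustrative only: unroll by 4 and group independent operations. */
void add_scalar_unrolled4(int *x, size_t n, int s) {
    size_t i = 0;

    /* Unrolled body: one index update, compare, and branch per 4 elements. */
    for (; i + 4 <= n; i += 4) {
        /* Four independent loads. */
        int t0 = x[i];
        int t1 = x[i + 1];
        int t2 = x[i + 2];
        int t3 = x[i + 3];

        /* Four independent adds; each operand was loaded several
           statements earlier, leaving time for the load to complete. */
        t0 += s;
        t1 += s;
        t2 += s;
        t3 += s;

        /* Four independent stores. */
        x[i]     = t0;
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }

    /* Cleanup loop for the 0 to 3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++) {
        x[i] += s;
    }
}
```

With four elements handled per pass, the overhead statements run once for every four units of work instead of once per unit, and the branch is taken about a quarter as often.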