CMSC411 - Chapter 4 Notes

Chapter 4

4.1) Basic Compiler Techniques for Exposing ILP (page 304)

Basic Pipeline Scheduling and Loop Unrolling (page 304)

To keep a pipeline full, parallelism among instructions must be exploited by finding sequences of unrelated instructions that can be overlapped in the pipeline.
Definition: To avoid a pipeline stall, a dependent instruction must be separated from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction.
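The separation rule above can be sketched as a small calculation. This is a minimal illustration, assuming a single-issue pipeline in which each independent instruction placed between producer and consumer fills exactly one cycle; the function name `stall_cycles` and its parameters are hypothetical and not taken from the textbook.

```c
/* Stall cycles seen by a dependent instruction (simplified model).
 * latency:     intervening clock cycles the source instruction needs
 *              before its result can be used (assumed input)
 * independent: number of unrelated instructions scheduled between the
 *              source instruction and the dependent instruction
 */
int stall_cycles(int latency, int independent)
{
    int stalls = latency - independent;
    return stalls > 0 ? stalls : 0;   /* never negative: extra filler costs no stall */
}
```

Under this model, a source instruction with a latency of 3 followed immediately by its consumer stalls for 3 cycles, while placing 3 unrelated instructions between them removes the stall entirely.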
Definition: A simple scheme for increasing the number of instructions relative to the branch and overhead instructions is loop unrolling. Unrolling simply replicates the loop body multiple times, adjusting the loop termination code.
Loop unrolling can also be used to improve scheduling. Because it eliminates the branches between the replicated iterations, it allows instructions from different iterations to be scheduled together.
In the textbook's example, the data use stall is eliminated by creating additional independent instructions within the loop body. If the instructions were simply replicated when the loop was unrolled, the reuse of the same registers could prevent the loop from being scheduled effectively, so each replicated iteration should use different registers.
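A rough C-level sketch of the idea, assuming a loop that adds a scalar to every element of an array. The function names and the unroll factor of four are choices made for illustration; a compiler performs this transformation on machine instructions and registers rather than on C source.

```c
#include <stddef.h>

/* Rolled loop: one add, one index update, and one branch per element. */
void add_scalar(double *x, size_t n, double s)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Unrolled by four. Each copy of the body uses its own temporary
 * (t0..t3), so the four adds are independent of one another and can be
 * scheduled together; the index update and branch run once per four
 * elements instead of once per element. */
void add_scalar_unrolled4(double *x, size_t n, double s)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t0 = x[i]     + s;
        double t1 = x[i + 1] + s;
        double t2 = x[i + 2] + s;
        double t3 = x[i + 3] + s;
        x[i]     = t0;
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }
    /* Adjusted termination code: handle the leftover 0..3 elements. */
    for (; i < n; i++)
        x[i] = x[i] + s;
}
```

Both functions produce the same array contents; the cleanup loop is the "adjusting the loop termination code" part, needed whenever the trip count is not a multiple of the unroll factor.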

Summary of the Loop Unrolling Example (page 308)

The key to taking advantage of ILP to fully utilize the potential of the functional units in a processor is to know when and how the ordering among instructions may be changed.
The key requirement underlying all of these transformations is an understanding of how an instruction depends on another and how the instructions can be changed or re-ordered given the dependencies.
There are three different types of limits to the gains that can be achieved by loop unrolling:

1) A decrease in the amount of overhead amortized with each unroll (loop overhead).
2) Code size limitations.
3) Compiler limitations.

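The more you unroll a loop, the less each additional unroll helps: the loop overhead is amortized over more iterations, so the overhead still left to remove keeps shrinking.
For larger loops, the code size growth may be a concern either in the embedded space, where memory may be at a premium, or if the larger code size increases the instruction cache miss rate.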
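The diminishing return from amortizing loop overhead can be put in numbers. The sketch below assumes a fixed number of overhead cycles per trip through the loop (for example, one index update plus one branch); the function names and that cost model are hypothetical simplifications.

```c
/* Loop-overhead cycles charged to each original iteration after
 * unrolling by unroll_factor (simplified model). */
double overhead_per_element(double overhead_cycles, int unroll_factor)
{
    return overhead_cycles / unroll_factor;
}

/* Cycles per element saved by doubling the unroll factor from k to 2k. */
double gain_from_doubling(double overhead_cycles, int unroll_factor)
{
    return overhead_per_element(overhead_cycles, unroll_factor)
         - overhead_per_element(overhead_cycles, 2 * unroll_factor);
}
```

With 2 overhead cycles per trip, unrolling four times leaves 1/2 cycle of overhead per element and unrolling eight times leaves 1/4. Going from a factor of 1 to 2 saves a full cycle per element, while going from 4 to 8 saves only a quarter of a cycle.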
Definition: The potential shortfall in registers that is created by aggressive unrolling and scheduling is called register pressure. The transformed code, while theoretically faster, may lose some or all of its advantage because it generates a shortage of registers.
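A back-of-the-envelope model of register pressure, assuming every unrolled copy of the loop body gets its own temporaries and that a few registers are shared across all copies. The names and the linear model are illustrative assumptions; a real compiler decides this through liveness analysis and register allocation.

```c
/* Registers needed if each unrolled copy has its own temporaries
 * (simplified linear model).
 * temps_per_copy: temporaries one copy of the loop body keeps live
 * shared:         registers used by all copies (scalar, index, ...) */
int registers_needed(int unroll_factor, int temps_per_copy, int shared)
{
    return unroll_factor * temps_per_copy + shared;
}

/* Number of values that do not fit in the register file and would have
 * to be kept in memory instead. */
int register_shortfall(int needed, int available)
{
    return needed > available ? needed - available : 0;
}
```

With 2 temporaries per copy, 1 shared register, and 32 registers available, an unroll factor of 4 needs 9 registers and fits easily, while a factor of 16 needs 33 and leaves a shortfall of 1. The extra loads and stores that such a shortfall forces are what can cancel the theoretical speedup.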

Using Loop Unrolling and Pipeline Scheduling With Static Multiple Issue (page 312)

Meh.

Alex Baglione @UMCP
CMSC 411 Summer 2002
Hennessy and Patterson, Computer Architecture: A Quantitative Approach; Third Edition
Chapter 4 Notes - for ~~educational~~ agricultural use only
