Chapter 4
4.1) Basic Compiler Techniques for Exposing ILP (page 304)
-
Basic Pipeline Scheduling and Loop Unrolling (page 304)
-
To keep a pipeline full, parallelism among instructions must be exploited
by finding sequences of unrelated instructions that can be overlapped in
the pipeline.
-
Definition: To avoid a pipeline stall, a dependent instruction must be
separated from the source instruction by a distance in clock cycles equal
to the pipeline latency of that source instruction.
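-
Sketch: the rule above can be written as a small calculation. This is only
an illustration, assuming a single-issue, in-order pipeline that starts one
instruction per cycle. The function name stall_cycles and its parameters are
made up for these notes and do not come from the text.

```c
#include <assert.h>

/* Stall cycles seen by a dependent instruction.
 *   latency             - cycles that must separate the source instruction
 *                         from the instruction that uses its result
 *   independent_between - independent instructions the scheduler has
 *                         placed between the two (assumed one per cycle)
 */
int stall_cycles(int latency, int independent_between)
{
    assert(latency >= 0 && independent_between >= 0);
    int gap = latency - independent_between;
    return gap > 0 ? gap : 0;   /* no stall once the gap is filled */
}
```

For example, with a latency of 2 and one independent instruction in between,
stall_cycles(2, 1) gives 1 remaining stall cycle. Filling the gap with
unrelated work is exactly what the scheduler is trying to do.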
-
Definition: A simple scheme for increasing the number of instructions relative
to the branch and overhead instructions is loop unrolling. Unrolling simply
replicates the loop body multiple times, adjusting the loop termination code.
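-
Sketch: unrolling at the source level, written in C for illustration. The
loop (add a scalar to every element of an array), the function names, and
the unroll factor of 4 are assumptions chosen for these notes. A compiler
would normally apply this transformation to the machine code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: every element pays the loop overhead
 * (index update, compare, branch). */
void add_scalar(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Unrolled by 4: the body is replicated four times and the loop
 * termination code is adjusted to step by 4, so the overhead is
 * paid once per four elements. */
void add_scalar_unrolled4(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     = x[i]     + s;
        x[i + 1] = x[i + 1] + s;
        x[i + 2] = x[i + 2] + s;
        x[i + 3] = x[i + 3] + s;
    }
    /* Cleanup loop for the leftover elements when n is not a
     * multiple of 4. */
    for (; i < n; i++)
        x[i] = x[i] + s;
}
```

Both functions produce the same array contents. The cleanup loop is the
"adjusting the loop termination code" part for trip counts that are not a
multiple of the unroll factor.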
-
Loop unrolling can also be used to improve scheduling. Because it
eliminates the branches between the replicated copies of the loop body,
it allows instructions from different iterations to be scheduled together.
-
In this case, we can eliminate the data use stall by creating additional
independent instructions within the loop body. If we simply replicated
the instructions when we unrolled the loop, the resulting use of the same
registers could prevent us from effectively scheduling the loop.
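-
Sketch: giving each unrolled copy its own temporary, written in C for
illustration. The function name and the temporaries t0..t3 are assumptions
for these notes. In C the compiler still makes the final register allocation
and scheduling decisions, so this only mirrors the shape of the transformed
code: distinct names remove the false dependences that reusing one name
would create, so the loads, adds, and stores can be grouped.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled by 4 with a distinct temporary per copy.
 * For brevity this sketch requires n to be a multiple of 4. */
void add_scalar_scheduled4(double *x, size_t n, double s)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Four independent loads, one per original iteration. */
        double t0 = x[i];
        double t1 = x[i + 1];
        double t2 = x[i + 2];
        double t3 = x[i + 3];

        /* Four independent adds; none waits on another add. */
        t0 = t0 + s;
        t1 = t1 + s;
        t2 = t2 + s;
        t3 = t3 + s;

        /* Stores last, after each result has had time to complete. */
        x[i]     = t0;
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }
}
```

If all four copies had shared a single temporary, each copy would have to
finish with it before the next could start, which is the scheduling problem
described above.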
-
Summary of the Loop Unrolling Example (page 308)
-
The key to taking advantage of ILP to fully utilize the potential of the
functional units in a processor is to know when and how the ordering among
instructions may be changed.
-
The key requirement underlying all of these transformations is an understanding
of how an instruction depends on another and how the instructions can be
changed or re-ordered given the dependencies.
-
There are three different types of limits to the gains that can be achieved
by loop unrolling:
-
1) A decrease in the amount of overhead amortized with each unroll.
(loop overhead)
-
2) Code size limitations.
-
3) Compiler limitations.
-
The more you unroll a loop, the less additional loop overhead each further
unroll removes, so the gains diminish.
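-
Sketch: why amortizing loop overhead gives diminishing returns. This
assumes a fixed number of overhead instructions per trip through the loop
(for example 2: an index update and a branch). The function names are made
up for these notes.

```c
#include <assert.h>

/* Overhead instructions charged to each original iteration when the
 * loop is unrolled `unroll` times. */
double overhead_per_iteration(int overhead_instrs, int unroll)
{
    assert(overhead_instrs >= 0 && unroll > 0);
    return (double)overhead_instrs / (double)unroll;
}

/* Overhead removed per original iteration by doubling the unroll
 * factor from `unroll` to 2 * unroll. */
double saving_from_doubling(int overhead_instrs, int unroll)
{
    return overhead_per_iteration(overhead_instrs, unroll)
         - overhead_per_iteration(overhead_instrs, 2 * unroll);
}
```

With 2 overhead instructions, unrolling 4 times leaves 0.5 per original
iteration, and going from 4 to 8 saves only a further 0.25, while the
code size doubles again.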
-
For larger loops, the code size growth may be a concern, particularly in the
embedded space where memory may be at a premium.
-
Definition: The potential shortfall in registers that is created
by aggressive unrolling and scheduling is called register pressure.
The transformed code, while theoretically faster, may lose some or all
of its advantage because it generates a shortage of registers.
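-
Sketch: a very rough model of register pressure. It assumes each unrolled
copy needs its own set of temporaries plus a few registers shared by all
copies (base address, loop counter, constants). The model, the function
names, and the parameters are hypothetical and only meant to show how the
shortfall grows with the unroll factor.

```c
#include <assert.h>

/* Registers wanted by the unrolled loop body under this toy model. */
int registers_needed(int unroll, int temps_per_copy, int shared)
{
    assert(unroll > 0 && temps_per_copy >= 0 && shared >= 0);
    return unroll * temps_per_copy + shared;
}

/* Number of values that do not fit in the available registers and
 * would have to be kept in memory instead. */
int register_shortfall(int needed, int available)
{
    assert(needed >= 0 && available >= 0);
    return needed > available ? needed - available : 0;
}
```

Under this model, unrolling 4 times with 2 temporaries per copy and 3
shared registers wants 11 registers. With only 8 available the shortfall
is 3, and the extra loads and stores needed to cover it can cancel the
benefit of the unrolling.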
-
Using Loop Unrolling and Pipeline Scheduling With Static Multiple Issue
(page 312)
Alex Baglione @UMCP
CMSC 411 Summer 2002
Hennessy and Patterson, Computer Architecture: A Quantitative
Approach; Third Edition
Chapter 4 Notes - for educational agricultural
use only