Why bother with loop unrolling? Because Meesh loves to put it on exams,
and because it's a very simple technique for lowering CPI when branches
are involved.
In section 3.5 of the book, they go into glorious dense detail about
the joys of control hazards. Basically, when you encounter some sort
of conditional branch instruction, there is a one-cycle delay between
the issue of the branch instruction and the branch test (assumed here
to happen in the execute phase). The instruction slot right after the
branch is the branch delay slot. A dumb compiler fills it with a NOP,
a smarter compiler fills it with the next instruction after the branch
(assuming branch-not-taken), and the smartest compilers fill it with
ANYTHING useful. Not just anything, mind you. Ideally the branch delay
slot holds a ``friendly'' non-destructive short (one-cycle)
instruction such as a load or store. There's nothing wrong with
filling it with something destructive, though: it's the compiler's job
to make sure that the result of the delay-slot instruction doesn't
affect the following code.
The other ``wasted cycles'' come from loop overhead: bookkeeping code,
like updating array index registers or counters, is executed once per
loop iteration. This means tight loops are a no-no: if each iteration
executes three instructions in total and only one of them does actual
work, then 2/3 of the loop is overhead and 1/3 is actual work.
However, if you do more work per pass before branching, you can reduce
this overhead to a smaller percentage. Say we unroll the loop so that
each pass does the work of 5 iterations: that is 5 work instructions
plus the same 2 overhead instructions, 7 in total. This increases the
work percentage to 5/7 and cuts the overhead to 2/7.
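To make the overhead argument concrete, here's a minimal C sketch (not from the book or from lecture). The array-sum loop, the function names, and the `overhead_fraction` helper are all made up for illustration, and the helper assumes the simple model above: a fixed number of bookkeeping instructions per pass, plus some work instructions per original iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: every element costs one "work" operation (the add)
 * plus the bookkeeping (index update, compare and branch). */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 5: five adds share one index update and one branch. */
long sum_unrolled5(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
        total += a[i + 4];
    }
    /* Cleanup loop for the leftover n % 5 elements. */
    for (; i < n; i++)
        total += a[i];
    return total;
}

/* Fraction of the instructions in one pass that are overhead.
 * work:     useful instructions per original iteration (assumed model)
 * overhead: bookkeeping instructions per pass (assumed model)
 * unroll:   how many original iterations one pass covers */
double overhead_fraction(int work, int overhead, int unroll)
{
    assert(work > 0 && overhead >= 0 && unroll > 0);
    return (double)overhead / (double)(work * unroll + overhead);
}
```

With 1 work instruction and 2 bookkeeping instructions, `overhead_fraction(1, 2, 1)` gives the 2/3 from the tight loop above; bump the last argument to see how the overhead share shrinks as the unroll factor grows. Note the cleanup loop: the unrolled version still has to handle element counts that aren't a multiple of 5.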
Loop unrolling reduces the proportion of overhead to work in the
loop. Really what we are after is a decrease in CPI. Plug that into
your CPU execution time equation, and you can find the speedup.
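For reference, the equation meant here is presumably the standard CPU time formula. A sketch of the speedup calculation, assuming the clock cycle time doesn't change (the primed symbols are just labels for the unrolled version, not notation from the notes):

```latex
\begin{align*}
\text{CPU time} &= \text{IC} \times \text{CPI} \times \text{clock cycle time} \\
\text{Speedup}  &= \frac{\text{CPU time}_{\text{rolled}}}{\text{CPU time}_{\text{unrolled}}}
                 = \frac{\text{IC} \times \text{CPI}}{\text{IC}' \times \text{CPI}'}
\end{align*}
```

Unrolling usually changes the instruction count too (fewer bookkeeping instructions get executed), so keep both IC and CPI in the ratio rather than comparing CPI alone.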
Unrolling by itself doesn't get rid of stalls, so there's only one
hope left: after we unroll the loop, we can play God with our code,
and reschedule it - move the instructions around to minimize
(hopefully eliminate) stalls. There's no fixed recipe for this: you
need to see what kinds of instructions you have to play with, how long
they take, how many instructions you *can* move around, and how many
other instructions you can stuff between multicycle operations. Yes,
the EVIL multicycle pipeline will rear its ugly head, once
again.
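Here's a rough source-level sketch of what rescheduling looks like, assuming a loop that scales an array by a constant. The function names and the choice of unrolling by 4 are hypothetical, and a real compiler does this on the machine instructions rather than on the C source (and is free to reorder the C anyway), so treat this purely as a picture of the idea. The first version keeps each load right next to the multiply that needs it; the second moves the independent loads, multiplies and stores into groups, so each multiply has other work sitting between it and its load.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled by 4, in the order you get from naive unrolling:
 * load, multiply, store, repeated. Each multiply has to wait
 * for the load immediately before it. */
void scale_unrolled4(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     = x[i]     * s;
        x[i + 1] = x[i + 1] * s;
        x[i + 2] = x[i + 2] * s;
        x[i + 3] = x[i + 3] * s;
    }
    for (; i < n; i++)          /* leftover n % 4 elements */
        x[i] = x[i] * s;
}

/* Same work, rescheduled: all four loads first, then the four
 * (possibly multicycle) multiplies, then the four stores. */
void scale_scheduled4(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t0 = x[i];       /* loads */
        double t1 = x[i + 1];
        double t2 = x[i + 2];
        double t3 = x[i + 3];
        t0 *= s;                /* multiplies, independent of each other */
        t1 *= s;
        t2 *= s;
        t3 *= s;
        x[i]     = t0;          /* stores */
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }
    for (; i < n; i++)          /* leftover n % 4 elements */
        x[i] = x[i] * s;
}
```

Both functions compute the same result; only the order of the independent operations differs. That's the whole game: rescheduling may only move instructions that don't depend on each other.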