The delayed branch is a difficult topic to grasp. In the DLX 5-stage pipeline, it is easy to misunderstand the purpose of filling the branch delay slot with a single useful instruction. Our goal is to remove the mystery of delayed branches with examples and explanations that clarify the topic. We consider only machines whose delayed branches have a single-instruction delay, the case that the Hennessy and Patterson book explains in great detail.
In some examples, it is hard to see why a particular instruction should be placed after the branch. It can also be confusing that a single instruction is enough to absorb the stall that would otherwise occur while a branch instruction executes.
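To make the cost of the stall concrete, here is a minimal sketch that counts issue cycles for a loop on an idealized pipeline. It assumes one instruction issues per cycle, exactly one branch delay slot, and no other stalls. The function name `loop_cycles` and its parameters are illustrative and do not come from the tutorial.

```python
def loop_cycles(body_instructions, iterations, slot_filled):
    """Issue cycles for a loop on an idealized pipeline with one delay slot.

    body_instructions counts the useful instructions in the loop, excluding
    the branch. Assumes one instruction per cycle and no other stalls.
    """
    per_iteration = body_instructions + 1  # loop body plus the branch itself
    if not slot_filled:
        per_iteration += 1  # slot holds a nop: one wasted cycle per branch
    return per_iteration * iterations


# A 4-instruction body executed 100 times:
print(loop_cycles(4, 100, slot_filled=False))  # 600 (nop in the slot)
print(loop_cycles(4, 100, slot_filled=True))   # 500 (body instruction in the slot)
```

Under these assumptions, filling the slot saves exactly one cycle per executed branch, which is why a single well-chosen instruction is enough to hide the delay.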
With the help of key term definitions, it will be easier to learn how to unroll a loop and reschedule it, and then to determine which instruction best fills the branch delay slot. Keep the following guidelines in mind while solving the problems.
- Each time a branch is encountered, the compiler should place a useful instruction in the slot that follows it.
- What to put in the slot?
  * Instruction from before the branch
    - Branch must not depend on moved instruction
    - Always improves performance
  * From branch target
    - Must be OK to execute moved instruction when the branch is not taken
    - Improves performance when branch is taken
  * From fall through
    - Must be OK to execute moved instruction when branch is taken
    - Improves performance when branch is not taken
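
The first strategy in the guidelines above (taking an instruction from before the branch) can be sketched as a small legality check. This is only an illustration under stated assumptions: it tracks register dependences only and ignores memory aliasing, and the names `Instr`, `conflicts`, `fill_from_before`, and `NOP` are hypothetical helpers, not part of DLX or of any real compiler.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                    # assembly text, for display only
    dest: Optional[str]          # register written, or None (e.g. a branch)
    srcs: Tuple[str, ...] = ()   # registers read


NOP = Instr("NOP", None)


def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if `earlier` cannot be moved below `later` (registers only)."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = earlier.dest is not None and earlier.dest == later.dest
    return raw or war or waw


def fill_from_before(block: List[Instr]) -> List[Instr]:
    """Fill the delay slot of a block whose last instruction is the branch.

    Scans backward for an instruction that neither the branch nor any
    instruction after it depends on. Falls back to a NOP if none exists.
    """
    *body, branch = block
    for i in range(len(body) - 1, -1, -1):
        candidate = body[i]
        followers = body[i + 1:] + [branch]
        if not any(conflicts(candidate, f) for f in followers):
            return body[:i] + body[i + 1:] + [branch, candidate]
    return body + [branch, NOP]


add = Instr("ADD R1,R2,R3", "R1", ("R2", "R3"))
beqz_r2 = Instr("BEQZ R2,L", None, ("R2",))
# The branch reads R2 and the ADD writes R1, so the ADD may move into the slot.
print([i.text for i in fill_from_before([add, beqz_r2])])
# ['BEQZ R2,L', 'ADD R1,R2,R3']
```

If the branch tested `R1` instead, the ADD would have to stay where it is, and the slot would need an instruction from the branch target or the fall-through path, subject to the conditions listed above, or a NOP.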
Read through the explanations and examples in this tutorial using the
Dictionary of Terms located at the bottom of each page. Then, come back
to the Main Page to complete these problems. They will be helpful both
in understanding the topic and studying for exams!