Q & A
Questions:
1. What are some advantages of superpipelining (longer pipelines)? Disadvantages?
2. What are the pipeline stages that can be sources for forwarding/bypassing?
3. Show how the following code executes on the 5-Stage DLX machine and on the MIPS R4000, and compare their CPIs. (Assume forwarding paths, separate data and instruction memories, and split memory read/write.) What does this tell you about longer pipelines?
4. If an FP Add is issued at clock cycle 0, in which clock cycles would issuing an FP Divide cause a stall in the floating-point pipeline? An FP Subtract? An FP Compare?
5. Looking at the structure of the Integer and FP pipelines, can you identify the four major causes of pipeline stalls?
Answers:
1. Good: By splitting the pipeline into more stages that each do less work, we can raise the clock rate and therefore, in theory, increase throughput.
Bad: Longer pipelines lead to more stalls, each extra stage boundary adds pipeline-register (latch) overhead, and the pipeline requires more complicated logic (e.g., more forwarding paths).
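The clock-rate versus overhead trade-off in answer 1 can be illustrated with a simple model: if the unpipelined logic has a total delay `T_logic` and every stage adds a fixed latch overhead `t_latch`, an n-stage pipeline has a clock period of roughly `T_logic / n + t_latch`. The sketch below is a minimal illustration under assumed conditions (perfectly balanced stages, no stalls); the delay values and function names are hypothetical, not DLX or R4000 figures.

```python
# Minimal model of how pipeline depth affects the clock period.
# Assumptions (hypothetical): perfectly balanced stages, a fixed
# per-stage latch overhead, and no stalls (CPI = 1).

def clock_period(t_logic_ns: float, t_latch_ns: float, stages: int) -> float:
    """Clock period of a pipeline split into `stages` equal stages."""
    return t_logic_ns / stages + t_latch_ns


def ideal_speedup(t_logic_ns: float, t_latch_ns: float, stages: int) -> float:
    """Speedup over a 1-stage design, ignoring stalls."""
    return clock_period(t_logic_ns, t_latch_ns, 1) / clock_period(t_logic_ns, t_latch_ns, stages)


if __name__ == "__main__":
    T_LOGIC = 10.0  # hypothetical total logic delay, ns
    T_LATCH = 0.5   # hypothetical latch overhead per stage, ns
    for n in (1, 5, 8, 16, 32):
        period = clock_period(T_LOGIC, T_LATCH, n)
        print(f"{n:2d} stages: period {period:.3f} ns, "
              f"ideal speedup {ideal_speedup(T_LOGIC, T_LATCH, n):.2f}x")
```

With these made-up numbers, going from 16 to 32 stages shrinks the period only from 1.125 ns to about 0.81 ns, because the fixed latch overhead starts to dominate; the model also ignores the extra stalls listed under Bad.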
2. EX/DF, DF/DS, DS/TC, TC/WB
3. 5-Stage DLX
Instruction    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
LW R3, 0(R1)   | F | D | X | M | W |   |   |   |   |    |
ADD R2, R3, R3 |   | F | D | S | X | M | W |   |   |    |
AND R4, R3, R2 |   |   | F | S | D | X | M | W |   |    |
SUB R4, R3, R2 |   |   |   | S | F | D | X | M | W |    |
Key: F = fetch, D = decode, X = execute, M = memory access, W = write-back, S = stall.
CPI = 9/4 = 2.25
R4000:
Instruction    | 1  | 2  | 3  | 4  | 5     | 6     | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
LW R3, 0(R1)   | IF | IS | RF | EX | DF    | DS    | TC | WB |    |    |    |    |    |    |
ADD R2, R3, R3 |    | IF | IS | RF | stall | stall | EX | DF | DS | TC | WB |    |    |    |
AND R4, R3, R2 |    |    | IF | IS | stall | stall | RF | EX | DF | DS | TC | WB |    |    |
SUB R4, R3, R2 |    |    |    | IF | stall | stall | IS | RF | EX | DF | DS | TC | WB |    |
The stalls occur because the loaded value cannot be forwarded until LW completes its DS stage (cycle 6), so ADD cannot enter EX until cycle 7. AND and SUB are held behind ADD in the earlier stages.
CPI = 13/4 = 3.25
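Both cycle counts can be reproduced with a small timing model. The sketch below is an assumed simplification, not a cycle-accurate simulator: it tracks only the cycle in which each instruction enters EX, and it assumes a loaded value can feed a dependent EX two cycles after the load's EX on the DLX (forwarded from MEM/WB) and three cycles after it on the R4000 (available after DS), while ALU results forward to the very next cycle. `Machine`, `Instr`, and `total_cycles` are hypothetical names.

```python
# Minimal in-order pipeline timing sketch (assumed model, not cycle-accurate).
# It tracks only the cycle in which each instruction executes EX; the stages
# after EX follow from it because the pipeline is in order.
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Machine:
    name: str
    stages: int         # total number of pipeline stages
    ex_stage: int       # 1-based position of EX (fetch is stage 1)
    load_delay: int     # min. cycles from a load's EX to a dependent EX
    alu_delay: int = 1  # min. cycles from an ALU op's EX to a dependent EX


@dataclass(frozen=True)
class Instr:
    text: str
    dest: str
    srcs: Tuple[str, ...]
    is_load: bool = False


def total_cycles(m: Machine, prog: Sequence[Instr]) -> int:
    """Cycle (numbered from 1) in which the last instruction finishes WB."""
    ex_cycle = []  # EX cycle of each instruction, in program order
    for i, ins in enumerate(prog):
        earliest = i + m.ex_stage  # EX cycle with no stalls at all
        if ex_cycle:
            earliest = max(earliest, ex_cycle[-1] + 1)  # in order: one EX per cycle
        for src in ins.srcs:
            # RAW hazard: wait for the most recent earlier writer of `src`.
            for j in range(i - 1, -1, -1):
                if prog[j].dest == src:
                    delay = m.load_delay if prog[j].is_load else m.alu_delay
                    earliest = max(earliest, ex_cycle[j] + delay)
                    break
        ex_cycle.append(earliest)
    return ex_cycle[-1] + (m.stages - m.ex_stage)


# DLX: F D X M W; a loaded value is forwarded from MEM/WB, so EX + 2.
DLX = Machine("5-stage DLX", stages=5, ex_stage=3, load_delay=2)
# R4000: IF IS RF EX DF DS TC WB; load data is ready after DS, so EX + 3.
R4000 = Machine("MIPS R4000", stages=8, ex_stage=4, load_delay=3)

PROG = [
    Instr("LW R3, 0(R1)", "R3", ("R1",), is_load=True),
    Instr("ADD R2, R3, R3", "R2", ("R3", "R3")),
    Instr("AND R4, R3, R2", "R4", ("R3", "R2")),
    Instr("SUB R4, R3, R2", "R4", ("R3", "R2")),
]

if __name__ == "__main__":
    for m in (DLX, R4000):
        cycles = total_cycles(m, PROG)
        print(f"{m.name}: {cycles} cycles, CPI = {cycles}/{len(PROG)} = {cycles / len(PROG):.2f}")
    # 5-stage DLX: 9 cycles, CPI = 9/4 = 2.25
    # MIPS R4000: 13 cycles, CPI = 13/4 = 3.25
```

Like the tables, this CPI divides total cycles (pipeline fill included) by the instruction count, so it overstates the steady-state CPI of a long program.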
The CPIs show that superpipelining actually takes more cycles per instruction, especially when the instructions depend on one another. The performance gain from superpipelining must therefore come from its higher clock rate, which can be as much as doubled.
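To make this concrete: execution time is cycles × clock period, so on this sequence the R4000 beats the 5-stage DLX only if its clock is more than 13/9 ≈ 1.44 times faster. The sketch below takes the total cycle counts from the two tables (9 and 13), treats the clock ratio as a free parameter, and computes the break-even ratio and the speedup for an exactly doubled clock. The function names are hypothetical.

```python
# Break-even clock ratio for the example sequence, using the total cycle
# counts from the two tables (pipeline fill included).
from fractions import Fraction

DLX_CYCLES = 9     # 5-stage DLX table
R4000_CYCLES = 13  # MIPS R4000 table


def break_even_clock_ratio(dlx_cycles: int, r4000_cycles: int) -> Fraction:
    """R4000 clock rate / DLX clock rate at which both finish at the same time."""
    return Fraction(r4000_cycles, dlx_cycles)


def speedup(dlx_cycles: int, r4000_cycles: int, clock_ratio: Fraction) -> Fraction:
    """DLX time / R4000 time when the R4000 clock is `clock_ratio` times faster."""
    # time = cycles * period, and R4000 period = DLX period / clock_ratio
    return dlx_cycles * Fraction(clock_ratio) / r4000_cycles


if __name__ == "__main__":
    ratio = break_even_clock_ratio(DLX_CYCLES, R4000_CYCLES)
    print(f"break-even clock ratio: {ratio} ~ {float(ratio):.2f}")  # 13/9 ~ 1.44
    s = speedup(DLX_CYCLES, R4000_CYCLES, Fraction(2))
    print(f"speedup with a doubled clock: {s} ~ {float(s):.2f}")    # 18/13 ~ 1.38
```

Exact fractions are used so the break-even point comes out as 13/9 rather than a rounded float.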
4. FP Add followed by FP Divide:
Operation | Issue/Stall | 0 | 1   | 2   | 3   | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
Add       | Issue       | U | S+A | A+R | R+S |   |   |   |   |   |   |    |    |
Divide    | Stall       |   | U   | A   | R   | D | D | D | D | D | D | D  | D  |
          | Issue       |   |     | U   | A   | R | D | D | D | D | D | D  | D  |
          | Issue       |   |     |     | U   | A | R | D | D | D | D | D  | D  |
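A Divide issued in cycle 1 (one cycle after the Add) stalls, because its A and R cycles collide with the Add's A+R (cycle 2) and R+S (cycle 3). Issued in cycle 2 or later, it proceeds without stalling.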
FP Add followed by FP Subtract
Operation | Issue/Stall | 0 | 1   | 2   | 3   | 4   | 5   | 6   | 7 | 8 |
Add       | Issue       | U | S+A | A+R | R+S |     |     |     |   |   |
Subtract  | Stall       |   | U   | S+A | A+R | R+S |     |     |   |   |
          | Stall       |   |     | U   | S+A | A+R | R+S |     |   |   |
          | Issue       |   |     |     | U   | S+A | A+R | R+S |   |   |
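A Subtract issued in cycle 1 or 2 stalls: from cycle 1, its S+A collides with the Add's A+R in cycle 2 (the adder); from cycle 2, its S+A collides with the Add's R+S in cycle 3 (the shifter). Issued in cycle 3 or later, it proceeds without stalling.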
FP Add followed by FP Compare
Operation | Issue/Stall | 0 | 1   | 2   | 3   | 4 | 5 | 6 | 7 | 8 |
Add       | Issue       | U | S+A | A+R | R+S |   |   |   |   |   |
Compare   | Stall       |   | U   | A   | R   |   |   |   |   |   |
          | Issue       |   |     | U   | A   | R |   |   |   |   |
A Compare issued in cycle 1 stalls because its A cycle (cycle 2) collides with the Add's A+R. Issued in cycle 2 or later, it proceeds without stalling.
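The Stall/Issue rows in the three tables amount to a reservation-table check: a second operation stalls if, in any cycle, it needs an FP stage that the Add is already using. The sketch below encodes the per-cycle stage usage shown in the tables and reports which issue cycles produce a structural conflict. It is an assumed simplification: it considers only structural hazards (no data dependences between the operations), checks each candidate issue cycle independently, and models only the first Divide cycles shown in the table. The names `conflicts` and `stall_cycles` are hypothetical.

```python
# Reservation-table check for the FP structural hazards in the tables above.
# Each operation is the list of FP stages (U, S, A, R, D, as labelled in the
# tables) that it occupies in each cycle after it issues.
from typing import List, Set

Usage = List[Set[str]]

ADD: Usage = [{"U"}, {"S", "A"}, {"A", "R"}, {"R", "S"}]
SUBTRACT: Usage = ADD  # Subtract uses the pipeline exactly like Add
COMPARE: Usage = [{"U"}, {"A"}, {"R"}]
# Only the first Divide cycles from the table; its later cycles come long
# after the Add has finished, so they cannot collide with it.
DIVIDE: Usage = [{"U"}, {"A"}, {"R"}] + [{"D"} for _ in range(8)]


def conflicts(first: Usage, second: Usage, offset: int) -> bool:
    """True if `second`, issued `offset` cycles after `first`, needs a stage
    that `first` occupies in the same cycle (a structural hazard)."""
    for cycle, stages in enumerate(second):
        t = cycle + offset  # the same cycle, counted from `first`'s issue
        if t < len(first) and stages & first[t]:
            return True
    return False


def stall_cycles(first: Usage, second: Usage, max_offset: int = 4) -> List[int]:
    """Issue cycles 1..max_offset (first issues at 0) that would stall `second`."""
    return [d for d in range(1, max_offset + 1) if conflicts(first, second, d)]


if __name__ == "__main__":
    for name, op in (("Divide", DIVIDE), ("Subtract", SUBTRACT), ("Compare", COMPARE)):
        print(f"FP {name} after FP Add stalls if issued in cycle(s) {stall_cycles(ADD, op)}")
```

Running it reports [1] for Divide, [1, 2] for Subtract, and [1] for Compare, matching the answers above.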
5.