At last, we have a branch delay slot that has room for two instructions. We have to think of the two instructions that can be put in our branch delay slot. Let's number the instructions of the loop and inspect them one by one.
no. | Instruction | Inspection |
1 | L.S F2,100(R1) | Writes a value that is read by instruction no. 3. Therefore it cannot be executed after instruction no. 3. |
2 | L.S F3,500(R1) | Writes a value that is read by instruction no. 3. Therefore it cannot be executed after instruction no. 3. |
3 | SUB.S F5,F3,F2 | Writes a value that is read by instruction no. 4. Therefore it cannot be executed after instruction no. 4. |
4 | ADD.S F5,F5,F4 | Writes a value that is read by instruction no. 5. Therefore it cannot be executed after instruction no. 5. |
5 | S.S 1000(R1),F5 | Reads register ![]() |
6 | DADDI R1,R1,#4 | Writes a value that is read by instruction no. 7. Therefore it cannot be executed after instruction no. 7. |
7 | DADDI R5,R1,#-400 | Writes a value that is read by instruction no. 8. Therefore it cannot be executed after instruction no. 8. |
8 | BNEZ R5,Loop | The branch instruction itself. |
We notice that there is some relation between each instruction and its
successor that prevents moving any of them to the branch delay slot.
But, observe that instruction no. 6 changes the value of by the
immediate value 4. So, if we changed instruction no. 5 to be
S.S 996(R1),F5, it could be executed after instruction no. 6 without changing the semantics of the instructions. Then,
doing that, we can move the couple of instructions 4 and 5 to the
branch delay slot. So, our loop now looks like
L.S | F2,100(R1) |
L.S | F3,500(R1) |
SUB.S | F5,F3,F2 |
DADDI | R1,R1,#4 |
DADDI | R5,R1,#-400 |
BNEZ | R5,Loop |
ADD.S | F5,F5,F4 |
S.S | 996(R1),F5 |
Unrolling the first two iterations of the loop after the modification we have made gives us enough information about the pattern of the remaining iterations. We get the following table:
Instruction | IF | ID | EX | MEM | WB | Comments |
L.S F2,100(R1) | 5 | 6 | 7 | 8 | 9 | |
L.S F3,500(R1) | 6 | 7 | 8 | 9 | 10 | |
SUB.S F5,F3,F2 | 7 | 8-9 | 10-14 | 15 | 16 | Stall for RAW hazard with F3 in ID stage. |
DADDI R1,R1,#4 | 8-9 | 10 | 11 | 12 | 13 | |
DADDI R5,R1,#-400 | 10 | 11 | 12 | 13 | 14 | |
BNEZ R5,Loop | 11 | 12 | 13 | 14 | 15 | |
ADD.S F5,F5,F4 | 12 | 13-14 | 15-19 | 20 | 21 | Stall for RAW hazard with F5 in ID stage. |
S.S 996(R1),F5 | 13-14 | 15 | 16-20 | 21 | 22 | Stall in EX stage for both RAW hazard with F5 and structural hazard with MEM. |
L.S F2,100(R1) | 15 | 16-20 | 21 | 22 | 23 | |
L.S F3,500(R1) | 16-20 | 21 | 22 | 23 | 24 | |
SUB.S F5,F3,F2 | 21 | 22-23 | 24-28 | 29 | 30 | Stall for RAW hazard with F3 in ID stage. |
DADDI R1,R1,#4 | 22-23 | 24 | 25 | 26 | 27 | |
DADDI R5,R1,#-400 | 24 | 25 | 26 | 27 | 28 | |
BNEZ R5,Loop | 25 | 26 | 27 | 28 | 29 | |
ADD.S F5,F5,F4 | 26 | 27-28 | 29-33 | 34 | 35 | Stall for RAW hazard with F5 in ID stage. |
S.S 996(R1),F5 | 27-28 | 29 | 30-34 | 35 | 36 | Stall in EX stage for both RAW hazard with F5 and structural hazard with MEM. |
From the table above, we see that the number of clock cycles for the first iteration
is 18 and the number of clock cycles for the ramaining iterations is 22. Also,
we find that the last eight clock cycles of each iteration are the first eight
clock cycles of its next iteration. Then we have