Here, we assume that our branch prediction is perfect and we have no stalls caused by branching. Assume that the numbering of clock cycles starts from 1 and the first instruction in the given code fragment is fetched in the first clock cycle. Then we have the following table:
Instruction | IF | ID | EX | MEM | WB | Comments |
L.S F2,100(R1) | 5 | 6 | 7 | 8 | 9 | |
L.S F3,500(R1) | 6 | 7 | 8 | 9 | 10 | |
SUB.S F5,F3,F2 | 7 | 8-9 | 10-14 | 15 | 16 | Stall for RAW hazard with F3 in ID stage. |
ADD.S F5,F5,F4 | 8-9 | 10-14 | 15-19 | 20 | 21 | Stall for RAW hazard with F5 in ID stage. |
S.S 1000(R1),F5 | 10-14 | 15 | 16-20 | 21 | 22 | Stall in EX stage for both RAW hazard with F5 and structural hazard with MEM. |
DADDI R1,R1,#4 | 15 | 16-20 | 21 | 22 | 23 | |
DADDI R5,R1,#-400 | 16-20 | 21 | 22 | 23 | 24 | |
BNEZ R5,Loop | 21 | 22 | 23 | 24 | 25 | |
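As a sanity check on the table, the following Python sketch transcribes each row as (first cycle, last cycle) pairs for IF, ID, EX, MEM and WB. It then verifies that every instruction moves to its next stage in the cycle right after it leaves the previous one, and reports the span from the first fetch to the last write-back. It is not part of the original solution; the names `TABLE`, `is_contiguous` and `iteration_span` are illustrative.

```python
# Stage occupancy transcribed from the table above.
# Each instruction maps to [(first, last)] for IF, ID, EX, MEM, WB.
TABLE = {
    "L.S F2,100(R1)":    [(5, 5), (6, 6), (7, 7), (8, 8), (9, 9)],
    "L.S F3,500(R1)":    [(6, 6), (7, 7), (8, 8), (9, 9), (10, 10)],
    "SUB.S F5,F3,F2":    [(7, 7), (8, 9), (10, 14), (15, 15), (16, 16)],
    "ADD.S F5,F5,F4":    [(8, 9), (10, 14), (15, 19), (20, 20), (21, 21)],
    "S.S 1000(R1),F5":   [(10, 14), (15, 15), (16, 20), (21, 21), (22, 22)],
    "DADDI R1,R1,#4":    [(15, 15), (16, 20), (21, 21), (22, 22), (23, 23)],
    "DADDI R5,R1,#-400": [(16, 20), (21, 21), (22, 22), (23, 23), (24, 24)],
    "BNEZ R5,Loop":      [(21, 21), (22, 22), (23, 23), (24, 24), (25, 25)],
}


def is_contiguous(stages):
    """True if each stage starts in the cycle right after the previous one ends."""
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(stages, stages[1:]))


def iteration_span(table):
    """Return (first IF cycle, last WB cycle, number of cycles spanned)."""
    first = min(stages[0][0] for stages in table.values())
    last = max(stages[-1][1] for stages in table.values())
    return first, last, last - first + 1


for name, stages in TABLE.items():
    assert is_contiguous(stages), name

print(iteration_span(TABLE))  # (5, 25, 21)
```

The check only confirms that the rows are internally consistent; it does not model the hazards themselves, so the stall lengths are taken from the table as given.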
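One iteration of the loop takes 21 clock cycles (cycles 5 through 25 in the table) and executes eight instructions. The cycles per instruction (CPI) for a single iteration is then
$$\text{CPI} = \frac{21}{8} = 2.625.$$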
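For the entire loop, the last four clock cycles of each iteration are also the first four clock cycles of the next iteration, so these four cycles are counted only once for each pair of consecutive iterations. The number of clock cycles for the entire loop is therefore
$$100 \times 21 - 99 \times 4 = 1704,$$
because we have 100 iterations and hence 99 overlaps. The entire loop executes $100 \times 8 = 800$ instructions, so the average number of cycles per instruction over the entire loop is
$$\text{CPI} = \frac{1704}{800} = 2.13.$$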
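The arithmetic above can be reproduced with a short Python sketch. It assumes the quantity of interest is cycles per instruction (CPI) and takes the figures from the text: 21 cycles and 8 instructions per iteration, a 4-cycle overlap between consecutive iterations, and 100 iterations. The function names `loop_cycles` and `cpi` are illustrative, not from the original solution.

```python
from fractions import Fraction


def loop_cycles(cycles_per_iter, overlap, iterations):
    """Total cycles when consecutive iterations share `overlap` cycles."""
    # Each of the (iterations - 1) boundaries between iterations
    # would otherwise be counted twice, so subtract the overlap once per boundary.
    return cycles_per_iter * iterations - overlap * (iterations - 1)


def cpi(cycles, instructions):
    """Cycles per instruction as an exact fraction."""
    return Fraction(cycles, instructions)


single_iter_cpi = cpi(21, 8)           # one iteration in isolation
total_cycles = loop_cycles(21, 4, 100)  # 100 overlapped iterations
whole_loop_cpi = cpi(total_cycles, 100 * 8)

print(float(single_iter_cpi))  # 2.625
print(total_cycles)            # 1704
print(float(whole_loop_cpi))   # 2.13
```

Using `Fraction` keeps the ratios exact until they are printed; the whole-loop CPI is lower than the single-iteration figure because the 4-cycle drain of one iteration overlaps the start of the next.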