next up previous
Next: About this document ... Up: work3-ans Previous: Problem 2

Problem 3

At last, we have a branch delay slot that has room for two instructions. We have to think of the two instructions that can be put in our branch delay slot. Let's number the instructions of the loop and inspect them one by one.



no. Instruction Inspection
1 L.S F2,100(R1) Writes a value that is read by instruction no. 3. Therefore it cannot be executed after instruction no. 3.
2 L.S F3,500(R1) Writes a value that is read by instruction no. 3. Therefore it cannot be executed after instruction no. 3.
3 SUB.S F5,F3,F2 Writes a value that is read by instruction no. 4. Therefore it cannot be executed after instruction no. 4.
4 ADD.S F5,F5,F4 Writes a value that is read by instruction no. 5. Therefore it cannot be executed after instruction no. 5.
5 S.S 1000(R1),F5 Reads register $R1$ which will be written by instruction no. 6. Therefore it cannot be executed after instruction no. 6.
6 DADDI R1,R1,#4 Writes a value that is read by instruction no. 7. Therefore it cannot be executed after instruction no. 7.
7 DADDI R5,R1,#-400 Writes a value that is read by instruction no. 8. Therefore it cannot be executed after instruction no. 8.
8 BNEZ R5,Loop The branch instruction itself.



We notice that there is some relation between each instruction and its successor that prevents moving any of them to the branch delay slot. But, observe that instruction no. 6 changes the value of $R1$ by the immediate value 4. So, if we changed instruction no. 5 to be S.S 996(R1),F5, it could be executed after instruction no. 6 without changing the semantics of the instructions. Then, doing that, we can move the couple of instructions 4 and 5 to the branch delay slot. So, our loop now looks like



L.S F2,100(R1)
L.S F3,500(R1)
SUB.S F5,F3,F2
DADDI R1,R1,#4
DADDI R5,R1,#-400
BNEZ R5,Loop
ADD.S F5,F5,F4
S.S 996(R1),F5



Unrolling the first two iterations of the loop after the modification we have made gives us enough information about the pattern of the remaining iterations. We get the following table:



Instruction IF ID EX MEM WB Comments
L.S F2,100(R1) 5 6 7 8 9  
L.S F3,500(R1) 6 7 8 9 10  
SUB.S F5,F3,F2 7 8-9 10-14 15 16 Stall for RAW hazard with F3 in ID stage.
DADDI R1,R1,#4 8-9 10 11 12 13  
DADDI R5,R1,#-400 10 11 12 13 14  
BNEZ R5,Loop 11 12 13 14 15  
ADD.S F5,F5,F4 12 13-14 15-19 20 21 Stall for RAW hazard with F5 in ID stage.
S.S 996(R1),F5 13-14 15 16-20 21 22 Stall in EX stage for both RAW hazard with F5 and structural hazard with MEM.
L.S F2,100(R1) 15 16-20 21 22 23  
L.S F3,500(R1) 16-20 21 22 23 24  
SUB.S F5,F3,F2 21 22-23 24-28 29 30 Stall for RAW hazard with F3 in ID stage.
DADDI R1,R1,#4 22-23 24 25 26 27  
DADDI R5,R1,#-400 24 25 26 27 28  
BNEZ R5,Loop 25 26 27 28 29  
ADD.S F5,F5,F4 26 27-28 29-33 34 35 Stall for RAW hazard with F5 in ID stage.
S.S 996(R1),F5 27-28 29 30-34 35 36 Stall in EX stage for both RAW hazard with F5 and structural hazard with MEM.



From the table above, we see that the number of clock cycles for the first iteration is 18 and the number of clock cycles for the ramaining iterations is 22. Also, we find that the last eight clock cycles of each iteration are the first eight clock cycles of its next iteration. Then we have

\begin{displaymath}CPI_{First Iteration} = \frac{18}{8} = 2.25\end{displaymath}


\begin{displaymath}CPI_{Any Other Iteration} = \frac{22}{8} = 2.75\end{displaymath}


\begin{displaymath}CPI_{Entire Loop} = \frac{18 + (22 - 8)\times 99}{800} = 1.755\end{displaymath}


next up previous
Next: About this document ... Up: work3-ans Previous: Problem 2
MM Hugue 2002-11-05

Web Accessibility