Loop: LW R3, 0(R1) ; load in an array entry
     LW R6, 4(R1) ; load in the next array entry
     LW R9, 8(R1) ; load in the third array entry
     ADDI R4, R3, #50 ; add a constant
     ADDI R7, R6, #50 ; add a constant
     ADDI R10, R9, #50 ; add a constant
     MULT R4, R4, R4 ; square the new value
     MULT R7, R7, R7 ; square the new value
     MULT R10, R10, R10 ; square the new value
     SW R4, 600(R1) ; store the new value
     SW R7, 604(R1) ; store the new value
     SW R10, 608(R1) ; store the new value
     ADDI R1, R1, #12 ; increment pointer
     SUBI R5, R1, #300 ; check whether ended
     BNEZ R5, Loop ; branch

The code above unrolls the loop three times. In a five-stage DLX-like pipeline with forwarding, this eliminates every stall in the loop body. The only stalls left come from the loop control instructions: the two instructions before the branch, and the branch itself. You can remove the stalls associated with the two instructions before the branch by moving them higher in the loop and adjusting the offsets of the array accesses that now execute after the pointer increment (the stores, whose offsets drop by 12), as follows:

Loop: LW R3, 0(R1) ; load in an array entry
     LW R6, 4(R1) ; load in the next array entry
     LW R9, 8(R1) ; load in the third array entry
     ADDI R1, R1, #12 ; increment pointer
     ADDI R4, R3, #50 ; add a constant
     ADDI R7, R6, #50 ; add a constant
     ADDI R10, R9, #50 ; add a constant
     MULT R4, R4, R4 ; square the new value
     MULT R7, R7, R7 ; square the new value
     MULT R10, R10, R10 ; square the new value
     SUBI R5, R1, #300 ; check whether ended
     SW R4, 588(R1) ; store the new value (600 - 12, since R1 is already incremented)
     SW R7, 592(R1) ; store the new value
     SW R10, 596(R1) ; store the new value
     BNEZ R5, Loop ; branch

Thus, all stalls are eliminated except those required by the control hazard: the pipeline cannot know which instruction to fetch next until the branch has been resolved.
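
One way to check that the rescheduling preserves the loop's behavior is to interpret both versions and compare the memory they write. The sketch below is only an illustration, not a DLX simulator: it assumes R1 starts at 0, that the input array occupies bytes 0 to 299 with made-up contents, and that there are no branch delay slots. It models results only, not pipeline timing, and the tuple encoding and the `run` helper are hypothetical names introduced here.

```python
# Instruction encodings (hypothetical, for this sketch only):
#   ("LW",  rd, offset, base)   ("SW",   rs, offset, base)
#   ("ADDI", rd, rs, imm)       ("SUBI", rd, rs, imm)
#   ("MULT", rd, rs, rt)        ("BNEZ", rs, target_index, 0)

UNROLLED = [
    ("LW", 3, 0, 1), ("LW", 6, 4, 1), ("LW", 9, 8, 1),
    ("ADDI", 4, 3, 50), ("ADDI", 7, 6, 50), ("ADDI", 10, 9, 50),
    ("MULT", 4, 4, 4), ("MULT", 7, 7, 7), ("MULT", 10, 10, 10),
    ("SW", 4, 600, 1), ("SW", 7, 604, 1), ("SW", 10, 608, 1),
    ("ADDI", 1, 1, 12), ("SUBI", 5, 1, 300), ("BNEZ", 5, 0, 0),
]

SCHEDULED = [
    ("LW", 3, 0, 1), ("LW", 6, 4, 1), ("LW", 9, 8, 1),
    ("ADDI", 1, 1, 12),                       # pointer increment moved up
    ("ADDI", 4, 3, 50), ("ADDI", 7, 6, 50), ("ADDI", 10, 9, 50),
    ("MULT", 4, 4, 4), ("MULT", 7, 7, 7), ("MULT", 10, 10, 10),
    ("SUBI", 5, 1, 300),                      # loop test moved up
    ("SW", 4, 588, 1), ("SW", 7, 592, 1), ("SW", 10, 596, 1),  # offsets - 12
    ("BNEZ", 5, 0, 0),
]


def run(program):
    """Execute the program and return the words stored at address 600 and up."""
    regs = [0] * 32                                            # assumption: R1 = 0
    mem = {addr: addr // 4 + 1 for addr in range(0, 300, 4)}   # made-up input data
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "LW":
            regs[a] = mem[regs[c] + b]
        elif op == "SW":
            mem[regs[c] + b] = regs[a]
        elif op == "ADDI":
            regs[a] = regs[b] + c
        elif op == "SUBI":
            regs[a] = regs[b] - c
        elif op == "MULT":
            regs[a] = regs[b] * regs[c]
        elif op == "BNEZ" and regs[a] != 0:
            pc = b                                             # taken: jump to Loop
    return {addr: val for addr, val in mem.items() if addr >= 600}


if __name__ == "__main__":
    print(run(UNROLLED) == run(SCHEDULED))
```

Under these assumptions both versions write the same 75 output words, which is why the store offsets must drop from 600, 604, 608 to 588, 592, 596 once the pointer increment sits above the stores.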

