- Loop: LW R3, 0(R1) ; load in an array entry
- LW R6, 4(R1) ; load in an array entry
- LW R9, 8(R1) ; load in an array entry
- ADDI R4, R3, #50 ; add a constant
- ADDI R7, R6, #50 ; add a constant
- ADDI R10, R9, #50 ; add a constant
- MULT R4, R4, R4 ; square the new value
- MULT R7, R7, R7 ; square the new value
- MULT R10, R10, R10 ; square the new value
- SW R4, 600(R1) ; store the new value
- SW R7, 604(R1) ; store the new value
- SW R10, 608(R1) ; store the new value
- ADDI R1, R1, #12 ; increment pointer
- SUBI R5, R1, #300 ; check whether ended
- BNEZ R5, Loop ; branch
The above simply unrolls the loop three times. In a five-stage DLX-like pipeline with forwarding, this eliminates all of the stalls associated with the loop except those caused by the loop control instructions: the two before the branch, and the branch itself. You can remove the stalls associated with the two instructions before the branch by moving them higher in the loop and adjusting the offsets of the array access instructions that now come after the pointer increment (each store offset drops by 12), as follows:
- Loop: LW R3, 0(R1) ; load in an array entry
- LW R6, 4(R1) ; load in an array entry
- LW R9, 8(R1) ; load in an array entry
- ADDI R1, R1, #12 ; increment pointer
- ADDI R4, R3, #50 ; add a constant
- ADDI R7, R6, #50 ; add a constant
- ADDI R10, R9, #50 ; add a constant
- MULT R4, R4, R4 ; square the new value
- MULT R7, R7, R7 ; square the new value
- MULT R10, R10, R10 ; square the new value
- SUBI R5, R1, #300 ; check whether ended
- SW R4, 588(R1) ; store the new value
- SW R7, 592(R1) ; store the new value
- SW R10, 596(R1) ; store the new value
- BNEZ R5, Loop ; branch
Thus, all stalls are eliminated except those required by the control hazard: the pipeline cannot fetch the correct next instruction until the branch outcome and target are known.
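As a sanity check on the offset adjustment, here is a minimal Python sketch that models what the two schedules do to memory (not their timing). It assumes R1 starts at 0, that the input array occupies byte addresses 0 through 296, and that each copy stores its own result register (R4, R7, R10). The function names and the input values are hypothetical, chosen only for illustration.

```python
WORD = 4        # bytes per array entry
N_BYTES = 300   # loop ends when R1 reaches 300 (SUBI R5, R1, #300)


def make_memory():
    # Hypothetical input: 75 words at byte addresses 0, 4, ..., 296.
    return {addr: addr // WORD for addr in range(0, N_BYTES, WORD)}


def unrolled(mem):
    """Unrolled three times; pointer incremented at the end of the body."""
    r1 = 0
    while True:
        r3, r6, r9 = mem[r1], mem[r1 + 4], mem[r1 + 8]
        r4, r7, r10 = (r3 + 50) ** 2, (r6 + 50) ** 2, (r9 + 50) ** 2
        mem[r1 + 600], mem[r1 + 604], mem[r1 + 608] = r4, r7, r10
        r1 += 12
        r5 = r1 - N_BYTES
        if r5 == 0:          # BNEZ falls through when R5 is zero
            break
    return mem


def rescheduled(mem):
    """Pointer increment hoisted above the stores; store offsets reduced by 12."""
    r1 = 0
    while True:
        r3, r6, r9 = mem[r1], mem[r1 + 4], mem[r1 + 8]
        r1 += 12             # hoisted ADDI R1, R1, #12
        r4, r7, r10 = (r3 + 50) ** 2, (r6 + 50) ** 2, (r9 + 50) ** 2
        r5 = r1 - N_BYTES    # hoisted SUBI R5, R1, #300
        # 600 - 12 = 588, 604 - 12 = 592, 608 - 12 = 596
        mem[r1 + 588], mem[r1 + 592], mem[r1 + 596] = r4, r7, r10
        if r5 == 0:
            break
    return mem


same = unrolled(make_memory()) == rescheduled(make_memory())
```

Under these assumptions `same` is true: both schedules write identical values to byte addresses 600 through 896, so reducing each store offset by 12 exactly compensates for incrementing R1 before the stores.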