Don't panic; see the steps below the code fragment.
LD R1, 100(R1);
LD R2, 200(R1);
DADDI R1, R2, #3000;
DSUB R2, R4, R1;
AND R2, R3, R2;
SD R2,400(R2);
LD R3,100(R0);
XOR R3, R3, R3;
SD 0(R3), R3;
LD R3, 0(R2);
SD R3,100(R3);
Interpretation:
LD R1, 100(R1);
LD R2, 200(R1);
DADDI R1, R2, #3000;
DSUB R2, R4, R1;
AND R2, R0, R2;
SD R2,400(R2);
LD R3,100(R0);
XOR R3, R3, R3;
SD R3, 0(R3);
LD R3, 0(R2);
SD R3,100(R3);
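It says that the speedup from pipelining is

$$\text{Speedup from pipelining} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$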
Why do I just adore this formula? NOT! Go back and look at your execution profiles for problems 3.1 and 4.1 (my term for mapping out what happens while the code executes on the pipeline) for the unoptimized code fragment. Then apply this formula, and then tell me how it went. I have a real problem applying this one. Why? Can you compute the number of stalls per instruction independently of instruction order?
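One way to see the difficulty is that the same instructions can stall a different number of times depending on their order. The sketch below is a minimal Python toy model under stated assumptions (a classic five-stage IF/ID/EX/MEM/WB pipeline, no forwarding, registers read in ID and written in the first half of WB); it is not a reproduction of the problem 3.1 or 4.1 setups. The names `count_stalls`, `stalls_per_instruction`, `pipeline_speedup`, `ORDER_A`, and `ORDER_B`, and the three-instruction example, are hypothetical. It plugs the stall count into speedup = depth / (1 + stall cycles per instruction), with a depth of 5, an ideal CPI of 1, and equal clock cycle times assumed.

```python
from fractions import Fraction

# Toy model (assumptions, not from the problem statement):
#   * classic 5-stage pipeline IF ID EX MEM WB, in order, one issue per cycle;
#   * no forwarding: source registers are read in ID;
#   * the register file is written in the first half of a cycle and read in the
#     second half, so a consumer may be in ID in the same cycle its producer is in WB;
#   * R0 is hardwired to zero and never causes a stall.
# Each instruction is a hypothetical (text, destination or None, sources) tuple.


def count_stalls(program):
    """Total RAW stall cycles for the program under the model above."""
    wb_cycle = {}   # register -> cycle in which its most recent writer is in WB
    prev_id = 0     # ID cycle of the previous instruction (first IF is cycle 0)
    stalls = 0
    for _text, dest, sources in program:
        earliest = prev_id + 1  # ID cycle if nothing stalls
        ready = max((wb_cycle.get(r, 0) for r in sources if r != "R0"), default=0)
        id_cycle = max(earliest, ready)
        stalls += id_cycle - earliest
        if dest is not None and dest != "R0":
            wb_cycle[dest] = id_cycle + 3  # ID -> EX -> MEM -> WB
        prev_id = id_cycle
    return stalls


def stalls_per_instruction(program):
    return Fraction(count_stalls(program), len(program))


def pipeline_speedup(program, depth=5):
    # Speedup = depth / (1 + stall cycles per instruction); ideal CPI of 1
    # and equal clock cycle times are assumed.
    return Fraction(depth) / (1 + stalls_per_instruction(program))


# Hypothetical example: the same three instructions in two legal orders.
ORDER_A = [
    ("LD R1, 100(R0)", "R1", ["R0"]),
    ("DADDI R2, R1, #1", "R2", ["R1"]),
    ("DADDI R5, R6, #1", "R5", ["R6"]),
]
ORDER_B = [ORDER_A[0], ORDER_A[2], ORDER_A[1]]

if __name__ == "__main__":
    for name, prog in (("A", ORDER_A), ("B", ORDER_B)):
        print(name, count_stalls(prog), stalls_per_instruction(prog),
              pipeline_speedup(prog))
```

Under this model, moving the independent `DADDI R5, R6, #1` between the load and its consumer cuts the stalls from 2 to 1, so stall cycles per instruction drop from 2/3 to 1/3 and the computed speedup rises from 3 to 15/4. The stalls-per-instruction term belongs to a particular instruction sequence, not to each instruction in isolation.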