next up previous
Next: About this document ... Up: Part2: Pipelining and the Previous: Part2: Pipelining and the

Problem 3: Hands ON the Pipe-Like Pipe

It would help to have read the beginning of Appendix A, and/or

3.1
What do I mean by ``pipe-like pipe''? That is, why do I constantly refer to it that way, instead of merely ``the pipeline''?














3.2
Compute the speedup due to forwarding and bypassing for the extremely strange code fragment below, assuming that the 5-stage pipe-like pipe is used. By raw, I mean the naive version that we discussed in class, before we'd ever heard of structural hazards. There is no split instruction and data memory, and register reads and writes conflict before and after bypassing and forwarding are added.

Don't panic, see the steps below the code fragment.

LD R1, 100(R1);
LD R2, 200(R1);
DADDI R1, R2, #3000;
DSUB R2, R4, R1;
AND R2, R3, R2;
SD R2,400(R2);
LD R3,100(R0);
XOR R3, R3, R3;
SD 0(R3), R3;
LD R3, 0(R2);
SD R3,100(R3);

Interpretation:

  1. Compute ${\rm CPI}_{\rm old}$ by tracking execution of the fragment without bypassing and forwarding.
  2. Compute ${\rm CPI}_{\rm new}$ by tracking execution of the fragment assuming bypassing and forwarding.
  3. Speedup is ${\rm CPI}_{\rm old}$ divided by ${\rm CPI}_{\rm new}$. Why? Make sure you know!

4.1
Repeat the last part of problem 3, except this time, assume that split instruction and data memory are used, and that registers are written in the first half of the clock cycle, and read during the second throughout the entire problem. That is, what is the speedup due to bypassing and forwarding when all the structural hazards have been removed?

LD R1, 100(R1);
LD R2, 200(R1);
DADDI R1, R2, #3000;
DSUB R2, R4, R1;
AND R2, R0, R2;
SD R2,400(R2);
LD R3,100(R0);
XOR R3, R3, R3;
SD R3, 0(R3);
LD R3, 0(R2);
SD R3,100(R3);

4.2
Okay, now, just for grins and giggles, see if you can play optimizing compiler and accomplish whatever it is our friendly code fragment with fewer instructions, if possible, and try to minimize the CPI, essentially repeat 3.2 and 4.1 on your modified code fragment. The point is for you to become comfortable with the way code executes on the pipeline. Why? Because we'll be doing a lot of that in the next two weeks.

4.3
Bored yet? Well, here's my favorite question of all. There's this formula on page A-13, on your cheat sheet, and in Bill's Pipelining Handout.

It says that the speedup from pipelining is


\begin{displaymath}\frac{1}{1+ {\rm Pipeline\;\;stall\;\;cycles\;\; per\; instruction}} \times
{\rm Pipeline\;\; depth}.\end{displaymath}

Why do I just adore this formula? NOT! Go back and look at your problem 3.1 and 4.1 execution profiles (my term for mapping out what happens while executing on the pipe-like pipe) for the unoptimized code fragment. Then, apply this formula. And, then, tell me. I have a real problem applying this one. Why? Can you compute the number of stalls per instruction, independent of instruction order?


next up previous
Next: About this document ... Up: Part2: Pipelining and the Previous: Part2: Pipelining and the
MM Hugue 2002-09-22

Web Accessibility