
MMM-DLX Specifications




Warning: You MUST assume that the implementation includes separate instruction and data memories; register writes and reads split across the two halves of a clock cycle; and any forwarding, bypassing, or load interlocks required to resolve RAW hazards. Be sure to write down any additional assumptions you make.




The most significant change in the MMM-DLX from the DLX architectures in H&P is that the MEM stage moves from after the EX stage(s) to before the execution pipelines. Because ALU instructions can now access memory, we restrict the memory addressing mode to register indirect. That is, loads and stores no longer require an effective address computation.

Instead, the exact address must already be in the register. Thus, the instruction LF F2, (R2) puts the value stored at the address in R2 into register F2, and the instruction ADD R2, R3, (R1) adds the value stored at the address in R1 to the value in R3 and puts the result in register R2. We call these new instructions ALUM (for ALU memory) to distinguish them from the standard ALU instructions. The symbol ALU/M refers to ALU and ALUM instructions collectively.
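To make the register-indirect semantics concrete, here is a minimal sketch of the two example instructions. It models registers and memory as Python dicts (a hypothetical representation, not part of the MMM-DLX spec) and uses made-up register contents; the point is only that the address comes straight from the register, with no displacement added.

```python
# Hypothetical model: registers and memory are plain dicts; memory is
# addressed by the value held in a register (register indirect, no offset).

def load(regs: dict, mem: dict, dest: str, addr_reg: str) -> None:
    """LF dest, (addr_reg): dest <- Mem[regs[addr_reg]]."""
    regs[dest] = mem[regs[addr_reg]]

def alum_add(regs: dict, mem: dict, dest: str, src: str, addr_reg: str) -> None:
    """ADD dest, src, (addr_reg): dest <- regs[src] + Mem[regs[addr_reg]]."""
    regs[dest] = regs[src] + mem[regs[addr_reg]]

# Made-up initial state for illustration.
regs = {"R1": 100, "R2": 200, "R3": 5, "F2": 0.0}
mem = {100: 37, 200: 1.5}

load(regs, mem, "F2", "R2")            # LF F2, (R2)
alum_add(regs, mem, "R2", "R3", "R1")  # ADD R2, R3, (R1)
print(regs["F2"], regs["R2"])
```

Note that the load runs first, so it uses the old contents of R2 as its address before the ALUM instruction overwrites R2.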





A sample code fragment appears below:

LW R2, (R1)
LF F0, (R3)
LF F2, (R2)
MULTF F1, F0, F2
MULTF F1, (R4), F1
SF (R3),F1
ADDI R2, R2, #-16
LD F4, (R2)
ADDI R2, R2, #-16
ADDD F6,F4, (R2)
ADDI R2, R2, #-16
SD (R2),F6
SW (R4), R2
LW R3, (R10)
XORI R5, (R3), #-1
SW (R3),R5
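When working the hazard and latency problems, it helps to list which instruction in the fragment produces each value that a later one reads. The sketch below does that for the fragment above. It assumes the first operand is the destination for every opcode except the stores SW, SF, and SD (an assumption matching this fragment only), treats a register written as (Rx) as a source used for the address, and ignores the fact that LD/SD actually touch an even/odd FP register pair.

```python
import re

# The sample fragment above, one instruction per line.
FRAGMENT = """\
LW R2, (R1)
LF F0, (R3)
LF F2, (R2)
MULTF F1, F0, F2
MULTF F1, (R4), F1
SF (R3),F1
ADDI R2, R2, #-16
LD F4, (R2)
ADDI R2, R2, #-16
ADDD F6,F4, (R2)
ADDI R2, R2, #-16
SD (R2),F6
SW (R4), R2
LW R3, (R10)
XORI R5, (R3), #-1
SW (R3),R5"""

STORES = {"SW", "SF", "SD"}  # assumption: the only store opcodes in the fragment
REG = re.compile(r"[RF]\d+")

def parse(line):
    """Return (opcode, destination register or None, set of source registers)."""
    opcode, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    if opcode in STORES:
        dest, srcs = None, operands       # stores write memory, not a register
    else:
        dest, srcs = operands[0], operands[1:]
    sources = set()
    for op in srcs:
        sources.update(REG.findall(op))   # "(R4)" yields R4; "#-16" yields nothing
    return opcode, dest, sources

def raw_pairs(lines):
    """Return (producer_index, consumer_index, register) for each RAW dependence,
    pairing every read with the most recent earlier write of that register."""
    last_writer = {}
    pairs = []
    for i, line in enumerate(lines):
        _, dest, sources = parse(line)
        for reg in sorted(sources):       # reads happen before this instruction's write
            if reg in last_writer:
                pairs.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return pairs

for producer, consumer, reg in raw_pairs(FRAGMENT.splitlines()):
    print(f"{reg}: line {producer + 1} -> line {consumer + 1}")
```

This only finds dependences; whether each one stalls depends on the stage timings described in the tables that follow.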




***The details of the MMM-DLX follow.***

PS:
Pipeline Stages

Stage    Function
F        Instruction fetch
ID       Decode instruction, fetch register values, and test branch condition
MEM      Memory access for loads and stores; branch target computed
LWB/EX   Load write back (1 cycle); INT and FP ALU units execute (see FU chart below)
ALWB     ALU write back
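For the latency problems it can help to write out the stage sequence each instruction class passes through. The following is a minimal sketch under stated assumptions: ALU/M instructions pass through F, ID, MEM, their EX unit, and ALWB; loads end in LWB; stores end after MEM. Only the names A1-A8 and M1-M18 come from the FU chart; the INT ALU names E1-E2 and FP DIV names D1-D60 are hypothetical labels.

```python
# Hypothetical EX-stage names; only A1-A8 and M1-M18 come from the FU chart.
EX_STAGES = {
    "int_alu": ["E1", "E2"],
    "fp_add": [f"A{i}" for i in range(1, 9)],
    "fp_mult": [f"M{i}" for i in range(1, 19)],
    "fp_div": [f"D{i}" for i in range(1, 61)],
}

def stages(kind: str) -> list[str]:
    """Stage sequence for one instruction class (assumed linear pipeline)."""
    front = ["F", "ID", "MEM"]
    if kind == "load":
        return front + ["LWB"]            # loads write back in the LWB/EX slot
    if kind == "store":
        return front                      # assumption: a store is done after MEM
    return front + EX_STAGES[kind] + ["ALWB"]

def timing_row(kind: str, issue_cycle: int) -> dict[int, str]:
    """Map each clock cycle to the stage occupied, assuming no stalls."""
    return {issue_cycle + i: s for i, s in enumerate(stages(kind))}

print(timing_row("int_alu", 1))
```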



FU:
EX Stage Functional Units and Instruction Mix

The term full pipe (fully pipelined) means that each pipeline stage takes one clock cycle, so the unit can accept a new instruction every cycle.

Unit      Stages or clock cycles (CC)    Ins Freq (fraction of ALU/M instructions)
INT ALU   2 stages                       0.60
FP ADD    8 full pipe stages, A1-A8      0.15
FP MULT   18 full pipe stages, M1-M18    0.20
FP DIV    60 stages                      0.05
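One way to reduce this table to a single number is a frequency-weighted average of the cycle counts. The sketch below assumes each ALU/M instruction spends exactly the listed number of clock cycles in its unit, with FP DIV's 60 stages counted as 60 cycles; treat it as an arithmetic aid under that assumption, not as the required model for $\beta$.

```python
# Assumption: each ALU/M instruction spends exactly the listed number of
# clock cycles in its EX unit (FP DIV's 60 stages counted as 60 cycles).
FU_TABLE = {
    # unit: (EX cycles, fraction of ALU/M instructions)
    "INT ALU": (2, 0.60),
    "FP ADD": (8, 0.15),
    "FP MULT": (18, 0.20),
    "FP DIV": (60, 0.05),
}

def weighted_ex_cycles(table: dict) -> float:
    """Frequency-weighted average of the EX cycle counts."""
    total_freq = sum(freq for _, freq in table.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction-mix fractions should sum to 1")
    return sum(cycles * freq for cycles, freq in table.values())

print(weighted_ex_cycles(FU_TABLE))
```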




CPU:
CPU Execution-Time Specific Parameters



Ins Type      Ins Freq   # of CPU CC
ALU           0.40       $\beta$
ALUM          0.20       $\beta$
Load          0.20       1
Store         0.10       1
Cond Branch   0.09       3
Jump          0.01       1


Note: Problem 3.1 asks you to compute the value of $\beta$.
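The average CPI with a perfect cache is the sum of frequency times clock cycles over the CPU table. The sketch below evaluates that sum numerically for a given $\beta$; evaluating it at $\beta = 0$ recovers the constant term, and the difference between $\beta = 1$ and $\beta = 0$ recovers the coefficient of $\beta$. The table encoding (None marking the $\beta$ entries) is a hypothetical choice for illustration.

```python
CPU_TABLE = {
    # instruction type: (frequency, clock cycles); None marks the beta entries
    "ALU": (0.40, None),
    "ALUM": (0.20, None),
    "Load": (0.20, 1),
    "Store": (0.10, 1),
    "Cond Branch": (0.09, 3),
    "Jump": (0.01, 1),
}

def cpi_perfect_cache(beta: float) -> float:
    """Average CPI with a zero miss rate: sum of frequency x clock cycles."""
    return sum(freq * (beta if cycles is None else cycles)
               for freq, cycles in CPU_TABLE.values())

constant = cpi_perfect_cache(0.0)
coefficient = cpi_perfect_cache(1.0) - constant
print(f"CPI = {coefficient:.2f} * beta + {constant:.2f}")
```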

MEM:
Memory Stall Parameters

Mem Access    Miss Rate   Miss Penalty
Ins Fetch     0.05        n/a
Data Fetch    0.10        n/a

Reads         n/a         100 CC
Writes        n/a         250 CC
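Combining the two halves of this table requires deciding which accesses each instruction makes. The sketch below computes memory stall cycles per instruction under assumptions that are not stated in the tables and are labeled in the code: one instruction fetch (a read) per instruction, one data read per Load or ALUM instruction, one data write per Store, and the Data Fetch miss rate applying to both data reads and data writes. Change the assumptions and the arithmetic changes with them.

```python
# Assumptions (not stated in the tables):
# * every instruction makes one instruction fetch, which is a read;
# * each Load and each ALUM instruction makes one data read;
# * each Store makes one data write;
# * the "Data Fetch" miss rate applies to data reads and data writes alike.
INS_MISS_RATE = 0.05
DATA_MISS_RATE = 0.10
READ_PENALTY = 100   # clock cycles
WRITE_PENALTY = 250  # clock cycles

def stall_cycles_per_instruction(f_load: float, f_alum: float, f_store: float) -> float:
    """Average memory stall cycles per instruction under the assumptions above."""
    ifetch = INS_MISS_RATE * READ_PENALTY
    data_reads = (f_load + f_alum) * DATA_MISS_RATE * READ_PENALTY
    data_writes = f_store * DATA_MISS_RATE * WRITE_PENALTY
    return ifetch + data_reads + data_writes

# Frequencies taken from the CPU table.
print(stall_cycles_per_instruction(f_load=0.20, f_alum=0.20, f_store=0.10))
```

Under these assumptions the result is simply added to the perfect-cache CPI.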


TAKE A DEEP BREATH. NOW, DO ANY 8 (EIGHT). THERE ARE 12 SIX-POINT PROBLEMS HERE. FIND SOMETHING YOU CAN DO, AND LABEL YOUR WORK CLEARLY FOR PARTIAL CREDIT CONSIDERATION.

3.1
Write an expression for $\beta$, the average number of clock cycles for an ALU or ALUM instruction, assuming the ALU instruction mix shown in the FU Table.

















3.2
Write an expression for the AVG CPI on the MMM-DLX assuming a perfect cache. That is, assume that the miss rate (MR) is zero. Your answer should be in terms of $\beta$, the average number of clock cycles for an ALU instruction, and should assume the instruction mix given in the CPU table.















3.3
Write an expression for the AVG CPI on the MMM-DLX taking the cache misses into account. Again, your answer should be in terms of $\beta$. Be sure to consider instruction and data fetches for each instruction type listed in the CPU Table.

3.4
Identify any potential structural hazards in the MMM-DLX. If none exist, explain why there are none.















3.5
Assess the potential for WAR, WAW, and RAR hazards in the MMM-DLX. That is, if they occur, give an example. If they don't, explain why they can't occur.
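A dependence is a property of the instruction pair; whether it becomes a hazard depends on when each instruction reads and writes its registers in the pipeline (for example, a short INT ALU path versus the 18-stage FP MULT path). The following sketch only classifies the register dependences between an earlier and a later instruction, each described as a hypothetical (registers read, registers written) pair; the timing analysis is still up to you.

```python
def dependences(earlier: tuple[set, set], later: tuple[set, set]) -> set[str]:
    """Classify register dependences from an earlier to a later instruction.

    Each instruction is given as (registers read, registers written).
    """
    e_reads, e_writes = earlier
    l_reads, l_writes = later
    kinds = set()
    if e_writes & l_reads:
        kinds.add("RAW")   # later reads a register the earlier one writes
    if e_reads & l_writes:
        kinds.add("WAR")   # later writes a register the earlier one reads
    if e_writes & l_writes:
        kinds.add("WAW")   # both write the same register
    if e_reads & l_reads:
        kinds.add("RAR")   # both read the same register
    return kinds

# MULTF F1, F0, F2 followed by MULTF F1, (R4), F1 (from the sample fragment)
first = ({"F0", "F2"}, {"F1"})
second = ({"R4", "F1"}, {"F1"})
print(sorted(dependences(first, second)))
```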

3.6
What type of strategy do you think is used in the MMM-DLX to deal with control hazards? Give an example of a control hazard, and explain your answer for full credit.




















3.7
Write an expression for the speedup due to the MMM-DLX control hazard handling method over the method of stalling until the control hazard is cleared.
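A common way to frame such a speedup is as a ratio of CPIs, where each policy adds some number of stall cycles per branch. The sketch below encodes that ratio; the base CPI and both per-branch penalties are hypothetical placeholders to be replaced by values you derive from the pipeline description, and only the 0.09 conditional-branch frequency comes from the CPU table.

```python
def branch_speedup(base_cpi: float, branch_freq: float,
                   stall_penalty: float, method_penalty: float) -> float:
    """Speedup of a branch-handling method over stalling until the hazard clears.

    Each policy is modeled as base_cpi + branch_freq * (stall cycles per branch).
    """
    cpi_stall = base_cpi + branch_freq * stall_penalty
    cpi_method = base_cpi + branch_freq * method_penalty
    return cpi_stall / cpi_method

# Hypothetical values: base CPI 1, 3 stall cycles when stalling, 1 under the method.
print(branch_speedup(base_cpi=1.0, branch_freq=0.09, stall_penalty=3, method_penalty=1))
```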

For the next set of problems, credit is 2 points per correct latency per row in each latency table. Partial credit will be given if work is organized and labelled clearly.



3.8, 3.9
Using the information in the FU table and the pipeline structure, determine the latencies in the MMM-DLX for the dependent pairs of ALU instructions in the table below. When the symbol ALU/M is used, you may assume that the ALU and ALUM instructions have the same latencies associated with them.



Hint: for each dependent instruction pair, identify which stage of the producer instruction is the "source" of the value that must be "received" by the consumer instruction.
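The hint can be turned into a small formula, assuming a linear pipeline with full forwarding and stages numbered by position (F = 0, ID = 1, and so on). If the producer's value is available at the end of stage p and the consumer needs it at the start of stage c, issuing the consumer k cycles later puts its stage c at cycle k + c, while the value exists from cycle p + 1; so k must be at least p + 1 - c, and the latency (cycles between the two instructions) is p - c, clamped at zero. The stage positions in the example are illustrative, not MMM-DLX answers.

```python
def latency(result_stage: int, need_stage: int) -> int:
    """Cycles that must separate a producer and a dependent consumer.

    result_stage: position of the stage at whose END the producer's value
                  can be forwarded.
    need_stage:   position of the stage at whose START the consumer needs it.
    Assumes a linear pipeline with full forwarding and no other stalls.
    """
    return max(0, result_stage - need_stage)

# Illustration: the classic five-stage DLX with a 4-stage FP adder
# (F=0, ID=1, A1..A4 = 2..5). FP add -> FP add: value ready after A4 (5),
# needed at the start of A1 (2).
print(latency(5, 2))
```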



Producer Instruction   Consumer Instruction   Latency
FP ADD                 FP ALU/M
FP DIV                 FP ALU/M
FP MUL                 FP ALU/M
FP Load                FP ALU/M
INT ALU/M              INT ALU
INT ALU/M              INT ALUM

3.10, 3.11
Determine the latencies associated with the pairs of operations given below. The notation is described in the previous problem's directions. Again, two points per correct row.

Producer Instruction   Consumer Instruction   Latency
INT ALU/M              FP Load
INT ALU/M              FP Store
INT ALU/M              INT Store
FP ALU/M               FP Load
FP ALU/M               FP Store
INT ALU/M              Conditional Branch

3.12
This is the last of the MMM-DLX latency problems. The previous four questions (3.8-3.11) never examined the potential latencies between INT Load or INT ALU/M producers and FP ALU or FP ALUM consumers. So now you get to. Make sure you explain your reasoning clearly.

Just some silly scratch paper.


MM Hugue
2001-11-01