Warning: You MUST assume that the implementation includes separate memories; register writes and reads split across a clock cycle; and any forwarding, bypassing, or load interlocks required to resolve RAW hazards. Be sure to write down any additional assumptions you make.
The most significant change in the MMM-DLX relative to the DLX architectures in H&P is that the MEM stage moves from after the EX stage(s) to before the execution pipelines. Since ALU instructions can now include a memory access, we restrict the addressing modes to register indirect. That is, loads and stores no longer require an effective address computation.
Instead, the register must already hold the exact address. Thus, the instruction (LF F2, (R2)) puts the value stored at the address in R2 into register F2. The instruction (ADD R2, R3, (R1)) adds the value stored at the address in R1 to the value in R3 and puts the result in register R2. We call these new instructions ALUM (for ALU memory) to distinguish them from the standard ALU instructions. The symbol ALU/M refers to both ALU and ALUM instructions collectively.
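The register-indirect semantics of the two example instructions can be sketched in a few lines of Python. This is only an illustration: the register contents, the memory contents, and the helper names `lf` and `add_m` are hypothetical and are not part of the MMM-DLX definition.

```python
# Hypothetical register files and memory; the values are for illustration only.
regs = {"R1": 100, "R2": 200, "R3": 7}
fregs = {"F2": 0.0}
mem = {100: 5, 200: 2.5}


def lf(fd, rs):
    """LF fd, (rs): load the value at the address held in rs into fd."""
    # No effective address computation: the address is exactly regs[rs].
    fregs[fd] = mem[regs[rs]]


def add_m(rd, rs, rm):
    """ADD rd, rs, (rm): an ALUM instruction, rd = rs + memory[rm]."""
    regs[rd] = regs[rs] + mem[regs[rm]]


lf("F2", "R2")            # F2 <- mem[200]
add_m("R2", "R3", "R1")   # R2 <- R3 + mem[100]
```

Note that neither helper adds an offset to the address register, which is why the memory access can be placed ahead of the execution pipelines.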
A sample code fragment appears below:
LW R2, (R1)
LF F0, (R3)
LF F2, (R2)
MULTF F1, F0, F2
MULTF F1, (R4), F1
SF (R3),F1
ADDI R2, R2, #-16
LD F4, (R2)
ADDI R2, R2, #-16
ADDD F6,F4, (R2)
ADDI R2, R2, #-16
SD (R2),F6
SW (R4), R2
LW R3, (R10)
XORI R5, (R3), #-1
SW (R3),R5
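As an aid for reading the fragment, the following sketch lists its RAW (read-after-write) dependences by pairing each register read with the most recent earlier write to that register. It is a rough helper under stated assumptions, not part of the exam: the function names `parse` and `raw_pairs` are hypothetical, stores are assumed to read both of their operands and write no register, and the pairing of double-precision registers (for example F4 with F5) is ignored.

```python
import re

FRAGMENT = [
    "LW R2, (R1)", "LF F0, (R3)", "LF F2, (R2)", "MULTF F1, F0, F2",
    "MULTF F1, (R4), F1", "SF (R3),F1", "ADDI R2, R2, #-16", "LD F4, (R2)",
    "ADDI R2, R2, #-16", "ADDD F6,F4, (R2)", "ADDI R2, R2, #-16",
    "SD (R2),F6", "SW (R4), R2", "LW R3, (R10)", "XORI R5, (R3), #-1",
    "SW (R3),R5",
]

STORES = {"SW", "SF", "SD"}  # assumption: stores write memory only


def parse(line):
    """Return (opcode, destination register or None, list of source registers)."""
    op, rest = line.split(None, 1)
    operands = [o.strip() for o in rest.split(",")]
    # Drop immediates (#...) and strip the parentheses of memory operands.
    regs = [re.sub(r"[()]", "", o) for o in operands if not o.startswith("#")]
    if op in STORES:
        return op, None, regs
    return op, regs[0], regs[1:]


def raw_pairs(lines):
    """Return (producer index, consumer index, register) for each RAW dependence."""
    last_writer = {}
    pairs = []
    for i, line in enumerate(lines):
        _, dst, srcs = parse(line)
        for r in srcs:
            if r in last_writer:
                pairs.append((last_writer[r], i, r))
        if dst is not None:
            last_writer[dst] = i
    return pairs


for producer, consumer, reg in raw_pairs(FRAGMENT):
    print(f"{FRAGMENT[producer]:<20} -> {FRAGMENT[consumer]:<20} via {reg}")
```

The indices are zero-based positions in the fragment. The sketch reports only true dependences; the stall each pair causes depends on the pipeline details that follow.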
***The details of the MMM-DLX appear below.***
F | Instruction Fetch |
ID | Decode instruction, fetch register values and test branch condition |
MEM | Memory access for loads and stores and branch target computed |
LWB/EX | Load write back (1 cycle) and INT and FP ALU Units execute (see FU chart below) |
ALWB | ALU write back |
The term full pipe (fully pipelined) means that each pipeline stage is one clock cycle.
Unit | #Stages or CC's | Ins Freq | |
INT ALU | 2 stage | ||
FP ADD | 8 full pipe stages A1-A8 | ||
FP MULT | 18 full pipe stages M1-M18 | ||
FP DIV | 60 stages |
Ins Type | Ins Freq | # of CPU CC | |
ALU | |||
ALUM | |||
Load | 1 | ||
Store | 1 | ||
Cond Branch | 3 | ||
Jump | 1 |
Note: Problem 3.1 asks you to compute the value of .
Mem Access | Miss Rate | Miss Penalty |
Ins Fetch | 0.05 | XXX |
Data Fetch | 0.10 | XXX |
Reads | XXX | 100 CC |
Writes | XXX | 250 CC |
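One possible way to combine the miss rates and penalties in the table into memory stall cycles per instruction is sketched below. It rests on several assumptions that the table does not state: instruction fetches are treated as reads, the data miss rate applies equally to data reads and data writes, and each instruction makes exactly one instruction fetch. The function name and the access frequencies in the usage line are hypothetical placeholders, since the instruction frequencies are left blank above.

```python
INS_MISS_RATE = 0.05    # from the table: Ins Fetch
DATA_MISS_RATE = 0.10   # from the table: Data Fetch
READ_PENALTY = 100      # CC, from the table: Reads
WRITE_PENALTY = 250     # CC, from the table: Writes


def memory_stall_cycles(data_read_freq, data_write_freq):
    """Average memory stall cycles per instruction under the stated assumptions.

    data_read_freq:  fraction of instructions that read data memory
                     (loads and ALUM instructions).
    data_write_freq: fraction of instructions that write data memory (stores).
    """
    fetch_stalls = INS_MISS_RATE * READ_PENALTY
    read_stalls = data_read_freq * DATA_MISS_RATE * READ_PENALTY
    write_stalls = data_write_freq * DATA_MISS_RATE * WRITE_PENALTY
    return fetch_stalls + read_stalls + write_stalls


# Hypothetical mix: 20% of instructions read data, 10% write data.
stalls = memory_stall_cycles(data_read_freq=0.20, data_write_freq=0.10)
print(f"memory stall cycles per instruction: {stalls:.2f}")
```

Adding this value to the pipeline CPI would give an overall CPI only if memory stalls do not overlap with other stalls, which is a further assumption to write down.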
TAKE A DEEP BREATH. NOW, DO ANY 8 (EIGHT). THERE ARE 12 SIX-POINT PROBLEMS. FIND SOMETHING YOU CAN DO, AND LABEL YOUR WORK CLEARLY FOR PARTIAL CREDIT CONSIDERATION.
For the next set of problems, credit is 2 points per correct latency per row in each latency table. Partial credit will be given if work is organized and labelled clearly.
Hint: for each dependent instruction pair, identify which stage of the producer instruction is the "source" of the value that must be "received" by the consumer instruction.
Producer Instruction | Consumer Instruction | Latency |
FP ADD | FP ALU/M | |
FP DIV | FP ALU/M | |
FP MUL | FP ALU/M | |
FP Load | FP ALU/M | |
INT ALU/M | INT ALU | |
INT ALU/M | INT ALUM |
Producer Instruction | Consumer Instruction | Latency |
INT ALU/M | FP Load | |
INT ALU/M | FP Store | |
INT ALU/M | INT Store | |
FP ALU/M | FP Load | |
FP ALU/M | FP Store | |
INT ALU/M | Conditional Branch |