MMM-DLX Specifications

Warning: You MUST assume that the implementation includes separate memories; register writes and reads split across a clock cycle; and any forwarding, bypassing, or load interlocks required to resolve RAW hazards. Be sure to write down any additional assumptions you make.

The most significant change in the MMM-DLX relative to the DLX architectures in H&P is that the MEM stage moves from after the EX stage(s) to before the execution pipelines. Since ALU instructions can now include a memory access, we restrict memory addressing to register indirect mode. That is, loads and stores no longer require an effective address computation.

Instead, the exact address must already be in the register. Thus, the instruction LF F2, (R2) puts the value stored at the address in R2 into register F2. The instruction ADD R2, R3, (R1) adds the value stored at the address in R1 to the value in R3 and puts the result in register R2. We call these new instructions ALUM (for ALU memory) to distinguish them from the standard ALU instructions. The symbol ALU/M refers to ALU and ALUM instructions collectively.
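The register-indirect semantics described above can be illustrated with a small Python sketch. The dictionaries `regs` and `mem`, the helper names `lf` and `add_m`, and all addresses and values are made up for illustration; none of them are part of the MMM-DLX specification.

```python
# Toy model of MMM-DLX register-indirect operands (illustrative only).
# Register names follow the text; addresses and values are hypothetical.
regs = {"R1": 100, "R2": 200, "R3": 7, "F2": 0.0}
mem = {100: 5, 200: 2.5}

def lf(dest, addr_reg):
    """LF dest, (addr_reg): load the value at the address held in addr_reg."""
    regs[dest] = mem[regs[addr_reg]]

def add_m(dest, src_reg, addr_reg):
    """ADD dest, src_reg, (addr_reg): ALUM add with one memory operand."""
    regs[dest] = regs[src_reg] + mem[regs[addr_reg]]

lf("F2", "R2")           # F2 <- mem[R2]
add_m("R2", "R3", "R1")  # R2 <- R3 + mem[R1]
print(regs["F2"], regs["R2"])
```

Note that neither helper adds an offset to the address register, which mirrors the statement that no effective address computation is needed.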

A sample code fragment appears below:

LW R2, (R1)
LF F0, (R3)
LF F2, (R2)
MULTF F1, F0, F2
MULTF F1, (R4), F1
SF (R3), F1
ADDI R2, R2, #-16
LD F4, (R2)
ADDI R2, R2, #-16
ADDD F6, F4, (R2)
ADDI R2, R2, #-16
SD (R2), F6
SW (R4), R2
LW R3, (R10)
XORI R5, (R3), #-1
SW (R3), R5

***The details of the MMM-DLX appear on the next page.***

PS:

Pipeline Stages

F	Instruction fetch
ID	Decode instruction, fetch register values, and test branch condition
MEM	Memory access for loads and stores; branch target computed
LWB/EX	Load write back (1 cycle); INT and FP ALU units execute (see FU chart below)
ALWB	ALU write back

FU:

EX Stage Functional Units and Instruction Mix

The term full pipe (fully pipelined) means that each pipeline stage is one clock cycle.
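As a rough illustration of what full pipelining buys, the sketch below counts cycles for a batch of independent operations. It assumes that a new operation can enter a fully pipelined unit every clock cycle and, for contrast, that a unit which is not pipelined stays busy until each operation finishes. The function names are hypothetical.

```python
def cycles_fully_pipelined(stages, n_ops):
    """Cycles until the last of n_ops independent operations leaves the unit."""
    if n_ops <= 0:
        return 0
    # The first result appears after `stages` cycles; each later one follows a cycle apart.
    return stages + (n_ops - 1)

def cycles_unpipelined(stages, n_ops):
    """Cycles if the unit must finish one operation before starting the next."""
    return stages * max(n_ops, 0)

# FP ADD has 8 fully pipelined stages (A1-A8) in the FU table.
print(cycles_fully_pipelined(8, 4), cycles_unpipelined(8, 4))
```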

Unit	#Stages or CC's	Ins Freq
INT ALU	2 stages
FP ADD	8 full pipe stages A1-A8
FP MULT	18 full pipe stages M1-M18
FP DIV	60 stages

CPU:

CPU Execution-Time Specific Parameters

Ins Type	Ins Freq	# of CPU CC
ALU		$\beta$
ALUM		$\beta$
Load		1
Store		1
Cond Branch		3
Jump		1

Note: Problem 3.1 asks you to compute the value of $\beta$.

MEM:

Memory Stall Parameters

Mem Access	Miss Rate	Miss Penalty
Ins Fetch	0.05	XXX
Data Fetch	0.10	XXX
Reads	XXX	100 CC
Writes	XXX	250 CC

TAKE A DEEP BREATH. NOW, DO ANY 8 (EIGHT). THERE ARE 12 SIX-POINT PROBLEMS THERE. FIND SOMETHING YOU CAN DO. AND LABEL YOUR WORK CLEARLY FOR PARTIAL CREDIT CONSIDERATION.

3.1

Write an expression for $\beta$, the average number of clock cycles for an ALU or ALUM instruction, assuming the ALU instruction mix shown in the FU Table.
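One way to set up such an average is as a frequency-weighted sum over the functional units. The sketch below is only that general form: it assumes each operation costs the stage count listed in the FU table, and the frequencies are placeholders, since the Ins Freq column is not reproduced here. The name `average_cycles` is hypothetical.

```python
def average_cycles(mix):
    """Frequency-weighted average: sum(freq * cycles) / sum(freq)."""
    total_freq = sum(freq for freq, _ in mix.values())
    return sum(freq * cycles for freq, cycles in mix.values()) / total_freq

# Cycle counts are the stage counts from the FU table; the frequencies are
# PLACEHOLDERS chosen for illustration, not values from the exam.
example_mix = {
    "INT ALU": (0.70, 2),
    "FP ADD": (0.15, 8),
    "FP MULT": (0.10, 18),
    "FP DIV": (0.05, 60),
}
beta = average_cycles(example_mix)
print(round(beta, 2))
```

Dividing by the total frequency keeps the result a proper average even if the supplied frequencies do not sum exactly to one.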

3.2

Write an expression for the AVG CPI on the MMM-DLX assuming a perfect cache, that is, a miss rate (MR) of zero. Your answer should be in terms of $\beta$, the average number of clock cycles for an ALU instruction, and should assume the instruction mix given in the CPU table.

3.3

Write an expression for the AVG CPI on the MMM-DLX taking the cache misses into account. Again, your answer should be in terms of $\beta$. Be sure to consider instruction and data fetches for each instruction type listed in the CPU Table.
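The usual shape of such an expression is the perfect-cache CPI plus the average memory stall cycles per instruction. The sketch below shows one possible reading of the MEM table, under the following assumptions: every instruction fetch is a read, the data miss rate applies to both data reads and data writes, and the read and write penalties apply to the corresponding accesses. The function name and the sample inputs are hypothetical.

```python
IF_MISS_RATE = 0.05    # instruction fetch miss rate, from the MEM table
DATA_MISS_RATE = 0.10  # data fetch miss rate, from the MEM table
READ_PENALTY = 100     # clock cycles per read miss, from the MEM table
WRITE_PENALTY = 250    # clock cycles per write miss, from the MEM table

def cpi_with_stalls(base_cpi, data_read_frac, data_write_frac):
    """Perfect-cache CPI plus average memory stall cycles per instruction.

    data_read_frac and data_write_frac are the fractions of instructions
    that perform a data read or a data write (assumed inputs).
    """
    fetch_stalls = IF_MISS_RATE * READ_PENALTY  # assumes fetches are reads
    data_stalls = DATA_MISS_RATE * (
        data_read_frac * READ_PENALTY + data_write_frac * WRITE_PENALTY
    )
    return base_cpi + fetch_stalls + data_stalls

# Placeholder inputs, not values from the exam.
print(cpi_with_stalls(base_cpi=2.0, data_read_frac=0.3, data_write_frac=0.1))
```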

3.4

Identify any potential structural hazards in the MMM-DLX. If none exist, explain why there are none.

3.5

Assess the potential for WAR, WAW, and RAR hazards in the MMM-DLX. That is, if they occur, give an example. If they don't, explain why they can't occur.

3.6

What type of strategy do you think is used in the MMM-DLX to deal with control hazards? Give an example of a control hazard, and explain your answer for full credit.

3.7

Write an expression for the speedup due to the MMM-DLX control hazard handling method over the method of stalling until the control hazard is cleared.
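With the same instruction count and clock rate, a speedup of this kind reduces to a ratio of CPIs. The sketch below shows that general form only; the base CPI, branch frequency, and per-branch penalties are placeholder values, and the function names are hypothetical.

```python
def cpi_with_branch_penalty(base_cpi, branch_freq, penalty_cycles):
    """Base CPI plus the average control-hazard cycles per instruction."""
    return base_cpi + branch_freq * penalty_cycles

def speedup(cpi_stall, cpi_scheme):
    """Speedup of a scheme over stalling, for equal instruction count and clock."""
    return cpi_stall / cpi_scheme

# Placeholder numbers, not values from the exam.
stall_cpi = cpi_with_branch_penalty(1.0, 0.2, 3)
scheme_cpi = cpi_with_branch_penalty(1.0, 0.2, 1)
print(speedup(stall_cpi, scheme_cpi))
```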

For the next set of problems, credit is 2 points per correct latency per row in each latency table. Partial credit will be given if work is organized and labelled clearly.

3.8, 3.9

Using the information in the FU table and the pipeline structure, determine the latencies in the MMM-DLX for the dependent pairs of ALU instructions in the table below. When the symbol ALU/M is used, you may assume that the ALU and ALUM instructions have the same latencies associated with them.

Hint: for each dependent instruction pair, identify which stage of the producer instruction is the "source" of the value that must be "received" by the consumer instruction.

Producer Instruction	Consumer Instruction	Latency
FP ADD	FP ALU/M
FP DIV	FP ALU/M
FP MUL	FP ALU/M
FP Load	FP ALU/M
INT ALU/M	INT ALU
INT ALU/M	INT ALUM

3.10, 3.11

Determine the latencies associated with the pairs of operations given below. The notation is described in the previous problem's directions. Again, two points per correct row.

Producer Instruction	Consumer Instruction	Latency
INT ALU/M	FP Load
INT ALU/M	FP Store
INT ALU/M	INT Store
FP ALU/M	FP Load
FP ALU/M	FP Store
INT ALU/M	Conditional Branch

3.12

This is the last of the MMM-DLX latency problems. The previous four questions (3.8-3.11) never examined the potential latencies between INT Load producers or INT ALU/M producers and FP ALU and FP ALUM consumers. So, you get to. Make sure you explain your reasoning clearly.

Just some silly scratch paper.

MM Hugue
2001-11-01
