Problem 1:
We are considering carrying out a machine upgrade and we want to compare the performance we would obtain on some program that we frequently run. We are able to run the program on machine A, and conveniently enough, the program takes 100 seconds to run; 60 seconds are spent doing I/O and 40 seconds are spent computing. Possibly because this is a (sample) test question, we cannot run the program on machine B. Machine B has a processor/memory subsystem that is supposed to be 50% faster than machine A and an I/O subsystem that is supposed to be 30% faster than that of machine A. How much time should the program run on machine B?
Solution:
The I/O takes 60 seconds on machine A, and takes 0.7*60 seconds on machine B.
Computation takes 40 seconds on machine A, and takes 0.5*40 seconds on machine B
Speedup : 100 /(0.7*60+0.5*40) = 100/(42+20) = 100/62 = 1.13
Problem 2:
Write code for the following in DLX: (DLX instructions from Figures 2.22 - 2.26 would be passed out with exam).
x += y[a[3]+10];
w = x*x;
Solution:
Note: The base address of array a is stored in R10, the base address of array y is stored in R12. The address of x is stored in R14, and the address of w is stored in R16
LW R1,3(R10) #R1 has a[3]
ADD R2, R1,10 # R2 has a[3]+10
ADD R3, R2,R12 # R3 has base address of y + a[3]+10
LW R4, (R3) # Load contents of y[a[3]+10]
LW R5, (R14)
ADD R5,R5,R4 # Add y[a[3]+10] to x and store the results in R5
MULT R6,R5,R5 # Multiply x*x and put in R6
SW 0(R14),R5 #Store x
SW O(R16),R6 #Store w
Problem 3:
Consider the following instructions:
LD F0, 0(R2)
MULTD F2,F0,F0
LD F4, 0(R3)
MULTD F6,F4,F4
LD F8, 0(R4)
MULTD F10,F8,F8
Assume the pipeline in Figure 3.44 of H&P (the figure would be supplied with exam), assume that instructions consume their operands at the beginning of their first execution cycle, and assume that MULTD has 7 execution stages (i.e. stages for MULTDare IF,ID, M1,M2,M3,M4,M5,M6,M7,MEM,WB).
Show the timing of these instructions with and without forwarding
LD F0, 0(R2) |
IF |
ID |
EX |
ME |
WB |
||||||||||||||
MUL F2,F0,F0 |
IF |
ID |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
|||||||
LD F4, 0(R3) |
IF |
S |
ID |
EX |
ME |
WB |
|||||||||||||
MULF6,F4,F4 |
S |
IF |
ID |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
||||||
LD F8, 0(R4) |
S |
IF |
S |
ID |
EX |
ME |
WB |
||||||||||||
MUL F10,F8,F8 |
S |
S |
IF |
ID |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
LD F0, 0(R2) |
IF |
ID |
EX |
ME |
WB |
|||||||||||||||||
MUL F2,F0,F0 |
IF |
ID |
S |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
|||||||||
LD F4, 0(R3) |
IF |
S |
S |
ID |
EX |
ME |
WB |
|||||||||||||||
MUL F6,F4,F4 |
S |
S |
IF |
ID |
S |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
|||||||
LD F8, 0(R4) |
S |
S |
IF |
S |
S |
ID |
EX |
ME |
WB |
|||||||||||||
MUL F10,F8,F8 |
S |
S |
S |
S |
IF |
ID |
S |
S |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
Reorder the instructions to minimize stalls. How does the reordering affect the timing?
LD F0, 0(R2) |
IF |
ID |
EX |
ME |
WB |
|||||||||||
LD F4, 0(R3) |
IF |
ID |
EX |
ME |
WB |
|||||||||||
LD F8, 0(R4) |
IF |
ID |
EX |
ME |
WB |
|||||||||||
MULTD F2,F0,F0 |
IF |
ID |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
|||||
MULTD F6,F4,F4 |
IF |
ID |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
|||||
MULTD F10,F8,F8 |
IF |
ID |
M1 |
M2 |
M3 |
M4 |
M5 |
M6 |
M7 |
ME |
WB |
Problem 4:
Give two examples from the following code of data dependencies, antidependencies, and output dependencies.
S1: LD F0,0(R1)
S2: ADDD F4,F0,F2
S3: ADDD F6,F0,F4
S4: LD F2,0(R3)
S5: SUBI R3,R3,#4
S6: MULT F6,F12,F2
S7: LD F2,0(R3)
Data: S1 to S2 carried by F0; S2 to S3 carried by F4
Antidependence: S2 to S4 carried by F2; S4 to S5 carried by R3
Output dependence: S3 to S6 carried by F6; S4 to S7 carried by F2
Problem 5:
Consider the following instructions:
S1: ADDD F0,F2,F4
S2: MULTD F2,F6,F8
S3: ADDD F8,F2,F4
I will provide you a description of what happens in Tomasulos algorithm on the first two cycles. Show what happens on the third cycle.
Tomasulos algorithm:
First Cycle: Issue S1
name | name | Op | Vj | Vk | Qj | Qk | Etime | Inst |
mult1 | ||||||||
mult2 | ||||||||
add1 | add | regs[F2] | regs[F4] | S1 | ||||
add2 |
Registers:
F0 | ||||||||
add1 |
Second Cycle: S1 executes, issue S2
name | Op | Vj | Vk | Qj | Qk | Etime | Inst |
mult1 | mult | regs[F6] | regs[F8] | S2 | |||
mult2 | |||||||
add1 | add | regs[F2] | regs[F4] | 1 |
S1 | ||
add2 |
Registers
F0 | F2 | ||||||
add1 | mult1 |
Third Cycle: <FILL THIS IN YOURSELF>
name | Op | Vj | Vk | Qj | Qk | Etime | Inst |
mult1 |
mult |
regs[F6] |
regs[F8] |
|
|
1 |
S2 |
mult2 |
|
|
|
|
|
|
|
add1 |
add |
regs[F2] |
regs[F4] |
|
|
2 |
S1 |
add2 |
add |
|
regs[F4] |
mult1 |
|
|
S3 |
Registers:
F0 | F2 | F8 | |||||
add1 | mult1 | add2 |