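$$
\text{Speedup}_{\text{overall}}
= \frac{\text{Exec. Time}_{\text{old}}}{\text{Exec. Time}_{\text{new}}}
= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
$$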
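The formula above can be evaluated directly. The following is a minimal sketch; the function name `overall_speedup` and the sample inputs are illustrative choices, not part of the original solution.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's law: overall speedup when a fraction of the old
    execution time is accelerated by speedup_enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced_part = 1.0 - fraction_enhanced
    enhanced_part = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced_part + enhanced_part)


if __name__ == "__main__":
    # Hypothetical inputs: half of the time is enhanced, and made 2x faster.
    print(f"{overall_speedup(0.5, 2.0):.4f}")
```

With no enhanced fraction the function returns 1, and with the whole program enhanced it returns the enhanced speedup itself, which matches the two limits of the formula.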
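$\text{Speedup}_{\text{enhanced}} = x$

$$
\begin{aligned}
1.7 &= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{x}} \\
1.7 &= \frac{x}{x - x \cdot \text{Fraction}_{\text{enhanced}} + \text{Fraction}_{\text{enhanced}}} \\
1.7x - 1.7x \cdot \text{Fraction}_{\text{enhanced}} + 1.7 \cdot \text{Fraction}_{\text{enhanced}} &= x \\
-1.7x \cdot \text{Fraction}_{\text{enhanced}} + 1.7 \cdot \text{Fraction}_{\text{enhanced}} &= x - 1.7x \\
-x \cdot \text{Fraction}_{\text{enhanced}} + \text{Fraction}_{\text{enhanced}} &= \frac{x - 1.7x}{1.7} \\
\text{Fraction}_{\text{enhanced}} &= \frac{x - 1.7x}{1.7(1 - x)} = \frac{0.7x}{1.7(x - 1)}
\end{aligned}
$$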
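As a sanity check on the algebra, the closed form for the enhanced fraction can be plugged back into Amdahl's law. This is a minimal sketch; the function name `fraction_enhanced` and the trial value x = 2 are assumptions made for illustration.

```python
def fraction_enhanced(x: float, overall: float = 1.7) -> float:
    """Fraction of execution time that must be enhanced so that an
    enhancement of speedup x yields the given overall speedup.
    Uses the derived form (x - S*x) / (S * (1 - x))."""
    if x <= 1.0:
        raise ValueError("the enhancement must have speedup x > 1")
    return (x - overall * x) / (overall * (1.0 - x))


if __name__ == "__main__":
    x = 2.0  # hypothetical enhanced speedup
    f = fraction_enhanced(x)
    # Substitute back into Amdahl's law; this should recover about 1.7.
    recovered = 1.0 / ((1.0 - f) + f / x)
    print(f"fraction = {f:.4f}, recovered overall speedup = {recovered:.4f}")
```

Note that the result only stays at or below 1 when x is at least 1.7, since the overall speedup can never exceed the speedup of the enhanced part.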
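b.
$\text{Speedup}_{\text{overall}} = 1.25$, $\text{Speedup}_{\text{enhanced}} = x$

$$
\begin{aligned}
1.25 &= \frac{1}{(1 - 0.5) + \dfrac{0.50}{x}} \\
1.25 &= \frac{x}{0.50x + 0.50} \\
x &= \frac{1.25(0.50)}{1 - 1.25(0.50)} \\
x &= 1.667
\end{aligned}
$$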
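The same rearrangement works for any target overall speedup and enhanced fraction. The sketch below is illustrative only; `required_enhanced_speedup` is a hypothetical helper name, and the guard reflects the fact that the overall speedup is bounded by 1 / (1 - fraction).

```python
def required_enhanced_speedup(overall: float, fraction: float) -> float:
    """Solve Amdahl's law for the enhanced speedup x:
    x = S*f / (1 - S*(1 - f))."""
    denominator = 1.0 - overall * (1.0 - fraction)
    if denominator <= 0.0:
        raise ValueError("target overall speedup is unreachable for this fraction")
    return overall * fraction / denominator


if __name__ == "__main__":
    # Values from part b: overall speedup 1.25, half of the time enhanced.
    print(f"{required_enhanced_speedup(1.25, 0.50):.3f}")
```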
Question 2: CPI
Information Given:
Perfect Cache:
Type | Frequency | Number of Clock Cycles |
ALU Operations | 30% | 1 |
Loads | 30% | 2 |
Stores | 20% | 2 |
Branches | 20% | 2 |
Instruction Fetch Miss Rate = 5%
Load/Store Miss Rate = 90%
Miss Penalty = 40 clock cycles
(a) CPI for Each Instruction Type:
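$$
\text{CPI} = \text{CPI}_{\text{perfect}} + (\text{Miss Rate} \times \text{Miss Penalty})
$$

$$
\begin{aligned}
\text{CPI}_{\text{ALU ops}} &= 1 + (0.05 \times 40) = 3 \\
\text{CPI}_{\text{Loads}} &= 2 + [(0.05 + 0.90) \times 40] = 40 \\
\text{CPI}_{\text{Stores}} &= 2 + [(0.05 + 0.90) \times 40] = 40 \\
\text{CPI}_{\text{Branches}} &= 2 + (0.05 \times 40) = 4
\end{aligned}
$$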
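$$
\text{CPI}_{\text{overall}} = \sum_{i=1}^{4} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}_{\text{total}}}
$$

$$
\begin{aligned}
\text{CPI}_{\text{overall}} &= (3 \times 0.30) + (40 \times 0.30) + (40 \times 0.20) + (4 \times 0.20) \\
&= 21.70
\end{aligned}
$$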
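The per-type and overall CPI figures can be reproduced with a few lines of code. This is a minimal sketch using the frequencies, miss rates, and miss penalty given above; the dictionary layout and the names `MIX` and `cpi_with_misses` are assumptions made for illustration.

```python
MISS_PENALTY = 40          # clock cycles
IFETCH_MISS_RATE = 0.05    # applies to every instruction
DATA_MISS_RATE = 0.90      # applies to loads and stores only

# name -> (frequency, perfect-cache CPI, makes a data access?)
MIX = {
    "alu": (0.30, 1, False),
    "load": (0.30, 2, True),
    "store": (0.20, 2, True),
    "branch": (0.20, 2, False),
}


def cpi_with_misses(base_cpi: float, has_data_access: bool) -> float:
    """CPI = CPI_perfect + miss rate * miss penalty."""
    miss_rate = IFETCH_MISS_RATE + (DATA_MISS_RATE if has_data_access else 0.0)
    return base_cpi + miss_rate * MISS_PENALTY


per_type = {name: cpi_with_misses(cpi, mem) for name, (_, cpi, mem) in MIX.items()}
# Weighted sum over the instruction mix.
cpi_overall = sum(freq * per_type[name] for name, (freq, _, _) in MIX.items())

if __name__ == "__main__":
    for name, cpi in per_type.items():
        print(f"{name}: {cpi:.2f}")
    print(f"overall: {cpi_overall:.2f}")
```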
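$$
\text{CPI}_{\text{perfect}} = (1 \times 0.30) + (2 \times 0.30) + (2 \times 0.20) + (2 \times 0.20) = 1.7
$$

$$
\text{Speedup}_{\text{with perfect cache}}
= \frac{\text{Exec. Time}_{\text{overall}}}{\text{Exec. Time}_{\text{perfect}}}
= \frac{\text{IC} \times \text{CPI}_{\text{overall}} \times \text{CCT}}{\text{IC} \times \text{CPI}_{\text{perfect}} \times \text{CCT}}
= \frac{21.70}{1.7} \approx 12.76
$$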
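Because the instruction count and clock cycle time appear in both numerator and denominator, the speedup depends only on the two CPI values. The sketch below illustrates that cancellation; the instruction counts and cycle times passed in are arbitrary placeholders, not values from the problem.

```python
def exec_time(ic: float, cpi: float, cct: float) -> float:
    """Execution time = instruction count * CPI * clock cycle time."""
    return ic * cpi * cct


def speedup_with_perfect_cache(cpi_overall: float, cpi_perfect: float,
                               ic: float = 1e6, cct: float = 1e-9) -> float:
    """Ratio of execution times with the same IC and CCT on both machines."""
    return exec_time(ic, cpi_overall, cct) / exec_time(ic, cpi_perfect, cct)


if __name__ == "__main__":
    # Same CPI values, two different (hypothetical) IC and CCT settings.
    print(f"{speedup_with_perfect_cache(21.7, 1.7):.2f}")
    print(f"{speedup_with_perfect_cache(21.7, 1.7, ic=5e9, cct=2e-9):.2f}")
```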
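$\text{CPI}_{\text{Loads}} = 2$ cycles
$\text{CPI}_{\text{Stores}} = 2$ cycles
$\text{CPI}_{\text{Branches}} = (2 + n)$ cycles

$$
\begin{aligned}
\text{IC}_{\text{ALU ops}} &= 0.30 \\
\text{IC}_{\text{Loads}} &= 0.30 - 0.30(0.20)(1) - 0.30(0.15)(2) = 0.15 \\
\text{IC}_{\text{Stores}} &= 0.20 - 0.30(0.10)(1) = 0.17 \\
\text{IC}_{\text{Branches}} &= 0.20
\end{aligned}
$$

Setting $\text{CPU}_{\text{perfect}} = \text{CPU}_{\text{tradeoff}}$:

$$
\begin{aligned}
1.70 \times \text{IC}_{\text{total}} &= (\text{CPI}_{\text{ALU ops}} \times \text{IC}_{\text{ALU ops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}}) \\
1.70 \times \text{IC}_{\text{total}} &= [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times \{2 + n\})] \times \text{IC}_{\text{total}} \\
n &= 0.9
\end{aligned}
$$

$$
\begin{aligned}
1.7 \times (\text{IC}_{\text{total}} \times \text{Clock Cycle}) &= (\text{CPI}_{\text{ALU ops}} \times \text{IC}_{\text{ALU ops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}}) \\
1.7 \times \text{IC}_{\text{total}} &= [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times 2)] \times \text{IC}_{\text{total}} \times (1 + x) \\
x &= 11.84\%
\end{aligned}
$$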
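Both break-even values follow from the same weighted cycle count of 1.52 cycles per original instruction. The following sketch reproduces the arithmetic; the variable names are illustrative, and the ALU CPI of 1.6 is taken from the equations above rather than from a stated table.

```python
CPI_PERFECT = 1.70  # cycles per instruction on the original machine

# Relative instruction counts after the tradeoff, per original instruction.
ic_alu = 0.30
ic_loads = 0.30 - 0.30 * 0.20 * 1 - 0.30 * 0.15 * 2   # 0.15
ic_stores = 0.20 - 0.30 * 0.10 * 1                     # 0.17
ic_branches = 0.20

# Cycles per original instruction with no extra branch cycles (n = 0).
base_cycles = ic_alu * 1.6 + ic_loads * 2 + ic_stores * 2 + ic_branches * 2

# Break-even extra branch cycles: 1.70 = base_cycles + ic_branches * n
n = (CPI_PERFECT - base_cycles) / ic_branches

# Break-even scaling factor: 1.70 = base_cycles * (1 + x)
x = CPI_PERFECT / base_cycles - 1.0

if __name__ == "__main__":
    print(f"base cycles = {base_cycles:.2f}, n = {n:.2f}, x = {x * 100:.2f}%")
```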
Ideal Machine:
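$$
\begin{aligned}
\text{CPU Execution Time} &= (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle} \\
&= (\text{IC} \times \text{CPI} + 0) \times \text{Clock Cycle} \\
&= \text{IC} \times 2 \times \text{Clock Cycle}
\end{aligned}
$$

$$
\begin{aligned}
\text{Memory Stall Cycles} &= \text{IC} \times (1 \times 0.025 \times 40 + 0.4 \times 0.01 \times 40) \\
&= \text{IC} \times 1.16
\end{aligned}
$$

$$
\begin{aligned}
\text{CPU Execution Time (Cache)} &= (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle} \\
&= (\text{IC} \times [2 + 1.16]) \times \text{Clock Cycle}
\end{aligned}
$$

$$
\text{Performance Ratio} = \frac{\text{CPU Execution Time (Cache)}}{\text{CPU Execution Time}} = \frac{3.16}{2} = 1.58
$$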
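The stall-cycle and performance-ratio calculation can be checked numerically. This is a minimal sketch using the figures above (one instruction fetch per instruction, 0.4 data references per instruction, a 40-cycle penalty); the function and parameter names are assumptions introduced for illustration.

```python
BASE_CPI = 2.0  # CPI of the ideal machine (no memory stalls)


def stall_cycles_per_instruction(ifetch_miss_rate: float,
                                 data_refs_per_instr: float,
                                 data_miss_rate: float,
                                 miss_penalty: float) -> float:
    """Memory stall cycles per instruction: instruction-fetch misses
    (one fetch per instruction) plus data-access misses."""
    ifetch_stalls = 1.0 * ifetch_miss_rate * miss_penalty
    data_stalls = data_refs_per_instr * data_miss_rate * miss_penalty
    return ifetch_stalls + data_stalls


stalls = stall_cycles_per_instruction(0.025, 0.4, 0.01, 40)
# IC and clock cycle cancel, so the ratio is a ratio of effective CPIs.
performance_ratio = (BASE_CPI + stalls) / BASE_CPI

if __name__ == "__main__":
    print(f"stalls per instruction = {stalls:.2f}")
    print(f"performance ratio = {performance_ratio:.2f}")
```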
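Number of Non Loads/Stores: $\text{IC}_{\text{old}} \times 0.6$
Number of Loads/Stores: $\text{IC}_{\text{old}} \times 0.4(1 - 0.3) = 0.28\,\text{IC}_{\text{old}}$
Total $\text{IC}_{\text{new}} = 0.88 \times \text{IC}_{\text{old}}$

$$
\begin{aligned}
\text{Memory Stall Cycles} &= \text{Instruction Misses} + \text{Data Misses} \\
&= [\text{IC}_{\text{new}} \times \text{Instruction Miss Rate} \times \text{Instruction Miss Penalty}] + [\text{Loads/Stores} \times \text{Data Miss Rate} \times \text{Data Miss Penalty}] \\
&= 0.88 \times \text{IC}_{\text{old}} + 0.112 \times \text{IC}_{\text{old}} \\
&= 0.992 \times \text{IC}_{\text{old}}
\end{aligned}
$$

$$
\text{CPU Execution Time (New)} = \text{IC}_{\text{old}} \times 2.8896 \times \text{CC}_{\text{old}}
$$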
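The intermediate steps behind the 2.8896 figure are not shown in the text. The sketch below reproduces it under two labeled assumptions: the base CPI of 2 and the miss rates (2.5% instruction, 1% data, 40-cycle penalty) carry over from the preceding machine, and the new machine's clock cycle is 5% longer than the old one. The 5% stretch is inferred only because 2.752 x 1.05 = 2.8896; it is not stated in the text.

```python
BASE_CPI = 2.0
MISS_PENALTY = 40
IFETCH_MISS_RATE = 0.025   # assumed unchanged from the earlier machine
DATA_MISS_RATE = 0.01      # assumed unchanged from the earlier machine
CLOCK_STRETCH = 1.05       # ASSUMPTION: new clock cycle = 1.05 * old clock cycle

# Instruction counts relative to IC_old.
non_mem = 0.6
mem = 0.4 * (1 - 0.3)      # 30% of loads/stores eliminated
ic_new = non_mem + mem     # 0.88

# Memory stall cycles per old instruction.
stalls = ic_new * IFETCH_MISS_RATE * MISS_PENALTY + mem * DATA_MISS_RATE * MISS_PENALTY

# Total cycles per old instruction, then scaled by the longer clock cycle.
cycles = ic_new * BASE_CPI + stalls
time_new = cycles * CLOCK_STRETCH   # in units of IC_old * CC_old

if __name__ == "__main__":
    print(f"IC_new = {ic_new:.2f}, stalls = {stalls:.3f}, "
          f"cycles = {cycles:.3f}, time = {time_new:.4f}")
```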