Problem 1: True Lies -------------------- A: False: RISC has very few register addressing modes to simplyify the instructions. B: True C: False: The MIPS rating Meas`ns absolutely squat. The overhead on one machine may be totally different than on another. D: False: There is a byte alignment error. 100 is not evenly divisable by 8 E: False: This really depends on the instructions and the hardware. F: True G: False: The RISC Arcitechture is proff that this is false. The RISC Arcitechure uses computer hardware to solve the hazards, thus increasing the CPI. H: False. In actual fact this question is false, since the next instruction on a sequential machine (CISC) waits til the last instruction is finished, and the PC is then waiting. In a RISC machine, however, this is ALWAYS true. I: False: J: True K: False: Latency is the number of clocks from when it enters the FPADD, til the sum is ready. L: False: The length of the clock cycle is determined by the SLOWEST Stage. M: False: DIV is not fully Pipelined, but multiply, add, and subtract are. Problem 2: Feirce Creatures ---------------------------- 2.1: This question is similar to the one on the last Quiz. The following information is used to solve the problem: The clock RATE was DECREASED by 20%. CRnew = .8CRold The CPI is INCREASED by 30% CPInew = 1.3 CPIold The IC is DECREASED by 40% ICnew = .6ICold So the equation is: ICold CPIold .8CRold .8 .8 ------- * --------- * ------- = -------- = ----- = 1.02 .6ICold 1.3CPIold CRold .6 * 1.3 .78 2.2 The question uses percentages of the execution time. Amdahls law uses execution time. Unoptimized Percentages of execution time: ALU = 50% Loads = 20% Other = 30% Optimized Percentages of Execution Time: ALU = 25% Loads = 10% Other = 65% ==> This is the remaining time not used in the other instructions already listed. since this is a percentage of program run-time. Using the picture method of Amdahl's Law, we get the total execution time of the unmodifed process to be "A", and the total execution time of the Modified process to be "B". Also the Unmodified potrion was the portion that did not change. Based on this information, the following is true: x x .65 = --- ; .30 = --- B A The question asks for effective speed up, which is A/B. well above we have enough information to figure out what A and B are. Minor Algebra makes the Above equations trivial: .65B = x ; .30A = x; .65B = .30A; By this we can say: A .65 - = --- This is the effective Speed up. B .30 2.3 This is the Normalization question. The setup is simple: Original program: ALU ops: 40% Loads: 20% Stores: 15% All others: 25% ==> Not given but not all instructions are of the Types above With the updates, 1/4 of the ALU ops do not have loads, so 10% of the loads disappear, 10% of the ALU ops disappear, but 10% of the instructions become ALU-L ops. The rest are unaffected. So the new IC breakdown looks like this: Upgraded Program: ALU ops: 30 ALU-L ops: 10 Loads: 10 Stores: 15 ==> No Changes Other: 25 ==> No Changes This breakdown, however, does not add up to 100%. So it must be normalized. The total of the the above is 90, not 100. So it must be divides out on each of the catagories. The answer to the question, then, is: ALU ops: 30/.9 = 33.3% ALU-L ops: 10/.9 = 11.1% Loads: 10/.9 = 11.1% Stores: 15/.9 = 16.7% Other: 25/.9 = 27.8% ==> Not asked for, but helps to make sure Total is 100% Problem 3: Trading Places: --------------------------- 3.1 This question is looking for the type of architechure in terms of the GPR measure. DLX is (0,3) architechture, and since the only change was in the Loads, and stores, not in the ALU instructions, it too is a (0,3) as well. 3.2 This is coding in the new architechture. The only difference between this and the original DLX is the memory references. So here is one version of the code: ADDI R1, R0, #100 LW R2, (R1) ADDI R1, R1, #4 LW R3, (R1) ADDI R1, R1, #4 LW R4, (R1) ADDI R1, R1, #4 LW R5, (R1) ADD R2, R2, R3 ADD R5, R5, R4 ADDI R1, R0, #200 ADD R2, R2, R5 SW (R1), R2 3.3 Because you need to compute whether or not the branch condition is true or false to determine whether the branch is taken, you either need to wait til the EX2 phase is complete, to use the ALU hardware, or put in new hardware somewhere in the Mem1 phase. putting condition checking hardware in the MEM1 phase would allow use of the PC after the clock pulse of Mem1. If you use the existing hardware the PC would be ready after EX2, or perhaps EX1. 3.4 The hazards are similar to the original DLX model. The potential hazards are as follows: Control Hazards Data Hazards RAW WAW Structural Hazards (Resource Contention) 3.5 CPI will vary in this architechture. Depending on the type of instructions executed. IC will be higher in the DLX-LAX, since you do not have the use of the offset. So instead of using an immediate offset, you must have an additional instruction to initiate a register with the offset. Clock Cycle time will be shorter due to the splitting of the memory stage. Problem 4: Forever Young: ------------------------- 4.1 The problem is looking fo rthe Consumer-Producer pairs. Those pairs are: Pair Code Fragment (Ex2, Mem1) ADD R1, R0, R3 SW (R2), R1 3 stalls neccessary. (Ex2, Ex1) This doesnt happen in this arcitechture since the ALU Writeback is finished in time for the Decode phase. (Mem2, Mem1) LW R2, (R1) SW (R3), R2 One Stall Neccessary, before entering Mem1 stage (Mem2, Ex1) LW R2, (R1) ADD R2, R2, R3 No Stalls Neccessary. 4.2 This question is fairly simple in terms of the hints given. With a sequential machine, the clock cycle time is the total of all the steps, if it is to complete in one clock cycle anyway. The clock cycle of the piped version is the length of the longest cycle, plus 1 ns for overhead. Therefore the clock cycle times for each machine are as follows: CCseq = 44 ns CCpipe = 9 ns (ID phase takes 8 ns, plus 1 for overhead) Next is the execution time. Well the equation for execution time is the same for each machine, and is: EQseq = IC * CCseq * CPIseq EQpipe = IC * CCpipe * CPIpipe In the hint, it says that the CPI for the sequential machine is 1, and in the piped version, the CPI approaches its ideal value, which is also 1. So as the CPI of the piped machine approaches its ideal state, and with the above info, we get teh following: CCseq = 44 ns ==> From above CCpipe = 9 ns ==> From above CPIseq = 1 CPIpipe = 1 ==> it is approaching this value ICseq = ICpipe ==> There is no change in the language, just the machine platform. With this information, and given the rules for effective speed up, the following is the answer to the question: EQseq IC CCseq CPIseq 1 44 1 44 ------ = -- * ------ * ------- = - * -- * - = -- EQpipe IC CCpipe CPIpipe 1 9 1 9Back