a) write assembly pseudocode corresponding to this loop
b) assume that we have a one bit prediction scheme - for which iterations do we mispredict S2?
c) assume that we have a two bit prediction scheme - for which iterations do we mispredict S2?
3) Consider the following loop:
for(i=0; i < N ;i++) {
S1: for(j=0;j<8;j++)
{
x(8*j) = x(8*j) + y(4*j);
} }
Assume that each array element is 4 bytes and that memory addresses are interleaved between banks at the word level (e.g bank 0 gets bytes 0-3, bank 1 gets bytes 4-7, etc).a) Assume that we have 8 independently accessible memory banks. Each bank can produce a datum every 50ns. What is a lower bound on the amount of time required to carry out an iteration of loop S1?
b) How could you rewrite the code to improve this situation? What is your new lower time bound?
4) Assume single instruction issue, but assume that there is a separate floating point add unit which has two stages (A1,A2), a floating point multiply unit with stages (M1,M2,M3), and a pipelined cache which takes 3 cycles to get data from cache but can produce data every cycle - pipelined cache stages (C1,C2,C3).
a) Write a pipeline diagram for this set of instructions
b) Schedule (reorder) the instructions to improve performance (you can also rename registers) and produce a pipeline diagram