Now, we have a branch penalty of 2 clock cycles. This affects neither the
execution profile nor the number of clock cycles of any single iteration of
the loop, because the new treatment of branches introduces no extra stalls
within an iteration. Instead, the stalls appear between iterations: starting
from the second iteration, the fetch of the first instruction is delayed by
2 clock cycles because of the branch penalty. Hence, we get the same table
as in Problem 1 and the same number of clock cycles for one iteration of the
loop. For the entire loop we get
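A minimal sketch of this bookkeeping follows. It assumes the loop runs N times and writes T_1 for the total cycle count of the entire loop found in Problem 1. Neither value is given in this section, so both are hypothetical parameters here. Under this reading, the first iteration is fetched on time, and each later iteration starts 2 cycles later than it did in Problem 1. Because these delays accumulate, the loop finishes 2 cycles later for each of the N - 1 iteration boundaries: T = T_1 + 2(N - 1).

```python
def loop_cycles_with_branch_penalty(t1: int, n_iterations: int, penalty: int = 2) -> int:
    """Total cycles for the loop when each branch back to the loop start
    delays the next iteration's first fetch by `penalty` cycles.

    t1: hypothetical total cycle count of the whole loop without the penalty
        (the Problem 1 result); n_iterations: hypothetical iteration count.
    """
    if n_iterations < 1:
        raise ValueError("the loop must run at least once")
    # The first iteration is fetched on time. Each of the remaining
    # n_iterations - 1 iterations starts `penalty` cycles later, and the
    # delays accumulate because every later iteration is shifted as well.
    return t1 + penalty * (n_iterations - 1)


# Hypothetical numbers, not taken from Problem 1:
print(loop_cycles_with_branch_penalty(t1=100, n_iterations=10))  # 118
```

The per-iteration timing stays the same, so the penalty enters only through the count of iteration boundaries. A single-iteration loop therefore costs exactly T_1.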