Branch prediction buffer: small memory indexed by lower portion of address of branch instruction
memory contains bit which provides branch prediction
fetching begins in the predicted direction
How to predict branch direction?
single bit scheme -- bit specifies whether branch was recently taken
two-bit scheme -- branch must miss twice before prediction is changed. Note finite state diagram for two bit scheme in Figure 4.13
n-bit counter -- counter can take on values from 0 to 2**n-1, when counter is greater or equal to half maximum value, branch is predicted as taken
Why more than one bit?
for(i=0;i<n;i++) {
for(j=0;j<m;j++){
...statements.... }}
For i>0, the branch associated with the j loop will be mispredicted twice per i loop iteration. (when j=0 and j=m)
Two bit scheme - will always predict branch taken. When j=m, prediction will be wrong, but when the next iteration of the i loop rolls around, prediction will be correct.
Implement as a small cache accessed with instruction address during IF pipe stage, or as a pair of bits attached to each block in the instruction cache and fetched with the instruction
Accuracy of branch prediction buffer with two bits per entry - 82-99% on SPEC89
Scheme (used on its own) helps if we know target PC before we know whether branch is taken.
Correlating predictors - use behavior of other branches to make a prediction. For instance:
if(d==0) {d=1;}
if(d==1)
DLX Version:
bnez r1,L1 ; branch b1 (r1!=0)
addi r1,r0,#1 ; set r1 to 1 as r1==0
L1: subi r1,r1,#1; subtract 1 from r1
bnez r1,L2; branch b2 (r1!=0)
.....
L2:
Assume that this code sequence is encountered repeatedly with r1 equal to 2 then 0 then 2 then
0 etc.
r1=2 -- branch b1 taken, branch b2 taken
r1=0 -- branch b1 not taken, branch b2 not taken
Assume that we have one bit predictor initialized to not taken. One bit predictor will always
be wrong. Two bit predictor will also always be wrong.
For each conditional, predict based on the outcome of the previously executed conditional
Assume each branch has a 1 bit preditor that uses one bit of correlation:
Notation - consider loop x: NT/NT means that if the previous loop was not taken, we predict loop x as not taken and if the previous loop was taken, we also predict loop x as not taken
Initial prediction for branch b1 and branch b2 is NT/NT. We assume that the last branch was not taken.
r1 = 2 -- branch b1 taken
Prediction NT:: assume last branch not taken - (state b1: NT/NT )
revised b1 state T/NT
branch b2 taken
Prediction NT:: last branch taken - [state b2: NT/NT}
revised b2 state NT/T
r1 = 0 - branch b2 not taken
Prediction NT :: last branch taken - [state b1:T/NT]
revised b1 state T/NT
branch b2 not taken
Prediction NT:: last branch not taken - [state b2: NT/T]
revised b2 state NT/T
r1 = 2 - branch b1 taken
Prediction T:: last branch not taken [state b1:T/NT]
revised b1 state T/NT
branch b2 taken
Prediction T:: last branch taken [state b2:NT/T]
revised b2 state NT/T
A (1,1) predictor uses the behavior of the last branch to choose from among a pair of one-bit branch predictors. An (m,n) predictor uses the
behavior of the last m branches to choose from 2**m branch predictors, each of which is an n-bit predictor for a single branch.
Global history of last m branches can be recorded in m bit shift register
Branch prediction buffer can be indexed by concatenating low-order bits from the branch address with the m-bit global history.
Branch prediction cache that stores predicted address for the next instruction after a branch is called a branch-target buffer or branch target cache
We are predicting the next instruction address before we find out whether a given instruction is a branch. Thus, we must also find out early whether the instruction is actually a branch. If we have seen a given branch before, the instruction will live at a specific address. (Assume we do not allow self-modifying code). If PC of fetched instruction matches PC in branch-target buffer, predicted PC is used as next PC (see Figure 4.22).
For one bit predictor, just store predicted-taken branches.
For two (or more) bit predictors, use both target buffer and prediction buffer.
Study Figure 4.23
Store target instructions instead of, or in addition to, target address.
branch folding --- substitute the instruction from the branch target buffer in place of the instruction returned from the cache - in other words, switch instructions during IF cycle