The MIPS R4000 Floating-Point Pipeline

The R4000 floating-point (FP) unit has three functional units: an FP Divider, an FP Multiplier, and an FP Adder. Double-precision FP operations take anywhere from 2 cycles (negate) to 112 cycles (square root). The FP unit has eight stages, and there is a single copy of each stage. An instruction may use a stage zero or more times, and different instructions may use the stages in different orders. The stages are as follows:

Stage  Functional Unit (FU)  Description
A      FP Adder              Mantissa Add Stage
D      FP Divider            Divide Pipeline Stage
E      FP Multiplier         Exception Test Stage
M      FP Multiplier         First Stage of Multiplier
N      FP Multiplier         Second Stage of Multiplier
R      FP Adder              Rounding Stage
S      FP Adder              Operand Shift Stage
U                            Unpack FP Numbers

The following table shows common double-precision FP operations, their latencies, initiation intervals, and the R4000 Floating-point pipe stages they use:

FP Instruction  Latency  Initiation Interval  Pipe Stages
Add, Subtract   4        3                    U,S+A,A+R,R+S
Multiply        8        4                    U,E+M,M,M,M,N,N+A,R
Divide          36       35                   U,A,R,D(28),D+A,D+R,D+A,D+R,A,R
Square Root     112      111                  U,E,(A+R)(108),A,R
Negate          2        1                    U,S
Absolute Value  2        1                    U,S
FP Compare      3        2                    U,A,R

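Note: a number in parentheses after a stage gives the number of consecutive cycles spent in that stage. For example, D(28) means the D stage is used 28 times in a row.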
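
The stage notation in the table can be expanded mechanically into one entry per clock cycle, which gives a quick cross-check against the latency column. The following is a minimal Python sketch; the function name `expand_stages` and the `STAGES` dictionary are illustrative choices, not anything defined by the R4000 itself.

```python
import re

def expand_stages(spec):
    """Expand a stage string such as 'U,E,(A+R)(108),A,R' into a list
    holding one set of stage letters per clock cycle."""
    cycles = []
    for item in spec.split(","):
        item = item.strip()
        # An optional trailing '(n)' is a repeat count,
        # e.g. 'D(28)' or '(A+R)(108)'.
        match = re.fullmatch(r"\(?([A-Z+]+)\)?(?:\((\d+)\))?", item)
        if match is None:
            raise ValueError(f"unrecognised stage entry: {item!r}")
        stages = frozenset(match.group(1).split("+"))
        repeat = int(match.group(2)) if match.group(2) else 1
        cycles.extend([stages] * repeat)
    return cycles

# Stage strings copied from three rows of the table above.
STAGES = {
    "add": "U,S+A,A+R,R+S",
    "multiply": "U,E+M,M,M,M,N,N+A,R",
    "square root": "U,E,(A+R)(108),A,R",
}

for name, spec in STAGES.items():
    print(name, len(expand_stages(spec)))
```

For these three rows the expanded length equals the value in the latency column: 4, 8, and 112 cycles.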

From the previous table we can work out which sequences of independent instructions execute without stalling. If two instructions need the same pipe stage in the same clock cycle, the later instruction stalls.

Example 1. The first example of this is an FP Multiply followed by an FP Add:

Operation  Issue/Stall  0    1    2    3    4    5    6    7    8    9    10   11
Multiply   Issue        U    E+M  M    M    M    N    N+A  R
Add        Issue             U    S+A  A+R  R+S
           Issue                  U    S+A  A+R  R+S
           Issue                       U    S+A  A+R  R+S
           Stall                            U    S+A  A+R  R+S
           Stall                                 U    S+A  A+R  R+S
           Issue                                      U    S+A  A+R  R+S
           Issue                                           U    S+A  A+R  R+S

This table shows that a stall occurs when an FP Add is issued 4 or 5 clock cycles after an FP Multiply. In both cases the Add needs the A stage in cycle 6, while the Multiply is in N+A, and the R stage in cycle 7, while the Multiply is in R. Each Add row is a separate alternative issue time for a single Add, not a sequence of Adds.
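
The stall rule can be checked mechanically: slide the Add's stage sequence along the Multiply's and report every cycle in which both need the same stage. Below is a minimal Python sketch with the stage sequences copied from the latency table. `conflicts` is a hypothetical helper, and the model assumes that a clash in any cycle means the later instruction cannot issue at that offset.

```python
MULTIPLY = ["U", "E+M", "M", "M", "M", "N", "N+A", "R"]
ADD = ["U", "S+A", "A+R", "R+S"]

def conflicts(first, second, offset):
    """Return (cycle, stage) pairs where `second`, issued `offset` cycles
    after `first`, needs a stage that `first` uses in the same cycle."""
    clashes = []
    for i, entry in enumerate(second):
        cycle = offset + i
        if cycle >= len(first):
            break  # the first instruction has left the pipeline
        shared = set(entry.split("+")) & set(first[cycle].split("+"))
        clashes.extend((cycle, stage) for stage in sorted(shared))
    return clashes

# Try an Add issued 1 to 7 cycles after the Multiply.
for offset in range(1, 8):
    clashes = conflicts(MULTIPLY, ADD, offset)
    print(offset, "stall" if clashes else "issue", clashes)
```

Under these assumptions the loop reports clashes only for offsets 4 and 5, in both cases on stage A in cycle 6 and stage R in cycle 7.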

Example 2. Now we will see that an FP Add followed by an FP Multiply will not cause any problems in the MIPS R4000 pipeline.

Operation  Issue/Stall  0    1    2    3    4    5    6    7    8    9
Add        Issue        U    S+A  A+R  R+S
Multiply   Issue             U    E+M  M    M    M    N    N+A  R
           Issue                  U    E+M  M    M    M    N    N+A  R

This table shows that no stall occurs. The Add is finished with the shared A and R stages by cycle 3, and a Multiply issued in cycle 1 does not need the A stage until cycle 7.

Example 3. Now for one more example, this time involving FP Divide and FP Add.

Operation  Issue/Stall        25   26   27   28   29   30   31   32   33   34   35   36
Divide     Issued in cycle 0  D    D    D    D    D    D+A  D+R  D+A  D+R  A    R
Add        Issue                   U    S+A  A+R  R+S
           Issue                        U    S+A  A+R  R+S
           Stall                             U    S+A  A+R  R+S
           Stall                                  U    S+A  A+R  R+S
           Stall                                       U    S+A  A+R  R+S
           Stall                                            U    S+A  A+R  R+S
           Stall                                                 U    S+A  A+R  R+S
           Stall                                                      U    S+A  A+R  R+S
           Issue                                                           U    S+A  A+R
           Issue                                                                U    S+A
           Issue                                                                     U

The table omits cycles 0 through 24, during which the Divide is almost entirely in the D stage (it occupies D for 28 cycles). Any Add issued between 28 and 33 cycles after the Divide will stall, because it would need the A or R stage in a cycle when the Divide is using it.
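
The range of stalling issue cycles can be confirmed with a short check. The Python sketch below copies the Divide's stage usage for cycles 25 to 35 directly from the table above instead of deriving it, then tests an Add issued in each of cycles 26 to 36. The names `DIVIDE_TAIL` and `add_stalls` are illustrative.

```python
# Divide stage usage per cycle, as shown in the Example 3 table.
DIVIDE_TAIL = {cycle: {"D"} for cycle in range(25, 30)}
DIVIDE_TAIL.update({
    30: {"D", "A"},
    31: {"D", "R"},
    32: {"D", "A"},
    33: {"D", "R"},
    34: {"A"},
    35: {"R"},
})

# Add stage usage in its four cycles: U, S+A, A+R, R+S.
ADD = [{"U"}, {"S", "A"}, {"A", "R"}, {"R", "S"}]

def add_stalls(issue_cycle):
    """True if an Add issued in `issue_cycle` needs a stage in the same
    cycle as the Divide."""
    return any(
        stages & DIVIDE_TAIL.get(issue_cycle + i, set())
        for i, stages in enumerate(ADD)
    )

stalled = [cycle for cycle in range(26, 37) if add_stalls(cycle)]
print(stalled)  # [28, 29, 30, 31, 32, 33]
```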

The preceding examples showed what happens in the MIPS R4000 pipeline when two FP instructions are issued. With more than two instructions in flight, there are many more opportunities for stalls, and the tables above would be much larger.
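
With more than two instructions, the bookkeeping is easier to automate. The Python sketch below keeps a reservation table of (cycle, stage) pairs and issues each instruction, in program order, at the earliest cycle where none of its stages is already reserved. This is a simplified model resting on assumptions not stated above: at most one instruction issues per cycle, a stalled instruction holds up everything behind it, and the instructions are independent of one another. The names `schedule` and `STAGES` are hypothetical, and the real hardware's stall logic may differ.

```python
# Stage sequences copied from the latency table.
STAGES = {
    "add": ["U", "S+A", "A+R", "R+S"],
    "multiply": ["U", "E+M", "M", "M", "M", "N", "N+A", "R"],
}

def schedule(program):
    """Return a list of (name, issue_cycle) for instructions issued in order."""
    busy = set()   # (cycle, stage) pairs already reserved
    issued = []
    cycle = 0
    for name in program:
        while True:
            needed = {
                (cycle + i, stage)
                for i, entry in enumerate(STAGES[name])
                for stage in entry.split("+")
            }
            if not (needed & busy):
                break
            cycle += 1   # stall for one cycle and try again
        busy |= needed
        issued.append((name, cycle))
        cycle += 1       # the next instruction issues no earlier than the next cycle
    return issued

print(schedule(["multiply", "add", "add"]))
# [('multiply', 0), ('add', 1), ('add', 6)]
```

In this model the second Add first waits for the earlier Add to release the S and A stages, then runs into the Multiply's use of A and R in cycles 6 and 7, so it issues in cycle 6. Two back-to-back Adds issue in cycles 0 and 3, which matches the initiation interval of 3 in the latency table.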