The MIPS R4000 Floating-Point Pipeline
The R4000 floating-point unit has three functional units: an FP divider, an FP multiplier, and an FP adder. Double-precision FP operations take anywhere from 2 cycles (negate) to 112 cycles (square root). The FP unit has eight stages, with a single copy of each stage; an instruction may use a stage zero or more times, and in different orders. The stages are as follows:
Stage | Functional Unit (FU) | Description |
A | FP Adder | Mantissa Add Stage |
D | FP Divider | Divide Pipeline Stage |
E | FP Multiplier | Exception Test Stage |
M | FP Multiplier | First Stage of Multiplier |
N | FP Multiplier | Second Stage of Multiplier |
R | FP Adder | Rounding Stage |
S | FP Adder | Operand Shift Stage |
U | | Unpack FP Numbers |
The following table shows common double-precision FP operations, their latencies, initiation intervals, and the R4000 Floating-point pipe stages they use:
FP Instruction | Latency | Initiation Interval | Pipe Stages |
Add, Subtract | 4 | 3 | U,S+A,A+R,R+S |
Multiply | 8 | 4 | U,E+M,M,M,M,N,N+A,R |
Divide | 36 | 35 | U,A,R,D(28),D+A,D+R,D+A,D+R,A,R |
Square Root | 112 | 111 | U,E,(A+R)(108),A,R |
Negate | 2 | 1 | U,S |
Absolute Value | 2 | 1 | U,S |
FP Compare | 3 | 2 | U,A,R |
Note: a number in parentheses after a stage, such as (28), means that stage is used that many times in a row.
From the table above, we can work out which sequences of independent instructions can execute without stalling. A stall occurs whenever two instructions need the same pipe stage in the same clock cycle.
Example 1. The first example of this is an FP Multiply followed by an FP Add:
Operation | Issue/Stall | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
Multiply | Issue | U | E+M | M | M | M | N | N+A | R | | | | |
Add | Issue | | U | S+A | A+R | R+S | | | | | | | |
Add | Issue | | | U | S+A | A+R | R+S | | | | | | |
Add | Issue | | | | U | S+A | A+R | R+S | | | | | |
Add | Stall | | | | | U | S+A | A+R | R+S | | | | |
Add | Stall | | | | | | U | S+A | A+R | R+S | | | |
Add | Issue | | | | | | | U | S+A | A+R | R+S | | |
Add | Issue | | | | | | | | U | S+A | A+R | R+S | |
This table shows that a stall occurs when an FP Add is issued 4 or 5 clock cycles after an FP Multiply. In both cases the Add needs the A stage in cycle 6 and the R stage in cycle 7, the same cycles in which the Multiply uses them for its N+A and R steps.
Example 2. Now we will see that an FP Add followed by an FP Multiply will not cause any problems in the MIPS R4000 pipeline.
Operation | Issue/Stall | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Add | Issue | U | S+A | A+R | R+S | | | | | | |
Multiply | Issue | | U | E+M | M | M | M | N | N+A | R | |
Multiply | Issue | | | U | E+M | M | M | M | N | N+A | R |
This table shows that stalls do not occur because the Add operation is fairly quick and gets through the shared pipe stages before the Multiply operation needs them.
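Examples 1 and 2 can be checked mechanically by lining up the two stage sequences and looking for a cycle in which both need the same stage. The following is a minimal Python sketch of that check, not a model of the actual R4000 issue logic. The stage sequences are copied from the latency table; the function names `stages`, `conflicts`, and `stall_offsets` are illustrative assumptions.

```python
# Pipe stages used in each cycle after issue, copied from the latency table.
MULTIPLY = ["U", "E+M", "M", "M", "M", "N", "N+A", "R"]
ADD = ["U", "S+A", "A+R", "R+S"]


def stages(pattern):
    """Turn ["U", "S+A"] into [{"U"}, {"S", "A"}]."""
    return [set(cell.split("+")) for cell in pattern]


def conflicts(first, second, offset):
    """List (cycle, shared stages) pairs when `second` is issued
    `offset` cycles after `first` (which is issued in cycle 0)."""
    a, b = stages(first), stages(second)
    hits = []
    for i, used in enumerate(b):
        cycle = offset + i
        if cycle < len(a):  # `first` is still in the pipeline
            shared = a[cycle] & used
            if shared:
                hits.append((cycle, sorted(shared)))
    return hits


def stall_offsets(first, second):
    """Issue offsets (in cycles) at which `second` would have to stall."""
    return [d for d in range(1, len(first)) if conflicts(first, second, d)]


print(stall_offsets(MULTIPLY, ADD))   # [4, 5]  (Example 1)
print(conflicts(MULTIPLY, ADD, 4))    # [(6, ['A']), (7, ['R'])]
print(stall_offsets(ADD, MULTIPLY))   # []      (Example 2)
```

Representing each cycle as a set of stage names keeps the conflict test to a single set intersection, which mirrors how the charts are read by eye: a stall is any column where two rows name the same stage.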
Example 3. Now for one more example, this time involving FP Divide and FP Add.
Operation | Issue/Stall | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 |
Divide | Issued in cycle 0 | D | D | D | D | D | D+A | D+R | D+A | D+R | A | R | |
Add | Issue | | U | S+A | A+R | R+S | | | | | | | |
Add | Issue | | | U | S+A | A+R | R+S | | | | | | |
Add | Stall | | | | U | S+A | A+R | R+S | | | | | |
Add | Stall | | | | | U | S+A | A+R | R+S | | | | |
Add | Stall | | | | | | U | S+A | A+R | R+S | | | |
Add | Stall | | | | | | | U | S+A | A+R | R+S | | |
Add | Stall | | | | | | | | U | S+A | A+R | R+S | |
Add | Stall | | | | | | | | | U | S+A | A+R | R+S |
Add | Issue | | | | | | | | | | U | S+A | A+R |
Add | Issue | | | | | | | | | | | U | S+A |
Add | Issue | | | | | | | | | | | | U |
The chart starts at cycle 25 because the Divide spends most of the earlier cycles repeating the D stage (the D(28) entry in the latency table), which an Add never uses. Any Add issued between 28 and 33 cycles after the Divide will stall, because it would need the A or R stage in a cycle when the Divide is also using it.
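The stall window in Example 3 can be reproduced from the chart alone. The sketch below is illustrative only: it copies the Divide's stage usage for cycles 25 to 35 directly from the Example 3 chart rather than expanding the full Divide sequence, and the names `DIVIDE_TAIL` and `add_conflicts` are assumptions made for this example.

```python
# Stages the Divide (issued in cycle 0) occupies in cycles 25-35,
# copied from the Example 3 chart.
DIVIDE_TAIL = {
    25: "D", 26: "D", 27: "D", 28: "D", 29: "D",
    30: "D+A", 31: "D+R", 32: "D+A", 33: "D+R",
    34: "A", 35: "R",
}
ADD = ["U", "S+A", "A+R", "R+S"]


def add_conflicts(issue_cycle):
    """List (cycle, shared stages) pairs for an Add issued in `issue_cycle`."""
    hits = []
    for i, cell in enumerate(ADD):
        cycle = issue_cycle + i
        if cycle not in DIVIDE_TAIL:
            continue  # the Divide has already left the pipeline
        shared = set(DIVIDE_TAIL[cycle].split("+")) & set(cell.split("+"))
        if shared:
            hits.append((cycle, sorted(shared)))
    return hits


stalled = [c for c in range(26, 37) if add_conflicts(c)]
print(stalled)            # [28, 29, 30, 31, 32, 33]
print(add_conflicts(30))  # [(32, ['A']), (33, ['R'])]
```

An Add issued in cycle 27 just misses the conflict because it reaches its R+S step in cycle 30, when the Divide is using D+A, not R.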
The preceding examples showed what happens in the MIPS R4000 pipeline when two FP instructions are issued. If more than two instructions are issued, the opportunity for stalls is much higher, and the above tables would be much larger.
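For longer sequences, the same bookkeeping can be automated with a reservation table that records which stages are claimed in each cycle. The sketch below is a simplified illustration under stated assumptions, not a description of the real R4000 control logic: instructions are independent, they issue in order with at most one per cycle, and a stalled instruction simply retries one cycle later until all of its stage uses are free. The names `PATTERNS` and `schedule` are hypothetical.

```python
from collections import defaultdict

# Stage sequences copied from the latency table.
PATTERNS = {
    "add": ["U", "S+A", "A+R", "R+S"],
    "multiply": ["U", "E+M", "M", "M", "M", "N", "N+A", "R"],
}


def schedule(ops):
    """Return the issue cycle of each op, issuing in program order."""
    reserved = defaultdict(set)  # cycle -> stages already claimed
    issue_cycles = []
    cycle = 0
    for op in ops:
        cells = [set(c.split("+")) for c in PATTERNS[op]]
        # Assumption: stall one cycle at a time until nothing collides.
        while any(reserved[cycle + i] & used for i, used in enumerate(cells)):
            cycle += 1
        for i, used in enumerate(cells):
            reserved[cycle + i] |= used
        issue_cycles.append(cycle)
        cycle += 1  # the next op can issue no earlier than the next cycle
    return issue_cycles


print(schedule(["add", "add", "add"]))              # [0, 3, 6]
print(schedule(["multiply", "add", "add", "add"]))  # [0, 1, 6, 9]
```

In the second run, the second Add cannot issue in cycles 2 or 3 because of the first Add, nor in cycles 4 or 5 because of the Multiply, so it waits until cycle 6. This shows how conflicts from several in-flight instructions combine.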