Pipeline

The UltraSparc IIi has a 9-stage pipeline.

Instructions complete once they have left the Write stage.

Integer Pipeline (simplified)

Fetch
Decode
Group
Execute
Cache
N1
N2
N3
Write

Floating Point Pipeline (simplified)

Fetch
Decode
Group
Register
X1
X2
X3
N3
Write

The Integer and Floating Point pipelines are kept the same length and run in step with each other, which makes synchronization and exception handling easier. This is a performance tradeoff for the Integer pipeline, because an integer instruction has to pass through extra stages before it completes.
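To make the alignment concrete, here is a small Python sketch that lines the two simplified pipelines up stage by stage. The stage names are copied from the lists above; the helper name shared_stages is made up for this illustration.

```python
# Stage names copied from the two simplified pipeline lists in this document.
INTEGER_PIPELINE = ["Fetch", "Decode", "Group", "Execute", "Cache",
                    "N1", "N2", "N3", "Write"]
FLOAT_PIPELINE = ["Fetch", "Decode", "Group", "Register", "X1",
                  "X2", "X3", "N3", "Write"]


def shared_stages(a, b):
    """Return (index, name) pairs for positions where both pipelines
    are in a stage with the same name."""
    return [(i, x) for i, (x, y) in enumerate(zip(a, b)) if x == y]


if __name__ == "__main__":
    # Print the two pipelines side by side, one row per stage position.
    for i, (int_stage, fp_stage) in enumerate(zip(INTEGER_PIPELINE, FLOAT_PIPELINE)):
        print(f"{i + 1}: {int_stage:<8} | {fp_stage}")
```

Both lists have 9 entries, and they differ only in the four middle positions. The front end (Fetch, Decode, Group) and the back end (N3, Write) are common to both, which is what lets traps be resolved and results written at the same point for either kind of instruction.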

The UltraSparc IIi also employs dynamic branch prediction. It uses a standard 2-bit predictor state machine to predict which way a branch is going to go. It also uses a 2k branch prediction buffer, so almost all of the branches encountered in a program will be predicted.
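A 2-bit predictor can be sketched in a few lines of Python. This is the textbook saturating-counter form, not a description of the actual UltraSparc IIi hardware. The class and function names, the 2048-entry table size (one reading of "2k"), the initial counter value, and the indexing scheme are all assumptions made for the illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0..3).
    Values 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, entries=2048):
        # Assumption: "2k" is read as 2048 entries.
        self.entries = entries
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumption: index with the word address modulo the table size.
        # SPARC instructions are 4 bytes, so the low 2 bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_loop_branch(predictor, pc, iterations):
    """Simulate one loop-closing branch: taken on every iteration
    except the last. Return the number of mispredictions."""
    misses = 0
    for n in range(iterations):
        taken = n < iterations - 1
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

Running the same 10-iteration loop twice through one predictor shows why two bits are used. The single not-taken outcome at the loop exit only moves the counter from strongly taken to weakly taken, so the branch is still predicted taken when the loop is entered again.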

The Fetch stage accesses the Instruction Cache and puts up to 4 instructions into the Instruction Buffer. The branch information is fetched along with each instruction, so if the instructions are branches the CPU already has the information needed to make the prediction. Because the Fetch stage can grab 4 instructions at a time, fetch bandwidth matches the execution bandwidth of the processor.

The Decode stage decodes up to 4 instructions in parallel. These are then sent into the next stage, Grouping.

The Grouping stage tries to dispatch 4 valid instructions per clock cycle. It takes the instructions coming from the Fetch and Decode stages and routes up to 2 of them toward the Integer functional units if they are integer ALU instructions. The other 2 of the 4 instructions can be given to the Floating Point and Graphics Unit (FGU). This stage is also responsible for setting up the bypassing and handling interlock stalls.
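The resource limits described above can be sketched as a simple in-order grouping routine in Python. The function names, the ("name", "unit") tuple format, and the rule of closing a group at the first instruction that does not fit are assumptions made for this illustration. The sketch only knows the two unit classes named in the text and ignores data dependencies, loads, stores and branches, which the real grouping logic also has to handle.

```python
MAX_GROUP = 4                    # up to 4 instructions dispatched per cycle
LIMITS = {"int": 2, "fgu": 2}    # per-cycle limits taken from the text


def form_group(buffer):
    """Take instructions in program order from the front of buffer until
    the group is full or the next instruction's unit has no free slot.
    Each instruction is a (name, unit) tuple, unit being "int" or "fgu"."""
    used = {"int": 0, "fgu": 0}
    group = []
    for name, unit in buffer:
        if len(group) == MAX_GROUP or used[unit] == LIMITS[unit]:
            break  # assumption: in-order issue, so stop at the first conflict
        used[unit] += 1
        group.append(name)
    return group


def dispatch_all(instructions):
    """Split an instruction stream into per-cycle groups."""
    pending = list(instructions)
    groups = []
    while pending:
        group = form_group(pending)
        groups.append(group)
        pending = pending[len(group):]
    return groups
```

For example, a stream of two integer, two FGU, three integer and one FGU instruction is split into groups of 4, 2 and 2: the third consecutive integer instruction cannot join the second group because both integer slots are already taken.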

The Execution stage executes the 2 integer ALU instructions if they are in the current group. Their results are available in the very next stage because of the bypassing that was set up in the Grouping stage. In H&P terms, this means the UltraSparc IIi uses forwarding to reduce the number of stalls from RAW hazards. Virtual addresses for memory operations are also computed in this stage, in parallel with the ALU instructions.
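As a reminder of what a RAW (read after write) hazard looks like, the following Python sketch finds back-to-back dependencies in a short instruction list. These are the cases where the consumer needs a result that has not yet reached the Write stage, and where forwarding supplies it directly. The function name and the (opcode, destination, sources) tuple format are invented for this example.

```python
def raw_hazards(program):
    """Return (producer_index, consumer_index, register) for every
    instruction that reads the register written by the instruction
    immediately before it.

    Each instruction is a tuple (opcode, destination, sources), where
    destination is a register name or None and sources is a tuple of
    register names."""
    hazards = []
    for i in range(1, len(program)):
        _, dest, _ = program[i - 1]
        _, _, sources = program[i]
        if dest is not None and dest in sources:
            hazards.append((i - 1, i, dest))
    return hazards


if __name__ == "__main__":
    example = [
        ("add", "%o2", ("%o0", "%o1")),  # writes %o2
        ("sub", "%o3", ("%o2", "%o1")),  # reads %o2 right away: RAW
        ("or", "%o4", ("%o0", "%o1")),   # independent
    ]
    print(raw_hazards(example))
```

Without forwarding, the sub in this example would have to wait until the add had written %o2 back to the register file. With the bypass set up in the Grouping stage, it can pick the value up as soon as the add leaves the Execution stage.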

The Register stage is for the FGU only. This is where the floating point registers are accessed, unlike in normal DLX, where registers are read in the Decode stage. This stage also sets up bypasses for the instructions going through the FGU.

The Cache Access stage is where a memory operation (a Load or Store) checks whether it is a hit or a miss in the D-Cache (Data Cache). While this is being done, the virtual address is also sent to the MMU to be turned into a physical address. This stage also checks the condition codes, which is where branch prediction is checked. If a branch was mispredicted, the pipeline is flushed and then restarted with the correct instructions.

The X1 stage is the first execution stage for the FGU. Instructions with a latency of 1 are finished once they complete this stage.

The N1 stage is where a load that missed in the D-Cache enters the Load Buffer. If a TLB miss occurs, a software routine is called to do the address translation. A store's physical address enters the Store Buffer, and to decrease stalls the address and data parts of the store are decoupled.

The X2 stage is another execution cycle for the FGU.

The N2 stage is where most instructions finish their execution. Loads continue in the Load Buffer from N1. Dependency checking is done on all loads.

The X3 stage is again just another execution cycle for the FGU.

The N3 stage is where all traps are resolved.

In the Write stage, all results are written back to their respective registers. Execution of the instruction is now complete.