Appendix A
-
A.1 Introduction
-
Pipelining is an implementation whereby multiple instructions are overlapped
in execution. (Like a car assembly line.)
-
In an ideal pipeline, i.e. if the stages are perfectly balanced, then:
time per instruction on an unpipelined machine
Time per instruction in a pipelined machine = ------------------------------------------------------
number of pipe stages
-
The Basics of a RISC Instruction Set
-
Load - move data from memory to a register. (Get it ready to use
by putting it somewhere we can access it quickly.)
-
Store - move data from a register to memory. (We're done using it,
now put it away someplace slower.)
-
MIPS64 has 64 bit instructions
-
32 registers, register 0 always has a data value of 0.
-
Immediate instructions end in the suffix 'I' and unsigned end in 'U'.
-
ALU instructions - they take two registers or a register and a sign extended
16 bit immediate, and store the result in a third register. Ex: DADD
or DSUB.
-
Load and store instructions - takes a register source (called a base register),
and an immediate field (16 bits in MIPS) called the offset, as operands.
The sum, called the effective address, is used as a memory address.
If it's a load, the second register is the destination. If it's a
store, the second register operand is the source of the data that is stored
in memory.
-
Branches and jumps - branches are conditional, jumps are unconditional.
The branch destination is obtained by adding a sign-extending offset (16
bits in MIPS) to the current program counter.
-
A Simple Implementation of a RISC Intruction Set
-
Instruction fetch cycle (IF) - send the program counter (PC) to memory
and fetch the current instruction from memory. Update the PC to the
next sequential PC by adding 4 (since each instruction is 4 bytes) to the
PC.
-
Instruction decode / register fetcn cycle (ID) - decode the instruction
and read the registers corresponding to register source specifies from
the register file. Check for branching. Sign-extend the offset
field of the instruction. Compute the possible branch target address
by adding the sign-extended offset to the incrementd PC. Deocding
is done in parallel with reading registers, which is possible because the
register specifiers are at a fixed location in RISC architecture.
-
Execution / effective address cycle (EX) - the ALU operates on the operands
prepared
in the prior cycle, performing one of three functions, depending on the
instruction type. The three instructions are memory reference, register-register
ALU instruction, register-immediate ALU instruction.
-
Memory Access (MEM) - If the instruction is a load, memory does a read
using the effective address computed in the previous cycle. If it
is a store, then the memory writes the data from the scond register read
from the register file while using the address.
-
Write-back cycle (WB) - register to register ALU instruction or load instruction.
Write the result into the register file, whether it comes from the memory
system (for a load) or from the ALU (for an ALU instruction).
-
The Classic Five Stage Pipeline of a RISC Processor
-
Can't subtract and compute an effective address at the same time.
-
Splitting the IM and DM (Harvard style) eliminates a conflict for a single
memory that would arise between Instruction Fetch and Data memory access.
-
Register file is used in two stages: one for reading in the ID stage and
one for writing the WB stage. Those uses are distinct. Write
happens in the first half of a clock cycle and read in the second half.
-
The Program Counter is incremented by 4 bytes every IF. But branches
don't change the PC until the ID stage, and this is a problem.
-
Although is is critical to ensure that instructions in the pipeline do
not attempt to use the hardware resources at the same time, we must also
ensure that instructions in different stages of the do not interfere
with one another. This seperation is done by introducing pipelin
registers between successive stages of the pipeline, so that at the end
of a clock cycle all the results from a given stage are stored into a register
that is used at the input to the next stage on the next clock cycle.
-
Pipelining increases CPU throughput, but it doesn't reduce the amount of
time to perform an individual instruction. In fact it slightly increases
it due to the control factors of the pipeline.
-
A.2 The Major Hurdle of Pipelining - Pipeline Hazards
-
Hazards
-
Structural hazards - arise from resource conflicts when the hardware cannot
support all possible combinations of instructions simultaneously in overlapped
execution.
-
Data hazards - when an instruction depends on the results of a previous
instruction in a way that is exposed by the overlapping if instructions
in the pipeline.
-
Control hazards - arise from the pipelining of branches and other instructions
that change the program counter.
-
Hazards in pipelines can make it nescessary to stall the pipeline
-
When an instruction is stalled, all instructions issued later than the
stalled instruction, and hence not as far along in the pipeline, are also
stalled.
-
Instructions issued earlier than the stalled instructions, and hence further
along in the pipeline, must continue, since otherwise the hazard will never
clear.
-
This is like in baseball where a runner can't take the next base until
the runner there advances.
-
Performancer of Pipelines with Stalls
-
If we ignore the cycle time overhead of pipelining and assume the stages
are perfectly balanced, then the cycle time of two processors can be equal:
CPI unpipelined
speedup = --------------------------------------------------
1 + pipeline stall cycles per instruction
-
Pipeline depth
speedup = --------------------------------------------------
1 + pipeline stall cycles per instruction
-
Structural Hazards
-
Occur when there is a resource conflict. The pipeline will stall
one of the instructions until the required unit is available.
-
The most common instances of structural hazards arise when some functional
unit is not fully pipelined (like floating point divide). Another
common way is when some resource has not been duplicated enough to allow
all combinations of instructions in the pipeline to execute.
-
Stalls are also called "pipeline bubbles" because they take up space in
the pipeline but carry no useful work.
Alex Baglione @ UMCP
CMSC 411 Summer 2002
Hennessey and Patterson, Computer Architecture: A Quantitative
Approach; Third Edition
Appendix A Notes - for educational agricultural
use only
Web Accessibility