Pipelining

Pipelining is an implementation technique in which multiple instructions are overlapped in execution. The pipeline is divided into stages, and each stage completes one part of an instruction; the stages work in parallel, each on a different instruction. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end. Pipelining does not decrease the execution time of an individual instruction. Instead, it increases instruction throughput, which is determined by how often an instruction exits the pipeline.
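To make the latency/throughput distinction concrete, the sketch below counts the time needed for n instructions on a k-stage machine with and without pipelining. It assumes an ideal pipeline (one clock cycle per stage, no stalls, no pipeline-register overhead); the function names and the 5-stage, 10 ns figures are hypothetical.

```python
def unpipelined_time(n_instructions: int, n_stages: int, cycle_ns: float) -> float:
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages * cycle_ns


def pipelined_time(n_instructions: int, n_stages: int, cycle_ns: float) -> float:
    """The first instruction needs n_stages cycles to fill the pipe; after that,
    one instruction exits the pipeline every cycle."""
    if n_instructions <= 0:
        return 0.0
    return (n_stages + n_instructions - 1) * cycle_ns


if __name__ == "__main__":
    k, t = 5, 10.0  # hypothetical: 5 stages, 10 ns per stage
    for n in (1, 100):
        print(n, unpipelined_time(n, k, t), pipelined_time(n, k, t))
```

A single instruction still needs 50 ns either way, but 100 instructions finish in 1040 ns instead of 5000 ns, because once the pipe is full one instruction exits every cycle.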
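Because the pipe stages are hooked together, all the stages must be ready to proceed at the same time. The time required to move an instruction one step further in the pipeline is called a machine cycle. The length of the machine cycle is determined by the time required for the slowest pipe stage, so the pipeline designer's goal is to balance the length of each pipeline stage. If the stages are perfectly balanced, then, under ideal conditions, the time per instruction on the pipelined machine is equal to the time per instruction on the unpipelined machine divided by the number of pipe stages:

$$\text{Time per instruction on pipelined machine} = \frac{\text{Time per instruction on unpipelined machine}}{\text{Number of pipe stages}}$$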
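The effect of unbalanced stages can be checked with a small calculation. The stage latencies below are made-up numbers, and the unpipelined time per instruction is assumed to be simply the sum of the stage latencies (ignoring latch overhead); the helper names are hypothetical.

```python
from typing import Sequence


def machine_cycle(stage_ns: Sequence[float]) -> float:
    """All stages advance together, so the clock must fit the slowest one."""
    return max(stage_ns)


def balanced_cycle(stage_ns: Sequence[float]) -> float:
    """Cycle time if the same total work were split into perfectly equal stages:
    unpipelined time per instruction / number of pipe stages."""
    return sum(stage_ns) / len(stage_ns)


if __name__ == "__main__":
    stages = [50.0, 50.0, 60.0, 50.0, 50.0]  # hypothetical stage latencies in ns
    unpipelined = sum(stages)                 # 260 ns per instruction without pipelining
    cycle = machine_cycle(stages)
    print(f"cycle = {cycle} ns, ideal = {balanced_cycle(stages)} ns, "
          f"speedup = {unpipelined / cycle:.2f}")
    # prints: cycle = 60.0 ns, ideal = 52.0 ns, speedup = 4.33
```

Because the slowest stage takes 60 ns, the clock cannot run faster than that, so the speedup is 260/60 ≈ 4.33 rather than the 5 that perfectly balanced 52 ns stages would give.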

What is DLX?

DLX is a simple pipelined CPU architecture. It is used mostly in universities as a model for studying the pipelining technique. The DLX architecture was chosen based on observations about the primitives most frequently used in programs. DLX provides a good architectural model for study, not only because of the recent popularity of this type of machine, but also because it is easy to understand. Like most recent load/store machines, DLX emphasizes

Operations

There are four classes of instructions:

An Implementation of DLX

Implementing the instruction set requires the introduction of several temporary registers that are not part of the architecture. Every DLX instruction can be implemented in at most five clock cycles: instruction fetch (IF), instruction decode/register fetch (ID), execution/effective address (EX), memory access (MEM), and write-back (WB). A detailed description of each follows.

Instruction fetch cycle (IF):
IR <- Mem[PC]
NPC <- PC + 4
Operation:
- Send out the PC and fetch the instruction from memory into the instruction register (IR).
- Increment the PC by 4 to form the address of the next sequential instruction.
- The IR holds the instruction that will be needed on subsequent clock cycles.
- The NPC holds the next sequential PC (program counter).
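As a rough illustration, the instruction-fetch step (IR <- Mem[PC], with the next sequential PC held in NPC) can be modeled as a function that reads the instruction word at PC and returns both temporary values. Memory is assumed to be a dict from byte address to 32-bit word, and the names `FetchLatch` and `instruction_fetch` are hypothetical; this is a software sketch, not a hardware description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FetchLatch:
    """Temporary registers written by the fetch cycle (not architecturally visible)."""
    ir: int   # instruction register: the fetched 32-bit instruction word
    npc: int  # next sequential PC


def instruction_fetch(mem: Dict[int, int], pc: int) -> FetchLatch:
    """IR <- Mem[PC]; NPC <- PC + 4.

    Assumes `mem` maps word-aligned byte addresses to 32-bit instruction words.
    """
    if pc % 4 != 0:
        raise ValueError("DLX instructions are 4 bytes long and word-aligned")
    return FetchLatch(ir=mem[pc] & 0xFFFFFFFF, npc=pc + 4)


if __name__ == "__main__":
    # Placeholder words; they are not meant to encode real DLX instructions.
    program = {0: 0x11111111, 4: 0x22222222}
    latch = instruction_fetch(program, 0)
    print(hex(latch.ir), latch.npc)  # prints: 0x11111111 4
```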

Instruction decode/register fetch cycle (ID):
A <- Regs[IR6..10]
B <- Regs[IR11..15]
Imm <- sign-extended IR16..31
Operation:
- Decode the instruction and access the register file to read the registers.
- The outputs of the general-purpose registers are read into two temporary registers (A and B) for use in later clock cycles.
- The lower 16 bits of the IR are also sign-extended and stored in the temporary register Imm for use in the next cycle.
- Decoding is done in parallel with reading the registers, which is possible because these fields are at a fixed location in the DLX instruction format. This technique is known as fixed-field decoding.
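Fixed-field decoding can be sketched as follows. Bit positions use the numbering implied by the register transfers above (bit 0 is the most significant bit, so IR16..31 is the low 16-bit half). The helper names `field`, `sign_extend16`, `decode`, and `DecodeLatch` are hypothetical, and the register file is assumed to be a plain list of 32 integers. A, B, and Imm are produced without consulting the opcode, which is what allows decoding to proceed in parallel with the register reads.

```python
from typing import List, NamedTuple


class DecodeLatch(NamedTuple):
    opcode: int
    a: int    # Regs[IR6..10]
    b: int    # Regs[IR11..15]
    imm: int  # sign-extended IR16..31


def field(ir: int, first: int, last: int) -> int:
    """Return IR[first..last], numbering bits from 0 (MSB) to 31 (LSB)."""
    width = last - first + 1
    return (ir >> (31 - last)) & ((1 << width) - 1)


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(ir: int, regs: List[int]) -> DecodeLatch:
    # Every read uses a fixed bit position, so none of them waits for the
    # opcode; values an instruction does not need are simply ignored later.
    return DecodeLatch(
        opcode=field(ir, 0, 5),
        a=regs[field(ir, 6, 10)],
        b=regs[field(ir, 11, 15)],
        imm=sign_extend16(field(ir, 16, 31)),
    )


if __name__ == "__main__":
    regs = [10 * r for r in range(32)]   # hypothetical register contents
    ir = (3 << 21) | (7 << 16) | 0xFFFC  # IR6..10 = 3, IR11..15 = 7, IR16..31 = -4
    print(decode(ir, regs))  # prints: DecodeLatch(opcode=0, a=30, b=70, imm=-4)
```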

Execution/effective address cycle (EX):
Memory reference:
ALUOutput <- A + Imm
Operation: The ALU adds the operands to form the effective address and places the result into the register ALUOutput.
Branch:
ALUOutput <- NPC + Imm
Cond <- (A op 0)
Operation:
- The ALU adds the NPC to the sign-extended immediate value in Imm to compute the address of the branch target.
- Register A, which was read in the prior cycle, is checked to determine whether the branch is taken.
- The comparison operation op is the relational operator determined by the branch opcode (e.g., op is "==" for the instruction BEQZ).

Memory access cycle (MEM):
The only DLX instructions active in this cycle are loads, stores, and branches.
Memory reference:
LMD <- Mem[ALUOutput] or Mem[ALUOutput] <- B
Operation: A load reads memory at the effective address into the load memory data register (LMD); a store writes the contents of B to memory at that address.

Write-back cycle (WB):
Register-Register ALU instruction: Regs[IR16..20] <- ALUOutput
Register-Immediate ALU instruction: Regs[IR11..15] <- ALUOutput
Load instruction: Regs[IR11..15] <- LMD
Operation: The result, whether it comes from the ALU (ALUOutput) or from memory (LMD), is written into the register file. The destination register field is IR16..20 for register-register ALU instructions and IR11..15 for register-immediate ALU instructions and loads.

The Basic DLX Pipeline

We can pipeline the DLX datapath with almost no changes by starting a new instruction on each clock cycle. Each of the clock cycles of the DLX datapath now becomes a pipe stage: a cycle in the pipeline. While each instruction takes five clock cycles to complete, during each clock cycle the hardware initiates a new instruction and executes some part of five different instructions. The typical way to show what is going on is:
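A space-time diagram of this kind can be generated with a few lines of Python, using the conventional stage names IF (instruction fetch), ID (decode/register fetch), EX (execute/effective address), MEM (memory access), and WB (write-back). The sketch assumes an ideal pipeline with no stalls, so instruction i enters IF in clock cycle i + 1 and advances one stage per cycle; the function names are hypothetical.

```python
from typing import List, Optional

STAGES = ("IF", "ID", "EX", "MEM", "WB")


def stage_in_cycle(instr: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `instr` (0 = first) in clock `cycle` (1-based).

    Assumes an ideal pipeline: instruction `instr` enters IF in cycle instr + 1
    and moves forward one stage per cycle with no stalls.
    """
    s = cycle - 1 - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None


def pipeline_diagram(n_instructions: int, width: int = 5) -> List[str]:
    """Return the rows of the diagram: one row per instruction, one column per cycle."""
    n_cycles = n_instructions + len(STAGES) - 1
    rows = ["Instr".ljust(8) + "".join(str(c).ljust(width) for c in range(1, n_cycles + 1))]
    for i in range(n_instructions):
        label = "i" if i == 0 else f"i+{i}"
        cells = (stage_in_cycle(i, c) or "" for c in range(1, n_cycles + 1))
        rows.append(label.ljust(8) + "".join(cell.ljust(width) for cell in cells))
    return [row.rstrip() for row in rows]


if __name__ == "__main__":
    for row in pipeline_diagram(5):
        print(row)
```

For five instructions the diagram spans nine clock cycles (5 + 5 - 1), and in cycle 5 every stage is busy: instruction i is in WB while instruction i+4 is in IF.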

Let's check again what happens on every clock cycle of the machine and make sure that it does not perform two different operations with the same datapath resource in the same clock cycle. For example, a single ALU cannot compute an effective address and perform a subtract operation at the same time. Fortunately, the simplicity of the DLX instruction set makes resource evaluation relatively easy. The major functional units are used in different cycles, so overlapping the execution of multiple instructions introduces relatively few conflicts.
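This check can be automated by listing the datapath units each stage uses and counting, for every clock cycle, how many in-flight instructions claim each unit. Stage names follow the usual IF/ID/EX/MEM/WB convention. Both resource tables are hypothetical assumptions for illustration: one gives the fetch and memory stages separate instruction and data memories, the other a single shared memory. The MEM stage is pessimistically assumed to use its resource for every instruction, although in DLX only loads and stores access memory there.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Hypothetical resource tables: which datapath unit(s) each stage uses.
# Register-file reads (ID) and writes (WB) are modeled as separate ports.
SPLIT_MEMORY: Dict[str, Set[str]] = {
    "IF": {"IMEM"}, "ID": {"REG_READ"}, "EX": {"ALU"}, "MEM": {"DMEM"}, "WB": {"REG_WRITE"},
}
SHARED_MEMORY: Dict[str, Set[str]] = {
    "IF": {"MEMORY"}, "ID": {"REG_READ"}, "EX": {"ALU"}, "MEM": {"MEMORY"}, "WB": {"REG_WRITE"},
}


def conflicts(resources: Dict[str, Set[str]], n_instructions: int) -> List[Tuple[int, str]]:
    """Return (cycle, unit) pairs where two in-flight instructions claim the same unit.

    Assumes an ideal pipeline (one new instruction per cycle, no stalls) and,
    pessimistically, that every instruction uses its MEM-stage resource.
    """
    found = []
    n_cycles = n_instructions + len(STAGES) - 1
    for cycle in range(1, n_cycles + 1):
        usage = Counter()
        for instr in range(n_instructions):
            s = cycle - 1 - instr  # stage index of this instruction in this cycle
            if 0 <= s < len(STAGES):
                usage.update(resources[STAGES[s]])
        found.extend((cycle, unit) for unit, count in sorted(usage.items()) if count > 1)
    return found


if __name__ == "__main__":
    print(conflicts(SPLIT_MEMORY, 5))   # prints: []
    print(conflicts(SHARED_MEMORY, 5))  # prints: [(4, 'MEMORY'), (5, 'MEMORY')]
```

With the split-memory table no unit is claimed twice in any cycle, and the single ALU is only ever used by the one instruction in EX. The shared-memory table reports conflicts in cycles 4 and 5, where the MEM stage of one instruction coincides with the IF stage of the instruction three behind it.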