The aim of this web page is to provide an introduction to the DLX instruction set, created in Computer Architecture: A Quantitative Approach by Hennessy and Patterson. If you have some programming experience, but only in (relatively) high level languages like C/C++, understanding basic DLX commands and code fragments is well within your realm, despite what you may think after trying to read Hennessy and Patterson's opaque tome. Unfortunately, shining a light through more than a few pages of that monstrosity is beyond the scope of this web page, but if you've found Chapter 2 to present some hard slogging, then herein you have found your Mecca.
What makes up the basic DLX machine?
DLX Commands: Explanations and Examples
Exercises and Questions for Review
Of course, the machine based on the DLX instruction set is a total work of fiction. If one existed, however, it would be a 32 bit machine, i.e., each word would be four bytes. DLX is Big Endian, as opposed to Little Endian, which means that a DLX address first accesses the byte in the most significant position when it's getting a word out of memory. Another issue, byte alignment, is beyond the scope of this web page, but is something you should probably worry about for exam purposes.
The DLX machine is a general-purpose register (GPR) machine, and as such has at it's core a bunch of registers. The ones you really need to worry about are the
integer registers, R0, R1, . . . , R31,
the
GPRs, and the
floating point registers, F0, F1, . . . , F31,
the
FPRs.
Each kind of register holds what you'd expect from the name, with one twist: DLX handles floating point numbers of both "single precision" - 32 bits or four words - and "double precision" -- up to 64 bits, i.e., two words. You know these data types as "doubles" and "floats" from C++. To accommodate double-precision floats, you need to use two consecutive floating point registers, paired together, starting with one that's even-numbered and continuing with one that's odd-numbered. Warning: some operations, notably multiplication and division, can only be performed in the FP registers, even when the operands are integers. Later, we'll see code fragments that move data from integer registers to FP registers, perform the desired operations, and move the data back. (This is a good thing to know how to do because it seems to attract professors looking for exam questions like you-know-what does flies.)
To keep things confusing (and to provide more exam fodder), besides the two FP data types, DLX has three integer data types: 8 bit (1 byte), 16 bit (2 byte, or half-word), and 32 bit (4 byte, or word), respectively. The confusing aspect arises when you load less that a word into a register, because you have to worry about the part of the register that doesn't contain the data you just loaded. For example, say you've loaded a word into a register. That word accounts for 8 bits, but the register holds 32 bits. Arithmetic comprehensible to even the dullest moron tells you that you've got another 24 bits floating around out there somewhere in la-la land to worry about. Fortunately, what you do with these left-over bits isn't too difficult, or surprising: you just fill them with 0s. Later, we'll see examples of commands that load data consisting of less than a word into memory.
H&P tantalizingly tell you that DLX has a few other "special"
registers. These aren't things you'll worry about too much; the
one you really need to worry about is the first integer register,
R0
,
because it's value is always 0. What's the purpose of R0
?
Well, DLX, as you're no doubt aware, is a RISC architecture. (If
you didn't know this, you might as well hang up your cleats
right here and now.) When H&P (p.98) tell you that "we
can use this register to synthesize a variety of useful operations
from a simple instruction set" what they mean is that they're
going to use a few tricks to fake out some of the commands that
DLX doesn't explicitly support. This leads us to our dear friend,
R0
. Most importantly, while there are a ton of different
ways to access memory (that's what all those addressing modes on
page 75 are), DLX only explicitly supports one, displacement.
As we'll see below, a few of the others are effectively emulated
with R0
. We'll also see examples of R0
's
usefulness when we look at jumps and branches (i.e., when
implementing a loop one can use R0
as an easy point
of reference for a counter).
Finally, you should understand that DLX is a (0,3) GPR machine. For exam purposes you undoubtedly need to understand the GPR (m,n) format, and know which machine is which, and why, but for DLX programming purposes all you really need to know is that DLX operations take up to three operands, and that ALU operations and memory accesses cannot be combined. For example,
add R3, R2, 256(R1); R3 = R2 + the contents of the memory
location pointed to by R1+256
would not work in DLX (although it might in a (1,3) GPR machine). Instead,in DLX, you'd need to do the following:
LW R4, 256(R1); load contents of memory location R1+256
into R4
add R3, R2, R4; now we can add: R3 = R2 + R4
As you've probably figured out from the above examples, in DLX, when you're specifying a memory location, you put whatever it is that refers to the memory location in parentheses. And in DLX, as the next section explain, the thing in parentheses will always be a register.
There are three different kinds of objects a CPU may need to access: constants (known in DLX-land as immediates), registers, and memory locations. When H&P are talking about addressing modes, it's easy to forget that they're talking about anything other than a memory location, because we're used to thinking about memory addresses. Thus, it's worth stating the obvious, to wit: that any real computer program will have both constants and will reuse data, and in DLX we have to load data items into registers from memory before we can process them, all of which means that DLX commands address all three kinds of objects. An addressing mode is simply the way in which you describe the object you want to access.
What is an addressing mode will probably become most clear when you
look at some of the examples below, but for starters consider
that, just as you have different modes of address for different
people, an instruction set has different addressing modes
for different kinds of objects. And just as you might address
the same person in a number of different ways, depending on the
context of your address (e.g, "Bill", "Mr.
President", "hey you", and "Mr. Spineless"
can all be used to address the same person) an ISA can use a number
of different modes of address to get at the same object. While
some cruel 411 instructor may make you know all of them for an
exam (see table on page 75), for DLX you need to worry about only
five addressing modes, two of which DLX doesn't actually support,
but can simulate with our friend R0
. DLX's five
addressing modes (with the two indirectly supported modes listed
last) are as follows:
1) Register - this mode you use most of all; it's what you use any time you specify a register. For example, see about any code fragment contained herein, including those in the preceding section, which reference registers. This addressing mode is so basic that H&P don't even bother to mention it as a part of DLX (not that H&P's failure to mention something is a reliable indicator of whether or not it's basic). Remember, any time you access any object, you're using an addressing mode -- just as you are any time you speak to another person.
4) Register deferred - this address mode is allows you
to access a memory address contained in a register. In essence,
it's the displacement mode without the displacement. In DLX, you
fake this mode by simply using 0 as your offset. Example:
LW R17, 0(R23); put contents of address specified by R23
into R17
5) Absolute - this addressing mode, also known as direct
addressing, allows you to specify an address (e.g., (2001)
)
to be accessed. This one is faked out in DLX by using R0
as your register, and whatever address you want to access as your
displacement. Example:
LW R17, 2001(R0); put contents of address 2001 into R17
On page 99 of their monster tome, H&P, with their usual prolixity,
describe the format of DLX instructions in a brief paragraph accompanied
by a small diagram. There are three formats for DLX instructions:
I-type (I is for immediate), R-type (R is
for register), and J-type (J is for jump).
All DLX instructions are 32 bits long, and commence with a 6 bit
opcode. The opcode is simply nothing more than a 6 bit
binary number that represents a particular DLX command. For example
(and I made this up off the top of my head) suppose the opcode
for add
is 110011
. Then the first six
digits of all add
instructions will be 110011
.
Think of a DLX instruction just as you would a programming command in a high-level language. The opcode is like a keyword, and what follows are its arguments. That's what Figure 2.21 on page 99 shows you: what arguments follow the opcode for each type of instruction. For example, the only argument a J-type instruction takes is a 26 bit offset to the PC. This makes sense: when a program branches, what you're doing is skipping to a selected spot in the program, i.e., telling the PC to go next not to the next instruction in the sequence, but the one specified by the offset. Similarly, an I-type instruction takes as arguments two registers, one a target and one an operand, as well as an immediate, the other operand.
Most of the DLX commands you'll actually need to know (to succeed in CMSC 411, at any rate) are actually pretty easy. The syntax for a DLX command is, in general:
<opcode> <target> <source(s)>
Specifically, you'll usually use one of the following:
<command> <operand register> <immediate>
<command> <target register> <operand register>
<operand register>
<move command> <target> <source>
<branch command> <label of place in code to go
to>
That is, you say what you're going to do, specify which register receives the result, and which registers are accessed to get the result. Obviously, a number of different kinds of commands are described above; don't worry if these descriptions don't make sense to you right off the bat. The four different kinds of instructions in DLX are: (1) ALU operations, (2) data transfers, (3) branches and jumps, and (4) floating point operations. The rest of this section considers each of them in turn, after a brief note on notation.
Although it may not be obvious from the immediately preceding section, DLX syntax, at least insofar as you need to worry about it, is actually pretty simple. Anyone who's ever wrestled with a header file defining inherited class with pure virtual functions and all that junk in C++ will find mastering the amount of DLX required for this course to be a breeze. The hard part is understanding H&P's complicated and convoluted hardware notation. My advice to you with respect to the hardware notation is that, even though H&P found it so important they put it on the inside back cover of the book, you should ignore it. If you're taking CMSC 411 from the instructor who commissioned this web page, at least, you can follow this advice without any problems. You might need to struggle through a few examples in the book, since their hardware notation is the only commenting that H&P provide, but apart from that, you're better off spending your time worrying about what the code does as opposed to the difficult-to-decipher descriptive scheme concocted by H&P. This web page follows its own advice and ignores H&P's hardware notation. Everything is commented in English. It may be bulkier, but at least you have a shot at comprehending it.
ALU operations are at the core of most computer programs. For the most part, they consist of simple arithmetic and simple logic.
These are so easy that we include them here only for the sake of completeness. Examples:
ADD R3, R2, R1; R3 = R2 + R1
SUB R3, R2, R1; R3 = R2 - R1
Furthermore, ADDI
, ADDU
, ADDUI
,
SUBI
, SUBU
, and SUBUI
all
work like the above; simply be aware that the "U" means
unsigned and the "I" means immediate, and you should
use them as appropriate for your operands (I know at least one
CMSC 411 professor who will take points off if you don't!).
Multiplication and division work in a fashion similar to add and subtract, save that these operations can only be performed on data contained in floating-point registers. We'll take a look at how to move data from an integer register to a floating point register in the section on data transfers, below.
AND
and OR
work just like the logical
AND
and OR
you're used to from languages
like C++:
AND R3, R2, R1; if R1==R2 R3 gets value 1, else R3 gets
value 0
OR R3, R2, R1; if R2 != 0 or R1 != 0, R3 gets value 1, else
R3 gets value 0
Other basic logical operations include XOR
, ANDI
(and immediate), ORI
(or immediate), and XORI
(xor immediate). They work as you'd expect.
Now for the twists. The first low-level pain in the rear to worry about is load high immediate:
LHI, R19, #429; value 429 goes in upper half of R19, lower
half of R19 gets 16 0s
Remember, immediates are 16 bits. If you want to put an immediate
in the upper half, i.e., the upper two words, of a memory
location, well, then, by gum, LHI
is just what you're
looking for. And don't worry about the rest of the register --
LHI
quite thoughtfully fills the lower two bytes
with 0s.
Another logical operation that will probably seem familiar from
CMSC 311 (remember all those damn shift registers?) is the shift.
A shift command takes what's in a register, shifts it to the left
or right a specified number of places, and puts the result in
the target register, like this:
SLL, R6, R5, R3; shift R5 to the left by the amount specified
in R3; place value in R6
SLLI, R6, R5, #4; shift R5 left 4 places and put result
in R6
Other shift commands are SRL
(shift right logical),
SRA
(shift right arithmetic), SLLI
(shift
left logical immediate), SRLI
(shift right logical
immediate), and SRAI
(shift right arithmetic immediate).
The arithmetic shifts differ from the logical shifts in that they
operate on signed two's-complement numbers to preserve the sign
bit upon the shift. (Actually, while this is probably enough to
know, it's a little more complicated than this. If, like me, you
weren't taught this concept in the prerequisite to CMSC 411, you
might want to check out Chapter 9 of Logic and Computer Design
Fundamentals by Mano and Kime.)
The category of arithmetic/logical operation we need to worry
about are those that set a conditional. DLX allows you
to set the following conditions: LT
(less than),
GT
(greater than), LE
(less than or
equal), GE
(greater than or equal), EQ
(equal) and NE
(not equal). The syntax for setting
a condition is:
<set condition> <target register> <operand
1> <operand 2>
The target register takes on the value of 0 or 1, depending on whether or not comparing the operands meets the established condition. For example:
SGE R1, R15, R28; if R15 >= R28 then R1 = 1 else R1 =
0
SGEI R1, R27, #1089; if R27 >= 1089 then R1 = 1 else
R1 = 0
DLX includes three kinds of data transfers: loads, stores, and moves.
Loads and stores are fairly simple, and are essentially the inverse of one another. For example:
LW R5, 0(R3); load the word in the memory location pointed
to by R3 into R5
SW 0(R3), R5; store the word in R5 in the memory location
pointed to by R3
A complete list of load and store commands for integers is as
follows: LB
(load byte), LBU
(load byte
unsigned), LH
(load half byte), LHU
(load half unsigned), LW
(load word), SB
(store byte), SH
(store half), SW
(store
word). Of course, if you load or store less than a word, that
is, a byte or a half, then you have to worry about what happens
to the rest of the word. Because DLX is Big Endian,
when it works with less than a word it fills the most significant
byte(s) of the word, and puts 0s in the rest of the word. For
example, loading a byte into a register will put that byte into
positions 24-31 of the register, and will put 0s into the register's
positions 0-23. Similarly, storing a half word stores two bytes
of data in positions 16-31 of the designated memory address, and
0s in positions 0-15.
The above commands all act on the integer registers. For floats, you need to know about LF (load float), LD (load double), SF (store float), and SD (store double). These work just like their integer counterparts.
Finally, you should make sure you understand the concept of sign extension. No doubt you recall from CMSC 311 that signed two's-complement numbers use their most significant digit for the sign bit. Sign extension simply means that DLX fills empty bits in a word with the sign bit. For example, as H&P describe on page 101, when you use an immediate in an ALU operation, what you get is a 16 bit sign extended immediate - i.e., the upper two bits of the word are filled with the sign bit of the immediate.
DLX has three categories of moves: (1) MOVI2S
and
MOVS2I
-- moves between integer registers and special
registers, (2) MOVF
and MOVD
-- copies
(not moves) between single precision and double precision FP registers,
and (3) MOVFP2I
and MOVI2FP
-- moves
between FP registers and special registers. Notice that almost
everywhere else in DLX when you see and "I" it stands
for "immediate", but here it means "integer".
The syntax for move commands is:
<move> <target> <source>
It's really that simple. The main issue with moves is remembering to do them when you have data in one register and need to perform an operation that can only be done in another register.
H&P explain (page 80) that they refer to a " jump when the change in control is unconditional and a branch when the change is conditional." Fair enough, you say, but then you wonder "what's a change in control?" Nothing too complicated it turns out; jumps and branches are just like function calls, ifs, and the like in C++. Of course, when you get to pipelining, jumps and branches are a major headache, but from a pure DLX programming standpoint, you shouldn't find them to be too tough.
Ordinarily, in DLX, just like the high level languages it mirrors, instructions are sequentially executed. That is, whenever the CPU gets a new instruction the PC, or program counter, increments by 4 (remember, DLX memory addresses contain four bytes each), and the CPU fetches, decodes, executes, etc. the next instruction in the sequence. A branch mucks up this neat little scheme - just how badly you'll learn when you get to pipelining. For the moment, you need to understand that, when a jump or branch occurs, the program breaks out of its sequential execution, and goes to the instruction named by the branch or jump. In the case of a branch, the sequential execution is broken only if the condition on which the branch is predicated is met. As we explained when we looked at the DLX instruction format, J-type instructions - jumps and branches - have a 26 bit offset which is added to the PC. This offset contains the address of the instruction which should be fetched next in place of the instruction which would otherwise come next in the sequence. From a DLX programming standpoint, this means that whenever you implement a jump or a branch, you include in your command the name of the jump's or the branch's destination. Examples of all the control commands you need to know should make this more clear:
BEQZ, R12, subroutine; if R12 == 0 go to line labeled "subroutine"
BNEZ, R12, subroutine; if R12 != 0 go to line labeled "subroutine"
Branch if equal zero and branch if not equal to zero
are more or less self explanatory. So is the basic jump.
A jump and link is a little more complicated, but not much.
With this command, execution jumps to the line you specify, but
the location of the next instruction in the sequence (PC
+ 4
) is stored in R31
. Similarly, with jump
register and jump and link register, you specify the
register to jump to, and, if you also specify a link, PC + 4 is
again stored in R31
.
J loop; go to line labeled "loop"
JAL loop; go to line labeled "loop" and put PC
+ 4 in R31
JALR R21; jump to instruction whose address is in R21 and
put PC + 4 in R31
JR R15; jump to register whose address is R15.
DLX also has two commands which test the vlaue of the FP status
register. This register is another one of those special registers (like
R0
or the PC
) whose existence we mentioned
earlier. This register, true to its name, reflects the status
of floating point operations. Naturally, we want a way to get
at that valuable information about how the old FP operations are
coming along. That's where BFPT
(branch floating
point true) and BFPF
(branch floating point false)
come in.
DLX has two other control commands: RFE
(return from
exception) and TRAP
(transfer to operating system).
All you need to pretend you know is that they do what you'd think
from hearing their names, and beyond that we'll not go here.
The main fact that should concern you with respect to floating
point operations is their existence. That is, be aware that when
you're dealing with the FP registers, you need to use the commands
that manipulate data in those registers. If you want to add
two floats, use ADDD
(add double) or ADDF
(add single-precision float). Likewise for subtracting, multiplying,
and dividing. And of course, as we've already noted, you must
use the float registers to multiply and divide, even if your
operands are integers. Note also that there are special commands
for comparing floats just as there are for integers (and they
are all analogous). Finally, there are a host of commands to allow
you to convert between floats, doubles, and integers. I never
used them in CMSC 411 and I don't see why you should either.
Here are a few simple sample snippets of DLX code to get you started.
A few more sample snippets can be found in the answers to
the exercises below; these answers aren't commented, so you'll
need to look at the code-writing exercises to
see what these snippets are supposed to do. Actually, you can
probably find all the sample code you need on old exams, but just
in case you can't, here goes:
Here we place 4 and 2 in two integer registers, move them to FP registers so we can multiply them, and then move the result back to an integer register so we can store it as an integer. Note that, since we're creating what's in the integer registers by adding immediates, we could have simply added the immediates to the FP registers. Such an implementation of our multiplication would have created a pedagogical shortfall, however, because we would have lost an opportunity to display our moves.
ADDI R1, R0, #4; R1 now contains 4
ADDI R2, R0, #2; R2 now contains 2
MOVI2FP F1, R1; contents of R1 (the value 4) moved to F1
MOVI2FP F2, R2; contents of R2 (the value 2) moved to F2
MULT F3, F2, F1; F3 now contains 8 (2*4)
MOVFP2I R3, F3; contents of F3 (the value 8) moved to R3
SW 7000(R0), R3; integer 8 stored in memory location 7000
Here's a loop for you. This code fragment checks the elements of a 10 element array for the value 0, stopping when the value is found, or when all 10 elements have been checked. If a zero is found, the code stores a non-zero value (signifying "true"), else it stores a 0 (signifying "false"). Kind of confusing, eh? This truly is a code fragment in the proud spirit of H&P!
ADDI R31, R0, #1 ; R31 = 9
loop: LW R15, 0(R1); put integer in location R1 into R15
BEQZ R15, done; if R15 == 0 we need to exit the loop now
ADDI R1, R1, #4; make R1 point to the next element in the array
SUBI R31, R31, #1; decrement R31 by 1
BNEZ R15, loop; if R15 != 0, we need to check the next value
done: SW 256(R0), R31; R31 == 0 only if we FAILED to find a 0
in the array
Here's another loop. Assume we have a float, X
, in
F1, and a positive integer, I
, in R1. We want to
raise X to the I
th power.
ADDI R2, R0, #1; R2 = 1
MOVI2FP F3, R2; F3 = 1
loop: MULT F3, F3, F1; F3 = F3*F1
SUBI, R1, R1, #1; decrement R1 by 1
BNEZ, R1, loop; if R1 != 0 then we need to continue
SW 3000(R0), F3; store result
The following problems are here more to test your knowledge than to challenge you. The point is that if you get somewhat comfortable with DLX, issues you may be tested on won't present any more difficulty than programming problems presented in any of the languages you've already worked in.
Figure out what's wrong with the following lines of code:
ADDI R34, R5, R0;
LW 0(R12), F23;
SUB 0(R26), F4, R3;
J R11;
MULT R9, R8, R7;
LD F15, 0(R1);
Write a line or two of code that does the following:
R19
is equal to zero,
and branches to a line labeled "loop
" if
it is not.
R16
,
and saves the address of the next instruction (PC+4
)
in R31
.
R1
and R2
and stores the result.
Since there is always more than one way to skin a cat, your answers may be different - but not too different, because the code here is all very short and simple - from the answers below and still right.
Here are a few concepts related to DLX with which you should be conversant. Each term is a link to the place in this web page where we explain it.
Addressing modes (can you name and describe all the kinds DLX uses?)
back to top (no, this isn't a DLX term!)
Here are answers to some of the review questions:
ADDI
means you're adding an immediate, and R0
,
though a special register (remember, it's always 0), is not an
immediate.J
, a plain old jump command takes the name of
the line of code (e.g., "loop
" or
"subroutine1
") as its argument; if you
want to specify an address in a register for the jump target,
use JR
.MULT
only works on data in FPRs.BNEZ R19, loop;
JALR R16;
MOVI2FP F1, R1;
MOVI2FP F2, R2;
MULT F3, F2, F1;
SW 100(R0), F3;
LD F6, 0(R10);
LD F8, 8(R10);
MULTD F10, F6, F8;
SD 16(R10), F10;
ADDI R10, R0, #10;
MOVI2FP F10, R10;
SUBI F9, F10, #1;loop: MULT F10, F10, F9;
SUBI F9, F9, #1;
BNEZ F9, loop;
SW 2000(R0), F10;