We have provided an assembler, asm.c, so that you can assemble programs for your simulator. The asm.c file is fully functional, and you will not need to make any modifications to this file. Simply use the Makefile to make the binary asm from the asm.c source file.
The format for assembly programs is very simple. A valid assembly program is an ASCII file in which each line of the file represents a single instruction, or a data constant. The format for a line of assembly code is:
label<tab>instruction<tab>field0<tab>field1<tab>field2<tab>comments
The leftmost field on a line is the label field, which gives the line a symbolic address. A label is at most 6 characters long and may contain only letters and digits. The label itself is optional, but the tab that follows the label field is required even when the label is omitted. Next comes the instruction field, which holds one of the assembly-language mnemonics listed in Table 1. After another tab comes a series of fields, whose number depends on the instruction. Fields are written as decimal numbers, except where a label is allowed, as described below. The following describes the instructions and how they are specified in assembly code:
lw    rd rs1 imm    Reg[rd] <- Mem[Reg[rs1] + imm]
sw    rd rs1 imm    Reg[rd] -> Mem[Reg[rs1] + imm]
beqz  rd rs1 imm    if (Reg[rs1] == 0) PC <- PC+4+imm
addi  rd rs1 imm    Reg[rd] <- Reg[rs1] + imm
add   rd rs1 rs2    Reg[rd] <- Reg[rs1] + Reg[rs2]
sub   rd rs1 rs2    Reg[rd] <- Reg[rs1] - Reg[rs2]
sll   rd rs1 rs2    Reg[rd] <- Reg[rs1] << Reg[rs2]
srl   rd rs1 rs2    Reg[rd] <- Reg[rs1] >> Reg[rs2]
and   rd rs1 rs2    Reg[rd] <- Reg[rs1] & Reg[rs2]
or    rd rs1 rs2    Reg[rd] <- Reg[rs1] | Reg[rs2]
halt                stop simulation
Note that the beqz instruction uses PC-relative addressing (and again, your simulator should not perform the left-shift when computing the PC-relative branch target). For the lw, sw, and beqz instructions, the imm field can be either a decimal value or a label. When a label is used, the assembler's action depends on the instruction. For lw and sw, the assembler inserts the absolute address corresponding to the label. For beqz, the assembler computes the PC-relative offset to the label, that is, the label's address minus PC+4, so that PC+4+imm lands on the label.
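The two label rules can be sketched in a few lines. This is only an illustration in Python, not the code in asm.c: the function name resolve_imm and the symbols dictionary are hypothetical, and the 16-bit width of the immediate is an assumption read off the sample assembler output later in this section.

```python
def resolve_imm(op, field, pc, symbols):
    """Return the immediate for a field that is a decimal number or a label.

    op      -- mnemonic, e.g. "lw", "sw", "beqz", "addi"
    field   -- the text of the imm field
    pc      -- byte address of the instruction being assembled
    symbols -- hypothetical dict mapping label names to byte addresses
    """
    try:
        value = int(field)              # plain decimal immediate
    except ValueError:
        target = symbols[field]         # label: look up its byte address
        if op == "beqz":
            value = target - (pc + 4)   # PC-relative offset, no left shift
        else:
            value = target              # lw / sw: absolute address
    # Assumption: immediates are stored as 16-bit two's-complement values.
    return value & 0xFFFF
```

With the label addresses from the example program (start = 8, done = 0x24, var1 = 0x2c), the beqz at address 0x18 that targets done gets 0x24 - 0x1c = 8, and the beqz at 0x1c that targets start gets 8 - 0x20 = -24, stored as 0xffe8.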
After the last field is another tab, then any comments. The comments end at the end of the line.
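To make the line format concrete, here is a minimal sketch of how one line could be split into its parts. It is written in Python for brevity and is not taken from asm.c; the name parse_line is hypothetical, and the field counts (none for halt, one for .fill, three for everything else) are read off the instruction list and the .fill description in this section.

```python
def parse_line(line):
    """Split one tab-separated assembly line into (label, op, fields, comment)."""
    parts = line.rstrip("\n").split("\t")
    label = parts[0] or None            # empty first field means "no label"
    op = parts[1] if len(parts) > 1 else ""
    # Assumed operand counts: halt takes none, .fill takes one,
    # every other mnemonic in the instruction list takes three.
    nfields = {"halt": 0, ".fill": 1}.get(op, 3)
    fields = parts[2:2 + nfields]
    # Whatever follows the last field is the comment.
    comment = "\t".join(parts[2 + nfields:])
    return label, op, fields, comment
```

For instance, a line consisting of `start`, `add`, `1`, `1`, `2`, and `decrement reg1` separated by tabs yields the label `start`, the mnemonic `add`, the fields `["1", "1", "2"]`, and the comment `decrement reg1`. A line with no label starts with a tab and yields `None` for the label.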
In addition to instructions, lines of assembly code can also include directives for the assembler. The only directive we will use is .fill. The .fill directive tells the assembler to put a number into the place where the instruction would normally be stored. The .fill directive uses one field, which can be either a numeric value or a symbolic address. For example, ``.fill 32'' puts the value 32 where the instruction would normally be stored. In the following example, ``.fill start'' will store the value 8, because the label ``start'' refers to address 8 (remember that the MIPS architecture uses byte addresses).
	addi	1	0	5	load reg1 with 5
	addi	2	0	-1	load reg2 with -1
start	add	1	1	2	decrement reg1
	lw	3	0	var1	loads reg3 with value stored in var1
	addi	3	3	-1	decrement reg3
	sw	3	0	var1	put reg3 back (thus var1 is decremented)
	beqz	0	1	done	goto done when reg1==0
	beqz	0	0	start	back to start
	add	0	0	0
done	halt
	.fill	start	will contain start address (8)
var1	.fill	32	Declare a variable, initialized to 32
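The reason ``.fill start'' stores 8 is that every line, whether an instruction or a .fill, occupies one 4-byte word, and the program starts at address 0. A first pass that assigns addresses to labels might look like the sketch below. It is in Python rather than C, the names build_symbols and EXAMPLE are hypothetical, and it assumes the input contains no blank lines.

```python
# The example program, with fields separated by tabs (comments omitted).
EXAMPLE = [
    "\taddi\t1\t0\t5",
    "\taddi\t2\t0\t-1",
    "start\tadd\t1\t1\t2",
    "\tlw\t3\t0\tvar1",
    "\taddi\t3\t3\t-1",
    "\tsw\t3\t0\tvar1",
    "\tbeqz\t0\t1\tdone",
    "\tbeqz\t0\t0\tstart",
    "\tadd\t0\t0\t0",
    "done\thalt",
    "\t.fill\tstart",
    "var1\t.fill\t32",
]

def build_symbols(lines):
    """Map each label to the byte address of the line it appears on."""
    symbols = {}
    address = 0                      # programs are loaded at address 0x0
    for line in lines:
        label = line.split("\t")[0]  # text before the first tab, may be empty
        if label:
            symbols[label] = address
        address += 4                 # one 32-bit word per line
    return symbols

symbols = build_symbols(EXAMPLE)
print(symbols)
```

On the example this gives start = 8, done = 36 (0x24), and var1 = 44 (0x2c).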
Try taking the above example and running the assembler on it. Enter
the above assembly code into a file called ``ex.s''. Then type ``asm
ex.s ex.out''. The assembler will generate a file ``ex.out'' which
should contain:
(address 0x0): 20010005
(address 0x4): 2002ffff
(address 0x8): 00220820
(address 0xc): 8c03002c
(address 0x10): 2063ffff
(address 0x14): ac03002c
(address 0x18): 10200008
(address 0x1c): 1000ffe8
(address 0x20): 00000020
(address 0x24): fc000000
(address 0x28): 00000008
(address 0x2c): 00000020

The assembler assumes that all programs will be loaded into memory beginning at address 0x0.
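A simulator has to read this output back into memory. The following is a rough sketch of such a loader, written in Python for illustration only. The name load_image is hypothetical, and the sketch assumes that every entry in the file has exactly the ``(address 0x...): xxxxxxxx'' shape shown above.

```python
import re

# One entry: "(address 0x<hex address>): <8 hex digits>"
ENTRY_RE = re.compile(r"\(address 0x([0-9a-fA-F]+)\):\s*([0-9a-fA-F]{8})")

def load_image(text):
    """Return a dict mapping byte addresses to 32-bit words."""
    memory = {}
    for match in ENTRY_RE.finditer(text):
        address = int(match.group(1), 16)
        word = int(match.group(2), 16)
        memory[address] = word
    return memory

sample = "(address 0x0): 20010005\n(address 0x4): 2002ffff\n"
memory = load_image(sample)
```

Here memory[0] is 0x20010005 and memory[4] is 0x2002ffff. Addresses that do not appear in the file are simply absent from the dictionary; a real simulator would need to decide how to treat them.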