3 a86: a Little Assembly Language
You need to let the little things that would ordinarily bore you suddenly thrill you.
3.1 Overview
x86 is an instruction set architecture (ISA), which is a fancy way of saying a programming language whose interpreter is implemented in hardware. Really, x86 is a family of instruction set architectures, and the first member of the family was 8086, a 16-bit ISA for Intel’s 8086 chipset.
x86 is old. It’s older than the professors teaching this class. But it lives on today in Intel and AMD based computers in the form x86-64, the 64-bit descendant of the 8086 language.
Because it’s old and because the design of each generation of x86 has had significant backwards compatability requirements and because modern processors are sophisticated pieces of machinery, the x86-64 language is, well, complicated. For example, Intel’s x86 software developer’s manual is 5,066 pages long. AMD’s manual is 3,242 pages.
x86-64 is going to be used as the target language of our compiler. We’ll build up a translation from a very high-level language, based on Racket, down to the very low-level langauge of x86-64.
However, that doesn’t mean we need to digest 8K+ pages of dense technical manuals. We will only use very small fraction of the x86-64 instruction set.
To make our lives easier, we will do what programming language designers often do, we will abstract the behemoth of x86-64 to a small, core language (which we call a86). Our compilers will target a86 and compiling from a86 to x86 as the last step in the compiler pipeline will be dead simple.
This chapter describes the a86 language.
3.2 Giving x86 a try
Before describing a86, let’s take a brief look at x86.
There are a few different human-readable formats for writing x86 assembly code, but we’ll be using the one supported by the Netwide Assembler (NASM).
Here’s an example x86 program, written using nasm syntax. The program has one global label called entry, which will be the main entry point for the program. This program computes the 36th triangular number, which will reside in register rax when the code returns.
The conventions for label names differ between Mac and Linux systems. On MacOS, you need to prefix all label names with an underscore, while on Linux you do not. So on a Mac, you would use the names _entry, _tri, and _done, while on Linux you would use entry, tri, and done.
This example is shown using the MacOS naming convention, because that’s what operating system was used when this web page was built.
default relsection .text global _entry _entry: mov rbx, 36 ; the "input" ;;; tri: a recursive function for computing nth ;;; triangular number, where n is given in rbx. _tri: cmp rbx, 0 ; if rbx = 0, done je _done push rbx ; save rbx sub rbx, 1 call _tri ; compute tri(rbx-1) in rax pop rbx ; restore rbx add rax, rbx ; result is rbx+tri(rbx-1) ret _done: ; jump here for base case mov rax, 0 ; return 0 ret
The nth triangular number is the sum of the integers from 0 to n, so the 36th triangular number is 0 + 1 + 2 + 3 + ... + 34 + 35 + 36.
This code is not intended to be a model of efficient computation, rather it demonstrates some important x86 instructions and shows how to compute in a recursive style, even at a low-level.
Without getting too bogged down in the details, here how the code works. Instructions execute one after another. There are a number of registers which can be used to hold values. This code makes use of the rax and rbx register (and some other registers are implicitly used and altered by the call, push, pop and ret instructions). The lines like entry:, tri:, and done: are not instructions, but labels – they are names for locations in the source code and can be used as the targets of jumps and calls.
Suppose we start executing at entry.
mov rbx, 36 sets the rbx register to 36.
cmp rbx 0 compares the value in register rbx to zero. Executing this instruction sets flags in the CPU, which affect subsequent “conditional” instructions. In this program, the next instruction is a conditional jump.
je done either jumps to the instruction following label done or proceeds to the next instruction, based on the state of the comparison flags. The je instruction jumps if the comparison was equal, so control jumps to done if rbx was 0 in this program. If not, the next instruction is executed.
push rbx uses memory as a stack to save the value of rbx. Under the hood this is modifying a register that holds a pointer to the stack memory location (register rsp).
sub rbx, 1 decrements rbx by 1.
call tri performs something like a function call; it uses memory as a stack to save the current location in the code (which is where control should return to after the function has completed). After saving this return pointer, it jumps to the label tri. There aren’t really functions, but this uses the stack to mimic the call-and-return mechanism of functions.
pop rbx uses the stack memory to pop off the top element and move it into rbx, adjusting the stack pointer appropriately. This has the effect of restoring rbx to the value saved earlier by the push, i.e. before the decrement and any changes done in the call to tri.
add rax, rbx updates rax to hold rax plus rbx.
ret does a “return,” i.e. it pops an address from the stack and jumps to it. In this case, the jump either returns from to a previous call to tri or to original caller of entry.
mov rax, 0 this instruction is only reached from the earlier conditional jump. It sets rax to 0. This program computes its result in rax so this is saying that when rbx (the “input”) is 0, then (the “output”) is 0.
ret does a “return,” either to a prior call to tri or the caller of entry.
Despite the lower-level mechanisms, this code computes in a way similar to a non-tail recursive definition of the tri function written in Racket:
(define (tri n) (if (= n 0) 0 (+ n (tri (sub1 n))))) (tri 36)
As an exercise to check your understanding, try writing a tail-recursive version of tri and the corresponding x86 code, which should not need to push anything to the stack or use the call instruction.
We can compile the tri.s assembly program to an object file with nasm:
The format argument should be macho64 on Mac OS and elf64 on Linux.
shell
> nasm -f macho64 -o tri.o tri.s
To run the object file, we will need to link with a small C program that can call the entry label of our assembly code and then print the result:
#include <stdio.h> #include <inttypes.h> int64_t entry(); int main(int argc, char** argv) { int64_t result = entry(); ("%" PRId64 "\n", result); printfreturn 0; }
Notice that from the C program’s perspective, the assembly code defines what looks like a C function called entry that returns an int64_t result.
How does this work? When the C program calls entry it places a return pointer on the stack and jumps to entry. The fact that we decided to put the result in register rax was not arbitrary – that’s the register that by convention is used for a return value. When the assembly code executes it’s final ret instruction, it jumps back to C with the 36th triangular number in rax, which the C side knows is the return value. This convention is part of a larger set of conventions known as the Application Binary Interface. For a reference, see the Texts section of the notes.
We can compile the main.c C program to an object file with gcc:
shell
> gcc -c main.c -o main.o
Now we can make an executable by linking the two together:
shell
> gcc main.o tri.o -o tri ld: warning: no platform load command found in '/Users/dvanhorn/git/www/langs/a86/tri.o', assuming: macOS
Finally, we can run the executable:
shell
> ./tri
There, of course, is a lot more to x86-64 than what’s been shown here. If you want to dig deeper, check the references in Texts. But now let’s turn to a86.
3.3 a86: Representing x86 Code as Data
Here we will employ one of the great ideas in computer science: we will represent programs as data. Rather than toil away at the level of x86, writing programs directly in nasm syntax, compiling, and running them, we will instead design a data type definition for representing x86 programs and compute programs.
Our representation will not be complete – this is going to help us simplify x86 down to something manageable. We won’t cover every instruction and we won’t cover every variation of the instructions we do cover.
An a86 program is a list of a86 instructions. Each instruction is represented as a structure, described in the following section.
Before working through these examples, you’ll need to install the a86 module, part of the langs package for this course. See The langs package for details on installing.
Here’s the triangular number example:
Examples
; import the a86 library > (require a86) ; a86 code that computes the 36th triangular number
> (define tri-36 (list (Global 'entry) (Label 'entry) (Mov 'rbx 36) (% "the \"input\"") (%%% "tri: a recursive function for computing nth") (%%% "triangular number, where n is given in rbx.") (Label 'tri) (Cmp 'rbx 0) (% "if rbx = 0, done") (Je 'done) (Push 'rbx) (% "save rbx") (Sub 'rbx 1) (Call 'tri) (% "compute tri(rbx-1) in rax") (Pop 'rbx) (% "restore rbx") (Add 'rax 'rbx) (% "result is rbx+tri(rbx-1)") (Ret) (Label 'done) (% "jump here for base case") (Mov 'rax 0) (% "return 0") (Ret)))
And who doesn’t love more parentheses?
So for example, let’s say you have two a86 programs and you want to glue them together into one: well that just append. Suppose you want to compute which registers are used in a given a86 program? Suppose you want to replace uses of 'rax with 'rdi? It just a matter of writing the right Racket function to do it.
Here’s another immediate benefit. Instead of writing a single x86 program, let’s write an infinite set of a86 programs, one for computing each nth triangular number. Easy-peasy:
Examples
; Natural -> a86 ; Computes a86 code that computes the nth triangular number
> (define (tri n) (list (Global 'entry) (Label 'entry) (Mov 'rbx n) (% "the \"input\"") (%%% "tri: a recursive function for computing nth") (%%% "triangular number, where n is given in rbx.") (Label 'tri) (Cmp 'rbx 0) (% "if rbx = 0, done") (Je 'done) (Push 'rbx) (% "save rbx") (Sub 'rbx 1) (Call 'tri) (% "compute tri(rbx-1) in rax") (Pop 'rbx) (% "restore rbx") (Add 'rax 'rbx) (% "result is rbx+tri(rbx-1)") (Ret) (Label 'done) (Mov 'rax 0) (Ret))) ; recreate original program > (define tri-36 (tri 36))
It’s also easy to go from our data representation to its interpretation as an x86 program.
There is a function provided for printing an a86 program as an x86 program using nasm notation, called asm-display. Calling this function prints to the current output port, but it’s also possible to write the output to a file or convert it to a string.
The asm-display function knows what OS you are using and adjusts the label naming convention to use underscores or not, so that you don’t have to worry about it.
Examples
> (asm-display (tri 36))
default rel
section .text
global _entry
_entry:
mov rbx, 36 ; the "input"
;;; tri: a recursive function for computing nth
;;; triangular number, where n is given in rbx.
_tri:
cmp rbx, 0 ; if rbx = 0, done
je _done
push rbx ; save rbx
sub rbx, 1
call _tri ; compute tri(rbx-1) in rax
pop rbx ; restore rbx
add rax, rbx ; result is rbx+tri(rbx-1)
ret
_done:
mov rax, 0
ret
Notice how this generates exactly what you saw in tri.s.
From here, we can assemble, link, and execute.
We can also, since we have a general purpose programming language at our disposal in the meta-language, write a program to do all that for us:
Examples
> (asm-interp (tri 36)) 666
The asm-interp function consumes an a86 program as input and produces the integer result the program computes, i.e. it is an Interpreter for a86. Behind the scenes it does this by converting to nasm, assemblying, compiling a thin C wrapper, executing the program, and reading the result. This will be a rather handy tool both in interactively exploring the a86 language (you can write assembly in a REPL), but also an important tool when it comes time to test the compilers we write.
3.4 Stacks: pushing, popping, calling, returning
The a86 execution model includes access to memory that can be used as a stack data structure. There are operations that manipulate the stack, such as Push, Pop, Call, and Ret, and the stack register pointer 'rsp is dedicated to the stack. Stack memory is allocated in “low” address space and grows downward. So pushing an element on to the stack decrements 'rsp.
The stack is useful as a way to save away values that may be needed later. For example, let’s say you have two (assembly-level) functions and you want to produce the sum of their results. By convention, functions return their result in 'rax, so doing something like this won’t work:
(seq (Call 'f) (Call 'g) (Add 'rax ...))
The problem is the return value of 'f gets clobbered by 'g. You might be tempted to fix the problem by moving the result to another register:
(seq (Call 'f) (Mov 'rbx 'rax) (Call 'g) (Add 'rax 'rbx))
This works only so long as 'g doesn’t clobber 'rbx. In general, it might not be possible to avoid that situation. So the solution is to use the stack to save the return value of 'f while the call to 'g proceeds:
(seq (Call 'f) (Push 'rax) (Call 'g) (Pop 'rbx) (Add 'rax 'rbx))
This code pushes the value in 'rax on to the stack and then pops it off and into 'rbx after 'g returns. Everything works out so long as 'g maintains a stack-discipline, i.e. the stack should be in the same state when 'g returns as when it was called.
We can make a complete example to confirm that this works as expected. First let’s set up a little function for letting us try out examples:
Examples
> (define (eg asm) (asm-interp (prog (Global 'entry) (Label 'entry) asm ; the example code we want to try out (Ret) (Label 'f) ; calling 'f returns 36 (Mov 'rax 36) (Ret) (Label 'g) ; calling 'g returns 6, but (Mov 'rbx 4) ; it clobbers 'rbx just for the lulz (Mov 'rax 6) (Ret))))
Now let’s try it, using the stack to confirm it does the right thing:
Examples
> (eg (seq (Call 'f) (Push 'rax) (Call 'g) (Pop 'rbx) (Add 'rax 'rbx))) 42
Compare that with the first version that used a register to save the result of 'f:
Examples
> (eg (seq (Call 'f) (Mov 'rbx 'rax) (Call 'g) (Add 'rax 'rbx))) 10
The Push and Pop instructions offer a useful illusion, but of course, there’s not really any data structure abstraction here; there’s just raw memory and registers. But so long as code abides by conventions, the illusion turns out to be the true state of affairs.
What’s really going on under the hood of Push and Pop is that the 'rsp register is decremented and the value is written to the memory location pointed to by the value of 'rsp.
The following code is mostly equivalent to what we wrote above (and we will discuss the difference in the next section):
Examples
> (eg (seq (Call 'f) (Sub 'rsp 8) ; "allocate" a word on the stack (Mov (Offset 'rsp 0) 'rax) ; write 'rax to top frame (Call 'g) (Mov 'rbx (Offset 'rsp 0)) ; load top frame into 'rbx (Add 'rsp 8) ; "deallocate" word on the stack (Add 'rax 'rbx))) 42
As you can see from this code, it would be easy to violate the usual invariants of stack data structure to, for example, access elements beyond the top of the stack. The value of Push and Pop is they make clear that you are using things in a stack-like way and they keep you from screwing up the accesses, offsets, and adjustments to 'rsp.
Just as Push and Pop are useful illusions, so too are Call and Ret. They give the impression that there is a notion of a procedure and procedure call mechanism in assembly, but actually there’s no such thing.
Think for a moment about what it means to “call” 'f in the examples above. When executing (Call 'f), control jumps to the instruction following (Label 'f). When we then get to (Ret), somehow the CPU knows to jump back to the instruction following the (Call 'f) that we started with.
What’s really going on is that (Call 'f) is pushing the address of subsequent instruction on to the stack and then jumping to the label 'f. This works in concert with Ret, which pops the return address off the stack and jumping to it.
Just as we could write equivalent code without Push and Pop, we can write the same code without Call and Ret.
We do need one new trick, which is the Lea instruction, which loads an effective address. You can think of it like Mov except that it loads the address of something rather than what is pointed to by an address. For our purposes, it is useful for loading the address of a label:
(Lea 'rax 'f)
This instruction puts the address of label 'f into rax. You can think of this as loading a function pointer into 'rax. With this new instruction, we can illuminate what is really going on with Call and Ret:
Examples
> (eg (seq (Lea 'rax 'fret) ; load address of 'fret label into 'rax (Push 'rax) ; push the return pointer on to stack (Jmp 'f) ; jump to 'f (Label 'fret) ; <– return point for "call" to 'f (Push 'rax) ; save result (like before) (Lea 'rax 'gret) ; load address of 'gret label into 'rax (Push 'rax) ; push the return pointer on to stack (Jmp 'g) ; jump to 'g (Label 'gret) ; <– return point for "call" to 'g (Pop 'rbx) ; pop saved result from calling 'f (Add 'rax 'rbx))) 42
The above shows how to encode Call as Lea, Push, and Jmp. The encoding of Ret is just:
(seq (Pop 'rbx) (Jmp 'rbx))
3.5 Flags
As mentioned earlier, the processor makes use of flags to handle comparisons. For our purposes, there are four flags to be aware of: zero (ZF), sign (SF), carry (CF), and overflow (OF).
These flags are set by each of the arithmetic operations, which are appropriately annotated in the Instruction set. Each of these operations is binary (meaning they take two arguments), and the flags are set according to properties of the result of the arithmetic operation. Many of these properties look at the most-significant bit (MSB) of the inputs and output.
ZF is set when the result is 0.
SF is set when the MSB of the result is set.
CF is set when a bit was set beyond the MSB.
OF is set when one of two conditions is met:
The MSB of each input is set and the MSB of the result is not set.
The MSB of each input is not set and the MSB of the result is set.
Note that CF is only useful for unsigned arithmetic, while OF is only useful for signed arithmetic. In opposite cases, they provide no interesting information.
These flags, along with many others, are stored in a special FLAGS register that cannot be accessed by normal means. Each flag is represented by a single bit in the register, and they all have specific bits assigned by the x86 specification. For example, CF is bit 0, ZF is bit 6, SF is bit 7, and OF is bit 11, as indexed from the least-significant bit position (but you don’t need to know these numbers).
The various conditions that can be tested for correspond to combinations of the flags. For example, the Jc instruction will jump if CF is set, otherwise execution will fall through to the next instruction. Most of the condition suffixes are straightforward to deduce from their spelling, but some are not. The suffixes (e.g., the c in Jc) and their meanings are given below. For brevity’s sake the flags’ names are abbreviated by ommitting the F suffix and prefixing them with either + or - to indicate set and unset positions, respectively, as needed. Some of the meanings require use of the bitwise operators | (OR), & (AND), ^ (XOR), and =? (equality).
Suffix | Flag | Suffix | Flag |
z | +Z | nz | -Z |
e | +Z | ne | -Z |
s | +S | ns | -S |
c | +C | nc | -C |
o | +O | no | -O |
l | (S ^ O) | g | (-Z & (S =? O)) |
le | (+Z | (S ^ O)) | ge | (S =? O) |
The e suffix (“equal?”) is just a synonym for the z suffix (“zero?”). This is because it is common to use the Cmp instruction to perform comparisons, but Cmp is actually identical to Sub with the exception that the result is not stored anywhere (i.e., it is only used for setting flags according to subtraction). If two values are subtracted and the resulting difference is zero (ZF is set), then the values are equal.
3.5.1 Push and Pop
In the previous section (Stacks: pushing, popping, calling, returning), it was explained that the Push and Pop operations are essentially equivalent to manually adjusting the stack pointer and target register. The one difference is that these special stack-manipulation operations do not set any flags like Add and Sub do. So while you can often choose to manually implement stack manipulation, you’ll need to use these instructions specifically if you want to preserve the condition flags after adjusting the stack.
3.6 a86 Reference
(require a86) | package: langs |
The a86 language may evolve some over the course of the semester, but we will aim to document any changes by updating this section. Also, because the run-time system changes for each language, you may need to do some work to have asm-interp cooperate with your run-time system.
This module provides all of the bindings from a86/ast, a86/printer, and a86/interp, described below.
3.7 Instruction set
(require a86/ast) | package: langs |
This section describes the instruction set of a86.
There are 16 registers: 'rax, 'rbx, 'rcx, 'rdx, 'rbp, 'rsp, 'rsi, 'rdi, 'r8, 'r9, 'r10, 'r11, 'r12, 'r13, 'r14, and 'r15. These registers are 64-bits wide. There is also 'eax which accesses the lower 32-bits of 'rax. This is useful in case you need to read or write 32-bits of memory.
The registers 'rbx, 'rsp, 'rbp, and 'r12 through 'r15 are “callee-saved” registers, meaning they are preserved across function calls (and must be saved and restored by any callee code).
Each register plays the same role as in x86, so for example 'rsp holds the current location of the stack.
Labels must also follow the NASM restrictions on label names: "Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with special meaning), _ and ?."
Examples
> (label? 'foo) #t
> (label? "foo") #f
> (label? 'rax) #f
> (label? 'foo-bar) #f
> (label? 'foo.bar) #t
procedure
(instruction? x) → boolean?
x : any/c
procedure
(64-bit-integer? x) → boolean?
x : any/c
Examples
> (64-bit-integer? 0) #t
> (64-bit-integer? (sub1 (expt 2 64))) #t
> (64-bit-integer? (expt 2 64)) #f
> (64-bit-integer? (- (expt 2 63))) #t
> (64-bit-integer? (sub1 (- (expt 2 63)))) #t
procedure
(32-bit-integer? x) → boolean?
x : any/c
Examples
> (32-bit-integer? 0) #t
> (32-bit-integer? (sub1 (expt 2 32))) #t
> (32-bit-integer? (expt 2 32)) #f
> (32-bit-integer? (- (expt 2 32))) #t
> (32-bit-integer? (sub1 (- (expt 2 32)))) #f
procedure
(seq x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
Examples
> (seq) '()
> (seq (Label 'foo)) (list (Label 'foo))
> (seq (list (Label 'foo))) (list (Label 'foo))
> (seq (list (Label 'foo) (Mov 'rax 0)) (Mov 'rdx 'rax) (list (Call 'bar) (Ret)))
(list
(Label 'foo)
(Mov 'rax 0)
(Mov 'rdx 'rax)
(Call 'bar)
(Ret))
procedure
(prog x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
Programs have at least one label which is declared Global; the first label is used as the entry point.
All label declarations are unique.
All label targets are declared.
... other properties may be added in the future.
This function is useful to do some early error checking over whole programs and can help avoid confusing NASM errors. Unlike seq it should be called at the outermost level of a function that produces a86 code and not nested.
Examples
> (prog (Global 'foo) (Label 'foo)) (list (Global 'foo) (Label 'foo))
> (prog (Label 'foo)) prog: initial label undeclared as global: 'foo
> (prog (list (Label 'foo))) prog: initial label undeclared as global: 'foo
> (prog (Mov 'rax 32)) prog: no initial label found
> (prog (Label 'foo) (Label 'foo)) prog: duplicate label declaration found: 'foo
> (prog (Jmp 'foo)) prog: undeclared labels found: '(foo)
> (prog (Global 'foo) (Label 'foo) (Jmp 'foo)) (list (Global 'foo) (Label 'foo) (Jmp 'foo))
procedure
(symbol->label s) → label?
s : symbol?
Examples
> (let ([l (symbol->label 'my-great-label)]) (seq (Label l) (Jmp l)))
(list
(Label 'label_my_great_label_a1d1fe873a8070d)
(Jmp 'label_my_great_label_a1d1fe873a8070d))
Examples
> (asm-display (prog (Global 'foo) (%%% "Start of foo") (Label 'foo) ; Racket comments won't appear (%% "Inputs one argument in rdi") (Mov 'rax 'rdi) (Add 'rax 'rax) (% "double it") (Sub 'rax 1) (% "subtract one") (%% "we're done!") (Ret)))
default rel
section .text
global _foo
;;; Start of foo
_foo:
;; Inputs one argument in rdi
mov rax, rdi
add rax, rax ; double it
sub rax, 1 ; subtract one
;; we're done!
ret
struct
r : register? i : exact-integer?
Examples
> (Offset 'rax 0) (Offset 'rax 0)
> (Offset 'rax 4.1) Offset: expects exact integer as second argument; given 4.1
Examples
> (Label 'fred) (Label 'fred)
> (Label "fred") Label: expects symbol; given "fred"
> (Label 'rax) Label: cannot use register as label name; given 'rax
> (Label 'fred-wilma) Label: label names must conform to nasm restrictions
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Call 'f) (Add 'rax 1) (Ret) (Label 'f) (Mov 'rax 41) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret))) 42
Either dst or src may be offsets, but not both.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rbx 42) (Mov 'rax 'rbx) (Ret))) 42
> (Mov (Offset 'rax 0) (Offset 'rbx 0)) Mov: cannot use two memory locations; given (Offset 'rax 0),
(Offset 'rbx 0)
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 32) (Add 'rax 10) (Ret))) 42
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 32) (Sub 'rax 10) (Ret))) 22
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Jmp 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Pop 'rbx) (Jmp 'rbx))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jz 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jnz 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Je 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jne 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jl 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 42) (Jle 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Cmp 'rax 42) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jo 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) -9223372036854775808
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jno 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax -1) (Add 'rax 1) (Jc 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax -1) (Add 'rax 1) (Jnc 'l1) (Mov 'rax 0) (Label 'l1) (Ret))) 0
Note that the semantics for conditional moves is not what many people expect. The src is always read, regardless of the condition’s evaluation. This means that if your source is illegal (such as an offset beyond the bounds of memory allocated to the current process), a segmentation fault will arise even if the condition “should have” prevented the error.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret))) 1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret))) 2
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmove 'rax 'r9) (Ret))) 1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmove 'rax 'r9) (Ret))) 2
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret))) 0
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovne 'rax 'r9) (Ret))) 0
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovne 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret))) 0
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret))) 1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret))) 2
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret))) 0
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret))) -1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret))) 1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret))) 9223372036854775807
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret))) -9223372036854775808
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret))) 1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret))) 1
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret))) -1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret))) 0
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret))) 1
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 11) ; #b1011 = 11 (And 'rax 14) ; #b1110 = 14 (Ret))) 10
; #b1010 = 10
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 11) ; #b1011 = 11 (Or 'rax 14) ; #b1110 = 14 (Ret))) 15
; #b1111 = 15
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 11) ; #b1011 = 11 (Xor 'rax 14) ; #b1110 = 14 (Ret))) 5
; #b0101 = 5
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 4) ; #b100 = 4 = 2^2 (Sal 'rax 6) (Ret))) 256
; #b100000000 = 256
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 256) ; #b100000000 = 256 (Sar 'rax 6) (Ret))) 4
; #b100 = 4
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 269) ; #b100001101 = 269 (Sar 'rax 6) (Ret))) 4
; #b100 = 4
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 9223372036854775808) ; 1 in MSB (Sar 'rax 6) (Ret))) -144115188075855872
; #b1111111000000000000000000000000000000000000000000000000000000000
struct
dst : register? i : (integer-in 0 63)
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 256) ; #b100000000 = 256 (Shr 'rax 6) (Ret))) 4
; #b100 = 4
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 269) ; #b100001101 = 269 (Shr 'rax 6) (Ret))) 4
; #b100 = 4
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 9223372036854775808) ; 1 in MSB (Shr 'rax 6) (Ret))) 144115188075855872
; #b0000001000000000000000000000000000000000000000000000000000000000
struct
a1 : (or/c 32-bit-integer? register?)
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret))) 42
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 0) (Not 'rax) (Ret))) -1
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Lea 'rbx 'done) (Mov 'rax 42) (Jmp 'rbx) (Mov 'rax 0) (Label 'done) (Ret))) 42
3.8 From a86 to x86
(require a86/printer) | package: langs |
procedure
(asm-display is) → void?
is : (listof instruction?)
Examples
> (asm-display (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret)))
default rel
section .text
global _entry
_entry:
mov rax, 42
ret
procedure
(asm-string is) → string?
is : (listof instruction?)
Examples
> (asm-string (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret))) " default rel\n section .text\n global _entry\n_entry:\n mov rax, 42\n ret\n"
3.9 An Interpreter for a86
(require a86/interp) | package: langs |
As you’ve seen throughout this chapter, a86 is equiped with an interpreter, which enables you to run assembly programs from within Racket. This won’t be directly useful in building a compiler, but it will be very handy for interactively exploring assembly programs and making examples and test cases for your compiler.
The simplest form of interpreting an a86 program is to use asm-interp.
procedure
(asm-interp is) → integer?
is : (listof instruction?)
Examples
> (asm-interp (prog (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret))) 42
Programs do not have to start with 'entry. The interpreter will jump to whatever the first label in the program is:
Examples
> (asm-interp (prog (Global 'f) (Label 'f) (Mov 'rax 42) (Ret))) 42
The argument of asm-interp should be a complete, well-formed a86 program. For best results, always use prog to construct the program so that error checking is done early. If you use prog and asm-interp and you get a NASM syntax error message, please report it to the course staff as this is a bug in the interpreter.
While we try to make syntax errors impossible, it is
possible—
Examples
> (asm-interp (prog (Global 'crash) (Label 'crash) (Mov 'rax 0) (Jmp 'rax))) invalid memory reference. Some debugging context lost
It is often the case that we want our assembly programs to interact with the oustide or to use functionality implemented in other programming languages. For that reason, it is possible to link in object files to the running of an a86 program.
The mechanism for controlling which objects should be linked in is a parameter called current-objs, which contains a list of paths to object files which are linked to the assembly code when it is interpreted.
parameter
(current-objs) → (listof path-string?)
(current-objs objs) → void? objs : (listof path-string?)
= '()
For example, let’s implement a GCD function in C:
int gcd(int n1, int n2) { return (n2 == 0) ? n1 : gcd(n2, n1 % n2); }
First, compile the program to an object file:
shell
> gcc -fPIC -c gcd.c -o gcd.o
The option -fPIC is important; it causes the C compiler to emit “position independent code,” which is what enables Racket to dynamically load and run the code.
Once the object file exists, using the current-objs parameter, we can run code that uses things defined in the C code:
Examples
> (parameterize ((current-objs '("gcd.o"))) (asm-interp (prog (Extern 'gcd) (Global 'f) (Label 'f) (Mov 'rdi 11571) (Mov 'rsi 1767) (Sub 'rsp 8) (Call 'gcd) (Add 'rsp 8) (Ret)))) 57
This will be particularly relevant for writing a compiler where emitted code will make use of functionality defined in a runtime system.
Note that if you forget to set current-objs, you will get a linking error saying a symbol is undefined:
Examples
> (asm-interp (prog (Extern 'gcd) (Global 'f) (Label 'f) (Mov 'rdi 11571) (Mov 'rsi 1767) (Sub 'rsp 8) (Call 'gcd) (Add 'rsp 8) (Ret))) link error: unknown link error.
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: x86_64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
"/Library/Developer/CommandLineTools/usr/bin/ld" -demangle
-lto_library
/Library/Developer/CommandLineTools/usr/lib/libLTO.dylib
-dynamic -dylib -arch x86_64 -platform_version macos 14.0.0
14.2 -syslibroot
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -o /var/
folders/kb/h5ctgd113z9bdb8rhpgg48mm0000gn/T/nasm171388848817
13888488256.so -L/usr/local/lib /var/folders/kb/h5ctgd113z9b
db8rhpgg48mm0000gn/T/nasm17138884881713888488256.o -lSystem
/Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/lib
/darwin/libclang_rt.osx.a
ld: warning: no platform load command found in '/private/var
/folders/kb/h5ctgd113z9bdb8rhpgg48mm0000gn/T/nasm17138884881
713888488256.o', assuming: macOS
ld: Undefined symbols:
_gcd, referenced from:
_f in nasm17138884881713888488256.o
clang: error: linker command failed with exit code 1 (use -v
to see invocation)
procedure
(asm-interp/io is in) → (cons integer? string?)
is : (listof instruction?) in : string?