Chapter 9: Instruction Set Architecture
In this chapter, we introduce the instruction set architecture (ISA). The architecture is the programmer’s view of a computer, defined by its instruction set (language) and its operand locations (registers and memory). We first look at the computer’s vocabulary, called the instruction set. Each instruction indicates both the operation to perform and the operands to use. We then look at the operands, which come from memory, from registers, or from the instruction itself. This chapter also shows how to translate assembly language into machine language.
Objectives
By the end of this chapter you should be able to:
- Recognize assembly language and machine language
- Demonstrate knowledge of the MIPS architecture
- Differentiate between the operation to perform and the operands to use
- Summarize the features of word-addressable memory
- Deconstruct the different types of instructions, i.e. R-type, I-type and J-type
- Translate assembly language into machine code
9.1 Instructions
An instruction is a single operation of a processor, defined by the processor’s instruction set. The size (length) of an instruction depends on the processor. An instruction can be written in a human-readable format or a computer-readable format. Assembly language is the human-readable format of instructions, whereas machine language is the computer-readable format (1’s and 0’s).
Once you’ve learned one architecture, it’s easy to learn others. The MIPS (Microprocessor without Interlocked Pipelined Stages) architecture was developed by John Hennessy and his colleagues at Stanford in the 1980s, and has been used in many commercial systems, including those from Silicon Graphics, Nintendo, and Cisco. The underlying architecture design principles, as articulated by Hennessy and Patterson, are as follows:
- Simplicity favors regularity
- Make the common case fast
- Smaller is faster
- Good design demands good compromises
Let’s look at the following instruction for addition:
High-level Code | MIPS assembly code |
---|---|
a = b + c | add a, b, c |
where add is a mnemonic that indicates the operation to perform. The source operands, b and c, are the values on which the operation is performed. The destination operand, a, is where the result is written.
The next instruction shows subtraction in high-level code and in MIPS assembly code.
High-level Code | MIPS assembly code |
---|---|
a = b - c | sub a, b, c |
Subtraction is similar to addition; only the mnemonic changes.
As shown in the above instructions, MIPS assembly code uses a consistent instruction format with the same number of operands (two source operands and one destination operand), which makes it easy to encode and handle in hardware. This is the first design principle: Simplicity favors regularity.
More complex code is handled by multiple MIPS instructions. For example, the following high-level code is translated into two MIPS instructions:
High-level Code | MIPS assembly code |
---|---|
a = b + c - d | add t, b, c # t = b + c<br>sub a, t, d # a = t - d |
The MIPS instruction set includes only simple, commonly used instructions. As a result, the hardware that decodes and executes instructions can be simple, small, and fast. More complex operations, which are less common, are performed using multiple simple instructions. This is the second design principle: Make the common case fast.
Operands
An instruction operates on operands. The instruction needs a physical location from which to retrieve the binary data. An operand can be stored in one of the following locations:
- Registers, which are located inside the CPU. Operands in registers can be accessed quickly.
- Memory, which is located outside the CPU. It provides a large capacity but operates slowly.
- Constants (also called immediates), which are values written inline in the instruction itself.
Fig. ‑. CPU Organization
As shown in the above figure, the CPU is organized around the Program Counter (PC), Instruction Register (IR), Instruction Decoder, Control Unit, Arithmetic Logic Unit (ALU), general registers, and buses. MIPS has 32 general-purpose registers, each 32 bits wide, which together are called the register set or register file. The fewer the registers, the faster they can be accessed. This is the third design principle: Smaller is faster. The register file is far smaller than memory and is located inside the CPU, which is why registers are faster than memory. MIPS is also called a “32-bit architecture” because it operates on 32-bit data.
Operands are held in registers. Register names are typically preceded by the $ symbol. For example, $0 is read as “register zero” or “dollar zero”. The registers are used for specific purposes. The register $0 always holds the constant value 0. The saved registers, $s0 - $s7, are used to hold variables. The temporary registers, $t0 - $t9, are used to hold intermediate values during a larger computation. The following table shows the register usage in MIPS assembly language.
Table ‑. Register Usage
Register number | Register name | Usage |
---|---|---|
0 | $zero | Always zero |
1 | $at | Reserved for the assembler |
2 – 3 | $v0 - $v1 | Function return value |
4 – 7 | $a0 - $a3 | The first four parameters passed to a procedure. (Function arguments) |
8 – 15 | $t0 - $t7 | Temporary variables. Can be overwritten by callee |
16 – 23 | $s0 - $s7 | Saved variables. Must be saved/restored by callee |
24 – 25 | $t8 - $t9 | Temporary variables. Can be overwritten by callee |
26 – 27 | $k0 - $k1 | Reserved for kernel usage (operating system) |
28 | $gp | Global pointer for static data (pointer to global area) |
29 | $sp | Stack pointer |
30 | $fp | Frame pointer |
31 | $ra | Function return address |
Now we can write the instructions using registers. The following high-level code can be converted to MIPS assembly code with designated register names:
Example 1)
High-Level code | MIPS assembly code |
---|---|
a = b + c; | # $s0=a, $s1=b, $s2=c<br>add $s0, $s1, $s2 |
Example 2)
High-Level code | MIPS assembly code |
---|---|
a = b + c - d; | # $s0=a, $s1=b, $s2=c, $s3=d<br>sub $t0, $s2, $s3 # t = c - d<br>add $s0, $s1, $t0 # a = b + t |
Word-addressable Memory
When we execute a program, there is usually too much data to fit in only 32 registers. Memory has far more capacity for storing data. The register file is small and fast, whereas memory is large and slow because it is located outside the CPU. Only commonly used variables are kept in registers; the rest are kept in memory for later processing. As shown in the figure below, each 32-bit data word has a unique 32-bit address. This is called word-addressable memory. Both the 32-bit word address and the 32-bit data value are written in hexadecimal.
Fig. ‑. Word-addressable Memory
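As a rough illustration of this idea, the following Python sketch models a word-addressable memory as a mapping from 32-bit addresses to 32-bit words. The class name `WordMemory`, the stored values, and the choice to read unwritten words as zero are assumptions made for this example; they are not part of MIPS.

```python
# Minimal sketch of a word-addressable memory (illustrative only).
# Every address names exactly one 32-bit word.
WORD_MASK = 0xFFFFFFFF  # keeps addresses and data within 32 bits


class WordMemory:
    def __init__(self):
        # Sparse storage: word address -> 32-bit data word.
        self._words = {}

    def store(self, address, value):
        self._words[address & WORD_MASK] = value & WORD_MASK

    def load(self, address):
        # Assumption: a word that was never written reads as 0.
        return self._words.get(address & WORD_MASK, 0)


memory = WordMemory()
memory.store(0x00000000, 0xABCDEF78)  # example data, chosen arbitrarily
memory.store(0x00000001, 0xF2F1AC07)

# Addresses and data are both shown in hexadecimal, as in the text.
print(f"word at 0x00000001 = 0x{memory.load(0x00000001):08X}")
```

Consecutive word addresses differ by one here because each address names a whole word.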
Exercises
Translate the following high-level code into assembly language. Assume variables a – c are held in registers $s0 - $s2 and f – j are in $s3-$s7.
a = b - c;
f = (g + h) - (i + j);
Answer)
# MIPS assembly code
# $s0=a, $s1=b, $s2=c, $s3=f, $s4=g, $s5=h, $s6=i, $s7=j
sub $s0, $s1, $s2 # a = b - c
add $t0, $s4, $s5 # $t0 = g + h
add $t1, $s6, $s7 # $t1 = i + j
sub $s3, $t0, $t1 # f = (g + h) - (i + j)
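One way to check an answer like this is to trace the four instructions on a small model of the register file. The sketch below does so in Python. The starting register values are arbitrary, and the helpers `add` and `sub` are hypothetical stand-ins that simply wrap results to 32 bits, which ignores how real hardware handles signed overflow.

```python
# Illustrative register-file trace for the exercise answer.
MASK32 = 0xFFFFFFFF

# Arbitrary starting values for b, c, g, h, i, j (assumed for this example).
regs = {"$s1": 7, "$s2": 2, "$s4": 10, "$s5": 20, "$s6": 3, "$s7": 4}


def add(rd, rs, rt):
    """rd = rs + rt, wrapped to 32 bits (overflow handling omitted)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def sub(rd, rs, rt):
    """rd = rs - rt, wrapped to 32 bits (overflow handling omitted)."""
    regs[rd] = (regs[rs] - regs[rt]) & MASK32


sub("$s0", "$s1", "$s2")  # a = b - c
add("$t0", "$s4", "$s5")  # $t0 = g + h
add("$t1", "$s6", "$s7")  # $t1 = i + j
sub("$s3", "$t0", "$t1")  # f = (g + h) - (i + j)

print(regs["$s0"], regs["$s3"])  # prints: 5 23
```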
9.2 Machine Languages
Assembly language is convenient for humans to read. However, digital circuits understand only 1’s and 0’s. Therefore, a program written in assembly language is translated from mnemonics into a representation using only 1’s and 0’s, called machine language. Keeping the number of instruction formats small allows some regularity among all instruction types, and thus simpler hardware, while still accommodating different instruction needs.
MIPS uses 32-bit instructions and makes the compromise of defining three instruction formats: R-type, I-type, and J-type. This is the fourth design principle: Good design demands good compromises. Multiple instruction formats allow flexibility. For example, add and sub use three register operands, whereas lw and sw use two register operands and a constant. The number of instruction formats is kept small to adhere to design principles 1 and 3.
R-type Instruction Format
The name R-type is short for register-type. The following figure shows the R-type instruction fields.
Fig. ‑. R-type Instruction fields
- opcode: operation code (zero value for all R-type)
- rs: first source register number
- rt: second source register number
- rd: destination register number
- shamt: shift amount (00000 for now)
- function: function code (extends opcode)
Now, let’s look at how a MIPS instruction, add $s0, $s1, $s2, is translated into machine language. In an R-type instruction, the operation code field is all zeros, and the function field extends the operation code to identify the specific operation, here add. The rs and rt fields hold the two source operands, $s1 and $s2. The rd field holds the destination operand, $s0. The add instruction does not use the shift amount, so this field is filled with all 0’s, as shown below.
Fig. ‑. R-type Instruction field with add $s0, $s1, $s2
Each instruction set architecture defines its own operation codes and function codes. Those used in this chapter are listed in the following table.
Table ‑. References for Operation code and Function field
The register number for each register is defined in Table 9-1. From that table, we can look up the register numbers: the decimal value 17 for $s1, the decimal value 18 for $s2, and the decimal value 16 for $s0. The R-type instruction fields of the add instruction are filled with those decimal values, as shown below:
Fig. ‑. R-type Instruction field with Decimal Representation
Converting each decimal field value to binary gives the machine code, as shown below:
Fig. ‑. R-type Instruction field with Binary Number Representation
We can express this binary number in hexadecimal representation: 0x02328020.
Let’s look at another example with the sub instruction, sub $t0, $t3, $t5, and translate it into machine language. Since sub is an R-type instruction, as shown in Table 9-2, the operation code field is all zeros, and the function field identifies the specific operation, here sub. The rs and rt fields hold the two source operands, $t3 and $t5. The rd field holds the destination operand, $t0. The sub instruction does not use the shift amount, so this field is filled with all 0’s, as shown below.
Fig. ‑. R-type Instruction field with sub $t0, $t3, $t5
Using the register numbers in Table 9-1, we can look up the decimal value 11 for $t3, the decimal value 13 for $t5, and the decimal value 8 for $t0. The R-type instruction fields of the sub instruction are filled with those decimal values, as shown below:
Fig. ‑. R-type Instruction field with Decimal Representation
Converting each decimal field value to binary gives the machine code, as shown below:
Fig. ‑. R-type Instruction field with Binary Number Representation
We can express this binary number in hexadecimal representation: 0x016D4022.
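The field-by-field packing used in the two R-type examples above can be sketched in Python as follows. The function name `encode_r_type` is hypothetical, and the function codes 32 (add) and 34 (sub) are the standard MIPS values, assumed here to match Table 9-2.

```python
# Minimal sketch: pack R-type fields into a 32-bit machine word.
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Concatenate the six R-type fields (6-5-5-5-5-6 bits), left to right."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value  # shift earlier fields left, append this one
    return word


# add $s0, $s1, $s2  ->  rs = 17 ($s1), rt = 18 ($s2), rd = 16 ($s0)
add_word = encode_r_type(rs=17, rt=18, rd=16, shamt=0, funct=32)
# sub $t0, $t3, $t5  ->  rs = 11 ($t3), rt = 13 ($t5), rd = 8 ($t0)
sub_word = encode_r_type(rs=11, rt=13, rd=8, shamt=0, funct=34)

print(f"0x{add_word:08X}")  # prints: 0x02328020
print(f"0x{sub_word:08X}")  # prints: 0x016D4022
```

Note the operand order: in assembly the destination comes first, but in the machine word rd is the third register field.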
I-type Instruction Format
Although multiple formats complicate the hardware, we can reduce the complexity by keeping the formats similar. Any instruction that includes a constant value, either an immediate or a memory address offset, can be encoded in the I-type instruction format. The I-type format is therefore used for the load word and store word instructions and for the immediate arithmetic instructions. The following figure shows the I-type instruction fields.
Fig. ‑. I-type Instruction Fields
- The first three fields, op, rs, and rt, are like those of R-type instructions.
- rs and imm are always used as source operands.
- rt is used as a destination (addi and lw) or another source (sw)
- Constant (imm): −2^15 to 2^15 − 1 (that is, −32,768 to 32,767)
- Address: offset added to base address in rs
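The last two points can be made concrete with a short Python sketch: a 16-bit immediate field is read as a signed two’s complement value, and a load or store address is the base register value plus that signed offset. The function names and the base address 0x1000 are assumptions made for this illustration.

```python
# Sketch: interpreting the 16-bit imm field and forming a memory address.
def sign_extend16(field):
    """Interpret a 16-bit field as a signed two's complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def effective_address(base_value, imm_field):
    """Address = value in rs (base) + sign-extended offset, kept to 32 bits."""
    return (base_value + sign_extend16(imm_field)) & 0xFFFFFFFF


print(sign_extend16(0x7FFF))              # largest immediate: 32767
print(sign_extend16(0x8000))              # smallest immediate: -32768
print(effective_address(0, 32))           # base 0, offset 32 -> 32
print(hex(effective_address(0x1000, 0xFFE8)))  # offset -24 -> 0xfe8
```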
Now, let’s look at how the following I-type instructions are translated into machine language.
- Assembly Code
addi rt, rs, imm → addi $s0, $s1, 5
addi rt, rs, imm → addi $t0, $s3, -12
lw rt, imm(rs) → lw $t2, 32($0)
sw rt, imm(rs) → sw $s1, 4($t1)
The addi instruction is an I-type instruction, where rt is the destination register, rs is the source register, and imm is the 16-bit immediate value. The opcode field of addi is the decimal value 8 (001000 in binary), as defined in Table 9-2. Both the load word (lw) and the store word (sw) instructions are also I-type instructions. The load word (lw) instruction loads data from memory into the destination register.
The opcode field of the lw instruction is the decimal value 35. For the lw instruction, the memory address is calculated as the sum of the value in the base register and the offset. In the above example, the base register is $0 and the offset is 32, so the calculated memory address is 32. The data located at memory address 32 is loaded into the destination register ($t2).
The opcode field of the sw instruction is the decimal value 43. The store word (sw) instruction stores data from the register file into memory. For the sw instruction, the memory address is calculated in the same way as for the lw instruction. In the above example, the base register is $t1 and the offset is 4. The memory address is the sum of the value in $t1 and the offset, $t1 + 4. The value held in the register $s1 is stored at the memory address $t1 + 4.
The following figure shows the field values for the above examples:
Fig. ‑. I-type Instruction fields with Decimal Representation
Converting each decimal field value to binary gives the machine code, as shown below:
Fig. ‑. I-type Instruction field with Binary Number Representation
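The packing of the four I-type examples above can be reproduced with a small Python sketch. The function name `encode_i_type` is hypothetical; the opcodes (8, 35, 43) come from the text, and the register numbers come from Table 9-1.

```python
# Minimal sketch: pack I-type fields (op 6 bits, rs 5, rt 5, imm 16).
def encode_i_type(opcode, rs, rt, imm):
    if not -(1 << 15) <= imm <= (1 << 15) - 1:
        raise ValueError("immediate does not fit in 16 signed bits")
    # imm & 0xFFFF stores a negative immediate in two's complement form.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


examples = [
    ("addi $s0, $s1, 5",   encode_i_type(8, rs=17, rt=16, imm=5)),
    ("addi $t0, $s3, -12", encode_i_type(8, rs=19, rt=8, imm=-12)),
    ("lw $t2, 32($0)",     encode_i_type(35, rs=0, rt=10, imm=32)),
    ("sw $s1, 4($t1)",     encode_i_type(43, rs=9, rt=17, imm=4)),
]
for text, word in examples:
    print(f"{text:20} 0x{word:08X}")
```

In the assembly syntax rt is written first, but in the machine word rs precedes rt, so the register order is swapped relative to the source text.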
J-type Instructions
The J-type instruction is used to jump to a target address. The following figure shows the J-type instruction fields.
Fig. ‑. J-type Instruction Fields
The jump instruction uses a word address and updates the PC with the concatenation of the following values (32 bits in total):
- Top 4 bits of the old PC (4 bits)
- 26-bit jump address (26 bits)
- 00 (2 bits)
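The concatenation above can be sketched in Python with shifts and masks. The function names and the addresses 0x00400008 and 0x00400014 are assumptions chosen only to illustrate the arithmetic.

```python
# Sketch: forming a jump target from the PC and the 26-bit jump address.
def jump_target(pc, addr26):
    """Top 4 bits of pc, then the 26-bit field, then two 0 bits."""
    return (pc & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


def jump_field(target):
    """The 26-bit field stored in the instruction for a given target."""
    return (target >> 2) & 0x03FFFFFF


field = jump_field(0x00400014)            # hypothetical target address
print(f"field  = 0x{field:07X}")          # prints: field  = 0x0100005
print(f"target = 0x{jump_target(0x00400008, field):08X}")
```

Dropping the two low bits loses nothing because instruction addresses are multiples of 4, so those bits are always 00.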
The following example codes show how the Jump instruction is used in the assembly code.
addi $s0, $0, 4 # $s0 = 4
addi $s1, $0, 1 # $s1 = 1
j target # jump to target
addi $s1, $s1, 1 # not executed
sub $s1, $s1, $s0 # not executed
target: add $s1, $s1, $s0 # $s1 = 1 + 4 = 5
The first two addi instructions perform immediate arithmetic: the destination register $s0 receives the sum of $0 and 4 ($s0 = 4), and the destination register $s1 receives the sum of $0 and 1 ($s1 = 1). The jump instruction then transfers control to the address labeled target, skipping the two instructions in between, and the final add instruction executes. The destination register $s1 then holds the sum of the two register values 1 and 4, which is 5.
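To see which instructions run, the program can be traced with a toy interpreter. This Python sketch is a simplification: it uses list indexes in place of real addresses, resolves the label by hand, and supports only the four mnemonics used in the example. The names `program`, `labels`, and `run` are hypothetical.

```python
# Toy trace of the jump example (list indexes stand in for addresses).
program = [
    ("addi", "$s0", "$0", 4),
    ("addi", "$s1", "$0", 1),
    ("j", "target"),
    ("addi", "$s1", "$s1", 1),     # skipped by the jump
    ("sub", "$s1", "$s1", "$s0"),  # skipped by the jump
    ("add", "$s1", "$s1", "$s0"),  # target
]
labels = {"target": 5}  # label -> index of the labeled instruction


def run(program, labels):
    regs = {"$0": 0}
    executed = []  # indexes of the instructions that actually ran
    index = 0
    while index < len(program):
        op, *args = program[index]
        executed.append(index)
        if op == "j":
            index = labels[args[0]]  # transfer control to the label
            continue
        rd, rs, last = args
        if op == "addi":
            regs[rd] = regs[rs] + last
        elif op == "add":
            regs[rd] = regs[rs] + regs[last]
        elif op == "sub":
            regs[rd] = regs[rs] - regs[last]
        index += 1  # fall through to the next instruction
    return regs, executed


regs, executed = run(program, labels)
print(regs["$s1"], executed)  # prints: 5 [0, 1, 2, 5]
```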
Instruction Fetch and PC
The Program Counter (PC) is a 32-bit register that holds the address of the next instruction to be fetched from memory. The PC value is increased by 4 for each subsequent instruction, because each 32-bit instruction occupies 4 bytes, as shown in the following figure. The instruction is fetched from instruction memory at the address held in the PC and forwarded to the next step.
Fig. ‑. Instruction Fetch and PC Increment
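The fetch-and-increment step can be sketched in Python as a loop that reads the instruction at the current PC and then adds 4. The starting address 0x00400000 and the name `fetch_sequence` are assumptions for this example; the instruction words are encodings that appear elsewhere in this chapter.

```python
# Sketch: sequential instruction fetch with PC <- PC + 4.
def fetch_sequence(instruction_memory, start_pc, count):
    """Return the (pc, instruction) pairs for `count` sequential fetches."""
    pc = start_pc
    fetched = []
    for _ in range(count):
        fetched.append((pc, instruction_memory[pc]))  # fetch at current PC
        pc = (pc + 4) & 0xFFFFFFFF                    # advance to next instruction
    return fetched


# Hypothetical instruction memory: address -> 32-bit instruction word.
instruction_memory = {
    0x00400000: 0x02328020,  # add $s0, $s1, $s2
    0x00400004: 0x016D4022,  # sub $t0, $t3, $t5
    0x00400008: 0x02954020,  # add $t0, $s4, $s5
}

for pc, word in fetch_sequence(instruction_memory, 0x00400000, 3):
    print(f"PC = 0x{pc:08X}  instruction = 0x{word:08X}")
```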
Exercises
- Translate the following assembly language into machine language.
add $t0, $s4, $s5 # $t0 is register 8, $s4 is register 20, $s5 is register 21
Answer)
- Decimal representation (field values): op = 0, rs = 20, rt = 21, rd = 8, shamt = 0, function = 32
- Binary number representation (Machine Code):
000000 10100 10101 01000 00000 100000 (binary) = 0x02954020
- Translate the following I-type instruction into machine code.
# lw opcode value: 35
lw $s3, -24($s4) # $s3 and $s4 are registers 19 and 20
Answer)
- Decimal representation (field values): op = 35, rs = 20, rt = 19, imm = -24
- Binary number representation (Machine Code):
100011 10100 10011 1111111111101000 (binary) = 0x8E93FFE8
- Convert the following machine language into MIPS assembly language.
0x01094020
Answer)
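0000 0001 0000 1001 0100 0000 0010 0000 (32 bits)
The opcode (the top six bits) is 000000, so this is an R-type instruction. Regrouping the bits into the R-type fields gives:
Field | opcode | rs (source) | rt (source) | rd (destination) | shamt | function |
---|---|---|---|---|---|---|
Binary | 000000 | 01000 | 01001 | 01000 | 00000 | 100000 |
Decimal | 0 | 8 | 9 | 8 | 0 | 32 |
The function code 32 identifies add, and registers 8 and 9 are $t0 and $t1, so the instruction is:
add $t0, $t0, $t1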
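The reverse translation can also be sketched in Python by shifting and masking out each field. The function names are hypothetical, and the lookup table covers only add (32) and sub (34), the two function codes used in this chapter.

```python
# Minimal sketch: decode an R-type machine word back into assembly.
REGISTER_NAMES = (
    ["$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3"]
    + [f"$t{i}" for i in range(8)]   # registers 8-15
    + [f"$s{i}" for i in range(8)]   # registers 16-23
    + ["$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]
)
FUNCT_MNEMONICS = {32: "add", 34: "sub"}  # only the codes used here


def decode_r_type(word):
    """Split a 32-bit word into the six R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def disassemble_r_type(word):
    f = decode_r_type(word)
    if f["opcode"] != 0:
        raise ValueError("not an R-type instruction")
    mnemonic = FUNCT_MNEMONICS[f["funct"]]
    # Assembly order is destination first: rd, rs, rt.
    regs = [REGISTER_NAMES[f[name]] for name in ("rd", "rs", "rt")]
    return f"{mnemonic} " + ", ".join(regs)


print(disassemble_r_type(0x01094020))  # prints: add $t0, $t0, $t1
```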