mirror of
https://github.com/m-lamonaca/dev-notes.git
synced 2025-04-05 18:36:41 +00:00
assembly: ass RISC-V variant notes
This commit is contained in:
parent
7790d370b6
commit
cea6c7050a
2 changed files with 221 additions and 0 deletions
220
docs/languages/assembly/riscv.md
Normal file
220
docs/languages/assembly/riscv.md
Normal file
|
@ -0,0 +1,220 @@
|
|||
# [Assembly (RISC-V)][book]
|
||||
|
||||
[book]: https://riscv-programming.org/book/riscv-book.html "An Introduction to Assembly Programming with RISC-V"
|
||||
|
||||
Assembly programs are encoded as plain text files and contain four main elements:
|
||||
|
||||
- **Comments**: comments are textual notes that are often used to document information on the code.
|
||||
- **Labels**: labels are "markers" that represent program locations.
|
||||
- **Instructions**: Assembly instructions are instructions that are converted by the assembler into machine instructions.
|
||||
- **Directives**: Assembly directives are commands used to coordinate the assembling process.
|
||||
|
||||
## Assembly Language
|
||||
|
||||
### Labels
|
||||
|
||||
**Labels** are "markers" that represent program _locations_. They can be inserted into an assembly program to "mark" a program position so that it can be referred to by assembly instructions.
|
||||
|
||||
```asm
|
||||
x: # <-- label definition
|
||||
.word 10
|
||||
|
||||
sum:
|
||||
lw a0, x
|
||||
addi a0, a0, 10
|
||||
ret
|
||||
```
|
||||
|
||||
Assemblers usually accept two kinds of labels: **symbolic** and **numeric** labels.
|
||||
|
||||
Symbolic labels are defined by an _identifier_ followed by `:`.
|
||||
They are stored as symbols in the symbol table and are often used to identify global variables and routines.
|
||||
|
||||
Numeric labels are defined by a _single decimal digit_ followed by `:`.
|
||||
They are used for local reference and are _not included_ in the symbol table of executable files. They can be redefined repeatedly in the same assembly program.
|
||||
|
||||
References to numeric labels contain a _suffix_ that indicates whether the reference is to a numeric label positioned before (`b` suffix) or after (`f` suffix) the reference.
|
||||
|
||||
### Symbols
|
||||
|
||||
Program **symbols** are "names" that are associated with _numerical values_ and the **symbol table** is a data structure that maps each program symbol to its value.
|
||||
Labels are automatically converted into program symbols by the assembler and associated with a numerical value that represents its position in the program, which is a memory address.
|
||||
|
||||
It's possible to explicitly define symbols with the `.set` (or `.equ`) directive.
|
||||
|
||||
```asm
|
||||
.set answer, 42
|
||||
|
||||
get_answer:
|
||||
li a0, answer
|
||||
ret
|
||||
```
|
||||
|
||||
### References & Relocations
|
||||
|
||||
Each reference to a label must be replaced by an address during the assembling and linking processes. **Relocation** is the process in which the code and data are assigned new memory addresses so that they do not conflict with addresses of coming from the other linked sources.
|
||||
|
||||
The **relocation table** is a data structure that contains information that describes how the program instructions and data need to be modified to reflect the addresses reassignment. Each object file contains a relocation table and the linker uses their information to adjust the code when performing the relocation process.
|
||||
|
||||
### Global vs Local Symbols
|
||||
|
||||
Symbols are classified as _local_ or _global_ symbols.
|
||||
|
||||
**Local symbols** are only visible on the same file, i.e., the linker does not use them to resolve undefined references on other files.
|
||||
**Global symbols**, on the other hand, are used by the linker to resolve undefined reference on other files.
|
||||
|
||||
By default, the assembler registers labels as local symbols. The `.globl` directive instructs the assembler to register a label as a global symbol.
|
||||
|
||||
```asm
|
||||
.globl exit
|
||||
|
||||
exit:
|
||||
li a0, 0
|
||||
li a7, 93
|
||||
ecall
|
||||
```
|
||||
|
||||
### Program Entry Point
|
||||
|
||||
Every program has an **entry point**: the point from which the CPU must start executing the program.
|
||||
The entry point is defined by an address, which is the address of the _first_ instruction that must be executed.
|
||||
|
||||
```asm
|
||||
.globl start # <-- program entry point
|
||||
```
|
||||
|
||||
> **Note**: the `start` label **must** be registered as a _global_ symbol for the linker to recognize it as the entry point.
|
||||
|
||||
## Program Sections
|
||||
|
||||
Executable and object files, and assembly programs are usually organized in **sections**.
|
||||
A section may contain data or instructions, and the contents of each section are mapped to a set of consecutive main memory addresses.
|
||||
|
||||
The following sections are often present on executable files generated for Linux-based systems:
|
||||
|
||||
- `.text`: a section dedicated to store the program instructions.
|
||||
- `.data`: a section dedicated to store initialized global variables.
|
||||
- `.bss`: a section dedicated to store uninitialized global variables.
|
||||
- `.rodata`: a section dedicated to store constants.
|
||||
|
||||
When linking multiple object files, the linker groups information from sections with the same name and places them together into a single section on the executable file.
|
||||
|
||||
To instruct the assembler to add the assembled information into other sections, the programmer (or the compiler) may use the `.section <name>` directive.
|
||||
|
||||
```asm
|
||||
.section .text
|
||||
|
||||
update_y:
|
||||
la t1, y
|
||||
sw a0, (t1)
|
||||
ret
|
||||
|
||||
update_x:
|
||||
la t1, x
|
||||
sw a0, (t1)
|
||||
ret
|
||||
|
||||
.section .data
|
||||
|
||||
x: .word 10
|
||||
y: .word 12
|
||||
```
|
||||
|
||||
### Assembly Instructions
|
||||
|
||||
Assembly instructions are instructions that are converted by the assembler into machine instructions.
|
||||
They are usually encoded as a string that contains a **mnemonic** and a sequence of parameters, known as **operands**.
|
||||
|
||||
A **pseudo-instruction** is an assembly instruction that does not have a corresponding machine instruction on the ISA, but can be translated automatically by the assembler into one or more alternative machine instructions to achieve the same effect.
|
||||
|
||||
The operands of assembly instructions may contain:
|
||||
|
||||
- **A register name**: a register name identifies one of the ISA registers.
|
||||
|
||||
<!-- RV32I ISA registers are numbered from 0 to 31 and are named x0, x1, ..., x31.
|
||||
RV32I registers may also be identified by their aliases, for example, a0, t1, ra, etc.-->
|
||||
|
||||
- **An immediate value**: an immediate value is a numerical constant that is directly encoded into the machine instruction as a sequence of bits.
|
||||
- **A symbol name**: symbol names identify symbols on the symbol table and are replaced by their respective values during the assembling and linking processes.
|
||||
Their value are encoded into the machine instruction as a sequence of bits.
|
||||
|
||||
### Immediate Values
|
||||
|
||||
Immediate values are represented on assembly language by a sequence of alphanumeric characters.
|
||||
|
||||
- Sequences started with the `0x` and the `0b` prefixes are interpreted as hexadecimal and binary numbers, respectively.
|
||||
- Octal numbers are represented by a sequence of numeric digits starting with digit `0`.
|
||||
- Sequences of numeric digits starting with digits `1` to `9` are interpreted as decimal numbers.
|
||||
- Alphanumeric characters represented between single quotation marks are converted to numeric values using the ASCII table
|
||||
- To denote a negative integer, it suffices to add the `-` prefix.
|
||||
|
||||
```asm
|
||||
li a0, 10 # load value ten into register a0
|
||||
2 li a1, 0xa # load value ten into register a1
|
||||
3 li a2, 0b1010 # load value ten into register a2
|
||||
4 li a3, 012 # load value ten into register a3
|
||||
5 li a4, ’0’ # load value forty eight into register a4
|
||||
6 li a5, ’a’ # load value ninety seven into register a5
|
||||
```
|
||||
|
||||
### The `.<value>` Directives
|
||||
|
||||
The `.byte`, `.half`, `.word`, and `.dword` directives add one or more values to the active section. Their arguments may be expressed as immediate
|
||||
values, symbols (which are replaced by their value during the assembling and linking processes) or by arithmetic expressions that combine both.
|
||||
|
||||
The `.string`, `.asciz`, and `.ascii` directives add strings to the active section. The string is encoded as a sequence of bytes.
|
||||
|
||||
| Directive | Arguments | Description |
|
||||
|:---------:|:----------------:|:------------------------------------------------:|
|
||||
| `.byte` | `expr [, expr]*` | Emit one or more **8bit** comma separated words |
|
||||
| `.half` | `expr [, expr]*` | Emit one or more **16bit** comma separated words |
|
||||
| `.word` | `expr [, expr]*` | Emit one or more **32bit** comma separated words |
|
||||
| `.dword` | `expr [, expr]*` | Emit one or more **64bit** comma separated words |
|
||||
| `.string` | `string` | Emit `NULL` terminated string |
|
||||
| `.asciz` | `string` | Alias for `.string` |
|
||||
| `.ascii` | `string` | Emit string without `NULL` character |
|
||||
|
||||
### The `.set` and `.equ` directives
|
||||
|
||||
The `.set name, expression` directive adds a symbol to the symbol table.
|
||||
It takes a name and an expression as arguments, evaluates an expression to a value and store the name and the resulting value into the symbol table.
|
||||
|
||||
The `.equ` directive performs the same task as the .set directive.
|
||||
|
||||
### The `.globl` directive
|
||||
|
||||
The `.globl` directive can be used to turn local symbols into global ones.
|
||||
|
||||
```asm
|
||||
.global start
|
||||
.global max_value
|
||||
|
||||
.set max_value, 42
|
||||
|
||||
start:
|
||||
li a0, max_value
|
||||
jal process_temp
|
||||
ret
|
||||
```
|
||||
|
||||
### The `.skip` directive
|
||||
|
||||
The `.bss` section is dedicated for storing _uninitialized_ global variables.
|
||||
These variables need to be allocated on memory, but they do not need to be initialized by the loader when a program is executed. As a consequence, their initial value do not need to be stored on executable nor object files.
|
||||
|
||||
To allocate variables on the `.bss` section it suffices to declare a label to identify the variable and advance the `.bss` location counter by the amount of bytes the variable require, so further variables are allocated on other address.
|
||||
The `.skip N` directive is a directive that advances the location counter by `N` units and can be used to allocate space for variables on the `.bss` section.
|
||||
|
||||
```asm
|
||||
.section .bss
|
||||
x: .skip 4
|
||||
y: .skip 80
|
||||
z: .skip 4
|
||||
```
|
||||
|
||||
### The `.align` directive
|
||||
|
||||
Some ISA's require instructions or multi-byte data to be stored on addresses that are multiple of a given number.
|
||||
|
||||
The proper way of ensuring the location counter is aligned is by using the `.align N` directive.
|
||||
The `.align N` directive checks if the location counter is a _multiple_ of `2^N`, if it is, it has no effect on the program, otherwise, it advances the location counter to the next value that is a multiple of `2^N`.
|
|
@ -94,6 +94,7 @@ nav:
|
|||
- Swift: languages/swift/swift.md
|
||||
- Assembly:
|
||||
- Intel: languages/assembly/intel.md
|
||||
- RISC-V: languages/assembly/riscv.md
|
||||
- Python:
|
||||
- Python: languages/python/python.md
|
||||
- Modules:
|
||||
|
|
Loading…
Add table
Reference in a new issue