From cea6c7050aa93554357b99fdf0334cd2eb81e348 Mon Sep 17 00:00:00 2001 From: Marcello Lamonaca Date: Fri, 17 May 2024 19:15:32 +0200 Subject: [PATCH] assembly: ass RISC-V variant notes --- docs/languages/assembly/riscv.md | 220 +++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 221 insertions(+) create mode 100644 docs/languages/assembly/riscv.md diff --git a/docs/languages/assembly/riscv.md b/docs/languages/assembly/riscv.md new file mode 100644 index 0000000..ce1abb4 --- /dev/null +++ b/docs/languages/assembly/riscv.md @@ -0,0 +1,220 @@ +# [Assembly (RISC-V)][book] + +[book]: https://riscv-programming.org/book/riscv-book.html "An Introduction to Assembly Programming with RISC-V" + +Assembly programs are encoded as plain text files and contain four main elements: + +- **Comments**: comments are textual notes that are often used to document information on the code. +- **Labels**: labels are "markers" that represent program locations. +- **Instructions**: Assembly instructions are instructions that are converted by the assembler into machine instructions. +- **Directives**: Assembly directives are commands used to coordinate the assembling process. + +## Assembly Language + +### Labels + +**Labels** are "markers" that represent program _locations_. They can be inserted into an assembly program to "mark" a program position so that it can be referred to by assembly instructions. + +```asm +x: # <-- label definition + .word 10 + +sum: + lw a0, x + addi a0, a0, 10 + ret +``` + +Assemblers usually accept two kinds of labels: **symbolic** and **numeric** labels. + +Symbolic labels are defined by an _identifier_ followed by `:`. +They are stored as symbols in the symbol table and are often used to identify global variables and routines. + +Numeric labels are defined by a _single decimal digit_ followed by `:`. +They are used for local reference and are _not included_ in the symbol table of executable files. They can be redefined repeatedly in the same assembly program. + +References to numeric labels contain a _suffix_ that indicates whether the reference is to a numeric label positioned before (`b` suffix) or after (`f` suffix) the reference. + +### Symbols + +Program **symbols** are "names" that are associated with _numerical values_ and the **symbol table** is a data structure that maps each program symbol to its value. +Labels are automatically converted into program symbols by the assembler and associated with a numerical value that represents its position in the program, which is a memory address. + +It's possible to explicitly define symbols with the `.set` (or `.equ`) directive. + +```asm +.set answer, 42 + +get_answer: + li a0, answer + ret +``` + +### References & Relocations + +Each reference to a label must be replaced by an address during the assembling and linking processes. **Relocation** is the process in which the code and data are assigned new memory addresses so that they do not conflict with addresses of coming from the other linked sources. + +The **relocation table** is a data structure that contains information that describes how the program instructions and data need to be modified to reflect the addresses reassignment. Each object file contains a relocation table and the linker uses their information to adjust the code when performing the relocation process. + +### Global vs Local Symbols + +Symbols are classified as _local_ or _global_ symbols. + +**Local symbols** are only visible on the same file, i.e., the linker does not use them to resolve undefined references on other files. +**Global symbols**, on the other hand, are used by the linker to resolve undefined reference on other files. + +By default, the assembler registers labels as local symbols. The `.globl` directive instructs the assembler to register a label as a global symbol. + +```asm +.globl exit + +exit: + li a0, 0 + li a7, 93 + ecall +``` + +### Program Entry Point + +Every program has an **entry point**: the point from which the CPU must start executing the program. +The entry point is defined by an address, which is the address of the _first_ instruction that must be executed. + +```asm +.globl start # <-- program entry point +``` + +> **Note**: the `start` label **must** be registered as a _global_ symbol for the linker to recognize it as the entry point. + +## Program Sections + +Executable and object files, and assembly programs are usually organized in **sections**. +A section may contain data or instructions, and the contents of each section are mapped to a set of consecutive main memory addresses. + +The following sections are often present on executable files generated for Linux-based systems: + +- `.text`: a section dedicated to store the program instructions. +- `.data`: a section dedicated to store initialized global variables. +- `.bss`: a section dedicated to store uninitialized global variables. +- `.rodata`: a section dedicated to store constants. + +When linking multiple object files, the linker groups information from sections with the same name and places them together into a single section on the executable file. + +To instruct the assembler to add the assembled information into other sections, the programmer (or the compiler) may use the `.section ` directive. + +```asm +.section .text + +update_y: + la t1, y + sw a0, (t1) + ret + +update_x: + la t1, x + sw a0, (t1) + ret + +.section .data + +x: .word 10 +y: .word 12 +``` + +### Assembly Instructions + +Assembly instructions are instructions that are converted by the assembler into machine instructions. +They are usually encoded as a string that contains a **mnemonic** and a sequence of parameters, known as **operands**. + +A **pseudo-instruction** is an assembly instruction that does not have a corresponding machine instruction on the ISA, but can be translated automatically by the assembler into one or more alternative machine instructions to achieve the same effect. + +The operands of assembly instructions may contain: + +- **A register name**: a register name identifies one of the ISA registers. + + + +- **An immediate value**: an immediate value is a numerical constant that is directly encoded into the machine instruction as a sequence of bits. +- **A symbol name**: symbol names identify symbols on the symbol table and are replaced by their respective values during the assembling and linking processes. + Their value are encoded into the machine instruction as a sequence of bits. + +### Immediate Values + +Immediate values are represented on assembly language by a sequence of alphanumeric characters. + +- Sequences started with the `0x` and the `0b` prefixes are interpreted as hexadecimal and binary numbers, respectively. +- Octal numbers are represented by a sequence of numeric digits starting with digit `0`. +- Sequences of numeric digits starting with digits `1` to `9` are interpreted as decimal numbers. +- Alphanumeric characters represented between single quotation marks are converted to numeric values using the ASCII table +- To denote a negative integer, it suffices to add the `-` prefix. + +```asm +li a0, 10 # load value ten into register a0 +2 li a1, 0xa # load value ten into register a1 +3 li a2, 0b1010 # load value ten into register a2 +4 li a3, 012 # load value ten into register a3 +5 li a4, ’0’ # load value forty eight into register a4 +6 li a5, ’a’ # load value ninety seven into register a5 +``` + +### The `.` Directives + +The `.byte`, `.half`, `.word`, and `.dword` directives add one or more values to the active section. Their arguments may be expressed as immediate +values, symbols (which are replaced by their value during the assembling and linking processes) or by arithmetic expressions that combine both. + +The `.string`, `.asciz`, and `.ascii` directives add strings to the active section. The string is encoded as a sequence of bytes. + +| Directive | Arguments | Description | +|:---------:|:----------------:|:------------------------------------------------:| +| `.byte` | `expr [, expr]*` | Emit one or more **8bit** comma separated words | +| `.half` | `expr [, expr]*` | Emit one or more **16bit** comma separated words | +| `.word` | `expr [, expr]*` | Emit one or more **32bit** comma separated words | +| `.dword` | `expr [, expr]*` | Emit one or more **64bit** comma separated words | +| `.string` | `string` | Emit `NULL` terminated string | +| `.asciz` | `string` | Alias for `.string` | +| `.ascii` | `string` | Emit string without `NULL` character | + +### The `.set` and `.equ` directives + +The `.set name, expression` directive adds a symbol to the symbol table. +It takes a name and an expression as arguments, evaluates an expression to a value and store the name and the resulting value into the symbol table. + +The `.equ` directive performs the same task as the .set directive. + +### The `.globl` directive + +The `.globl` directive can be used to turn local symbols into global ones. + +```asm +.global start +.global max_value + +.set max_value, 42 + +start: + li a0, max_value + jal process_temp + ret +``` + +### The `.skip` directive + +The `.bss` section is dedicated for storing _uninitialized_ global variables. +These variables need to be allocated on memory, but they do not need to be initialized by the loader when a program is executed. As a consequence, their initial value do not need to be stored on executable nor object files. + +To allocate variables on the `.bss` section it suffices to declare a label to identify the variable and advance the `.bss` location counter by the amount of bytes the variable require, so further variables are allocated on other address. +The `.skip N` directive is a directive that advances the location counter by `N` units and can be used to allocate space for variables on the `.bss` section. + +```asm +.section .bss +x: .skip 4 +y: .skip 80 +z: .skip 4 +``` + +### The `.align` directive + +Some ISA's require instructions or multi-byte data to be stored on addresses that are multiple of a given number. + +The proper way of ensuring the location counter is aligned is by using the `.align N` directive. +The `.align N` directive checks if the location counter is a _multiple_ of `2^N`, if it is, it has no effect on the program, otherwise, it advances the location counter to the next value that is a multiple of `2^N`. diff --git a/mkdocs.yml b/mkdocs.yml index c45f0e1..43f9342 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -94,6 +94,7 @@ nav: - Swift: languages/swift/swift.md - Assembly: - Intel: languages/assembly/intel.md + - RISC-V: languages/assembly/riscv.md - Python: - Python: languages/python/python.md - Modules: