RISC-V Assembly Language

The open-source instruction set architecture revolutionizing processor design

RISC-V (pronounced "risk-five") is an open-standard instruction set architecture (ISA) that can be implemented without licensing fees and extended for specific workloads. Originally developed at UC Berkeley, RISC-V has rapidly gained industry adoption, with some forecasts projecting growth of roughly 50% annually through 2030, making it essential knowledge for modern embedded systems developers, AI/ML engineers, and computer architecture researchers.

Architecture Overview

RISC-V follows classic reduced instruction set computer (RISC) principles with a clean, modular design optimized for modern compiler techniques. The architecture supports both 32-bit (RV32) and 64-bit (RV64) implementations, with a base instruction set that can be extended through optional modules for specific applications.

Key Design Principles

The RISC-V architecture embodies several fundamental design principles that distinguish it from other processor architectures. The instruction set maintains a consistent 32-bit instruction length for the base ISA, simplifying instruction fetch and decode logic. The architecture employs a load-store design where arithmetic operations only work on registers, with separate instructions for memory access. This approach reduces complexity and improves performance predictability.

The modular extension system allows implementations to include only the features needed for specific applications, reducing silicon area and power consumption. The open-source nature eliminates licensing fees and vendor lock-in, enabling custom processor development and fostering innovation across the industry.

Register Architecture

RISC-V provides 32 general-purpose registers (x0-x31) in the base integer instruction set, each 32 bits wide in RV32I or 64 bits wide in RV64I. The register x0 is hardwired to zero and cannot be modified, providing a constant zero value for operations and simplifying instruction encoding.

Register Set and ABI Names

Register   ABI Name   Description                          Calling Convention
x0         zero       Hard-wired zero                      N/A
x1         ra         Return address                       Caller-saved
x2         sp         Stack pointer                        Callee-saved
x3         gp         Global pointer                       N/A
x4         tp         Thread pointer                       N/A
x5-x7      t0-t2      Temporary registers                  Caller-saved
x8         s0/fp      Saved register/Frame pointer         Callee-saved
x9         s1         Saved register                       Callee-saved
x10-x11    a0-a1      Function arguments/return values     Caller-saved
x12-x17    a2-a7      Function arguments                   Caller-saved
x18-x27    s2-s11     Saved registers                      Callee-saved
x28-x31    t3-t6      Temporary registers                  Caller-saved

Register Usage Conventions

The RISC-V calling convention defines specific roles for registers to ensure compatibility between different compilers and libraries. Argument registers a0-a7 pass the first eight arguments to functions, with additional arguments passed on the stack. Return values use a0-a1, with a0 holding the primary return value and a1 holding the second word of values up to twice the register width, such as a 64-bit return on RV32.

Temporary registers t0-t6 provide scratch space for computations and need not be preserved across function calls. Saved registers s0-s11 must be preserved by called functions, making them suitable for variables that span function calls. The stack pointer sp must always point to valid stack memory, while the frame pointer fp (alias for s0) optionally maintains a fixed reference point within the current stack frame.

Instruction Set Architecture

The RISC-V base instruction set provides a complete foundation for general-purpose computing while maintaining simplicity and regularity. RV32I includes 40 instructions: 37 covering arithmetic, logical, memory access, and control flow operations, plus FENCE, ECALL, and EBREAK. RV64I extends this with 12 additional instructions for 64-bit loads and stores and for 32-bit word operations.

Instruction Formats

RISC-V uses six basic instruction formats that encode different types of operations while maintaining consistent field positions for common elements like register specifiers.

R-Type Instructions (Register-Register)

31    25 24  20 19  15 14  12 11   7 6     0
[funct7] [rs2] [rs1] [funct3] [rd] [opcode]

R-type instructions perform operations between two source registers and store the result in a destination register. Examples include arithmetic operations like ADD, SUB, and logical operations like AND, OR, XOR.

assembly
add x1, x2, x3      # x1 = x2 + x3
sub x4, x5, x6      # x4 = x5 - x6
and x7, x8, x9      # x7 = x8 & x9
or  x10, x11, x12   # x10 = x11 | x12
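
The field layout above can be checked by packing the fields by hand. The following is a minimal Python sketch, not part of any toolchain; `encode_r` is a hypothetical helper name, and the opcode and funct values are the standard ones for ADD and SUB.

```python
# Hypothetical helper: pack the six R-type fields into one 32-bit word.
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

OP = 0x33  # major opcode shared by register-register integer instructions

add_word = encode_r(0x00, 3, 2, 0x0, 1, OP)  # add x1, x2, x3
sub_word = encode_r(0x20, 6, 5, 0x0, 4, OP)  # sub x4, x5, x6

print(f"{add_word:08x}")  # 003100b3
print(f"{sub_word:08x}")  # 40628233
```

ADD and SUB differ only in funct7, which is why one decoder path can serve both.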

I-Type Instructions (Immediate)

31        20 19  15 14  12 11   7 6     0
[immediate] [rs1] [funct3] [rd] [opcode]

I-type instructions operate on a register and a 12-bit immediate value. This format covers immediate arithmetic, load instructions, and some system operations.

assembly
addi x1, x2, 100    # x1 = x2 + 100
lw   x3, 8(x4)      # x3 = memory[x4 + 8]
andi x5, x6, 0xFF   # x5 = x6 & 0xFF

S-Type Instructions (Store)

31    25 24  20 19  15 14  12 11   7 6     0
[imm[11:5]] [rs2] [rs1] [funct3] [imm[4:0]] [opcode]

S-type instructions store register values to memory with a 12-bit offset from a base register.

assembly
sw x1, 12(x2)       # memory[x2 + 12] = x1
sb x3, 0(x4)        # memory[x4] = x3 (byte)
sh x5, 4(x6)        # memory[x6 + 4] = x5 (halfword)
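
The split immediate is the main difference from the I-type layout. As a rough illustration, this Python sketch packs an S-type word; `encode_s` is a hypothetical helper, and 0x23 is the standard STORE opcode.

```python
# Hypothetical helper: pack an S-type store instruction.
def encode_s(imm, rs2, rs1, funct3, opcode=0x23):
    if not -2048 <= imm <= 2047:
        raise ValueError("store offset must fit in 12 signed bits")
    imm &= 0xFFF                      # two's-complement view of the offset
    return (((imm >> 5) << 25)        # imm[11:5] goes to bits 31:25
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | ((imm & 0x1F) << 7)     # imm[4:0] goes to bits 11:7
            | opcode)

print(f"{encode_s(12, 1, 2, 0x2):08x}")  # sw x1, 12(x2)  -> 00112623
print(f"{encode_s(-4, 1, 2, 0x2):08x}")  # sw x1, -4(x2)  -> fe112e23
```

Splitting the immediate keeps rs1 and rs2 in the same bit positions as in R-type instructions.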

B-Type Instructions (Branch)

31    25 24  20 19  15 14  12 11   7 6     0
[imm[12|10:5]] [rs2] [rs1] [funct3] [imm[4:1|11]] [opcode]

B-type instructions perform conditional branches based on register comparisons. The 12 encoded bits form a 13-bit signed PC-relative offset whose lowest bit is always zero, giving a branch range of ±4 KiB.

assembly
beq x1, x2, label   # branch if x1 == x2
bne x3, x4, label   # branch if x3 != x4
blt x5, x6, label   # branch if x5 < x6 (signed)
bge x7, x8, label   # branch if x7 >= x8 (signed)
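
The scrambled immediate in the B-type layout is easier to follow in code. This is a small Python sketch under the assumption that the branch offset is already known as a byte distance; `encode_b` is a hypothetical helper, and 0x63 is the standard BRANCH opcode.

```python
# Hypothetical helper: pack a B-type branch from a byte offset.
def encode_b(offset, rs2, rs1, funct3, opcode=0x63):
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within +/-4 KiB")
    imm = offset & 0x1FFF                     # 13-bit two's complement
    return ((((imm >> 12) & 0x1) << 31)       # imm[12]
            | (((imm >> 5) & 0x3F) << 25)     # imm[10:5]
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | (((imm >> 1) & 0xF) << 8)       # imm[4:1]
            | (((imm >> 11) & 0x1) << 7)      # imm[11]
            | opcode)

print(f"{encode_b(8, 2, 1, 0x0):08x}")   # beq x1, x2, +8  -> 00208463
print(f"{encode_b(-4, 0, 0, 0x0):08x}")  # beq x0, x0, -4  -> fe000ee3
```

Bit 0 of the offset is never stored because branch targets are always at least 2-byte aligned.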

U-Type Instructions (Upper Immediate)

31        12 11   7 6     0
[immediate] [rd] [opcode]

U-type instructions load 20-bit immediate values into the upper bits of a register.

assembly
lui x1, 0x12345     # x1 = 0x12345000
auipc x2, 0x1000    # x2 = PC + 0x1000000

J-Type Instructions (Jump)

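31        12 11   7 6     0
[imm[20|10:1|11|19:12]] [rd] [opcode]

J-type instructions perform unconditional jumps. The 20 encoded bits form a 21-bit signed PC-relative offset whose lowest bit is always zero, giving a jump range of ±1 MiB.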

assembly
jal x1, function    # x1 = PC + 4, PC = PC + offset
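
The J-type immediate is stored out of order in the instruction word. The Python sketch below shows one way to pack it; `encode_j` is a hypothetical helper, and 0x6F is the standard JAL opcode.

```python
# Hypothetical helper: pack a JAL instruction from a byte offset.
def encode_j(offset, rd, opcode=0x6F):
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("jump offset must be even and within +/-1 MiB")
    imm = offset & 0x1FFFFF                   # 21-bit two's complement
    return ((((imm >> 20) & 0x1) << 31)       # imm[20]
            | (((imm >> 1) & 0x3FF) << 21)    # imm[10:1]
            | (((imm >> 11) & 0x1) << 20)     # imm[11]
            | (((imm >> 12) & 0xFF) << 12)    # imm[19:12]
            | (rd << 7) | opcode)

print(f"{encode_j(8, 1):08x}")   # jal x1, +8  -> 008000ef
print(f"{encode_j(-4, 0):08x}")  # jal x0, -4  -> ffdff06f
```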

Base Integer Instructions (RV32I/RV64I)

Arithmetic Instructions

RISC-V provides a comprehensive set of arithmetic instructions for both register-register and register-immediate operations. These instructions form the foundation for mathematical computations and address calculations.

assembly
# Basic arithmetic
add  rd, rs1, rs2   # rd = rs1 + rs2
sub  rd, rs1, rs2   # rd = rs1 - rs2
addi rd, rs1, imm   # rd = rs1 + sign_extend(imm)

# Logical operations
and  rd, rs1, rs2   # rd = rs1 & rs2
or   rd, rs1, rs2   # rd = rs1 | rs2
xor  rd, rs1, rs2   # rd = rs1 ^ rs2
andi rd, rs1, imm   # rd = rs1 & sign_extend(imm)
ori  rd, rs1, imm   # rd = rs1 | sign_extend(imm)
xori rd, rs1, imm   # rd = rs1 ^ sign_extend(imm)

# Shift operations
sll  rd, rs1, rs2   # rd = rs1 << (rs2 & 0x1F) (mask is 0x3F on RV64)
srl  rd, rs1, rs2   # rd = rs1 >> (rs2 & 0x1F) (logical; mask is 0x3F on RV64)
sra  rd, rs1, rs2   # rd = rs1 >> (rs2 & 0x1F) (arithmetic; mask is 0x3F on RV64)
slli rd, rs1, shamt # rd = rs1 << shamt
srli rd, rs1, shamt # rd = rs1 >> shamt (logical)
srai rd, rs1, shamt # rd = rs1 >> shamt (arithmetic)

# Comparison operations
slt  rd, rs1, rs2   # rd = (rs1 < rs2) ? 1 : 0 (signed)
sltu rd, rs1, rs2   # rd = (rs1 < rs2) ? 1 : 0 (unsigned)
slti rd, rs1, imm   # rd = (rs1 < sign_extend(imm)) ? 1 : 0
sltiu rd, rs1, imm  # rd = (rs1 < sign_extend(imm)) ? 1 : 0 (unsigned)
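
The signed and unsigned variants differ only in how the same bit pattern is interpreted. The following Python model is a sketch for RV32 (XLEN = 32); the function names mirror the mnemonics but are otherwise hypothetical.

```python
XLEN = 32
MASK = (1 << XLEN) - 1

def to_signed(value):
    """Interpret an XLEN-bit pattern as a two's-complement integer."""
    value &= MASK
    return value - (1 << XLEN) if value >> (XLEN - 1) else value

def slt(rs1, rs2):
    return int(to_signed(rs1) < to_signed(rs2))

def sltu(rs1, rs2):
    return int((rs1 & MASK) < (rs2 & MASK))

def srl(rs1, rs2):
    return (rs1 & MASK) >> (rs2 & (XLEN - 1))      # zeros shifted in

def sra(rs1, rs2):
    return (to_signed(rs1) >> (rs2 & (XLEN - 1))) & MASK  # sign bit copied in

print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
print(f"{srl(0x80000000, 4):08x}")              # 08000000
print(f"{sra(0x80000000, 4):08x}")              # f8000000
```

The pattern 0xFFFFFFFF is -1 when signed but the largest value when unsigned, which is why `slt` and `sltu` disagree on it.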

Memory Access Instructions

RISC-V uses a load-store architecture where all arithmetic operations work on registers, with separate instructions for memory access. Load instructions transfer data from memory to registers, while store instructions transfer data from registers to memory.

assembly
# Load instructions
lb  rd, offset(rs1) # rd = sign_extend(memory[rs1 + offset][7:0])
lh  rd, offset(rs1) # rd = sign_extend(memory[rs1 + offset][15:0])
lw  rd, offset(rs1) # rd = sign_extend(memory[rs1 + offset][31:0])
lbu rd, offset(rs1) # rd = zero_extend(memory[rs1 + offset][7:0])
lhu rd, offset(rs1) # rd = zero_extend(memory[rs1 + offset][15:0])

# Store instructions
sb rs2, offset(rs1) # memory[rs1 + offset][7:0] = rs2[7:0]
sh rs2, offset(rs1) # memory[rs1 + offset][15:0] = rs2[15:0]
sw rs2, offset(rs1) # memory[rs1 + offset][31:0] = rs2[31:0]

# RV64I additional load/store instructions
ld  rd, offset(rs1) # rd = memory[rs1 + offset][63:0]
lwu rd, offset(rs1) # rd = zero_extend(memory[rs1 + offset][31:0])
sd  rs2, offset(rs1) # memory[rs1 + offset][63:0] = rs2[63:0]

Control Flow Instructions

Control flow instructions manage program execution by implementing branches, jumps, and function calls. RISC-V provides both conditional and unconditional control transfer instructions.

assembly
# Conditional branches
beq  rs1, rs2, offset # if (rs1 == rs2) PC += sign_extend(offset)
bne  rs1, rs2, offset # if (rs1 != rs2) PC += sign_extend(offset)
blt  rs1, rs2, offset # if (rs1 < rs2) PC += sign_extend(offset) (signed)
bge  rs1, rs2, offset # if (rs1 >= rs2) PC += sign_extend(offset) (signed)
bltu rs1, rs2, offset # if (rs1 < rs2) PC += sign_extend(offset) (unsigned)
bgeu rs1, rs2, offset # if (rs1 >= rs2) PC += sign_extend(offset) (unsigned)

# Unconditional jumps
jal  rd, offset       # rd = PC + 4; PC += sign_extend(offset)
jalr rd, rs1, offset  # rd = PC + 4; PC = (rs1 + sign_extend(offset)) & ~1

# Common jump patterns
j    offset           # jal x0, offset (discard return address)
jr   rs1              # jalr x0, rs1, 0 (jump register)
call offset           # auipc x1, hi; jalr x1, lo(x1) (function call; linker may relax to jal)
ret                   # jalr x0, x1, 0 (function return)

Upper Immediate Instructions

Upper immediate instructions load 20-bit constants into the upper portion of registers, enabling the construction of full 32-bit constants when combined with immediate arithmetic instructions.

assembly
# Load upper immediate
lui   rd, immediate   # rd = immediate << 12

# Add upper immediate to PC
auipc rd, immediate   # rd = PC + (immediate << 12)

# Loading 32-bit constants
lui   x1, %hi(0x12345678)  # x1 = 0x12345000
addi  x1, x1, %lo(0x12345678)  # x1 = 0x12345678

# PC-relative addressing (%pcrel_lo names the label of the auipc, not the symbol)
1: auipc x1, %pcrel_hi(symbol)   # x1 = PC + high_20_bits(symbol - PC)
   addi  x1, x1, %pcrel_lo(1b)   # x1 = symbol address
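
Because `addi` sign-extends its 12-bit immediate, the upper part must be rounded up whenever bit 11 of the constant is set. The Python sketch below models that split for 32-bit values; `split_hi_lo` and `rebuild` are hypothetical names, not assembler functions.

```python
def split_hi_lo(value):
    """Return (hi20, lo12) such that (hi20 << 12) + lo12 == value mod 2**32."""
    value &= 0xFFFFFFFF
    hi = ((value + 0x800) >> 12) & 0xFFFFF  # round up when lo will be negative
    lo = value & 0xFFF
    if lo >= 0x800:                         # addi sees this as a negative number
        lo -= 0x1000
    return hi, lo

def rebuild(hi, lo):
    """Model lui followed by addi on a 32-bit register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF

hi, lo = split_hi_lo(0x12345678)  # hi = 0x12345, lo = 0x678
hi2, lo2 = split_hi_lo(0x12345FFF)  # hi2 = 0x12346, lo2 = -1
print(hex(rebuild(hi, lo)), hex(rebuild(hi2, lo2)))  # 0x12345678 0x12345fff
```

The second case shows the carry: the upper part becomes 0x12346 so that adding -1 lands on 0x12345FFF.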

RV64I Extensions

RV64I widens the registers and the base integer operations to 64 bits. It also adds a set of "W" instructions that operate on the lower 32 bits of their operands and sign-extend the 32-bit result to 64 bits, which keeps 32-bit arithmetic efficient on a 64-bit machine.

32-bit Word Instructions on RV64

assembly
# 32-bit word arithmetic (RV64I only)
addw  rd, rs1, rs2    # rd = sign_extend((rs1 + rs2)[31:0])
subw  rd, rs1, rs2    # rd = sign_extend((rs1 - rs2)[31:0])
addiw rd, rs1, imm    # rd = sign_extend((rs1 + imm)[31:0])

# 32-bit word shifts (RV64I only)
sllw  rd, rs1, rs2    # rd = sign_extend((rs1 << rs2[4:0])[31:0])
srlw  rd, rs1, rs2    # rd = sign_extend((rs1[31:0] >> rs2[4:0])[31:0])
sraw  rd, rs1, rs2    # rd = sign_extend((rs1[31:0] >> rs2[4:0])[31:0]) (arithmetic)
slliw rd, rs1, shamt  # rd = sign_extend((rs1 << shamt)[31:0])
srliw rd, rs1, shamt  # rd = sign_extend((rs1[31:0] >> shamt)[31:0])
sraiw rd, rs1, shamt  # rd = sign_extend((rs1[31:0] >> shamt)[31:0]) (arithmetic)
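
The sign-extension step is what separates these instructions from their full-width counterparts. This Python sketch contrasts `add` and `addw` on a 64-bit register model; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sext32(value):
    """Keep the low 32 bits and copy bit 31 into bits 63:32."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def add(rs1, rs2):
    return (rs1 + rs2) & MASK64   # full 64-bit addition

def addw(rs1, rs2):
    return sext32(rs1 + rs2)      # 32-bit addition, result sign-extended

print(f"{add(0x7FFFFFFF, 1):016x}")   # 0000000080000000
print(f"{addw(0x7FFFFFFF, 1):016x}")  # ffffffff80000000
```

The `addw` result matches what a 32-bit signed overflow would produce, so C `int` arithmetic behaves the same on RV32 and RV64.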

System Instructions

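RISC-V includes system instructions for environment calls, breakpoints, and control and status register (CSR) access. ECALL, EBREAK, and FENCE belong to the base ISA, while the CSR instructions are defined by the Zicsr extension and FENCE.I by the Zifencei extension. These instructions enable system software implementation and debugging support.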

assembly
# Environment calls
ecall                 # Environment call (system call)
ebreak                # Environment break (debugger breakpoint)

# Control and Status Register instructions
csrrw  rd, csr, rs1   # rd = CSR; CSR = rs1
csrrs  rd, csr, rs1   # rd = CSR; CSR = CSR | rs1
csrrc  rd, csr, rs1   # rd = CSR; CSR = CSR & ~rs1
csrrwi rd, csr, imm   # rd = CSR; CSR = zero_extend(imm)
csrrsi rd, csr, imm   # rd = CSR; CSR = CSR | zero_extend(imm)
csrrci rd, csr, imm   # rd = CSR; CSR = CSR & ~zero_extend(imm)

# Fence instructions
fence                 # Memory fence
fence.i               # Instruction fence

Pseudoinstructions

RISC-V assembly language includes numerous pseudoinstructions that expand to one or more base instructions, providing convenient mnemonics for common operations.

assembly
# Common pseudoinstructions
nop                   # addi x0, x0, 0
li   rd, immediate    # Load immediate (may expand to lui + addi)
mv   rd, rs           # addi rd, rs, 0
not  rd, rs           # xori rd, rs, -1
neg  rd, rs           # sub rd, x0, rs
seqz rd, rs           # sltiu rd, rs, 1
snez rd, rs           # sltu rd, x0, rs
sltz rd, rs           # slt rd, rs, x0
sgtz rd, rs           # slt rd, x0, rs

# Branch pseudoinstructions
beqz rs, offset       # beq rs, x0, offset
bnez rs, offset       # bne rs, x0, offset
blez rs, offset       # bge x0, rs, offset
bgez rs, offset       # bge rs, x0, offset
bltz rs, offset       # blt rs, x0, offset
bgtz rs, offset       # blt x0, rs, offset

# Jump pseudoinstructions
j    offset           # jal x0, offset
jr   rs               # jalr x0, rs, 0
ret                   # jalr x0, x1, 0
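call offset           # auipc x1, hi; jalr x1, lo(x1) (linker may relax to jal x1, offset)
tail offset           # auipc x6, hi; jalr x0, lo(x6) (tail call; ra is left unchanged)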

Programming Examples

Hello World Program

assembly
.section .data
hello_msg:
    .ascii "Hello, RISC-V World!\n"    # .ascii adds no NUL, so hello_len counts only printed bytes
    .equ hello_len, . - hello_msg

.section .text
.global _start

_start:
    # Write system call
    li a7, 64           # sys_write
    li a0, 1            # stdout
    la a1, hello_msg    # message address
    li a2, hello_len    # message length
    ecall               # system call
    
    # Exit system call
    li a7, 93           # sys_exit
    li a0, 0            # exit status
    ecall               # system call

Fibonacci Sequence

assembly
.section .text
.global fibonacci

# Calculate nth Fibonacci number
# Input: a0 = n
# Output: a0 = fibonacci(n)
fibonacci:
    # Base cases
    li t0, 2
    blt a0, t0, fib_base    # if n < 2, return n
    
    # Save registers (RV64: sd/ld; frame kept 16-byte aligned)
    addi sp, sp, -32
    sd ra, 24(sp)
    sd s0, 16(sp)
    sd s1, 8(sp)
    
    mv s0, a0               # save n
    
    # Calculate fibonacci(n-1)
    addi a0, s0, -1
    call fibonacci
    mv s1, a0               # keep fibonacci(n-1) in a callee-saved register;
                            # a temporary such as t1 would be clobbered by the next call
    
    # Calculate fibonacci(n-2)
    addi a0, s0, -2
    call fibonacci
    add a0, a0, s1          # fibonacci(n-2) + fibonacci(n-1)
    
    # Restore registers
    ld ra, 24(sp)
    ld s0, 16(sp)
    ld s1, 8(sp)
    addi sp, sp, 32
    ret

fib_base:
    ret                     # return n (already in a0)

Array Sum Function

assembly
.section .text
.global array_sum

# Calculate sum of array elements
# Input: a0 = array address, a1 = array length
# Output: a0 = sum
array_sum:
    li t0, 0                # sum = 0
    li t1, 0                # index = 0
    
sum_loop:
    bge t1, a1, sum_done    # if index >= length, exit
    
    slli t2, t1, 2          # t2 = index * 4 (word size)
    add t3, a0, t2          # t3 = array + offset
    lw t4, 0(t3)            # load array[index]
    add t0, t0, t4          # sum += array[index]
    
    addi t1, t1, 1          # index++
    j sum_loop
    
sum_done:
    mv a0, t0               # return sum
    ret

Standard Extensions

M Extension (Integer Multiplication and Division)

The M extension adds integer multiplication and division instructions to the base instruction set.

assembly
# Multiplication instructions
mul    rd, rs1, rs2     # rd = (rs1 * rs2)[XLEN-1:0]
mulh   rd, rs1, rs2     # rd = (rs1 * rs2)[2*XLEN-1:XLEN] (signed × signed)
mulhsu rd, rs1, rs2     # rd = (rs1 * rs2)[2*XLEN-1:XLEN] (signed × unsigned)
mulhu  rd, rs1, rs2     # rd = (rs1 * rs2)[2*XLEN-1:XLEN] (unsigned × unsigned)

# Division instructions
div    rd, rs1, rs2     # rd = rs1 / rs2 (signed)
divu   rd, rs1, rs2     # rd = rs1 / rs2 (unsigned)
rem    rd, rs1, rs2     # rd = rs1 % rs2 (signed)
remu   rd, rs1, rs2     # rd = rs1 % rs2 (unsigned)

# RV64M additional instructions
mulw   rd, rs1, rs2     # rd = sign_extend((rs1 * rs2)[31:0])
divw   rd, rs1, rs2     # rd = sign_extend((rs1 / rs2)[31:0]) (signed)
divuw  rd, rs1, rs2     # rd = sign_extend((rs1 / rs2)[31:0]) (unsigned)
remw   rd, rs1, rs2     # rd = sign_extend((rs1 % rs2)[31:0]) (signed)
remuw  rd, rs1, rs2     # rd = sign_extend((rs1 % rs2)[31:0]) (unsigned)
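
The division instructions never trap. Division by zero and signed overflow produce fixed results instead, and signed division rounds toward zero. The Python model below is a sketch of the signed RV32 behavior; note that Python's own `//` rounds toward negative infinity, so the model avoids it for negative operands.

```python
XLEN = 32
INT_MIN = -(1 << (XLEN - 1))

def div(rs1, rs2):
    """Signed division as defined by the M extension (inputs are Python ints)."""
    if rs2 == 0:
        return -1                      # all bits set
    if rs1 == INT_MIN and rs2 == -1:
        return INT_MIN                 # overflow wraps to the dividend
    q = abs(rs1) // abs(rs2)
    return -q if (rs1 < 0) != (rs2 < 0) else q   # round toward zero

def rem(rs1, rs2):
    """Signed remainder; the sign follows the dividend."""
    if rs2 == 0:
        return rs1
    if rs1 == INT_MIN and rs2 == -1:
        return 0
    return rs1 - rs2 * div(rs1, rs2)

print(div(-7, 2), rem(-7, 2))  # -3 -1
print(div(7, 0), rem(7, 0))    # -1 7
```

Code that needs an error on division by zero has to test the divisor explicitly before dividing.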

A Extension (Atomic Instructions)

The A extension provides atomic memory operations for synchronization and lock-free programming.

assembly
# Load-reserved/Store-conditional
lr.w    rd, (rs1)       # rd = memory[rs1]; reserve memory[rs1]
sc.w    rd, rs2, (rs1)  # if reservation valid: memory[rs1] = rs2, rd = 0
                        # else: rd = nonzero

# Atomic memory operations
amoadd.w  rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] += rs2
amoswap.w rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] = rs2
amoand.w  rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] &= rs2
amoor.w   rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] |= rs2
amoxor.w  rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] ^= rs2
amomax.w  rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] = max(memory[rs1], rs2)
amomin.w  rd, rs2, (rs1)  # rd = memory[rs1]; memory[rs1] = min(memory[rs1], rs2)

# Memory ordering suffixes (appended to the mnemonic, e.g. amoswap.w.aq)
#   .aq    acquire semantics
#   .rl    release semantics
#   .aqrl  acquire-release semantics

F and D Extensions (Floating-Point)

The F extension adds single-precision floating-point support, while the D extension adds double-precision floating-point.

assembly
# Single-precision floating-point (F extension)
fadd.s    fd, fs1, fs2    # fd = fs1 + fs2
fsub.s    fd, fs1, fs2    # fd = fs1 - fs2
fmul.s    fd, fs1, fs2    # fd = fs1 * fs2
fdiv.s    fd, fs1, fs2    # fd = fs1 / fs2
fsqrt.s   fd, fs1         # fd = sqrt(fs1)

# Floating-point load/store
flw       fd, offset(rs1) # fd = memory[rs1 + offset]
fsw       fs2, offset(rs1) # memory[rs1 + offset] = fs2

# Floating-point comparisons
feq.s     rd, fs1, fs2    # rd = (fs1 == fs2) ? 1 : 0
flt.s     rd, fs1, fs2    # rd = (fs1 < fs2) ? 1 : 0
fle.s     rd, fs1, fs2    # rd = (fs1 <= fs2) ? 1 : 0

# Floating-point conversions
fcvt.w.s  rd, fs1         # rd = (int32_t)fs1
fcvt.s.w  fd, rs1         # fd = (float)rs1

Assembly Programming Techniques

Function Calling Convention

RISC-V follows a standard calling convention that ensures compatibility between different compilers and libraries. Understanding this convention is essential for writing assembly functions that can interface with high-level language code.

assembly
# Function prologue
function_name:
    # Save return address and callee-saved registers
    addi sp, sp, -32        # allocate stack frame
    sd ra, 24(sp)           # save return address
    sd s0, 16(sp)           # save frame pointer
    sd s1, 8(sp)            # save callee-saved register
    sd s2, 0(sp)            # save callee-saved register
    addi s0, sp, 32         # set frame pointer
    
    # Function body
    # Arguments in a0-a7, return value in a0-a1
    # Use t0-t6 for temporary values
    # Use s1-s11 for values that must survive function calls
    
    # Function epilogue
    ld ra, 24(sp)           # restore return address
    ld s0, 16(sp)           # restore frame pointer
    ld s1, 8(sp)            # restore callee-saved register
    ld s2, 0(sp)            # restore callee-saved register
    addi sp, sp, 32         # deallocate stack frame
    ret                     # return to caller
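
The standard calling convention keeps the stack pointer 16-byte aligned, so the frame size is the space for saved registers and locals rounded up to a multiple of 16. A small Python sketch of that calculation follows; `frame_size` is a hypothetical helper and assumes 8-byte registers (RV64) by default.

```python
def frame_size(saved_regs, local_bytes=0, xlen_bytes=8):
    """Bytes to subtract from sp so the frame stays 16-byte aligned."""
    raw = saved_regs * xlen_bytes + local_bytes
    return (raw + 15) & ~15

print(frame_size(4))  # 32: ra, s0, s1, s2 as in the prologue above
print(frame_size(3))  # 32: 24 bytes rounded up
print(frame_size(2))  # 16
```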

Position-Independent Code

Position-independent code (PIC) allows programs to run at any memory address, essential for shared libraries and modern operating systems.

assembly
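# Global variable access in PIC
.option pic

access_global:
    # Get PC-relative address of a variable local to this module.
    # %pcrel_lo takes the label of the matching auipc, not the symbol.
1:  auipc t0, %pcrel_hi(global_var)
    addi t0, t0, %pcrel_lo(1b)
    lw a0, 0(t0)            # load global variable
    ret

# Function call in PIC
call_function:
    addi sp, sp, -16        # keep sp 16-byte aligned
    sd ra, 8(sp)            # the call below overwrites ra
    # PC-relative function call
2:  auipc t0, %pcrel_hi(target_function)
    jalr ra, %pcrel_lo(2b)(t0)
    ld ra, 8(sp)            # restore our own return address
    addi sp, sp, 16
    ret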

Optimized Memory Operations

assembly
# Optimized memory copy (word-aligned)
memcpy_words:
    # a0 = destination, a1 = source, a2 = word count
    beqz a2, copy_done      # if count == 0, done
    
copy_loop:
    lw t0, 0(a1)            # load word from source
    sw t0, 0(a0)            # store word to destination
    addi a0, a0, 4          # advance destination
    addi a1, a1, 4          # advance source
    addi a2, a2, -1         # decrement count
    bnez a2, copy_loop      # continue if count > 0
    
copy_done:
    ret

# Optimized string length calculation
strlen:
    # a0 = string address, return length in a0
    mv t0, a0               # save original address
    
strlen_loop:
    lb t1, 0(t0)            # load byte
    beqz t1, strlen_done    # if null terminator, done
    addi t0, t0, 1          # advance pointer
    j strlen_loop
    
strlen_done:
    sub a0, t0, a0          # calculate length
    ret

Debugging and Development Tools

GDB Integration

RISC-V assembly programs can be debugged using GDB with RISC-V support. The debugging process involves compiling with debug information and using GDB commands specific to RISC-V.

bash
# Compile with debug information
riscv64-unknown-elf-gcc -g -o program program.s

# Debug with GDB
riscv64-unknown-elf-gdb program

# GDB commands for RISC-V
(gdb) info registers        # show all registers
(gdb) info registers x1     # show specific register
(gdb) x/10i $pc            # disassemble 10 instructions at PC
(gdb) stepi                # single step instruction
(gdb) break *0x10000       # set breakpoint at address

Simulation and Emulation

RISC-V programs can be tested using various simulators and emulators before running on actual hardware.

bash
# Spike simulator (ISA simulator)
spike pk program

# QEMU system emulation
qemu-system-riscv64 -machine virt -bios none -kernel program

# QEMU user mode emulation
qemu-riscv64 program

Performance Optimization

Instruction Scheduling

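In-order RISC-V cores benefit from careful instruction scheduling to minimize pipeline stalls and maximize throughput. The most common stall on a simple pipeline is the load-use hazard, where an instruction needs a loaded value in the very next cycle.

assembly
# Poor scheduling - back-to-back dependencies
lw t0, 0(a0)
addi t1, t0, 1      # load-use hazard: may stall waiting for t0
sw t1, 4(a0)        # depends on t1 (usually forwarded without a stall)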

# Better scheduling - interleave independent operations
lw t0, 0(a0)
lw t2, 8(a0)        # independent load
addi t1, t0, 1      # t0 now available
addi t3, t2, 1      # independent operation
sw t1, 4(a0)
sw t3, 12(a0)

Loop Optimization

assembly
# Unrolled loop for better performance
unrolled_sum:
    # a0 = array, a1 = count (must be multiple of 4)
    li t0, 0                # sum
    
unroll_loop:
    beqz a1, sum_done
    
    lw t1, 0(a0)            # load 4 elements
    lw t2, 4(a0)
    lw t3, 8(a0)
    lw t4, 12(a0)
    
    add t0, t0, t1          # accumulate
    add t0, t0, t2
    add t0, t0, t3
    add t0, t0, t4
    
    addi a0, a0, 16         # advance by 4 words
    addi a1, a1, -4         # decrement count by 4
    j unroll_loop
    
sum_done:
    mv a0, t0
    ret

System Programming

Exception Handling

RISC-V provides a comprehensive exception handling mechanism for implementing operating systems and handling runtime errors.

assembly
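# Trap vector table for mtvec vectored mode (MODE = 1):
# exceptions enter at BASE, interrupt cause N enters at BASE + 4*N
.section .text.vectors
.option norvc               # keep every entry exactly 4 bytes
.align 2
exception_vector:
    j handle_exception      # BASE + 0: all synchronous exceptions
    j default_interrupt     # cause 1: supervisor software interrupt
    j default_interrupt     # cause 2: reserved
    j handle_msoft          # cause 3: machine software interrupt
    # ... additional vectors (7 = machine timer, 11 = machine external)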

# Exception handler
handle_exception:
    # Save all registers
    addi sp, sp, -256
    sd x1, 8(sp)
    sd x2, 16(sp)
    # ... save all registers x1-x31
    
    # Read exception cause
    csrr t0, mcause
    csrr t1, mepc
    csrr t2, mtval
    
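    # Handle specific exceptions
    li t3, 8                # mcause 8: environment call from U-mode
    beq t0, t3, handle_ecall
    
    # Default handler
    j default_exception
    
handle_ecall:
    # System call handling
    # a7 contains system call number
    # a0-a6 contain arguments
    
    # mepc points at the ecall itself; advance it so mret resumes after it
    addi t1, t1, 4          # t1 still holds mepc from the csrr above
    csrw mepc, t1
    
    # Restore registers and return
    # ... restore all registers
    addi sp, sp, 256
    mret                    # return from exception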
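
The value read from mcause combines an interrupt flag in the top bit with a cause code in the remaining bits. The Python sketch below decodes it for RV64; the function names are hypothetical, and the tables list only a few of the standard machine-mode codes.

```python
XLEN = 64

EXCEPTIONS = {
    2: "illegal instruction",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    11: "environment call from M-mode",
}
INTERRUPTS = {
    3: "machine software",
    7: "machine timer",
    11: "machine external",
}

def decode_mcause(mcause):
    """Split mcause into (is_interrupt, code)."""
    is_interrupt = bool(mcause >> (XLEN - 1))
    code = mcause & ((1 << (XLEN - 1)) - 1)
    return is_interrupt, code

def describe(mcause):
    is_interrupt, code = decode_mcause(mcause)
    table, kind = (INTERRUPTS, "interrupt") if is_interrupt else (EXCEPTIONS, "exception")
    return f"{kind}: {table.get(code, 'code ' + str(code))}"

print(describe(8))              # exception: environment call from U-mode
print(describe((1 << 63) | 7))  # interrupt: machine timer
```

A handler that compares mcause against 8 therefore only catches system calls made from user mode.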

Memory Management

assembly
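# Page table setup (simplified): Sv39 identity map with 1 GiB gigapages
# Assumes page_table is a zero-filled, 4 KiB-aligned root table
setup_page_table:
    # Set up identity mapping for kernel
    la t0, page_table
    
    # Create leaf entries in the root table
    li t1, 0                # physical address of the first gigapage
    li t2, 0xCF             # V | R | W | X | A | D flags
    li t4, 0x40000000       # 1 GiB step
    li t5, 4                # map the first 4 GiB
    
map_loop:
    srli t3, t1, 12         # physical page number
    slli t3, t3, 10         # PPN field starts at PTE bit 10
    or t3, t3, t2           # combine PPN and permissions
    sd t3, 0(t0)            # store page table entry
    addi t0, t0, 8          # next entry
    add t1, t1, t4          # next gigapage
    addi t5, t5, -1
    bnez t5, map_loop
    
    # Enable paging
    la t0, page_table
    srli t0, t0, 12         # convert to PPN
    li t1, 0x8000000000000000  # SATP MODE = 8 (Sv39) in bits 63:60
    or t0, t0, t1
    csrw satp, t0           # set page table base
    sfence.vma              # flush TLB
    ret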
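
In Sv39, a page table entry stores the physical page number starting at bit 10, with the permission flags in the low bits, and satp holds the mode in bits 63:60 plus the root table's page number. The Python sketch below computes both values; `make_pte` and `make_satp` are hypothetical helpers, and the example addresses are arbitrary.

```python
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3
PTE_A, PTE_D = 1 << 6, 1 << 7
SATP_MODE_SV39 = 8

def make_pte(phys_addr, flags):
    """Build an Sv39 page table entry for a 4 KiB-aligned physical address."""
    if phys_addr % 4096:
        raise ValueError("physical address must be 4 KiB aligned")
    return ((phys_addr >> 12) << 10) | flags

def make_satp(root_table_addr, mode=SATP_MODE_SV39, asid=0):
    """Build the satp value: MODE in 63:60, ASID in 59:44, root PPN in 43:0."""
    return (mode << 60) | (asid << 44) | (root_table_addr >> 12)

print(hex(make_pte(0x80000000, PTE_V | PTE_R | PTE_W | PTE_X)))  # 0x2000000f
print(hex(make_satp(0x80200000)))                                # 0x8000000000080200
```

ORing a raw physical address with the flags would put the page number two bits too high, so the shift pair is required.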

Best Practices and Common Patterns

Error Handling

assembly
# Function with error checking
safe_divide:
    # a0 = dividend, a1 = divisor
    # Returns: a0 = result, a1 = error code (0 = success)
    
    beqz a1, divide_by_zero # check for division by zero
    
    div a0, a0, a1          # perform division
    li a1, 0                # success
    ret
    
divide_by_zero:
    li a0, 0                # result = 0
    li a1, 1                # error code = 1
    ret

Data Structure Manipulation

assembly
# Linked list traversal
traverse_list:
    # a0 = list head pointer
    # Returns: a0 = list length
    
    li t0, 0                # count = 0
    
traverse_loop:
    beqz a0, traverse_done  # if pointer is null, done
    
    addi t0, t0, 1          # increment count
    ld a0, 8(a0)            # load next pointer (assuming 8-byte offset)
    j traverse_loop
    
traverse_done:
    mv a0, t0               # return count
    ret