💾

A language tour

The Intimacy of Assembly

No compiler between you and the machine. Just registers, opcodes, and the raw will to compute.


01 – The Machine's Native Tongue

The language that is the machine

Every program you have ever written, in Python, Go, Rust, or JavaScript, eventually runs as a stream of machine instructions. A compiler may have produced them ahead of time, a JIT may produce them at run time, or they may belong to the interpreter executing your code. Assembly is the human-readable spelling of those instructions: not an abstraction over the CPU, but the CPU's own language transliterated into mnemonics. When you write assembly, nothing is hidden from you. Nothing can be.

"There is nothing more honest than assembly. It does exactly what you tell it. The problem is that it also does exactly what you tell it."

hello.asm
section .data
    msg  db  "Hello, World!", 10   ; 10 = newline (ASCII LF)
    len  equ $ - msg           ; $ = current address, so len = 14

section .text
    global _start

_start:
    mov  rax, 1       ; syscall number: write(2)
    mov  rdi, 1       ; arg 1: fd = 1 (stdout)
    mov  rsi, msg     ; arg 2: buf = pointer to message
    mov  rdx, len     ; arg 3: count = 14 bytes
    syscall           ; trap into the kernel

    mov  rax, 60      ; syscall number: exit(2)
    xor  edi, edi     ; arg 1: status = 0 (xor r,r zeroes; 32-bit form also clears upper rdi)
    syscall

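There is no runtime, no standard library, no main(). This program speaks directly to the Linux kernel through the syscall instruction, which switches the CPU into kernel mode and jumps to the kernel's system-call entry point so the OS can perform a privileged action on your behalf. Unlike the older int 0x80 path, it is not a software interrupt. The call number goes in rax, the arguments in rdi, rsi, and rdx, and the kernel's return value comes back in rax.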
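For comparison, the same write can be issued from C through glibc's syscall() wrapper, which loads the call number and arguments into the registers hello.asm fills by hand. This is a sketch assuming Linux on x86-64 with glibc, where SYS_write expands to 1; hello_via_syscall is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* make sure glibc declares syscall() */
#include <unistd.h>          /* syscall() */
#include <sys/syscall.h>     /* SYS_write */

/* Hypothetical helper: issue write(1, msg, 14) directly, bypassing stdio.
 * Returns the kernel's result: bytes written, or -1 on error. */
long hello_via_syscall(void)
{
    static const char msg[] = "Hello, World!\n";
    /* sizeof counts the trailing NUL, so subtract 1: 14 bytes, like len */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}
```

Called from an ordinary C program, this still relies on the C runtime to supply _start and the final exit system call, which is exactly the scaffolding hello.asm does without.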


02 – Registers

The CPU has sixteen pockets

Main memory holds gigabytes. Cache holds megabytes. The general-purpose registers hold sixteen 64-bit values, and reading or writing them costs effectively nothing beyond the instruction itself. The art of assembly is moving the right data into the right register at the right moment. A hot loop that keeps its working values in registers, instead of spilling them to the stack and reloading them, avoids extra memory traffic on every iteration and can run noticeably faster.

registers.asm
; x86-64 general-purpose registers (System V ABI roles):
;   rax              - return value, accumulator
;   rdi rsi rdx      - function arguments 1, 2, 3
;   rcx r8  r9       - function arguments 4, 5, 6
;   rsp              - stack pointer
;   rbp              - frame pointer (when used as one)
;   rbx rbp r12-r15  - callee-saved (you must preserve these)
;   rax rcx rdx rsi rdi r8-r11 - caller-saved (a call may clobber them)

; max(int a, int b): two approaches to the same problem

max_branch:                   ; edi = a, esi = b
    cmp  edi, esi            ; compare a and b (sets flags)
    jge  .a_wins             ; jump if a >= b (signed)
    mov  eax, esi            ; b is larger: return it
    ret
.a_wins:
    mov  eax, edi
    ret

max_branchless:               ; same contract, no branch
    mov   eax, edi           ; assume a is the answer
    cmp   edi, esi           ; set flags from a - b (cmovl needs them)
    cmovl eax, esi           ; conditional move if less: take b if a < b
    ret

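cmovl (conditional move if less) is a data-flow instruction: it selects a value without a jump, so there is nothing for the branch predictor to guess. When the comparison depends on unpredictable data, a branch is mispredicted often, and each miss flushes the pipeline; cmov avoids that cost. When the branch is highly predictable, though, the branching version can be just as fast, so neither form wins everywhere.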
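The two strategies can be sketched in C too. The mask version is a hypothetical illustration of the idea cmovl implements in hardware: evaluate the comparison as data, then select with bitwise operations instead of a jump. Compilers at -O2 may turn the plain ternary into cmov on their own, but that choice is theirs, not something the language guarantees.

```c
/* Branching version: mirrors max_branch (compare, then pick a path). */
int max_branch(int a, int b)
{
    return a >= b ? a : b;
}

/* Branchless version: (a < b) is 0 or 1, so -(a < b) is 0 or -1
 * (all bits set). The mask keeps b when a < b and a otherwise. */
int max_mask(int a, int b)
{
    int mask = -(a < b);               /* 0x00000000 or 0xFFFFFFFF */
    return (a & ~mask) | (b & mask);
}
```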


03 – The Stack

Every function call is a ritual

When you call a function in any language, the CPU executes a precise ceremony: call pushes the return address onto the stack (moving the stack pointer down by 8 bytes) and jumps to the callee. The callee saves any callee-saved registers it intends to modify, does its work, restores them in reverse order, and executes ret, which pops the return address and jumps back. High-level languages make this invisible. In assembly, you perform it yourself, and understand it forever.

factorial.asm
; long factorial(int n)   edi = n, returns in rax
factorial:
    push  rbp                ; ─╮ standard prologue:
    mov   rbp, rsp           ; ─╯ establish a stack frame

    cmp   edi, 1
    jle   .base_case         ; n <= 1 → return 1

    movsxd rdi, edi          ; widen int n to 64 bits (upper rdi is undefined on entry)
    push  rdi                ; save n: the call below will clobber rdi
    sub   rsp, 8             ; pad so rsp is 16-byte aligned at the call (ABI rule)
    dec   edi                ; edi = n - 1
    call  factorial          ; rax = factorial(n - 1)
    add   rsp, 8             ; drop the padding
    pop   rdi                ; restore n from the stack

    imul  rax, rdi           ; rax = n * factorial(n - 1)
    pop   rbp
    ret

.base_case:
    mov   rax, 1             ; return 1
    pop   rbp
    ret

The push rdi before the recursive call and the pop rdi after it are the calling convention made visible: rdi is caller-saved, meaning that if you need its value after a call, you, the caller, are responsible for preserving it.
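For reference, here is a C function with the same contract as factorial.asm, as a sketch assuming the System V x86-64 target where long is 64 bits. An optimizing compiler need not reproduce the recursive call and the push/pop pair; it is free to turn this into a loop.

```c
/* long factorial(int n): n arrives in edi, the result leaves in rax.
 * 21! does not fit in 64 bits, so results are meaningful for n <= 20. */
long factorial(int n)
{
    if (n <= 1)
        return 1;                          /* the .base_case path */
    return (long)n * factorial(n - 1);     /* widen n, recurse, multiply */
}
```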


04 – Flags & Branches

There is no if, only compare and jump

Nearly every conditional statement in every language becomes some form of compare-and-jump (or, as with cmov, compare-and-select). The CPU maintains a FLAGS register: a collection of single-bit indicators set as a side effect of arithmetic and comparison instructions. cmp subtracts two values and discards the result, but the flags remain. A conditional jump then reads those flags and either leaps to its target or falls through to the next instruction.

clamp.asm
; FLAGS register bits (set by cmp, sub, add, and others):
;   ZF - Zero Flag    (result was zero)
;   SF - Sign Flag    (result was negative)
;   CF - Carry Flag   (unsigned overflow / borrow)
;   OF - Overflow     (signed overflow)
;
; jl  = jump if less          (SF ≠ OF)
; jge = jump if ≥             (SF = OF)
; jle = jump if ≤             (ZF = 1 or SF ≠ OF)
; jz  = jump if zero          (ZF = 1)
; jne = jump if not equal     (ZF = 0)

; int clamp(int val, int lo, int hi)   edi=val, esi=lo, edx=hi
clamp:
    mov  eax, edi
    cmp  eax, esi            ; val - lo (sets flags, discards result)
    jge  .check_hi           ; val >= lo? proceed to upper bound check
    mov  eax, esi            ; val < lo: return lo
    ret
.check_hi:
    cmp  eax, edx            ; val - hi
    jle  .done               ; val <= hi: within range, return val
    mov  eax, edx            ; val > hi: return hi
.done:
    ret

cmp a, b is identical to sub a, b except the result is thrown away. The flags it sets are the only thing that matters. This is the foundation of every conditional expression you have ever written.
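To make the flag rules concrete, the sketch below models a 32-bit cmp a, b in C and derives jl, jge, jle, and jz from the resulting bits. It is a hypothetical software model (struct flags, cmp32, and the *_taken predicates are illustrative names), not a description of how the hardware is wired; the subtraction is done on unsigned values so wraparound is well defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four flags listed in clamp.asm, as left behind by cmp a, b. */
struct flags { bool zf, sf, cf, of; };

/* Model of a 32-bit cmp a, b: compute a - b, keep only the flags. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r  = ua - ub;                 /* wraps like the hardware */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;                  /* top bit of the result */
    f.cf = ua < ub;                        /* unsigned borrow */
    /* signed overflow: operands differ in sign and the result's sign
     * differs from a's */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) & 1;
    return f;
}

/* Conditional jumps as predicates on the flags. */
bool jl_taken(struct flags f)  { return f.sf != f.of; }
bool jge_taken(struct flags f) { return f.sf == f.of; }
bool jle_taken(struct flags f) { return f.zf || f.sf != f.of; }
bool jz_taken(struct flags f)  { return f.zf; }
```

cmp32(INT32_MIN, 1) is the instructive case: the subtraction overflows, so SF is 0 but OF is 1, and jl still correctly reports that INT32_MIN is less than 1.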


05 – The Art of the Idiom

The tricks the architecture rewards

Assembly is full of idioms โ€” patterns that exploit the hardware's specific quirks to do more with less. They look cryptic until you understand the CPU they were written for. Then they look inevitable. A skilled assembly programmer's code is full of these, and reading them is like reading someone who knows exactly how the machine breathes.

idioms.asm
; ── Zero a register ─────────────────────────────────────────
xor   eax, eax        ; 2 bytes; writing eax also zeroes the upper half of rax.
                      ; mov rax, 0 can take up to 7 bytes. The CPU recognizes
                      ; xor r,r as a zeroing idiom: no dependency on the old value.
                      ; (Unlike mov, xor also overwrites the flags.)

; ── Test a register for zero ────────────────────────────────
test  eax, eax        ; eax & eax: sets ZF, SF, PF, clears CF/OF. Discards result.
jz    .is_zero        ; test eax, eax is 2 bytes; cmp eax, 0 is 3.

; ── Cheap multiply via LEA ──────────────────────────────────
lea   rax, [rdi + rdi*2]  ; rax = rdi * 3   (no imul instruction)
lea   rax, [rdi + rdi*8]  ; rax = rdi * 9   (shift + add in one instruction)

; ── Branchless absolute value ───────────────────────────────
mov   ecx, eax
sar   ecx, 31         ; arithmetic shift: ecx = 0x00000000 or 0xFFFFFFFF
xor   eax, ecx        ; flip all bits if negative (~x)
sub   eax, ecx        ; subtracting -1 adds 1: ~x + 1 = -x (two's complement)
                      ; result: |eax|, no branch, no mispredict
                      ; (INT_MIN stays 0x80000000: it has no positive twin)

; ── Load effective address for pointer arithmetic ───────────
lea   rcx, [rdi + rcx*8]  ; rcx = rdi + (rcx * 8): address of element rcx
                           ; scale may be 1, 2, 4, or 8, the common element sizes

lea (Load Effective Address) was designed to compute memory addresses, but compilers have adopted it for arithmetic because it can add a base register, a scaled index, and a constant, then write the result to any register, all without reading memory or touching the FLAGS register. Simple forms typically execute in a single cycle. Knowing this is the difference between reading assembly and understanding it.
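The abs and lea idioms translate directly to C. This sketch assumes that >> on a negative int32_t is an arithmetic shift, which the C standard leaves implementation-defined but gcc and clang provide on x86-64; the function names are hypothetical. At -O2 those compilers commonly emit a single lea for multiplications like times3 and times9.

```c
#include <stdint.h>

/* Branchless |x| mirroring sar / xor / sub. Returns uint32_t so that
 * |INT32_MIN| = 2^31 is representable (the asm leaves 0x80000000). */
uint32_t abs_branchless(int32_t x)
{
    uint32_t mask = (uint32_t)(x >> 31);   /* sar: 0 or 0xFFFFFFFF */
    return ((uint32_t)x ^ mask) - mask;    /* xor, then subtracting -1 adds 1 */
}

/* lea-style multiplies: base + index * scale, scale in {1, 2, 4, 8}. */
uint64_t times3(uint64_t x) { return x + x * 2; }   /* lea rax, [rdi + rdi*2] */
uint64_t times9(uint64_t x) { return x + x * 8; }   /* lea rax, [rdi + rdi*8] */

/* Address of element i in an array of 8-byte elements: lea rcx, [rdi + rcx*8] */
uintptr_t elem_addr(uintptr_t base, uintptr_t i) { return base + i * 8; }
```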


06 – The Whole Picture

Why assembly still matters

🔍

Reverse Engineering

Security researchers, malware analysts, and CTF players read assembly every day โ€” it's the only language available when you don't have the source.

🔌

Embedded Systems

Microcontrollers with 2KB of RAM and no OS are still sometimes programmed directly in assembly. When every byte counts, there is no room for a compiler's opinion.

🖥️

Operating Systems

Interrupt handlers, context switches, and the first instructions after power-on are hand-written assembly. Linux still has thousands of lines of it.

⚡

Performance-Critical Code

Game engines, video codecs, and cryptography libraries drop into hand-optimised assembly for inner loops where every nanosecond is a design decision.

⚙️

Compilers

Every native-code compiler ends in a backend that emits machine instructions. Inspecting the assembly produced by gcc -S -O2 or rustc --emit asm is one of the most instructive things a programmer can do.

🧠

Understanding Everything

Once you can read assembly, you understand what your code actually does, not what you imagined it does. That clarity changes how you write in every other language.
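A small experiment for the Compilers point above: the C function below has the same contract as clamp.asm. Compile it with gcc -O2 -S -masm=intel (the last flag selects the Intel syntax used throughout this tour) and read the generated .s file. Depending on compiler version and flags, you may see the two jumps of clamp.asm or a branchless cmov sequence instead. Like clamp.asm, this sketch assumes lo <= hi.

```c
/* Same contract as clamp.asm: lo if val < lo, hi if val > hi, else val. */
int clamp(int val, int lo, int hi)
{
    if (val < lo)
        return lo;      /* the "val < lo" path */
    if (val > hi)
        return hi;      /* the "val > hi" path */
    return val;         /* within range */
}
```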