Memory API — malloc, free, and friends — Operating Systems

What it is#

malloc(size) and free(ptr) are the canonical user-space heap API. malloc returns a pointer to a contiguous block of at least size bytes (or NULL if the request can’t be satisfied) that won’t move until you free or realloc it; free releases that block back to the allocator so a later malloc can reuse it. realloc, calloc, and aligned_alloc are variations on the same interface: resize a block, allocate zeroed memory, and control alignment.

The interface is deceptively simple. Underneath, malloc is a library function — it lives in libc (glibc, musl), or your runtime (Go’s runtime, the JVM’s GC), or your replacement allocator (jemalloc, tcmalloc, mimalloc). The OS knows nothing about malloc; what the OS provides is bulk virtual-memory operations like brk and mmap, and the allocator carves user-sized blocks out of those.

The split matters: a malloc(8) call almost never reaches the kernel. The allocator already has a page or two of cached space; it returns a slice from there in tens of nanoseconds. Only when the cache runs dry does it ask the kernel for more pages, and that’s an expensive system call.

When to use it#

You use malloc whenever you need heap-allocated memory whose lifetime extends past the current function’s stack frame, or whose size isn’t known at compile time, or that’s larger than a comfortable stack allocation (a few KB). In C and C++, that covers most non-trivial data structures.
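A minimal sketch of the first two cases, where the size is only known at run time and the result must outlive the function that builds it. The helper name concat is hypothetical, chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new heap string holding a followed by b.
   The caller owns the result and must free() it. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *out = malloc(la + lb + 1);   /* size known only at run time */
    if (out == NULL)
        return NULL;                   /* allocation can fail */
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);       /* also copies b's terminating NUL */
    return out;                        /* block outlives this stack frame */
}
```

A local char array could not be returned this way: it would be gone as soon as concat returned.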

You don’t use it (or you use it indirectly) in higher-level languages:

Garbage-collected runtimes (Go, Java, Python) call into their own allocator, which may or may not be malloc under the hood. The GC tracks lifetimes so user code rarely calls free.
Region/arena allocators (Rust’s bumpalo, C++ std::pmr, game engines’ frame allocators) batch many small allocations into one chunk and free them all at once. Useful when lifetimes are known to be cohort-shaped.
Stack allocation is usually preferable when the size is small, bounded, and known, and the data doesn’t need to outlive the function. It’s nearly free: allocating is just a stack-pointer adjustment.
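To make the arena idea concrete, here is a rough sketch of a bump allocator in C. The Arena type and the arena_* function names are hypothetical, and the design (one fixed chunk, no growth) is a simplifying assumption rather than how any particular library does it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena: one malloc'd chunk, handed out front to back. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing chunk could not be allocated. */
int arena_init(Arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Allocation is a bounds check plus a pointer bump; there is no per-block free. */
void *arena_alloc(Arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);  /* round up */
    if (start > a->cap || n > a->cap - start)
        return NULL;                                      /* arena exhausted */
    a->used = start + n;
    return a->base + start;
}

/* Frees the whole cohort at once by rewinding the bump pointer. */
void arena_reset(Arena *a) { a->used = 0; }

void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

A frame allocator in a game loop would call arena_reset once per frame; everything allocated during the frame disappears together.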

How it works#

What the allocator sees#

The libc allocator maintains a pool of memory in your process’s address space. It tracks which bytes are handed out and which are free using free lists or bins organised by size class. On a malloc(N):

Round N up to the allocator’s size class (often the next power of 2, or a slab class).
Look in the free list / bin for that class.
If something’s there, unlink it and return the pointer.
If not, ask the kernel for more memory (more on this below), carve out a chunk, return one piece, push the rest onto the appropriate bin.
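Step 1 can be sketched as follows, assuming the coarsest scheme (powers of two) and a 16-byte minimum class. Both choices are assumptions for illustration; real allocators generally use finer-grained classes.

```c
#include <stddef.h>

/* Round a request up to the next power of two, with an assumed 16-byte floor.
   Returns 0 if the rounded size would not fit in size_t. */
size_t size_class(size_t n)
{
    size_t c = 16;
    while (c < n) {
        if (c > (size_t)-1 / 2)
            return 0;       /* doubling would overflow */
        c *= 2;
    }
    return c;
}
```

With this scheme a 33-byte request lands in the 64-byte class, which is why allocators that care about waste add intermediate classes.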

On free(p):

Look up the size of the block at p (typically stored in a header just before p).
Push the block onto the free list for that size class.
Optionally coalesce with neighbours to fight fragmentation. The deep dive is in Free Space Management.
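The malloc and free paths above can be sketched as a toy allocator. This is an illustration under loud assumptions, not how glibc works: toy_malloc and toy_free are hypothetical names, the system malloc stands in for "ask the kernel", blocks are not carved from a larger chunk, and there is no coalescing or thread safety.

```c
#include <stddef.h>
#include <stdlib.h>

#define NCLASSES 8                  /* assumed classes: 16, 32, ..., 2048 bytes */

/* Header stored just before the pointer handed to the caller. */
typedef union Header {
    struct { size_t cls; union Header *next; } h;
    max_align_t align;              /* keeps the payload suitably aligned */
} Header;

static Header *bins[NCLASSES];      /* one free list per size class */

static int class_of(size_t n)
{
    size_t c = 16;
    int i = 0;
    while (c < n && i < NCLASSES) { c *= 2; i++; }
    return i < NCLASSES ? i : -1;
}

void *toy_malloc(size_t n)
{
    int cls = class_of(n);
    if (cls < 0)
        return NULL;                /* large requests are out of scope here */
    Header *b = bins[cls];
    if (b != NULL) {
        bins[cls] = b->h.next;      /* bin hit: unlink and reuse */
    } else {
        /* bin miss: stand-in for getting fresh memory from the kernel */
        b = malloc(sizeof(Header) + ((size_t)16 << cls));
        if (b == NULL)
            return NULL;
        b->h.cls = (size_t)cls;
    }
    return b + 1;                   /* payload starts right after the header */
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    Header *b = (Header *)p - 1;    /* step back to the header */
    b->h.next = bins[b->h.cls];     /* push onto that class's free list */
    bins[b->h.cls] = b;
}
```

Freeing a 20-byte block and then requesting 30 bytes returns the same address, because both requests fall in the 32-byte class.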

How the allocator talks to the kernel#

There are two channels:

brk / sbrk — moves the program break, the boundary between the heap and the unmapped gap above it. Cheap, sequential. The kernel just extends the mapped region; physical pages are allocated lazily on first touch. Good for small, growing allocations.
mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, ...) — asks for a brand-new anonymous mapping somewhere in the virtual address space. Returns a pointer to len bytes of zero-initialised memory. Used for large allocations (glibc switches above ~128 KB by default via M_MMAP_THRESHOLD) and for returning memory to the OS — you can munmap an mmap region but you can’t easily punch a hole in the middle of a brk heap.

process address space (high addresses at the top):

  ┌──────────────┐  ← stack
  │              │
  │  mmap area   │  ← large mallocs land here; munmap returns them
  │              │
  │   (gap)      │
  ├──────────────┤  ← program break (moves with sbrk)
  │  brk heap    │  ← small mallocs carved from here
  ├──────────────┤
  │  bss / data  │
  │  code        │
  └──────────────┘
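The two kernel channels can be exercised directly. This is a sketch assuming Linux with glibc; grow_break, map_anon, and unmap_anon are hypothetical wrapper names, and the -1 and 0 passed to mmap are the conventional file descriptor and offset for an anonymous mapping.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose sbrk() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Move the program break up by n bytes.
   Returns the old break (the start of the new region), or NULL on failure. */
void *grow_break(size_t n)
{
    void *old = sbrk((intptr_t)n);
    return old == (void *)-1 ? NULL : old;
}

/* Ask for a fresh, zero-filled anonymous mapping of len bytes. */
void *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the whole mapping back to the kernel. Returns 0 on success. */
int unmap_anon(void *p, size_t len)
{
    return munmap(p, len);
}
```

Application code rarely calls sbrk itself, since the libc allocator assumes it manages the break; the wrapper is here only to show what the allocator does underneath.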

Why `free` rarely returns memory to the OS#

free(p) doesn’t usually call munmap; the main exception is a large block that was mmap’d on its own, which is typically unmapped as soon as it is freed. For everything else, free pushes the block onto a free list inside the allocator, and the allocator keeps that virtual range mapped. Subsequent malloc calls hit the cache and run fast. The drawback: process RSS rarely shrinks after a burst of allocation, even after you free everything. This is why long-running processes (Redis, Postgres) seem to “leak” memory that they actually freed: the allocator is holding on to it. Mature allocators (jemalloc, tcmalloc) have heuristics to call madvise(MADV_DONTNEED) on idle pages, letting the kernel reclaim physical frames while keeping the virtual mapping.
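The effect of madvise(MADV_DONTNEED) can be observed on a private anonymous mapping. This sketch assumes Linux semantics, where such pages are dropped and refault as zero-filled on the next access; release_and_check is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Dirty an anonymous mapping, tell the kernel its contents are not needed,
   then read it back. Returns 1 if the pages came back zeroed, 0 if not,
   -1 on error. len must be non-zero. */
int release_and_check(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len);                   /* touching forces physical frames */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    /* The virtual range is still mapped; the old contents are gone. */
    int zeroed = (p[0] == 0 && p[len - 1] == 0);
    munmap(p, len);
    return zeroed;
}
```

The address range stays valid throughout, which is what lets an allocator shrink RSS without giving up the virtual addresses it has already organised into bins.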

Variants#

Allocator implementations#

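glibc ptmalloc2 — the default on glibc-based Linux systems. Multiple arenas shared among threads, plus a small per-thread cache (tcache) in recent versions, to reduce lock contention. Reasonable on most workloads, but sometimes shows up in profiles as the bottleneck under heavy multi-threaded allocation.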
jemalloc — originally FreeBSD’s allocator, later developed heavily at Facebook; used by Redis, and Rust’s default until Rust 1.32 switched to the system allocator. Strong on fragmentation, predictable latency.
tcmalloc — Google’s, used by Chrome and Bazel. Per-thread caches, fast on small allocations.
mimalloc — Microsoft’s; aggressive segmentation, very fast in benchmarks.

Swapping the allocator is often a one-line change (LD_PRELOAD=libjemalloc.so). On a fragmentation-bound workload it can recover gigabytes of RSS.

Aligned allocation#

aligned_alloc(64, 1024) returns a 1024-byte block aligned to a 64-byte cache-line boundary. Useful for SIMD, lock-free structures (avoiding false sharing), and DMA buffers.
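A small sketch of cache-line-aligned allocation. The helper name alloc_cache_aligned and the 64-byte line size are assumptions; the helper rounds the size up because C11 required the size passed to aligned_alloc to be a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate at least n bytes on a 64-byte boundary.
   Release the result with free(). */
void *alloc_cache_aligned(size_t n)
{
    const size_t line = 64;                 /* assumed cache-line size */
    if (n > SIZE_MAX - (line - 1))
        return NULL;                        /* rounding up would overflow */
    size_t rounded = (n + line - 1) / line * line;
    if (rounded == 0)
        rounded = line;                     /* avoid a zero-size request */
    return aligned_alloc(line, rounded);
}
```

Checking (uintptr_t)p % 64 == 0 on the result confirms the alignment; the block is released with plain free.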

`realloc`#

realloc(p, new_size) resizes the block. If there’s room in place, it grows the block; otherwise it allocates a new one, memcpy’s the old contents, and frees the old block. The “in-place if possible” optimisation matters for growing vectors.
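A growing vector built on realloc might look like the sketch below. IntVec and intvec_push are hypothetical names, and the doubling policy is one common choice, not the only one. The temporary pointer matters: writing p = realloc(p, n) directly would lose the old block if realloc failed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. Zero-initialise before first use. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} IntVec;

/* Append x, doubling the capacity when full.
   Returns 0 on success, -1 on failure (the vector is left unchanged). */
int intvec_push(IntVec *v, int x)
{
    if (v->len == v->cap) {
        if (v->cap > SIZE_MAX / (2 * sizeof *v->data))
            return -1;                          /* size would overflow */
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;                          /* old block is still valid */
        v->data = tmp;                          /* may or may not have moved */
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Because realloc(NULL, n) behaves like malloc(n), a zero-initialised IntVec needs no separate setup call. Any pointer into the old data must be treated as invalid after a push that grows the vector.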

Trade-offs#

malloc / free (manual) — explicit lifetimes, predictable performance, no GC pause. Cost: every use-after-free, double-free, leak, and buffer overflow is your problem. Decades of CVEs and tools (ASAN, Valgrind, MSan) exist to fight this.

GC / arenas (automatic) — no lifetime bookkeeping, fewer memory-safety bugs. Cost: GC pauses (or arena resets at specific times), higher steady-state memory (you can’t free until the GC runs), and more opaque performance.

Other recurring tensions:

Speed vs. fragmentation — first-fit is faster than best-fit but fragments more. Size classes give predictable speed at the cost of internal fragmentation (a malloc(33) may consume a 48-byte block).
Thread-local caches vs. global pools — per-thread caches eliminate contention but multiply cached memory by the thread count. tcmalloc and jemalloc balance this with periodic flush.
Returning memory to the OS — eagerly calling madvise keeps RSS low but causes future allocations to re-fault. Holding pages saves faults but bloats process footprint. Most allocators tune this with a quiescence delay.
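Internal fragmentation from size classes can be measured on glibc with malloc_usable_size, which reports how many bytes the allocator actually reserved for a block. This is a glibc-specific sketch; slack_bytes is a hypothetical name, and the number it returns varies by allocator, version, and platform.

```c
#include <malloc.h>     /* glibc-specific: malloc_usable_size */
#include <stdlib.h>

/* Returns how many bytes beyond the n requested the allocator reserved,
   or (size_t)-1 if the allocation failed. */
size_t slack_bytes(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return (size_t)-1;
    size_t usable = malloc_usable_size(p);  /* always >= n */
    free(p);
    return usable - n;
}
```

Calling slack_bytes(33) shows the rounding for the example above; summing the slack over a realistic mix of request sizes gives a rough estimate of the memory lost to size-class rounding.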

What does `free(p)` actually do with the pointer?

It reads metadata stored just before p (typically 8-16 bytes of header containing the block size and some flags). It then computes which size-class bin or free list the block belongs to, optionally coalesces with adjacent free blocks, and pushes it onto the bin. Usually none of this touches the OS. If you pass a pointer that wasn’t returned by malloc, the behaviour is undefined: the “header” is garbage, and with glibc you typically get an abort such as “free(): invalid pointer”, or silent heap corruption that surfaces later.

Common pitfalls#

Use-after-free — accessing p after free(p). The memory may still appear valid until reused, then it silently mutates under you. Detected by ASAN with quarantines.
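Double-free — calling free(p) twice. Typically corrupts the allocator’s internal linked lists; glibc catches some cases and aborts with a message such as “double free or corruption” or “free(): double free detected in tcache 2”, but the checks are best-effort and damage may already be done.
Memory leak — losing the only reference to a block before calling free. Long-running processes drift upward in RSS. Valgrind’s --leak-check=full reports blocks that are no longer reachable at exit (“definitely lost”); adding --show-leak-kinds=all also lists still-reachable blocks that were never freed.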
Buffer overrun — writing past the end of a block. Often overwrites the next block’s header, producing a corruption that surfaces in a later, unrelated malloc.
Mismatched malloc / delete — in C++, you must pair malloc with free, new with delete, and new[] with delete[]. They go through different paths and mixing them is undefined behaviour.
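Forgetting that calloc(n, m) zeroes; malloc(n*m) does not, and the multiplication n*m can silently overflow where calloc would fail instead. Memory from malloc holds indeterminate values, and reading it before writing it is a bug (often undefined behaviour); use calloc or memset when you need zeroes.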
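Two defensive habits address several of these pitfalls. The sketch below uses hypothetical names (zeroed_ints, FREE_AND_NULL). It assumes a libc whose calloc rejects a count that would overflow, as glibc and musl do.

```c
#include <stdlib.h>

/* Allocate n zero-initialised ints. calloc zero-fills the block and returns
   NULL rather than a short block if n * sizeof(int) would overflow. */
int *zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Free a pointer variable and null it. A second FREE_AND_NULL on the same
   variable is then a harmless free(NULL), and a stale dereference faults
   instead of quietly reading recycled memory. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```

Nulling only protects the variable you pass in; other copies of the pointer still dangle, which is why tools such as ASAN remain necessary.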

Address Spaces — where the heap actually lives.
Free Space Management — how the allocator tracks blocks.
Address Translation — Base and Bounds — the simplest model of how the kernel enforces the heap’s bounds.
Paging Fundamentals — why allocating one byte gives you a 4 KB physical page.
Swapping — Mechanisms — what happens when the heap is bigger than RAM.