Free Space Management
Allocator strategies — first-fit, best-fit, worst-fit, next-fit, segregated free lists, buddy, slab — and what fragmentation actually costs.
What it is#
Free space management is the data-structure and policy problem of tracking which bytes (or pages, or slabs) inside a fixed pool are currently in use and which are free, and choosing where to allocate from when a request comes in. It shows up at every layer of the memory stack:
- In
malloc/freeinside libc, managing the user-mode heap carved out ofbrk/mmapregions. - In page-level OS allocators that hand out 4 KB physical frames to kernel and user code (Linux’s buddy allocator).
- In slab allocators like Linux’s SLUB/SLAB, which sit on top of the buddy allocator and cache fixed-size kernel objects.
- In file-system free-block tracking, which is the same problem but on disk.
The core question is always: given a free pool and a request for N bytes, which hole do you give out, and how do you keep the bookkeeping data structure efficient as allocations and frees happen?
When to use it#
You need an allocator any time you have a single pool of resource you carve up for many small requests, and you can’t afford to track each byte individually. The right strategy depends on:
- Request size distribution. Uniform small (kernel objects of fixed type) → slab. Power-of-two pages → buddy. Highly variable (
malloc) → segregated free lists by size class. - Allocation / free pattern. Cohort lifetimes (request handler creates / destroys together) → arena/bump allocator. Long-tail mixed lifetimes → general-purpose allocator with bins.
- Contention. Single-threaded → simple list. Many threads → per-thread caches with periodic flush (tcmalloc, jemalloc).
- Constraints. Real-time? Avoid first-fit’s worst case; use slab. Tiny memory? Avoid power-of-two rounding; use exact-fit lists.
For a single fixed-size resource (a pool of 1024 connection objects), a free list is plenty. For variable sizes, you’re in classic free-space-management territory.
How it works#
The free list#
The simplest data structure: a linked list of free blocks. Each free block stores its size and a next pointer (in the block’s own bytes — free space is unused, so use it as bookkeeping). On malloc(N):
walk the free list, find a block big enough, split it: return N bytes, push the remainder back as a smaller free block.
on free(p): read the size from the header just before p, push the block onto the free list, optionally coalesce with the previous / next blocks if adjacent.free list before malloc(20): [10] → [50] → [30] → null │ │ split into [20] returned + [30] remainder ▼free list after: [10] → [30] → [30] → nullPlacement policies#
When multiple holes fit, which do you pick?
- First-fit — walk from the head, take the first hole big enough. Fast (
O(1)average on a well-organized list). Tends to fragment near the head over time. - Best-fit — scan the whole list, take the smallest hole that fits. Minimises leftover, but
O(n)per allocation and produces many tiny unusable fragments. - Worst-fit — take the largest hole. Counterintuitively keeps a few small holes around, but most studies show it’s the worst overall.
- Next-fit — like first-fit, but resume the walk from where the last allocation left off. Spreads allocations more evenly across the list.
In practice, all four are slow on long free lists. Real allocators use segregated structures.
Segregated free lists#
Group free blocks into bins by size class — bin i holds blocks of size in [2^i, 2^(i+1)) or similar. Allocation becomes bin = bins[size_class(N)]; pop one. Constant-time. Internal fragmentation rises because every request rounds up to a class, but throughput improves dramatically.
glibc ptmalloc2, jemalloc, tcmalloc, and mimalloc all use some flavour of segregated lists.
Buddy allocator#
The standard physical-page allocator in Linux. Memory is managed in power-of-two sized blocks (4 KB, 8 KB, 16 KB, …, up to a MAX_ORDER of typically 4 MB).
- Allocation of
Npages rounds up to the smallest power of two>= N. If a block of that size is free, take it. Otherwise, recursively split a larger block — each split creates two “buddies” of half the size. - Free finds the block’s buddy (by XOR’ing the block’s address with its size). If the buddy is also free, coalesce them into a block of twice the size, then recursively check that larger block’s buddy. Otherwise stop.
buddy split: 16 KB block at addr 0x10000 → 8 KB at 0x10000 + 8 KB at 0x12000 (buddies) → 4 KB at 0x10000 + 4 KB at 0x11000 (buddies, if 8 KB is split)Constant-time alloc/free, fast coalescing, mild internal fragmentation (~25% worst case). The cost is that you can’t allocate a 5 KB block — you get 8 KB.
Slab allocator#
For kernel objects of a fixed type (task_struct, inode, file), the buddy allocator’s per-allocation overhead and rounding are wasteful. The slab allocator caches pre-constructed objects of one type in a pool. Each cache holds:
- A list of partially-full slabs (a slab is one or more pages divided into N object-sized slots).
- A list of empty slabs (to release back to the buddy allocator under memory pressure).
- A list of full slabs.
Allocation: pop a slot from a partial slab. Free: push back. Both O(1) and per-CPU caches give zero contention on the hot path. Linux ships SLUB (Slab Unification) as the default.
Coalescing#
A freed block adjacent to free neighbours should merge with them so future big allocations find contiguous space. Coalescing requires knowing what’s immediately before and after the block — typically a doubly-linked list ordered by address, or boundary tags (a small footer at the end of each block recording its size, so the next block’s header can read the previous block’s size).
Without coalescing, an allocator may have plenty of total free space but no single hole large enough for a big request — pathological external fragmentation.
Variants#
Bump / arena allocators#
alloc(N) is pointer += N; return pointer - N. There’s no free; the whole arena is reset (pointer = start) when the cohort ends. Two-instruction allocation; no per-block overhead; perfect for request-handler memory where everything is freed together.
Used in compiler ASTs, parser trees, per-frame game allocations, Rust’s bumpalo.
Region allocators#
A generalisation of arenas where you can have multiple named regions in flight and reset each independently. Useful in long-running servers that want cohort lifetimes but have several overlapping cohorts.
Free-bitmap allocators#
Each unit of allocation (page, sector, cluster) has one bit in a bitmap: 1 = used, 0 = free. Allocation scans for a run of zeros; free clears the bit. Used by file systems (FAT, ext) and by some embedded allocators. Compact, but search is O(n / 64) even with bitmask tricks.
TLAB / per-thread caches#
Each thread holds a small private free list. Allocations and frees on the hot path never touch shared data. Periodic flush balances inventory. The reason ptmalloc2’s per-thread arenas, jemalloc’s tcaches, and the JVM’s TLABs all look similar.
Trade-offs#
Recurring tensions:
- Internal vs. external fragmentation. Internal is space wasted inside allocated blocks (size class rounding); external is space wasted between allocated blocks (holes too small to use). Allocators choose where to lose: free lists fight internal at the cost of external; buddies do the opposite.
- Latency predictability vs. memory efficiency. Bin-based allocators have constant time but waste memory. Best-fit minimises waste but has bad tail latency under fragmentation.
- Thread-local caches vs. global return. TLAB-style caches eliminate contention but strand memory on idle threads.
- Eager coalescing vs. lazy coalescing. Eager is simpler but burns CPU on every free; lazy saves CPU but produces transient external fragmentation.
What does Linux actually use under the hood?
The kernel’s page allocator is buddy. On top of that, the SLUB slab allocator caches fixed-size kernel objects with per-CPU partial-slab queues. User-space malloc is libc’s job — usually ptmalloc2 (glibc default, per-thread arenas, segregated bins). Many production stacks LD_PRELOAD jemalloc or tcmalloc instead, which behave better under high concurrency and heavy fragmentation. Three layers, three allocators — each tuned for its workload.
Common pitfalls#
- Counting only allocations, not fragmentation, when sizing memory. A process that allocated 4 GB total may consume 6 GB resident after fragmentation. RSS lies if you’re not careful.
- Mixing allocator and language.
mallocandneware different;mmapreturned memory cannot befree’d via libc. Honour the contract for each. - Long-lived small allocations next to short-lived large ones. The small ones become “fragmentation anchors” preventing the surrounding region from being returned to the OS. Periodic restart helps; arena allocation for the cohort helps more.
- Spinning on
mallocfailures. Once the allocator says no, retrying without freeing won’t help — you’ve exhausted address space or DRAM, and the nextmallocwill fail too. - Tuning
M_MMAP_THRESHOLDblindly. Lowering it makes large allocations return memory to the OS sooner but multiplies system-call overhead. - Forgetting that
reallocmay move. Areallocthat grows past the in-place limit invalidates the old pointer. Holding a pointer into the old region is a use-after-free.
Related building blocks#
- Memory API — malloc, free, and friends — the user-mode interface free-space management implements.
- Segmentation — historical mechanism that made free-space management the OS’s problem.
- Address Translation — Base and Bounds — the simpler ancestor with one chunk per process.
- Paging Fundamentals — fixed-size pages eliminated external fragmentation at the physical level.
- Address Spaces — the bigger picture for what’s being allocated.