Segmentation

Generalised base/bounds per code/heap/stack segment, sharing across processes, fragmentation as the recurring cost.

Building Block Intermediate
7 min read
segmentation segments sharing fragmentation x86

What it is

Segmentation generalises base-and-bounds from one (base, bounds) pair per process to multiple pairs — typically one each for code, heap, and stack. Each segment has its own base address in physical memory, its own length, and its own protection bits (read / write / execute). The MMU picks the right pair based on which part of the virtual address space the access targets and translates accordingly.

The high bits of the virtual address (or an explicit segment selector) name the segment; the low bits are the offset within that segment. Translation becomes:

virtual addr → [segment number | offset]
→ base[seg] + offset, check offset < bounds[seg]
→ physical address
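Here is a minimal Python sketch of that translation. It assumes 16-bit virtual addresses whose top 2 bits name the segment, which leaves 14-bit offsets. The table contents and the names `OFFSET_BITS`, `SEG_TABLE`, `split`, and `translate` are hypothetical and exist only for illustration.

```python
OFFSET_BITS = 14  # assumed: 16-bit addresses, top 2 bits = segment number

# Hypothetical per-process segment table: segment number -> (base, bounds)
SEG_TABLE = {
    0: (0x8000, 0x0800),  # code
    1: (0x9000, 0x1000),  # heap
}


class SegmentationFault(Exception):
    pass


def split(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (segment number, offset)."""
    segno = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    return segno, offset


def translate(vaddr: int) -> int:
    segno, offset = split(vaddr)
    if segno not in SEG_TABLE:
        raise SegmentationFault(f"segment {segno} is not mapped")
    base, bounds = SEG_TABLE[segno]
    if offset >= bounds:  # hardware check: offset < bounds[seg]
        raise SegmentationFault(f"offset {offset:#x} exceeds bounds {bounds:#x}")
    return base + offset  # base[seg] + offset


print(hex(translate(0x0123)))  # segment 0, offset 0x123 -> 0x8123
print(hex(translate(0x4010)))  # segment 1, offset 0x10  -> 0x9010
```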

The wins over a single (base, bounds): the stack and heap can grow independently (they’re separate segments), code can be shared read-only across processes (point two processes’ code segment at the same physical region), and unused virtual ranges between segments cost nothing in DRAM.

When to use it

Pure segmentation is mostly historical: the Burroughs B5000, early UNIX on the PDP-11, and the original x86 protected mode of the 80286. Multics is often grouped with these, but it paged its segments. Modern OSes use paging as the primary mechanism. Where segmentation still shows up:

  • x86 segmentation in 32-bit mode is used by Linux and Windows for thread-local storage via the fs / gs segment registers. In 64-bit mode segmentation is mostly disabled (cs, ds, es, ss have base=0, no bounds) except for fs/gs, which retain a base and are used for per-CPU kernel data and TLS.
  • Combined segmentation + paging (Multics, early VAX VMS): segments define the high-level layout, and each segment is then paged. This gives sharing and growth at the segment level plus fine-grained, on-demand allocation at the page level.
  • Capability machines (CHERI, IBM System/38) implement segment-like per-pointer bounds in hardware.

The intuition is worth understanding even if the mechanism is dead: segmentation is the cleanest way to express “different regions have different sizes, growth rates, and permissions.”

How it works

Selecting a segment

There are two common designs:

  1. Top bits of the virtual address. The high 2 or 3 bits index a small segment table (e.g., 00=code, 01=heap, 11=stack). The remaining bits are the offset within the segment. Common in early MMUs because no extra register is needed.
  2. Explicit segment register. Each instruction implicitly references a segment register (cs for code fetches, ds for data loads, ss for stack). The MMU uses that register’s base + bounds. x86 takes this approach.
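The two selection schemes can be sketched side by side. This assumes 16-bit addresses with the 2-bit encoding from item 1 (00=code, 01=heap, 11=stack). It also models x86-style default register choice without segment-override prefixes. The enum and function names are hypothetical.

```python
from enum import Enum


class Access(Enum):
    FETCH = "fetch"  # instruction fetch
    DATA = "data"    # ordinary load or store
    STACK = "stack"  # push/pop and stack-relative accesses


# Design 1: the top bits of the address name the segment (assumed encoding).
TOP_BITS_TO_SEGMENT = {0b00: "code", 0b01: "heap", 0b11: "stack"}


def select_by_top_bits(vaddr: int, addr_bits: int = 16, seg_bits: int = 2) -> tuple[str, int]:
    offset_bits = addr_bits - seg_bits
    segno = vaddr >> offset_bits
    offset = vaddr & ((1 << offset_bits) - 1)
    if segno not in TOP_BITS_TO_SEGMENT:
        raise LookupError(f"segment number {segno:#04b} is unused")
    return TOP_BITS_TO_SEGMENT[segno], offset


# Design 2: the kind of access implies a segment register, x86-style.
# (Real x86 also allows segment-override prefixes; not modelled here.)
DEFAULT_REGISTER = {Access.FETCH: "cs", Access.DATA: "ds", Access.STACK: "ss"}


def select_by_register(access: Access, offset: int) -> tuple[str, int]:
    # The full address is the offset; the segment comes from the register.
    return DEFAULT_REGISTER[access], offset


print(select_by_top_bits(0xC010))                # ('stack', 16)
print(select_by_register(Access.STACK, 0xC010))  # ('ss', 49168)
```

In design 1 a given address always lands in the same segment. In design 2 the same numeric offset can refer to different memory depending on whether it is fetched, loaded, or pushed.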

In both cases the segment table (or its in-CPU register cache) holds, per segment: base, length/limit, and protection bits.

Translation flow

virtual addr: | segno (2-3 bits) | offset |
→ table[segno] → (base, limit, prot)
→ check: offset < limit AND access matches prot
→ physical = base + offset
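A sketch of the full check, including protection bits. It assumes a 2-bit segment number with 14-bit offsets and encodes `prot` as an "rwx"-style string. `SegmentEntry`, `translate`, and the fault classes are hypothetical names.

```python
from dataclasses import dataclass

OFFSET_BITS = 14  # assumed: 2-bit segment number + 14-bit offset


class BoundsFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


@dataclass(frozen=True)
class SegmentEntry:
    base: int
    limit: int
    prot: str  # "rwx"-style string, "-" for a missing permission, e.g. "r-x"


def translate(table: dict[int, SegmentEntry], vaddr: int, access: str) -> int:
    """Translate vaddr for an access of kind 'r' (read), 'w' (write) or 'x' (execute)."""
    if access not in ("r", "w", "x"):
        raise ValueError(f"unknown access kind {access!r}")
    segno = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    entry = table.get(segno)
    if entry is None:
        raise BoundsFault(f"segment {segno} is not mapped")
    if offset >= entry.limit:            # offset < limit
        raise BoundsFault(f"offset {offset:#x} >= limit {entry.limit:#x}")
    if access not in entry.prot:         # access matches prot
        raise ProtectionFault(f"{access!r} access denied by prot={entry.prot}")
    return entry.base + offset


table = {
    0: SegmentEntry(base=0x4000, limit=0x1000, prot="r-x"),  # code
    1: SegmentEntry(base=0xC000, limit=0x1000, prot="rw-"),  # heap
}
print(hex(translate(table, 0x0600, "x")))  # instruction fetch from code -> 0x4600
```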

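Stack grows down

The stack segment grows toward lower addresses, and the MMU has to know this. Instead of checking offset < limit, it treats the offset as counting down from the top of the segment: with a maximum segment size of max = 2^(offset bits), a grow-down segment accepts offset >= (max - limit), and the physical address is base - (max - offset), where base marks the segment's top. A direction bit on the segment table entry tells the hardware which check to apply.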
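A sketch of the grows-up and grows-down checks side by side. It assumes 14-bit offsets, so the maximum segment size `MAX_SEG` is 0x4000. It also follows the common textbook convention that a grow-down segment's base marks its top, one byte past the highest valid address. Names and values are illustrative.

```python
OFFSET_BITS = 14
MAX_SEG = 1 << OFFSET_BITS  # assumed maximum segment size (0x4000)


class BoundsFault(Exception):
    pass


def translate(base: int, limit: int, offset: int, grows_down: bool) -> int:
    if not 0 <= offset < MAX_SEG:
        raise ValueError("offset does not fit in OFFSET_BITS")
    if grows_down:
        # Valid offsets are the top `limit` values: MAX_SEG - limit <= offset.
        if offset < MAX_SEG - limit:
            raise BoundsFault(f"offset {offset:#x} below {MAX_SEG - limit:#x}")
        # Treat the offset as negative from the top of the segment.
        return base - (MAX_SEG - offset)
    if offset >= limit:
        raise BoundsFault(f"offset {offset:#x} >= limit {limit:#x}")
    return base + offset


# Stack whose top is at physical 0x7000, currently 0x800 bytes deep.
print(hex(translate(0x7000, 0x800, 0x3C00, grows_down=True)))  # 0x6c00
```

The same stack only occupies offsets 0x3800 to 0x3FFF. A naive grows-up check on offset 0x100 would still pass (0x100 < 0x800), which is why the direction bit matters.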

Sharing

Two processes can point their code segments at the same physical region with read-only protection. Each process sees the code at its own virtual location, but the bytes physically exist once. This is how segmented systems implemented shared libraries before paging gave the same effect at page granularity. Sharing the heap or stack is rarely sensible because both are writable and process-specific.
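A small sketch of code sharing. Two hypothetical per-process segment tables refer to one read-only code segment, under different segment numbers, while each process keeps a private writable heap. The `Segment` type, `translate`, and all addresses are invented for illustration.

```python
from dataclasses import dataclass


class SegFault(Exception):
    pass


@dataclass(frozen=True)
class Segment:
    base: int
    limit: int
    writable: bool


def translate(seg_table: dict[int, Segment], segno: int, offset: int, write: bool = False) -> int:
    seg = seg_table.get(segno)
    if seg is None or offset >= seg.limit:
        raise SegFault(f"bad access to segment {segno} at offset {offset:#x}")
    if write and not seg.writable:
        raise SegFault(f"write to read-only segment {segno}")
    return seg.base + offset


# One physical copy of the code, mapped read-only into both processes.
shared_code = Segment(base=0x4000, limit=0x1000, writable=False)

# Hypothetical segment tables: A maps the code as segment 0, B as segment 2.
proc_a = {0: shared_code, 1: Segment(base=0x10000, limit=0x2000, writable=True)}
proc_b = {2: shared_code, 0: Segment(base=0x20000, limit=0x2000, writable=True)}

# Different virtual names, same physical bytes.
print(hex(translate(proc_a, 0, 0x123)), hex(translate(proc_b, 2, 0x123)))  # 0x4123 0x4123
```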

A worked example

process P (assumed 24-bit virtual addresses: top 2 bits = segno, low 22 bits = offset, so max segment size = 0x400000)
segments:
code : segno=0, base=0x4000, limit=0x1000, prot=r-x
heap : segno=1, base=0xc000, limit=0x4000, prot=rw-, grows up
stack: segno=2, base=0x20000 (top of segment), limit=0x2000, prot=rw-, grows down
virtual address 0x000600 → segno=0 (code), offset=0x600
→ 0x600 < 0x1000 OK
→ physical = 0x4000 + 0x600 = 0x4600 ✓
virtual address 0x80FFFF → segno=2 (stack), offset=0x00FFFF
→ stack grows down; check offset >= (0x400000 - 0x2000) = 0x3FE000
→ 0xFFFF < 0x3FE000 → bounds fault
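The worked example can be replayed in a short sketch. It assumes 24-bit virtual addresses (2-bit segment number, 22-bit offset), which is the split that makes 0x80FFFF decode to segment 2 with offset 0xFFFF. It also treats the stack's base as the top of the segment. The address 0xBFF000 is an added in-bounds stack access for contrast, and all names are hypothetical.

```python
from dataclasses import dataclass

ADDR_BITS = 24                      # assumed width, consistent with the example
SEG_BITS = 2
OFFSET_BITS = ADDR_BITS - SEG_BITS  # 22-bit offsets
MAX_SEG = 1 << OFFSET_BITS          # 0x400000


class BoundsFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


@dataclass(frozen=True)
class Seg:
    base: int
    limit: int
    prot: str
    grows_down: bool = False


# Process P from the worked example (segment numbers 0, 1, 2).
P = {
    0: Seg(0x4000, 0x1000, "r-x"),
    1: Seg(0xC000, 0x4000, "rw-"),
    2: Seg(0x20000, 0x2000, "rw-", grows_down=True),  # base = top of the stack
}


def translate(table: dict[int, Seg], vaddr: int, access: str = "r") -> int:
    segno, offset = vaddr >> OFFSET_BITS, vaddr & (MAX_SEG - 1)
    seg = table.get(segno)
    if seg is None:
        raise BoundsFault(f"segment {segno} is not mapped")
    if access not in seg.prot.replace("-", ""):
        raise ProtectionFault(f"{access!r} not allowed by {seg.prot}")
    if seg.grows_down:
        if offset < MAX_SEG - seg.limit:
            raise BoundsFault(f"stack offset {offset:#x} < {MAX_SEG - seg.limit:#x}")
        return seg.base - (MAX_SEG - offset)
    if offset >= seg.limit:
        raise BoundsFault(f"offset {offset:#x} >= limit {seg.limit:#x}")
    return seg.base + offset


print(hex(translate(P, 0x000600, "x")))  # 0x4600
print(hex(translate(P, 0xBFF000, "w")))  # 0x1f000 (in-bounds stack access)
try:
    translate(P, 0x80FFFF)
except BoundsFault as err:
    print("bounds fault:", err)
```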

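Variants

Pure segmentation

The textbook design above: variable-size segments, with the OS tracking free physical ranges and placing each segment contiguously. Used by early UNIX on the PDP-11 (whose MMU provided segments of up to 8 KB). The original Intel 8086 had 64 KB segments selected by cs/ds/ss/es, but it is a degenerate case: it had no bounds or protection checks and simply formed the physical address as segment × 16 + offset.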
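The 8086's address formation fits in a few lines. The 16-bit segment value is shifted left by 4 and added to the 16-bit offset, and the result wraps at 1 MB because the chip has 20 address lines. Nothing checks a limit or a permission. The function name is hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 address formation: (segment << 4) + offset, wrapped to 20 bits.

    There is no limit or permission check: every 16-bit offset is legal, and
    segments overlap freely because bases are only 16 bytes apart.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF  # the 8086 has 20 address lines


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350, same byte via a different pair
print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0, wraps past 1 MB
```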

Segmentation with paging

Combine the two: top bits of the virtual address pick a segment, and the segment table entry points to a per-segment page table. You get sharing and growth at segment granularity, plus fine-grained physical allocation. Used by Multics, the VAX under VMS, and (in spirit) the i386's segmentation-over-paging design.
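A generic sketch of the combined scheme, not the Multics or VAX format. It assumes 32-bit virtual addresses with a 2-bit segment number, a 30-bit segment offset, and 4 KB pages. Each segment-table entry holds a length and a page table, modelled as a dict from virtual page number to physical frame number. All names and frame numbers are hypothetical.

```python
PAGE_BITS = 12    # assumed 4 KB pages
OFFSET_BITS = 30  # assumed: 2-bit segment number, 30-bit offset


class Fault(Exception):
    pass


# Segment number -> (segment length in bytes, page table).
# A page table maps virtual page number (within the segment) -> frame number.
SEG_TABLE = {
    0: (0x3000, {0: 0x120, 1: 0x07F, 2: 0x300}),  # code: 3 pages in scattered frames
    1: (0x2000, {0: 0x055}),                       # heap: page 1 not allocated yet
}


def translate(vaddr: int) -> int:
    segno = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if segno not in SEG_TABLE:
        raise Fault(f"segment {segno} not mapped")
    length, page_table = SEG_TABLE[segno]
    if offset >= length:                 # segment-level bounds check
        raise Fault(f"offset {offset:#x} beyond segment length {length:#x}")
    vpn = offset >> PAGE_BITS
    page_offset = offset & ((1 << PAGE_BITS) - 1)
    if vpn not in page_table:            # page-level: could be allocated on demand
        raise Fault(f"page {vpn} of segment {segno} not present")
    return (page_table[vpn] << PAGE_BITS) | page_offset


print(hex(translate(0x0000_1234)))  # code page 1 -> frame 0x7f -> 0x7f234
```

The segment level still decides layout, bounds, and sharing, but the segment's bytes no longer need to be physically contiguous.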

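Segmentation with single segment per kind

x86 has six segment registers, but Linux essentially uses only fs and gs, for TLS and per-CPU data (in 64-bit mode these have a base but no limit check). The other segments have base=0 and a limit covering the whole address space, or no limit check at all in 64-bit mode. The code segment's privilege level still separates user from kernel mode, but for address translation segmentation is reduced to addition-of-zero, leaving paging to do all the real work.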
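Here is a simplified model of 64-bit-mode address formation. The cs/ds/es/ss bases are treated as zero, while fs and gs contribute a per-thread base, so the same fs-relative access reaches a different thread-local block in each thread. The base values are invented for illustration. Limits and descriptor details are omitted.

```python
ZERO_BASE_REGS = ("cs", "ds", "es", "ss")
MASK64 = (1 << 64) - 1


def linear_address(seg_bases: dict[str, int], reg: str, offset: int) -> int:
    """Simplified 64-bit-mode address formation (no limits, no descriptors)."""
    # Long mode ignores the cs/ds/es/ss base; only fs and gs add one.
    base = 0 if reg in ZERO_BASE_REGS else seg_bases[reg]
    return (base + offset) & MASK64


# Hypothetical register state for two threads of one process.
thread_a = {"fs": 0x7F00_0000_1000, "gs": 0}
thread_b = {"fs": 0x7F00_0000_9000, "gs": 0}

# An ordinary data access: segmentation adds zero.
print(hex(linear_address(thread_a, "ds", 0x1234)))  # 0x1234
# The same fs-relative access lands in each thread's own TLS block.
print(hex(linear_address(thread_a, "fs", 0x10)), hex(linear_address(thread_b, "fs", 0x10)))
# 0x7f0000001010 0x7f0000009010
```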

CHERI / capability segmentation

CHERI extends every pointer to a 128-bit capability containing (base, length, permissions, type). Each pointer carries its own segment metadata. The hardware enforces the bounds on every dereference. This is segmentation taken to per-pointer granularity, with applications in safe C / C++ and confidential computing.
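The following is a toy model of the per-pointer idea. It does not reproduce CHERI's real encoding, which compresses the bounds and adds a hidden validity tag, and it does not model CHERI's instruction set. Each capability carries a base, a length, permissions, and a cursor. Arithmetic moves only the cursor, narrowing is the only way to derive new bounds, and every dereference is checked. All names are hypothetical.

```python
from dataclasses import dataclass, replace


class CapabilityFault(Exception):
    pass


@dataclass(frozen=True)
class Capability:
    base: int
    length: int
    perms: frozenset[str]  # e.g. frozenset({"load", "store"})
    cursor: int            # the address this pointer currently refers to

    def __add__(self, delta: int) -> "Capability":
        # Pointer arithmetic moves the cursor; bounds and permissions ride along.
        return replace(self, cursor=self.cursor + delta)

    def restrict(self, base: int, length: int, perms: set[str]) -> "Capability":
        # Capabilities can only be narrowed, never widened.
        inside = self.base <= base and base + length <= self.base + self.length
        if not inside or not set(perms) <= self.perms:
            raise CapabilityFault("cannot widen a capability")
        return Capability(base, length, frozenset(perms), base)

    def check(self, size: int, perm: str) -> None:
        # Enforced on every dereference, not when the pointer is created.
        if perm not in self.perms:
            raise CapabilityFault(f"missing {perm!r} permission")
        if not (self.base <= self.cursor and self.cursor + size <= self.base + self.length):
            raise CapabilityFault(f"access at {self.cursor:#x} out of bounds")


def load(memory: bytearray, cap: Capability, size: int) -> bytes:
    cap.check(size, "load")
    return bytes(memory[cap.cursor:cap.cursor + size])


mem = bytearray(range(256))
buf = Capability(base=0x10, length=16, perms=frozenset({"load", "store"}), cursor=0x10)
print(load(mem, buf + 15, 1))  # last byte of the buffer: b'\x1f'
```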

Trade-offs

Segmentation — variable-size regions match program structure, growth in either direction is natural, sharing of code/data segments is straightforward, protection bits per logical region. Hardware is cheap (table per process, few entries).
Paging — fixed-size pages eliminate external fragmentation, any frame holds any page, sparse virtual spaces are free, demand paging and swap are natural. Hardware is heavier (multi-level walks, TLB, shootdown).

The recurring critique of pure segmentation:

  • External fragmentation, again. Like base-and-bounds, segments are variable-size and contiguous. As processes load and unload, physical memory accumulates holes that are individually too small to hold a new segment but large in aggregate. Compaction works but is expensive: it copies live segments and rewrites their bases.
  • Stack and heap growth still has to find a contiguous physical extension. Even with separate segments, you can't grow a heap in place if the physical memory immediately after it belongs to another segment.
  • Granularity mismatch with the page cache. Page cache, swap, and DMA all want fixed-size units, and variable-size segments don't divide evenly into 4 KB I/O blocks.
  • No demand loading of partial segments. You either map the whole segment or none of it. Paging gives you “load only the touched pages from disk” naturally; pure segmentation can’t.
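A small first-fit simulation makes the external-fragmentation point concrete. It assumes 64 KB of physical memory filled exactly by eight 8 KB segments, and then unloads every other one. The allocator (`first_fit`, `free`) is a hypothetical minimal one that coalesces adjacent holes.

```python
from typing import Optional

Hole = tuple[int, int]  # (start, length) of a free physical range


def first_fit(holes: list[Hole], size: int) -> Optional[int]:
    """Carve `size` bytes from the first hole big enough; return its start."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                del holes[i]
            else:
                holes[i] = (start + size, length - size)
            return start
    return None


def free(holes: list[Hole], start: int, size: int) -> None:
    """Return a range to the free list, merging it with adjacent holes."""
    holes.append((start, size))
    holes.sort()
    merged: list[Hole] = []
    for s, length in holes:
        if merged and merged[-1][0] + merged[-1][1] == s:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((s, length))
    holes[:] = merged


MEM = 0x10000                       # 64 KB of physical memory (assumed)
holes: list[Hole] = [(0, MEM)]
segments = [first_fit(holes, 0x2000) for _ in range(8)]  # fills memory exactly
for start in segments[::2]:         # unload every other segment
    free(holes, start, 0x2000)

total_free = sum(length for _, length in holes)
largest = max(length for _, length in holes)
print(hex(total_free), hex(largest), first_fit(holes, 0x3000))  # 0x8000 0x2000 None
```

Half of memory is free, yet a 12 KB (0x3000) segment cannot be placed. Only compaction, which moves live segments and rewrites their bases, would merge the holes.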

The combined segmentation + paging design (Multics, VAX) fixed several of these but added a level of indirection — and the simplicity of “everything is a page” eventually won.

Why did x86 keep segmentation around in 32-bit mode?

Historical compatibility. The 8086 had only segmentation (no paging), the 286 added protection bits to segmentation, the 386 added paging on top of segmentation. Each successor had to remain backward-compatible. Linux on 32-bit x86 mostly disabled segmentation by setting base=0 and limit=4 GB on every segment, then relied on paging — but the segment registers stayed in the silicon because removing them would break legacy code. 64-bit mode finally simplified by forcing base=0 / no-limit on cs/ds/es/ss, retaining only fs/gs as useful relocation pointers.

Common pitfalls

  • Treating segments as the same shape as pages. They aren’t — segments are variable-size and few in number; pages are fixed and there are millions.
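  • Forgetting stack direction in bounds checks. Applying a normal offset < limit check to a grow-down segment passes accesses that should fault and faults legitimate accesses near the top of the offset range.
  • Assuming x86 segmentation is fully active in 64-bit mode. In long mode the processor ignores the base of cs, ds, es, and ss (treating it as 0) and does not check segment limits; a non-zero base in those descriptors simply has no effect. Only fs and gs retain a usable base.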
  • Conflating segments with shared libraries. A .so file maps as several segments (code, data, bss), but on modern Linux those are paged regions, not hardware segments.
  • Implementing growth as “extend the segment in place.” Without physical compaction, you almost always have to copy the segment to a larger hole. Most real systems avoided this by oversizing segments at creation.
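The following sketch shows what growth costs when there is no room in place. It uses a hypothetical first-fit hole list over 64 KB of physical memory, and all names are invented. `grow` extends in place only when a free hole starts exactly at the segment's end. Otherwise it copies the segment to a larger hole and frees the old range.

```python
from typing import Optional

Hole = tuple[int, int]  # (start, length) of a free physical range


def first_fit(holes: list[Hole], size: int) -> Optional[int]:
    for i, (start, length) in enumerate(holes):
        if length >= size:
            holes[i:i + 1] = [] if length == size else [(start + size, length - size)]
            return start
    return None


def free(holes: list[Hole], start: int, size: int) -> None:
    merged: list[Hole] = []
    for s, length in sorted(holes + [(start, size)]):
        if merged and merged[-1][0] + merged[-1][1] == s:  # coalesce neighbours
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((s, length))
    holes[:] = merged


def grow(mem: bytearray, holes: list[Hole], start: int, size: int, new_size: int) -> tuple[int, bool]:
    """Grow a segment to new_size; return (new start, whether it was copied)."""
    extra = new_size - size
    for i, (s, length) in enumerate(holes):
        if s == start + size and length >= extra:  # free space right after it
            holes[i:i + 1] = [] if length == extra else [(s + extra, length - extra)]
            return start, False
    new_start = first_fit(holes, new_size)         # else: find a bigger hole
    if new_start is None:
        raise MemoryError("no hole large enough; compaction needed")
    mem[new_start:new_start + size] = mem[start:start + size]  # copy the contents
    free(holes, start, size)
    return new_start, True


mem = bytearray(0x10000)            # 64 KB of physical memory (assumed)
holes: list[Hole] = [(0, 0x10000)]
heap = first_fit(holes, 0x2000)     # heap at 0x0
other = first_fit(holes, 0x1000)    # another segment right after it, at 0x2000
mem[heap] = 0xAB                    # some live heap data

heap, copied1 = grow(mem, holes, heap, 0x2000, 0x3000)
print(hex(heap), copied1)           # 0x3000 True  (blocked, so moved and copied)
heap, copied2 = grow(mem, holes, heap, 0x3000, 0x4000)
print(hex(heap), copied2)           # 0x3000 False (space after it was free)
```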