The Fast File System (FFS)
Cylinder groups, locality policy, large-file exception — the design that made UNIX file systems orders of magnitude faster.
What it is#
The Fast File System (FFS), introduced by McKusick, Joy, Leffler, and Fabry at UC Berkeley in 1984, replaced the original UNIX file system with a disk-aware layout that delivered roughly 10x throughput on real workloads. The original UNIX file system scattered inodes and data blocks across the disk with no regard for physical geometry; FFS introduced cylinder groups, locality policies, and a large-block-with-fragments scheme that became the template for almost every subsequent disk-based file system, including ext2/3/4, XFS, and the BSD UFS family.
FFS shipped in 4.2BSD and from there into SunOS, Ultrix, and the broader UNIX ecosystem. Its design ideas — keep related metadata near related data, allocate sequentially when you can, leave room for growth — are still alive in every modern Linux file system, even where the physical “cylinder” no longer exists.
Architecture overview#
The original UNIX file system had:
- Superblock at the start of the disk.
- One big inode table after the superblock.
- All remaining blocks as data, allocated with a simple free list.
Two consequences. First, an inode and its data blocks could end up on opposite ends of the disk — reading a file’s metadata then its data cost a full-stroke seek. Second, the free list had no locality discipline; over time, related blocks scattered.
FFS organised the disk into cylinder groups — each a self-contained “mini file system” of a few cylinders’ worth of tracks:
| superblock copy | inode bitmap | data bitmap | inode table | data blocks | = one cylinder group| superblock copy | inode bitmap | data bitmap | inode table | data blocks | = next cylinder group...Each group has its own inode table, its own free-space bitmaps, and a copy of the superblock (so a corrupted superblock can be recovered from a redundant copy elsewhere). When the file system allocates an inode, it picks a cylinder group; the inode’s data blocks are then allocated in the same group, guaranteeing that the inode and its data are geographically close on disk.
Layout and allocation policy#
The heart of FFS is its allocation heuristics:
- Place a directory in a cylinder group with few directories and many free inodes. Spreads directories across the disk so a top-level directory doesn’t fill its group, while leaving free inodes nearby for the directory’s future contents.
- Place a file’s inode in the same cylinder group as its parent directory. Cousins and siblings cluster.
- Place a file’s data blocks in the same cylinder group as its inode. A
statfollowed by areaddoesn’t seek across the platter.
The large-file exception handles the obvious failure mode: a single huge file would fill its cylinder group and starve everything else in the same group. FFS’s answer:
if file size exceeds threshold: spill to a different cylinder group every N blocksThe threshold and N are tuned so that the seek penalty between groups (one seek per N MB of file) is amortised over a long sequential read, while keeping any one group from being monopolised.
Inside a cylinder group, FFS keeps a rotationally aware free-block list: when allocating the next sequential block of a file, it picks one that will be under the head after the previous block’s transfer finishes — typically a few sectors further on the track, not the immediately-next physical sector. This compensates for the latency of issuing the next I/O command.
Blocks and fragments#
The other big FFS innovation was large blocks with fragments:
- Block size up from
512 bytes(original UNIX) to4 KBor8 KB. Big sequential reads got8xmore bandwidth per inode pointer dereferenced. - But: small files would waste up to one whole block. With a
4 KBblock and average file size of~2 KB, half the disk would be wasted. - Solution: each block can be split into
1 KBfragments for the last portion of a file. A 3 KB file uses three fragments (one 4-KB block split); the unused fragment is available for the tails of other files.
Read and write paths#
Reading /home/alice/notes.txt on cold cache:
- Read root directory’s inode (cylinder group 0, fixed inode number 2 in classic UNIX).
- Read root directory block to find
home’s inode number. - Read
home’s inode — by FFS policy, this is in a group with few directories. - Read
home’s directory block, findalice. - Read
alice’s inode — same group ashome’s inode. - Read
alice’s directory block, findnotes.txt. - Read
notes.txt’s inode — same group. - Read
notes.txt’s data blocks — same group, contiguous.
Because each step’s target is geographically near the previous step’s, the arm moves over a tiny fraction of the disk during a single open() walk. Compared to the original UNIX file system, which would seek across the disk multiple times for the same walk, the speedup is roughly 10x in practice.
Writing a new file follows the same logic in reverse. Allocate the inode in the parent’s cylinder group, write data blocks in the same group, update bitmaps in the group, mark the directory entry. All four updates touch nearby disk regions.
Crash recovery#
FFS predated journaling. Recovery after a crash was the responsibility of fsck, which walked the entire on-disk structure looking for inconsistencies between inodes, bitmaps, and directory entries (covered in Crash Consistency — fsck and Journaling). The redundant per-group superblock copies were the main concession to robustness — a single corrupted superblock didn’t kill the file system.
Later, soft updates (Ganger, Patt; integrated into FreeBSD UFS by McKusick himself) added carefully-ordered writes so that any crash left the file system in a recoverable state. Soft updates kept the FFS layout but removed the need for a long full-disk fsck — recovery became a background pass to clean up leaked resources, while the file system mounted immediately.
The lineage matters: ext2 (1993) cloned FFS’s layout without crash protection; ext3 (2001) added journaling on top of an ext2-shaped layout; ext4 (2008) refined the layout with extents and delayed allocation. Each generation kept the cylinder-group idea, even as the term shifted to “block group” and the underlying disk no longer exposed cylinders.
Operational characteristics#
- Capacity overhead. Cylinder groups duplicated the superblock and reserved space for inode tables per group — a few percent of total capacity. Worth it for the redundancy.
- Free-space reserve. FFS reserved
~10%of each group’s data space and refused to allocate into the reserve unless the user was root. This kept allocations from being forced into pathological “every block is in the last remaining group” patterns. Modern Linux file systems inherit a similar reserve (tune2fs -m). - Mature on real disks. FFS was tuned for the specific cost model of
1980shard disks — slow seeks, fast transfer, rotational latency dominating small-IO. The tuning aged well as disks got bigger (the model stayed the same, just scaled). - Predictable. Workloads behaved consistently because the layout didn’t drift much over time — files allocated together stayed together, in contrast to log-structured file systems where the layout depends on the GC history.
Trade-offs and gotchas#
- Aging. Over years, deletions create holes; a once-contiguous file system fragments. FFS aging is real —
dumpe2fson a 10-year-old ext4 file system will show many small free extents where they used to be large. Defragmentation tools exist; most users never run them. - The cylinder is gone. Modern HDDs don’t expose physical geometry; SSDs don’t have geometry at all. Cylinder groups are a logical abstraction now — groups of “nearby” LBAs by allocation policy. The original rotational-position heuristic is dead weight on SSDs and has been disabled in ext4 for years.
- Small-file overhead. Even with fragments, FFS used
~256bytes of inode per file. A tree of millions of tiny files burned space and inode-table cache. Modern file systems with inline data (storing tiny file contents inside the inode itself) reduce this; FFS didn’t. - No checksums. FFS trusted the disk. Silent corruption is invisible — see Data Integrity — Checksums and Scrubbing for what modern file systems added.
Why was FFS so much faster than the original UNIX file system?
The original UNIX file system had block size 512 bytes and scattered inodes across a single global inode table at one end of the disk. A typical file access required multiple seeks — one to the inode table, one or more to the data blocks (which could be anywhere). McKusick’s measurements showed the original FS getting ~2% of raw disk bandwidth on real workloads. FFS, by clustering and using 4-8 KB blocks, hit ~25-50% of raw bandwidth — an order of magnitude. The two ideas — keep related things close, batch bigger I/Os — together explain the entire speedup.
Related systems#
- File System Implementation — the general structures FFS instantiated.
- Log-Structured File System (LFS) — the alternative design that came out of the same Berkeley group.
- Files and Directories — the abstraction FFS preserved.
- Crash Consistency — fsck and Journaling — what FFS’s update-in-place required.
- Hard Disk Drives — the physical substrate FFS was tuned for.