Persistence
I/O devices, hard drives, RAID, the UNIX file system interface, file-system implementation, FFS, journaling, LFS, SSDs.
Persistence is everything below the system call — the disk, the file system, the consistency model that survives a crash. The mental model is layered: device → driver → block layer → file system → user. Each layer has a clear contract and a clear failure mode.
For interviews, know inodes, journaling (because every modern FS does some form of it), and at least the rough shape of how SSDs differ from spinning disks. The Linux ext4 / XFS / btrfs / ZFS landscape is interesting but the fundamentals are what get tested.
Key concepts
- Files are sequences of bytes; the FS imposes structure on top
- Inodes hold metadata + pointers to data blocks; directories are just files with name→inode mappings
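- Crash consistency comes from either repairing after the fact with fsck (slow) or logging updates first with journaling (the modern default); alternatives such as soft updates exist but are less common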
- SSDs require erase-before-write; the FTL hides that behind a block-device interface
- RAID trades drive capacity for redundancy (RAID 0 striping is the exception: it offers no redundancy); parity-based schemes (RAID 5/6) are common in storage arrays
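The parity idea behind RAID 5 can be illustrated with a few lines of Python. This is a minimal sketch, not how any particular array implements it: the function names (`xor_blocks`, `make_parity`, `rebuild_missing`) and the 4-byte "blocks" are invented for illustration, and real arrays also rotate parity across drives, which is omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost block by XOR-ing the survivors with parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe of three tiny data blocks.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(stripe)

# Pretend the drive holding stripe[1] failed.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, one parity block can rebuild any single lost block in the stripe, but not two; that is the gap RAID 6 closes with a second, independent parity.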
Reference template
// Read path for one byte
1. read(fd, buf, 1) → kernel
2. Lookup VFS / file entry → resolves to inode
3. Compute block address → inode walk
4. Block in page cache? → return
5. Block on disk? → schedule I/O, sleep
6. Disk interrupt → wake the process, finish copy
7. Update access time → journaled if metadata journaling on
Adapt to your problem; the structure is the load-bearing part.
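Step 3 of the read path, the inode walk, can be sketched as a mapping from byte offset to pointer level. The constants below are assumptions chosen to resemble a classic multi-level inode (4 KiB blocks, 4-byte pointers, 12 direct pointers); extent-based file systems locate blocks differently, and the name `locate` is hypothetical.

```python
BLOCK_SIZE = 4096                        # assumed block size in bytes
PTR_SIZE = 4                             # assumed on-disk pointer size in bytes
NUM_DIRECT = 12                          # assumed direct pointers per inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # pointers held by one indirect block


def locate(offset):
    """Map a byte offset to (pointer kind, index path, offset within block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    logical, within = divmod(offset, BLOCK_SIZE)

    # Direct pointers live in the inode itself: no extra disk read.
    if logical < NUM_DIRECT:
        return ("direct", (logical,), within)

    # Single indirect: one extra block read to fetch the pointer.
    logical -= NUM_DIRECT
    if logical < PTRS_PER_BLOCK:
        return ("single-indirect", (logical,), within)

    # Double indirect: two extra block reads (outer index, inner index).
    logical -= PTRS_PER_BLOCK
    if logical < PTRS_PER_BLOCK ** 2:
        return ("double-indirect", divmod(logical, PTRS_PER_BLOCK), within)

    raise ValueError("offset beyond the double-indirect range of this sketch")
```

Under these assumptions the first 48 KiB of a file is reachable with no extra reads, which is why small files are cheap; each level of indirection adds one block read before the data block itself can be fetched.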
Common pitfalls
- Treating write() as durable — it's not until fsync() returns
- Forgetting that the page cache is between you and the disk on every read and write
- Trusting RAID as backup — RAID handles drive failure, not user error or corruption
- Ignoring SSD wear: flash cells survive a limited number of program/erase cycles, so write-heavy patterns degrade endurance even with wear leveling
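The first pitfall is easiest to see in code. The following is a minimal sketch of a durable file replacement on a POSIX system, assuming the common write-temp-file, fsync, rename, fsync-directory pattern; the helper name `durable_write` is hypothetical, and opening a directory to fsync it is not expected to work on Windows.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` and flush it to stable storage (POSIX sketch)."""
    directory = os.path.dirname(os.path.abspath(path))

    # Write to a temp file in the same directory so the rename stays on one FS.
    # A fuller version would also unlink the temp file if a step below fails.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Until this returns, the bytes may exist only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Atomically swap the new file into place.
    os.replace(tmp_path, path)

    # The rename is a directory update; fsync the directory to persist it.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `durable_write("state.json", b"{}")` costs at least two device flushes, which is the price of the guarantee; batching several updates per fsync is the usual way to amortize it.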
Related topics
- I/O Devices and Drivers
The canonical device protocol, interrupts vs polling, DMA, memory-mapped I/O, and the driver as an OS abstraction.
Building Block Foundational
- Hard Disk Drives
Geometry, seek + rotational latency + transfer time, disk scheduling (SSTF, SCAN, C-SCAN), and the I/O-cost math.
Building Block Foundational
- RAID — Striping, Mirroring, Parity
RAID 0/1/4/5/6 — capacity, performance, reliability trade-offs and where parity-based schemes break down.
Building Block Intermediate
- Files and Directories
The UNIX file abstraction, descriptors, inodes, hard vs symbolic links, fsync, mount, and the permission-bit model.
Building Block Foundational
- File System Implementation
Inode + data-block layout, free-space tracking, directory structures, access paths, caching, and the I/O cost per system call.
Building Block Intermediate
- The Fast File System (FFS)
Cylinder groups, locality policy, large-file exception — the design that made UNIX file systems orders of magnitude faster.
System Intermediate
- Crash Consistency — fsck and Journaling
Write-ahead logging, metadata-only vs full journaling, ordered mode, soft updates, and why fsck stopped scaling.
Building Block Intermediate
- Log-Structured File System (LFS)
Sequential writes, segments, inode maps, garbage collection — the design that influenced flash file systems and modern databases.
System Advanced
- Flash SSDs and the Flash Translation Layer
Cells, banks, planes; erase-before-write; the FTL log-structured mapping; garbage collection; wear leveling; trim.
Building Block Advanced
- Data Integrity — Checksums and Scrubbing
Latent sector errors, silent corruption, checksums, mismatched-write protection, periodic scrubbing — bit-rot defense.
Building Block Intermediate
- GitLab 2017 — The Database Outage
How a 'wrong terminal' rm on a primary led to ~6 hours of data loss; backups that didn't work; the public postmortem.
Postmortem Foundational