OS Design Goals
Abstractions, low overhead, protection and isolation, reliability. The criteria every kernel decision is weighed against.
Summary
Every kernel decision — from how a system call is dispatched to which page-replacement policy ships — is weighed against four design goals: build clean abstractions, keep overhead low, enforce protection and isolation, and stay reliable under load and failure. Those four pull in different directions. A richer abstraction means more code paths to defend; tighter isolation means more boundary crossings to pay for; faster paths mean fewer checks. There is no single right answer, only a coherent set of trade-offs.
The goals are also the rubric an interviewer uses when probing your design judgement. “Why is the file descriptor an int?” is a question about abstraction simplicity. “Why does Linux use the page cache?” is about overhead. “Why does each process get its own address space?” is about isolation. Naming the goal first turns each answer into a principle rather than a memorised fact.
Why it matters
The reason OS design feels mature today is that decades of work have produced abstractions that hide enormous complexity behind a small number of shapes — files, processes, sockets, page tables. A new kernel author who ignores those shapes spends years rediscovering why they exist. Conversely, a kernel that bolts on features without weighing them against the goals accumulates the kind of internal mess that eventually requires a rewrite — see the long arc of Windows 9x or classic Mac OS.
The cost discipline matters because OS code runs on the critical path of every program. A pointless 200-cycle check inside read is a 200-cycle tax on every database query and every web request on the planet. That is why kernel engineers obsess over things that look trivial — cache line layout, branch prediction friendliness, the size of a structure — and why “premature optimization” rules don’t quite apply at this layer.
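To put a number on that tax, assume, purely for illustration, a 3 GHz core and a process issuing one million read calls per second:

$$
\frac{200\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} \approx 66.7\ \text{ns per call}
$$

$$
10^{6}\ \text{calls/s} \times 66.7\ \text{ns/call} \approx 0.067\ \text{s of CPU per second} \approx 6.7\%\ \text{of one core}
$$

Under those assumed figures the check alone burns roughly 6.7% of a core before any useful work happens; the real cost scales with clock speed and call rate.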
How it works
Abstractions
The kernel’s job is to present a small set of stable shapes that compose well: a process, a file descriptor, an address space, a thread, a signal, a socket. Each abstraction hides hardware variety — a file descriptor reads from a disk, a pipe, a socket, a device, a memory-mapped region, all with the same syscall surface. Good abstractions are orthogonal (composable without surprise), uniform (the same operation behaves the same way across instances), and opaque (the implementation can change without breaking callers). UNIX got mileage out of “everything is a file” precisely because that one abstraction subsumed dozens of device interfaces.
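A minimal sketch of what a uniform syscall surface buys: the hypothetical helper copy_fd below moves bytes between any two descriptors using only read(2) and write(2), and neither it nor its caller needs to know whether a descriptor names a regular file, a pipe, a socket, or a device. Error handling is deliberately minimal.

```c
#include <unistd.h>

/* Copy everything readable from in_fd to out_fd using only read(2) and
 * write(2). Nothing here asks what kind of object either descriptor names:
 * a regular file, a pipe, a socket, or a character device all look alike.
 * Returns the number of bytes copied, or -1 on the first error
 * (EINTR is not retried, to keep the sketch short). */
ssize_t copy_fd(int in_fd, int out_fd) {
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;              /* end of file, or the writer closed */
        if (n < 0)
            return -1;
        for (ssize_t off = 0; off < n; ) {
            /* write(2) may accept fewer bytes than asked; keep going. */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

Called as copy_fd(STDIN_FILENO, STDOUT_FILENO), the same function behaves like a bare-bones cat(1) whatever stdin and stdout happen to be connected to.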
Low overhead
The cost of an abstraction is measured in cycles per use. A system call costs hundreds of cycles even on the fast path; a context switch costs thousands; a minor page fault costs on the order of a thousand cycles or more, and a major fault that has to read from storage costs orders of magnitude more. The kernel pays these costs whether or not the user code does anything interesting, so it claws back performance with caching (page cache, dcache, inode cache, TLB), batching (writev, io_uring submissions), and avoiding work altogether (vDSO for gettimeofday, copy-on-write for fork). The recurring pattern is “make the common case nearly free, charge the rare case.”
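Batching is easy to see from user space. A minimal sketch, assuming a record made of three separate buffers (the function name send_record is hypothetical): writev(2) hands all three to the kernel in one system call, where three write(2) calls would cross the user-kernel boundary three times.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send header, body, and trailer with a single writev(2) call:
 * one user/kernel crossing instead of three separate write(2) calls. */
ssize_t send_record(int fd, const char *header, const char *body,
                    const char *trailer) {
    struct iovec iov[3] = {
        { .iov_base = (void *)header,  .iov_len = strlen(header)  },
        { .iov_base = (void *)body,    .iov_len = strlen(body)    },
        { .iov_base = (void *)trailer, .iov_len = strlen(trailer) },
    };
    /* Like write(2), writev(2) may return a short count (for example on a
     * socket or a nearly full pipe); production code would loop. */
    return writev(fd, iov, 3);
}
```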
Protection and isolation
User code must not be able to corrupt the kernel, other processes, or hardware it shouldn’t touch. The mechanisms are hardware-supplied — privilege levels (ring 0 vs ring 3 on x86, EL0/EL1 on ARM), an MMU that enforces per-process address spaces, an IOMMU for device DMA — and the kernel uses them to enforce policy. Isolation goes both ways: the kernel must defend itself from user input (every syscall argument is untrusted), and processes must be defended from each other (signals, file permissions, namespaces, capabilities).
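To make “every syscall argument is untrusted” concrete, here is a minimal sketch of the first check a kernel applies to a user-supplied buffer. USER_SPACE_END and user_range_ok are hypothetical names, loosely modelled on the kind of range check Linux performs in access_ok(); the boundary value assumes x86-64 with 4-level paging, where the canonical user half of the address space ends at 2^47.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: end of the user half of the address space on
 * x86-64 with 4-level paging (2^47). Illustrative only. */
#define USER_SPACE_END 0x0000800000000000ULL

/* Hypothetical helper: accept [addr, addr + len) only if the whole range
 * lies in user space. Written so that a huge len cannot wrap around
 * 2^64 and sneak past the check. */
bool user_range_ok(uint64_t addr, uint64_t len) {
    if (addr >= USER_SPACE_END)
        return false;                  /* starts in kernel space */
    /* addr < USER_SPACE_END here, so this subtraction cannot underflow. */
    return len <= USER_SPACE_END - addr;
}
```

The naive form addr + len <= USER_SPACE_END is the classic bug: with addr = 0x1000 and len = 2^64 - 0x800 the sum wraps to 0x800 and the check passes. A range check is also only the first line of defence; the copy itself must cope with unmapped pages, which is why kernels move user data through fault-tolerant copy routines rather than plain dereferences.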
Reliability
A general-purpose OS runs for months or years without rebooting. It must tolerate misbehaving user code, flaky hardware, partial writes, network outages, and resource exhaustion. The techniques are layered: defensive coding in every subsystem, journaling for crash consistency, watchdog timers for runaway interrupt handlers, OOM killers for memory exhaustion, panic-and-recover paths for unrecoverable errors. The bar that distinguishes a research kernel from a production one is mostly reliability under adversarial conditions.
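The same defensive posture applies to code that talks to the kernel. write(2) may legally write fewer bytes than requested, or be interrupted by a signal before writing anything, and a caller that ignores either case produces exactly the partial writes described above. A minimal sketch of the usual retry loop (write_all is a conventional but hypothetical name, not a standard library function):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes from buf to fd.
 * Returns 0 on success, or -1 with errno set on a real error. */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted before writing: retry */
            return -1;                 /* EBADF, ENOSPC, EIO, ... */
        }
        p += n;                        /* short write: skip what was written */
        len -= (size_t)n;
    }
    return 0;
}
```

The loop only guarantees the bytes reached the kernel; surviving a crash additionally needs fsync(2) and, inside the file system, journaling.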
Variants and trade-offs
The four goals create predictable conflicts:
- Abstraction vs. overhead. Every layer of indirection costs cycles. The page cache adds a memory copy on every buffered read; io_uring exists largely because per-operation syscall overhead, along with that copy for buffered I/O, became measurable on fast NVMe.
- Isolation vs. overhead. Crossing the user-kernel boundary is expensive; that is why the vDSO exists, why io_uring batches submissions, and why microkernels, which move drivers and services into user-space servers reached through IPC, have historically paid a performance penalty relative to monolithic kernels.
- Reliability vs. abstraction. A fancier interface has more code paths to test, more states to corrupt, more invariants to maintain. The simplest interface that does the job is usually the most reliable.
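The isolation-versus-overhead trade is measurable from user space. A rough sketch, assuming Linux with glibc on x86-64 or arm64, where clock_gettime(CLOCK_MONOTONIC) is normally served by the vDSO without entering the kernel, while syscall(SYS_clock_gettime, ...) forces a real kernel entry for the same work. The function names are hypothetical, and the numbers depend on the CPU, kernel version, and mitigations in force, so no expected output is given.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static double elapsed_ns(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

/* Average ns per call of clock_gettime through libc (usually the vDSO).
 * iters must be positive. */
double avg_ns_vdso(int iters) {
    struct timespec start, end, ts;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end) / iters;
}

/* The same work, forced through a real system call every time. */
double avg_ns_syscall(int iters) {
    struct timespec start, end, ts;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++)
        syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end) / iters;
}
```

On typical hardware the syscall version is noticeably slower per call. If the two numbers come out close, the vDSO may be falling back to the system call, for example under a hypervisor whose clock source the vDSO cannot read directly.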
Why 'everything is a file' won and 'everything is an object' didn't
UNIX’s file model gave you read / write / open / close / ioctl — five verbs that worked across every device. Object-oriented kernels (BeOS, the original NeXTSTEP, research projects like Spring) tried to expose richer interfaces per device class. The lesson the industry took is that five verbs everyone knows beats fifty verbs nobody remembers. Linux still occasionally drifts back toward object-thinking (netlink, ioctl explosion) and pays for it in API churn and security CVEs.
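The verbs really are shared across device classes. A small sketch, assuming a Linux or other UNIX-like system with the standard /dev/zero and /dev/null character devices: the hypothetical helper read_path uses exactly the open, read, and close calls it would use on a regular file, and only the driver behind the path decides what comes back.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, read up to n bytes into buf, then close it.
 * Returns the byte count from read(2), or -1 on error. */
ssize_t read_path(const char *path, char *buf, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

read_path("/dev/zero", buf, 8) fills buf with zero bytes, while read_path("/dev/null", buf, 8) returns 0 (end of file); the calling code is identical.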
A second axis is generality vs. specialisation. A general-purpose kernel like Linux tries to be reasonable across web browsers, databases, video games, and embedded controllers; a real-time kernel like VxWorks gives up that generality and gets bounded latency in return. The further you specialise (unikernels, hypervisors-as-OS, OS-bypass I/O via DPDK for networking and SPDK for storage), the more performance you can claw back, at the cost of running only the workloads you designed for.
When this is asked in interviews
Rarely as a direct question; almost always as an implicit framing the interviewer wants you to use. When you’re asked “design a system call for X” or “should the kernel cache Y,” the strong answers walk through the four goals out loud — “the abstraction here is fd-shaped, overhead is dominated by the copy, isolation needs the bounds check, the failure mode is a partial write” — instead of jumping to an implementation.
The follow-ups divide by seniority:
- “Why is fd an integer and not a pointer?” Tests whether you can articulate the isolation argument. Foundational.
- “What’s the overhead of a system call today vs. 20 years ago?” Tests whether you know about SYSCALL/SYSRET, kernel page-table isolation, and Meltdown/Spectre mitigations. Mid-level.
- “Where would you draw the user/kernel line for io_uring?” Tests judgement about which goals to prioritise when they conflict. Senior.
- “Design an OS for a 1024-core machine — which goals shift?” Tests whether you can re-weight the rubric under new hardware. Staff and above.
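The isolation argument behind the first question can be sketched in a few lines. Everything here is hypothetical and drastically simplified (a fixed-size table, no locking, no reference counts): the kernel keeps a per-process array of pointers to its own objects, hands user space only the array index, and bounds-checks that index on every call.

```c
#include <stddef.h>

#define MAX_FDS 16   /* hypothetical per-process limit, tiny for illustration */

/* Hypothetical kernel object; in a real kernel this would be a file,
 * pipe, socket, or device object. */
struct kobject { const char *kind; };

/* Hypothetical per-process descriptor table. User space only ever sees
 * the index; the pointers never leave the kernel. */
struct fd_table { struct kobject *slots[MAX_FDS]; };

/* Install obj in the lowest free slot, as POSIX requires for open(2). */
int fd_install(struct fd_table *t, struct kobject *obj) {
    for (int i = 0; i < MAX_FDS; i++) {
        if (t->slots[i] == NULL) {
            t->slots[i] = obj;
            return i;
        }
    }
    return -1;                            /* table full (EMFILE) */
}

/* Translate an untrusted integer from user space into a kernel pointer.
 * A forged or stale number can only select one of this process's own
 * slots, never an arbitrary kernel address. */
struct kobject *fd_lookup(const struct fd_table *t, int fd) {
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;                      /* out of range (EBADF) */
    return t->slots[fd];                  /* NULL if the slot is empty */
}
```

A buggy or malicious program that passes 7 or 999 gets an error, not a read of kernel memory. A raw pointer, by contrast, would have to be validated against every live object and would leak kernel addresses to user space.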
Related concepts