The Process Abstraction — Operating Systems

Summary

A process is the OS’s abstraction for a running program. From the user’s point of view it has a private CPU, a private memory, a handful of open files, and a unique identity (the PID). From the kernel’s point of view it is a PCB (process control block): a struct that records everything needed to suspend the process, schedule something else, and later resume it exactly where it left off. The trick that makes virtualization work is that on a context switch the kernel saves the CPU state into the outgoing process’s PCB and restores the incoming process’s state from its own PCB, so the process never knows the gap happened.

A process moves through a small set of states — running on a CPU, ready to run but waiting for a CPU, or blocked on I/O or a synchronization primitive — and transitions between them happen because of syscalls, interrupts, or scheduler decisions. The state machine is small enough to draw on a napkin and is the bedrock vocabulary for everything in CPU virtualization.

Why it matters

The process abstraction is what lets you run a web browser, a shell, a database, and a music player on one CPU without them stepping on each other. It is also the unit of accounting (which process used how much CPU and memory), the unit of isolation (one process’s bug doesn’t corrupt another’s memory), and the unit of resource control (Linux cgroups, ulimits, and the OOM killer all act on processes).

Understanding the PCB is what separates “knows there are processes” from “knows what a process is.” Almost every kernel question — fork, exec, wait, threading, scheduling, signals, memory layout, file descriptors — touches the PCB. When you can answer “what’s in the PCB and what changes during X” for any X, you have working fluency.

How it works

What’s in the address space

Each process has its own virtual address space, conventionally laid out as:

Text — the read-only program code.
Data / BSS — global variables, initialised and zero-initialised.
Heap — grown upward by brk / mmap (via the user-space allocator).
Memory-mapped regions — shared libraries (libc.so), mmap’d files, anonymous mappings.
Stack — grown downward, one per thread; holds local variables and return addresses.
Kernel region — mapped into every process’s address space at the top, only accessible in kernel mode.

The kernel never has to copy a program’s bytes around to virtualize the CPU; it switches the page-table base register (cr3 on x86), and the same virtual addresses now resolve to a different process’s pages.
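On Linux, the regions listed above can be inspected through /proc/<pid>/maps. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name summarize_maps and the category labels are illustrative choices, not kernel terminology.

```python
import os


def summarize_maps(maps_text):
    """Sum the size of each kind of region in a /proc/<pid>/maps dump."""
    summary = {}
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr_range, perms = fields[0], fields[1]
        path = fields[5].strip() if len(fields) == 6 else ""
        start, end = (int(x, 16) for x in addr_range.split("-"))
        if path in ("[heap]", "[stack]"):
            kind = path.strip("[]")
        elif path.startswith("["):
            kind = "kernel-provided"      # e.g. [vdso], [vvar]
        elif path == "":
            kind = "anonymous mmap"
        elif "x" in perms:
            kind = "file-backed code"     # program text, shared-library code
        else:
            kind = "file-backed data"
        summary[kind] = summary.get(kind, 0) + (end - start)
    return summary


if __name__ == "__main__":
    maps_path = "/proc/self/maps"  # Linux only
    if os.path.exists(maps_path):
        with open(maps_path) as f:
            for kind, size in sorted(summarize_maps(f.read()).items()):
                print(f"{kind:18s} {size // 1024:>10d} KiB")
```

Running it inside any Python process prints that process's own layout; the kernel region does not appear because it is not visible from user mode.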

What’s in the PCB

Linux’s task_struct is famously large (several kilobytes on a modern 64-bit kernel, depending on configuration), but the essentials are:

Identity — PID, PGID, TGID, UID/GID, namespace pointers, parent pointer.
Saved register state — when the process isn’t running, its general-purpose registers, instruction pointer, stack pointer, and floating-point state live here.
Memory map — pointer to mm_struct, which holds the page-table root and the list of VMAs (virtual memory areas).
File table — array of open file descriptors pointing into the system-wide open-file table.
Signal state — pending signals, blocked signals, handler table.
Scheduling state — current state (R/S/D/T/Z), priority, nice, cgroup, scheduler class.
Kernel stack — every process has its own small kernel stack used while running in kernel mode.
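Several of these fields are visible from user space. The sketch below, assuming Linux with /proc mounted, pulls a few of them out of /proc/self/status; the helper name parse_status and the selection of fields are illustrative, and the file exposes only a small, formatted subset of what task_struct holds.

```python
import os

# A handful of /proc/<pid>/status keys that map onto the PCB categories above.
FIELDS = ("Name", "State", "Tgid", "Pid", "PPid", "Uid",
          "Threads", "SigPnd", "SigBlk", "VmRSS")


def parse_status(status_text, wanted=FIELDS):
    """Return the requested 'Key: value' pairs from a status dump."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            result[key] = value.strip()
    return result


if __name__ == "__main__":
    # Identity fields are also available through ordinary syscalls.
    print("pid", os.getpid(), "ppid", os.getppid(),
          "pgid", os.getpgid(0), "uid", os.getuid())
    status_path = "/proc/self/status"  # Linux only
    if os.path.exists(status_path):
        with open(status_path) as f:
            for key, value in parse_status(f.read()).items():
                print(f"{key:8s} {value}")
```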

The state machine

       ─── created ──► READY
                        │ ▲
            schedule    │ │ preempt / yield
                        ▼ │
                      RUNNING
                       │  │
                       │  └─ blocking syscall / wait ──► BLOCKED ── I/O done / wakeup ──► READY
                       │
                       └── exit ──► ZOMBIE ── reaped ──► gone

RUNNING means actually executing on some CPU. READY means runnable but waiting for the scheduler to pick it. BLOCKED (Linux: S for interruptible, D for uninterruptible) means waiting on I/O, a lock, or a wait-queue. ZOMBIE is a terminated process whose exit status hasn’t been reaped by its parent; it holds only its PCB entry and PID, with no address space and no CPU time.
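The diagram can be written down as a transition table. This is a toy model for illustration only, not how any kernel stores state; the event names ("schedule", "block", and so on) are labels chosen to match the arrows above.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    ZOMBIE = "zombie"


# (current state, event) -> next state, one entry per arrow in the diagram.
TRANSITIONS = {
    (State.READY, "schedule"): State.RUNNING,
    (State.RUNNING, "preempt"): State.READY,
    (State.RUNNING, "yield"): State.READY,
    (State.RUNNING, "block"): State.BLOCKED,
    (State.BLOCKED, "wakeup"): State.READY,
    (State.RUNNING, "exit"): State.ZOMBIE,
}


def step(state, event):
    """Apply one event, rejecting transitions the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in {state.name}") from None


def run(events, start=State.READY):
    """Feed a sequence of events through the state machine."""
    state = start
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(run(["schedule", "block", "wakeup", "schedule", "exit"]).name)
```

Note that a BLOCKED process cannot go straight to RUNNING: a wakeup only makes it READY, and the scheduler still has to pick it.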

What happens on each transition

fork allocates a new PCB and clones the parent’s state: it duplicates the registers, the file table (refcount-bumped), and the memory map (copy-on-write for pages). The child enters READY. exec replaces the address space, leaving the PCB identity intact. A blocking syscall (read on an empty socket, wait with no exited children) transitions the process from RUNNING to BLOCKED and parks it on a wait queue; a wakeup (I/O completion, signal, child exit) moves it back to READY.
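The transitions above can be observed from user space. The sketch below assumes Linux (it reads /proc/<pid>/stat and uses os.waitid); the function name fork_and_reap, the message, and the exit code 7 are arbitrary choices for illustration. The parent blocks in read until the child writes, then waits for the child to exit without reaping it, observes the zombie state, and finally reaps it.

```python
import os


def fork_and_reap():
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it inherited both pipe descriptors from the parent.
        try:
            os.close(read_fd)
            os.write(write_fd, b"hello from child")
        finally:
            os._exit(7)  # exit immediately, skipping the parent's cleanup code

    # Parent: read() blocks (RUNNING -> BLOCKED) until the child writes.
    os.close(write_fd)
    message = os.read(read_fd, 64)
    os.close(read_fd)

    # Wait for the child to exit but leave it unreaped (WNOWAIT),
    # so it stays a zombie while we look at it.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = None
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; comm may itself contain ')'.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except OSError:
        pass  # no /proc available; leave state as None

    # Reap: this collects the exit status and frees the PCB entry.
    _, status = os.waitpid(pid, 0)
    return message, state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_reap())
```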

Variants and trade-offs

Heavyweight process: full address space, full file table, full signal state. Strong isolation: one crash doesn’t affect siblings. Expensive to create (fork even with COW costs page-table copies and PCB allocation) and expensive to context-switch (changing address spaces flushes the TLB unless the hardware tags entries with an address-space ID, such as PCID on x86 or ASID on ARM). The UNIX default since 1969.

Lightweight thread: same address space as siblings, shared file table, separate stack and register state. Cheap to create (no page-table work). Cheap to switch (the address space stays the same, so no TLB flush). Pays for it in isolation: one thread’s bug can corrupt the others, and shared state needs synchronization. The model behind many modern application servers.
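The sharing difference is easy to demonstrate. In this minimal sketch (Unix only, since it uses os.fork; the names counter, bump, and thread_vs_process are illustrative), a thread's write to a global is visible to its creator, while a forked child's write lands in its own copy-on-write copy and the parent never sees it.

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


def thread_vs_process():
    counter["value"] = 0

    # A thread shares the address space: its write is visible here.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter["value"]

    # A forked child gets a private copy: its write is not visible here.
    pid = os.fork()
    if pid == 0:
        try:
            bump()
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    after_child = counter["value"]

    return after_thread, after_child


if __name__ == "__main__":
    print(thread_vs_process())
```

The first number reflects the thread's increment; the second is unchanged because the child's increment happened in a different address space.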

Specific axes of variation:

Heavyweight vs. lightweight. Real systems use both. Chrome uses processes for isolation between tabs and threads within each tab for parallelism. Many databases use threads within a server process (MySQL, for example), while PostgreSQL uses a process per connection. The choice is about how much isolation you need vs. how much you’ll pay to get it.
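Address-space inheritance. UNIX duplicates on fork; Windows starts fresh on CreateProcess. Both work, but the fork/exec split is what made UNIX shells composable: the shell can rearrange the child’s file descriptors (redirections, pipes) between fork and exec, and copy-on-write keeps the fork cheap.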
Process groups, sessions, namespaces. A process isn’t an isolated atom — it’s part of a group (signals), a session (controlling terminal), a cgroup (resource caps), and on Linux a set of namespaces (PID, network, mount, UTS, IPC, user). Containers are just “processes in a different set of namespaces.”
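On Linux, the namespaces a process belongs to are exposed as symlinks under /proc/<pid>/ns. The sketch below assumes Linux with /proc mounted and returns an empty dict elsewhere; the function name current_namespaces is illustrative. Two processes are in the same namespace exactly when the corresponding link targets match.

```python
import os


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace name -> link target such as 'pid:[4026531836]'."""
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, or /proc is not mounted
    for name in sorted(names):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable; skip it
    return result


if __name__ == "__main__":
    for name, target in current_namespaces().items():
        print(f"{name:20s} {target}")
```

Comparing this output inside and outside a container shows different identifiers for the namespaces the container runtime unshared.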

Why does Linux call its PCB 'task_struct' and not 'process'?

Because Linux unified processes and threads under one struct. A “task” is a schedulable entity; whether it shares its address space and file table with others determines whether userspace calls it a thread or a process. The kernel doesn’t really distinguish: clone() with the sharing flags (CLONE_VM, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD, and friends) produces a task that shares its address space, file table, and signal handlers with its creator, while clone() with none of them produces a task that shares nothing (i.e. a process). One scheduler, one data structure, two names depending on the flags.
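The one-task-per-thread model shows up in the IDs user space can see. In this sketch (the function name task_ids is illustrative; threading.get_native_id needs Python 3.8 or later), two threads of one process report the same PID but different kernel-assigned thread IDs. On Linux the PID that getpid returns is the thread-group ID (TGID), and each thread's native ID is its own task's ID.

```python
import os
import threading


def task_ids():
    """Return (pid, native thread id) for the calling thread and a worker."""
    ids = {"caller": (os.getpid(), threading.get_native_id())}

    def worker():
        ids["worker"] = (os.getpid(), threading.get_native_id())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return ids


if __name__ == "__main__":
    for name, (pid, tid) in task_ids().items():
        print(f"{name:7s} pid={pid} tid={tid}")
```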

A subtle trade-off: the kernel-stack-per-task model (Linux, BSD) gives each task a dedicated kernel stack, which makes syscalls and signal handling clean but uses memory (16 KB per task on x86-64 Linux, 8 KB on older kernels; multiplied across millions of tasks, that adds up on huge servers). Some systems experiment with shared kernel stacks for short-lived tasks, but that design has not displaced the per-task stack in mainstream kernels.
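A quick back-of-the-envelope calculation shows the scale involved. The task counts below are hypothetical, and the two stack sizes are assumptions: 8 KiB as the older figure and 16 KiB as the size used by x86-64 Linux since kernel 3.15.

```python
def kernel_stack_bytes(num_tasks, stack_kib):
    """Total memory pinned by per-task kernel stacks, in bytes."""
    return num_tasks * stack_kib * 1024


if __name__ == "__main__":
    for num_tasks in (10_000, 1_000_000):
        for stack_kib in (8, 16):
            gib = kernel_stack_bytes(num_tasks, stack_kib) / 2**30
            print(f"{num_tasks:>9,d} tasks x {stack_kib:>2d} KiB = {gib:6.2f} GiB")
```

Kernel stacks are not swappable, so this memory is a fixed cost of the task count regardless of what the tasks are doing.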

When this is asked in interviews

Constantly. “What is a process?” is a screen filter; the strong answer mentions address space, PCB, state machine, isolation, and PID. The weak answer says “a running program” and stops.

Follow-ups:

“Draw the process state diagram and label the transitions.” Foundational.
“What’s in the PCB?” Tests whether you’ve actually looked at kernel data structures. Mid-level.
“What’s the difference between a process and a thread?” Tests whether you understand it’s about sharing flags, not a fundamentally different mechanism. Mid-level.
“What happens to file descriptors when you fork? When you exec?” Tests UNIX fluency. Mid-level.
“How does a zombie process get cleaned up? What if init dies?” Tests edge cases. Senior.
“Walk through what a Linux namespace adds to the PCB.” Tests containers fluency. Senior.

A second context: systems-design questions where you’re asked how many processes a particular service should use vs. threads vs. async tasks. The strong answer reaches for the isolation / cost / sharing axes from the trade-offs section, not a memorised recipe.