RAID — Striping, Mirroring, Parity — Operating Systems

What it is

RAID (Redundant Array of Independent Disks) combines several physical disks into a single logical volume that is faster, larger, or more reliable than any one disk. The OS sees one block device; the RAID layer below it (a hardware controller, the Linux md driver, ZFS raidz, or btrfs’s built-in RAID profiles) handles striping, mirroring, and parity.

The trade space is three-dimensional: capacity (how much of the raw disk you can actually use), performance (read and write bandwidth, IOPS), and reliability (how many disks can fail before data loss). No RAID level optimises all three; each is a point on the surface defined by those axes.

When to use it

RAID 0 (striping) — pure performance, no redundancy. Use for scratch storage, video editing, test infrastructure. One disk failure destroys the array.
RAID 1 (mirroring) — pure redundancy, simple. Use for two-disk reliability (boot drives, small databases) where you can pay 50% capacity overhead.
RAID 5 (rotating parity) — capacity-efficient single-failure tolerance. Use when reads dominate and you have 4-8 disks. Increasingly out of favour as drives get larger (see “RAID 5 is dead” below).
RAID 6 (double parity) — two-failure tolerance. Use for large arrays of >= 8 HDDs in 2026. The capacity overhead is still small (2/N).
RAID 10 (stripe of mirrors) — RAID 0 over RAID 1 pairs. Use for write-heavy workloads where the RAID 5/6 parity update (extra reads and writes on every small write) is the bottleneck.

In 2026, erasure coding at the file-system or object-store level (Reed-Solomon K + M codes in HDFS, Ceph, S3) has largely replaced traditional RAID for large deployments — same idea, more flexibility, often cheaper.

How it works

Striping (RAID 0)

The volume is divided into fixed-size chunks (typically 64 KB to 1 MB). Chunk 0 goes to disk 0, chunk 1 to disk 1, …, chunk N-1 to disk N-1, chunk N back to disk 0. Sequential reads and writes saturate all N disks in parallel.

RAID 0 (4 disks, chunk size = C)
disk 0:  chunk 0   chunk 4   chunk 8  ...
disk 1:  chunk 1   chunk 5   chunk 9  ...
disk 2:  chunk 2   chunk 6   chunk 10 ...
disk 3:  chunk 3   chunk 7   chunk 11 ...

Capacity: N * D where D is the single-disk capacity. MTTF: 1/N of a single disk — any one failure kills the array.
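
The round-robin placement described above can be sketched in a few lines. This is an illustrative mapping only; the function names `locate_chunk` and `locate_byte` are made up for this example and do not correspond to any particular RAID implementation.

```python
def locate_chunk(chunk_index: int, num_disks: int) -> tuple[int, int]:
    """Map a logical chunk number to (disk, row) under plain round-robin striping."""
    disk = chunk_index % num_disks   # chunks rotate across the disks
    row = chunk_index // num_disks   # how many chunks down that disk it sits
    return disk, row


def locate_byte(offset: int, chunk_size: int, num_disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk, byte offset on that disk)."""
    disk, row = locate_chunk(offset // chunk_size, num_disks)
    return disk, row * chunk_size + offset % chunk_size


# Reproduce a few cells of the 4-disk layout shown above.
for chunk in (0, 5, 10):
    print(f"chunk {chunk} -> disk/row {locate_chunk(chunk, num_disks=4)}")
```

With 4 disks, chunk 5 lands on disk 1 and chunk 10 on disk 2, matching the diagram.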

Mirroring (RAID 1)

Every block is written to two (or more) disks. Reads can come from either copy; writes go to both.

Capacity: D (for a 2-disk mirror). Read throughput: up to 2x a single disk, since independent reads can be served from either copy. Write throughput: 1x (bounded by the slowest disk). Tolerates one disk failure.

Parity (RAID 4 / 5 / 6)

A parity scheme stores, for each stripe, a parity block that is the XOR of the corresponding data blocks. If any one block in the stripe is lost, it can be reconstructed by XOR-ing the surviving data blocks with the parity block.

RAID 4 dedicates one disk as the parity disk. Easy to reason about; the parity disk becomes the bottleneck on writes.
RAID 5 rotates the parity block among all disks across stripes, eliminating the bottleneck. The dominant single-parity scheme.
RAID 6 uses two parity blocks per stripe, computed with Reed-Solomon codes. Tolerates two failures.
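
To make the difference between a dedicated and a rotating parity disk concrete, here is a small sketch. The rotation rule used for RAID 5 (parity moves one disk to the left on each stripe) is an assumption chosen for readability; real implementations offer several layouts and may order the data chunks differently.

```python
def parity_disk(stripe: int, num_disks: int, level: int) -> int:
    """Return which disk holds parity for a given stripe."""
    if level == 4:
        return num_disks - 1                          # always the last disk
    if level == 5:
        return (num_disks - 1) - (stripe % num_disks)  # assumed rotation rule
    raise ValueError("only RAID 4 and RAID 5 are modelled here")


def stripe_layout(stripe: int, num_disks: int, level: int) -> list[str]:
    """Label each disk's block in one stripe as 'P' or a data chunk name."""
    p = parity_disk(stripe, num_disks, level)
    next_data = stripe * (num_disks - 1)   # first data chunk in this stripe
    labels = []
    for disk in range(num_disks):
        if disk == p:
            labels.append("P")
        else:
            labels.append(f"d{next_data}")
            next_data += 1
    return labels


for level in (4, 5):
    print(f"RAID {level}")
    for stripe in range(4):
        print("  stripe", stripe, stripe_layout(stripe, 4, level))
```

In the RAID 4 output every parity write lands on the same disk; in the RAID 5 output the parity writes are spread evenly over all four.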

Capacity for RAID 5: (N-1) * D. RAID 6: (N-2) * D.
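
The capacity formulas for the levels discussed so far can be collected into one helper. This is a sketch assuming N equal-sized disks and two-way mirrors for RAID 10; `usable_capacity` is a hypothetical name, and the minimum disk counts are the conventional ones.

```python
def usable_capacity(level: str, num_disks: int, disk_size: float) -> float:
    """Usable capacity for N equal disks of size D (same unit as disk_size)."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return num_disks * disk_size          # N * D, no redundancy
    if level == "1":
        return disk_size                      # every disk holds the same data
    if level == "5":
        return (num_disks - 1) * disk_size    # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_size    # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even disk count")
    return num_disks // 2 * disk_size         # half the disks are mirror copies


for level in ("0", "1", "5", "6", "10"):
    tb = usable_capacity(level, 8, 10.0)
    print(f"RAID {level}: {tb:.0f} TB usable from 8 x 10 TB")
```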

The math behind parity

For a stripe with data blocks d0, d1, ..., d_{N-2} and parity p:

p = d0 XOR d1 XOR ... XOR d_{N-2}

If d_k is lost, d_k = p XOR (XOR of all other data blocks). XOR is associative and commutative, so the order doesn’t matter.
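
A minimal sketch of the XOR relationship, using tiny 4-byte blocks in place of real disk chunks. The helper name `xor_blocks` is illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position in the block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # d0, d1, d2
parity = xor_blocks(data)            # p = d0 XOR d1 XOR d2

# Pretend d1 is lost: XOR the survivors with the parity block to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same function serves for both computing parity and rebuilding a lost block, which is exactly the symmetry the formula above expresses.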

For RAID 6’s double parity, one parity is plain XOR; the second uses Galois Field arithmetic (GF(2^8)) so that any two-failure subset is recoverable. The compute cost is small on modern CPUs, where the field multiplications are vectorised with SIMD instructions (for example SSSE3 or AVX2); the I/O cost remains the dominant penalty.
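
The double-parity idea can be sketched as follows: P is the plain XOR, Q weights each data block by a distinct power of a generator g in GF(2^8), and two lost data blocks are found by solving the resulting pair of equations. The field polynomial 0x11D and generator 2 are assumptions for this example (a common choice), and the code is byte-at-a-time for clarity rather than speed.

```python
from typing import Optional

# Build exp/log tables for GF(2^8); polynomial 0x11D and generator 2 are assumed.
GF_EXP = [0] * 510
GF_LOG = [0] * 256
value = 1
for power in range(255):
    GF_EXP[power] = value
    GF_LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    GF_EXP[power] = GF_EXP[power - 255]


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all blocks; Q = sum of g^i * d_i over GF(2^8)."""
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    for index, block in enumerate(data):
        coeff = GF_EXP[index]                  # g^index
        for pos, byte in enumerate(block):
            p[pos] ^= byte
            q[pos] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: list[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y from the surviving data plus P and Q."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else blk for i, blk in enumerate(data)]
    p_rest, q_rest = compute_pq(survivors)
    gx, gy = GF_EXP[x], GF_EXP[y]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for pos in range(len(p)):
        p_xy = p[pos] ^ p_rest[pos]            # d_x XOR d_y
        q_xy = q[pos] ^ q_rest[pos]            # g^x*d_x XOR g^y*d_y
        dx[pos] = gf_div(q_xy ^ gf_mul(gy, p_xy), gx ^ gy)
        dy[pos] = p_xy ^ dx[pos]
    return bytes(dx), bytes(dy)


blocks = [b"disk", b"zero", b"one!", b"two?"]
p_block, q_block = compute_pq(blocks)
damaged = [blocks[0], None, blocks[2], None]   # data blocks 1 and 3 are lost
print(recover_two(damaged, 1, 3, p_block, q_block) == (blocks[1], blocks[3]))
```

Other failure combinations (one data block plus P, or one data block plus Q) are simpler and omitted here.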

Variants

Hardware vs software RAID

Hardware — a dedicated controller card with battery-backed cache. Fast (cache absorbs the small-write penalty), opaque (recovery on controller failure can require a matching card), expensive.
Software — Linux md, ZFS, btrfs, Storage Spaces on Windows. Cheaper, more portable, integrates with the file system. Modern CPUs are fast enough that parity compute is no longer the bottleneck.

ZFS and btrfs combine RAID with the file system, which lets them detect (via checksums) and repair silent corruption — something traditional RAID cannot do because it has no way to tell which disk’s copy is “right” when two copies disagree.
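
A rough sketch of why a checksum stored apart from the data resolves the "which copy is right" question for a mirror. SHA-256 from Python's `hashlib` stands in for whatever checksum a real file system uses, and `read_with_repair` is a hypothetical helper, not an API of ZFS or btrfs.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def read_with_repair(copies: list[bytearray], expected: bytes) -> bytes:
    """Return a copy that matches the stored checksum and heal the others."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; nothing to repair from")
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good               # overwrite the corrupted copy in place
    return good


original = b"payload block"
stored_checksum = checksum(original)     # kept in metadata, away from the data
mirror = [bytearray(original), bytearray(original)]
mirror[0][0] ^= 0xFF                     # simulate silent corruption on one disk

print(read_with_repair(mirror, stored_checksum) == original)
print(bytes(mirror[0]) == original)      # the bad copy has been rewritten
```

A plain mirror without the checksum would see two differing copies and have no basis for choosing between them.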

RAID 10 and other nesting

RAID 10 (stripe of mirrors) and RAID 50/60 (a stripe across RAID 5 or RAID 6 groups) trade capacity for performance. RAID 10 is the default for write-heavy OLTP databases because there is no parity penalty.

Erasure coding at the storage layer

Object stores (S3, Azure Blob, Ceph) use K + M Reed-Solomon codes: split an object into K data shards, compute M parity shards, and store one shard per node. The object survives M concurrent node failures. This is more flexible than RAID 6: you can pick any (K, M) pair, balance space against durability, and spread shards across racks / availability zones.

Trade-offs

RAID 5 — (N-1)/N capacity, single-fault tolerance, cheap. Loses big on random small writes (4x amplification: read old data and old parity, write new data and new parity) and on rebuild time. A modern 20 TB drive needs roughly a day to rebuild (about 22 hours even at a sustained 250 MB/s); during that window the array is exposed to a second failure.

RAID 6 — (N-2)/N capacity, two-fault tolerance, higher write penalty (6x on small writes). The right choice when drives are large and rebuild windows long — basically anything >= 8 TB per drive in 2026.
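
The small-write penalty can be illustrated with a toy single-parity array that counts I/Os. The classes and names here are invented for the example, and parity sits on a fixed disk to keep the sketch short; the read-modify-write sequence is the same when parity rotates.

```python
class CountingDisk:
    """An in-memory 'disk' that counts block reads and writes."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.reads = 0
        self.writes = 0

    def read(self, index: int) -> bytes:
        self.reads += 1
        return self.blocks[index]

    def write(self, index: int, data: bytes) -> None:
        self.writes += 1
        self.blocks[index] = data


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk: CountingDisk, parity_disk: CountingDisk,
                index: int, new_data: bytes) -> None:
    """Update one data block and its parity without touching the other disks."""
    old_data = data_disk.read(index)        # I/O 1
    old_parity = parity_disk.read(index)    # I/O 2
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_disk.write(index, new_data)        # I/O 3
    parity_disk.write(index, new_parity)    # I/O 4


disks = [CountingDisk(num_blocks=1, block_size=4) for _ in range(4)]
data_disks, parity = disks[:3], disks[3]

small_write(data_disks[0], parity, 0, b"abcd")
small_write(data_disks[2], parity, 0, b"wxyz")

total_ios = sum(d.reads + d.writes for d in disks)
print("I/Os for 2 logical writes:", total_ios)
```

Two logical writes cost eight physical I/Os. With a second parity block, as in RAID 6, each small write adds one more read and one more write, giving the 6x figure.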

Rebuild time and URE risk. “RAID 5 is dead” is the argument popularised by Robin Harris’s 2007 ZDNet essay “Why RAID 5 stops working in 2009”: as drives grew past ~1 TB, the unrecoverable-read-error rate (typically specified as 1 per 10^14 bits read, about 12.5 TB) meant that rebuilding RAID 5 after a single failure would likely hit a read error on a surviving disk and lose data. The math has gotten worse with 20 TB+ drives. RAID 6 buys you margin; erasure coding with more parity shards buys more.
Write hole. If the system crashes mid-stripe-update on RAID 5/6, the data and parity can become inconsistent — a silent corruption that survives the crash. Hardware controllers with battery-backed cache and software RAID with journaled metadata (or ZFS’s COW) close this hole.
Mixed-size disks. Most RAID implementations cap every member to the size of the smallest disk in the array. Replacing one 10 TB drive with a 20 TB drive in an array of 10 TB drives still contributes only 10 TB; the extra 10 TB goes unused.
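
The rebuild-window and URE arithmetic above can be estimated with a simplified model. It assumes every bit read is an independent trial at the quoted error rate and that the rebuild runs at a constant sequential speed; the 250 MB/s figure and the function names are assumptions for illustration, and real drives may behave better or worse than this.

```python
import math


def rebuild_hours(disk_tb: float, mb_per_s: float) -> float:
    """Time to read or write one whole disk at a constant sequential rate."""
    return disk_tb * 1e12 / (mb_per_s * 1e6) / 3600


def p_ure_during_rebuild(surviving_disks: int, disk_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading every
    surviving disk in full, as a single-parity rebuild must."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding to zero.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


print(f"20 TB rebuild at 250 MB/s: {rebuild_hours(20, 250):.1f} hours")
for size_tb in (1, 20):
    risk = p_ure_during_rebuild(surviving_disks=4, disk_tb=size_tb)
    print(f"5-disk RAID 5 of {size_tb} TB drives: {risk:.1%} chance of a URE")
```

Under these assumptions the risk for a 5-disk array climbs from roughly a quarter with 1 TB drives to near certainty with 20 TB drives, which is the trend the argument rests on.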

Common pitfalls

Treating RAID as backup. It isn’t. RAID protects against disk failure, not against rm -rf, ransomware, file system corruption, or correlated failures (lightning, controller bug, vibration in a rack). You still need backups.
Running RAID 5 with large modern drives. The rebuild window on 16-20 TB drives is long enough that a second drive failure or an unrecoverable read error during the rebuild is a realistic risk. Use RAID 6 or higher-redundancy erasure codes.
Mismatched drives causing slow performance. One slow drive (SMR firmware, dying drive doing internal retries) drags the whole RAID’s write throughput to its speed.
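Forgetting to monitor degraded state. A RAID 5/6 array with a failed disk still serves data. Operators sometimes notice only when another disk fails and the array has exceeded its fault tolerance (the second failure for RAID 5, the third for RAID 6), by which point recovery is impossible.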
Choosing a chunk size mismatched to workload. Tiny chunks (4 KB) split even a modest request across several disks (good for the bandwidth of a single stream, bad for concurrent random IOPS, because every request ties up several disks and pays each one’s positioning cost). Large chunks (1 MB) mean a small or medium request is served by one disk (good if you want N independent requests or streams in parallel, bad if you want one stream at N-disk bandwidth).
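
One way to see how chunk size interacts with request size is to count which disks a single request touches in a plain striped layout. This is a sketch; `disks_touched` is a hypothetical helper and ignores parity and alignment details.

```python
def disks_touched(offset: int, length: int, chunk_size: int, num_disks: int) -> set[int]:
    """Return the set of disks that a request of `length` bytes at `offset` hits."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # After num_disks consecutive chunks every disk has been hit, so stop there.
    last_chunk = min(last_chunk, first_chunk + num_disks - 1)
    return {chunk % num_disks for chunk in range(first_chunk, last_chunk + 1)}


KIB = 1024
request = 64 * KIB
for chunk_size in (4 * KIB, 64 * KIB, 1024 * KIB):
    hit = sorted(disks_touched(0, request, chunk_size, num_disks=4))
    print(f"chunk {chunk_size // KIB:>4} KiB: a 64 KiB read touches disks {hit}")
```

With 4 KiB chunks the 64 KiB read occupies all four disks; with 64 KiB or 1 MiB chunks it occupies one, leaving the others free for different requests.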

Why do hyperscalers prefer erasure coding to RAID?

RAID is rigid — fixed levels, fixed capacity overhead, no cross-rack redundancy. Erasure codes let an object store pick (K, M) per object class (hot data 6+3, archive 10+4), spread shards across racks and AZs (correlated-failure protection RAID can’t give), and rebuild incrementally over the network without rebuilding an entire drive. The CPU cost is non-trivial but amortises over batched workloads, and it scales horizontally where RAID is locked to a single chassis.
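
The space cost of a (K, M) choice is simple arithmetic, sketched below for the two profiles mentioned above. For comparison, an 8-disk RAID 6 is treated as 6+2 and three-way replication as 1+2; `ec_profile` is a name invented for this example.

```python
def ec_profile(k: int, m: int) -> dict[str, float]:
    """Space and fault-tolerance figures for a K data + M parity code."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return {
        "raw_per_usable": (k + m) / k,      # bytes stored per byte of user data
        "usable_fraction": k / (k + m),
        "failures_tolerated": m,
    }


schemes = {
    "hot 6+3": (6, 3),
    "archive 10+4": (10, 4),
    "RAID 6 on 8 disks (6+2)": (6, 2),
    "3x replication (1+2)": (1, 2),
}
for name, (k, m) in schemes.items():
    profile = ec_profile(k, m)
    print(f"{name}: {profile['raw_per_usable']:.2f}x raw, "
          f"survives {profile['failures_tolerated']} lost shards")
```

The 10+4 profile tolerates four lost shards at 1.4x raw storage, whereas three-way replication tolerates two at 3x.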

Hard Disk Drives — the physical substrate that makes RAID’s failure model real.
Flash SSDs and the Flash Translation Layer — RAID on SSD has different failure modes (correlated wear-out).
Data Integrity — Checksums and Scrubbing — what RAID can’t detect on its own.
File System Implementation — the layer above the RAID volume.
I/O Devices and Drivers — the driver / block layer that exposes RAID as a single device.