Resource Estimation — Worked Examples — System Design

Summary#

Four worked back-of-envelope estimations: Twitter feed reads, YouTube ingest, WhatsApp messaging, and a web search index. Each runs the same chain — define a user model, derive QPS, translate to bytes/sec and storage, sanity-check against known orders of magnitude — and ends with the one number that decides the design. The goal is to make the mental motion repeatable so that in an interview you don’t get stuck arguing whether the right multiplier is 1.5x or 2x.

Why it matters#

Capacity numbers are not the answer to a system-design question; they are the constraint generator. The math tells you which problems are real (we need to shard) and which are imaginary (a single database is fine). The same architecture can be right or wrong depending on the inputs — without running the numbers, you can’t know which side of the threshold you’re on.

A more practical reason: candidates who skip estimation get graded as if the design is built on guesses. Candidates who show the conversion chain (DAU → QPS → bytes → cost) get credit for reasoning even when the final number is off by 2x. Interviewers care about the chain more than the digits.

How it works#

The mechanical loop is always:

Pick a user model (DAU, sessions/day, actions/session).
Convert to events/sec averaged over a day; multiply by a peak factor (3x–5x is canonical).
Translate each event to a per-event byte cost.
Multiply through for bytes/sec ingress, bytes/sec egress, total stored per year.
Sanity check: does the answer fit on a known fleet shape? “Tens of TB/year” lives on one MySQL; “tens of PB/year” requires a dedicated lake.

Round aggressively. The mental rule: “100 K seconds in a day” (actually 86 400; the 16% margin is in everyone’s favor). “1 day a year” rounds to “1 / 365 ≈ 1/400 of annual traffic per day”. Everything in base-10 powers; if your math has three significant figures you’re working too hard.

Example 1 — Twitter feed reads#

Assumption set:

DAU: 300 M.
Per-user feed loads/day: 20 (pull-to-refresh, open app, switch tabs).
Average feed page: 20 posts. Per post: 1 KB metadata + ~50 KB attached media (most posts have a thumbnail).
Posts written per user per day: 0.5 (most users lurk).

Reads (the dominant axis):

Feed loads/day: 300 M × 20 = 6 B/day.
Average reads/sec: 6 B ÷ 86 400 ≈ 70 K/sec. Round to 100 K/sec sustained.
Peak (evening, US+EU overlap): ~3x → 300 K/sec.

Bytes returned per feed load:

20 posts × 1 KB metadata = 20 KB. Plus thumbnails not always inlined, often referenced via CDN — assume 5 KB inline budget for the API response.
25 KB per response × 100 K/sec = 2.5 GB/sec egress for the JSON tier alone. Round up to ~3 GB/sec sustained, ~10 GB/sec peak. The media itself goes via CDN, off-budget for the application API tier.

Writes:

Posts/day: 300 M × 0.5 = 150 M. ≈ 2 K/sec sustained, 6 K/sec peak.

Storage growth:

150 M posts/day × ~1 KB metadata = 150 GB/day metadata → ~55 TB/year for post text.
Plus media: 150 M × 100 KB average attached image / video thumbnail = 15 TB/day → 5 PB/year. Media dominates.

The one number that matters: the read-to-write ratio is roughly 100 K/sec : 2 K/sec ≈ 50:1. That ratio is what justifies a heavy cache layer and tells you write-time-fan-out (pushing posts into precomputed timelines) makes the read side O(1). If the ratio were 1:1 the architecture is completely different — pull at read time would be fine.

Example 2 — YouTube ingest#

Assumption set:

Creators uploading per day: 50 M.
Average uploads per creator per day: 0.1 (most creators upload a few times per month, heavy creators daily).
Total uploads/day: roughly 5 M (matches public-disclosure ballparks).
Average upload duration: 5 minutes.
Source bitrate: ~5 Mbps (1080p ingest, post-compression).

Upload rate:

5 M uploads/day ÷ 86 400 ≈ 60 uploads/sec average, ~200/sec peak.

Ingress bandwidth:

Each upload: 5 min × 5 Mbps = 1.5 Gb = ~200 MB.
Average ingress: 60 × 200 MB / sec ≈ 12 GB/sec average, ~40 GB/sec peak. That’s ~100 Gbps of sustained upload, which is real-world plausible.
A single 10 G NIC saturates at ~1.25 GB/sec; we need on the order of 30+ active ingest hosts at peak.

Transcode ladder:

Each source video is transcoded to ~6 variants (144p, 360p, 480p, 720p, 1080p, plus codecs). Aggregate output is ~2.5x the source bytes.
200 MB × 2.5 = 500 MB stored per upload.
5 M uploads × 500 MB = 2.5 PB/day of physical storage → ~900 PB/year, growing.

Compute for transcoding:

Modern hardware transcodes 1 minute of 1080p in ~30 s on a CPU box, near-real-time on a GPU.
5 M videos × 5 minutes = 25 M video-minutes/day. At 0.5 minutes of CPU per video-minute on GPU = 12.5 M GPU-minutes/day = ~210 K GPU-hours/day.
At a GPU’s nominal throughput, that’s ~9 K GPUs running 24/7 just for transcoding. (Plus heavy compression / codec experiments, plus thumbnail generation, plus content-ID scans — the real number is multiples of this.)

The one number that matters: storage growth (~1 PB/day) and transcode compute (~10 K GPUs). The shape of the architecture follows: object-store-tier for the bytes, dedicated transcode fleet behind a queue, CDN with regional fill from origin. The 60 uploads/sec by itself is unimpressive; the byte volume is the whole problem.

Example 3 — WhatsApp messaging#

Assumption set:

MAU: 2 B, DAU: ~1.5 B.
Messages per active user per day: 40 (1:1 chats + group chatter, heavy outliers averaged in).
Group ratio: ~2.5x fanout multiplier (average group is small but groups are common).
Bytes per message: 100 B for text (encrypted overhead + envelope) + small fraction with media. Assume 200 B average wire size for the message-bus payload.

Message rate:

Messages/day: 1.5 B × 40 = 60 B/day.
Average: 60 B ÷ 86 400 = ~700 K msgs/sec, peak ~3 M/sec.
Deliveries after group fanout: 60 B × 2.5 = 150 B/day → ~1.7 M deliveries/sec average, ~7 M/sec peak.

Bandwidth (wire):

1.7 M × 200 B = 340 MB/sec average small-message traffic. Trivial in absolute terms; the cost lives in the connection density, not the byte count.

Concurrent connections:

DAU 1.5 B; reasonable concurrency assumption is ~25% online at peak → ~400 M concurrent sockets.
At, say, 100 K sockets per gateway box, that’s 4 000 gateway hosts at peak. The single biggest line item in WhatsApp’s historic infrastructure stories — connection density, not message rate, drove the design.

Push notifications:

Roughly 50% of delivered messages reach an offline recipient → ~850 K pushes/sec average.
APNs and FCM dispatch is the actual rate-limited resource; you negotiate per-app quotas to handle the peak.

Media:

5% of messages have media at ~200 KB average → 1.7 M × 5% × 200 KB = 17 GB/sec of media bandwidth. Goes through a separate CDN with end-to-end encryption.

The one number that matters: 400 M concurrent sockets. The whole architecture (stateful gateway tier, kernel tuning for C10M-per-box, separate ephemeral signal bus from durable storage) follows from this single number. If the design started from “700 K msgs/sec” you’d build it wrong — the message rate alone would suggest you could run it on a normal app tier.

Example 4 — A web search index#

Assumption set:

Indexable web: ~50 B pages (crawlable, deduplicated).
Average page size: ~100 KB raw HTML, ~10 KB compressed text after extraction.
Tokens per page: ~1 000 unique terms.
Index posting list: per (term, doc) entry ~16 B (doc_id 8 B + positions delta-compressed to ~8 B amortized).
Refresh: 1 B pages re-crawled per day (high-priority slice); long tail much less often.

Crawl bandwidth:

1 B pages/day × 100 KB = 100 TB/day raw → ~1 GB/sec ingress, peaks ~3 GB/sec.
Polite crawl across the open web; per-domain rate-limited.

Storage:

Full corpus raw HTML: 50 B × 100 KB = 5 EB if kept verbatim. Most of it isn’t — compressed text + stripped boilerplate is more like 50 B × 10 KB = 500 PB. Still enormous, lives in cold storage with hot tier for recent and high-priority docs.
Forward index (doc → terms): 50 B × ~10 KB compressed = ~500 PB. Effectively the same as the compressed text store.
Inverted index (term → [doc_ids]): each page contributes ~1 000 (term, doc) tuples; 50 B × 1 000 × 16 B = 800 TB compressed inverted index. Fits in a sharded fleet with replication; the inverted index is the hot serving structure.

Query QPS:

Public-disclosure-ish: ~5 B searches/day → ~60 K searches/sec average, ~200 K/sec peak.
Each query touches a handful of shards. With the index spread across, say, 1 000 shards via term-hashing or document-partitioning, each shard serves ~200 qps average — comfortable per box.

Per-query work:

Top-K retrieval across shards in parallel; merge top-K at the root. Each shard scans ~100 K postings per term in budget; multi-term queries fan out to a few terms per shard.
Total compute per query: ~10–30 ms on a properly-tuned index, dominated by I/O when postings aren’t in RAM.

Index refresh:

1 B pages/day means the index has churn of ~2% per day. We use a tiered architecture: a slow base index (rebuilt weekly), a daily delta layer, and a real-time tier for news. Queries merge across all three at serve time.

The one number that matters: the inverted-index byte volume (~800 TB) combined with the query QPS (60 K/sec average). Together they force a sharded distributed index — you can’t serve from one box, and you can’t refresh by rebuilding monolithically. The tiered base + delta + real-time architecture falls out of those two numbers.

Variants and trade-offs#

Bottom-up estimation — start with user model, multiply through to system numbers. The default for interviews. Tractable for any system where DAU is known. Risk: if you pick the wrong user model the entire chain is wrong.

Top-down estimation — start with a known public number (e.g. “the company runs 5 EB of storage”) and reason backward to per-user assumptions. Useful as a sanity check for the bottom-up answer. Harder to do in real time because it requires recalled industry numbers.

Use both. Pick a user model, multiply through, then sanity-check the answer against any known industry figure you can recall (“Netflix is famously a third of global internet traffic, so X EB/year”). If your bottom-up answer is within 5x of the sanity check, ship it.

Two common errors and how to dodge them:

Forgetting peak factor. Average QPS is misleading; the system has to survive peaks. Default peak factor of ~3x for global products, ~5x for regional products, ~10x for events with hard time windows (flash sales, sports finals).
Conflating bandwidth with storage. Bandwidth is bytes/sec; storage is bytes accumulated. A system can have low bandwidth and huge storage (cold archive) or high bandwidth and small storage (chat). Run both calculations; the bottleneck is whichever one trips first.

When this is asked in interviews#

Always. Step 2 of the seven-step framework. The interviewer is grading two things: that you can do the math, and that you can decide which numbers matter. A common follow-up is “what would change if DAU went up 10x?” — that’s an invitation to point at the line item that breaks first.

A senior interviewer specifically watches for whether you derive your peak factor honestly (5x for global, 10x for spiky) or whether you guess. Guessing is fine if you say so out loud and pick a defensible number.

Summary#

Why it matters#

How it works#

Example 1 — Twitter feed reads#

Example 2 — YouTube ingest#

Example 3 — WhatsApp messaging#

Example 4 — A web search index#

Variants and trade-offs#

When this is asked in interviews#

Related concepts#