System Design Workbook
61 topics across foundations, non-functional requirements, building blocks, full system designs, and public postmortems. Every system uses the same 7-step interview walk-through; every building block has a consistent design template.
Foundations
8 items
The mental model and shared vocabulary every system-design interview assumes you already have. Frameworks, abstractions, consistency, failure modes, capacity math.
- The 7-Step System-Design Walk-Through
A repeatable interview framework: clarify, estimate, contract, high-level, data, detail, evaluate. Read this first; every system writeup on this site follows it.
Concept Foundational
- Abstractions in Distributed Systems
What an interviewer means when they say 'service', 'node', 'cluster': the unit boundaries that shape every design.
Concept Foundational
- Remote Procedure Calls (RPC)
Function-call semantics layered over the network: gRPC, Thrift, REST-as-RPC, and which network realities each one hides and which leak through.
Concept Foundational
- Consistency Models
From strong to eventual: linearizability, sequential, causal, monotonic, eventual; what each costs and what each buys.
Concept Intermediate
- Failure Models
Crash-stop, crash-recovery, omission, Byzantine: the assumption stack under every fault-tolerance claim.
Concept Intermediate
- Capacity Estimation Cheatsheet
Latencies, throughputs, sizes, and conversion shortcuts to keep in your head for back-of-envelope math.
Concept Foundational
- Resource Estimation: Worked Examples
Four end-to-end estimation walk-throughs (Twitter, YouTube, WhatsApp, a search index), with the math anchored to the Foundations cheatsheet.
Concept Foundational
- Interview Frameworks Compared
8-step, 4-step, SNAKE, PACELC-flavored: what each emphasizes and which to anchor on under pressure.
Concept Foundational
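One of the conversion shortcuts the Capacity Estimation Cheatsheet entry alludes to is turning a daily request count into requests per second. The sketch below is a minimal illustration, not taken from the workbook: the function names and the `peak_factor` multiplier are assumptions, and the input figure in the usage lines is a made-up example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_qps(requests_per_day: float) -> float:
    """Convert a daily request count into average requests per second."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day: float, peak_factor: float = 2.0) -> float:
    """Scale the average by an assumed peak-to-average ratio.

    peak_factor is a hypothetical planning multiplier, not a measured value.
    """
    return average_qps(requests_per_day) * peak_factor


if __name__ == "__main__":
    daily = 1_000_000_000  # hypothetical traffic figure, for illustration only
    print(f"average: {average_qps(daily):,.0f} QPS")
    print(f"peak:    {peak_qps(daily):,.0f} QPS")
```

In back-of-envelope work, 86,400 is often rounded to 100,000 so the division can be done mentally; the result is then a slight underestimate.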
Non-Functional Requirements
6 items
Availability, reliability, scalability, maintainability, fault tolerance: the dimensions interviewers grade you on without saying so explicitly. NFRs convert vague hand-waves into numbers.
- Availability
Nines as a budget, redundancy strategies, failover modes, and the cost of each 9.
Concept Foundational
- Reliability
Probability of working under expected conditions; how it differs from availability and where the trade-offs live.
Concept Foundational
- Scalability
Vertical, horizontal, and elasticity. Why naive scaling stalls and which axis to pick first.
Concept Foundational
- Maintainability
The operational burden of every architectural decision: observability, deployability, on-call load.
Concept Intermediate
- Fault Tolerance
Designing for the worst expected failure and degrading gracefully past it.
Concept Intermediate
- NFRs in Interviews
Which non-functional requirements to surface, in what order, and how to convert vague asks into numbers.
Concept Foundational
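The "nines as a budget" framing in the Availability entry reduces to simple arithmetic: an availability target implies a fixed allowance of downtime per year. The following is a minimal sketch under the assumption of a 365-day year; the name `downtime_budget_minutes` is hypothetical and not something the workbook defines.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the allowed downtime per year, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes/year")
```

Each additional nine cuts the budget by a factor of ten, which is one way to turn a vague "highly available" ask into a number that can be argued about.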
Building Blocks
18 items
Reusable distributed-systems primitives: DNS, load balancers, caches, queues, search, monitoring. Every real-world design is a composition of these.
- Domain Name System (DNS)
Hierarchical name resolution, caching, TTL trade-offs, and DNS as a load-balancing primitive.
Building Block Foundational
- Load Balancers
L4 vs L7, global vs local, algorithms (round-robin, least-connections, consistent-hash), placement tiers.
Building Block Foundational
- Databases
Relational vs document vs wide-column vs graph: when each shape fits, and the trade-off triangle.
Building Block Foundational
- Key-Value Store
Consistent-hash ring, replication factor, versioning (vector clocks), failure detection. Dynamo-style design.
Building Block Intermediate
- Content Delivery Network (CDN)
Edge caching, origin shielding, push vs pull, cache invalidation, signed URLs.
Building Block Intermediate
- Sequencer
Globally ordered unique IDs with causality. Snowflake, TrueTime, hybrid logical clocks.
Building Block Advanced
- Distributed Monitoring
Metrics, logs, traces: the three pillars and the data structures that scale each.
Building Block Intermediate
- Server-Side Error Monitoring
Real-time error capture, deduplication, alerting, blast-radius scoping.
Building Block Intermediate
- Client-Side Error Monitoring
Browser and mobile error capture, sampling, PII scrubbing, beacon transport.
Building Block Intermediate
- Distributed Cache
Memcached vs Redis, sharding, eviction policies, replication, stampede protection.
Building Block Intermediate
- Distributed Messaging Queue
FIFO ordering, at-least-once vs exactly-once delivery, partitions, consumer groups, dead-letter queues.
Building Block Intermediate
- Publish / Subscribe
Topic-based fan-out, ordering guarantees, filtering, retention, and the gap between pub-sub and queues.
Building Block Intermediate
- Rate Limiter
Cap request rates per client to protect downstream services. Token bucket vs leaky bucket vs sliding window, with the gotchas of distributed coordination.
Building Block Intermediate
- Blob Store
Object storage with metadata indexing: chunking, replication, lifecycle, multipart uploads.
Building Block Intermediate
- Distributed Search
Inverted indexes, sharded indexing, replication, query fan-out, ranking pipelines.
Building Block Advanced
- Distributed Logging
Log shipping, structured fields, aggregation, retention tiers, search-on-logs vs metrics.
Building Block Intermediate
- Distributed Task Scheduler
Priority, idempotency, deduplication, retry policies, resource capacity allocation.
Building Block Advanced
- Sharded Counters
Decompose a hot counter into N shards and aggregate on read, the canonical fix for write hotspots.
Building Block Intermediate
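The Sharded Counters entry describes splitting a hot counter into N shards and aggregating on read. The sketch below illustrates the idea in a single process, using one lock per shard as a stand-in for what would be separate rows or keys in a real datastore. The class name, the random shard choice, and the default shard count are all assumptions made for illustration.

```python
import random
import threading


class ShardedCounter:
    """In-process illustration: writes hit one random shard, reads sum all shards."""

    def __init__(self, num_shards: int = 8) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self._shards = [0] * num_shards
        # One lock per shard, so concurrent writers rarely contend with each other.
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, amount: int = 1) -> None:
        """Add to a single randomly chosen shard."""
        index = random.randrange(len(self._shards))
        with self._locks[index]:
            self._shards[index] += amount

    def value(self) -> int:
        """Aggregate on read by summing every shard.

        Shards are read one at a time, so under concurrent writes the result
        is not a point-in-time snapshot.
        """
        total = 0
        for index, lock in enumerate(self._locks):
            with lock:
                total += self._shards[index]
        return total


if __name__ == "__main__":
    counter = ShardedCounter(num_shards=4)
    for _ in range(1000):
        counter.increment()
    print(counter.value())
```

The trade-off is visible in the two methods: writes touch one shard and stay cheap, while every read pays for a scan over all N shards.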
System Designs
25 items
Full end-to-end designs of real-world products: YouTube, Uber, Twitter, ChatGPT, and more. Each writeup walks through the 7-step framework with one product as the worked example.
- URL Shortener
Map long URLs to short ones, redirect in O(1), survive billions of clicks. The canonical first system to whiteboard: sharp scope, real scale, every primitive in play.
System Foundational
- Twitter Newsfeed
Generate a personalized timeline at read time, write time, or both. The pull / push / hybrid trade-off and the celebrity-fanout problem.
System Intermediate
- YouTube
Video upload, encoding pipeline, CDN-backed delivery, watch-history, recommendations.
System Advanced
- TikTok
Short-form video at hyperscale. The for-you ranking pipeline is the design: recommendation, signal collection, and freshness vs personalization trade-offs.
System Advanced
- Instagram
Photo upload, feed generation, story expiration, following graph.
System Intermediate
- WhatsApp
End-to-end-encrypted messaging at scale: presence, delivery receipts, group chats, media.
System Intermediate
- Facebook Messenger
Real-time messaging with rich threads, reactions, typing indicators, and read receipts. WhatsApp's cousin with web-first reach and a different presence model.
System Intermediate
- Uber
Driver / rider matching, real-time location, surge pricing, payments, fraud detection.
System Advanced
- Google Maps
Map tiles, routing on a road graph, ETA prediction, real-time traffic ingestion.
System Advanced
- Proximity Service (Yelp)
Geo-indexing, dynamic segments, search-within-radius queries at city scale.
System Intermediate
- Quora
Q&A platform: ranking answers, follow graph, notification fan-out, search.
System Intermediate
- Generic Newsfeed System
Pluggable feed engine: ranking, freshness, deduplication, infinite scroll, caching strategy.
System Intermediate
- Typeahead Suggestion
Real-time prefix matching: trie indexes, ranking, server-side throttling, personalization.
System Intermediate
- Web Crawler
URL frontier, politeness, deduplication, content extraction, crawl traps, index updates.
System Advanced
- Google Docs (Collaborative Editing)
Operational transforms vs CRDTs, presence, conflict resolution, offline edits.
System Advanced
- Code Deployment System
Pipelines, artifact stores, environment promotion, canary / blue-green / rolling, rollback.
System Intermediate
- Payment System
Idempotency, double-entry ledgers, reconciliation, gateway integration, fraud signals.
System Advanced
- ChatGPT-style Conversational System
Streaming inference, KV-cache reuse, request routing, safety filters, multi-tenant GPU scheduling.
System Advanced
- AI / ML Data Infrastructure
Feature stores, training data pipelines, online vs batch features, lineage, vector storage.
System Advanced
- LLM-Powered Customer Support Bot
RAG over knowledge bases, conversation memory, escalation handoff, guardrails.
System Intermediate
- AI-Powered Code Assistant
Latency-sensitive completion, repo-aware context, indexing strategies, evaluation harness.
System Advanced
- Food Delivery (Uber Eats)
Three-sided marketplace: restaurants, couriers, customers. Order routing, ETA, courier dispatch.
System Advanced
- Dropbox (File Sync)
Client-side chunked replication, conflict resolution across devices, delta sync, and the LAN sync trick. Canonical 'consumer cloud storage' design.
System Advanced
- Ticketmaster (Flash Sale)
Inventory reservation under coordinated burst load: waiting rooms, holds, atomic seat allocation, and the bot-vs-fan arms race.
System Advanced
- Spotify Wrapped (Batch Analytics)
Annual per-user retrospective from a year of plays. Massively parallel batch pipeline, per-user partitioning, and the once-a-year cost shape.
System Intermediate
Postmortems
4 items
Public incident postmortems from real outages at Facebook, AWS, Cloudflare, and others. The teacher you actually learn from is the system that broke last week.
- Facebook / WhatsApp / Instagram: 2021 BGP outage
A routine BGP maintenance command withdrew Facebook from the internet. Six hours of global blackout; internal tools and badge readers went down with it, locking engineers out of the buildings that hosted the fix.
Postmortem Foundational
- AWS Kinesis: 2020 us-east-1 outage
Thread-limit exhaustion cascaded across services that depended on Kinesis for control-plane operations.
Postmortem Intermediate
- AWS us-east-1: repeated cascade failures
Why one region keeps taking down half the internet, and what 'control plane in one region' really costs.
Postmortem Intermediate
- Cloudflare: 2019 regex catastrophic backtracking
A single WAF rule with exponential regex backtracking burned 100% CPU across every edge node simultaneously.
Postmortem Foundational