Failures & Postmortems

Public API postmortems — Knight Capital 2012, AWS S3 2017, Facebook + Uber outages. What the contract didn't enforce.

Every senior API designer should be able to walk through two or three public API postmortems. They teach what fails in production and what the contract didn't say. The four writeups included here cover three different root causes: a software deploy gone wrong (Knight Capital), a typo in a maintenance command (AWS S3), and cascading dependency failures (Facebook, Uber).

Read the public postmortems linked from each writeup. The writeups here are summaries with the API-design lesson surfaced — what the contract should have enforced, what monitoring should have caught, what guardrails would have prevented the cascade.

Key concepts

  • Most public API failures trace to a deploy or a config change — the API itself rarely fails on its own
  • The five patterns: deploy bug, capacity, contract drift, dependency outage, cascading retries
  • Public postmortems are the gold standard of operational learning
  • Out-of-band access matters when your normal access requires the API you're fixing
  • Cascading failure is the dominant production failure mode; design for blast radius
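The "cascading retries" pattern is easier to reason about with something concrete. The sketch below is one common client-side mitigation, not a method taken from any of the postmortems: capped exponential backoff with full jitter, plus a retry budget that stops clients from multiplying load on a dependency that is already struggling. The names `backoff_delays` and `RetryBudget`, and the default values, are assumptions made for illustration.

```python
import random


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield the sleep (seconds) before each retry.

    Full jitter: each delay is a random point between 0 and a ceiling
    that doubles per attempt and is capped at `cap`.
    """
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


class RetryBudget:
    """Permit retries only while they stay under `ratio` of requests seen."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self):
        # Refuse the retry if it would push retries past the allowed share.
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True
```

The budget is the part that limits blast radius: backoff alone spreads retries out in time, while the budget caps how much extra traffic retries can add in total.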

Reference template

// Reading an API postmortem
1. What was the trigger?         (deploy? config? capacity? dependency?)
2. What broke first?             (which contract or guarantee was violated?)
3. What cascaded?                (and why)
4. What did recovery look like?  (rollback? cold start? manual intervention?)
5. What changed afterwards?      (the API contract? the deploy pipeline? the monitoring?)

Adapt to your problem; the structure is the load-bearing part.
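If you keep reading notes, the five questions map naturally onto a small record type. This is a minimal sketch under assumed names (`PostmortemNotes` and its fields are hypothetical, chosen to mirror the template); its only job is to make unanswered questions visible.

```python
from dataclasses import dataclass, fields


@dataclass
class PostmortemNotes:
    """One field per question in the reading template."""

    trigger: str = ""
    first_failure: str = ""
    cascade: str = ""
    recovery: str = ""
    changes_afterwards: str = ""

    def unanswered(self):
        """Return the names of questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example: only the trigger has been filled in so far.
notes = PostmortemNotes(trigger="typo in a maintenance command")
print(notes.unanswered())
```

A writeup that leaves `recovery` or `changes_afterwards` empty is usually the one worth rereading.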

Common pitfalls

  • Treating 'a feature flag revived dead code' as exotic — it's the canonical deploy bug
  • Underestimating the cost of a typo in a privileged command; the AWS S3 2017 outage started with exactly that
  • Assuming dependent services will degrade gracefully — they won't, by default
  • Optimising for the happy path and forgetting the recovery path
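The privileged-command pitfall suggests a guardrail that can be sketched in a few lines: validate the size of a destructive request before executing it. The example below is hypothetical and not any vendor's tooling; `check_removal`, `UnsafeRemoval`, and the 5% per-invocation limit are assumptions made for illustration.

```python
class UnsafeRemoval(Exception):
    """Raised when a capacity-removal request exceeds a safety limit."""


def check_removal(fleet_size, requested, min_capacity, max_percent=5):
    """Return `requested` if the removal is safe, otherwise raise.

    Two independent limits apply:
      - one invocation may remove at most `max_percent` of the fleet;
      - the fleet may never drop below `min_capacity` servers.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive number of servers")

    per_call_limit = fleet_size * max_percent // 100
    if requested > per_call_limit:
        raise UnsafeRemoval(
            f"{requested} exceeds the per-invocation limit of {per_call_limit}"
        )

    if fleet_size - requested < min_capacity:
        raise UnsafeRemoval(
            f"removal would leave {fleet_size - requested} servers, "
            f"below the minimum of {min_capacity}"
        )
    return requested
```

The point of the design is that a mistyped argument fails loudly at the command boundary instead of being carried out. An operator who truly needs a larger removal has to issue several bounded commands, which also slows the change enough for monitoring to react.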
