Quora — System Design · Engineering Playbook

Step 1 — Clarify Requirements

Functional

A user asks a question; others answer; readers upvote / downvote answers.
A question page shows answers ranked by quality.
Users follow topics and other users; a personalized feed surfaces relevant new questions and answers.
Search across questions and answers.
Notifications when followed content gets new answers, when a user is mentioned, and when an answer gets significant upvotes.
Out of scope: monetization, moderation tooling, and anonymous answers (worth considering as a variant).

Non-functional

99.95% availability.
p99 question-page load under 400 ms.
300 M MAU; ~50 M questions; ~200 M answers; ~50 K writes/sec peak (votes are the dominant write).
Strong consistency on answer authorship and vote ownership; eventual consistency on aggregate vote counts and feed rankings.

Step 2 — Capacity Estimation

Question reads: 300 M MAU × ~20 question pages/day (an upper bound that treats every monthly user as active daily) = 6 B reads/day ≈ 70 K reads/sec average, ~300 K/sec peak.
Writes: votes (50 K/sec peak), new answers (~50/sec), new questions (~5/sec).
Storage: 50 M questions × 1 KB + 200 M answers × 3 KB = 50 + 600 GB = 650 GB of text content. Tiny. Indexes, vote logs, and edit history dominate the total — ~10 TB.
Search index: full-text inverted index over 250 M docs, ~50 GB on disk per shard.
Notifications: ~1 M notifications/sec at peak (mostly aggregated into digests, not delivered individually).

The system is read-heavy and search-heavy. The interesting parts are ranking, search, and fan-out.
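The estimates above can be checked with a few lines of arithmetic. This is only a sketch: the inputs are the figures quoted in this step, the variable names are illustrative, and GB is taken as decimal (10^6 KB).

```python
# Back-of-envelope check of the Step 2 figures.
# Applying 20 pages/day to every monthly user makes this an upper bound.
SECONDS_PER_DAY = 86_400

mau = 300_000_000
pages_per_user_per_day = 20
reads_per_day = mau * pages_per_user_per_day
avg_reads_per_sec = reads_per_day / SECONDS_PER_DAY

questions, answers = 50_000_000, 200_000_000
question_kb, answer_kb = 1, 3
# KB -> GB using decimal units (1 GB = 1,000,000 KB)
text_gb = (questions * question_kb + answers * answer_kb) / 1_000_000

peak_reads_per_sec = 300_000
peak_to_avg = peak_reads_per_sec / avg_reads_per_sec

print(f"reads/day:       {reads_per_day:,}")
print(f"avg reads/sec:   {avg_reads_per_sec:,.0f}")
print(f"peak/avg ratio:  {peak_to_avg:.1f}")
print(f"text storage GB: {text_gb:,.0f}")
```

The quoted 300 K/sec peak works out to a little over 4× the average read rate.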

Step 3 — System Interface

POST /questions                    { title, body, topics: [...] }
POST /answers                      { question_id, body }
POST /votes                        { answer_id, value: +1|-1 }
POST /follow                       { type: 'user'|'topic'|'question', target_id }

GET  /questions/:id                (title + answers, paginated by ranking)
GET  /search?q=...                 (across questions and answers)
GET  /feed?cursor=...              (personalized)

GET  /notifications                (paginated, with read/unread)

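Voting and follow endpoints are idempotent on (user_id, target_id): a stale retry must not double-vote. Question and answer posts have no natural key of that shape, so deduplicating their retries needs something extra, for example a client-supplied idempotency key.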
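A minimal sketch of the vote idempotency rule, assuming an in-memory dict in place of the real store. `VoteStore` and `cast` are hypothetical names. The idea is that the write returns the change to apply to the aggregate count, so a replayed request contributes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class VoteStore:
    """In-memory stand-in for a votes table keyed by (user_id, answer_id)."""
    votes: dict = field(default_factory=dict)

    def cast(self, user_id: str, answer_id: str, value: int) -> int:
        """Record a vote and return the delta to apply to the net count.

        A retry with the same value returns 0, so replaying the request
        never double-counts. Flipping +1 to -1 returns -2.
        """
        if value not in (1, -1):
            raise ValueError("value must be +1 or -1")
        key = (user_id, answer_id)
        previous = self.votes.get(key, 0)
        self.votes[key] = value
        return value - previous


store = VoteStore()
first = store.cast("u1", "a1", +1)   # first vote
retry = store.cast("u1", "a1", +1)   # stale retry of the same request
flip = store.cast("u1", "a1", -1)    # user changes their mind
print(first, retry, flip)
```

In a relational store the same effect would typically come from an upsert on the (user_id, answer_id) primary key.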

Step 4 — High-Level Design

                                                ┌── search index (sharded)
                                                │
client → LB → API ──┬── /questions/:id ──→ question service ──→ relational store (Postgres, sharded)
                    │                              │                    │
                    │                              ▼                    ▼
                    │                       ranking cache         vote counters (sharded)
                    │                       (Redis ZSET)
                    │
                    ├── /votes ─→ vote service ─→ Postgres + sharded counters
                    │                  │
                    │                  └→ async: rerank affected answer
                    │
                    ├── /feed ─→ feed service ─→ feed cache (Redis ZSET per user)
                    │                ▲
                    │                │
                    └── /search ─→ search service (Elasticsearch / Vespa)
                                     │
                                     └─ async: questions / answers indexed via Kafka

   Writes also fan out to: search index, notifications, feed personalization model.

Step 5 — Data Model

Questions (Postgres, sharded by question_id):

table questions
  question_id    uuid     PK
  title          string
  body           text
  topic_ids      array<uuid>
  asker_id       uuid
  created_at     timestamp
  view_count     bigint   // async
  answer_count   int      // async

Answers:

table answers
  answer_id     uuid     PK
  question_id   uuid
  author_id     uuid
  body          text
  created_at    timestamp
  net_votes     int      // sharded counter; periodically rolled up
  rank_score    float    // precomputed for fast question-page rendering

Votes (one row per (user_id, answer_id); a changed vote overwrites the row, so retries are idempotent):

table votes
  user_id    uuid
  answer_id  uuid
  value      int    // +1 or -1
  ts         timestamp
  PK (user_id, answer_id)

Follow graph:

table follows
  follower_id    uuid
  target_type    enum(user, topic, question)
  target_id      uuid
  PK (follower_id, target_type, target_id)

Ranking cache (Redis ZSET per question, score = rank_score):

key: q:{question_id}:answers → ZSET of answer_ids

Step 6 — Detailed Design

Answer ranking

The question page shows the “best” answer first. The classic naive metric (raw vote count) is dominated by old answers that have accumulated votes over years. Better:

score(answer) = (net_votes + α) / (hours_since_post + β)^γ        // Hacker-News-like
        + author_credibility_bonus
        + asker_acceptance_bonus
        + personalization_term(viewer, answer)

The first three terms are global; personalization is computed per viewer at read time. Quora-style ranking adds a heavy ML ranker on top (a BERT-class model scoring each question × answer pair for topic relevance), hedged so that simple, high-quality answers still surface.
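The split between global and per-viewer terms can be sketched as two functions. The constants `alpha`, `beta`, and `gamma` below are placeholder values chosen for illustration, not tuned parameters, and the bonus and personalization terms are passed in as plain numbers because the text does not define how they are computed.

```python
def global_score(net_votes, hours_since_post, *, alpha=1.0, beta=2.0, gamma=1.8,
                 author_bonus=0.0, acceptance_bonus=0.0):
    """Viewer-independent part of the score; safe to precompute and cache."""
    decayed_votes = (net_votes + alpha) / (hours_since_post + beta) ** gamma
    return decayed_votes + author_bonus + acceptance_bonus


def final_score(cached_global_score, personalization_term=0.0):
    """Read-time step: add the per-viewer term to the cached global score."""
    return cached_global_score + personalization_term


# A three-year-old answer with 5,000 net votes vs. a 10-hour-old one with 40.
old_answer = global_score(net_votes=5000, hours_since_post=3 * 365 * 24)
new_answer = global_score(net_votes=40, hours_since_post=10)
print(f"old: {old_answer:.6f}  new: {new_answer:.6f}")
```

With these placeholder constants the recent answer outranks the old one, which is the behavior the time-decay denominator is meant to produce.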

The pre-ranked ZSET is invalidated on:

Vote change on any of the question’s answers (debounced; recompute every ~30 s).
New answer posted on the question.
Periodic decay refresh (every ~1 hour, because the score depends on age).
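The debounce on vote-triggered recomputes could be sketched as follows. `RerankDebouncer` is a hypothetical helper and the clock is passed in explicitly to keep the example deterministic; a real worker would read the time itself and likely keep this state in a shared store.

```python
class RerankDebouncer:
    """Allow at most one recompute per question per min_interval_s seconds."""

    def __init__(self, min_interval_s=30.0):
        self.min_interval_s = min_interval_s
        self._last_run = {}   # question_id -> time of last recompute
        self._dirty = set()   # questions with vote changes since last recompute

    def mark_dirty(self, question_id):
        """Called on every vote; cheap, does not trigger work by itself."""
        self._dirty.add(question_id)

    def due(self, now):
        """Return dirty questions whose interval has elapsed and mark them run."""
        ready = [q for q in self._dirty
                 if now - self._last_run.get(q, float("-inf")) >= self.min_interval_s]
        for q in ready:
            self._dirty.discard(q)
            self._last_run[q] = now
        return sorted(ready)


debouncer = RerankDebouncer(min_interval_s=30)
debouncer.mark_dirty("q1")
first_pass = debouncer.due(now=0)     # nothing ran before, so q1 is due
debouncer.mark_dirty("q1")            # more votes arrive
too_soon = debouncer.due(now=10)      # inside the 30 s window
after_wait = debouncer.due(now=30)    # window elapsed
print(first_pass, too_soon, after_wait)
```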

Votes at scale

A viral answer gets 10 K upvotes/minute. The net_votes field can’t be a single row — it’s a hot key. Implementation:

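Each vote upserts a row in votes keyed by (user_id, answer_id), so a retry is a no-op and a changed vote replaces the old value.
One shard of a sharded counter (net_votes:answer:{id}:shard:{N}) is incremented, with N picked per write so that load spreads across keys.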
Periodically, the counter rolls up to the answer’s net_votes summary and triggers a re-rank.

See /system-design/sharded-counters for the pattern.
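As a rough in-process illustration of the counter steps above, assuming a Python list in place of the per-shard keys (`ShardedCounter` is a hypothetical name, and the shard count of 16 is arbitrary):

```python
import random


class ShardedCounter:
    """Spread increments over several slots; sum them on roll-up."""

    def __init__(self, num_shards=16, rng=None):
        self.shards = [0] * num_shards
        self._rng = rng or random.Random()

    def add(self, delta):
        # Pick a random shard so concurrent writers rarely hit the same key.
        self.shards[self._rng.randrange(len(self.shards))] += delta

    def rollup(self):
        # Periodic job: the sum is what gets written to the answer's net_votes.
        return sum(self.shards)


counter = ShardedCounter(num_shards=8, rng=random.Random(0))
for _ in range(1000):
    counter.add(+1)
for _ in range(100):
    counter.add(-1)
net_votes = counter.rollup()
print(net_votes)
```

Reads between roll-ups see a slightly stale net_votes, which matches the eventual-consistency requirement on aggregate vote counts.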

Personalized feed

The feed surfaces:

Recent questions in topics the user follows.
Recent answers from users the user follows.
Trending in topics with high user affinity.
Editorial picks (“Best of Quora today”).

Implementation is a per-user Redis ZSET seeded by a personalization model (offline-batched) and topped up by streaming fan-out from new content. Same hybrid push/pull pattern as /system-design/twitter-newsfeed:

when a new question is created in topic T:
   for each follower of T:
       ZADD feed:{follower} score new_question_id
       ZREMRANGEBYRANK feed:{follower} 0 -501    // trim to the top ~500 by score

For high-volume topics (e.g., “Programming” with millions of followers), pushing to every follower is unaffordable, so those topics are pulled on read instead.
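The push/pull split can be sketched in memory. Everything here is an assumption made for illustration: the `HybridFeed` name, the `push_limit` threshold, the use of the creation timestamp as the score, and plain lists in place of Redis ZSETs.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    """Push to followers of small topics; pull high-fanout topics at read time."""

    def __init__(self, followers, push_limit=1000, feed_max=500):
        self.followers = followers              # topic -> set of follower ids
        self.push_limit = push_limit            # above this, do not fan out
        self.feed_max = feed_max
        self.pushed = defaultdict(list)         # user -> [(ts, item_id)]
        self.topic_recent = defaultdict(list)   # big topic -> [(ts, item_id)]

    def publish(self, topic, item_id, ts):
        fans = self.followers.get(topic, set())
        if len(fans) > self.push_limit:
            # Too many followers to push to: keep one per-topic list instead.
            self.topic_recent[topic] = heapq.nlargest(
                self.feed_max, self.topic_recent[topic] + [(ts, item_id)])
            return
        for user in fans:
            self.pushed[user] = heapq.nlargest(
                self.feed_max, self.pushed[user] + [(ts, item_id)])

    def read(self, user, limit=20):
        candidates = list(self.pushed.get(user, []))
        # Demo-only scan over all topics; a real service would look up the
        # user's followed topics directly.
        for topic, fans in self.followers.items():
            if user in fans and len(fans) > self.push_limit:
                candidates += self.topic_recent.get(topic, [])
        return [item for _, item in heapq.nlargest(limit, candidates)]


followers = {"rust": {"u1", "u2"}, "programming": {"u1", "u2", "u3", "u4"}}
feed = HybridFeed(followers, push_limit=3)
feed.publish("rust", "q1", ts=1)          # small topic: pushed to u1 and u2
feed.publish("programming", "q2", ts=2)   # large topic: stored once, pulled on read
print(feed.read("u1"), feed.read("u3"))
```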

Search

Inverted index over questions and answers, sharded by document hash. Query path:

search "how to learn rust"
  → tokenize, expand synonyms, disambiguate senses (Rust the language vs. rust on metal)
  → query each shard, fetch top-K from each
  → global merge by score (BM25 + recency + popularity + ML reranker)
  → return top 20

See /system-design/distributed-search for the substrate. Quora’s twist is the ML reranker on top, which often re-orders based on question-question similarity.
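The scatter-gather step of the query path can be illustrated with a toy scorer. This is a sketch only: the term-overlap score stands in for BM25 and the other signals, and the shard contents, function names, and document ids are invented for the example.

```python
import heapq


def search_shard(shard, query_terms, k):
    """Return this shard's top-k hits as (score, doc_id) pairs."""
    scored = []
    for doc_id, text in shard.items():
        words = set(text.lower().split())
        score = sum(1 for term in query_terms if term in words)  # toy relevance
        if score:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def search(shards, query, k=20):
    """Query every shard for its local top-k, then merge into a global top-k."""
    terms = query.lower().split()
    partials = [search_shard(shard, terms, k) for shard in shards]
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


shards = [
    {"d1": "how to learn rust fast", "d2": "removing rust from metal"},
    {"d3": "learn rust in a weekend", "d4": "python tips"},
]
results = search(shards, "how to learn rust", k=2)
print(results)
```

Asking each shard for its own top K is enough to guarantee the global top K is present in the merge, which is why the coordinator never needs full per-shard result lists.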

Notifications

When something interesting happens, push a record to the recipient’s notification queue:

event: "Alice answered a question you follow"
event: "Your answer reached 100 upvotes"
event: "Bob mentioned you in an answer"

Notifications are aggregated into digests (per-hour, per-day) for users who don’t want every ping. The notification service is a fan-out engine with rate limits per user.
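Digest aggregation amounts to bucketing events by recipient and time window. A minimal sketch, assuming events arrive as (recipient, unix_ts, message) tuples and an hourly window; `build_digests` is a hypothetical helper and per-user rate limiting is left out.

```python
from collections import defaultdict


def build_digests(events, window_s=3600):
    """Group events into digests keyed by (recipient, window index)."""
    buckets = defaultdict(list)
    for recipient, ts, message in events:
        buckets[(recipient, ts // window_s)].append(message)
    return dict(buckets)


events = [
    ("carol", 10, "Alice answered a question you follow"),
    ("carol", 3599, "Your answer reached 100 upvotes"),
    ("carol", 3600, "Bob mentioned you in an answer"),
    ("dave", 20, "Alice answered a question you follow"),
]
digests = build_digests(events)
for (recipient, window), messages in sorted(digests.items()):
    print(recipient, window, len(messages))
```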

Question page latency budget (target 400 ms p99)

LB + TLS:                          15 ms
Auth:                               5 ms
Question fetch (Postgres):         20 ms
Answer list (Redis ZSET):           3 ms
Hydrate top-5 answer bodies:       30 ms
Personalization rerank (top-5):    20 ms
Vote-state for viewer (Redis):      5 ms
Comments first page:               20 ms
Serialize + network back:          80 ms
                          total:  ~200 ms p99 (server)
                                  + image/font load happens in parallel
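Summing the line items confirms the total and shows the headroom against the 400 ms target. The dictionary keys are shorthand for the rows above. Adding per-stage p99 values overstates the true end-to-end p99, so the result is a conservative figure.

```python
budget_ms = {
    "lb_tls": 15,
    "auth": 5,
    "question_fetch": 20,
    "answer_list": 3,
    "hydrate_top5": 30,
    "personalization_rerank": 20,
    "viewer_vote_state": 5,
    "comments_first_page": 20,
    "serialize_network": 80,
}
target_ms = 400

total_ms = sum(budget_ms.values())
headroom_ms = target_ms - total_ms
print(f"total: {total_ms} ms, headroom: {headroom_ms} ms")
```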

Step 7 — Evaluation & Trade-offs

Bottleneck #1: search index update lag. A new answer should be findable within a minute. Async indexing via Kafka means the index can lag during spikes. A fallback for fresh content: query the relational store directly for very recent answers and merge into search results.
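One way the fresh-content fallback might merge results, assuming both sources return (score, doc_id) pairs on a comparable scale, which is a strong assumption in practice. `merge_with_fresh` is a hypothetical helper.

```python
def merge_with_fresh(index_hits, fresh_hits, limit=20):
    """Union two hit lists, keep the best score per doc, return ranked doc ids."""
    best = {}
    for score, doc_id in index_hits + fresh_hits:
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Highest score first; doc_id breaks ties so the order is deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


index_hits = [(3.0, "a1"), (1.0, "a2")]   # from the (possibly lagging) index
fresh_hits = [(2.0, "a9"), (1.5, "a2")]   # recent rows from the relational store
merged = merge_with_fresh(index_hits, fresh_hits)
print(merged)
```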

Bottleneck #2: ranking recompute storms. Left unchecked, a high-traffic question with hundreds of answers is re-ranked on every vote. Debounce per question (recompute at most once every 30 s), and have the recompute read each answer's rolled-up net_votes, a single aggregated value, rather than fanning out to ~200 sub-counters.

Bottleneck #3: feed personalization cost. Online ranking on every feed load is expensive (200+ candidates × model inference). Precompute the per-user candidate pool offline; only the final rescoring is online. A user with no recent activity gets a generic feed.

Alternative I’d push back on: storing the vote count as a column in the answer row and updating it on every vote. A celebrity Q&A would turn that row into a lock-contention hotspot. Always use a sharded counter for write-heavy aggregates.

What breaks first at 10× scale (3 B MAU): the search index. Already large at present scale; at 10× we’d need shard counts in the hundreds, with cross-shard merge becoming the dominant query cost. Pre-partition the index by topic so queries scope to relevant shards by default.

Companies this resembles

Quora, Stack Overflow (heavier moderation, lighter personalization), Reddit (community-scoped, voting-first), Hacker News (single global feed, simpler ranking).

Generic Newsfeed System — abstraction of the personalized-feed component.
Twitter Newsfeed — same hybrid fan-out pattern for follows.
Distributed Search — substrate for question / answer search.
Typeahead Suggestion — autocomplete as the user types a question.