TikTok — System Design · Engineering Playbook

Step 1 — Clarify Requirements#

Functional

A user opens the app → the For You feed starts playing immediately. Vertical swipe advances to the next video, never running out.
Every video session emits signals (watch-time, completion, like, share, follow, skip, replay, “not interested”) that personalize the next session.
Uploads: short videos (3–180 s), transcoded, captioned, available globally within a few minutes.
Out of scope: live streaming, DMs, the creator-payout model, the parental-control surface.

Non-functional

99.99% availability for the feed. A blank For You is the only thing worse than the wrong For You.
p99 “time to first frame” under 600 ms on a warm app. p50 well under 200 ms.
1 B+ DAU, average session 50–60 min, 6–8 videos per minute → roughly 8 B–10 B video impressions/day.
Feed personalization must reflect signals from the same session — not next-day. The window we promise the recommender is closer to 30 seconds, not 30 minutes.

Out of scope explicitly: the moderation pipeline (separate writeup), the creator-side analytics dashboard, ads insertion (lives as a slot inside the feed but doesn’t change the ranking story).

Step 2 — Capacity Estimation#

Assume 1 B DAU. Session metrics from public disclosures and reasonable extrapolation:

Impressions: 1 B DAU × 8 videos/min × 60 min ≈ 480 B/day, or about 5.5 M impressions/sec average, peaking at 15 M/sec in the evening (US + EU overlap).
Recommendation QPS: each “swipe” requires a ranked candidate set. We don’t re-rank on every swipe — we prefetch 5–10 videos. So real recommender calls run at impressions / 8 ≈ 700 K/sec average, 2 M/sec peak.
Signal events (every play, pause, like, scrub): ~10 events per video × 5.5 M videos/sec = 55 M events/sec average, peaks near 150 M/sec. This is the dominant telemetry stream.
Uploads: 100 M creators upload roughly once/day → 1 200 uploads/sec average, 5 000/sec peak.
Video storage: 100 M new videos/day × ~5 MB average (post-transcode, multi-bitrate aggregate ~20 MB) = 500 TB/day raw, 2 PB/day after multi-bitrate ladder. Annualized: ~700 PB/year of cold-leaning content.
Egress bandwidth: 5.5 M streams/sec × ~1.5 Mbps average bitrate ≈ 1 Pbps sustained, peaks closer to 3 Pbps. This is what makes the CDN bill the largest line item.
Feature store reads: ranking needs ~200 features per (user, video) candidate × ~500 candidates per request × 700 K rps = 70 B feature lookups/sec. This is why a feature store is its own scaling problem.

Two takeaways shape the rest of the design. (1) The ranking pipeline, not the video pipeline, is the system. (2) Signal collection must be near-real-time — minutes-late personalization is detectable to users.

Step 3 — System Interface#

GET   /feed?session=<id>&cursor=<opaque>
        Returns: { videos: [{ id, manifest_url, preroll_features, ... }], next_cursor }
        Side effect: server commits an impression record per returned video.

POST  /event
        Body: { session, video_id, type, watch_ms, position_ms, ... }
        Returns: 202 (fire and forget; ordering preserved per session)

POST  /upload  (multipart, resumable via tus-style)
        Body: video bytes + metadata
        Returns: { video_id, status: 'processing' }

GET   /video/:id/manifest
        Returns: ABR manifest (HLS / DASH) pointing at CDN edges.

/feed returns 5–10 candidates at a time. The client autoplays one and pre-buffers two. /event is the chatty endpoint and is batched on-device every few hundred ms to amortize cost.

Step 4 — High-Level Design#

              ┌──────────────── signal bus (Kafka, 50M+ msgs/s) ──────────────┐
              │                                                                │
              ▼                                                                │
client → CDN edge ──→ feed gateway ──→ ranker ──→ feature store (online)       │
              ▲              │           │                                     │
              │              │           ├──→ candidate generator (ANN + heuristics)
              │              │           │
              │              └──→ session state (Redis)                        │
              │                                                                │
              └── HLS / DASH manifest ←── video CDN ←── packager ←── transcoder│
                                                                               │
                upload service → object store (S3/HDFS) ─→ transcode farm ─────┘
                                                          │
                                                          └─→ content embedding (multimodal model)
                                                                  │
                                                                  └→ ANN index (HNSW / IVF-PQ)

offline:  signal lake (S3 + Iceberg) → daily training jobs → new ranker model → A/B → rollout

Three coupled pipelines.

Serve path (hot, sub-second): edge → feed gateway → candidate gen → ranker → response. Streamed video comes from the CDN entirely separately; the API call is small JSON.
Signal path (warm, seconds): client batches events → ingest → Kafka → streaming feature updater → online feature store. The same Kafka stream tees to S3 for offline training.
Content path (cold, minutes): upload → transcode ladder → embed → add to ANN index → eligible for serving. New videos appear in candidates within 5–10 minutes for most creators; faster for trusted accounts.

The ranker is the load-bearing component. Everything else is plumbing that exists to feed the ranker the right signals fast enough.

Step 5 — Data Model#

Video metadata (sharded by video_id, KV store):

table videos
  video_id        uuid     PK
  creator_id      uuid
  uploaded_at     timestamp
  duration_ms     int
  hashtags        list<string>
  language        string
  audio_id        uuid           // shared audio tracks drive trends
  status          enum(processing, live, removed, restricted)
  embedding_ref   uuid           // pointer into ANN index

User profile (sharded by user_id):

table users
  user_id         uuid     PK
  long_term_emb   vector(128)    // refreshed daily from offline training
  short_term_emb  vector(64)     // refreshed on signal, last 30 minutes
  language_prefs  list<string>
  do_not_recommend list<creator_id>

Impressions (write-heavy, append-only, partition by (user_id, day)):

table impressions
  user_id      uuid
  video_id     uuid
  ts           timestamp
  watch_ms     int
  outcome      enum(skip, complete, replay, like, share, ...)
  rank_score   float    // what the model said
  actual_position int

The impressions table feeds dedup (“don’t show what I just saw”), offline training, and creator-side analytics. It’s the single largest store in the system after the videos themselves.

Online feature store (in-memory, distributed hash table layered over RocksDB):

user:{id}:short_term_features — last-N-watched topics, time spent, recent skips. TTL roughly 30 minutes.
video:{id}:counters — view count, like rate, completion rate. Updated by streaming aggregator every few seconds.
(user, creator) interaction features. Sparse; keyed by hashed pair.

Step 6 — Detailed Design#

Candidate generation#

For each /feed request the ranker needs ~500 candidate videos to score. Generating them naively (rank every video) is impossible — there are billions of eligible items. Three retrieval rails fan out in parallel:

candidate set =
  (200) two-tower ANN: user_embedding → kNN over video_embedding (HNSW, sharded by video_id)
+ (150) social: videos liked / shared by accounts you follow or interact with
+ (100) trending: per-region trending list, refreshed every 30 seconds
+ ( 50) exploration: random sample from underexposed videos to break filter bubbles
- impressions seen by this user in the last 30 days  (Bloom filter check)

The Bloom filter is per-user, ~10 KB, kept hot in Redis. False-positive rate ~1% — we occasionally don’t re-show something we could have, but never re-show something we shouldn’t.

Ranking#

A two-stage model. Stage 1 (lightweight, ~5 ms): score all 500 candidates with a logistic / small DNN over precomputed features. Keep the top 50. Stage 2 (heavy, ~30 ms): a deep multi-task model that predicts multiple labels (p_complete, p_like, p_share, p_follow, p_skip) and combines them into a weighted utility:

score = w1·p_complete + w2·p_like + w3·p_share + w4·p_follow - w5·p_skip - w6·repetition_penalty

The weights are themselves A/B-tested; the company-wide one we care about (w_share, w_follow) is what drives long-term retention. Optimizing only p_complete collapses the feed into doomscroll trash and tanks D30 retention — visible within 2 weeks of a bad weight change.

Single-objective ranking — one model predicts a single “engagement” score. Simple, easy to debug, fast. The risk: short-term engagement and long-term satisfaction diverge, and you only notice when retention drops.

Multi-task ranking — predict each signal independently, combine with tunable weights. More features to learn, more inference cost, but the weights become a steering wheel for product strategy without retraining.

Freshness vs personalization#

A pure ANN over user / video embeddings recommends videos similar to what you’ve already engaged with. Without an injection of new content, the feed calcifies. Mechanisms:

Recency boost: score += alpha · exp(-age_hours / 24) for the first 48 hours. Tuned per market.
Cold-start sampling: every Nth slot is reserved for a video uploaded in the last hour with limited prior exposure. Tracks how the video performs on this small sample and decides whether to widen distribution.
Audio-based pivots: short-form videos cluster heavily around shared audio. When a user engages with a sound, similar-sound videos enter the candidate pool even if the video embeddings differ.

Signal collection latency budget#

on-device event batch (200 ms)  →  edge ingest (50 ms)
  →  Kafka (10 ms append)  →  streaming feature updater (1–3 s)
  →  online feature store visible to ranker

End-to-end: a “skip” at second 5 of a video influences the next feed request roughly 3–5 seconds later. Good enough — most users swipe again within 8 seconds.

Video serving#

The video pipeline is conceptually boring at this scale. Upload → resumable object store → transcode farm (FFmpeg + GPUs for the heaviest codecs) → packager builds an HLS/DASH ladder (3–5 bitrates) → CDN. We pre-warm the edge with the lowest-bitrate variant for the predicted next video the moment /feed returns — this is what makes scroll feel instant.

The interesting part is CDN selection. A user in Jakarta watching a video from a creator in Brazil — we don’t want a transpacific pull every time. The packager replicates hot content to regional CDN tiers; the long tail lives in a colder origin and only fills the edge on demand. About 1% of videos drive 80% of plays, so the cache hit rate at the edge sits north of 95%.

Cold start#

A new user with zero history gets a fallback feed: trending-by-region, mixed with a few exploration probes. After ~30 watched videos (5–8 minutes of session) the short-term embedding has enough signal to drive the ranker. After ~3 sessions across 2 days the long-term embedding stabilizes. The “first 7 minutes” model is its own training target.

Step 7 — Evaluation & Trade-offs#

Bottleneck #1: feature store fanout. 70 B lookups/sec is the kind of number that only works because most lookups hit a few hundred GBs of in-memory data on hosts collocated with the ranker. The moment a model adds a feature with a long tail of keys (e.g., per-hashtag history), the cache-hit rate collapses and the ranker p99 doubles. New features go through a feature-cost review before rollout.

Bottleneck #2: signal pipeline backpressure. Kafka at 50 M+ msgs/sec is operationally non-trivial. A producer storm (the app retries aggressively on flaky networks) can push it to 200 M briefly. We shed load by sampling: less-than-1-second watches and idle scrolls get sampled at 10%, while completes / likes / shares are kept at 100%. The model is retrained on the sampled distribution with reweighting.

Bottleneck #3: ANN index freshness. A video uploaded at minute T should be eligible by minute T+5. The HNSW index doesn’t support cheap incremental rebuilds at billions of items, so we run a hot delta index (last 24 h, rebuilt every 5 min) merged at query time with the static base index (rebuilt nightly). The delta merge costs ~3 ms per query — acceptable.

Bottleneck #4: hot-creator amplification. A single viral video gets 100 M plays in an hour. The video CDN absorbs it, but the impression-counter store can shard-hotspot on video:{id}:counters. We shard those counters per-region and reconcile every minute, accepting brief inconsistency in “view count” displays.

Alternative I’d push back on: ranking entirely with a single end-to-end transformer over raw session history at request time. The latency budget (50–80 ms total for ranker) does not survive a 1 B-parameter forward pass per request at 2 M rps. We use distilled student models at the edge and a periodically-refreshed teacher offline. A “transformer-everywhere” pitch is what trainee engineers propose; the cost shape rules it out at this scale.

What breaks first at 10× scale (10 B DAU equivalent): the signal pipeline. The video CDN scales linearly with edges, the ranker scales linearly with feature-store shards, but the offline training pipeline (joining a year of impressions with content embeddings for the next-gen model) hits storage-bandwidth walls. The lake architecture (columnar formats, tiered storage, materialized rollups) is where the next platform investment goes.

Companies this resembles#

TikTok, Instagram Reels, YouTube Shorts, Snap Spotlight, Kuaishou. The video-CDN and upload-transcode pieces also resemble Vimeo / Twitch VOD; the ranking pipeline resembles any large recommender (Netflix, Spotify Discover Weekly) but with a much tighter signal-to-impression loop.

YouTube — the video-storage and transcode plumbing, at longer-form scale.
Instagram — the fan-out feed model, with a similar Reels surface bolted on.
Newsfeed System — the generic ranked-feed substrate.
ML Data Infrastructure — the offline training and feature-store layer.