Photo upload, feed generation, story expiration, following graph.
Step 1 — Clarify Requirements
Functional
- A user uploads a photo (or short video) → it appears in their followers’ feeds, sorted reverse-chronologically at first (and later by a ranking model).
- A user opens the app → sees a personalized feed of recent posts.
- Stories: posts that expire after 24 hours.
- Following graph: asymmetric follows for public accounts; private accounts require the owner to approve each follow request.
- Out of scope here: Direct Messages, Reels recommendation, comments moderation, ads.
Non-functional
- 99.99% availability for the feed read path.
- p99 feed-load latency under 300 ms (with thumbnails, before full image bytes arrive).
- 2 B MAU; ~500 M DAU; ~100 M posts/day; ~5 B feed-loads/day.
- Eventual consistency on follower fan-out (1-5 second delay is fine).
Step 2 — Capacity Estimation
- DAU: 500 M; average opens per day: 10 → 5 B feed-loads/day ≈ 60 K feed-loads/sec average, ~200 K/sec peak.
- Post writes: 100 M/day ≈ 1.2 K posts/sec average, ~5 K/sec peak.
- Per-post storage: original photo 2-4 MB; transcoded thumbnails (4 sizes) ~500 KB total. Rounded up, ~5 MB/post.
- Storage growth: 100 M × 5 MB = 500 TB/day, ~180 PB/year of media. Metadata is a rounding error.
- Fan-out: average user has ~150 followers. 1.2 K posts/sec × 150 = 180 K timeline writes/sec.
- Stories: ~500 M stories/day, expire in 24h → steady-state working set ~500 M stories × 5 MB = 2.5 PB hot.
The two design pressures: keep feed reads cheap, and don’t store 180 PB/year naively.
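The arithmetic above is easy to re-run when an assumption changes. A minimal sketch in Python, using only the figures from this step; the function name `estimate` and the dictionary keys are illustrative and not part of the design:

```python
SECONDS_PER_DAY = 86_400


def estimate(
    dau=500_000_000,            # daily active users
    opens_per_day=10,           # feed loads per user per day
    posts_per_day=100_000_000,
    bytes_per_post=5_000_000,   # ~5 MB: original plus renditions
    avg_followers=150,
    stories_per_day=500_000_000,
):
    """Back-of-envelope numbers for the feed, post, and story paths."""
    feed_loads_per_day = dau * opens_per_day
    posts_per_sec = posts_per_day / SECONDS_PER_DAY
    media_tb_per_day = posts_per_day * bytes_per_post / 1e12
    return {
        "feed_loads_per_sec": feed_loads_per_day / SECONDS_PER_DAY,
        "posts_per_sec": posts_per_sec,
        "media_tb_per_day": media_tb_per_day,
        "media_pb_per_year": media_tb_per_day * 365 / 1000,
        "fanout_writes_per_sec": posts_per_sec * avg_followers,
        # Stories live 24 hours, so one day of writes is the hot working set.
        "stories_hot_pb": stories_per_day * bytes_per_post / 1e15,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>24}: {value:,.1f}")
```

Unrounded, the averages come out slightly below the figures quoted above (about 58 K feed loads/sec and about 174 K timeline writes/sec), because the text rounds 1,157 posts/sec up to 1.2 K before multiplying.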
Step 3 — System Interface

    POST /uploads                 (resumable, returns upload_id)
    POST /posts                   (finalize upload, attach metadata)
      Body: { upload_id, caption, location?, tags? }

    GET /feed?cursor=<opaque>&limit=20
      Returns: { posts: [...], next_cursor }

    POST   /stories
    GET    /stories/:user_id      (returns active stories only)
    POST   /follow/:user_id
    DELETE /follow/:user_id
    POST   /like/:post_id

The feed cursor encodes (timestamp, post_id) so pagination is stable as new posts arrive.
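One way to picture the opaque cursor is sketched below, assuming it is a base64-encoded JSON pair. That encoding, and the helper names `encode_cursor`, `decode_cursor`, and `page_after`, are assumptions made for illustration; the design only says the cursor carries (timestamp, post_id).

```python
import base64
import json
from typing import List, Optional, Tuple

Post = Tuple[int, str]  # (timestamp_ms, post_id)


def encode_cursor(timestamp_ms: int, post_id: str) -> str:
    """Pack the position of the last returned post into an opaque token."""
    raw = json.dumps([timestamp_ms, post_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Post:
    timestamp_ms, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return int(timestamp_ms), str(post_id)


def page_after(posts: List[Post], cursor: Optional[str], limit: int = 20):
    """Return one page of `posts` (sorted newest first) plus the next cursor.

    Only posts strictly older than the cursor position are returned, so
    posts that arrive later never shift the page boundary.
    """
    if cursor is not None:
        boundary = decode_cursor(cursor)
        posts = [p for p in posts if p < boundary]
    page = posts[:limit]
    next_cursor = encode_cursor(*page[-1]) if page else None
    return page, next_cursor
```

Because the boundary is a position in the (timestamp, post_id) ordering rather than an offset, a post inserted at the head of the feed between two requests neither duplicates nor skips an item on the next page.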
Step 4 — High-Level Design

       ┌─→ media CDN (photos, videos, thumbnails)
       │
    client → CDN → LB → API gateway ─┬─ /feed ───→ feed assembler ── timeline cache (Redis)
                                     │                                     ▲
                                     │                                     │ push
                                     │                                     │
                                     ├─ /posts ──→ post service → media store (blob) + metadata DB
                                     │                  │
                                     │                  └→ fan-out worker → timeline cache
                                     │
                                     └─ /stories ──→ story service → ephemeral store (24h TTL)

Three loops, in increasing temperature:
- Story path: write-once with TTL, read-many for 24 hours. No need for a forever-store.
- Post path: durable write + fan-out to followers.
- Feed path: hot read; mostly cache-served.
Step 5 — Data Model
Posts (sharded by user_id):

    table posts
      user_id     uuid       PK
      post_id     timeuuid   CK
      media       list<{ blob_uri, kind, w, h }>
      caption     string
      created_at  timestamp
      like_count  bigint     // async, sharded counters

Timeline cache (Redis sorted sets, one per user, score = timestamp, capped at ~1000):

    user:{follower_id}:timeline → ZSET of post_ids

Follow graph (sharded by user):

    table follows
      follower_id  uuid  PK
      followee_id  uuid  CK

    table followers   (inverse index, read at fan-out time)
      user_id      uuid  PK
      follower_id  uuid  CK

Stories (Redis or hot KV with TTL):

    key: stories:{user_id} → list of story_ids
    each story: { blob_uri, expires_at }; auto-deleted by TTL

Step 6 — Detailed Design
Photo upload pipeline

    client uploads original ─→ blob store (S3-class)
                                   │
                                   ▼
                         async transcode worker
                                   │
                                   ├→ thumbnail 150×150
                                   ├→ feed image 1080×1080 (or 1080×1350 portrait)
                                   ├→ low-bandwidth 480×480
                                   └→ preview blurhash (32 chars, embedded in metadata)

The feed shows the blurhash placeholder while the real image bytes are still loading, which hides the latency. By the time the user scrolls to the post, the real image has usually arrived.
Feed fan-out (hybrid push / pull)
Same hybrid pattern as /system-design/twitter-newsfeed: push to most users’ timelines, pull-on-read for celebrity accounts (>1 M followers). At Instagram’s follower distribution, ~5,000 accounts qualify as celebrities and represent a disproportionate share of post traffic.
When Bob loads his feed:
    Bob's feed = MERGE(
        ZREVRANGE user:bob:timeline 0 99,                  // push: precomputed, newest 100
        for each celebrity Bob follows:
            latest 20 posts from celebrity:{id}:posts      // pull
    )
    then RANK(merged, model_features) and TAKE top 20

The ranking model takes the merged candidates and re-orders them by predicted engagement; Instagram’s feed hasn’t been strictly reverse-chronological for years.
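The merge step can be sketched with plain Python lists standing in for the Redis sorted set and the per-celebrity post lists. The function names, the `score` callback, and the in-memory data shapes are assumptions for illustration; a real feed assembler would issue the celebrity reads in parallel against the cache.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Post = Tuple[int, str]  # (timestamp, post_id), every list sorted newest first


def build_candidates(
    pushed_timeline: List[Post],
    celebrity_posts: Dict[str, List[Post]],
    followed_celebrities: Iterable[str],
    pushed_limit: int = 100,
    per_celebrity: int = 20,
) -> List[Post]:
    """Merge the precomputed (push) timeline with pulled celebrity posts."""
    streams = [pushed_timeline[:pushed_limit]]
    for celebrity_id in followed_celebrities:
        streams.append(celebrity_posts.get(celebrity_id, [])[:per_celebrity])
    # Each stream is already newest-first, so a k-way merge keeps that order.
    return list(heapq.merge(*streams, reverse=True))


def load_feed(
    pushed_timeline: List[Post],
    celebrity_posts: Dict[str, List[Post]],
    followed_celebrities: Iterable[str],
    score: Optional[Callable[[Post], float]] = None,
    page_size: int = 20,
) -> List[Post]:
    candidates = build_candidates(pushed_timeline, celebrity_posts, followed_celebrities)
    if score is not None:
        # Placeholder for the ranking model: highest predicted engagement first.
        candidates = sorted(candidates, key=score, reverse=True)
    return candidates[:page_size]
```

With `score=None` the result is the reverse-chronological feed; passing a scoring function swaps in ranked order without changing how candidates are gathered.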
Stories
Stories are intentionally write-once, read-many for 24 hours, then gone. Implementation:
- Store the original media in the same blob store, tagged with lifecycle = 24h.
- Store a story-feed entry in Redis with TTL = 24 hours.
- A read of /stories/:user_id returns only un-expired entries.
- A background sweep deletes the blob after 26 hours (2h grace for downloads in progress).
This means stories never enter the timeline cache; they’re a parallel data plane.
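The expiry rules in the list above can be sketched with an in-memory store and an injectable clock. The `StoryStore` class and its method names are hypothetical; in the design itself the read-side expiry comes from Redis TTLs and the blob deletion from a background job.

```python
import time

STORY_TTL_SECONDS = 24 * 3600   # stories are visible for 24 hours
BLOB_GRACE_SECONDS = 2 * 3600   # blobs linger 2 more hours for in-flight downloads


class StoryStore:
    """Toy stand-in for the ephemeral story store."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._stories = {}  # user_id -> list of story entries

    def add(self, user_id, story_id, blob_uri):
        entry = {
            "story_id": story_id,
            "blob_uri": blob_uri,
            "expires_at": self._clock() + STORY_TTL_SECONDS,
        }
        self._stories.setdefault(user_id, []).append(entry)
        return entry

    def active(self, user_id):
        """What GET /stories/:user_id would return: un-expired entries only."""
        now = self._clock()
        return [s for s in self._stories.get(user_id, []) if s["expires_at"] > now]

    def sweep(self):
        """Drop entries past expiry plus grace; return the blob URIs to delete."""
        now = self._clock()
        doomed = []
        for user_id in list(self._stories):
            keep = []
            for story in self._stories[user_id]:
                if story["expires_at"] + BLOB_GRACE_SECONDS <= now:
                    doomed.append(story["blob_uri"])
                else:
                    keep.append(story)
            if keep:
                self._stories[user_id] = keep
            else:
                del self._stories[user_id]
        return doomed
```

Between hour 24 and hour 26 a story is invisible to readers but its blob still exists, which is the grace window the sweep relies on.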
Feed read latency budget (target p99 300 ms)

    LB + TLS:                         15 ms
    Auth + edge:                       5 ms
    Follow-graph check (celebs):       5 ms
    ZREVRANGE on timeline cache:       3 ms
    Celeb pull (parallel):            20 ms
    Hydrate post metadata:            20 ms
    Ranker inference (cached feats):  30 ms
    Serialize + network back:         50 ms
                              total: ~150 ms p99 (response)
    image bytes stream after via CDN

Likes and counters
like_count is a sharded counter (see /system-design/sharded-counters): each like increments one of N shards, aggregated lazily for display. Without this, a celebrity post (Taylor Swift dropping a photo) writes to one row at 100 K/sec and contention kills the database.
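A minimal in-process sketch of the idea, assuming random shard selection and a read that sums all shards. The `ShardedCounter` class is illustrative; in production each shard would be a separate row or key, and the sum would be cached rather than recomputed on every read.

```python
import random


class ShardedCounter:
    """Spread increments over N shards so no single row takes every write."""

    def __init__(self, num_shards=16, rng=None):
        self._shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount=1):
        # Each like lands on one randomly chosen shard.
        shard = self._rng.randrange(len(self._shards))
        self._shards[shard] += amount

    def value(self):
        # Lazy aggregation: the displayed count is the sum over all shards.
        return sum(self._shards)


if __name__ == "__main__":
    likes = ShardedCounter(num_shards=8)
    for _ in range(1000):
        likes.increment()
    print(likes.value())
```

With N shards, a burst of 100 K likes/sec becomes roughly 100 K / N writes/sec per shard, at the cost of a read that touches N values.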
Following graph at write time
When Alice follows Bob, two rows get written:
    follows(alice, bob)
    followers(bob, alice)   // inverse index

Both writes go through a transactional outbox or dual-write with reconciliation. The inverse index is what fan-out workers read; keeping it consistent is critical.
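The outbox variant can be sketched with SQLite from the Python standard library. This is a single-database toy under stated assumptions: the `outbox` table layout and the `follow` / `relay_outbox` helpers are hypothetical, and in the sharded design the `followers` table would live on a different shard than `follows`, which is exactly why the second write is deferred to a relay.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE follows   (follower_id TEXT, followee_id TEXT,
                                PRIMARY KEY (follower_id, followee_id));
        CREATE TABLE followers (user_id TEXT, follower_id TEXT,
                                PRIMARY KEY (user_id, follower_id));
        CREATE TABLE outbox    (id INTEGER PRIMARY KEY AUTOINCREMENT,
                                follower_id TEXT, followee_id TEXT,
                                done INTEGER NOT NULL DEFAULT 0);
        """
    )
    return db


def follow(db, follower, followee):
    """Write the follow edge and its outbox record in one local transaction."""
    with db:  # commits on success, rolls back on any exception
        cur = db.execute(
            "INSERT OR IGNORE INTO follows VALUES (?, ?)", (follower, followee)
        )
        if cur.rowcount == 1:  # only enqueue work for a genuinely new edge
            db.execute(
                "INSERT INTO outbox (follower_id, followee_id) VALUES (?, ?)",
                (follower, followee),
            )


def relay_outbox(db):
    """Apply pending outbox rows to the inverse index; safe to re-run."""
    pending = db.execute(
        "SELECT id, follower_id, followee_id FROM outbox WHERE done = 0 ORDER BY id"
    ).fetchall()
    for row_id, follower, followee in pending:
        with db:
            db.execute(
                "INSERT OR IGNORE INTO followers VALUES (?, ?)", (followee, follower)
            )
            db.execute("UPDATE outbox SET done = 1 WHERE id = ?", (row_id,))
    return len(pending)
```

The inverse-index write is idempotent (`INSERT OR IGNORE`), so a relay that crashes and retries cannot create duplicate follower rows.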
Step 7 — Evaluation & Trade-offs
Bottleneck #1: media bandwidth at the CDN edge. A celebrity story can drive tens of Gbps from one regional edge for an hour. Mitigations: aggressively prewarm all POPs when a celebrity posts, use multi-tier caching, and peer directly with ISPs.
Bottleneck #2: the followers inverse index for fan-out. Reading “who follows celebrity X” is unbounded. Solution: don’t fan out for celebrities at all; they’re pull-only. The cutoff (1 M followers) is tuned by load.
Bottleneck #3: ranker inference latency. A heavy ranking model adds 100+ ms per feed load. Two-tier ranking — cheap candidate generator + expensive scorer over top-200 — keeps the heavy work bounded. Precompute user embeddings hourly; only the scoring step is online.
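The two-tier shape can be sketched as two top-k passes. The function `two_tier_rank` and both scoring callbacks are placeholders for illustration; the only numbers taken from the text are the top-200 shortlist and the 20-post page.

```python
import heapq


def two_tier_rank(candidates, cheap_score, heavy_score,
                  shortlist_size=200, page_size=20):
    """Run the cheap scorer over everything, the heavy scorer over a shortlist."""
    # Tier 1: an inexpensive score trims the candidate set to a bounded size.
    shortlist = heapq.nlargest(shortlist_size, candidates, key=cheap_score)
    # Tier 2: the expensive model only ever sees `shortlist_size` items.
    return heapq.nlargest(page_size, shortlist, key=heavy_score)
```

However many candidates the merge step produces, the heavy model is called at most `shortlist_size` times per feed load, which is what keeps its latency bounded.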
Alternative I’d push back on: storing every story write durably and querying a TTL’d index. It looks cleaner, but it does hundreds of TB of writes per day that are deleted 24 hours later. Use a true ephemeral store with native TTL eviction, not a relational table swept by a cron job that runs DELETE ... WHERE expires_at < now().
What breaks first at 10× scale: the timeline cache memory budget. At 500 M users × 1000 entries × ~40 bytes per entry = 20 TB of Redis. Already painful; at 5 B DAU it’s 200 TB. Solution: shrink the cap (cache only the last 100 posts; reconstruct further history from the source on demand) and accept that “scroll back 6 months” gets slower.
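The memory figures above follow from a single product, sketched here so the effect of shrinking the cap is easy to see. The helper name `timeline_cache_tb` is illustrative, and the 40-byte entry size is the estimate from the text, not a measured Redis overhead.

```python
def timeline_cache_tb(users, entries_per_user, bytes_per_entry=40):
    """Total timeline-cache size in TB (1 TB = 10**12 bytes)."""
    return users * entries_per_user * bytes_per_entry / 10**12


if __name__ == "__main__":
    today = timeline_cache_tb(500_000_000, 1000)            # 20.0 TB
    at_10x = timeline_cache_tb(5_000_000_000, 1000)         # 200.0 TB
    at_10x_capped = timeline_cache_tb(5_000_000_000, 100)   # 20.0 TB
    print(today, at_10x, at_10x_capped)
```

Cutting the cap from 1000 to 100 entries offsets a 10× growth in users exactly, which is why the cap is the first knob to turn.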
Companies this resembles
Instagram, Pinterest (interest-graph variant), Snapchat (story-first), TikTok (algorithmic feed, no explicit follow requirement).
Related systems
- Twitter Newsfeed — same hybrid fan-out, less media weight.
- Generic Newsfeed System — abstracted version of this design.
- YouTube — media pipeline at a different aspect ratio and asset duration.
- Blob Store — substrate for photo and story storage.