Instagram — System Design · Engineering Playbook

Step 1 — Clarify Requirements

Functional

A user uploads a photo (or short video) → it appears in their followers’ feeds, sorted reverse-chronologically (and later, by a ranking model).
A user opens the app → sees a personalized feed of recent posts.
Stories: posts that expire after 24 hours.
Following graph: follows are asymmetric (one-directional) for public accounts; private accounts additionally require the followee to approve each follow request.
Out of scope here: Direct Messages, Reels recommendation, comment moderation, ads.

Non-functional

99.99% availability for the feed read path.
p99 feed-load latency under 300 ms (with thumbnails, before full image bytes arrive).
2 B MAU; ~500 M DAU; ~100 M posts/day; ~5 B feed-loads/day.
Eventual consistency on follower fan-out (1-5 second delay is fine).

Step 2 — Capacity Estimation

DAU: 500 M; average opens per day: 10 → 5 B feed-loads/day ≈ 60 K feed-loads/sec average, ~200 K/sec peak.
Post writes: 100 M/day ≈ 1.2 K posts/sec average, ~5 K/sec peak.
Per-post storage: original photo 2-4 MB; transcoded thumbnails (4 sizes) ~500 KB total. Net ~5 MB/post.
Storage growth: 100 M × 5 MB = 500 TB/day, ~180 PB/year of media. Metadata is a rounding error.
Fan-out: average user has ~150 followers. 1.2 K posts/sec × 150 = 180 K timeline writes/sec.
Stories: ~500 M stories/day, expire in 24h → steady-state working set ~500 M stories × 5 MB = 2.5 PB hot.

The two design pressures: keep feed reads cheap, and don’t store 180 PB/year naively.
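The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the inputs are the figures quoted in this section. Note that the unrounded fan-out rate comes out near 174 K writes/sec; the 180 K figure above uses the rounded 1.2 K posts/sec.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimates in this section.
dau = 500e6
opens_per_day = 10
posts_per_day = 100e6
stories_per_day = 500e6
bytes_per_post = 5e6      # ~5 MB: original plus four thumbnails
avg_followers = 150

feed_loads_per_sec = dau * opens_per_day / SECONDS_PER_DAY    # ~58 K/sec
posts_per_sec = posts_per_day / SECONDS_PER_DAY               # ~1.2 K/sec
media_per_day_tb = posts_per_day * bytes_per_post / 1e12      # 500 TB/day
media_per_year_pb = media_per_day_tb * 365 / 1e3              # ~180 PB/year
timeline_writes_per_sec = posts_per_sec * avg_followers       # ~174 K/sec
stories_hot_pb = stories_per_day * bytes_per_post / 1e15      # 2.5 PB
```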

Step 3 — System Interface

POST  /uploads                       (resumable, returns upload_id)
POST  /posts                         (finalize upload, attach metadata)
      Body: { upload_id, caption, location?, tags? }

GET   /feed?cursor=<opaque>&limit=20
      Returns: { posts: [...], next_cursor }

POST  /stories
GET   /stories/:user_id              (returns active stories only)
POST  /follow/:user_id
DELETE /follow/:user_id
POST  /like/:post_id

The feed cursor encodes (timestamp, post_id) so pagination is stable as new posts arrive.
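One way to build such a cursor is to serialize the (timestamp, post_id) pair and base64-encode it so clients treat it as opaque. The sketch below assumes a JSON payload and an in-memory list of posts sorted newest first; `Cursor`, `encode_cursor`, `decode_cursor`, and `page` are hypothetical names, not part of the API above.

```python
import base64
import json
from typing import List, NamedTuple, Optional, Tuple


class Cursor(NamedTuple):
    timestamp_ms: int
    post_id: str


def encode_cursor(c: Cursor) -> str:
    raw = json.dumps([c.timestamp_ms, c.post_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> Cursor:
    ts, post_id = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return Cursor(int(ts), str(post_id))


def page(posts: List[Tuple[int, str]], cursor: Optional[str], limit: int = 20):
    """posts must be sorted newest first by (timestamp_ms, post_id)."""
    if cursor is not None:
        c = decode_cursor(cursor)
        # Keep only posts strictly older than the last one already returned,
        # so posts that arrive later never shift the page boundary.
        posts = [p for p in posts if p < (c.timestamp_ms, c.post_id)]
    items = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = encode_cursor(Cursor(*items[-1])) if has_more else None
    return items, next_cursor
```

Because the cursor names a position in the ordering rather than an offset, a post inserted at the head of the feed between two requests does not cause duplicates or skipped items on the next page.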

Step 4 — High-Level Design

                                                      ┌─→ media CDN (photos, videos, thumbnails)
                                                      │
client → CDN → LB → API gateway ─┬─ /feed ───→ feed assembler ── timeline cache (Redis)
                                 │                                       ▲
                                 │                                       │ push
                                 │                                       │
                                 ├─ /posts ──→ post service → media store (blob) + metadata DB
                                 │                  │
                                 │                  └→ fan-out worker → timeline cache
                                 │
                                 └─ /stories ──→ story service → ephemeral store (24h TTL)

Three loops, in increasing temperature:

Story path: write-once with TTL. No need for a forever-store.
Post path: durable write + fan-out to followers.
Feed path: hot read; mostly cache-served.

Step 5 — Data Model

Posts (sharded by user_id):

table posts
  user_id     uuid    PK
  post_id     timeuuid CK
  media       list<{ blob_uri, kind, w, h }>
  caption     string
  created_at  timestamp
  like_count  bigint   // async, sharded counters

Timeline cache (Redis sorted sets, one per user, score = timestamp, capped at ~1000):

user:{follower_id}:timeline → ZSET of post_ids

Follow graph (sharded by user):

table follows
  follower_id  uuid  PK
  followee_id  uuid  CK

table followers   (inverse index, read at fan-out time)
  user_id      uuid  PK
  follower_id  uuid  CK

Stories (Redis or hot KV with TTL):

key: stories:{user_id} → list of story_ids
each story: { blob_uri, expires_at }; auto-deleted by TTL

Step 6 — Detailed Design

Photo upload pipeline

client uploads original ─→ blob store (S3-class)
                              │
                              ▼
                        async transcode worker
                              │
                              ├→ thumbnail 150×150
                              ├→ feed image 1080×1080 (or 1080×1350 portrait)
                              ├→ low-bandwidth 480×480
                              └→ preview blurhash (32 chars, embedded in metadata)

The feed shows the blurhash placeholder while the real image bytes are still loading, which hides the latency. By the time the user scrolls to the post, the real image has usually arrived.

Feed fan-out (hybrid push / pull)

Same hybrid pattern as /system-design/twitter-newsfeed: push to most users’ timelines, pull-on-read for celebrity accounts (>1 M followers). At Instagram’s follower distribution, ~5,000 accounts qualify as celebrities and represent a disproportionate share of post traffic.

When Bob loads his feed:

Bob's feed = MERGE(
   ZREVRANGE user:bob:timeline 0 99,     // push: precomputed, newest 100 (bounds are inclusive)
   for each celebrity Bob follows:
       latest 20 posts from celebrity:{id}:posts  // pull
)
then RANK(merged, model_features) and TAKE top 20

The ranking model takes the merged candidates and re-orders by predicted engagement; Instagram’s feed hasn’t been strictly reverse-chronological for years.
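The read-side merge can be sketched in Python as follows. This is an illustration under assumptions, not Instagram's implementation: posts are modeled as (timestamp_ms, post_id) tuples, each input list is already sorted newest first, and `assemble_feed` and its `score` callback are hypothetical stand-ins for the feed assembler and the ranking model.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Post = Tuple[int, str]  # (timestamp_ms, post_id); hypothetical shape for this sketch


def assemble_feed(
    pushed: List[Post],
    celebrity_posts: Dict[str, List[Post]],
    score: Callable[[Post], float],
    limit: int = 20,
) -> List[Post]:
    # Every input is newest first, so a k-way merge preserves that order
    # without re-sorting the whole candidate set.
    merged = heapq.merge(pushed, *celebrity_posts.values(), reverse=True)

    # Drop duplicates by post_id in case a post shows up in more than one source.
    seen = set()
    candidates: List[Post] = []
    for post in merged:
        if post[1] not in seen:
            seen.add(post[1])
            candidates.append(post)

    # Rank the merged candidates and keep the top `limit`.
    return sorted(candidates, key=score, reverse=True)[:limit]
```

Passing `score=lambda p: p[0]` gives a plain reverse-chronological feed; a real ranker would replace that callback with a model score.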

Stories

Stories are intentionally write-once, read-many for 24 hours, then gone. Implementation:

Store the original media in the same blob store, tagged with lifecycle = 24h.
Store a story-feed entry in Redis with TTL = 24 hours.
A read of /stories/:user_id returns only un-expired entries.
A background sweep deletes the blob after 26 hours (2h grace for downloads in progress).

This means stories never enter the timeline cache; they’re a parallel data plane.
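The expiry rules above can be sketched with a small in-memory store. This assumes a single process and an injectable clock for testing; a real deployment would rely on the store's native TTL rather than this filter. `StoryStore` and its methods are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

STORY_TTL_S = 24 * 3600    # stories are visible for 24 hours
BLOB_GRACE_S = 2 * 3600    # blobs survive 2 more hours for in-flight downloads


@dataclass
class Story:
    story_id: str
    blob_uri: str
    expires_at: float


class StoryStore:
    def __init__(self, now: Callable[[], float] = time.time) -> None:
        self._now = now
        self._by_user: Dict[str, List[Story]] = {}

    def add(self, user_id: str, story_id: str, blob_uri: str) -> Story:
        story = Story(story_id, blob_uri, self._now() + STORY_TTL_S)
        self._by_user.setdefault(user_id, []).append(story)
        return story

    def active(self, user_id: str) -> List[Story]:
        """Return only un-expired stories, as GET /stories/:user_id would."""
        now = self._now()
        return [s for s in self._by_user.get(user_id, []) if s.expires_at > now]

    def sweep(self) -> List[str]:
        """Drop entries past expiry plus grace; return blob URIs to delete."""
        cutoff = self._now() - BLOB_GRACE_S
        deleted: List[str] = []
        for user_id, stories in self._by_user.items():
            keep = []
            for s in stories:
                if s.expires_at <= cutoff:
                    deleted.append(s.blob_uri)
                else:
                    keep.append(s)
            self._by_user[user_id] = keep
        return deleted
```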

Feed read latency budget (target p99 300 ms)

LB + TLS:                        15 ms
Auth + edge:                      5 ms
Follow-graph check (celebs):      5 ms
ZREVRANGE on timeline cache:      3 ms
Celeb pull (parallel):           20 ms
Hydrate post metadata:           20 ms
Ranker inference (cached feats): 30 ms
Serialize + network back:        50 ms
                          total: ~150 ms p99 (response)
                                 image bytes stream after via CDN
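As a quick check on the budget, the stage figures above can be summed directly. The sketch treats every stage as serial, which is conservative if the celebrity pull overlaps the timeline read; the dictionary keys are illustrative labels only.

```python
TARGET_P99_MS = 300

# Stage budgets copied from the table above, in milliseconds.
budget_ms = {
    "lb_tls": 15,
    "auth_edge": 5,
    "follow_graph_check": 5,
    "timeline_zrevrange": 3,
    "celeb_pull": 20,
    "hydrate_metadata": 20,
    "ranker_inference": 30,
    "serialize_network": 50,
}

total_ms = sum(budget_ms.values())        # 148, which the table rounds to ~150
headroom_ms = TARGET_P99_MS - total_ms    # 152 ms left before the 300 ms target
```

Roughly half the target remains as headroom for retries, cache misses, and slower networks.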

Likes and counters

like_count is a sharded counter (see /system-design/sharded-counters): each like increments one of N shards, aggregated lazily for display. Without this, a celebrity post (Taylor Swift dropping a photo) writes to one row at 100 K/sec and contention kills the database.
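A minimal in-memory sketch of the idea, assuming random shard selection and a full sum on read; in a real system each shard would be a separate row or key, and the sum would be cached. `ShardedCounter` is a hypothetical name.

```python
import random
from typing import List, Optional


class ShardedCounter:
    """Spread increments over N shards so no single row takes every write."""

    def __init__(self, num_shards: int = 16, rng: Optional[random.Random] = None) -> None:
        self._shards: List[int] = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount: int = 1) -> None:
        # Pick a shard at random; concurrent writers mostly hit different shards.
        idx = self._rng.randrange(len(self._shards))
        self._shards[idx] += amount

    def value(self) -> int:
        # Aggregation is the slow path, done lazily when the count is displayed.
        return sum(self._shards)
```

With N shards, a post receiving 100 K likes/sec sees roughly 100 K / N writes per shard, at the cost of reading N values to display the total.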

Following graph at write time

When Alice follows Bob, two rows get written:

follows(alice, bob)
followers(bob, alice)    // inverse index

Both writes go through a transactional outbox or dual-write with reconciliation. The inverse index is what fan-out workers read; keeping it consistent is critical.
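The transactional-outbox option can be sketched with SQLite standing in for the sharded stores. This is an assumption-heavy illustration: in production the `followers` table would live on a different shard, so the relay's insert must be idempotent to tolerate retries. The table layout and the function names `open_db`, `follow`, and `relay_once` are hypothetical.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE follows   (follower_id TEXT, followee_id TEXT,
                                PRIMARY KEY (follower_id, followee_id));
        CREATE TABLE followers (user_id TEXT, follower_id TEXT,
                                PRIMARY KEY (user_id, follower_id));
        CREATE TABLE outbox    (id INTEGER PRIMARY KEY AUTOINCREMENT,
                                payload TEXT NOT NULL);
    """)
    return db


def follow(db: sqlite3.Connection, follower: str, followee: str) -> None:
    # One local transaction: the forward edge and the outbox event
    # commit together or not at all.
    with db:
        db.execute("INSERT OR IGNORE INTO follows VALUES (?, ?)", (follower, followee))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"user_id": followee, "follower_id": follower}),),
        )


def relay_once(db: sqlite3.Connection) -> int:
    """Apply pending outbox events to the inverse index; safe to re-run."""
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for event_id, payload in rows:
        event = json.loads(payload)
        with db:
            # INSERT OR IGNORE makes a replayed event a no-op.
            db.execute(
                "INSERT OR IGNORE INTO followers VALUES (?, ?)",
                (event["user_id"], event["follower_id"]),
            )
            db.execute("DELETE FROM outbox WHERE id = ?", (event_id,))
    return len(rows)
```

Until the relay runs, the inverse index lags the forward edge, which fits the eventual-consistency requirement stated for fan-out.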

Step 7 — Evaluation & Trade-offs

Bottleneck #1: media bandwidth at the CDN edge. A celebrity story can drive tens of Gbps from one regional edge for an hour. Mitigations: aggressive prewarming of all POPs when a celebrity's story goes live, multi-tier caching, and per-ISP peering.

Bottleneck #2: the followers inverse index for fan-out. Reading “who follows celebrity X” is unbounded. Solution: don’t fan out for celebrities at all; they’re pull-only. The cutoff (1 M followers) is tuned by load.
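The write-side rule can be sketched as follows, using plain dictionaries in place of the followers index and the Redis timelines. The per-follower update corresponds roughly to a ZADD followed by a ZREMRANGEBYRANK trim in Redis. `fan_out` and its parameters are hypothetical names; the threshold and cap are the values quoted in this document.

```python
from typing import Dict, List, Tuple

CELEBRITY_THRESHOLD = 1_000_000   # follower cutoff quoted above
TIMELINE_CAP = 1000               # entries kept per timeline

Timeline = List[Tuple[int, str]]  # (timestamp_ms, post_id), newest first


def fan_out(
    post_id: str,
    timestamp_ms: int,
    author_id: str,
    followers: Dict[str, List[str]],
    timelines: Dict[str, Timeline],
    cap: int = TIMELINE_CAP,
    threshold: int = CELEBRITY_THRESHOLD,
) -> int:
    """Push a post to follower timelines; return how many timelines were written."""
    author_followers = followers.get(author_id, [])
    if len(author_followers) > threshold:
        # Celebrity accounts are pull-only: readers fetch their posts at feed time.
        return 0
    for follower_id in author_followers:
        timeline = timelines.setdefault(follower_id, [])
        timeline.append((timestamp_ms, post_id))
        timeline.sort(reverse=True)   # keep newest first
        del timeline[cap:]            # trim to the newest `cap` entries
    return len(author_followers)
```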

Bottleneck #3: ranker inference latency. A heavy ranking model adds 100+ ms per feed load. Two-tier ranking — cheap candidate generator + expensive scorer over top-200 — keeps the heavy work bounded. Precompute user embeddings hourly; only the scoring step is online.
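A minimal sketch of the two-tier shape, assuming both scorers are plain callables. `two_tier_rank`, `cheap_score`, and `heavy_score` are hypothetical names; the shortlist size of 200 comes from the text above.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def two_tier_rank(
    candidates: Iterable[T],
    cheap_score: Callable[[T], float],
    heavy_score: Callable[[T], float],
    shortlist: int = 200,
    limit: int = 20,
) -> List[T]:
    # Tier 1: a cheap score over every candidate picks the shortlist.
    top = heapq.nlargest(shortlist, candidates, key=cheap_score)
    # Tier 2: the expensive model runs on at most `shortlist` items.
    return heapq.nlargest(limit, top, key=heavy_score)
```

The expensive scorer's cost is bounded by the shortlist size regardless of how many candidates the merge step produces.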

Alternative I’d push back on: storing every story write durably and querying a TTL’d index. It looks cleaner, but it performs hundreds of TB of writes per day that are deleted 24 hours later. Use a true ephemeral store with native TTL eviction, not a relational table swept by a cron job running DELETE ... WHERE expires_at < now().

What breaks first at 10× scale: the timeline cache memory budget. 500 M users × 1000 entries × ~40 bytes per entry = 20 TB of Redis, which is already painful; at 5 B DAU it’s 200 TB. Solution: shrink the cap (cache only the last 100 posts; reconstruct further history from the source on demand) and accept that “scroll back 6 months” gets slower.
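The memory figures follow from a one-line formula. The sketch below reproduces them and shows the effect of the smaller cap, taking the ~40 bytes per entry from the text as given; `timeline_cache_tb` is a hypothetical helper.

```python
BYTES_PER_ENTRY = 40  # per-entry estimate quoted in the text


def timeline_cache_tb(users: float, entries_per_user: int,
                      bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Total timeline cache size in terabytes (1 TB = 1e12 bytes)."""
    return users * entries_per_user * bytes_per_entry / 1e12


today_tb = timeline_cache_tb(500e6, 1000)        # 20.0 TB
at_10x_tb = timeline_cache_tb(5e9, 1000)         # 200.0 TB
at_10x_capped_tb = timeline_cache_tb(5e9, 100)   # 20.0 TB with a 100-entry cap
```

Under these assumptions, cutting the cap from 1000 to 100 entries at 10× scale brings the footprint back to today's 20 TB.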

Companies this resembles

Instagram, Pinterest (interest-graph variant), Snapchat (story-first), TikTok (algorithmic feed, no explicit follow requirement).

Related systems

Twitter Newsfeed — same hybrid fan-out, less media weight.
Generic Newsfeed System — abstracted version of this design.
YouTube — media pipeline at a different aspect ratio and asset duration.
Blob Store — substrate for photo and story storage.
