URL Shortener

Map long URLs to short ones, redirect in O(1), survive billions of clicks. The canonical first system to whiteboard — sharp scope, real scale, every primitive in play.

System Foundational
6 min read
hash-map kv-store redirect analytics
Companies this resembles: TinyURL · bit.ly · Twitter t.co

Step 1 — Clarify Requirements#

Functional

  • Given a long URL, produce a short URL (~7-8 characters past the host).
  • Given a short URL, redirect to the original long URL.
  • Optional custom alias (/sm/my-talk) — first-come-first-served.
  • Optional expiration (default: never).
  • Clicks per shortened URL — basic analytics (count, last-clicked, geo).

Non-functional

  • 99.99% availability for redirects (3 nines is too low — every broken redirect is a lost click).
  • p99 redirect latency: under 50 ms end-to-end.
  • Read:write ratio is ~100:1 — heavy redirect traffic, light shortening traffic.
  • 5-year retention. Short codes never get reused (renaming breaks links forever).

Out of scope: editing existing shortened URLs after creation, fine-grained per-click analytics dashboards, programmable redirects.

Step 2 — Capacity Estimation#

Assume 500 M new short URLs per month (canonical exam number).

  • Writes: 500M / 30 / 86400 ≈ 190 writes/sec, peaks at ~500/sec.
  • Reads: 100× writes ≈ 19 000 reads/sec, peaks at ~50 000/sec.
  • Storage per record: ~500 bytes (long URL 200B, short code 8B, metadata 250B, indexes 50B).
  • Storage growth: 500M × 500 B = 250 GB/month, 15 TB over 5 years.
  • Bandwidth: redirects are tiny (302 + headers ≈ 300 B). 19 000 × 300 B ≈ 6 MB/sec read traffic; effectively nothing.
  • Cache: top 20% of URLs serve 80% of clicks (Pareto). With 1 B URLs over 5 years, hot set ≈ 200 M × 500 B = 100 GB of cache — sharded Redis at 4 × 32 GB.

These numbers reshape the design: shortening is a slow path; redirects need to be cache-served. The whole interesting design is around the read path.

Step 3 — System Interface#

POST /shorten
Body: { url: string, alias?: string, ttl_days?: number }
Returns: { short_url: string } (idempotent on (url, user))
GET /:short_code
Returns: HTTP 301 with Location: <original_url>
Returns: HTTP 404 if missing or expired
GET /:short_code/stats (optional analytics endpoint)
Returns: { clicks, last_clicked, top_geos }

POST /shorten should be idempotent on (url, user) — same input from same user returns the same short code (no need to re-encode). Idempotency-Key header for the API-call retry case.

Step 4 — High-Level Design#

┌─→ analytics queue ──→ ClickHouse
client → CDN → LB → API gateway ──┬──→ Redis (short → long) ─cache miss─┐
│ │ │
└─ /shorten│ ▼
└──→ ID generator ─→ KV store (DynamoDB / Cassandra)

Two paths through the system:

  • Redirect path (hot): LB → API gateway → Redis lookup → 301. Misses fall through to the KV store and backfill the cache.
  • Shorten path (cold): API gateway → ID generator → write to KV → invalidate / write-through cache.

The CDN handles DDoS protection at the edge and can short-circuit popular redirects via cache-control headers (clients then redirect from their browser cache, never hitting our infrastructure).

Step 5 — Data Model#

The whole system is one wide-column or KV table:

table url_map
short_code string PK // 7-8 chars base62
long_url string
user_id string? // null for anonymous
created_at timestamp
expires_at timestamp? // null = never
click_count bigint // updated asynchronously

Partition key: short_code. Reads are point lookups. No range queries needed; no secondary indexes needed unless we add user-facing dashboards.

A separate user_urls table (PK (user_id, created_at)) handles “show me my history” queries — kept out of the hot redirect path.

Generating the short code#

Three viable approaches:

Hash + base62base62(md5(long_url + salt))[:8]. Deterministic (same input → same code) but collision-prone at scale, requires a retry loop on conflict.
Counter + base62 — sequential monotonic ID via a sequencer service, encoded in base62. No collisions ever. Predictable URLs (information leakage about volume) — mitigate by encrypting the counter before encoding.

A third option is to pre-allocate ranges: each shortener instance reserves a block of 10,000 IDs from a central allocator and serves them locally. No coordination per-request; bounded waste on instance crash. This is what TinyURL-style systems usually run.

8 base62 characters = 62^8 = 218 trillion codes. We won’t run out.

Step 6 — Detailed Design#

The redirect path (the hot loop)#

GET /:code
→ check Redis (LRU, ~100 GB sharded)
→ on hit: return 301
→ on miss: read KV, write-back to Redis, return 301
→ on KV miss: return 404

Target p99 of 50 ms. Realistic budget:

  • LB + TLS handshake: 10 ms
  • Edge auth / abuse check: 2 ms
  • Redis lookup: 1–2 ms
  • KV lookup (on cache miss): 5–10 ms
  • Network back to client: 5 ms

Redis hit rate at 95% means 5% of redirects pay an extra 5–10 ms — still well within budget. The CDN can short-circuit the top 0.1% of URLs entirely.

The shorten path#

POST /shorten
→ idempotency check (user, url) in Redis (5-minute TTL)
→ allocate short code (counter-based, encoded)
→ write to KV (consistency: write to majority, replicate async to others)
→ write-through Redis with TTL = expires_at - now
→ return short_url

The write must be durable (we just promised the user a URL). The cache write is best-effort — if it fails we’ll backfill on the first read.

Analytics#

Asynchronous. Every redirect emits an event to a queue (Kafka), which feeds a column store (ClickHouse) for aggregation. The hot path never blocks on analytics. Worst case: a few seconds of click delay during a Kafka hiccup — acceptable for our spec.

click_count in the main table is updated in batches (every 30 sec) from the column store. Live dashboards read directly from ClickHouse.

Custom aliases#

Custom aliases go through a CHECK-IF-EXISTS → CLAIM Lua script in Redis (atomic) before the KV write. Race conditions are bounded by the script’s atomicity. Reserved alias-prefix list (/admin, /api, /login, etc.) lives in config.

Expiration#

A daily cron scans expires_at and tombstones expired records. Reads on tombstones return 404 immediately. We don’t physically delete — preserves the no-reuse guarantee.

Step 7 — Evaluation & Trade-offs#

Bottleneck #1: the KV store at peak write. 500 writes/sec is well within DynamoDB / Cassandra capacity, but the analytics write amplification (one event per redirect = 50k writes/sec to Kafka) needs careful capacity planning.

Bottleneck #2: cache size. 100 GB hot set is large enough that we shard Redis. With consistent-hash sharding and N=3 replicas, single-shard failures lose 1/N of the hot set — recoverable in seconds from KV.

Bottleneck #3: thundering herds on viral URLs. A tweet linking to /abc123 gets 1 M clicks in 5 minutes. If Redis evicted it, the first 100 requests stampede the KV. Mitigations:

  • Single-flight at the cache (lock the key while one request fetches).
  • Probabilistic early expiration so we don’t synchronously expire popular keys.
  • CDN with Cache-Control: max-age=300 on the 301 response — the client redirects from browser cache after the first hit.

Alternative I’d push back on: storing per-click analytics in the main KV. Cheaper-looking but kills write IOPS budget on day one. Always offload analytics to a column store, even at small scale.

What breaks first at 10× scale (5 B URLs/month): the analytics pipeline. The hot redirect path scales linearly with more Redis shards; the analytics fan-out grows superlinearly with cardinality (top-N queries, per-user breakdowns). Plan for that re-architecture before it bites.

Companies this resembles#

bit.ly, TinyURL, t.co, goo.gl (RIP), Firebase Dynamic Links.

  • Twitter Newsfeed — same read-heavy / heavy-cache shape at a different magnitude.
  • Web Crawler — same base62 encoding trick for crawl URLs.
  • Rate LimiterPOST /shorten must be rate-limited or it becomes a free spam pipe.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.