Distributed Cache — System Design · Engineering Playbook

Use cases#

A distributed cache is an in-memory KV layer fronting a slower backing store. It buys two things: lower latency and higher backend throughput, in exchange for managed staleness:

Read-heavy workloads — a hot product page, a popular user profile, a feature-flag config. Hit ratio 95-99% turns 100k QPS into a few thousand DB QPS.
Computation results — expensive JOINs, aggregated counts, rendered templates, ML inference outputs.
Rate limiter counters, session storage, idempotency keys — caches as primary stores for ephemeral state.
Pub-sub and queue substrates — Redis Streams, Pub/Sub channels, sorted-set leaderboards.

The boundary: a cache is expected to lose data (process restart, eviction). If losing the data is unacceptable, you need a real database, not a cache.

Functional requirements#

get(key), set(key, value, ttl), delete(key), atomic counters.
TTL-based expiration; LRU/LFU/random eviction when memory full.
Optional: pub-sub, streams, sorted sets, transactions, scripting (Redis-specific).
Sharding across nodes; replication for HA; client-side or proxy-based routing.

Non-functional requirements#

Latency: p99 under 1 ms within a datacenter (LAN RTT dominates). Sub-100 µs for unix-socket access.
Throughput: a single Redis instance handles 100k QPS on a single core; clustered Redis scales linearly to millions. Memcached pushes past 1M ops/sec per node by being multi-threaded.
Memory efficiency: critical — RAM is the dominant cost. Memcached’s slab allocator and Redis’s OBJECT ENCODING reuse compact representations (int, embstr, ziplist).
Availability: 99.99% via replication; 99.9% is acceptable if the backing DB can handle a full miss-storm.

High-level design#

client ──┬─> hash(key) → shard ──> primary ──(async)──> replica
         │                          │
         │                          └── eviction (LRU/LFU/TTL)
         ▼
   client-side ring                ┌── miss → fetch from DB
   (Memcached: clients pick shard) │
   (Redis Cluster: server tells)   └── set on populate

The client computes a hash of the key to pick a shard, then talks directly to that shard’s primary (or its replica for reads). On miss, the application fetches from the backing DB and SETs the cache; future requests are fast.

Detailed design#

Memcached vs Redis#

Memcached — simple KV with TTL. Multi-threaded. Stateless server (no replication). Slab allocator pre-buckets memory for fragmentation control. Designed for raw throughput as a transparent front for a DB.

Redis — single-threaded per shard, rich data types (lists, sets, sorted sets, hashes, streams, HyperLogLog, bitfields, geo), Lua scripting, pub-sub, replication, optional persistence (AOF/RDB), Redis Cluster for sharding.

Rule of thumb: pick Memcached for pure caching at massive scale; pick Redis when you want any of the rich data types or persistence. Facebook famously runs Memcached fleets in the hundreds of thousands of nodes; Twitter, Discord, and Stripe lean Redis-heavy for richer semantics.

Sharding#

Three flavors:

Client-side hashing       — driver hashes key, picks server. (Memcached, twemproxy)
                           Cluster changes require coordinated client config update.

Server-side routing       — Redis Cluster: each node owns a hash slot range;
                           on a wrong-shard request it returns `MOVED 3999 10.0.0.2:6379`
                           and the client redirects + caches the mapping.

Proxy-based              — twemproxy, Envoy, AWS ElastiCache reader endpoint:
                           clients connect to a proxy that hides sharding.

Consistent hashing (or Redis Cluster’s 16384 hash slots) ensures resharding moves only 1/N of keys.

Eviction policies#

Memcached and Redis both run out of memory eventually. Eviction policies:

allkeys-lru   — evict least-recently-used across all keys
allkeys-lfu   — evict least-frequently-used across all keys
volatile-lru  — evict LRU only among keys with TTL set
volatile-ttl  — evict the soonest-to-expire keys first
noeviction    — return error on writes when full (use when the cache is also a primary store)
random        — pure random eviction (cheap, surprisingly effective for hot/cold mix)

LFU (Redis 4.0+) usually wins because LRU thrashes on a single big scan — one analytical query reads every product once, evicting your hot pages.

Replication#

Redis replication is asynchronous: primary streams writes to replica(s). Replica lag is the data-loss window if the primary dies before replication catches up. Multi-AZ replication is standard for HA.

Failover is via Redis Sentinel (3+ sentinel nodes vote to promote a replica) or Redis Cluster’s built-in gossip. Both have a 10-30 s failover window typically.

Cache stampede protection#

When a hot key expires, every concurrent request misses simultaneously and stampedes the DB. Three defenses:

Probabilistic early expiration — refresh slightly before TTL with probability exp(-delta * beta). Spreads regeneration over time.
Single-flight (request coalescing) — only one request per key recomputes; others wait. sync.Once in Go; lockmap patterns elsewhere.
Logical TTL + soft expiry — store the value with an expiry timestamp inside; readers return stale on soft-expiry and trigger a background refresh. Equivalent to HTTP’s stale-while-revalidate.

// Single-flight pattern
shared_ptr<Value> get_or_compute(const string& key) {
    auto cached = cache.get(key);
    if (cached) return cached;

    auto fut = inflight.get_or_create(key, [&] {
        return std::async([&] {
            auto v = db.fetch(key);
            cache.set(key, v, TTL);
            return v;
        });
    });
    return fut.get();
}

Cache-aside vs read-through vs write-behind#

Cache-aside (look-aside) — app reads cache, on miss reads DB and populates cache, on write invalidates cache. Simplest, most common.
Read-through — cache library reads DB on miss. Encapsulates the pattern but blurs failure modes.
Write-through — every write hits both cache and DB synchronously. Cache is always fresh; writes are slower.
Write-behind — writes hit cache, async-flushed to DB. Fast writes, risk of loss on cache failure.

Cache-aside is the right default unless you have a specific reason otherwise.

Topology: regional vs global#

Most distributed caches are regional — a Redis cluster per region, with cache-misses falling back to a regional database replica. Cross-region cache invalidation is hard; the typical pattern is to use short TTLs (60 s) on cross-region data and accept the staleness.

Trade-offs#

Other axes:

Persistence on vs off — Redis can persist (AOF every fsync, RDB snapshots). Persistence costs latency (~5-10% throughput) but speeds restart. Caches usually run with persistence off.
In-memory only vs disk-backed — Memcached pure RAM; some forks (Pelikan, Twemcache) and EnterpriseDB add SSD tiers. Disk tier expands capacity at 10× lower cost per byte but at 100-1000× latency.
Cluster vs single-instance — single instance is simpler and faster (no MOVED redirects); cluster gives horizontal scaling but pinning is harder.

Real-world examples#

Facebook Memcached — the original at-scale deployment; their 2013 NSDI paper describes thousands of Memcached servers fronting MySQL, with lease tokens to prevent stampedes.
Twitter Twemcache — fork of Memcached with random eviction and slab automove.
Pelikan (Twitter) — modern reimplementation of the cache server family; powers Twitter’s hot timeline cache.
GitHub — Redis-backed counters, queues (Resque), and feature flags. Discussed in their 2022 outage post-mortem when Redis Cluster failover failed.
Discord — Redis for presence (live “online” status across millions of users), Cassandra for messages.
AWS ElastiCache — managed Redis and Memcached; the easy button for most AWS-native shops.
Cloudflare’s tiered cache — distributed cache between edge POPs and origin, reducing origin egress dramatically.

Key-Value Store — durable cousin; caches and KV stores share the same wire protocol shape.
Content Delivery Network — HTTP-layer distributed cache.
Rate Limiter — most rate limiters store counters in a distributed cache.
Pub-Sub — Redis Pub/Sub is a common in-process messaging substrate.