Distributed Cache
Memcached vs Redis, sharding, eviction policies, replication, stampede protection.
Use cases#
A distributed cache is an in-memory KV layer fronting a slower backing store. It buys two things: lower latency and higher backend throughput, in exchange for managed staleness:
- Read-heavy workloads — a hot product page, a popular user profile, a feature-flag config. Hit ratio 95-99% turns 100k QPS into a few thousand DB QPS.
- Computation results — expensive
JOINs, aggregated counts, rendered templates, ML inference outputs. - Rate limiter counters, session storage, idempotency keys — caches as primary stores for ephemeral state.
- Pub-sub and queue substrates — Redis Streams, Pub/Sub channels, sorted-set leaderboards.
The boundary: a cache is expected to lose data (process restart, eviction). If losing the data is unacceptable, you need a real database, not a cache.
Functional requirements#
get(key),set(key, value, ttl),delete(key), atomic counters.- TTL-based expiration; LRU/LFU/random eviction when memory full.
- Optional: pub-sub, streams, sorted sets, transactions, scripting (Redis-specific).
- Sharding across nodes; replication for HA; client-side or proxy-based routing.
Non-functional requirements#
- Latency: p99 under 1 ms within a datacenter (LAN RTT dominates). Sub-100 µs for unix-socket access.
- Throughput: a single Redis instance handles 100k QPS on a single core; clustered Redis scales linearly to millions. Memcached pushes past 1M ops/sec per node by being multi-threaded.
- Memory efficiency: critical — RAM is the dominant cost. Memcached’s slab allocator and Redis’s
OBJECT ENCODINGreuse compact representations (int, embstr, ziplist). - Availability: 99.99% via replication; 99.9% is acceptable if the backing DB can handle a full miss-storm.
High-level design#
client ──┬─> hash(key) → shard ──> primary ──(async)──> replica │ │ │ └── eviction (LRU/LFU/TTL) ▼ client-side ring ┌── miss → fetch from DB (Memcached: clients pick shard) │ (Redis Cluster: server tells) └── set on populateThe client computes a hash of the key to pick a shard, then talks directly to that shard’s primary (or its replica for reads). On miss, the application fetches from the backing DB and SETs the cache; future requests are fast.
Detailed design#
Memcached vs Redis#
Rule of thumb: pick Memcached for pure caching at massive scale; pick Redis when you want any of the rich data types or persistence. Facebook famously runs Memcached fleets in the hundreds of thousands of nodes; Twitter, Discord, and Stripe lean Redis-heavy for richer semantics.
Sharding#
Three flavors:
Client-side hashing — driver hashes key, picks server. (Memcached, twemproxy) Cluster changes require coordinated client config update.
Server-side routing — Redis Cluster: each node owns a hash slot range; on a wrong-shard request it returns `MOVED 3999 10.0.0.2:6379` and the client redirects + caches the mapping.
Proxy-based — twemproxy, Envoy, AWS ElastiCache reader endpoint: clients connect to a proxy that hides sharding.Consistent hashing (or Redis Cluster’s 16384 hash slots) ensures resharding moves only 1/N of keys.
Eviction policies#
Memcached and Redis both run out of memory eventually. Eviction policies:
allkeys-lru — evict least-recently-used across all keysallkeys-lfu — evict least-frequently-used across all keysvolatile-lru — evict LRU only among keys with TTL setvolatile-ttl — evict the soonest-to-expire keys firstnoeviction — return error on writes when full (use when the cache is also a primary store)random — pure random eviction (cheap, surprisingly effective for hot/cold mix)LFU (Redis 4.0+) usually wins because LRU thrashes on a single big scan — one analytical query reads every product once, evicting your hot pages.
Replication#
Redis replication is asynchronous: primary streams writes to replica(s). Replica lag is the data-loss window if the primary dies before replication catches up. Multi-AZ replication is standard for HA.
Failover is via Redis Sentinel (3+ sentinel nodes vote to promote a replica) or Redis Cluster’s built-in gossip. Both have a 10-30 s failover window typically.
Cache stampede protection#
When a hot key expires, every concurrent request misses simultaneously and stampedes the DB. Three defenses:
- Probabilistic early expiration — refresh slightly before TTL with probability
exp(-delta * beta). Spreads regeneration over time. - Single-flight (request coalescing) — only one request per key recomputes; others wait.
sync.Oncein Go;lockmappatterns elsewhere. - Logical TTL + soft expiry — store the value with an expiry timestamp inside; readers return stale on soft-expiry and trigger a background refresh. Equivalent to HTTP’s
stale-while-revalidate.
// Single-flight patternshared_ptr<Value> get_or_compute(const string& key) { auto cached = cache.get(key); if (cached) return cached;
auto fut = inflight.get_or_create(key, [&] { return std::async([&] { auto v = db.fetch(key); cache.set(key, v, TTL); return v; }); }); return fut.get();}Cache-aside vs read-through vs write-behind#
- Cache-aside (look-aside) — app reads cache, on miss reads DB and populates cache, on write invalidates cache. Simplest, most common.
- Read-through — cache library reads DB on miss. Encapsulates the pattern but blurs failure modes.
- Write-through — every write hits both cache and DB synchronously. Cache is always fresh; writes are slower.
- Write-behind — writes hit cache, async-flushed to DB. Fast writes, risk of loss on cache failure.
Cache-aside is the right default unless you have a specific reason otherwise.
Topology: regional vs global#
Most distributed caches are regional — a Redis cluster per region, with cache-misses falling back to a regional database replica. Cross-region cache invalidation is hard; the typical pattern is to use short TTLs (60 s) on cross-region data and accept the staleness.
Trade-offs#
Other axes:
- Persistence on vs off — Redis can persist (AOF every fsync, RDB snapshots). Persistence costs latency (~5-10% throughput) but speeds restart. Caches usually run with persistence off.
- In-memory only vs disk-backed — Memcached pure RAM; some forks (Pelikan, Twemcache) and EnterpriseDB add SSD tiers. Disk tier expands capacity at 10× lower cost per byte but at 100-1000× latency.
- Cluster vs single-instance — single instance is simpler and faster (no MOVED redirects); cluster gives horizontal scaling but pinning is harder.
Real-world examples#
- Facebook Memcached — the original at-scale deployment; their 2013 NSDI paper describes thousands of Memcached servers fronting MySQL, with lease tokens to prevent stampedes.
- Twitter Twemcache — fork of Memcached with random eviction and slab automove.
- Pelikan (Twitter) — modern reimplementation of the cache server family; powers Twitter’s hot timeline cache.
- GitHub — Redis-backed counters, queues (Resque), and feature flags. Discussed in their 2022 outage post-mortem when Redis Cluster failover failed.
- Discord — Redis for presence (live “online” status across millions of users), Cassandra for messages.
- AWS ElastiCache — managed Redis and Memcached; the easy button for most AWS-native shops.
- Cloudflare’s tiered cache — distributed cache between edge POPs and origin, reducing origin egress dramatically.
Related building blocks#
- Key-Value Store — durable cousin; caches and KV stores share the same wire protocol shape.
- Content Delivery Network — HTTP-layer distributed cache.
- Rate Limiter — most rate limiters store counters in a distributed cache.
- Pub-Sub — Redis Pub/Sub is a common in-process messaging substrate.