Distributed Cache
A distributed cache trades memory for tail-latency reduction and database protection. The interesting design decisions are not "do we cache?" but which write pattern (cache-aside, write-through, write-back), how to route keys across nodes (consistent hashing), how to survive a hot key (replication, request coalescing), and what to do when a popular key expires under load (stampede). The wrong choices turn the cache from a shield into the failure domain.
Architecture
Capacity Estimation
For a service at 200 K reads/s with target 95% hit rate and average value size 2 KB:
| Metric | Value | Notes |
|---|---|---|
| Cache reads/s | 200 K | peak |
| DB reads/s on miss | 10 K | 5% miss |
| Working set | ~200 GB | 100 M items × 2 KB |
| Cluster RAM | 500 GB | 2.5× for replicas + headroom |
| Nodes (32 GB each) | ~16 | + replicas |
| p99 GET latency | ~1 ms | same-AZ Redis |
Write Patterns
Cache-aside (lazy-loading): app reads cache; on miss, reads DB and populates cache. Writes update DB and invalidate (or update) the cache. Default choice. Hazard: stale read if a concurrent reader reloads after a writer's invalidation but before the new write commits. Mitigate by setting a short post-write TTL (cache-aside double-delete).
Write-through: app writes to cache; cache writes synchronously to DB before ack. Strong consistency between cache and DB at the cost of write latency. Common for in-process caches and CDN edge writes.
Write-back (write-behind): app writes to cache; cache asynchronously flushes to DB. Lowest latency, highest data-loss risk on cache crash. Used for counters and metrics where some loss is tolerable, or paired with durable WAL.
Refresh-ahead: cache proactively reloads soon-to-expire hot keys. Reduces miss latency on predictable hotspots; wasted work on cold ones.
Consistent Hashing for Routing
The naive scheme hash(key) % N moves nearly every key when N changes. Consistent hashing places nodes on a 0..2³² ring; each key hashes to a point and is owned by the next node clockwise. Adding or removing a node moves only ~1/N of keys.
Vanilla consistent hashing has load imbalance; use virtual nodes (each physical node placed at 100–200 ring positions) to smooth it. Even better: jump consistent hash (Lamping & Veach) gives near-perfect balance with no per-node state, used by Vitess and Discord.
Client-side routing (Redis Cluster) puts the responsibility in the SDK. Proxy-side routing (Twemproxy, Envoy) centralizes it but adds a hop. See Redis Cluster for the slot-based variant.
Replication
Replication serves two goals: availability (failover) and read scaling. Master-replica with async replication (Redis primary/replica) is standard; expect ~1 ms replica lag in-AZ, 50–200 ms cross-region.
Choices: read-from-replica for hot keys (load split), with the caveat of stale reads. Quorum reads across replicas for stronger consistency at the cost of latency. Active-active with CRDT semantics for multi-region writes — powerful but complex; only the most demanding cases need it.
Hot Key Problem
One key for "Taylor Swift's profile" gets 50 K req/s — one node melts, the rest are bored. Three mitigations:
- Replicate hot keys across N nodes; the client picks one randomly. Detect heat via top-K counters (count-min sketch + min-heap) and promote keys dynamically.
- Local in-process L1 cache with short TTL (1–10 s). Caffeine on the JVM, in-process LRU in Go. The cache cluster becomes L2.
- Probabilistic sharding — store the hot key under N synthetic keys (
profile:taylor:0..profile:taylor:N); client picks one. Writes fan out to all replicas.
Cache Stampede
A popular key expires; 10 K concurrent readers all miss simultaneously and hammer the database. Fixes:
- Request coalescing (singleflight): the first miss takes a per-key lock; concurrent missers wait on the result. Trivially implemented in Go via
singleflight.Group; see also Memcached'sadd-as-lock pattern. - Probabilistic early expiration (XFetch): each reader recomputes a randomized refresh probability that increases as the key approaches TTL; one reader proactively refreshes ahead of the herd. The classic Vattani & Chierichetti paper.
- Soft TTL + hard TTL — serve the stale value past the soft TTL while one async worker refreshes; only block on a true miss past the hard TTL.
Eviction Policies
- LRU — default, good general-purpose. Approximate LRU (Redis) tracks last-access via a sample of K keys, not full history.
- LFU — biases toward long-popular keys; resists scan pollution but slow to adapt to traffic shifts.
- TinyLFU / W-TinyLFU (Caffeine) — admission filter combines frequency and recency; near-optimal hit rate for most workloads.
- FIFO — cheapest, used for streaming windows.
Set maxmemory-policy explicitly. The default of noeviction turns the cache into a single point of failure when full.
Redis vs Memcached vs DragonflyDB
- Memcached — simplest, multi-threaded, only string KV with byte values. Best for pure caching where you do not need data structures.
- Redis — rich data structures (lists, sets, sorted sets, streams, HyperLogLog), single-threaded core (multi-threaded I/O since 6.0), pub/sub, Lua scripting, cluster mode with hash slots. Default for most teams.
- DragonflyDB — Redis-compatible, multi-threaded shared-nothing per core, claims 25× throughput on a single instance. Worth evaluating when you would otherwise scale a Redis Cluster horizontally for CPU. See Redis vs Dragonfly for the deep comparison.
- Hazelcast / Infinispan — in-JVM grid caches; the right answer if your stack is heavily Java and you want compute-near-data.
Failure Modes
- Cache outage => DB meltdown — the cache hides 95% of the load; when it dies, the DB sees 20× traffic. Ensure DB has at least 30% headroom and the app has client-side per-key request coalescing.
- Split brain — partitioned replicas accept writes; on heal, last-writer-wins eats data. Use a single master with manual failover or a consensus-backed coordinator.
- Memory pressure — eviction storms when working set exceeds memory; p99 spikes to 50 ms. Alert on
used_memory / maxmemory > 0.9. - Slow keys —
KEYS *,SMEMBERSon million-element sets, blocking the single-threaded core. Disable dangerous commands; useSCANpatterns.
FAQ
What is a reasonable cache hit rate target?
Depends on workload. 90–99% for read-heavy product catalogs; 60–80% for personalized feeds. Below 50% the cache is hurting more than helping (still pays the network round trip on miss).
Cache invalidation or short TTL?
Both. TTL is your safety net — if invalidation logic has a bug, data eventually self-corrects. Invalidation cuts staleness window to seconds; TTL bounds it to minutes.
One big cache or per-service caches?
Per-service for isolation and independent scaling. Shared only for cross-cutting data like session tokens. Shared caches become political hot potatoes during incidents.
How do you migrate cache topology?
Run new cluster in parallel, dual-write from the app, switch reads when warmed, drop old cluster. Never live-resize a hash ring under traffic without dual-write.