Caching Strategies: Complete Guide for System Design
Overview
Caching is the single most impactful technique for scaling read-heavy systems. Understanding cache internals, eviction policies, and consistency patterns is essential for every system design interview.
The fundamental tradeoff: Speed vs. freshness. Cached data is fast but potentially stale.
1. Cache Eviction Policies
LRU (Least Recently Used)
How it works: Evict the item that hasn't been accessed for the longest time.
Implementation:
Data Structure: HashMap + Doubly Linked List
HashMap: key → node pointer (O(1) lookup)
Linked List: ordered by access time (most recent at head)
Operations:
- GET: Move accessed node to head, return value — O(1)
- PUT: Add to head, if full evict tail — O(1)
┌─────────────────────────────────────────┐
│ Head Tail │
│ [Most Recent] ↔ [...] ↔ [...] ↔ [LRU] │
│ ↑ │
│ Evict this │
└─────────────────────────────────────────┘Pros: Simple, works well for temporal locality Cons: Susceptible to scan pollution (one full scan evicts all hot items)
Real-world use: Memcached, Redis (approximated with sampling)
LFU (Least Frequently Used)
How it works: Evict the item with the lowest access count.
Implementation:
Data Structure: HashMap + Frequency Buckets (min-heap or linked lists)
freq=1: [item_D, item_E] ← evict from lowest frequency
freq=2: [item_B]
freq=5: [item_A, item_C]
On access: increment frequency, move to next bucket
On eviction: remove from lowest frequency bucket (LRU within bucket)Pros: Keeps genuinely popular items cached Cons:
- New items start at freq=1 (cold start problem)
- Historical frequency may not reflect current popularity
- Items that were popular long ago stay cached
Real-world use: Redis 4.0+ (LFU mode)
ARC (Adaptive Replacement Cache)
How it works: Dynamically balances between recency (LRU) and frequency (LFU).
┌─────────────────────────────────────────────────────┐
│ │
│ Ghost B1 T1 (Recency) │
│ [evicted LRU] ← [recent items] ──────┐ │
│ │ │
│ Cache (size c) ◄───┤ Adaptive │
│ │ boundary │
│ Ghost B2 T2 (Frequency) │ │
│ [evicted LFU] ← [frequent items] ─────┘ │
│ │
└─────────────────────────────────────────────────────┘
Adaptation:
- Hit in B1 ghost → increase T1 size (recency matters more)
- Hit in B2 ghost → increase T2 size (frequency matters more)Pros: Self-tuning, handles varying workloads Cons: More complex, higher memory overhead (ghost lists)
Real-world use: ZFS, IBM DB2
W-TinyLFU (Window Tiny Least Frequently Used)
How it works: Used by Caffeine (Java's best cache library). Combines:
- Admission window (1% of cache, LRU) — gives new items a chance
- Main cache (99%, segmented LRU) — holds proven items
- TinyLFU frequency sketch — decides admission to main cache
┌──────────────────────────────────────────────────────┐
│ │
│ New Item ──► Window (1%) ──► TinyLFU Filter ──► Main │
│ (LRU) (frequency check) (SLRU)│
│ │ │
│ ▼ │
│ Reject if new item's │
│ frequency < victim's │
│ frequency │
└──────────────────────────────────────────────────────┘TinyLFU uses Count-Min Sketch:
- 4-bit counters (max frequency 15)
- Periodic halving (aging) to adapt to changing patterns
- ~8 bytes per entry overhead
Pros: Near-optimal hit rates, low overhead, handles all workload types Cons: Complex implementation
Real-world use: Caffeine (Java), Ristretto (Go)
Comparison
| Policy | Hit Rate | Memory Overhead | Scan Resistant | Adaptive |
|---|---|---|---|---|
| LRU | Good | Low | No | No |
| LFU | Good (stable workloads) | Medium | Yes | No |
| ARC | Very Good | High (2x) | Yes | Yes |
| W-TinyLFU | Best | Low | Yes | Yes |
2. Cache Invalidation Strategies
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
TTL-Based (Time-To-Live)
SET user:123 "{name: 'Alice'}" EX 300 // Expires in 5 minutes
Timeline:
t=0 SET (fresh)
t=150 GET → hit (still fresh)
t=300 GET → miss (expired, refetch from DB)
t=301 SET (refreshed from DB)Pros: Simple, bounded staleness, self-healing Cons: Stale during TTL window, thundering herd on expiry
TTL selection guidelines:
| Data Type | TTL | Rationale |
|---|---|---|
| User session | 30 min | Security requirement |
| Product price | 60s | Business-critical accuracy |
| User profile | 5-15 min | Moderate change frequency |
| Static config | 1-24 hours | Rarely changes |
| DNS record | 300s | Balance freshness/load |
Event-Based Invalidation
┌─────────┐ write ┌─────────┐ invalidate ┌───────┐
│ Client │────────────►│ DB │───────────────►│ Cache │
└─────────┘ └─────────┘ └───────┘
│
│ publish event
▼
┌──────────────────┐
│ Message Queue │
│ (Kafka/Redis) │
└────────┬─────────┘
│ consume
▼
┌──────────────────┐
│ Cache Invalidator │──► DELETE cache key
└──────────────────┘Pros: Near-real-time freshness, precise invalidation Cons: Complex infrastructure, eventual consistency window
Write-Through
Write Request
│
▼
┌─────────┐ 1. Write ┌───────┐ 2. Write ┌─────┐
│ Client │──────────►│ Cache │───────────►│ DB │
└─────────┘ └───────┘ └─────┘
(synchronous)Behavior: Every write updates both cache and DB synchronously.
Pros: Cache always consistent with DB, simple reads Cons: Write latency increased (two writes), cache may hold rarely-read data
Write-Behind (Write-Back)
Write Request
│
▼
┌─────────┐ 1. Write ┌───────┐ 2. Async batch ┌─────┐
│ Client │──────────►│ Cache │─ ─ ─ ─ ─ ─ ─ ─►│ DB │
└─────────┘ └───────┘ (delayed) └─────┘
Returns immediatelyBehavior: Write to cache immediately, asynchronously persist to DB.
Pros: Very fast writes, batching reduces DB load Cons: Data loss risk if cache crashes before persist, complexity
Write-Around
Write Request
│
├──────────────────────────────────────►┌─────┐
│ │ DB │
│ (cache not updated) └─────┘
│
│ Subsequent read:
│ Cache miss → read from DB → populate cacheBehavior: Writes go directly to DB, bypassing cache.
Pros: Cache not polluted with write-once data Cons: First read after write is always a cache miss
3. Cache Stampede / Thundering Herd Prevention
The Problem
Popular cache key expires at t=100
t=100: 1000 concurrent requests all get cache miss
→ 1000 identical DB queries simultaneously
→ DB overwhelmed, cascading failure
Timeline:
t=99: Cache HIT (all fast)
t=100: Cache EXPIRED
t=100: Request 1 → DB query (slow)
t=100: Request 2 → DB query (slow) ← stampede!
t=100: Request 3 → DB query (slow)
...
t=100: Request 1000 → DB query (slow)
t=101: All 1000 responses come back, all write to cacheSolution 1: Locking (Mutex)
GET cache_key
IF miss:
IF acquire_lock(cache_key, timeout=5s):
value = query_database()
SET cache_key value EX 300
release_lock(cache_key)
return value
ELSE:
// Another thread is rebuilding
sleep(50ms)
retry GET cache_key // Should be populated nowPros: Only one DB query, simple Cons: Other requests wait (increased latency), lock management
Solution 2: Probabilistic Early Expiration
def get_with_early_expiry(key):
value, ttl, fetch_time = cache.get_with_metadata(key)
# Probabilistically refresh before actual expiry
# As TTL approaches 0, probability of refresh increases
remaining_ttl = ttl - (now() - fetch_time)
if remaining_ttl < 0 or random() < exp(-remaining_ttl / beta):
# Refresh in background
async_refresh(key)
return valuePros: No coordination needed, gradual refresh Cons: Occasional redundant refreshes, tuning required
Solution 3: Background Refresh
┌─────────────────────────────────────────────┐
│ Cache Entry: │
│ value: "data" │
│ hard_ttl: 300s (actual expiry) │
│ soft_ttl: 240s (trigger refresh) │
│ │
│ At soft_ttl: trigger background refresh │
│ Serve stale data until refresh completes │
└─────────────────────────────────────────────┘Pros: Zero latency impact, always serves data Cons: Briefly stale during refresh, background job infrastructure
Solution 4: Request Coalescing (Singleflight)
// Go's singleflight pattern
group := singleflight.Group{}
value, err, shared := group.Do(cacheKey, func() (interface{}, error) {
// Only ONE goroutine executes this
return fetchFromDatabase(key)
})
// All concurrent callers get the same resultPros: Exactly one DB call per key, no locks Cons: All waiters blocked until first completes
4. Multi-Level Caching Architecture
┌─────────────────────────────────────────────────────────────┐
│ │
│ L1: Application Cache (in-process) │
│ ┌─────────────────────────────────────┐ │
│ │ Caffeine/Guava (JVM heap) │ Latency: ~100ns │
│ │ Size: 100MB-1GB per instance │ Hit rate: 60-80% │
│ └─────────────────────────────────────┘ │
│ │ miss │
│ ▼ │
│ L2: Distributed Cache │
│ ┌─────────────────────────────────────┐ │
│ │ Redis Cluster / Memcached │ Latency: ~1ms │
│ │ Size: 10GB-1TB shared │ Hit rate: 90-95% │
│ └─────────────────────────────────────┘ │
│ │ miss │
│ ▼ │
│ L3: CDN Edge Cache │
│ ┌─────────────────────────────────────┐ │
│ │ CloudFront / Fastly / Akamai │ Latency: ~10-50ms │
│ │ Size: Distributed globally │ (from edge) │
│ └─────────────────────────────────────┘ │
│ │ miss │
│ ▼ │
│ Origin: Database / API │
│ ┌─────────────────────────────────────┐ │
│ │ PostgreSQL / DynamoDB │ Latency: 5-50ms │
│ └─────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘L1: Application-Level Cache
Characteristics:
- In-process memory (no network hop)
- Per-instance (not shared between servers)
- Fastest possible access (~100ns)
- Limited by JVM/process heap size
- Lost on restart/deploy
Best for: Hot configuration, user sessions, computed results
Invalidation challenge: Each instance has its own copy. Options:
- Short TTL (accept brief staleness)
- Pub/sub notification (Redis pub/sub, Kafka)
- Versioned keys (increment version on change)
L2: Distributed Cache (Redis/Memcached)
Redis vs Memcached:
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Rich (lists, sets, sorted sets, hashes) | Key-value only |
| Persistence | RDB + AOF | None |
| Replication | Built-in (master-replica) | None |
| Clustering | Redis Cluster (auto-sharding) | Client-side sharding |
| Memory efficiency | Less (overhead per key) | More (slab allocator) |
| Max value size | 512MB | 1MB |
| Threads | Single-threaded (6.0+ I/O threads) | Multi-threaded |
| Use case | Feature-rich caching, pub/sub | Simple high-throughput caching |
Redis Cluster topology:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Master 1 │ │ Master 2 │ │ Master 3 │
│ Slots 0-5460 │ │ Slots 5461- │ │ Slots 10923- │
│ │ │ 10922 │ │ 16383 │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Replica 1 │ │ Replica 2 │ │ Replica 3 │
└─────────────┘ └─────────────┘ └─────────────┘L3: CDN Cache
Best for: Static assets, API responses that are cacheable, media files
Cache-Control headers:
Cache-Control: public, max-age=3600, s-maxage=86400
│ │ │
│ │ └─ CDN caches for 24h
│ └─ Browser caches for 1h
└─ CDN can cache (not private)
Cache-Control: private, no-store
│ │
│ └─ Don't cache at all
└─ Only browser can cache (not CDN)5. Cache Consistency in Distributed Systems
The Consistency Problem
Server A writes to DB, invalidates local cache
Server B still has stale data in its local cache
User hits Server B → gets stale data
Timeline:
t=0: User updates profile (Server A)
t=0: Server A invalidates its L1 cache
t=0: Server A writes to DB
t=1: User reads profile (routed to Server B)
t=1: Server B returns stale L1 cache → INCONSISTENTSolutions
1. Cache-Aside with Short TTL:
Read: Check cache → miss → read DB → populate cache (TTL=60s)
Write: Update DB → delete cache key
Consistency window: up to TTL seconds of staleness2. Pub/Sub Invalidation:
Write to DB → Publish invalidation event → All instances delete key
┌──────────┐ ┌─────────┐ ┌──────────┐
│ Server A │────►│ Redis │────►│ Server B │
│ (writer) │ │ Pub/Sub │ │ (reader) │
└──────────┘ └─────────┘ └──────────┘
│
▼
┌──────────┐
│ Server C │
└──────────┘3. Versioned Cache Keys:
Key format: user:{id}:v{version}
Write: increment version in DB
Read: get current version → construct key → cache lookup
Old versions naturally expire via TTL
No explicit invalidation needed4. Lease-based Invalidation (Facebook's Memcache):
On cache miss: server gets a "lease" (token)
On cache fill: must present valid lease
If key was invalidated between miss and fill: lease is invalid → don't cache stale data6. When NOT to Cache
❌ Don't cache when:
| Scenario | Why |
|---|---|
| Write-heavy data | Cache constantly invalidated, overhead > benefit |
| Large objects accessed once | Wastes cache space, evicts hot items |
| Highly personalized data | Low reuse across users, huge cache size |
| Data requiring strong consistency | Staleness is unacceptable (financial) |
| Cheap-to-compute data | Caching overhead exceeds computation cost |
| Unpredictable access patterns | No temporal/spatial locality |
✅ Cache when:
- Read-to-write ratio > 10:1
- Data is expensive to compute or fetch
- Data is accessed by many users (shared)
- Slight staleness is acceptable
- Access patterns show temporal locality
Cache Sizing
Required cache size = working_set_size × (1 + overhead_factor)
Working set = items accessed within one TTL period
Overhead = ~20-30% for hash table, metadata, fragmentation
Example:
- 10M users, 80/20 rule → 2M active users
- Profile size: 2KB
- Working set: 2M × 2KB = 4GB
- With overhead: 4GB × 1.3 = ~5.2GB Redis7. Common Caching Patterns in System Design
Read-Through Cache
Application ──► Cache ──► (on miss) ──► Database
│
└── Cache handles miss logic internallyRefresh-Ahead
Cache proactively refreshes entries BEFORE they expire
Based on access patterns and predicted need
Good for: items with predictable access (e.g., homepage data)Cache Warming
On deploy/restart:
1. Query DB for top-N hot keys
2. Pre-populate cache
3. Start serving traffic
Prevents cold-start stampede after deploymentInterview Cheat Sheet
| When interviewer asks... | Key points to mention |
|---|---|
| "How to speed up reads?" | Multi-level cache, cache-aside pattern, TTL strategy |
| "Cache invalidation?" | Event-based for freshness, TTL as safety net |
| "What if cache goes down?" | Circuit breaker, fallback to DB, cache warming on recovery |
| "Cache consistency?" | Accept eventual consistency, pub/sub invalidation, versioned keys |
| "Thundering herd?" | Locking, singleflight, probabilistic early refresh |
| "What eviction policy?" | W-TinyLFU for best hit rate, LRU for simplicity |
| "Redis vs Memcached?" | Redis for features/persistence, Memcached for pure speed/simplicity |
| "How much cache do we need?" | Working set × overhead, 80/20 rule, measure hit rate |