Web Cache - System Architecture
High-Level Architecture
System Components
┌─────────────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Browser │ │ Mobile │ │ API │ │ CDN │ │
│ │ Clients │ │ Apps │ │ Clients │ │ Edge │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Load Balancer Layer │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Global Load Balancer (GeoDNS, Anycast) │ │
│ │ - Geographic routing │ │
│ │ - Health checking │ │
│ │ - DDoS protection │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cache Layer (L1) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Cache │ │ Cache │ │ Cache │ │ Cache │ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │ Node N │ │
│ │ │ │ │ │ │ │ │ │
│ │ Memory │ │ Memory │ │ Memory │ │ Memory │ │
│ │ + SSD │ │ + SSD │ │ + SSD │ │ + SSD │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼ (Cache Miss)
┌─────────────────────────────────────────────────────────────────┐
│ Origin Layer │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Origin Servers (Web Servers, App Servers, Storage) │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘Cache Node Architecture
Internal Components
┌─────────────────────────────────────────────────────────────────┐
│ Cache Node │
├─────────────────────────────────────────────────────────────────┤
│ Request Handler │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ - HTTP/HTTPS termination │ │
│ │ - Request parsing │ │
│ │ - Cache key generation │ │
│ │ - Header processing │ │
│ └──────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Cache Lookup Engine │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ L1: Memory Cache (128GB) │ │
│ │ - Hash table for O(1) lookup │ │
│ │ - LRU eviction policy │ │
│ │ - <1ms latency │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ L2: SSD Cache (10TB) │ │
│ │ - B-tree index for fast lookup │ │
│ │ - LRU eviction policy │ │
│ │ - <5ms latency │ │
│ └──────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Origin Fetcher │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ - Connection pooling │ │
│ │ - Request coalescing │ │
│ │ - Circuit breaker │ │
│ │ - Retry logic │ │
│ └──────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Cache Manager │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ - Eviction policy enforcement │ │
│ │ - TTL management │ │
│ │ - Invalidation handling │ │
│ │ - Statistics collection │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘Request Flow
Cache Hit Flow
1. Client Request
↓
2. Load Balancer → Route to nearest cache node
↓
3. Cache Node → Parse request, generate cache key
↓
4. L1 Memory Cache → Lookup (hit!)
↓
5. Validate freshness (TTL, ETag)
↓
6. Return cached response to client
Total Time: <1msCache Miss Flow
1. Client Request
↓
2. Load Balancer → Route to nearest cache node
↓
3. Cache Node → Parse request, generate cache key
↓
4. L1 Memory Cache → Lookup (miss)
↓
5. L2 SSD Cache → Lookup (miss)
↓
6. Origin Fetcher → Request coalescing check
↓
7. Fetch from origin server
↓
8. Store in L2 SSD Cache
↓
9. Store in L1 Memory Cache
↓
10. Return response to client
Total Time: <100msConditional Request Flow
1. Client Request with If-Modified-Since header
↓
2. Cache Lookup → Found cached content
↓
3. Compare timestamps
↓
4. If not modified:
- Return 304 Not Modified
- No body transfer
↓
5. If modified:
- Fetch fresh content from origin
- Update cache
- Return 200 OK with new contentCache Key Generation
Cache Key Components
Cache Key = hash(
scheme + // http or https
host + // www.example.com
port + // 80, 443
path + // /api/users
query_string + // ?id=123&sort=name
vary_headers // Accept-Encoding, Accept-Language
)
Examples:
1. Simple: "GET:example.com:/index.html"
2. With query: "GET:example.com:/api/users?id=123"
3. With vary: "GET:example.com:/page.html:gzip:en-US"
4. Custom: "GET:example.com:/api/data:user_id=456"Vary Header Handling
Response Headers:
Vary: Accept-Encoding, Accept-Language
Cache Keys Generated:
- "GET:example.com:/page:gzip:en-US"
- "GET:example.com:/page:gzip:es-ES"
- "GET:example.com:/page:identity:en-US"
- "GET:example.com:/page:identity:es-ES"
Each combination cached separatelyEviction Policies
LRU (Least Recently Used)
Implementation:
- Hash table for O(1) lookup
- Doubly linked list for O(1) eviction
- Move to head on access
- Evict from tail when full
Pros: Simple, effective for most workloads
Cons: One-time access can evict useful dataLFU (Least Frequently Used)
Implementation:
- Track access frequency per item
- Min-heap for O(log n) eviction
- Evict least frequently accessed
Pros: Better for skewed access patterns
Cons: Higher overhead, slow to adaptTTL-Based Eviction
Implementation:
- Store expiration timestamp
- Background thread scans for expired items
- Lazy eviction on access
Pros: Respects cache-control headers
Cons: May keep stale data longerCache Invalidation
Invalidation Methods
1. TTL Expiration:
- Automatic expiration after TTL
- Lazy deletion on access
- Background cleanup
2. Explicit Purge:
- Purge specific URL
- Purge by pattern/wildcard
- Purge by tag
- Purge all
3. Conditional Invalidation:
- If-Modified-Since validation
- ETag validation
- Stale-while-revalidate
4. Event-Based:
- Webhook triggers
- Message queue events
- Database change streamsInvalidation Propagation
┌─────────────┐
│ Invalidation│
│ Request │
└──────┬──────┘
│
▼
┌─────────────────┐
│ Invalidation │
│ Coordinator │
└──────┬──────────┘
│
├──────────────┬──────────────┬──────────────┐
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Cache │ │ Cache │ │ Cache │ │ Cache │
│ Node 1 │ │ Node 2 │ │ Node 3 │ │ Node N │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
Propagation Time: <1 secondCache Stampede Prevention
Request Coalescing
Scenario: 1000 concurrent requests for same uncached URL
Without Coalescing:
- 1000 requests to origin
- Origin overload
- High latency
With Coalescing:
1. First request → Fetch from origin
2. Subsequent 999 requests → Wait for first
3. All 1000 requests served from single origin fetch
Implementation:
- Lock on cache key
- First request fetches
- Others wait on lock
- Broadcast result to all waitersStale-While-Revalidate
Cache-Control: max-age=3600, stale-while-revalidate=86400
Behavior:
- Content fresh for 1 hour
- After 1 hour, serve stale content
- Async refresh in background
- Next request gets fresh content
Benefits:
- No user-facing latency
- Origin load spread over time
- Always serve fast responsesCompression and Content Encoding
Compression Support
Supported Algorithms:
- Gzip: Good compression, wide support
- Brotli: Better compression, modern browsers
- Deflate: Legacy support
Compression Strategy:
1. Check Accept-Encoding header
2. Serve pre-compressed if available
3. Compress on-the-fly if needed
4. Cache both compressed and uncompressed
Storage:
- Store compressed version
- Decompress if client doesn't support
- Trade-off: CPU vs storageMonitoring and Observability
Key Metrics
Performance:
- Cache hit rate (%)
- P50, P95, P99 latency
- Requests per second
- Bandwidth (in/out)
Resource:
- CPU utilization
- Memory usage
- Disk usage
- Network bandwidth
Cache:
- Cache size (items, bytes)
- Eviction rate
- Invalidation rate
- Miss rate by reason
Origin:
- Origin requests
- Origin latency
- Origin errors
- Origin bandwidthThis architecture provides a comprehensive foundation for building a high-performance, scalable distributed web caching system.