Web Cache - Scale and Constraints
Traffic Estimation
Request Volume
- Total Requests: 100 billion requests per day
- Cache Hits: 90 billion (90% hit rate)
- Cache Misses: 10 billion (10% miss rate)
- Peak Traffic: 3x average during peak hours
- Requests per Second: ~1.16M average, 3.5M peak
- Geographic Distribution: 40% US, 30% EU, 20% Asia, 10% Other
Request Types
- Static Assets: 60% (images, CSS, JS, fonts)
- HTML Pages: 20% (full page caching)
- API Responses: 15% (JSON/XML data)
- Video Content: 5% (streaming segments)
User Metrics
- Active Users: 500 million daily active users
- Requests per User: 200 requests per day average
- Session Duration: 15 minutes average
- Concurrent Users: 10 million peak concurrent
Storage Capacity Planning
Content Size Distribution
- Tiny (<10KB): 40% - HTML, CSS, JS, JSON (avg 5KB)
- Small (10KB-100KB): 30% - Images, small files (avg 50KB)
- Medium (100KB-1MB): 20% - Large images, documents (avg 500KB)
- Large (1MB-10MB): 8% - Videos, PDFs (avg 5MB)
- Very Large (>10MB): 2% - Large videos, downloads (avg 50MB)
Storage Requirements
Calculation:
- Unique URLs: 10 billion
- Average object size: 200KB (weighted average)
- Total raw data: 10B × 200KB = 2PB
- Compression ratio: 2x (gzip/brotli)
- Compressed data: 1PB
- Hot data (30 days): 300TB
- Warm data (90 days): 700TB
- Total active cache: 1PB
Per-Node Storage (100 nodes):
- Hot tier (NVMe): 3TB per node
- Warm tier (SSD): 7TB per node
- Total: 10TB per nodeCache Growth
- Daily New Content: 100GB per day
- Monthly Growth: 3TB per month
- Annual Growth: 36TB per year
- Retention Policy: 90 days for most content
- Long-term Storage: Archive old content to object storage
Performance Requirements
Latency Targets
- Memory Cache Hit: <500μs (P99)
- SSD Cache Hit: <5ms (P99)
- Cache Miss: <100ms (P99, includes origin fetch)
- Cache Write: <10ms (async, non-blocking)
- Invalidation: <1s propagation across cluster
- Health Check: <100ms response
Throughput Requirements
- Per-Node Read: 100K requests/sec
- Per-Node Write: 10K cache updates/sec
- Cluster Read: 10M requests/sec (100 nodes)
- Cluster Write: 1M cache updates/sec
- Origin Requests: 1M requests/sec (10% miss rate)
- Invalidations: 10K/sec cluster-wide
Connection Handling
- Concurrent Connections: 100K per node
- New Connections: 10K/sec per node
- Keep-Alive: 60 seconds default
- Connection Pool: 1K connections to origin per node
- WebSocket Support: 10K concurrent WebSocket connections
Network Bandwidth
Inbound Traffic
- Cache Misses: 10B requests/day × 200KB = 2PB/day
- Average Inbound: 23GB/sec
- Peak Inbound: 69GB/sec (3x average)
- Per-Node Inbound: 230MB/sec average, 690MB/sec peak
Outbound Traffic
- Cache Hits: 90B requests/day × 200KB = 18PB/day
- Cache Misses: 10B requests/day × 200KB = 2PB/day
- Total Outbound: 20PB/day
- Average Outbound: 231GB/sec
- Peak Outbound: 693GB/sec
- Per-Node Outbound: 2.3GB/sec average, 6.9GB/sec peak
Internal Traffic
- Cache Synchronization: 1TB/day (invalidations, metadata)
- Health Checks: 100MB/day
- Monitoring: 10GB/day (metrics, logs)
- Total Internal: ~1.1TB/day = 12.7MB/sec
Memory Requirements
Per-Node Memory Allocation
- Hot Cache: 128GB (most frequently accessed)
- Index/Metadata: 16GB (cache keys, headers, metadata)
- Connection Buffers: 8GB (100K connections × 80KB)
- Request Processing: 4GB (request parsing, response building)
- Operating System: 4GB
- Total per Node: 160GB RAM
Cluster Memory
- Total Nodes: 100 nodes
- Total Cluster Memory: 16TB RAM
- Effective Cache: 12.8TB (hot data)
- Cache Hit Rate: 95% for hot data
- Memory Efficiency: 80% utilization
Memory Distribution
- L1 Cache (Memory): 128GB per node, <1ms latency
- L2 Cache (SSD): 10TB per node, <5ms latency
- L3 (Origin): Unlimited, <100ms latency
Compute Resources
CPU Requirements
- Per-Node CPU: 32 cores (2 × 16-core processors)
- CPU Utilization: 50% average, 80% peak
- Request Processing: 0.1ms CPU per request
- Compression: 10% CPU for gzip/brotli
- TLS Termination: 15% CPU for HTTPS
- Cache Management: 5% CPU for eviction, invalidation
Cluster Compute
- Total Nodes: 100 nodes
- Total CPU Cores: 3,200 cores
- Compute Capacity: 1.16M req/sec at 50% CPU
- Headroom: 50% capacity for growth and spikes
Disk I/O Requirements
Disk Specifications
Hot Tier: NVMe SSD, 3TB per node
- Read IOPS: 500K IOPS
- Write IOPS: 100K IOPS
- Sequential Read: 3GB/sec
- Sequential Write: 2GB/sec
Warm Tier: SATA SSD, 7TB per node
- Read IOPS: 100K IOPS
- Write IOPS: 50K IOPS
- Sequential Read: 500MB/sec
- Sequential Write: 400MB/sec
I/O Patterns
- Read I/O: 10K IOPS per node (10% miss rate)
- Write I/O: 5K IOPS per node (cache updates)
- Sequential Reads: 500MB/sec (large file streaming)
- Sequential Writes: 200MB/sec (cache population)
Cache Hit Rate Analysis
Hit Rate by Content Type
- Static Assets: 95% hit rate (CSS, JS, images)
- HTML Pages: 85% hit rate (with personalization)
- API Responses: 70% hit rate (shorter TTL)
- Video Content: 90% hit rate (popular content)
- Overall: 90% weighted average hit rate
Hit Rate by Time
- Peak Hours: 92% hit rate (hot content in cache)
- Off-Peak: 88% hit rate (some cache eviction)
- After Deployment: 60% hit rate (cold cache)
- Steady State: 90% hit rate (warm cache)
Factors Affecting Hit Rate
- TTL Configuration: Longer TTL = higher hit rate
- Cache Size: Larger cache = higher hit rate
- Traffic Patterns: Predictable traffic = higher hit rate
- Content Popularity: Zipf distribution (80/20 rule)
- Invalidation Frequency: More invalidations = lower hit rate
Geographic Distribution
Regional Deployment
Region: US-East
- Nodes: 30
- Traffic: 40% of total
- Latency: <5ms local, <50ms cross-region
Region: US-West
- Nodes: 20
- Traffic: 20% of total
- Latency: <5ms local, <70ms cross-region
Region: EU-West
- Nodes: 25
- Traffic: 30% of total
- Latency: <5ms local, <100ms cross-US
Region: Asia-Pacific
- Nodes: 15
- Traffic: 15% of total
- Latency: <5ms local, <150ms cross-US
Region: Other
- Nodes: 10
- Traffic: 5% of total
- Latency: <10ms local, <200ms cross-regionCross-Region Traffic
- Cache Synchronization: Invalidations propagated globally
- Origin Failover: Route to nearest healthy origin
- Content Replication: Popular content replicated across regions
- Latency Impact: <100ms for cross-region cache misses
Cost Estimation
Infrastructure Costs (Monthly)
- Compute: 100 nodes × $500 = $50,000
- Memory: 16TB RAM included in compute
- Storage: 1PB × $0.10/GB = $100,000
- Network: 600TB/month × $0.05/GB = $30,000
- Load Balancers: $5,000
- Total Infrastructure: $185,000/month
Operational Costs (Monthly)
- Monitoring: $3,000
- Logging: $2,000
- Support: $10,000
- Personnel: 3 engineers × $15,000 = $45,000
- Total Operational: $60,000/month
Total Cost of Ownership
- Monthly Total: $245,000
- Annual Total: $2,940,000
- Cost per Request: $0.0000000245 per request
- Cost per GB Served: $0.012 per GB
Cost Savings
- Origin Infrastructure Savings: $500,000/month (85% offload)
- Bandwidth Savings: $200,000/month (80% reduction)
- Total Savings: $700,000/month
- Net Savings: $455,000/month ($5.46M/year)
- ROI: 186% return on investment
This scale analysis provides the foundation for capacity planning, infrastructure provisioning, and cost optimization for a production-grade distributed web caching system.