Scale & Constraints

📖 7 min read 📄 Part 2 of 10

Web Cache - Scale and Constraints

Traffic Estimation

Request Volume

  • Total Requests: 100 billion requests per day
  • Cache Hits: 90 billion (90% hit rate)
  • Cache Misses: 10 billion (10% miss rate)
  • Peak Traffic: 3x average during peak hours
  • Requests per Second: ~1.16M average, 3.5M peak
  • Geographic Distribution: 40% US, 30% EU, 20% Asia, 10% Other

Request Types

  • Static Assets: 60% (images, CSS, JS, fonts)
  • HTML Pages: 20% (full page caching)
  • API Responses: 15% (JSON/XML data)
  • Video Content: 5% (streaming segments)

User Metrics

  • Active Users: 500 million daily active users
  • Requests per User: 200 requests per day average
  • Session Duration: 15 minutes average
  • Concurrent Users: 10 million peak concurrent

Storage Capacity Planning

Content Size Distribution

  • Tiny (<10KB): 40% - HTML, CSS, JS, JSON (avg 5KB)
  • Small (10KB-100KB): 30% - Images, small files (avg 50KB)
  • Medium (100KB-1MB): 20% - Large images, documents (avg 500KB)
  • Large (1MB-10MB): 8% - Videos, PDFs (avg 5MB)
  • Very Large (>10MB): 2% - Large videos, downloads (avg 50MB)

Storage Requirements

Calculation:
- Unique URLs: 10 billion
- Average object size: 200KB (weighted average)
- Total raw data: 10B × 200KB = 2PB
- Compression ratio: 2x (gzip/brotli)
- Compressed data: 1PB
- Hot data (30 days): 300TB
- Warm data (90 days): 700TB
- Total active cache: 1PB

Per-Node Storage (100 nodes):
- Hot tier (NVMe): 3TB per node
- Warm tier (SSD): 7TB per node
- Total: 10TB per node

Cache Growth

  • Daily New Content: 100GB per day
  • Monthly Growth: 3TB per month
  • Annual Growth: 36TB per year
  • Retention Policy: 90 days for most content
  • Long-term Storage: Archive old content to object storage

Performance Requirements

Latency Targets

  • Memory Cache Hit: <500μs (P99)
  • SSD Cache Hit: <5ms (P99)
  • Cache Miss: <100ms (P99, includes origin fetch)
  • Cache Write: <10ms (async, non-blocking)
  • Invalidation: <1s propagation across cluster
  • Health Check: <100ms response

Throughput Requirements

  • Per-Node Read: 100K requests/sec
  • Per-Node Write: 10K cache updates/sec
  • Cluster Read: 10M requests/sec (100 nodes)
  • Cluster Write: 1M cache updates/sec
  • Origin Requests: 1M requests/sec (10% miss rate)
  • Invalidations: 10K/sec cluster-wide

Connection Handling

  • Concurrent Connections: 100K per node
  • New Connections: 10K/sec per node
  • Keep-Alive: 60 seconds default
  • Connection Pool: 1K connections to origin per node
  • WebSocket Support: 10K concurrent WebSocket connections

Network Bandwidth

Inbound Traffic

  • Cache Misses: 10B requests/day × 200KB = 2PB/day
  • Average Inbound: 23GB/sec
  • Peak Inbound: 69GB/sec (3x average)
  • Per-Node Inbound: 230MB/sec average, 690MB/sec peak

Outbound Traffic

  • Cache Hits: 90B requests/day × 200KB = 18PB/day
  • Cache Misses: 10B requests/day × 200KB = 2PB/day
  • Total Outbound: 20PB/day
  • Average Outbound: 231GB/sec
  • Peak Outbound: 693GB/sec
  • Per-Node Outbound: 2.3GB/sec average, 6.9GB/sec peak

Internal Traffic

  • Cache Synchronization: 1TB/day (invalidations, metadata)
  • Health Checks: 100MB/day
  • Monitoring: 10GB/day (metrics, logs)
  • Total Internal: ~1.1TB/day = 12.7MB/sec

Memory Requirements

Per-Node Memory Allocation

  • Hot Cache: 128GB (most frequently accessed)
  • Index/Metadata: 16GB (cache keys, headers, metadata)
  • Connection Buffers: 8GB (100K connections × 80KB)
  • Request Processing: 4GB (request parsing, response building)
  • Operating System: 4GB
  • Total per Node: 160GB RAM

Cluster Memory

  • Total Nodes: 100 nodes
  • Total Cluster Memory: 16TB RAM
  • Effective Cache: 12.8TB (hot data)
  • Cache Hit Rate: 95% for hot data
  • Memory Efficiency: 80% utilization

Memory Distribution

  • L1 Cache (Memory): 128GB per node, <1ms latency
  • L2 Cache (SSD): 10TB per node, <5ms latency
  • L3 (Origin): Unlimited, <100ms latency

Compute Resources

CPU Requirements

  • Per-Node CPU: 32 cores (2 × 16-core processors)
  • CPU Utilization: 50% average, 80% peak
  • Request Processing: 0.1ms CPU per request
  • Compression: 10% CPU for gzip/brotli
  • TLS Termination: 15% CPU for HTTPS
  • Cache Management: 5% CPU for eviction, invalidation

Cluster Compute

  • Total Nodes: 100 nodes
  • Total CPU Cores: 3,200 cores
  • Compute Capacity: 1.16M req/sec at 50% CPU
  • Headroom: 50% capacity for growth and spikes

Disk I/O Requirements

Disk Specifications

  • Hot Tier: NVMe SSD, 3TB per node

    • Read IOPS: 500K IOPS
    • Write IOPS: 100K IOPS
    • Sequential Read: 3GB/sec
    • Sequential Write: 2GB/sec
  • Warm Tier: SATA SSD, 7TB per node

    • Read IOPS: 100K IOPS
    • Write IOPS: 50K IOPS
    • Sequential Read: 500MB/sec
    • Sequential Write: 400MB/sec

I/O Patterns

  • Read I/O: 10K IOPS per node (10% miss rate)
  • Write I/O: 5K IOPS per node (cache updates)
  • Sequential Reads: 500MB/sec (large file streaming)
  • Sequential Writes: 200MB/sec (cache population)

Cache Hit Rate Analysis

Hit Rate by Content Type

  • Static Assets: 95% hit rate (CSS, JS, images)
  • HTML Pages: 85% hit rate (with personalization)
  • API Responses: 70% hit rate (shorter TTL)
  • Video Content: 90% hit rate (popular content)
  • Overall: 90% weighted average hit rate

Hit Rate by Time

  • Peak Hours: 92% hit rate (hot content in cache)
  • Off-Peak: 88% hit rate (some cache eviction)
  • After Deployment: 60% hit rate (cold cache)
  • Steady State: 90% hit rate (warm cache)

Factors Affecting Hit Rate

  • TTL Configuration: Longer TTL = higher hit rate
  • Cache Size: Larger cache = higher hit rate
  • Traffic Patterns: Predictable traffic = higher hit rate
  • Content Popularity: Zipf distribution (80/20 rule)
  • Invalidation Frequency: More invalidations = lower hit rate

Geographic Distribution

Regional Deployment

Region: US-East
- Nodes: 30
- Traffic: 40% of total
- Latency: <5ms local, <50ms cross-region

Region: US-West
- Nodes: 20
- Traffic: 20% of total
- Latency: <5ms local, <70ms cross-region

Region: EU-West
- Nodes: 25
- Traffic: 30% of total
- Latency: <5ms local, <100ms cross-US

Region: Asia-Pacific
- Nodes: 15
- Traffic: 15% of total
- Latency: <5ms local, <150ms cross-US

Region: Other
- Nodes: 10
- Traffic: 5% of total
- Latency: <10ms local, <200ms cross-region

Cross-Region Traffic

  • Cache Synchronization: Invalidations propagated globally
  • Origin Failover: Route to nearest healthy origin
  • Content Replication: Popular content replicated across regions
  • Latency Impact: <100ms for cross-region cache misses

Cost Estimation

Infrastructure Costs (Monthly)

  • Compute: 100 nodes × $500 = $50,000
  • Memory: 16TB RAM included in compute
  • Storage: 1PB × $0.10/GB = $100,000
  • Network: 600TB/month × $0.05/GB = $30,000
  • Load Balancers: $5,000
  • Total Infrastructure: $185,000/month

Operational Costs (Monthly)

  • Monitoring: $3,000
  • Logging: $2,000
  • Support: $10,000
  • Personnel: 3 engineers × $15,000 = $45,000
  • Total Operational: $60,000/month

Total Cost of Ownership

  • Monthly Total: $245,000
  • Annual Total: $2,940,000
  • Cost per Request: $0.0000000245 per request
  • Cost per GB Served: $0.012 per GB

Cost Savings

  • Origin Infrastructure Savings: $500,000/month (85% offload)
  • Bandwidth Savings: $200,000/month (80% reduction)
  • Total Savings: $700,000/month
  • Net Savings: $455,000/month ($5.46M/year)
  • ROI: 186% return on investment

This scale analysis provides the foundation for capacity planning, infrastructure provisioning, and cost optimization for a production-grade distributed web caching system.