Scaling Considerations

Part 6 of 10

CDN Network - Scaling Considerations

Adding New PoPs

Site Selection Criteria

1. Network connectivity:
   - Proximity to Internet Exchange Points (IXPs)
   - Available peering partners (Tier 1 ISPs, eyeball networks)
   - Backbone connectivity to existing PoPs
   - Latency to target user population (<10ms goal)

2. User demand analysis:
   - DNS query patterns showing underserved regions
   - Latency measurements from RUM (Real User Monitoring)
   - Traffic volume from region exceeding threshold (>5 Gbps sustained)
   - Customer requests for coverage in specific markets

3. Infrastructure availability:
   - Data center space and power (at least 2-5 MW available)
   - Redundant power feeds and cooling
   - Physical security and compliance certifications
   - Expansion capacity for 3-5 year growth

4. Cost considerations:
   - Colocation costs per kW/rack
   - Local transit pricing (varies 10x between regions)
   - Peering availability (settlement-free vs paid)
   - Labor costs for on-site maintenance

Peering Strategy

Peering types at each PoP:
- Public peering: Connect at IXP route servers (low cost, many peers)
- Private peering: Direct cross-connects to major ISPs (better performance)
- Paid transit: Backup connectivity via Tier 1 providers (Cogent, Lumen, NTT)
- ISP embedding: Place servers inside ISP networks (best latency)

Peering economics:
- Target: 80%+ traffic via settlement-free peering
- Remaining: 15% paid transit, 5% backbone to other PoPs
- Break-even: PoP becomes profitable at ~20 Gbps sustained traffic
- ROI timeline: 12-18 months for Tier 2 PoPs, 6-12 months for Tier 1

Capacity Planning Per PoP

Initial deployment (Tier 2 PoP):
- 4-8 racks of servers (48-96 servers)
- 100 Gbps aggregate network capacity
- 200 TB SSD + 1 PB HDD storage
- 2x 100GE uplinks (redundant)

Growth triggers for expansion:
- CPU utilization > 60% sustained
- Network utilization > 50% of capacity
- Storage utilization > 75%
- Cache eviction rate increasing (hit ratio dropping)
- P95 latency exceeding SLA

Scaling unit: Add capacity in "pods" of 2 racks (24 servers)
- Each pod adds: ~25 Gbps throughput, 50 TB SSD, 250 TB HDD
- Deployment time: 2-4 weeks (hardware procurement to production)
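The network trigger and the pod size together give a quick sizing rule. Below is a minimal sketch. It assumes network throughput is the binding constraint; the CPU and storage triggers would be checked the same way. It also assumes each pod adds its ~25 Gbps linearly. `pods_to_add` is a hypothetical helper.

```python
import math

POD_GBPS = 25.0          # throughput added per 2-rack pod (from the text)
NETWORK_TRIGGER = 0.50   # expand when network utilization exceeds 50% of capacity

def pods_to_add(peak_gbps: float, capacity_gbps: float) -> int:
    """Pods needed to bring peak network utilization back to <= 50%."""
    if peak_gbps <= NETWORK_TRIGGER * capacity_gbps:
        return 0                                   # trigger not hit
    required_capacity = peak_gbps / NETWORK_TRIGGER
    shortfall = required_capacity - capacity_gbps
    return math.ceil(shortfall / POD_GBPS)         # whole pods only

print(pods_to_add(peak_gbps=60, capacity_gbps=100))  # 1
print(pods_to_add(peak_gbps=80, capacity_gbps=100))  # 3
```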

Cache Warming Strategies for New Edges

Proactive Warming

Strategy 1: Top-N content pre-population
- Identify top 10,000 objects by request count from nearest existing PoP
- Pre-fetch from origin or peer PoP before announcing the new edge
- Covers 60-80% of expected requests on day one
- Execution: Background job pulling content at 10 Gbps for 2-4 hours

Strategy 2: Peer-assisted warming
- New PoP initially routes cache misses to nearest peer PoP (not origin)
- Peer PoP acts as temporary origin shield
- Gradually shift to direct origin pulls as local cache fills
- Timeline: 24-72 hours to reach steady-state hit ratio

Strategy 3: Traffic shadowing
- Before going live, mirror a percentage of traffic from nearby PoP
- Process requests but discard responses (just populate cache)
- Validate cache correctness before serving real users
- Duration: 4-12 hours of shadowing

Gradual Traffic Migration

Phase 1 (Day 0): Shadow traffic only, no real serving
Phase 2 (Day 1): 5% of regional traffic via weighted DNS
Phase 3 (Day 2-3): 25% of traffic, monitor hit ratio and latency
Phase 4 (Day 4-6): 50% of traffic, validate at scale
Phase 5 (Day 7+): 100% of traffic, full production

Rollback criteria:
- Cache hit ratio < 70% (expected >85% by Phase 3)
- P95 latency > 2x existing PoPs in region
- Error rate > 0.1%
- Any hardware failures during ramp

Origin Shield / Mid-Tier Caching

Architecture

Without origin shield:
  Client → Edge PoP → Origin
  Problem: 200+ PoPs each independently fetch the same content from origin
  Origin load: edge_miss_rate * total_requests; each object can be fetched once per PoP (up to num_pops times per TTL)

With origin shield:
  Client → Edge PoP → Shield PoP → Origin
  Benefit: Shield consolidates misses from all edge PoPs in a region
  Origin load: edge_miss_rate * shield_miss_rate * total_requests; each object is fetched at most once per shield per TTL

Typical reduction: 60-80% fewer origin requests
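A toy model makes the consolidation concrete. This sketch counts origin fetches for a request stream with and without shields. It assumes caches start empty and never evict or expire. The PoP, shield and object names are made up.

```python
from typing import Dict, List, Optional, Tuple

def origin_fetches(requests: List[Tuple[str, str]],
                   shield_of: Optional[Dict[str, str]] = None) -> int:
    """Count origin fetches for (edge_pop, object_id) requests.

    Simplifying assumption: caches start empty and never evict or expire.
    """
    edge_cache = set()     # (pop, object) pairs cached at an edge
    shield_cache = set()   # (shield, object) pairs cached at a shield
    fetches = 0
    for pop, obj in requests:
        if (pop, obj) in edge_cache:
            continue                        # edge hit
        edge_cache.add((pop, obj))
        if shield_of is None:
            fetches += 1                    # edge miss goes straight to origin
            continue
        shield = shield_of[pop]
        if (shield, obj) not in shield_cache:
            shield_cache.add((shield, obj))
            fetches += 1                    # only a shield miss reaches origin
    return fetches

pops = [f"pop{i}" for i in range(200)]
reqs = [(p, f"obj{j}") for p in pops for j in range(50)]  # every PoP wants the same 50 objects
shields = {p: ("shield-us" if i < 100 else "shield-eu") for i, p in enumerate(pops)}
print(origin_fetches(reqs))           # 10000 (200 PoPs x 50 objects)
print(origin_fetches(reqs, shields))  # 100   (2 shields x 50 objects)
```

In this model every PoP requests the same objects, so the reduction is 99%. Real traffic includes eviction, TTL expiry and long-tail objects requested by a single PoP, which is why the typical figure is lower.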

Shield Placement

Shield regions (typically 3-8 globally):
- US East (Virginia) - covers NA East + EU overflow
- US West (Oregon) - covers NA West + APAC overflow
- EU West (Frankfurt) - covers EU + Middle East
- AP Northeast (Tokyo) - covers APAC North
- AP Southeast (Singapore) - covers APAC South
- SA East (Sao Paulo) - covers Latin America

Selection criteria:
- Low latency to origin servers (most origins in cloud regions)
- High bandwidth connectivity to edge PoPs
- Sufficient storage for full content catalog
- Redundancy: each shield has a failover shield

Shield Cache Behavior

Shield-specific optimizations:
- Larger cache capacity (10-50x edge PoP)
- Longer TTLs (can serve stale-while-revalidate to edges)
- Request coalescing: collapse concurrent misses for same object
- Negative caching: cache 404s to prevent origin hammering
- Connection pooling: maintain persistent connections to origin

Request coalescing detail:
- First request for uncached object: fetch from origin
- Concurrent requests for same object: queue and wait
- When origin responds: serve all queued requests from single fetch
- Prevents thundering herd on popular content expiry
- Implementation: per-URL mutex with timeout (5-10 seconds)
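The coalescing steps above map onto a small "single-flight" structure. Below is a minimal threaded sketch. It assumes a blocking `fetch(url)` origin client, which is hypothetical. Instead of a literal mutex per URL, it keeps one in-flight record per URL, and waiters block on an event with the 5-second timeout.

```python
import threading
from typing import Callable, Dict, Optional

class _Flight:
    """State of one in-progress origin fetch."""
    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[Exception] = None

class RequestCoalescer:
    """Collapse concurrent misses for the same URL into one origin fetch."""

    def __init__(self, fetch: Callable[[str], bytes], timeout_s: float = 5.0) -> None:
        self._fetch = fetch
        self._timeout_s = timeout_s
        self._lock = threading.Lock()            # guards _inflight only
        self._inflight: Dict[str, _Flight] = {}

    def get(self, url: str) -> bytes:
        with self._lock:
            flight = self._inflight.get(url)
            is_leader = flight is None
            if is_leader:
                flight = _Flight()
                self._inflight[url] = flight
        if is_leader:
            try:
                flight.result = self._fetch(url)  # the single origin request
            except Exception as exc:
                flight.error = exc
            finally:
                with self._lock:
                    del self._inflight[url]
                flight.done.set()                 # wake every queued waiter
        elif not flight.done.wait(self._timeout_s):
            raise TimeoutError(f"origin fetch for {url} exceeded {self._timeout_s}s")
        if flight.error is not None:
            raise flight.error
        return flight.result
```

The leader removes the in-flight record before waking the waiters. A request arriving after completion therefore starts a fresh fetch; in a real edge it would find the object in cache first.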

Consistent Hashing for Cache Distribution Within a PoP

Problem Statement

Within a PoP with 100+ servers:
- Each server has limited storage (50-500 TB)
- Total unique content >> single server capacity
- Need to route requests to the server most likely to have the content cached
- Must handle server additions/removals gracefully

Implementation

Hash ring configuration:
- Hash function: xxHash64 (fast, good distribution)
- Virtual nodes per server: 150-200 (ensures even distribution)
- Key: xxHash64(url + vary_key), used directly as the ring position (no modulo needed in a 2^64 ring)
- Ring size: 2^64 (full 64-bit space)

Request routing within PoP:
1. Load balancer receives request
2. Compute hash of cache key (URL + relevant headers)
3. Find responsible server on hash ring (binary search)
4. Route request to that server
5. If server is down: route to the next distinct live server clockwise on the ring (its successor)

Rebalancing on server add/remove:
- Adding 1 server to 100-server PoP: only 1% of objects need to move
- Removing 1 server: its objects redistribute to adjacent ring nodes
- No full re-hash required (unlike modulo-based sharding)
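A compact hash ring shows the binary search, the successor failover and the ~1% movement claim. This sketch substitutes BLAKE2b truncated to 64 bits for xxHash64, because xxHash is not in Python's standard library. Server names, the key format and the 160 virtual nodes (within the 150-200 range above) are illustrative.

```python
import bisect
import hashlib
from typing import Dict, FrozenSet, Iterable, List

def h64(data: str) -> int:
    """64-bit hash; stand-in for xxHash64 using stdlib BLAKE2b (8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(data.encode(), digest_size=8).digest(), "big")

class HashRing:
    def __init__(self, servers: Iterable[str], vnodes: int = 160) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []       # sorted virtual-node positions
        self._owner: Dict[int, str] = {}   # position -> physical server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = h64(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            point = h64(f"{server}#{i}")
            self._points.pop(bisect.bisect_left(self._points, point))
            del self._owner[point]

    def lookup(self, cache_key: str, down: FrozenSet[str] = frozenset()) -> str:
        """Binary-search the first vnode clockwise of the key; skip down servers."""
        start = bisect.bisect_right(self._points, h64(cache_key))
        for step in range(len(self._points)):
            point = self._points[(start + step) % len(self._points)]
            if self._owner[point] not in down:
                return self._owner[point]
        raise RuntimeError("no live servers on the ring")

servers = [f"s{i}" for i in range(100)]
ring = HashRing(servers)
keys = [f"https://cdn.example/obj/{n}" for n in range(20_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s100")                                   # grow the PoP to 101 servers
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"consistent hashing moved {len(moved) / len(keys):.1%}")  # roughly 1%
modulo_moved = sum(h64(k) % 100 != h64(k) % 101 for k in keys)
print(f"modulo sharding moved {modulo_moved / len(keys):.1%}")   # roughly 99%
```

Every key that moves when `s100` joins moves to `s100` itself, which is the property that makes additions cheap. Modulo sharding reshuffles almost everything.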

Two-Tier Routing

Tier 1: Consistent hash determines "primary" server for an object
Tier 2: Hot objects replicated to multiple servers

Detection of hot objects:
- Request rate > 10,000 req/s for single object
- Single server CPU > 80% due to one object
- Automatic promotion: replicate to 3-5 servers
- Load balancer distributes hot object requests across replicas

Implementation:
- Maintain "hot object list" updated every 5 seconds
- Hot objects bypass consistent hash → round-robin across replicas
- Cool-down: remove from hot list after 60 seconds below threshold
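The two tiers can be sketched as a router that consults the hot list before the ring. The sketch makes three assumptions:
- `primary_for` stands in for the consistent-hash lookup.
- Replicas are the primary plus the next servers in list order; a real system might use ring successors instead.
- The caller pushes per-key rates every 5 seconds.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

@dataclass
class _HotEntry:
    replicas: List[str]
    last_hot: float                                   # last time rate exceeded threshold
    rr: Iterator[int] = field(default_factory=itertools.count)

class HotObjectRouter:
    """Tier 1: consistent hash. Tier 2: round-robin over replicas for hot keys."""

    def __init__(self, primary_for: Callable[[str], str], servers: List[str],
                 hot_rps: float = 10_000, replicas: int = 3, cooldown_s: float = 60.0) -> None:
        self._primary_for = primary_for               # stand-in for the ring lookup
        self._servers = servers
        self._hot_rps = hot_rps
        self._replicas = replicas
        self._cooldown_s = cooldown_s
        self._hot: Dict[str, _HotEntry] = {}

    def update(self, rates_rps: Dict[str, float], now: float) -> None:
        """Refresh the hot list from per-key request rates."""
        for key, rps in rates_rps.items():
            if rps > self._hot_rps:
                if key in self._hot:
                    self._hot[key].last_hot = now
                else:
                    self._hot[key] = _HotEntry(self._pick_replicas(key), now)
        cooled = [k for k, e in self._hot.items() if now - e.last_hot >= self._cooldown_s]
        for key in cooled:
            del self._hot[key]                        # back to plain consistent hashing

    def route(self, key: str) -> str:
        entry = self._hot.get(key)
        if entry is None:
            return self._primary_for(key)
        return entry.replicas[next(entry.rr) % len(entry.replicas)]

    def _pick_replicas(self, key: str) -> List[str]:
        start = self._servers.index(self._primary_for(key))
        count = min(self._replicas, len(self._servers))
        return [self._servers[(start + i) % len(self._servers)] for i in range(count)]
```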

Hot Content Handling

Viral Content Detection

Signals for hot content:
- Request rate acceleration: >100% increase in 60 seconds
- Absolute threshold: >50,000 req/s for single URL
- Geographic spread: requests from >10 PoPs simultaneously
- Referrer analysis: traffic from social media platforms

Detection latency: <10 seconds from onset to detection
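The text lists the signals but not how they combine. This sketch assumes, hypothetically, that the absolute threshold alone is enough and that otherwise any two signals must agree. The 50% social-referrer cutoff and the `UrlWindow` fields are also assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UrlWindow:
    rps_now: float                 # request rate in the latest window
    rps_60s_ago: float             # request rate one minute earlier
    pops_reporting: int            # PoPs currently seeing requests for this URL
    social_referrer_share: float   # fraction of requests with a social-media Referer

def hot_signals(w: UrlWindow) -> List[str]:
    signals = []
    # ">100% increase in 60 seconds" means the rate more than doubled.
    if w.rps_60s_ago > 0 and w.rps_now > 2 * w.rps_60s_ago:
        signals.append("acceleration")
    if w.rps_now > 50_000:
        signals.append("absolute_rate")
    if w.pops_reporting > 10:
        signals.append("geographic_spread")
    if w.social_referrer_share > 0.5:          # hypothetical cutoff
        signals.append("social_referrers")
    return signals

def is_hot(w: UrlWindow) -> bool:
    # Hypothetical combination rule: absolute rate alone, or any two signals.
    signals = hot_signals(w)
    return "absolute_rate" in signals or len(signals) >= 2
```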

Mitigation Strategies

Strategy 1: Request coalescing (collapse)
- Multiple concurrent requests for same uncached object
- Only one request goes to origin
- All others wait and receive the same response
- Effective for: cache miss thundering herd

Strategy 2: Micro-caching
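- Cache even nominally "uncacheable" responses for 1-5 seconds (shared responses only, never per-user or authenticated ones)
- Reduces origin load by 100-1000x during spikes: each cache sends at most one origin request per URL per TTL, so 1,000 req/s with a 1-second TTL becomes 1 req/s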
- Trade-off: slight staleness for massive load reduction
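Below is a minimal single-threaded micro-cache sketch. Its clock is injectable, a hypothetical hook added so the simulation can fake time. The simulation runs 1,000 req/s for 5 seconds against one URL with a 1-second TTL.

```python
import time
from typing import Callable, Dict, List, Tuple

class MicroCache:
    """Cache every response for a short TTL, even ones normally marked uncacheable."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}   # key -> (expires_at, body)

    def get(self, key: str) -> bytes:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]                               # still inside the TTL
        body = self._fetch(key)
        self._store[key] = (now + self._ttl_s, body)
        return body

origin_calls: List[str] = []
def origin(key: str) -> bytes:
    origin_calls.append(key)
    return b"{...}"

fake_now = [0.0]
cache = MicroCache(origin, ttl_s=1.0, clock=lambda: fake_now[0])
for i in range(5_000):
    fake_now[0] = i / 1000                                 # 1,000 req/s for 5 s
    cache.get("/api/scores/live")
print(len(origin_calls))   # 5 origin requests instead of 5,000
```

This version does not collapse concurrent misses, so a production micro-cache would pair it with request coalescing.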

Strategy 3: Edge replication
- Replicate hot objects to all servers in PoP (not just hash-assigned)
- Spread load across entire PoP capacity
- Automatic: triggered when single-server load exceeds threshold

Strategy 4: Stale-while-revalidate
- Serve expired content while fetching fresh copy in background
- Users get instant response (stale by seconds/minutes)
- Origin gets single revalidation request instead of thundering herd
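In HTTP this behavior is requested with `Cache-Control: max-age=60, stale-while-revalidate=30` (RFC 5861). Below is a minimal sketch of the decision logic. It assumes an injectable clock and task scheduler, both hypothetical hooks for testability. It guarantees at most one background revalidation per key at a time.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple

def freshness(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Classify a cached response under max-age + stale-while-revalidate."""
    if age_s < max_age_s:
        return "fresh"
    if age_s < max_age_s + swr_s:
        return "stale"       # serve now, revalidate in the background
    return "expired"         # too old: the client waits for origin

def _run_in_thread(task: Callable[[], None]) -> None:
    threading.Thread(target=task, daemon=True).start()

class SwrCache:
    def __init__(self, fetch: Callable[[str], bytes], max_age_s: float, swr_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 schedule: Callable[[Callable[[], None]], None] = _run_in_thread) -> None:
        self._fetch, self._max_age_s, self._swr_s = fetch, max_age_s, swr_s
        self._clock, self._schedule = clock, schedule
        self._entries: Dict[str, Tuple[float, bytes]] = {}   # key -> (stored_at, body)
        self._revalidating: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        entry = self._entries.get(key)
        state = "expired" if entry is None else freshness(
            self._clock() - entry[0], self._max_age_s, self._swr_s)
        if state == "fresh":
            return entry[1]
        if state == "stale":
            with self._lock:
                start = key not in self._revalidating
                self._revalidating.add(key)
            if start:                       # only one revalidation in flight per key
                self._schedule(lambda: self._revalidate(key))
            return entry[1]
        return self._store(key)             # expired or missing: synchronous fetch

    def _store(self, key: str) -> bytes:
        body = self._fetch(key)
        self._entries[key] = (self._clock(), body)
        return body

    def _revalidate(self, key: str) -> None:
        try:
            self._store(key)
        finally:
            with self._lock:
                self._revalidating.discard(key)
```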

Strategy 5: Pre-positioning for known events
- Sports events, product launches: pre-warm all PoPs
- Push content to edges before event starts
- Coordinate with content providers for early access to assets

Video Streaming Optimization

Adaptive Bitrate (ABR) Delivery

HLS/DASH segment caching:
- Manifest files (.m3u8, .mpd): cache 1-5 seconds (live) or 1 hour (VOD)
- Video segments (.ts, .m4s): cache for hours/days (immutable by URL)
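- Segment sizes: 2-10 seconds of video each; size ≈ bitrate × duration / 8, so roughly 2-10 MB for mid-to-high renditions (a 6-second segment at 5 Mbps is about 3.75 MB)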
- Bitrate ladder: 6-8 renditions (360p to 4K)

Optimization techniques:
- Predictive prefetch: pre-cache next 2-3 segments based on playback position
- Bitrate-aware caching: prioritize popular bitrates (720p, 1080p)
- Manifest manipulation: inject CDN-specific segment URLs at edge
- CMAF: Common Media Application Format, a single set of fragmented-MP4 segments that both HLS and DASH manifests can reference, so each segment is cached once

Chunked Delivery and Pre-positioning

Live streaming optimization:
- Segments available at edge within 1-2 seconds of encoding
- Push-based distribution for live content (don't wait for pull)
- Regional fanout: origin → shield → edges (tree distribution)
- Latency target: <5 seconds glass-to-glass for live

VOD optimization:
- Pre-position popular titles to all PoPs during off-peak hours
- Tiered storage: first 10 minutes on SSD, rest on HDD
- Range request optimization: serve partial segments efficiently
- Byte-range coalescing: combine small range requests into larger reads
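Byte-range coalescing is an interval merge. Below is a minimal sketch over inclusive HTTP-style ranges (`bytes=0-499`). `max_gap` is a hypothetical tuning knob: it bounds how many unrequested bytes may be read to save a separate I/O.

```python
from typing import Iterable, List, Tuple

def coalesce_ranges(ranges: Iterable[Tuple[int, int]],
                    max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge inclusive byte ranges that overlap, touch, or sit <= max_gap bytes apart."""
    merged: List[List[int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1 + max_gap:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current read
        else:
            merged.append([start, end])               # start a new read
    return [(s, e) for s, e in merged]

print(coalesce_ranges([(500, 999), (0, 499), (2000, 2999)]))
# [(0, 999), (2000, 2999)]
print(coalesce_ranges([(500, 999), (0, 499), (2000, 2999)], max_gap=1000))
# [(0, 2999)]
```

The second call reads the 1,000 unrequested bytes between 1000 and 1999 and discards them, trading a little wasted bandwidth for one fewer disk seek.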

Video-Specific Caching Strategies

Cache key design for video:
- Include: URL path, segment number
- Exclude: session tokens, tracking params
- Normalize: remove cache-busting params that don't affect content
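Below is a sketch of the normalization step using Python's `urllib.parse`. The dropped parameter names (`token`, `session`, `utm_*` and so on) are hypothetical examples, not a standard list. This version also keeps the host, assuming a multi-tenant CDN where different customers can share paths.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical parameter names; real deployments configure these per customer.
DROP_PARAMS = {"token", "session", "sid", "cb", "_"}
DROP_PREFIXES = ("utm_",)

def video_cache_key(url: str) -> str:
    """Host + path + sorted query params that actually affect the content."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in DROP_PARAMS and not k.startswith(DROP_PREFIXES)
    )
    query = urlencode(kept)
    return parts.netloc.lower() + parts.path + ("?" + query if query else "")
```

Sorting the surviving parameters means `?b=2&a=1` and `?a=1&b=2` share one cache entry.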

Storage optimization:
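- Deduplication: store byte-identical objects once even when they are reachable under different URLs (e.g., CMAF segments referenced by both HLS and DASH manifests); independently encoded bitrate renditions share no data and cannot be deduplicated against each other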
- Compression: video segments already compressed, skip re-compression
- Tiered eviction: evict low-popularity bitrates first, keep 720p/1080p
- Storage allocation: 60-70% of edge storage dedicated to video

Edge Compute Scaling

Serverless at Edge Architecture

Execution model:
- V8 isolates (like Cloudflare Workers): <1ms cold start
- Container-based (like Lambda@Edge): 5-50ms cold start
- WebAssembly modules: <1ms cold start, near-native performance

Resource limits per invocation (representative values; exact limits vary by provider and plan):
- CPU time: 5-50ms (viewer events), 5-30 seconds (origin events)
- Memory: 128 MB per isolate
- Subrequests: 50 fetch() calls per invocation
- Response size: 10 MB maximum
- Script size: 1-10 MB compressed

Scaling Edge Compute

Challenges:
- Millions of functions deployed across hundreds of PoPs
- Cold start latency must be minimal (<5ms for V8 isolates)
- Memory pressure from many concurrent isolates
- CPU contention between compute and cache serving

Solutions:
- Isolate pooling: pre-warm isolates for popular functions
- Tiered execution: simple functions at all edges, complex at regional
- Resource quotas: per-customer CPU/memory limits
- Overflow routing: redirect compute-heavy requests to compute PoPs
- Auto-scaling: spin up additional isolate capacity based on demand

Deployment model:
- Deploy to all 200+ PoPs within 30 seconds
- Canary deployment: 1% of traffic → 10% → 100%
- Instant rollback: revert to previous version in <5 seconds
- A/B testing: route percentage of traffic to different versions
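Percentage rollouts are commonly implemented by hashing a stable key into a bucket. Below is a minimal sketch, assuming a per-client key and a hypothetical `release_id` salt. Because the bucket is stable, clients in the 1% canary stay in it at 10% and at 100%. A different salt gives an independent split for A/B tests.

```python
import hashlib

def rollout_bucket(client_key: str, release_id: str) -> float:
    """Stable, uniform value in [0, 1) for a client within one release."""
    digest = hashlib.sha256(f"{release_id}:{client_key}".encode()).digest()
    top53 = int.from_bytes(digest[:8], "big") >> 11   # 53 bits fit a float exactly
    return top53 / 2**53

def pick_version(client_key: str, release_id: str, canary_share: float,
                 stable: str, canary: str) -> str:
    """Send `canary_share` of clients to the new version, the rest to stable."""
    return canary if rollout_bucket(client_key, release_id) < canary_share else stable
```

Raising `canary_share` from 0.01 to 0.10 to 1.0 walks the ramp, and setting it back to 0 sends every client to the stable version.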

Multi-CDN Strategies and Failover

Multi-CDN Architecture

Why multi-CDN:
- Redundancy: no single CDN is a SPOF
- Performance: different CDNs perform better in different regions
- Cost optimization: leverage competitive pricing
- Capacity: aggregate bandwidth across providers for mega-events

Implementation approaches:
1. DNS-based switching:
   - GeoDNS routes to best CDN per region
   - Health checks detect CDN outages
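   - Failover time: 30-60 seconds at best (bounded below by the DNS TTL plus health-check detection time; resolvers that ignore short TTLs stretch it further)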

2. Client-side switching:
   - JavaScript/player logic detects failures
   - Automatic retry with alternate CDN URL
   - Failover time: <5 seconds
   - Best for video players

3. Origin-side routing:
   - Origin decides which CDN to use per request
   - Based on real-time performance data
   - Most control but adds origin complexity

Traffic Distribution Strategies

Strategy 1: Active-Active (performance-based)
- Continuously measure latency/throughput per CDN per region
- Route traffic to best-performing CDN for each user
- Rebalance every 5-15 minutes based on measurements
- Typical split: 60/40 or 70/30 between primary/secondary

Strategy 2: Active-Passive (failover)
- Primary CDN handles 100% of traffic
- Secondary CDN on standby (warm cache via prefetch)
- Automatic failover on primary degradation
- Failback after primary recovers (with validation)

Strategy 3: Content-based splitting
- Static assets → CDN A (best price for bandwidth)
- Video streaming → CDN B (best video optimization)
- API/dynamic → CDN C (best edge compute)
- Each CDN optimized for its content type

Failover Detection and Response

Health monitoring:
- Synthetic probes from 50+ global locations every 30 seconds
- RUM (Real User Monitoring) data from actual users
- Origin-side monitoring of CDN pull patterns
- Third-party monitoring (Catchpoint, ThousandEyes)

Failover triggers:
- Availability < 99.5% over 5-minute window
- P95 latency > 2x baseline for region
- Error rate > 1% for 3+ consecutive minutes
- Complete unreachability from 3+ probe locations
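The triggers above translate directly into a predicate. This sketch assumes the monitoring pipeline already aggregates these fields per CDN per region; `CdnHealth` is hypothetical. Any single trigger is treated as sufficient.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CdnHealth:
    availability_5m: float             # success ratio over the last 5 minutes
    p95_ms: float                      # current P95 latency in the region
    baseline_p95_ms: float             # normal P95 for this CDN and region
    error_rate_by_minute: List[float]  # per-minute error ratios, most recent last
    unreachable_probes: int            # probe locations that cannot connect at all

def failover_reasons(h: CdnHealth) -> List[str]:
    reasons = []
    if h.availability_5m < 0.995:
        reasons.append("availability")
    if h.p95_ms > 2 * h.baseline_p95_ms:
        reasons.append("latency")
    last3 = h.error_rate_by_minute[-3:]
    if len(last3) == 3 and all(rate > 0.01 for rate in last3):   # 3+ consecutive minutes
        reasons.append("error_rate")
    if h.unreachable_probes >= 3:
        reasons.append("unreachable")
    return reasons
```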

Failover execution:
- Update DNS records (TTL: 30-60 seconds)
- Notify operations team
- Begin cache warming on failover CDN
- Monitor failover CDN performance
- Document incident for post-mortem

Capacity Planning and Growth

Forecasting Model

Inputs:
- Historical traffic growth (typically 20-40% YoY for internet traffic)
- Customer pipeline (new large customers onboarding)
- Seasonal patterns (holiday shopping, summer streaming)
- One-time events (Olympics, elections, product launches)

Planning horizons:
- Short-term (0-3 months): handle with existing capacity + burst
- Medium-term (3-12 months): hardware procurement and deployment
- Long-term (1-3 years): new PoP construction, technology refresh

Capacity buffer:
- Maintain 40-50% headroom above average utilization
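- Burst capacity: handle 3-5x average for 1-hour periods; because this is well beyond the steady-state headroom, bursts must lean on overflow to other PoPs or multi-CDN offload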
- Emergency capacity: graceful degradation plan for >5x spikes
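Below is a worked version of the buffer arithmetic. It assumes headroom is measured as a fraction above projected average load and that growth compounds smoothly from the YoY rate. It also shows that 40-50% headroom cannot by itself absorb a 3-5x burst.

```python
def provisioned_gbps(avg_gbps_now: float, yoy_growth: float, months: float,
                     headroom: float = 0.5) -> float:
    """Capacity needed so projected average load still leaves `headroom` spare."""
    projected_avg = avg_gbps_now * (1 + yoy_growth) ** (months / 12)
    return projected_avg * (1 + headroom)

def burst_overflow_gbps(avg_gbps: float, capacity_gbps: float,
                        burst_multiple: float) -> float:
    """Traffic a burst pushes beyond local capacity (to be offloaded or shed)."""
    return max(0.0, avg_gbps * burst_multiple - capacity_gbps)

cap = provisioned_gbps(avg_gbps_now=40, yoy_growth=0.30, months=12, headroom=0.5)
print(round(cap, 1))                               # 78.0
print(round(burst_overflow_gbps(52, cap, 3), 1))   # 78.0 Gbps must go elsewhere
```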

Technology Refresh Cycle

Hardware lifecycle:
- Servers: 3-4 year refresh cycle
- Network equipment: 5-7 year lifecycle
- Storage (SSD): 3-5 years (write endurance dependent)
- Storage (HDD): 4-5 years

Refresh strategy:
- Rolling replacement: 25-33% of fleet per year
- Performance improvement: each generation 30-50% better perf/watt
- Capacity growth: refresh provides organic capacity increase
- Zero-downtime: drain server, replace, re-add to pool