Scale & Constraints

📖 7 min read 📄 Part 2 of 10

API Rate Limiter - Scale and Constraints

Traffic Scale Analysis

Request Volume

  • Peak Requests: 10 million requests per second globally
  • Average Requests: 3 million requests per second
  • Daily Requests: 250 billion requests per day
  • Burst Multiplier: 5x normal traffic during spikes
  • Geographic Distribution: 40% US, 30% Europe, 20% Asia, 10% Other
  • Time Zone Patterns: 3x variation between peak and off-peak hours

User and Client Scale

  • Total Users: 500 million registered users
  • Active Users: 100 million daily active users
  • Concurrent Users: 10 million concurrent active sessions
  • API Keys: 10 million unique API keys across all tiers
  • IP Addresses: 50 million unique IP addresses per day
  • Endpoints: 5,000 unique API endpoints to rate limit

Rate Limiting Rules

  • Total Rules: 100,000 active rate limiting rules
  • Per-User Rules: 50 million individual user quotas
  • Per-API-Key Rules: 10 million API key configurations
  • Per-Endpoint Rules: 5,000 endpoint-specific limits
  • Global Rules: 100 system-wide rate limiting policies
  • Dynamic Rules: 1,000 rules updated per hour

Storage Requirements

In-Memory Storage (Redis/Memcached)

  • Counter Storage: 10 million active counters × 100 bytes = 1 GB
  • User Metadata: 100 million users × 200 bytes = 20 GB
  • API Key Cache: 10 million keys × 500 bytes = 5 GB
  • Rule Cache: 100,000 rules × 2 KB = 200 MB
  • Sliding Window Logs: 10 million users × 10 KB = 100 GB (if using log-based)
  • Total Hot Storage: ~130 GB per region (with overhead)

Persistent Storage (Database)

  • Configuration Data: 100,000 rules × 5 KB = 500 MB
  • Historical Metrics: 1 TB per month (aggregated data)
  • Audit Logs: 10 TB per month (detailed logs)
  • User Quotas: 100 million users × 1 KB = 100 GB
  • API Key Metadata: 10 million keys × 2 KB = 20 GB
  • Total Persistent Storage: ~12 TB per year

Backup and Archival

  • Configuration Backups: 1 GB (daily snapshots)
  • Metrics Archive: 100 GB per year (compressed)
  • Audit Archive: 1 TB per year (compressed)
  • Disaster Recovery: 3x replication across regions
  • Total Archive Storage: ~3 TB per year

Network Bandwidth

Inbound Traffic

  • Request Headers: 10M req/s × 2 KB = 20 GB/s
  • Rate Limit Queries: 10M req/s × 100 bytes = 1 GB/s
  • Configuration Updates: 1,000 updates/hour × 10 KB = ~3 KB/s
  • Total Inbound: ~21 GB/s peak

Outbound Traffic

  • Rate Limit Responses: 10M req/s × 200 bytes = 2 GB/s
  • Metrics Export: 100 MB/s continuous
  • Log Streaming: 500 MB/s continuous
  • Replication Traffic: 5 GB/s cross-region
  • Total Outbound: ~8 GB/s peak

Cross-Region Replication

  • Configuration Sync: 10 MB/s continuous
  • Counter Synchronization: 1 GB/s (if using global counters)
  • Metrics Aggregation: 100 MB/s continuous
  • Total Replication: ~1.1 GB/s per region pair

Compute Requirements

CPU Resources

  • Rate Limit Decisions: 10M req/s × 0.1ms = 1,000 CPU cores
  • Counter Updates: 10M req/s × 0.05ms = 500 CPU cores
  • Cache Operations: 10M req/s × 0.02ms = 200 CPU cores
  • Metrics Processing: 100 CPU cores continuous
  • Total CPU: ~2,000 cores per region at peak

Memory Resources

  • Application Memory: 50 GB per server (100 servers) = 5 TB
  • Cache Memory: 130 GB per region × 5 regions = 650 GB
  • OS and Overhead: 20% additional = 1.1 TB
  • Total Memory: ~6.8 TB globally

Server Infrastructure

  • Rate Limiter Servers: 100 servers per region × 5 regions = 500 servers
  • Cache Servers: 20 Redis servers per region × 5 regions = 100 servers
  • Database Servers: 10 servers per region × 5 regions = 50 servers
  • Load Balancers: 5 per region × 5 regions = 25 load balancers
  • Total Servers: ~675 servers globally

Latency Constraints

Response Time Requirements

  • Rate Limit Decision: <5ms P99 latency
  • Cache Lookup: <1ms P99 latency
  • Counter Increment: <2ms P99 latency
  • Configuration Fetch: <10ms P99 latency
  • Cross-Region Sync: <100ms P99 latency
  • End-to-End Overhead: <10ms P99 added to request

Geographic Latency

  • Same Region: <5ms between client and rate limiter
  • Cross-Region: <100ms for configuration sync
  • Global Consistency: <1 second for counter convergence
  • CDN Edge: <2ms for edge-based rate limiting
  • Database Queries: <20ms for persistent storage access

Consistency and Accuracy

Distributed Counter Accuracy

  • Single Server: 100% accuracy
  • Same Region: 99.9% accuracy (eventual consistency)
  • Cross-Region: 99% accuracy (higher latency sync)
  • Convergence Time: <1 second for 99% accuracy
  • Race Condition Impact: <0.1% over-limit requests
  • Clock Skew Tolerance: ±5 seconds across servers

Data Consistency Models

  • Strong Consistency: Configuration changes (CP system)
  • Eventual Consistency: Rate limit counters (AP system)
  • Causal Consistency: User quota updates
  • Session Consistency: Same user, same server affinity
  • Monotonic Reads: Counter values never decrease
  • Read-Your-Writes: User sees their own quota updates immediately

Failure Scenarios and Tolerances

Server Failures

  • Single Server Failure: <0.1% traffic impact (load balancer failover)
  • Region Failure: 20% traffic impact (redirect to other regions)
  • Cache Failure: Degrade to database (10x latency increase)
  • Database Failure: Use cached data (stale up to 5 minutes)
  • Network Partition: Fail open or closed based on configuration

Recovery Time Objectives

  • Server Recovery: <30 seconds (automatic restart)
  • Cache Rebuild: <5 minutes (warm cache from database)
  • Region Failover: <2 minutes (DNS + load balancer)
  • Database Recovery: <15 minutes (replica promotion)
  • Full System Recovery: <1 hour (disaster recovery)

Cost Analysis

Infrastructure Costs (Monthly)

  • Compute: 675 servers × $200/month = $135,000
  • Memory/Cache: 650 GB Redis × $0.50/GB/month = $325
  • Storage: 12 TB × $0.10/GB/month = $1,200
  • Network: 100 TB egress × $0.05/GB = $5,000
  • Load Balancers: 25 × $20/month = $500
  • Total Infrastructure: ~$142,000/month

Operational Costs (Monthly)

  • Monitoring: $5,000/month (metrics, logs, traces)
  • Support: $10,000/month (on-call, incident response)
  • Development: $50,000/month (feature development, maintenance)
  • Total Operational: ~$65,000/month

Cost Per Request

  • Total Monthly Cost: $207,000
  • Monthly Requests: 250B requests × 30 days = 7.5 trillion requests
  • Cost Per Million Requests: $0.028
  • Cost Per Request: $0.000000028 (~$0.03 per million)

Scaling Strategies

Horizontal Scaling

  • Add Servers: Linear scaling up to 1,000 servers per region
  • Shard by User ID: Distribute users across servers
  • Shard by API Key: Distribute API keys across servers
  • Geographic Sharding: Deploy in additional regions
  • Auto-Scaling: Scale based on request volume (2-10x capacity)

Vertical Scaling

  • Increase Server Size: Up to 64 cores, 256 GB RAM per server
  • Faster Storage: NVMe SSDs for cache and database
  • Network Upgrades: 10 Gbps to 100 Gbps network interfaces
  • Memory Optimization: Larger Redis instances (up to 1 TB)

Caching Strategies

  • Multi-Level Cache: L1 (in-memory), L2 (Redis), L3 (database)
  • Cache Warming: Pre-populate cache with hot data
  • Cache Partitioning: Separate caches for different data types
  • TTL Optimization: Shorter TTL for hot data, longer for cold data
  • Cache Aside Pattern: Application manages cache population

Bottleneck Analysis

Primary Bottlenecks

  1. Redis Throughput: 100K ops/sec per instance limit
  2. Network Bandwidth: 10 Gbps per server limit
  3. Database Write Throughput: 10K writes/sec per shard
  4. Cross-Region Latency: 100ms+ for global consistency
  5. Memory Capacity: 130 GB cache per region

Mitigation Strategies

  1. Redis Clustering: Shard across 20+ Redis instances
  2. Network Optimization: Use 100 Gbps interfaces, compression
  3. Batch Writes: Aggregate database writes every 10 seconds
  4. Local Enforcement: Rate limit locally, sync asynchronously
  5. Cache Optimization: Compress data, use efficient data structures

This scale analysis provides the foundation for designing a rate limiting system that can handle massive traffic volumes while maintaining low latency and high accuracy.