API Rate Limiter - Scale and Constraints
Traffic Scale Analysis
Request Volume
- Peak Requests: 10 million requests per second globally
- Average Requests: 3 million requests per second
- Daily Requests: 250 billion requests per day
- Burst Multiplier: 5x normal traffic during spikes
- Geographic Distribution: 40% US, 30% Europe, 20% Asia, 10% Other
- Time Zone Patterns: 3x variation between peak and off-peak hours
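The daily figure follows from the average rate, and regional load follows from the geographic split. This is a small sketch that reproduces both, assuming traffic divides across regions exactly in proportion to the stated shares. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

def daily_requests(avg_rps: float) -> float:
    """Requests per day implied by a sustained average request rate."""
    return avg_rps * SECONDS_PER_DAY

def regional_rps(total_rps: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a global request rate across regions by traffic share (assumed proportional)."""
    return {region: total_rps * share for region, share in shares.items()}

if __name__ == "__main__":
    print(f"{daily_requests(3e6) / 1e9:.1f} billion/day")  # 259.2 billion/day, i.e. ~250B
    split = {"US": 0.40, "Europe": 0.30, "Asia": 0.20, "Other": 0.10}
    print(regional_rps(10e6, split))  # US carries 4M req/s at global peak
```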
User and Client Scale
- Total Users: 500 million registered users
- Active Users: 100 million daily active users
- Concurrent Users: 10 million concurrent active sessions
- API Keys: 10 million unique API keys across all tiers
- IP Addresses: 50 million unique IP addresses per day
- Endpoints: 5,000 unique API endpoints to rate limit
Rate Limiting Rules
- Total Rules: 100,000 active rate limiting rule definitions (the per-user and per-API-key quotas below are counted separately)
- Per-User Rules: 50 million individual user quotas
- Per-API-Key Rules: 10 million API key configurations
- Per-Endpoint Rules: 5,000 endpoint-specific limits
- Global Rules: 100 system-wide rate limiting policies
- Dynamic Rules: 1,000 rules updated per hour
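The list above names four rule scopes but does not say how a request is matched against them. This sketch makes two assumptions. First, every applicable scope is checked, and a request passes only if all of them pass. Second, rules are indexed by one dict per scope, so each lookup is a constant-time probe rather than a scan of all 100,000 rules. `Rule`, `RuleIndex`, and the scope strings are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    scope: str      # "user", "api_key", "endpoint", or "global" (assumed scope names)
    key: str        # identifier within the scope; "*" for global rules
    limit: int      # maximum requests per window
    window_s: int   # window length in seconds

# Assumed evaluation order, most specific scope first.
SCOPE_ORDER = ("user", "api_key", "endpoint", "global")

class RuleIndex:
    """Hypothetical index: one dict per scope, so each lookup is O(1)."""

    def __init__(self, rules):
        self._by_scope: dict[str, dict[str, Rule]] = {s: {} for s in SCOPE_ORDER}
        for rule in rules:
            self._by_scope[rule.scope][rule.key] = rule

    def applicable(self, user_id: str, api_key: str, endpoint: str) -> list[Rule]:
        """Every rule governing this request; all must pass for it to be allowed."""
        ids = {"user": user_id, "api_key": api_key, "endpoint": endpoint, "global": "*"}
        return [self._by_scope[s][ids[s]] for s in SCOPE_ORDER
                if ids[s] in self._by_scope[s]]

if __name__ == "__main__":
    index = RuleIndex([
        Rule("user", "u-42", 100, 60),
        Rule("endpoint", "/search", 1_000, 1),
        Rule("global", "*", 10_000_000, 1),
    ])
    print([r.scope for r in index.applicable("u-42", "k-7", "/search")])
```

A "most specific rule wins" policy is an equally plausible alternative. It would return only the first element of this list.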
Storage Requirements
In-Memory Storage (Redis/Memcached)
- Counter Storage: 10 million active counters × 100 bytes = 1 GB
- User Metadata: 100 million users × 200 bytes = 20 GB
- API Key Cache: 10 million keys × 500 bytes = 5 GB
- Rule Cache: 100,000 rules × 2 KB = 200 MB
- Sliding Window Logs: 10 million users × 10 KB = 100 GB (if using log-based)
- Total Hot Storage: ~130 GB per region including overhead, of which ~100 GB is sliding window logs (~26 GB without them)
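The sliding window log dominates hot storage because it keeps one timestamp per accepted request, up to the limit, instead of a single counter. This is a minimal in-process sketch of that technique. A production version would keep the log in the shared cache rather than in a Python dict, and the injectable `clock` parameter exists only so the behavior can be tested.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log limiter: stores one timestamp per accepted request.

    Per-key memory grows with `limit` (up to `limit` timestamps), which is why
    log-based storage is far larger than a fixed-window counter.
    """

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.logs: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        log = self.logs.setdefault(key, deque())
        # Evict timestamps that have slid out of the window.
        while log and log[0] <= now - self.window_s:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False
```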
Persistent Storage (Database)
- Configuration Data: 100,000 rules × 5 KB = 500 MB
- Historical Metrics: 1 TB per month (aggregated data)
- Audit Logs: 10 TB per month (detailed logs)
- User Quotas: 100 million users × 1 KB = 100 GB
- API Key Metadata: 10 million keys × 2 KB = 20 GB
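- Total Persistent Storage: ~11 TB of new data per month (~132 TB per year), dominated by audit logs, plus ~120 GB of configuration, quota, and API key metadata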
Backup and Archival
- Configuration Backups: 1 GB (daily snapshots)
- Metrics Archive: 100 GB per year (compressed)
- Audit Archive: 1 TB per year (compressed)
- Disaster Recovery: 3x replication across regions
- Total Archive Storage: ~3 TB per year (~1.1 TB of compressed metrics and audit archives, replicated 3x)
Network Bandwidth
Inbound Traffic
- Request Headers: 10M req/s × 2 KB = 20 GB/s
- Rate Limit Queries: 10M req/s × 100 bytes = 1 GB/s
- Configuration Updates: 1,000 updates/hour × 10 KB = ~3 KB/s
- Total Inbound: ~21 GB/s peak
Outbound Traffic
- Rate Limit Responses: 10M req/s × 200 bytes = 2 GB/s
- Metrics Export: 100 MB/s continuous
- Log Streaming: 500 MB/s continuous
- Replication Traffic: 5 GB/s cross-region (roughly 1.1 GB/s to each of 4 peer regions)
- Total Outbound: ~8 GB/s peak
Cross-Region Replication
- Configuration Sync: 10 MB/s continuous
- Counter Synchronization: 1 GB/s (if using global counters)
- Metrics Aggregation: 100 MB/s continuous
- Total Replication: ~1.1 GB/s per region pair
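Bandwidth figures are given in gigabytes per second, but NIC limits are quoted in gigabits. This sketch converts the units and estimates per-server NIC utilization. It assumes the US region takes 40% of the 21 GB/s peak inbound and spreads it evenly across its 100 rate limiter servers, each with the 10 Gbps interface cited under Bottleneck Analysis. Real load balancing is rarely perfectly even.

```python
BITS_PER_BYTE = 8

def gb_to_gbit(gb_per_s: float) -> float:
    """Convert gigabytes per second to gigabits per second."""
    return gb_per_s * BITS_PER_BYTE

def nic_utilization(total_gb_per_s: float, servers: int, nic_gbps: float = 10.0) -> float:
    """Average fraction of each server's NIC in use, assuming an even spread."""
    return gb_to_gbit(total_gb_per_s) / servers / nic_gbps

if __name__ == "__main__":
    us_inbound = 21 * 0.40  # US share of peak inbound, GB/s
    print(f"{gb_to_gbit(21):.0f} Gbps global inbound")               # 168 Gbps
    print(f"{nic_utilization(us_inbound, 100):.1%} of a 10 Gbps NIC")  # 6.7%
```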
Compute Requirements
CPU Resources
- Rate Limit Decisions: 10M req/s × 0.1ms = 1,000 CPU cores
- Counter Updates: 10M req/s × 0.05ms = 500 CPU cores
- Cache Operations: 10M req/s × 0.02ms = 200 CPU cores
- Metrics Processing: 100 CPU cores continuous
- Total CPU: ~2,000 cores globally at peak (1,800 computed from the 10M req/s global peak, rounded up)
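The CPU figures above apply the utilization law: busy cores = request rate × CPU time per request. This sketch reproduces them. The optional `target_utilization` parameter is an added assumption that shows how much headroom a utilization cap would add. It is applied only to the request-path cores, not to the fixed metrics cost.

```python
def cores_needed(rps: float, cpu_ms_per_request: float,
                 target_utilization: float = 1.0) -> float:
    """Busy cores = throughput x CPU time per request, scaled for a utilization cap."""
    busy = rps * cpu_ms_per_request / 1000.0
    return busy / target_utilization

PEAK_RPS = 10e6
PER_REQUEST_MS = {"decision": 0.1, "counter_update": 0.05, "cache_ops": 0.02}
METRICS_CORES = 100  # fixed cost from the list above

def peak_cores(target_utilization: float = 1.0) -> float:
    request_path = sum(cores_needed(PEAK_RPS, ms, target_utilization)
                       for ms in PER_REQUEST_MS.values())
    return request_path + METRICS_CORES

if __name__ == "__main__":
    print(peak_cores())     # 1800.0
    print(peak_cores(0.7))  # roughly 2528.6 with cores capped at 70% busy
```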
Memory Resources
- Application Memory: 50 GB per server × 500 servers (100 per region × 5 regions) = 25 TB
- Cache Memory: 130 GB per region × 5 regions = 650 GB
- OS and Overhead: 20% additional = ~5.1 TB
- Total Memory: ~31 TB globally (~6.2 TB per region)
Server Infrastructure
- Rate Limiter Servers: 100 servers per region × 5 regions = 500 servers
- Cache Servers: 20 Redis servers per region × 5 regions = 100 servers
- Database Servers: 10 servers per region × 5 regions = 50 servers
- Load Balancers: 5 per region × 5 regions = 25 load balancers
- Total Servers: ~675 servers globally
Latency Constraints
Response Time Requirements
- Rate Limit Decision: <5ms P99 latency
- Cache Lookup: <1ms P99 latency
- Counter Increment: <2ms P99 latency
- Configuration Fetch: <10ms P99 latency
- Cross-Region Sync: <100ms P99 latency
- End-to-End Overhead: <10ms P99 added to request
Geographic Latency
- Same Region: <5ms between client and rate limiter
- Cross-Region: <100ms for configuration sync
- Global Consistency: <1 second for counter convergence
- CDN Edge: <2ms for edge-based rate limiting
- Database Queries: <20ms for persistent storage access
Consistency and Accuracy
Distributed Counter Accuracy
- Single Server: 100% accuracy
- Same Region: 99.9% accuracy (eventual consistency)
- Cross-Region: 99% accuracy (higher latency sync)
- Convergence Time: <1 second for 99% accuracy
- Race Condition Impact: <0.1% over-limit requests
- Clock Skew Tolerance: ±5 seconds across servers
Data Consistency Models
- Strong Consistency: Configuration changes (CP system)
- Eventual Consistency: Rate limit counters (AP system)
- Causal Consistency: User quota updates
- Session Consistency: Same user, same server affinity
- Monotonic Reads: Counter values never decrease within a window (they reset only when a new window starts)
- Read-Your-Writes: User sees their own quota updates immediately
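The list above calls for eventually consistent counters with monotonic reads but does not name a merge mechanism. One well-known option, offered here as a substitute technique rather than the document's design, is a grow-only counter (G-Counter) CRDT. Each region increments only its own slot, merges take the per-slot maximum, and the value is the sum of all slots. In this sketch, window resets are assumed to be handled by keeping a separate counter per window.

```python
from __future__ import annotations

class GCounter:
    """Grow-only counter CRDT with one slot per replica (here, per region)."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.slots: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + n

    def merge(self, other: GCounter) -> None:
        # Per-slot max is commutative, associative, and idempotent, so replicas
        # converge regardless of message order or duplicate deliveries.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.slots.values())
```

Because merges never lower any slot, a replica's observed value never decreases, which matches the monotonic-reads requirement.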
Failure Scenarios and Tolerances
Server Failures
- Single Server Failure: <0.1% traffic impact (load balancer failover)
- Region Failure: 20% traffic impact (redirect to other regions)
- Cache Failure: Degrade to database (10x latency increase)
- Database Failure: Use cached data (stale up to 5 minutes)
- Network Partition: Fail open or closed based on configuration
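Configurable fail-open or fail-closed behavior can be expressed as a thin wrapper around the limiter's backend call. This sketch assumes the backend signals unavailability by raising `ConnectionError` or `TimeoutError`. `decide`, `FailureMode`, and `unreachable_backend` are hypothetical names.

```python
from enum import Enum
from typing import Callable

class FailureMode(Enum):
    OPEN = "open"      # allow traffic when the limiter cannot reach its store
    CLOSED = "closed"  # reject traffic when the limiter cannot reach its store

def decide(check: Callable[[str], bool], key: str, mode: FailureMode) -> bool:
    """Return check(key), or the configured default if the backend is unreachable."""
    try:
        return check(key)
    except (ConnectionError, TimeoutError):
        return mode is FailureMode.OPEN

def unreachable_backend(key: str) -> bool:
    """Hypothetical stand-in for a counter-store call during a partition."""
    raise ConnectionError("rate limit store unreachable")
```

Failing open protects availability at the cost of briefly unenforced limits. Failing closed protects downstream services at the cost of rejecting legitimate traffic.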
Recovery Time Objectives
- Server Recovery: <30 seconds (automatic restart)
- Cache Rebuild: <5 minutes (warm cache from database)
- Region Failover: <2 minutes (DNS + load balancer)
- Database Recovery: <15 minutes (replica promotion)
- Full System Recovery: <1 hour (disaster recovery)
Cost Analysis
Infrastructure Costs (Monthly)
- Compute: 675 servers × $200/month = $135,000
- Memory/Cache: 650 GB Redis × $0.50/GB/month = $325
- Storage: 12 TB × $0.10/GB/month = $1,200
- Network: 100 TB egress × $0.05/GB = $5,000
- Load Balancers: 25 × $20/month = $500
- Total Infrastructure: ~$142,000/month
Operational Costs (Monthly)
- Monitoring: $5,000/month (metrics, logs, traces)
- Support: $10,000/month (on-call, incident response)
- Development: $50,000/month (feature development, maintenance)
- Total Operational: ~$65,000/month
Cost Per Request
- Total Monthly Cost: $207,000
- Monthly Requests: 250B requests × 30 days = 7.5 trillion requests
- Cost Per Million Requests: $0.028
- Cost Per Request: $0.000000028 (~$0.03 per million)
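The per-request cost divides the monthly total by a 30-day month of traffic. This sketch recomputes it from the line items. Summing the items exactly gives $207,025 rather than the rounded $207,000, which still rounds to $0.028 per million.

```python
def cost_per_million(monthly_cost_usd: float, daily_requests: float, days: int = 30) -> float:
    """Dollars per million requests over a month of `days` days."""
    monthly_requests = daily_requests * days
    return monthly_cost_usd / monthly_requests * 1_000_000

INFRA = 135_000 + 325 + 1_200 + 5_000 + 500  # compute, cache, storage, network, LBs
OPS = 5_000 + 10_000 + 50_000                # monitoring, support, development

if __name__ == "__main__":
    print(INFRA, OPS, INFRA + OPS)                         # 142025 65000 207025
    print(f"${cost_per_million(INFRA + OPS, 250e9):.3f}")  # $0.028
```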
Scaling Strategies
Horizontal Scaling
- Add Servers: Linear scaling up to 1,000 servers per region
- Shard by User ID: Distribute users across servers
- Shard by API Key: Distribute API keys across servers
- Geographic Sharding: Deploy in additional regions
- Auto-Scaling: Scale based on request volume (2-10x capacity)
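Sharding by user ID or API key requires a stable key-to-server mapping that does not reshuffle every key when servers are added, for example during auto-scaling. Consistent hashing with virtual nodes is one common way to achieve this, though the document does not prescribe it. This is a minimal sketch. Server names such as `rl-1` are hypothetical, and MD5 is used only for key distribution, not for security.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes for sharding keys across servers."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Place `vnodes` points per server to even out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

if __name__ == "__main__":
    ring = HashRing(["rl-1", "rl-2", "rl-3"])
    print(ring.node_for("user:42"))
```

When a server is added, the only keys that move are those that now map to the new server. Every other key keeps its assignment.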
Vertical Scaling
- Increase Server Size: Up to 64 cores, 256 GB RAM per server
- Faster Storage: NVMe SSDs for cache and database
- Network Upgrades: 10 Gbps to 100 Gbps network interfaces
- Memory Optimization: Larger Redis instances (up to 1 TB)
Caching Strategies
- Multi-Level Cache: L1 (in-memory), L2 (Redis), L3 (database)
- Cache Warming: Pre-populate cache with hot data
- Cache Partitioning: Separate caches for different data types
- TTL Optimization: Shorter TTL for hot data, longer for cold data
- Cache Aside Pattern: Application manages cache population
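The multi-level, cache-aside strategy can be sketched as a lookup that tries an in-process L1 with a short TTL, then a shared L2, and only then the database (L3). On an L3 read, the application itself populates L2. In this sketch a plain dict stands in for Redis, so L2 entries never expire here, although a real deployment would set an expiry on them. `loader` is a hypothetical database read function.

```python
import time
from typing import Any, Callable

class TwoLevelCache:
    """Cache-aside lookup: L1 in-process dict -> L2 shared store -> L3 loader."""

    def __init__(self, l2: dict, loader: Callable[[str], Any],
                 l1_ttl_s: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self.l1: dict[str, tuple[float, Any]] = {}
        self.l2 = l2            # stand-in for a shared cache such as Redis
        self.loader = loader    # stand-in for a database read
        self.l1_ttl_s = l1_ttl_s
        self.clock = clock
        self.loads = 0          # number of L3 (database) reads

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self.l1.get(key)
        if entry is not None and entry[0] > now:  # L1 hit, not yet expired
            return entry[1]
        if key in self.l2:                         # L2 hit
            value = self.l2[key]
        else:                                      # miss everywhere: read L3
            value = self.loader(key)
            self.loads += 1
            self.l2[key] = value                   # cache-aside: app fills L2
        self.l1[key] = (now + self.l1_ttl_s, value)
        return value
```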
Bottleneck Analysis
Primary Bottlenecks
- Redis Throughput: 100K ops/sec per instance limit
- Network Bandwidth: 10 Gbps per server limit
- Database Write Throughput: 10K writes/sec per shard
- Cross-Region Latency: 100ms+ for global consistency
- Memory Capacity: 130 GB cache per region
Mitigation Strategies
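- Redis Clustering: Shard across enough instances to absorb regional peak; at 100K ops/sec each, the US region's 4M req/s (40% of peak) needs 40+ instances even at one operation per request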
- Network Optimization: Use 100 Gbps interfaces, compression
- Batch Writes: Aggregate database writes every 10 seconds
- Local Enforcement: Rate limit locally, sync asynchronously
- Cache Optimization: Compress data, use efficient data structures
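"Local Enforcement" trades some accuracy for latency. Each server decides from its last known global count plus its own unsynced increments, and it pushes deltas to the shared store periodically. This sketch makes several assumptions. It uses fixed windows. A dict stands in for the shared counter store, where a real deployment would need an atomic increment. `sync()` is called by hand here but would normally run on a background timer. Between syncs, other servers' traffic is invisible, so a key can overshoot its limit by up to the traffic accepted during one sync interval.

```python
import time

class LocalLimiter:
    """Enforce limits locally; reconcile with a shared counter store on sync()."""

    def __init__(self, store: dict, limit: int, window_s: int = 1, clock=time.time):
        self.store = store                                  # stand-in for the shared store
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.pending: dict[tuple[str, int], int] = {}       # local increments not yet pushed
        self.global_seen: dict[tuple[str, int], int] = {}   # last totals pulled from store

    def _window_key(self, key: str) -> tuple[str, int]:
        return (key, int(self.clock() // self.window_s))

    def allow(self, key: str) -> bool:
        wk = self._window_key(key)
        estimate = self.global_seen.get(wk, 0) + self.pending.get(wk, 0)
        if estimate >= self.limit:
            return False
        self.pending[wk] = self.pending.get(wk, 0) + 1
        return True

    def sync(self) -> None:
        """Push local deltas, then pull merged totals for every window seen locally."""
        for wk, delta in self.pending.items():
            self.store[wk] = self.store.get(wk, 0) + delta  # must be atomic in practice
        for wk in set(self.pending) | set(self.global_seen):
            self.global_seen[wk] = self.store.get(wk, 0)
        self.pending.clear()
        # Old windows in global_seen should also be pruned; omitted for brevity.
```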
This scale analysis provides the foundation for designing a rate limiting system that can handle massive traffic volumes while maintaining low latency and high accuracy.