Scale & Constraints

Part 2 of 10

API Rate Limiter - Scale and Constraints

Traffic Scale Analysis

Request Volume

Peak Requests: 10 million requests per second globally
Average Requests: 3 million requests per second
Daily Requests: ~259 billion requests per day (3 million req/s × 86,400 s), rounded to 250 billion in later estimates
Burst Multiplier: 5x normal traffic during spikes
Geographic Distribution: 40% US, 30% Europe, 20% Asia, 10% Other
Time Zone Patterns: 3x variation between peak and off-peak hours
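
As a quick sanity check on the figures above, the daily volume and the per-region peak can be derived from the per-second rates. This is a minimal sketch: the function names are illustrative, and the regional split assumes peak traffic follows the geographic distribution listed above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

AVG_RPS = 3_000_000
PEAK_RPS = 10_000_000
# Regional shares from the list above, in percent.
REGION_SHARE_PCT = {"US": 40, "Europe": 30, "Asia": 20, "Other": 10}


def daily_requests(avg_rps: int) -> int:
    """Requests per day implied by a sustained average rate."""
    return avg_rps * SECONDS_PER_DAY


def regional_peak_rps(peak_rps: int, share_pct: dict) -> dict:
    """Split the global peak by region (assumes peaks coincide across regions)."""
    return {region: peak_rps * pct // 100 for region, pct in share_pct.items()}


print(f"{daily_requests(AVG_RPS) / 1e9:.1f} billion requests/day")
print(f"peak/average ratio: {PEAK_RPS / AVG_RPS:.2f}x")
print(regional_peak_rps(PEAK_RPS, REGION_SHARE_PCT))
```

The average rate works out to about 259 billion requests per day, and the stated peak is roughly 3.3 times the average.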

User and Client Scale

Total Users: 500 million registered users
Active Users: 100 million daily active users
Concurrent Users: 10 million concurrent active sessions
API Keys: 10 million unique API keys across all tiers
IP Addresses: 50 million unique IP addresses per day
Endpoints: 5,000 unique API endpoints to rate limit

Rate Limiting Rules

Total Rules: 100,000 active rate limiting rule definitions (the per-user and per-key quotas below are counted separately)
Per-User Rules: 50 million individual user quotas
Per-API-Key Rules: 10 million API key configurations
Per-Endpoint Rules: 5,000 endpoint-specific limits
Global Rules: 100 system-wide rate limiting policies
Dynamic Rules: 1,000 rules updated per hour

Storage Requirements

In-Memory Storage (Redis/Memcached)

Counter Storage: 10 million active counters × 100 bytes = 1 GB
User Metadata: 100 million users × 200 bytes = 20 GB
API Key Cache: 10 million keys × 500 bytes = 5 GB
Rule Cache: 100,000 rules × 2 KB = 200 MB
Sliding Window Logs: 10 million users × 10 KB = 100 GB (if using log-based)
Total Hot Storage: ~130 GB per region (with overhead)
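
The hot-storage estimate above is a sum of count × size products. The sketch below reproduces it so the inputs can be varied; it assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), and the `CacheItem` type is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheItem:
    name: str
    count: int
    bytes_each: int

    @property
    def total_gb(self) -> float:
        return self.count * self.bytes_each / 1e9


ITEMS = [
    CacheItem("counters", 10_000_000, 100),
    CacheItem("user_metadata", 100_000_000, 200),
    CacheItem("api_key_cache", 10_000_000, 500),
    CacheItem("rule_cache", 100_000, 2_000),
    CacheItem("sliding_window_logs", 10_000_000, 10_000),
]


def total_hot_storage_gb(items) -> float:
    return sum(item.total_gb for item in items)


for item in ITEMS:
    print(f"{item.name:>20}: {item.total_gb:8.1f} GB")
print(f"{'total':>20}: {total_hot_storage_gb(ITEMS):8.1f} GB")
```

The raw sum is about 126 GB, which rounds to the ~130 GB figure once overhead is included. The sliding window logs account for most of it, so a counter-based algorithm would shrink the footprint considerably.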

Persistent Storage (Database)

Configuration Data: 100,000 rules × 5 KB = 500 MB
Historical Metrics: 1 TB per month (aggregated data)
Audit Logs: 10 TB per month (detailed logs)
User Quotas: 100 million users × 1 KB = 100 GB
API Key Metadata: 10 million keys × 2 KB = 20 GB
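Total Persistent Storage: ~120 GB of slowly changing data plus ~11 TB of new metrics and audit logs per month (~132 TB per year before archival or retention limits)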
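
The persistent footprint has a fixed part (configuration, quotas, key metadata) and a part that grows every month (metrics and audit logs). A rough projection, assuming nothing is deleted or moved to the archive tier, could look like this; the dictionary names are illustrative.

```python
# Slowly changing data, in GB (from the list above).
STATIC_GB = {"configuration": 0.5, "user_quotas": 100.0, "api_key_metadata": 20.0}
# Data that accumulates, in TB per month (from the list above).
MONTHLY_GROWTH_TB = {"historical_metrics": 1.0, "audit_logs": 10.0}


def footprint_tb(months: int) -> float:
    """Projected persistent storage after `months`, with no retention policy applied."""
    static_tb = sum(STATIC_GB.values()) / 1000
    return static_tb + months * sum(MONTHLY_GROWTH_TB.values())


for months in (1, 6, 12):
    print(f"after {months:2d} months: {footprint_tb(months):7.1f} TB")
```

Under these assumptions the audit logs dominate, so the retention period chosen for them largely determines the storage bill.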

Backup and Archival

Configuration Backups: 1 GB (daily snapshots)
Metrics Archive: 100 GB per year (compressed)
Audit Archive: 1 TB per year (compressed)
Disaster Recovery: 3x replication across regions
Total Archive Storage: ~3 TB per year

Network Bandwidth

Inbound Traffic

Request Headers: 10M req/s × 2 KB = 20 GB/s
Rate Limit Queries: 10M req/s × 100 bytes = 1 GB/s
Configuration Updates: 1,000 updates/hour × 10 KB = ~3 KB/s
Total Inbound: ~21 GB/s peak

Outbound Traffic

Rate Limit Responses: 10M req/s × 200 bytes = 2 GB/s
Metrics Export: 100 MB/s continuous
Log Streaming: 500 MB/s continuous
Replication Traffic: 5 GB/s cross-region
Total Outbound: ~8 GB/s peak
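
The inbound and outbound totals above can be recomputed and converted to interface counts. The sketch below assumes decimal units, 10 Gbps interfaces, and a 50% target utilization; the utilization figure is an assumption made for illustration, not a number from this analysis.

```python
import math

PEAK_RPS = 10_000_000


def stream_bytes_per_sec(rps: int, bytes_per_request: int) -> float:
    return rps * bytes_per_request


inbound = (
    stream_bytes_per_sec(PEAK_RPS, 2_000)   # request headers
    + stream_bytes_per_sec(PEAK_RPS, 100)   # rate limit queries
    + 1_000 * 10_000 / 3_600                # config updates: 1,000/hour x 10 KB
)
outbound = (
    stream_bytes_per_sec(PEAK_RPS, 200)     # rate limit responses
    + 100e6                                 # metrics export
    + 500e6                                 # log streaming
    + 5e9                                   # replication traffic
)


def to_gbps(bytes_per_sec: float) -> float:
    return bytes_per_sec * 8 / 1e9


def nics_needed(bytes_per_sec: float, nic_gbps: float = 10, max_utilization: float = 0.5) -> int:
    """Interfaces required to carry the load at the given target utilization."""
    return math.ceil(to_gbps(bytes_per_sec) / (nic_gbps * max_utilization))


print(f"inbound:  {inbound / 1e9:.1f} GB/s = {to_gbps(inbound):.0f} Gbps, {nics_needed(inbound)} NICs")
print(f"outbound: {outbound / 1e9:.1f} GB/s = {to_gbps(outbound):.0f} Gbps, {nics_needed(outbound)} NICs")
```

Note that GB/s and Gbps differ by a factor of eight, which is easy to miss when comparing traffic totals against interface speeds.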

Cross-Region Replication

Configuration Sync: 10 MB/s continuous
Counter Synchronization: 1 GB/s (if using global counters)
Metrics Aggregation: 100 MB/s continuous
Total Replication: ~1.1 GB/s per region pair

Compute Requirements

CPU Resources

Rate Limit Decisions: 10M req/s × 0.1ms = 1,000 CPU cores
Counter Updates: 10M req/s × 0.05ms = 500 CPU cores
Cache Operations: 10M req/s × 0.02ms = 200 CPU cores
Metrics Processing: 100 CPU cores continuous
Total CPU: ~1,800 cores at the 10M req/s global peak (~2,000 with headroom), split across regions in proportion to traffic
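
The core counts above follow from multiplying the request rate by the CPU time each request consumes: one core supplies one CPU-second per second. A minimal sketch, with per-request costs expressed in microseconds to keep the arithmetic exact:

```python
PEAK_RPS = 10_000_000

# CPU time per request, in microseconds (0.1 ms, 0.05 ms, 0.02 ms from the list above).
CPU_US_PER_REQUEST = {
    "rate_limit_decisions": 100,
    "counter_updates": 50,
    "cache_operations": 20,
}
FIXED_CORES = {"metrics_processing": 100}


def cores_needed(rps: int, cpu_us_per_request: int) -> float:
    """CPU-seconds consumed per wall-clock second, i.e. fully busy cores."""
    return rps * cpu_us_per_request / 1_000_000


per_task = {name: cores_needed(PEAK_RPS, us) for name, us in CPU_US_PER_REQUEST.items()}
total_cores = sum(per_task.values()) + sum(FIXED_CORES.values())

for name, cores in per_task.items():
    print(f"{name:>22}: {cores:6.0f} cores")
print(f"{'total':>22}: {total_cores:6.0f} cores")
```

These are fully utilized cores. Real deployments usually run well below 100% utilization, so the provisioned count would be higher.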

Memory Resources

Application Memory: 50 GB per server × 100 servers = 5 TB per region
Cache Memory: 130 GB per region (650 GB across 5 regions)
OS and Overhead: 20% additional ≈ 1 TB per region
Total Memory: ~6.2 TB per region (~31 TB across 5 regions)

Server Infrastructure

Rate Limiter Servers: 100 servers per region × 5 regions = 500 servers
Cache Servers: 20 Redis servers per region × 5 regions = 100 servers
Database Servers: 10 servers per region × 5 regions = 50 servers
Load Balancers: 5 per region × 5 regions = 25 load balancers
Total Servers: 675 machines globally (including load balancers)

Latency Constraints

Response Time Requirements

Rate Limit Decision: <5ms P99 latency
Cache Lookup: <1ms P99 latency
Counter Increment: <2ms P99 latency
Configuration Fetch: <10ms P99 latency
Cross-Region Sync: <100ms P99 latency
End-to-End Overhead: <10ms P99 added to request
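
Each target above is a P99 bound, so verifying it means computing the 99th percentile of measured latencies rather than the mean. The sketch below uses the nearest-rank method on a list of samples; the function names and sample data are illustrative.

```python
def percentile(samples_ms, p: int) -> float:
    """Nearest-rank percentile; `p` is an integer in 1..100."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p/100 * n)
    return ordered[rank - 1]


def meets_target(samples_ms, target_ms: float, p: int = 99) -> bool:
    return percentile(samples_ms, p) < target_ms


# 1% of requests are slow: P99 still lands on a fast sample.
mostly_fast = [1.0] * 990 + [20.0] * 10
# Slightly more than 1% slow: P99 now lands on a slow sample.
too_many_slow = [1.0] * 989 + [20.0] * 11

print(meets_target(mostly_fast, target_ms=5.0))
print(meets_target(too_many_slow, target_ms=5.0))
```

Per-component P99 values do not simply add up to the end-to-end P99, so the end-to-end overhead target is best measured directly.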

Geographic Latency

Same Region: <5ms between client and rate limiter
Cross-Region: <100ms for configuration sync
Global Consistency: <1 second for counter convergence
CDN Edge: <2ms for edge-based rate limiting
Database Queries: <20ms for persistent storage access

Consistency and Accuracy

Distributed Counter Accuracy

Single Server: 100% accuracy
Same Region: 99.9% accuracy (eventual consistency)
Cross-Region: 99% accuracy (higher latency sync)
Convergence Time: <1 second for 99% accuracy
Race Condition Impact: <0.1% over-limit requests
Clock Skew Tolerance: ±5 seconds across servers

Data Consistency Models

Strong Consistency: Configuration changes (CP system)
Eventual Consistency: Rate limit counters (AP system)
Causal Consistency: User quota updates
Session Consistency: Same user, same server affinity
Monotonic Reads: Within a window, a client never observes a counter value older than one it has already read
Read-Your-Writes: User sees their own quota updates immediately

Failure Scenarios and Tolerances

Server Failures

Single Server Failure: <0.1% traffic impact (load balancer failover)
Region Failure: 20% traffic impact (redirect to other regions)
Cache Failure: Degrade to database (10x latency increase)
Database Failure: Use cached data (stale up to 5 minutes)
Network Partition: Fail open or closed based on configuration
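
The fail-open versus fail-closed choice mentioned above can be sketched as a single configuration switch around the counter lookup. Everything here is hypothetical: `BackendUnavailable`, `check_limit`, and the `fetch_count` callable stand in for whatever store client a real deployment uses.

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    OPEN = "open"      # allow traffic when the limiter cannot decide
    CLOSED = "closed"  # reject traffic when the limiter cannot decide


class BackendUnavailable(Exception):
    """Raised by the (hypothetical) counter store when it cannot be reached."""


def check_limit(
    key: str,
    fetch_count: Callable[[str], int],
    limit: int,
    failure_mode: FailureMode,
) -> bool:
    """Return True if the request should be allowed."""
    try:
        count = fetch_count(key)
    except BackendUnavailable:
        # No counter data available: fall back to the configured policy.
        return failure_mode is FailureMode.OPEN
    return count < limit


def unreachable_store(key: str) -> int:
    raise BackendUnavailable(key)


print(check_limit("user:42", unreachable_store, 100, FailureMode.OPEN))
print(check_limit("user:42", unreachable_store, 100, FailureMode.CLOSED))
```

Failing open favors availability at the risk of overload, while failing closed protects the backend at the cost of rejecting legitimate traffic. The right default may differ per endpoint.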

Recovery Time Objectives

Server Recovery: <30 seconds (automatic restart)
Cache Rebuild: <5 minutes (warm cache from database)
Region Failover: <2 minutes (DNS + load balancer)
Database Recovery: <15 minutes (replica promotion)
Full System Recovery: <1 hour (disaster recovery)

Cost Analysis

Infrastructure Costs (Monthly)

Compute: 675 servers × $200/month = $135,000
Memory/Cache: 650 GB Redis × $0.50/GB/month = $325
Storage: 12 TB × $0.10/GB/month = $1,200 (roughly the first month of persistent data; grows as metrics and audit logs accumulate)
Network: 100 TB egress × $0.05/GB = $5,000
Load Balancers: 25 × $20/month = $500
Total Infrastructure: ~$142,000/month

Operational Costs (Monthly)

Monitoring: $5,000/month (metrics, logs, traces)
Support: $10,000/month (on-call, incident response)
Development: $50,000/month (feature development, maintenance)
Total Operational: ~$65,000/month

Cost Per Request

Total Monthly Cost: $207,000
Monthly Requests: 250B requests/day × 30 days = 7.5 trillion requests
Cost Per Million Requests: $0.028
Cost Per Request: $0.000000028 (~$0.03 per million)
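
The per-request cost above can be reproduced from the line items in this section. The sketch below uses the same unit prices and decimal units (1 TB = 1,000 GB); the dictionary keys are illustrative labels.

```python
INFRA_MONTHLY_USD = {
    "compute": 675 * 200,              # servers x $/server
    "cache": 650 * 0.50,               # GB x $/GB
    "storage": 12_000 * 0.10,          # 12 TB in GB x $/GB
    "network_egress": 100_000 * 0.05,  # 100 TB in GB x $/GB
    "load_balancers": 25 * 20,
}
OPS_MONTHLY_USD = {"monitoring": 5_000, "support": 10_000, "development": 50_000}

MONTHLY_REQUESTS = 250_000_000_000 * 30  # 7.5 trillion


def cost_per_million(total_usd: float, requests: int) -> float:
    return total_usd / requests * 1_000_000


total_usd = sum(INFRA_MONTHLY_USD.values()) + sum(OPS_MONTHLY_USD.values())
print(f"total monthly cost: ${total_usd:,.0f}")
print(f"cost per million requests: ${cost_per_million(total_usd, MONTHLY_REQUESTS):.3f}")
```

Compute and development together make up most of the total, so the per-request cost is far more sensitive to server count than to cache or storage pricing.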

Scaling Strategies

Horizontal Scaling

Add Servers: Linear scaling up to 1,000 servers per region
Shard by User ID: Distribute users across servers
Shard by API Key: Distribute API keys across servers
Geographic Sharding: Deploy in additional regions
Auto-Scaling: Scale based on request volume (2-10x capacity)
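
Sharding by user ID or API key needs a stable mapping from key to shard. One simple option, shown here as a sketch rather than a recommended design, is to hash the key and take the result modulo the shard count. `shard_for` is an illustrative name; `hashlib.sha256` is used because Python's built-in `hash()` is not stable across processes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a user ID or API key to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Rough check that keys spread across shards.
distribution = Counter(shard_for(f"user:{i}", 20) for i in range(10_000))
print(min(distribution.values()), max(distribution.values()))
```

Modulo sharding remaps most keys whenever the shard count changes. If servers are added often, as auto-scaling implies, consistent hashing or a fixed number of virtual slots would limit that reshuffling.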

Vertical Scaling

Increase Server Size: Up to 64 cores, 256 GB RAM per server
Faster Storage: NVMe SSDs for cache and database
Network Upgrades: 10 Gbps to 100 Gbps network interfaces
Memory Optimization: Larger Redis instances (up to 1 TB)

Caching Strategies

Multi-Level Cache: L1 (in-memory), L2 (Redis), L3 (database)
Cache Warming: Pre-populate cache with hot data
Cache Partitioning: Separate caches for different data types
TTL Optimization: Shorter TTL for fast-changing data (counters), longer for slow-changing data (rules, API key metadata)
Cache Aside Pattern: Application manages cache population
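
The multi-level and cache-aside ideas above combine naturally: the application checks a small in-process cache first, then the shared cache, and only then loads from the database, populating each level on the way back. The sketch below uses a plain dictionary as a stand-in for Redis and a callable as a stand-in for the database; `RuleCache` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class RuleCache:
    """Cache-aside lookup: L1 (in-process, with TTL), L2 (shared), L3 (database)."""

    def __init__(
        self,
        l2: Dict[str, str],
        load_from_db: Callable[[str], str],
        l1_ttl_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._l1: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self._l2 = l2
        self._load_from_db = load_from_db
        self._l1_ttl_s = l1_ttl_s
        self._clock = clock

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._l1.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # L1 hit
        value = self._l2.get(key)
        if value is None:
            value = self._load_from_db(key)    # L3: miss in both caches
            self._l2[key] = value              # the application populates L2
        self._l1[key] = (now + self._l1_ttl_s, value)
        return value


db_reads = []


def fake_db(key: str) -> str:
    db_reads.append(key)
    return "100 requests/minute"


cache = RuleCache(l2={}, load_from_db=fake_db)
cache.get("rule:checkout")
cache.get("rule:checkout")
print(len(db_reads))  # the database is read only once
```

A short L1 TTL bounds how long a server can keep serving a stale rule after a configuration change, at the cost of more L2 lookups.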

Bottleneck Analysis

Primary Bottlenecks

Redis Throughput: 100K ops/sec per instance limit
Network Bandwidth: 10 Gbps per server limit
Database Write Throughput: 10K writes/sec per shard
Cross-Region Latency: 100ms+ for global consistency
Memory Capacity: 130 GB cache per region

Mitigation Strategies

Redis Clustering: Shard across 20+ Redis instances
Network Optimization: Use 100 Gbps interfaces, compression
Batch Writes: Aggregate database writes every 10 seconds
Local Enforcement: Rate limit locally, sync asynchronously
Cache Optimization: Compress data, use efficient data structures

This scale analysis provides the foundation for designing a rate limiting system that can handle massive traffic volumes while maintaining low latency and high accuracy.