API Rate Limiter - Scaling Considerations
Horizontal Scaling Strategies
Stateless Rate Limiter Servers
Benefits:
- Easy to add/remove servers
- No session affinity required
- Simple load balancing
- Fast deployment and rollback
Implementation:
- All state in Redis/database
- Servers are interchangeable
- Auto-scaling based on CPU/memory
- Rolling deployments without downtime
Scaling Pattern:
Normal Load: 50 servers
Peak Load: 150 servers (3x)
Auto-scale triggers:
- CPU > 70% for 5 minutes → scale up
- CPU < 30% for 15 minutes → scale downRedis Cluster Scaling
Single Redis Instance Limits:
- 100K operations/second
- 256 GB memory
- Single point of failure
Redis Cluster Solution:
- 20 master nodes
- 20 replica nodes (1 per master)
- 2M operations/second total
- 5 TB total memory
- Automatic failover
Sharding Strategy:
- Hash slot based (16384 slots)
- Consistent hashing
- Automatic rebalancing
- Minimal data movement
Scaling Process:
1. Add new master+replica pair
2. Rebalance hash slots
3. Migrate data gradually
4. Update client routingDatabase Sharding
Vertical Sharding (by table):
- Rules table: Shard 0
- User quotas: Shards 1-10
- Audit logs: Shards 11-20
- Metrics: Separate time-series DB
Horizontal Sharding (by user_id):
Shard 0: user_id % 10 = 0
Shard 1: user_id % 10 = 1
...
Shard 9: user_id % 10 = 9
Benefits:
- Distribute load evenly
- Parallel query execution
- Independent scaling
- Fault isolationPerformance Optimization
Multi-Level Caching
L1 Cache (In-Memory):
- Location: Application server
- Size: 100 MB per server
- TTL: 60 seconds
- Hit Rate: 80%
- Latency: <1ms
L2 Cache (Redis):
- Location: Redis cluster
- Size: 130 GB per region
- TTL: 300 seconds
- Hit Rate: 95%
- Latency: <2ms
L3 Cache (Database Query Cache):
- Location: PostgreSQL
- Size: 50 GB
- TTL: 600 seconds
- Hit Rate: 70%
- Latency: <10ms
Cache Hierarchy:
Request → L1 → L2 → L3 → Database
↓ ↓ ↓
80% 15% 4% 1%Batch Processing
Counter Updates:
- Buffer updates locally (100ms)
- Batch write to Redis
- Reduce network calls by 10x
- Trade-off: Slight accuracy loss
Configuration Changes:
- Batch propagate every 5 seconds
- Reduce config sync overhead
- Eventual consistency acceptable
Metrics Collection:
- Aggregate locally (1 minute)
- Batch write to time-series DB
- Reduce write load by 60xConnection Pooling
Redis Connection Pool:
- Min connections: 10 per server
- Max connections: 100 per server
- Idle timeout: 300 seconds
- Connection reuse: 1000 requests
Database Connection Pool:
- Min connections: 5 per server
- Max connections: 50 per server
- Connection lifetime: 3600 seconds
- Statement caching enabled
Benefits:
- Reduce connection overhead
- Better resource utilization
- Faster request processing
- Lower latency varianceGeographic Distribution
Multi-Region Deployment
Regions:
- US-East (Primary)
- US-West (Secondary)
- EU-West (Primary)
- Asia-Pacific (Primary)
- South America (Secondary)
Traffic Routing:
- GeoDNS for region selection
- <50ms latency within region
- <200ms cross-region
- Automatic failover
Data Replication:
- Configuration: Synchronous (strong consistency)
- Counters: Asynchronous (eventual consistency)
- Metrics: Asynchronous (best effort)Cross-Region Synchronization
Configuration Sync:
- Method: Active-active replication
- Latency: <1 second
- Consistency: Strong (Raft consensus)
- Conflict Resolution: Last-write-wins with version vectors
Counter Sync:
- Method: Periodic aggregation
- Frequency: Every 10 seconds
- Consistency: Eventual
- Accuracy: 99% within 1 second
Implementation:
1. Local enforcement (fast)
2. Background sync to global store
3. Periodic reconciliation
4. Conflict detection and resolutionLoad Balancing Strategies
Request Distribution
Layer 4 Load Balancing (TCP):
- Algorithm: Least connections
- Health checks: TCP port 6379 (Redis)
- Failover: <1 second
- Session persistence: Source IP hash
Layer 7 Load Balancing (HTTP):
- Algorithm: Round robin with weights
- Health checks: HTTP /health endpoint
- Sticky sessions: Not required (stateless)
- SSL termination: At load balancer
Geographic Load Balancing:
- DNS-based routing
- Latency-based selection
- Automatic failover to nearest region
- Traffic splitting for A/B testingAuto-Scaling Configuration
# Kubernetes HPA (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: rate-limiter-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: rate-limiter
minReplicas: 50
maxReplicas: 200
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: requests_per_second
target:
type: AverageValue
averageValue: "20000"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60Bottleneck Identification and Mitigation
Redis Bottlenecks
Problem: Redis CPU saturation at 100K ops/sec
Solutions:
1. Redis Cluster (20 nodes)
- Throughput: 2M ops/sec
- Cost: 20x infrastructure
2. Read Replicas (5 replicas per master)
- Read throughput: 500K ops/sec
- Write throughput: 100K ops/sec
3. Client-side caching
- Reduce Redis load by 80%
- Trade-off: Stale data up to 60s
4. Pipelining
- Batch 100 commands
- Reduce network RTT by 100xNetwork Bottlenecks
Problem: 10 Gbps network saturation
Solutions:
1. Upgrade to 100 Gbps interfaces
- Cost: $5000/month per server
- Benefit: 10x capacity
2. Compression
- Reduce payload size by 70%
- CPU overhead: +5%
3. Protocol optimization
- Binary protocol vs JSON
- Reduce overhead by 50%
4. Regional caching
- Serve 95% locally
- Cross-region: 5%Database Write Bottlenecks
Problem: 10K writes/sec limit per shard
Solutions:
1. Batch writes
- Aggregate 100 writes
- Reduce load by 100x
2. Async writes
- Write to queue first
- Process in background
3. More shards
- 10 shards → 100K writes/sec
- Linear scaling
4. Time-series database
- Optimized for metrics
- 1M writes/sec per nodeCapacity Planning
Growth Projections
Current (Year 1):
- 10M requests/second
- 100M users
- 500 servers
- $142K/month
Year 2 (2x growth):
- 20M requests/second
- 200M users
- 1000 servers
- $284K/month
Year 3 (3x growth):
- 30M requests/second
- 300M users
- 1500 servers
- $426K/month
Scaling Strategy:
- Horizontal scaling (linear cost)
- Optimize algorithms (reduce cost/request)
- Regional expansion (reduce latency)Resource Allocation
Per 1M requests/second:
- Servers: 50 (20K req/s each)
- Redis: 2 clusters (10 nodes each)
- Database: 5 shards
- Load Balancers: 5
- Cost: $14,200/month
Breakdown:
- Compute: 60% ($8,520)
- Storage: 15% ($2,130)
- Network: 20% ($2,840)
- Other: 5% ($710)Monitoring and Alerting for Scale
Key Metrics to Monitor
Performance Metrics:
- Requests per second (by region, endpoint)
- Latency (P50, P95, P99)
- Error rate (4xx, 5xx)
- Cache hit rate
Resource Metrics:
- CPU utilization (per server)
- Memory usage (per server)
- Network bandwidth (in/out)
- Disk I/O (IOPS, throughput)
Business Metrics:
- Throttle rate (by user, tier)
- Quota utilization
- API key usage
- Revenue impact
Alerts:
- P99 latency > 10ms (warning)
- Error rate > 1% (critical)
- CPU > 80% (warning)
- Redis memory > 90% (critical)
- Throttle rate > 10% (warning)Scaling Triggers
Scale Up Triggers:
- CPU > 70% for 5 minutes
- Memory > 80% for 5 minutes
- Request queue > 1000
- P99 latency > 20ms
- Redis ops > 80K/sec
Scale Down Triggers:
- CPU < 30% for 15 minutes
- Memory < 50% for 15 minutes
- Request queue < 100
- P99 latency < 5ms
Cooldown Periods:
- Scale up: 60 seconds
- Scale down: 300 secondsCost Optimization
Right-Sizing Resources
Before Optimization:
- 500 servers × $200/month = $100K
- Over-provisioned by 40%
After Optimization:
- 300 servers × $200/month = $60K
- Savings: $40K/month (40%)
Strategies:
- Use spot instances (70% savings)
- Reserved instances (40% savings)
- Auto-scaling (30% savings)
- Resource right-sizing (20% savings)Caching ROI
Without Caching:
- Database queries: 10M/sec
- Database cost: $50K/month
With Redis Caching (95% hit rate):
- Database queries: 500K/sec
- Redis cost: $5K/month
- Database cost: $5K/month
- Total: $10K/month
- Savings: $40K/month (80%)This comprehensive scaling strategy ensures the rate limiting system can handle massive growth while maintaining performance and controlling costs.