Distributed Unique ID Generator - Scaling Considerations
Horizontal Scaling Strategy
Node Scaling Patterns
Current Capacity (10 nodes):
- 10 nodes × 10,000 IDs/sec = 100,000 IDs/sec
- Headroom: 10x current load (10,000 IDs/sec)
Scaling Triggers:
- CPU usage > 70% sustained for 5 minutes
- Average latency > 5ms for 5 minutes
- Sequence overflow rate > 1% of requests
- Request queue depth > 1000
Scaling Actions:
- Add 2-4 nodes per scaling event
- Distribute across availability zones
- Update load balancer configuration
- Monitor for 10 minutes before next scaleWorker ID Allocation at Scale
Static Allocation (Small Scale):
- Manual worker ID assignment
- Configuration file per node
- Simple, predictable
- Suitable for <50 nodes
Dynamic Allocation (Large Scale):
- Automatic worker ID assignment via Zookeeper/etcd
- Ephemeral nodes for automatic cleanup
- Supports 1,024 workers globally
- Suitable for 50+ nodes
Hybrid Approach:
- Static allocation for core workers (0-15)
- Dynamic allocation for auto-scaled workers (16-31)
- Best of both worldsGeographic Scaling
Multi-Region Deployment
Region Distribution:
┌──────────────────────────────────────────────────────────┐
│ US-East (DC 0): 40% traffic → 12 workers │
│ US-West (DC 1): 25% traffic → 8 workers │
│ EU-West (DC 2): 20% traffic → 6 workers │
│ APAC (DC 3): 15% traffic → 6 workers │
└──────────────────────────────────────────────────────────┘
Benefits:
- Reduced latency (local generation)
- Improved availability (regional isolation)
- Regulatory compliance (data residency)
- Load distribution across regionsCross-Region Failover
Failover Strategy:
1. Health check detects region failure
2. GeoDNS routes traffic to next nearest region
3. Backup region handles 2x normal load
4. Auto-scale triggers in backup region
5. Failed region recovers and rejoins
Recovery Time:
- Detection: 30 seconds (health check interval)
- DNS propagation: 60 seconds (TTL)
- Total failover: ~90 secondsVertical Scaling Considerations
Resource Optimization
CPU Scaling:
- 2 cores: 20,000 IDs/sec
- 4 cores: 40,000 IDs/sec
- 8 cores: 80,000 IDs/sec
- Diminishing returns beyond 8 cores
Memory Scaling:
- Base: 50MB (application)
- Per 1000 req/sec: +10MB
- 10,000 req/sec: ~150MB
- Memory not a bottleneck
Network Scaling:
- 1 Gbps NIC: 15M IDs/sec theoretical
- Network rarely bottleneck
- Focus on CPU optimizationLoad Balancing Strategies
Load Balancer Configuration
Algorithm Options:
1. Round Robin
- Simple, fair distribution
- No session affinity needed
- Best for stateless ID generation
2. Least Connections
- Route to least busy node
- Better for variable request sizes
- Slight overhead for tracking
3. Geographic Routing
- Route to nearest datacenter
- Lowest latency for clients
- Requires GeoDNS
Recommended: Round Robin + Geographic RoutingHealth Check Configuration
health_check:
endpoint: /api/v1/health
interval: 10s
timeout: 2s
unhealthy_threshold: 3
healthy_threshold: 2
success_criteria:
- status_code: 200
- response_time_ms: < 100
- clock_drift_ms: < 100
- ntp_synchronized: trueSequence Overflow Handling
Overflow Prevention
Sequence Limit: 4,096 IDs per millisecond per worker
Prevention Strategies:
1. Add More Workers
- Distribute load across workers
- Each worker: independent sequence space
- Linear scaling
2. Wait for Next Millisecond
- Block until next millisecond
- Max wait: 1ms
- Acceptable for most use cases
3. Use Microsecond Precision
- 1000x more sequence space
- Requires format change
- For extreme scale only
Current Headroom:
- 10 IDs/ms average
- 4,096 IDs/ms capacity
- 400x headroomOverflow Monitoring
Metrics to Track:
- sequence_overflow_count
- sequence_utilization_percent
- wait_time_ms (when overflow occurs)
- requests_queued
Alerts:
- Warning: >50% sequence utilization
- Critical: >80% sequence utilization
- Action: Add workers or investigate load spikeDatabase Scaling
Metrics Storage Scaling
ClickHouse Scaling:
- Horizontal: Add nodes to cluster
- Vertical: Increase node resources
- Partitioning: By month for time-series data
- Replication: 2-3 replicas per shard
Capacity Planning:
- 1 event = ~500 bytes
- 10K IDs/sec = 5MB/sec = 432GB/day
- With compression (10x): 43GB/day
- 1 year retention: ~16TBConfiguration Store Scaling
Zookeeper/etcd Scaling:
- 3-5 nodes for quorum
- Read replicas for high read load
- Rarely a bottleneck
- Focus on availability over performance
Typical Load:
- Worker registration: Once per node startup
- Config reads: Cached, infrequent
- Heartbeats: Every 30 seconds per node
- Easily handles 1000+ workersPerformance Optimization
Code-Level Optimizations
// Lock-free atomic operations
type IDGenerator struct {
lastTimestamp int64
sequence int64
}
func (g *IDGenerator) NextID() int64 {
for {
now := currentTimestamp()
last := atomic.LoadInt64(&g.lastTimestamp)
if now > last {
// New millisecond - reset sequence
if atomic.CompareAndSwapInt64(&g.lastTimestamp, last, now) {
atomic.StoreInt64(&g.sequence, 0)
return g.buildID(now, 0)
}
} else if now == last {
// Same millisecond - increment sequence
seq := atomic.AddInt64(&g.sequence, 1)
if seq < 4096 {
return g.buildID(now, seq)
}
// Sequence overflow - wait for next millisecond
time.Sleep(time.Millisecond)
} else {
// Clock regression - handle error
return g.handleClockRegression(now, last)
}
}
}Network Optimizations
HTTP/2 Benefits:
- Multiplexing: Multiple requests per connection
- Header compression: Reduced bandwidth
- Server push: Proactive response delivery
gRPC Benefits:
- Binary protocol: Faster serialization
- Streaming: Efficient batch generation
- Connection pooling: Reduced overhead
Connection Pooling:
- Client-side: Reuse connections
- Pool size: 10-50 connections per client
- Idle timeout: 5 minutes
- Max lifetime: 30 minutesCapacity Planning
Growth Projections
Year 1 (Current):
- Load: 10,000 IDs/sec average
- Nodes: 10 workers
- Cost: $1,000/month
Year 2 (2x growth):
- Load: 20,000 IDs/sec average
- Nodes: 20 workers
- Cost: $2,000/month
Year 3 (2x growth):
- Load: 40,000 IDs/sec average
- Nodes: 40 workers
- Cost: $4,000/month
Year 5 (2.5x growth):
- Load: 100,000 IDs/sec average
- Nodes: 100 workers
- Cost: $10,000/monthResource Forecasting
Per-Node Resources:
- CPU: 2-4 cores
- Memory: 2GB
- Network: 1 Gbps
- Storage: 20GB (OS + app)
Cluster Resources (100 nodes):
- CPU: 200-400 cores
- Memory: 200GB
- Network: 100 Gbps aggregate
- Storage: 2TB
Cloud Costs (AWS example):
- Instance: t3.medium ($30/month)
- Network: $10/month
- Monitoring: $5/month
- Total: ~$45/node/monthAuto-Scaling Configuration
Kubernetes HPA Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: id-generator-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: id-generator
minReplicas: 10
maxReplicas: 100
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: id_generation_latency_p99
target:
type: AverageValue
averageValue: "5m"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60AWS Auto Scaling Example
{
"AutoScalingGroupName": "id-generator-asg",
"MinSize": 10,
"MaxSize": 100,
"DesiredCapacity": 10,
"HealthCheckType": "ELB",
"HealthCheckGracePeriod": 300,
"TargetGroupARNs": ["arn:aws:elasticloadbalancing:..."],
"Tags": [
{
"Key": "Name",
"Value": "id-generator-node"
}
]
}Bottleneck Mitigation
Common Bottlenecks and Solutions
1. Sequence Exhaustion
Problem: >4,096 IDs/ms per worker
Solution: Add more workers, distribute load
2. Clock Synchronization
Problem: Clock drift >100ms
Solution: Multiple NTP sources, monitoring
3. Network Latency
Problem: High client-server latency
Solution: Geographic distribution, edge deployment
4. Load Balancer Limits
Problem: LB becomes bottleneck
Solution: Multiple LBs, DNS round-robin
5. Worker ID Exhaustion
Problem: All 1,024 worker IDs used
Solution: Increase worker ID bits, use 128-bit IDsThis comprehensive scaling guide ensures the ID generation system can grow from thousands to millions of IDs per second while maintaining low latency and high availability.