Scaling Considerations

📖 7 min read 📄 Part 6 of 10

Distributed Unique ID Generator - Scaling Considerations

Horizontal Scaling Strategy

Node Scaling Patterns

Current Capacity (10 nodes):
- 10 nodes × 10,000 IDs/sec = 100,000 IDs/sec
- Headroom: 10x current load (10,000 IDs/sec)

Scaling Triggers:
- CPU usage > 70% sustained for 5 minutes
- Average latency > 5ms for 5 minutes
- Sequence overflow rate > 1% of requests
- Request queue depth > 1000

Scaling Actions:
- Add 2-4 nodes per scaling event
- Distribute across availability zones
- Update load balancer configuration
- Monitor for 10 minutes before next scale

Worker ID Allocation at Scale

Static Allocation (Small Scale):
- Manual worker ID assignment
- Configuration file per node
- Simple, predictable
- Suitable for <50 nodes

Dynamic Allocation (Large Scale):
- Automatic worker ID assignment via Zookeeper/etcd
- Ephemeral nodes for automatic cleanup
- Supports 1,024 workers globally
- Suitable for 50+ nodes

Hybrid Approach:
- Static allocation for core workers (0-15)
- Dynamic allocation for auto-scaled workers (16-31)
- Best of both worlds

Geographic Scaling

Multi-Region Deployment

Region Distribution:
┌──────────────────────────────────────────────────────────┐
│ US-East (DC 0): 40% traffic → 12 workers                 │
│ US-West (DC 1): 25% traffic → 8 workers                  │
│ EU-West (DC 2): 20% traffic → 6 workers                  │
│ APAC (DC 3): 15% traffic → 6 workers                     │
└──────────────────────────────────────────────────────────┘

Benefits:
- Reduced latency (local generation)
- Improved availability (regional isolation)
- Regulatory compliance (data residency)
- Load distribution across regions

Cross-Region Failover

Failover Strategy:
1. Health check detects region failure
2. GeoDNS routes traffic to next nearest region
3. Backup region handles 2x normal load
4. Auto-scale triggers in backup region
5. Failed region recovers and rejoins

Recovery Time:
- Detection: 30 seconds (health check interval)
- DNS propagation: 60 seconds (TTL)
- Total failover: ~90 seconds

Vertical Scaling Considerations

Resource Optimization

CPU Scaling:
- 2 cores: 20,000 IDs/sec
- 4 cores: 40,000 IDs/sec
- 8 cores: 80,000 IDs/sec
- Diminishing returns beyond 8 cores

Memory Scaling:
- Base: 50MB (application)
- Per 1000 req/sec: +10MB
- 10,000 req/sec: ~150MB
- Memory not a bottleneck

Network Scaling:
- 1 Gbps NIC: 15M IDs/sec theoretical
- Network rarely bottleneck
- Focus on CPU optimization

Load Balancing Strategies

Load Balancer Configuration

Algorithm Options:
1. Round Robin
   - Simple, fair distribution
   - No session affinity needed
   - Best for stateless ID generation

2. Least Connections
   - Route to least busy node
   - Better for variable request sizes
   - Slight overhead for tracking

3. Geographic Routing
   - Route to nearest datacenter
   - Lowest latency for clients
   - Requires GeoDNS

Recommended: Round Robin + Geographic Routing

Health Check Configuration

health_check:
  endpoint: /api/v1/health
  interval: 10s
  timeout: 2s
  unhealthy_threshold: 3
  healthy_threshold: 2
  
  success_criteria:
    - status_code: 200
    - response_time_ms: < 100
    - clock_drift_ms: < 100
    - ntp_synchronized: true

Sequence Overflow Handling

Overflow Prevention

Sequence Limit: 4,096 IDs per millisecond per worker

Prevention Strategies:
1. Add More Workers
   - Distribute load across workers
   - Each worker: independent sequence space
   - Linear scaling

2. Wait for Next Millisecond
   - Block until next millisecond
   - Max wait: 1ms
   - Acceptable for most use cases

3. Use Microsecond Precision
   - 1000x more sequence space
   - Requires format change
   - For extreme scale only

Current Headroom:
- 10 IDs/ms average
- 4,096 IDs/ms capacity
- 400x headroom

Overflow Monitoring

Metrics to Track:
- sequence_overflow_count
- sequence_utilization_percent
- wait_time_ms (when overflow occurs)
- requests_queued

Alerts:
- Warning: >50% sequence utilization
- Critical: >80% sequence utilization
- Action: Add workers or investigate load spike

Database Scaling

Metrics Storage Scaling

ClickHouse Scaling:
- Horizontal: Add nodes to cluster
- Vertical: Increase node resources
- Partitioning: By month for time-series data
- Replication: 2-3 replicas per shard

Capacity Planning:
- 1 event = ~500 bytes
- 10K IDs/sec = 5MB/sec = 432GB/day
- With compression (10x): 43GB/day
- 1 year retention: ~16TB

Configuration Store Scaling

Zookeeper/etcd Scaling:
- 3-5 nodes for quorum
- Read replicas for high read load
- Rarely a bottleneck
- Focus on availability over performance

Typical Load:
- Worker registration: Once per node startup
- Config reads: Cached, infrequent
- Heartbeats: Every 30 seconds per node
- Easily handles 1000+ workers

Performance Optimization

Code-Level Optimizations

// Lock-free atomic operations
type IDGenerator struct {
    lastTimestamp int64
    sequence      int64
}

func (g *IDGenerator) NextID() int64 {
    for {
        now := currentTimestamp()
        last := atomic.LoadInt64(&g.lastTimestamp)
        
        if now > last {
            // New millisecond - reset sequence
            if atomic.CompareAndSwapInt64(&g.lastTimestamp, last, now) {
                atomic.StoreInt64(&g.sequence, 0)
                return g.buildID(now, 0)
            }
        } else if now == last {
            // Same millisecond - increment sequence
            seq := atomic.AddInt64(&g.sequence, 1)
            if seq < 4096 {
                return g.buildID(now, seq)
            }
            // Sequence overflow - wait for next millisecond
            time.Sleep(time.Millisecond)
        } else {
            // Clock regression - handle error
            return g.handleClockRegression(now, last)
        }
    }
}

Network Optimizations

HTTP/2 Benefits:
- Multiplexing: Multiple requests per connection
- Header compression: Reduced bandwidth
- Server push: Proactive response delivery

gRPC Benefits:
- Binary protocol: Faster serialization
- Streaming: Efficient batch generation
- Connection pooling: Reduced overhead

Connection Pooling:
- Client-side: Reuse connections
- Pool size: 10-50 connections per client
- Idle timeout: 5 minutes
- Max lifetime: 30 minutes

Capacity Planning

Growth Projections

Year 1 (Current):
- Load: 10,000 IDs/sec average
- Nodes: 10 workers
- Cost: $1,000/month

Year 2 (2x growth):
- Load: 20,000 IDs/sec average
- Nodes: 20 workers
- Cost: $2,000/month

Year 3 (2x growth):
- Load: 40,000 IDs/sec average
- Nodes: 40 workers
- Cost: $4,000/month

Year 5 (2.5x growth):
- Load: 100,000 IDs/sec average
- Nodes: 100 workers
- Cost: $10,000/month

Resource Forecasting

Per-Node Resources:
- CPU: 2-4 cores
- Memory: 2GB
- Network: 1 Gbps
- Storage: 20GB (OS + app)

Cluster Resources (100 nodes):
- CPU: 200-400 cores
- Memory: 200GB
- Network: 100 Gbps aggregate
- Storage: 2TB

Cloud Costs (AWS example):
- Instance: t3.medium ($30/month)
- Network: $10/month
- Monitoring: $5/month
- Total: ~$45/node/month

Auto-Scaling Configuration

Kubernetes HPA Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: id-generator-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: id-generator
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: id_generation_latency_p99
      target:
        type: AverageValue
        averageValue: "5m"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

AWS Auto Scaling Example

{
  "AutoScalingGroupName": "id-generator-asg",
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 10,
  "HealthCheckType": "ELB",
  "HealthCheckGracePeriod": 300,
  "TargetGroupARNs": ["arn:aws:elasticloadbalancing:..."],
  "Tags": [
    {
      "Key": "Name",
      "Value": "id-generator-node"
    }
  ]
}

Bottleneck Mitigation

Common Bottlenecks and Solutions

1. Sequence Exhaustion
   Problem: >4,096 IDs/ms per worker
   Solution: Add more workers, distribute load
   
2. Clock Synchronization
   Problem: Clock drift >100ms
   Solution: Multiple NTP sources, monitoring
   
3. Network Latency
   Problem: High client-server latency
   Solution: Geographic distribution, edge deployment
   
4. Load Balancer Limits
   Problem: LB becomes bottleneck
   Solution: Multiple LBs, DNS round-robin
   
5. Worker ID Exhaustion
   Problem: All 1,024 worker IDs used
   Solution: Increase worker ID bits, use 128-bit IDs

This comprehensive scaling guide ensures the ID generation system can grow from thousands to millions of IDs per second while maintaining low latency and high availability.