Distributed Unique ID Generator - System Architecture

High-Level Architecture Overview

System Architecture Principles

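Coordination-Free Generation: No inter-node communication on the ID generation path (coordination is needed only to assign worker IDs)
Stateless Nodes: Each node operates independently, holding only its worker ID plus an in-memory last timestamp and sequence counter
Time-Based Ordering: Leverage timestamps for sortability (strictly increasing within a node, roughly time-ordered across nodes)
Horizontal Scalability: Add nodes independently, up to 1,024 generators (32 datacenters × 32 workers)
High Availability: No single point of failure
Low Latency: Sub-millisecond ID generation, up to 4,096 IDs per millisecond per worker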

Core Architecture Components

Unique ID Generator — Snowflake Architecture with ZooKeeper Coordination

Snowflake ID Format Design

64-Bit ID Structure

64-Bit Snowflake ID — Bit allocation across sign, timestamp, datacenter, worker, and sequence fields
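The bit allocation follows from the shifts in the generator: 1 sign bit (always 0), 41 timestamp bits (2^41 ms, about 69.7 years after the custom epoch), 5 datacenter bits, 5 worker bits, and 12 sequence bits. Below is a minimal decoder that assumes that layout and the 2021-01-01 epoch. `parse_id` and the constant names are illustrative and not part of the original design.

```python
from datetime import datetime, timezone

EPOCH_MS = 1609459200000  # This document's custom epoch (2021-01-01 00:00:00 UTC)

# Field widths: 1 sign | 41 timestamp | 5 datacenter | 5 worker | 12 sequence = 64 bits
SEQUENCE_BITS = 12
WORKER_BITS = 5
DATACENTER_BITS = 5


def parse_id(snowflake_id: int, epoch_ms: int = EPOCH_MS) -> dict:
    """Split a 64-bit Snowflake ID into its fields (the inverse of the generator's shifts)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    datacenter_id = (snowflake_id >> (SEQUENCE_BITS + WORKER_BITS)) & ((1 << DATACENTER_BITS) - 1)
    timestamp_ms = (snowflake_id >> (SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS)) + epoch_ms
    return {
        "timestamp": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "datacenter_id": datacenter_id,
        "worker_id": worker_id,
        "sequence": sequence,
    }


print(parse_id(397833928704020522))
```

Under these assumptions, `parse_id(397833928704020522)` yields 2024-01-03 19:30:00 UTC, datacenter 0, worker 5, and sequence 42. This is the kind of breakdown the `/api/v1/parse` endpoint returns.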

ID Generation Algorithm

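import threading
import time


class ClockMovedBackwardsError(Exception):
    """Raised when the system clock reads earlier than the last timestamp used."""


class SnowflakeIDGenerator:
    def __init__(self, datacenter_id, worker_id, epoch=1609459200000):
        if not 0 <= datacenter_id <= 31:
            raise ValueError("datacenter_id must fit in 5 bits (0-31)")
        if not 0 <= worker_id <= 31:
            raise ValueError("worker_id must fit in 5 bits (0-31)")
        self.datacenter_id = datacenter_id  # 0-31
        self.worker_id = worker_id          # 0-31
        self.epoch = epoch                  # Custom epoch (2021-01-01 00:00:00 UTC)
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()       # Serializes generate_id across threads

    def generate_id(self):
        with self._lock:
            timestamp = self.current_timestamp()

            # Handle clock moving backwards
            if timestamp < self.last_timestamp:
                raise ClockMovedBackwardsError(
                    f"Clock moved backwards by {self.last_timestamp - timestamp}ms. "
                    "Refusing to generate ID"
                )

            if timestamp == self.last_timestamp:
                # Same millisecond - increment sequence
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit mask
                # Sequence overflow (4,096 IDs already issued this ms) - wait for next millisecond
                if self.sequence == 0:
                    timestamp = self.wait_next_millis(self.last_timestamp)
            else:
                # New millisecond - reset sequence
                self.sequence = 0

            self.last_timestamp = timestamp

            # Construct ID: 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence
            return ((timestamp - self.epoch) << 22) | \
                   (self.datacenter_id << 17) | \
                   (self.worker_id << 12) | \
                   self.sequence

    def current_timestamp(self):
        return int(time.time() * 1000)  # Milliseconds since the Unix epoch

    def wait_next_millis(self, last_timestamp):
        # Busy-wait until the clock advances past last_timestamp
        timestamp = self.current_timestamp()
        while timestamp <= last_timestamp:
            timestamp = self.current_timestamp()
        return timestamp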

Alternative ID Generation Strategies

Instagram-Style IDs

Instagram-Style 64-Bit ID — Simplified two-segment layout with shard-aware sequencing
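Instagram's engineering team has described IDs built from three fields: a 41-bit millisecond timestamp, a 13-bit logical shard ID, and a 10-bit per-shard sequence taken modulo 1024. The sketch below assumes that split and reuses this document's epoch. The function names are hypothetical.

```python
EPOCH_MS = 1609459200000  # Assumption: reuses this document's 2021-01-01 epoch

SHARD_BITS = 13     # logical shard ID, 0-8191
SEQUENCE_BITS = 10  # per-shard sequence, taken modulo 1024


def instagram_style_id(timestamp_ms: int, shard_id: int, sequence: int,
                       epoch_ms: int = EPOCH_MS) -> int:
    """Compose 41-bit ms timestamp | 13-bit shard ID | 10-bit sequence."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id must fit in 13 bits (0-8191)")
    return (((timestamp_ms - epoch_ms) << (SHARD_BITS + SEQUENCE_BITS))
            | (shard_id << SEQUENCE_BITS)
            | (sequence % (1 << SEQUENCE_BITS)))


def split_instagram_style_id(value: int, epoch_ms: int = EPOCH_MS) -> tuple:
    """Inverse of instagram_style_id: (timestamp_ms, shard_id, sequence)."""
    return ((value >> (SHARD_BITS + SEQUENCE_BITS)) + epoch_ms,
            (value >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1),
            value & ((1 << SEQUENCE_BITS) - 1))


# A database sequence value of 5000 on shard 1341 wraps to 5000 % 1024 = 904
print(split_instagram_style_id(instagram_style_id(1704310200000, 1341, 5000)))
```

The sequence is scoped to a shard rather than a process, so the shard's own database can produce the ID during the insert. In Instagram's description this ran inside PostgreSQL, which removes the need for a separate ID service.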

UUID v1 (Time-Based)

UUID v1 (Time-Based) — 128-bit RFC 4122 structure with embedded timestamp and MAC address
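Python's standard `uuid` module can generate version-1 UUIDs. It exposes the embedded 60-bit timestamp as `UUID.time`, counted in 100-nanosecond intervals since 1582-10-15. The helper below converts that value to Unix time; the helper and constant names are hypothetical.

```python
import time
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
UUID_TO_UNIX_OFFSET = 0x01B21DD213814000


def uuid1_unix_seconds(u: uuid.UUID) -> float:
    """Return the creation time embedded in a version-1 UUID as Unix seconds."""
    if u.version != 1:
        raise ValueError("expected a version-1 UUID")
    return (u.time - UUID_TO_UNIX_OFFSET) / 1e7


u = uuid.uuid1()  # node = hardware (MAC) address when one can be found
print(u, hex(u.node), uuid1_unix_seconds(u) - time.time())
```

Two properties matter when comparing v1 UUIDs with Snowflake IDs. First, `uuid1()` embeds the host's MAC address by default, or a random node ID if none is found. Second, the canonical string puts the low-order time bits first, so v1 UUIDs do not sort by creation time.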

ULID (Universally Unique Lexicographically Sortable ID)

ULID — Universally Unique Lexicographically Sortable Identifier with 48-bit timestamp prefix
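A ULID is 128 bits: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 characters of Crockford Base32. The alphabet is in ascending ASCII order and the timestamp comes first, so string order matches time order. The following is a minimal, non-monotonic sketch; the function names are hypothetical.

```python
import os
import time
from typing import Optional

# Crockford Base32 alphabet (no I, L, O, U); its characters are in ascending ASCII order
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID: 48-bit ms timestamp, then 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    # 26 characters x 5 bits = 130 bits; the 2 leading bits are always zero
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the first 10 characters (the timestamp) back to milliseconds."""
    ts = 0
    for ch in ulid[:10]:
        ts = ts * 32 + CROCKFORD.index(ch)
    return ts


earlier, later = new_ulid(1704310200000), new_ulid(1704310200001)
print(earlier, later, earlier < later)
```

The ULID specification also defines a monotonic mode, which increments the random part for IDs created in the same millisecond. This sketch omits it, so two ULIDs from the same millisecond compare in random order.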

Multi-Datacenter Architecture

Geographic Distribution

Multi-Datacenter Architecture — Independent datacenters with global monitoring overlay

Datacenter Failover Strategy

Normal Operation:
Client → GeoDNS → Nearest DC → ID Generator

Datacenter Failure:
Client → GeoDNS → Next Nearest DC → ID Generator
                  (Automatic failover)

Benefits:
- No coordination between DCs
- Independent operation
- Automatic geographic routing
- Graceful degradation
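Datacenters need no coordination because `datacenter_id` occupies its own 5 bits. Two datacenters can issue IDs in the same millisecond, with the same worker IDs and sequence numbers, and still never collide, provided each (datacenter_id, worker_id) pair belongs to a single generator. The sketch below checks this by enumerating every ID two datacenters could produce in one millisecond. `compose_id` is a hypothetical helper that repeats the generator's bit layout.

```python
EPOCH_MS = 1609459200000  # This document's custom epoch


def compose_id(timestamp_ms, datacenter_id, worker_id, sequence, epoch_ms=EPOCH_MS):
    # Same layout as SnowflakeIDGenerator: 41-bit time | 5-bit DC | 5-bit worker | 12-bit sequence
    return ((timestamp_ms - epoch_ms) << 22) | (datacenter_id << 17) | (worker_id << 12) | sequence


ts = 1704310200000  # one fixed millisecond
all_ids = {
    compose_id(ts, dc, worker, seq)
    for dc in (0, 1)           # two datacenters, e.g. a failed DC and its failover target
    for worker in range(32)    # both use every worker ID 0-31
    for seq in range(4096)     # and every sequence number in that millisecond
}
print(len(all_ids))  # 262144 == 2 * 32 * 4096, so no two combinations collide
```

IDs stay unique during a GeoDNS failover. However, cross-datacenter ordering is only as good as clock agreement between sites, so IDs from different datacenters are only roughly time-ordered.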

Worker ID Management

Static Worker ID Assignment

Configuration File (config.yaml):
datacenter_id: 0
worker_id: 5
epoch: 1609459200000  # 2021-01-01 00:00:00 UTC
port: 8080

Advantages:
- Simple configuration
- No external dependencies
- Fast startup
- Predictable behavior

Disadvantages:
- Manual management
- Risk of duplicate IDs if two nodes are given the same datacenter_id/worker_id pair
- Difficult to scale dynamically
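With static assignment, the main risk is a duplicated (datacenter_id, worker_id) pair, because two nodes sharing a pair can emit identical IDs. One mitigation is a pre-deploy check over the whole fleet's configuration. The sketch below assumes each node's config.yaml has already been loaded into a dict keyed by hostname. The hostnames and function names are hypothetical.

```python
from collections import defaultdict

EXPECTED_FIELDS = ("datacenter_id", "worker_id", "epoch")


def validate_node_config(cfg: dict) -> None:
    """Check one node's settings; field names follow config.yaml."""
    missing = [f for f in EXPECTED_FIELDS if f not in cfg]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key in ("datacenter_id", "worker_id"):
        if not 0 <= cfg[key] <= 31:
            raise ValueError(f"{key}={cfg[key]} is outside the 5-bit range 0-31")


def find_conflicts(configs: dict) -> list:
    """Return (datacenter_id, worker_id, hosts) for each pair claimed by more than one host."""
    owners = defaultdict(list)
    for host, cfg in configs.items():
        validate_node_config(cfg)
        owners[(cfg["datacenter_id"], cfg["worker_id"])].append(host)
    # Every node must share one epoch, or timestamps are not comparable across nodes
    if len({cfg["epoch"] for cfg in configs.values()}) > 1:
        raise ValueError("nodes disagree on epoch")
    return [(dc, w, hosts) for (dc, w), hosts in sorted(owners.items()) if len(hosts) > 1]


fleet = {  # hypothetical hostnames
    "id-gen-1": {"datacenter_id": 0, "worker_id": 5, "epoch": 1609459200000},
    "id-gen-2": {"datacenter_id": 0, "worker_id": 5, "epoch": 1609459200000},
    "id-gen-3": {"datacenter_id": 0, "worker_id": 6, "epoch": 1609459200000},
}
print(find_conflicts(fleet))  # [(0, 5, ['id-gen-1', 'id-gen-2'])]
```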

Dynamic Worker ID Assignment

Dynamic Worker ID Assignment — Lifecycle from startup through operation, shutdown, and failure recovery
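The lifecycle in the figure can be modeled with leases. A node claims the lowest free worker ID at startup, renews it with heartbeats while running, and releases it on shutdown. If heartbeats stop, the node loses the lease and the ID becomes reusable. The class below is an in-memory stand-in for ZooKeeper ephemeral nodes, written only to show that logic. The class name, TTL, and methods are hypothetical.

```python
import time
from typing import Dict, Optional, Tuple


class WorkerIdRegistry:
    """Hypothetical in-memory stand-in for ZooKeeper ephemeral nodes.

    A lease lapses if its holder stops heartbeating for ttl_s seconds.
    """

    def __init__(self, max_workers: int = 32, ttl_s: float = 10.0):
        self.max_workers = max_workers
        self.ttl_s = ttl_s
        self._leases: Dict[int, Tuple[str, float]] = {}  # worker_id -> (node, expires_at)

    def _now(self, now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def _expire(self, now: float) -> None:
        # Failure recovery: free IDs whose holders stopped heartbeating
        for wid, (_, expires_at) in list(self._leases.items()):
            if expires_at <= now:
                del self._leases[wid]

    def claim(self, node: str, now: Optional[float] = None) -> int:
        """Startup: take the lowest free worker ID."""
        now = self._now(now)
        self._expire(now)
        for wid in range(self.max_workers):
            if wid not in self._leases:
                self._leases[wid] = (node, now + self.ttl_s)
                return wid
        raise RuntimeError("no free worker IDs")

    def heartbeat(self, node: str, worker_id: int, now: Optional[float] = None) -> bool:
        """Operation: renew the lease. False means the ID is lost; stop generating."""
        now = self._now(now)
        self._expire(now)
        lease = self._leases.get(worker_id)
        if lease is None or lease[0] != node:
            return False
        self._leases[worker_id] = (node, now + self.ttl_s)
        return True

    def release(self, node: str, worker_id: int) -> None:
        """Shutdown: hand the ID back immediately."""
        lease = self._leases.get(worker_id)
        if lease is not None and lease[0] == node:
            del self._leases[worker_id]


registry = WorkerIdRegistry(ttl_s=10.0)
print(registry.claim("node-a", now=0.0), registry.claim("node-b", now=0.0))  # 0 1
```

Reusing an ID is only safe if the previous holder has really stopped generating. A node whose heartbeat returns False, for example after a long pause or a network partition, must stop issuing IDs immediately. It should also stop on its own once a full TTL passes without a successful heartbeat.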

Clock Synchronization and Time Management

NTP Synchronization Architecture

NTP Synchronization — Multi-server time sync with drift detection and automatic correction
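NTP estimates a server's clock offset from four timestamps: client transmit (t0), server receive (t1), server transmit (t2), and client receive (t3). From these, offset = ((t1 - t0) + (t2 - t3)) / 2 and round-trip delay = (t3 - t0) - (t2 - t1). One simple way to get a drift estimate that tolerates a single bad server is to query several servers, discard high-delay samples, and take the median offset. The sketch assumes timestamps already collected in milliseconds. The function names and the 100 ms delay cutoff are hypothetical.

```python
from statistics import median


def ntp_offset_and_delay(t0, t1, t2, t3):
    """NTP on-wire calculation.

    t0: client transmit, t1: server receive, t2: server transmit, t3: client receive.
    Returns (clock offset, round-trip delay) in the units of the inputs; a positive
    offset means the local clock is behind the server.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def estimate_drift_ms(samples, max_delay_ms=100):
    """Median offset across servers, ignoring samples with a long round trip."""
    offsets = []
    for t0, t1, t2, t3 in samples:
        offset, delay = ntp_offset_and_delay(t0, t1, t2, t3)
        if delay <= max_delay_ms:
            offsets.append(offset)
    if not offsets:
        raise RuntimeError("no usable NTP samples")
    return median(offsets)


# Hypothetical samples (ms) from three servers; the third has a 299 ms round trip
samples = [(1000, 1060, 1061, 1011), (2000, 2052, 2053, 2010), (3000, 3400, 3401, 3300)]
print(estimate_drift_ms(samples))  # 51.25
```

A `clock_drift_ms` gauge would report an estimate like this one. Daemons such as chrony or ntpd normally slew the clock gradually; a backward step is the case ClockManager has to handle.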

Handling Clock Regression

import logging
import time


class ClockRegressionError(Exception):
    """Raised when the clock moves backwards by more than the tolerated amount."""


class ClockManager:
    def __init__(self):
        self.last_timestamp = 0
        self.clock_regression_count = 0

    def get_timestamp(self):
        current = int(time.time() * 1000)

        if current < self.last_timestamp:
            # Clock moved backwards
            regression = self.last_timestamp - current
            self.clock_regression_count += 1

            if regression < 5:  # Less than 5ms
                # Small regression - sleep until the clock catches up, then re-read it
                time.sleep(regression / 1000.0)
                current = max(int(time.time() * 1000), self.last_timestamp)
            elif regression < 1000:  # Less than 1 second
                # Medium regression - reuse the last timestamp; the generator's
                # sequence counter keeps IDs unique within that millisecond
                logging.warning("Clock regression: %dms", regression)
                return self.last_timestamp
            else:
                # Large regression - refuse to generate
                raise ClockRegressionError(
                    f"Clock moved backwards by {regression}ms"
                )

        self.last_timestamp = current
        return current

API Design and Service Interface

REST API Endpoints

GET /api/v1/id
- Generate single ID
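- Response: {"id": 1234567890123456789}
- Note: 64-bit IDs exceed JavaScript's largest safe integer (2^53 - 1), so JavaScript clients may need the ID serialized as a string (e.g. {"id": "1234567890123456789"})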

GET /api/v1/ids?count=100
- Generate multiple IDs
- Response: {"ids": [123..., 456..., 789...]}

GET /api/v1/parse?id=397833928704020522
- Parse ID components
- Response: {
    "timestamp": "2024-01-03T19:30:00Z",
    "datacenter_id": 0,
    "worker_id": 5,
    "sequence": 42
  }

GET /api/v1/health
- Health check endpoint
- Response: {
    "status": "healthy",
    "worker_id": 5,
    "datacenter_id": 0,
    "uptime_seconds": 86400,
    "ids_generated": 1000000
  }

GET /api/v1/metrics
- Prometheus metrics endpoint
- Response: metrics in the Prometheus text exposition format
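Below is a framework-independent sketch of the two generation endpoints, written as a plain routing function so it can be wired into any HTTP server. `handle_get`, `MAX_BATCH`, and the 400/404 error bodies are assumptions, not part of the API above. `next_id` can be any zero-argument callable, such as the `generate_id` method of a `SnowflakeIDGenerator` instance.

```python
import json
from urllib.parse import parse_qs, urlparse

MAX_BATCH = 1000  # Hypothetical upper bound for ?count=


def handle_get(url: str, next_id) -> tuple:
    """Return (HTTP status, JSON body) for the ID endpoints; next_id() yields one ID."""
    parsed = urlparse(url)
    if parsed.path == "/api/v1/id":
        return 200, json.dumps({"id": next_id()})
    if parsed.path == "/api/v1/ids":
        raw = parse_qs(parsed.query).get("count", ["1"])[0]
        try:
            count = int(raw)
        except ValueError:
            return 400, json.dumps({"error": "count must be an integer"})
        if not 1 <= count <= MAX_BATCH:
            return 400, json.dumps({"error": f"count must be between 1 and {MAX_BATCH}"})
        return 200, json.dumps({"ids": [next_id() for _ in range(count)]})
    return 404, json.dumps({"error": "not found"})
```

Capping `count` keeps one batch request from holding the generator for long. At 4,096 IDs per millisecond, a 1,000-ID batch fits within a single millisecond's sequence space.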

gRPC Service Definition

syntax = "proto3";

service IDGenerator {
  rpc GenerateID(GenerateIDRequest) returns (GenerateIDResponse);
  rpc GenerateBatch(GenerateBatchRequest) returns (GenerateBatchResponse);
  rpc ParseID(ParseIDRequest) returns (ParseIDResponse);
  rpc HealthCheck(HealthCheckRequest) returns (HealthCheckResponse);
}

message GenerateIDRequest {}

message GenerateIDResponse {
  int64 id = 1;
}

message GenerateBatchRequest {
  int32 count = 1;  // Number of IDs to generate
}

message GenerateBatchResponse {
  repeated int64 ids = 1;
}

message ParseIDRequest {
  int64 id = 1;
}

message ParseIDResponse {
  int64 timestamp_ms = 1;
  int32 datacenter_id = 2;
  int32 worker_id = 3;
  int32 sequence = 4;
}

message HealthCheckRequest {}

message HealthCheckResponse {
  string status = 1;
  int32 worker_id = 2;
  int32 datacenter_id = 3;
  int64 uptime_seconds = 4;
  int64 ids_generated = 5;
}

Monitoring and Observability

Key Metrics to Track

Performance Metrics:
- id_generation_latency_ms (histogram)
- id_generation_rate (gauge)
- sequence_overflow_count (counter)
- clock_regression_count (counter)

Resource Metrics:
- cpu_usage_percent (gauge)
- memory_usage_bytes (gauge)
- thread_count (gauge; goroutine count in a Go implementation)

Health Metrics:
- uptime_seconds (gauge)
- last_id_timestamp (gauge)
- clock_drift_ms (gauge)
- ntp_sync_status (gauge)

Business Metrics:
- total_ids_generated (counter)
- ids_per_second (gauge)
- error_rate (gauge, derived from an error counter)

Alerting Rules

alerts:
  - name: HighClockDrift
    condition: abs(clock_drift_ms) > 100
    severity: warning
    action: Page on-call engineer
    
  - name: ClockRegression
    condition: clock_regression_count > 10 in 1m
    severity: critical
    action: Page on-call engineer
    
  - name: SequenceOverflow
    condition: sequence_overflow_count > 100 in 1m
    severity: warning
    action: Scale up workers
    
  - name: HighLatency
    condition: p99(id_generation_latency_ms) > 10
    severity: warning
    action: Investigate performance
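Conditions such as `clock_regression_count > 10 in 1m` are sliding-window counts. The class below is a minimal sketch of that semantics, under a hypothetical name. In Prometheus, the same rule would usually be written as `increase(clock_regression_count[1m]) > 10`.

```python
from collections import deque


class WindowedThreshold:
    """Fire when more than `threshold` events fall inside the last `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self._events = deque()

    def record(self, now: float) -> bool:
        """Record one event at time `now`; return True while the alert condition holds."""
        self._events.append(now)
        # Drop events that have slid out of the window
        while self._events and self._events[0] <= now - self.window_s:
            self._events.popleft()
        return len(self._events) > self.threshold


# ClockRegression rule: more than 10 regressions within 1 minute
clock_regression_alert = WindowedThreshold(threshold=10, window_s=60.0)
fired = [clock_regression_alert.record(t) for t in range(11)]
print(fired[-1])  # True: the 11th regression inside 60 s trips the rule
```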

Together, these components let each node generate time-ordered 64-bit IDs without talking to other nodes once it holds a worker ID. They also keep datacenters independent of one another and surface the main failure mode, clock regression, through metrics and alerts.