Distributed Unique ID Generator - Variations and Follow-ups
Common Variations
1. Short URL ID Generator
Modification: Generate short, URL-safe IDs for link shorteners
Changes:
- Base62 encoding (a-zA-Z0-9)
- 7-8 character IDs
- Custom alphabet for readability
- Collision detection and retry
Example:
Numeric ID: 1234567890123456789
Base62 ID: aB3xK9mP2
URL: https://short.ly/aB3xK9mP2Implementation:
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode_base62(num):
if num == 0:
return BASE62[0]
result = []
while num:
result.append(BASE62[num % 62])
num //= 62
return ''.join(reversed(result))
def decode_base62(s):
num = 0
for char in s:
num = num * 62 + BASE62.index(char)
return num2. Distributed Transaction ID Generator
Modification: Generate IDs for distributed transactions with causality tracking
Changes:
- Embed transaction coordinator ID
- Include parent transaction ID
- Add transaction type metadata
- Support nested transactions
Format:
[Timestamp 41][Coordinator 8][Transaction Type 3][Sequence 12]3. Multi-Tenant ID Generator
Modification: Generate tenant-aware IDs for SaaS applications
Changes:
- Embed tenant ID in the ID
- Tenant-specific sequence numbers
- Tenant isolation guarantees
- Per-tenant rate limiting
Format:
[Timestamp 41][Tenant ID 10][Worker 5][Sequence 8]Benefits:
- Easy tenant data isolation
- Efficient tenant-based queries
- Built-in multi-tenancy support
4. Geographically Sortable IDs
Modification: IDs sortable by both time and geography
Changes:
- Embed geographic region code
- Region-aware routing
- Geo-distributed generation
- Location-based sharding
Format:
[Timestamp 41][Region 8][Datacenter 5][Worker 5][Sequence 5]5. Priority-Aware ID Generator
Modification: Embed priority information in IDs
Changes:
- Priority bits in ID
- Priority-based processing
- Queue ordering by ID
- SLA-aware generation
Format:
[Priority 3][Timestamp 38][Datacenter 5][Worker 5][Sequence 13]Common Follow-Up Questions
1. "How do you handle clock synchronization failures?"
Answer:
Multi-layered approach:
1. Prevention:
- Multiple NTP servers
- Continuous monitoring
- Alert on drift >100ms
2. Detection:
- Compare system time with NTP
- Track clock regression events
- Monitor drift trends
3. Mitigation:
- Refuse generation if drift >1 second
- Use last known good timestamp
- Fall back to UUID v4 generation
- Alert operations team
4. Recovery:
- Automatic NTP resync
- Gradual timestamp adjustment
- Resume normal operation2. "What happens if two nodes get the same worker ID?"
Answer:
Prevention:
- Use Zookeeper/etcd for worker ID assignment
- Ephemeral nodes ensure cleanup
- Grace period before ID reuse (1 hour)
- Health checks detect conflicts
Detection:
- Monitor for duplicate IDs
- Track worker ID assignments
- Alert on conflicts
Resolution:
- Shut down conflicting node
- Reassign new worker ID
- Investigate root cause
- Update assignment process
Impact:
- Extremely rare with proper coordination
- Affects only IDs generated during conflict
- No data corruption (IDs still unique by timestamp)3. "How do you migrate from auto-increment IDs to Snowflake IDs?"
Answer:
Migration Strategy:
Phase 1: Dual Write (2 weeks)
- Generate both ID types
- Store Snowflake ID in new column
- Keep auto-increment as primary key
- Monitor for issues
Phase 2: Dual Read (2 weeks)
- Application reads both ID types
- Gradually shift to Snowflake IDs
- Monitor performance
Phase 3: Cutover (1 week)
- Switch primary key to Snowflake ID
- Keep auto-increment for reference
- Update all foreign keys
- Reindex tables
Phase 4: Cleanup (1 week)
- Remove auto-increment column
- Update documentation
- Archive old IDs4. "How do you ensure IDs are unpredictable for security?"
Answer:
Options:
1. Encryption:
- Encrypt Snowflake ID with AES
- Use deterministic encryption
- Decrypt on server side
- Maintains sortability with encrypted index
2. Obfuscation:
- XOR with secret key
- Reversible transformation
- Lightweight operation
- Not cryptographically secure
3. Separate Public ID:
- Internal: Snowflake ID
- External: UUID v4
- Mapping table
- Best security, more complexity
4. Hash-based:
- HMAC(Snowflake ID, secret)
- Truncate to desired length
- One-way transformation
- Requires lookup table5. "How do you handle ID exhaustion?"
Answer:
Timeline:
- 41-bit timestamp with 2020 epoch
- Exhaustion: ~2089 (69 years)
Solutions:
1. Epoch Update (Preferred):
- Update epoch to current year
- Deploy new version
- Old IDs remain valid
- Extends lifetime another 69 years
2. Migrate to 128-bit:
- Switch to ULID or UUID
- Virtually unlimited space
- Requires schema changes
- Plan years in advance
3. Increase Timestamp Bits:
- Reduce worker/sequence bits
- Extends lifetime
- Reduces capacity
- Breaking change
Planning:
- Monitor exhaustion date
- Plan migration 5 years ahead
- Test migration process
- Communicate with stakeholders6. "How do you generate IDs offline?"
Answer:
Approaches:
1. Pre-allocated ID Ranges:
- Allocate ranges before going offline
- Each device gets unique range
- Sync when back online
- Risk of exhaustion
2. Device-Specific Worker IDs:
- Assign unique worker ID per device
- Generate IDs offline
- No coordination needed
- Requires worker ID management
3. UUID v4:
- Generate random UUIDs
- No coordination needed
- Not sortable
- Larger storage
4. Hybrid Approach:
- Online: Snowflake IDs
- Offline: UUID v4
- Convert on sync
- Best of both worlds7. "How do you handle leap seconds?"
Answer:
Leap Second Handling:
1. Ignore Leap Seconds:
- Use Unix timestamp (ignores leap seconds)
- Simplest approach
- Acceptable for most use cases
- 1-second drift over decades
2. Smear Leap Seconds:
- Distribute adjustment over 24 hours
- Google's approach
- Smooth transition
- Requires custom NTP
3. Pause Generation:
- Stop generating during leap second
- Resume after adjustment
- Ensures correctness
- Brief service interruption
Recommendation: Ignore leap seconds (Option 1)
- Simplest implementation
- Negligible impact
- Industry standard8. "How do you test the ID generator?"
Answer:
Testing Strategy:
1. Unit Tests:
- ID format validation
- Sequence increment
- Timestamp handling
- Clock regression
2. Integration Tests:
- Multi-threaded generation
- Uniqueness verification
- Performance benchmarks
- Failure scenarios
3. Load Tests:
- Sustained high load
- Burst traffic
- Sequence overflow
- Resource usage
4. Chaos Tests:
- Clock regression
- NTP failures
- Node failures
- Network partitions
5. Production Monitoring:
- Duplicate detection
- Performance metrics
- Error rates
- Clock driftAdvanced Variations
Sharded ID Generator
Use Case: Database sharding with shard-aware IDs
Format: [Shard ID 16][Timestamp 32][Sequence 16]
Benefits:
- Efficient shard routing
- Balanced distribution
- Shard-local uniqueness
Implementation:
- Hash user ID to determine shard
- Generate ID within shard
- Embed shard ID in resultHierarchical ID Generator
Use Case: Parent-child relationships (e.g., order-items)
Format: [Parent ID 32][Child Sequence 32]
Benefits:
- Implicit relationship
- Efficient queries
- Sorted by parent
Implementation:
- Generate parent ID first
- Child IDs derived from parent
- Sequence per parentTime-Bucketed ID Generator
Use Case: Time-series data with efficient queries
Format: [Time Bucket 20][Timestamp 21][Worker 10][Sequence 13]
Benefits:
- Efficient time-range queries
- Automatic partitioning
- Optimized storage
Implementation:
- Bucket: Hour/Day/Week
- IDs grouped by bucket
- Partition by bucketThis comprehensive guide covers the most common variations and follow-up questions you'll encounter when designing or discussing distributed ID generation systems.