Scale & Constraints


Distributed Locking System - Scale and Constraints

Scale Estimation

Traffic Analysis

  • Total Applications: 1,000 distributed applications
  • Active Clients: 10,000 concurrent clients
  • Lock Operations: 10,000 ops/sec average
  • Peak Traffic: 50,000 ops/sec (5× average)
  • Active Locks: 100,000 locks at any time
  • Watch Operations: 50,000 active watches

Operation Breakdown

Daily Operations:
- Lock acquisitions: 10K ops/sec × 86,400 sec = 864M/day
- Lock releases: 10K ops/sec × 86,400 sec = 864M/day
- Heartbeats: 10K clients × 1 heartbeat/sec × 86,400 sec = 864M/day
- Watch notifications: 5K/sec × 86,400 sec = 432M/day
- Total operations: ~3 billion/day
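
The daily totals above are simple rate × time arithmetic. A minimal sketch that reproduces them, using only the per-second rates listed in this section (the variable names are illustrative, not part of any real system):

```python
SECONDS_PER_DAY = 86_400

# Per-second rates taken from the breakdown above.
rates_per_sec = {
    "lock_acquisitions": 10_000,
    "lock_releases": 10_000,
    "heartbeats": 10_000 * 1,      # 10K clients x 1 heartbeat/sec
    "watch_notifications": 5_000,
}

# Daily volume for each operation type.
daily = {name: rate * SECONDS_PER_DAY for name, rate in rates_per_sec.items()}
total_per_day = sum(daily.values())

for name, count in daily.items():
    print(f"{name}: {count / 1e6:.0f}M/day")
print(f"total: {total_per_day / 1e9:.2f}B/day")
```

The exact sum is 3.024 billion operations per day, which the estimate rounds to ~3 billion.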

Lock Duration Distribution:
- Short locks (<1 sec): 60%
- Medium locks (1-10 sec): 30%
- Long locks (>10 sec): 10%
- Average duration: 5 seconds
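
One way to sanity-check the average duration against the active-lock count is Little's Law (average concurrency = arrival rate × average time held). The sketch below applies it to the figures from this section; the function name is hypothetical, and reading the 100,000 active-lock figure as including headroom is an assumption, not something the estimates state.

```python
def expected_concurrent_locks(acquisitions_per_sec: float, avg_hold_sec: float) -> float:
    """Little's Law: average locks held = acquisition rate x average hold time."""
    return acquisitions_per_sec * avg_hold_sec

# Figures from this section: 10K acquisitions/sec, 5 sec average duration.
average_held = expected_concurrent_locks(10_000, 5)

# Hold time that would be needed to sustain 100K active locks at the same rate.
implied_hold_sec = 100_000 / 10_000

print(f"locks held on average: {average_held:,.0f}")
print(f"hold time implied by 100K active locks: {implied_hold_sec:.0f} sec")
```

At the average rate, a 5-second mean duration corresponds to about 50,000 locks held at once, so the 100,000 figure leaves roughly 2× room above that.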

Storage Requirements

Per Lock State:
- Lock path: 100 bytes
- Owner session ID: 16 bytes
- Fencing token: 8 bytes
- Timestamps: 24 bytes
- Metadata: 52 bytes
- Total: ~200 bytes per lock

Cluster Storage:
- 100K active locks × 200 bytes = 20MB
- Log entries (1 day): 3B ops × 100 bytes = 300GB
- Snapshots: 100MB per snapshot
- Total: ~500GB with retention
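
The storage figures can be reproduced with a short calculation. This is a sketch using the field sizes and operation counts from this section; it uses decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) to match the arithmetic above, and the names are illustrative only. The retention period behind the ~500GB total is not stated, so it is not modeled here.

```python
# Per-lock field sizes in bytes, as listed under "Per Lock State".
LOCK_STATE_FIELDS = {
    "lock_path": 100,
    "owner_session_id": 16,
    "fencing_token": 8,
    "timestamps": 24,
    "metadata": 52,
}

ACTIVE_LOCKS = 100_000
OPS_PER_DAY = 3_000_000_000   # ~3 billion/day from the operation breakdown
LOG_ENTRY_BYTES = 100

bytes_per_lock = sum(LOCK_STATE_FIELDS.values())
active_lock_bytes = ACTIVE_LOCKS * bytes_per_lock
daily_log_bytes = OPS_PER_DAY * LOG_ENTRY_BYTES

print(f"per lock: {bytes_per_lock} bytes")
print(f"active lock state: {active_lock_bytes / 1e6:.0f} MB")
print(f"log per day: {daily_log_bytes / 1e9:.0f} GB")
```

The live lock state is tiny compared with the replicated log, so log retention and snapshot frequency dominate the disk budget.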

Capacity Planning

Cluster Configuration

Recommended Cluster Sizes:

Small (Development):
- 3 nodes
- 10K ops/sec
- 1K concurrent clients
- 10K active locks

Medium (Production):
- 5 nodes
- 50K ops/sec
- 5K concurrent clients
- 50K active locks

Large (Enterprise):
- 7 nodes
- 100K ops/sec
- 10K concurrent clients
- 100K active locks

Why Odd Numbers?
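- Quorum: floor(N/2) + 1 nodes must agree for consensus, so an even node count adds no fault tolerance (4 nodes tolerate 1 failure, the same as 3)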
- 3 nodes: Tolerates 1 failure
- 5 nodes: Tolerates 2 failures
- 7 nodes: Tolerates 3 failures
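
The quorum rule above can be checked with a few lines of code. This is a minimal sketch of majority-quorum arithmetic; the function names are hypothetical and do not come from any particular consensus library.

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster: floor(N/2) + 1."""
    return nodes // 2 + 1

def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)

for n in range(3, 8):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that 4 nodes tolerate the same single failure as 3, and 6 nodes the same two failures as 5, so the extra even node adds replication cost without improving availability.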