Problem Statement

📖 6 min read 📄 Part 1 of 10

Distributed Locking System - Problem Statement

Overview

Design a distributed locking service similar to Apache Zookeeper or etcd that provides mutual exclusion, leader election, and coordination primitives for distributed applications. The system must ensure safety (at most one lock holder), liveness (locks are eventually acquired), and fault tolerance across multiple nodes.

Functional Requirements

Core Locking Features

  • Exclusive Locks: Mutual exclusion for critical sections
  • Shared Locks: Read locks allowing multiple readers
  • Lock Acquisition: Blocking and non-blocking lock requests
  • Lock Release: Explicit and automatic lock release
  • Lock Timeout: Automatic release after timeout period
  • Lock Renewal: Extend lock duration before expiration

Advanced Locking Features

  • Reentrant Locks: Same client can acquire lock multiple times
  • Fair Locks: FIFO ordering for lock acquisition
  • Try-Lock: Attempt lock with immediate return
  • Lock Queuing: Queue waiting clients in order
  • Lock Priority: Priority-based lock acquisition
  • Conditional Locks: Acquire lock only if condition met

Leader Election

  • Single Leader: Elect one leader from multiple candidates
  • Automatic Failover: Re-elect on leader failure
  • Leader Lease: Time-bound leadership with renewal
  • Split-Brain Prevention: Ensure single leader at all times
  • Leader Discovery: Clients can discover current leader
  • Leadership Transfer: Graceful leadership handoff

Distributed Coordination

  • Barriers: Synchronize multiple processes
  • Semaphores: Limit concurrent access to N clients
  • Counters: Distributed atomic counters
  • Queues: Distributed FIFO queues
  • Configuration Management: Centralized configuration storage
  • Service Discovery: Register and discover services

Session Management

  • Client Sessions: Maintain client connection state
  • Session Timeout: Detect and handle client failures
  • Session Renewal: Keep-alive heartbeat mechanism
  • Session Recovery: Reconnect and recover session state
  • Ephemeral Nodes: Auto-delete on session expiration
  • Session Affinity: Maintain session on specific server

Watch and Notification

  • Lock Watches: Notify on lock state changes
  • Data Watches: Notify on data changes
  • One-Time Watches: Single notification per watch
  • Persistent Watches: Multiple notifications
  • Watch Filtering: Filter notifications by criteria
  • Batch Notifications: Group multiple notifications

Non-Functional Requirements

Safety Requirements

  • Mutual Exclusion: At most one lock holder at any time
  • No Deadlocks: System prevents circular wait conditions
  • Fencing Tokens: Monotonically increasing tokens for safety
  • Consistency: Strong consistency for lock state
  • Durability: Lock state persists across failures
  • Correctness: Locks work correctly under all conditions

Liveness Requirements

  • Lock Availability: Locks eventually acquirable
  • Progress Guarantee: System makes forward progress
  • Starvation Freedom: All clients eventually get locks
  • Timeout Handling: Expired locks automatically released
  • Failure Recovery: System recovers from node failures
  • No Permanent Blocking: Clients don't block indefinitely

Performance Requirements

  • Lock Acquisition Latency: <10ms for local locks
  • Lock Release Latency: <5ms for release operations
  • Throughput: 10,000+ lock operations per second
  • Watch Notification: <100ms notification delivery
  • Session Heartbeat: <1 second heartbeat interval
  • Leader Election Time: <5 seconds for new leader

Scalability Requirements

  • Concurrent Locks: Support 100,000+ active locks
  • Concurrent Clients: Handle 10,000+ connected clients
  • Lock Namespace: Support millions of lock paths
  • Watch Scalability: 100,000+ active watches
  • Cluster Size: 3-7 nodes for consensus
  • Geographic Distribution: Multi-datacenter deployment

Reliability Requirements

  • Availability: 99.99% uptime (52 minutes/year downtime)
  • Fault Tolerance: Survive (N-1)/2 node failures
  • Data Durability: 99.999% lock state durability
  • Split-Brain Prevention: No dual leadership
  • Network Partition Handling: Maintain consistency
  • Graceful Degradation: Read-only mode during issues

Consistency Requirements

  • Linearizability: All operations appear atomic
  • Sequential Consistency: Operations ordered per client
  • Consensus: Raft or Paxos for state replication
  • Quorum Reads: Majority reads for consistency
  • Quorum Writes: Majority writes for durability
  • Snapshot Isolation: Consistent point-in-time reads

Technical Constraints

Consensus Algorithm

  • Raft or Paxos: Proven consensus algorithms
  • Quorum Size: Majority (N/2 + 1) for decisions
  • Log Replication: Replicate all state changes
  • Leader-Based: Single leader for writes
  • Follower Reads: Optional stale reads from followers
  • Log Compaction: Periodic snapshot and cleanup

Network Assumptions

  • Asynchronous Network: No timing guarantees
  • Partial Failures: Nodes can fail independently
  • Network Partitions: Handle split-brain scenarios
  • Message Loss: Tolerate lost messages
  • Message Reordering: Handle out-of-order delivery
  • Byzantine Failures: Assume crash-fail model only

Timing and Clocks

  • Logical Clocks: Lamport or vector clocks
  • Lease-Based: Time-bound lock ownership
  • Clock Skew Tolerance: Handle clock differences
  • Timeout Precision: ±10% timeout accuracy
  • Heartbeat Interval: 1-5 second intervals
  • Session Timeout: 10-60 second timeouts

Edge Cases and Failure Scenarios

Network Failures

  • Network Partition: Cluster splits into groups
  • Packet Loss: Messages don't arrive
  • High Latency: Slow network communication
  • Connection Drops: Client disconnections
  • Asymmetric Partition: One-way communication failure
  • Cascading Failures: Multiple simultaneous failures

Node Failures

  • Leader Failure: Current leader crashes
  • Follower Failure: Non-leader node crashes
  • Majority Failure: Quorum lost
  • Slow Nodes: Nodes respond slowly
  • Disk Failures: Persistent storage fails
  • Memory Exhaustion: Out of memory errors

Client Failures

  • Client Crash: Client fails while holding lock
  • Client Hang: Client stops responding
  • Zombie Clients: Client thinks it has lock but doesn't
  • Session Expiry: Client session times out
  • Reconnection: Client reconnects after failure
  • Duplicate Requests: Client retries operations

Timing Issues

  • Clock Skew: Clocks drift between nodes
  • Lease Expiry: Lock lease expires
  • Heartbeat Miss: Missed heartbeat messages
  • Timeout Races: Timeout and operation race
  • GC Pauses: Long garbage collection pauses
  • Process Suspension: Process stops temporarily

Success Metrics

Correctness Metrics

  • Lock Safety: 100% mutual exclusion guarantee
  • No Split-Brain: 0 dual leadership incidents
  • Consistency Violations: 0 consistency errors
  • Data Loss: 0 committed data loss
  • Deadlock Incidents: 0 deadlock occurrences
  • Correctness Tests: 100% pass rate

Performance Metrics

  • Lock Latency P50: <5ms
  • Lock Latency P99: <20ms
  • Lock Throughput: 10,000+ ops/sec
  • Watch Notification: <100ms P99
  • Leader Election: <5 seconds
  • Session Heartbeat: <1 second

Reliability Metrics

  • Availability: 99.99% uptime
  • MTBF: >1 year mean time between failures
  • MTTR: <5 minutes mean time to recovery
  • Failed Operations: <0.01% error rate
  • Successful Failovers: 100% success rate
  • Data Durability: 99.999% durability

Operational Metrics

  • Cluster Health: 100% healthy nodes
  • Disk Usage: <80% capacity
  • Memory Usage: <80% capacity
  • CPU Usage: <70% average
  • Network Bandwidth: <50% utilization
  • Alert Response: <5 minutes

This problem statement provides the foundation for designing a robust, scalable, and reliable distributed locking system suitable for coordinating distributed applications and services.