Variations & Follow-ups

📖 9 min read 📄 Part 8 of 10

Key-Value Store - Variations and Follow-up Questions

Common Variations

1. In-Memory Key-Value Store (Redis-like)

Requirements:

  • All data stored in RAM for ultra-low latency
  • Optional persistence to disk
  • Single-threaded or multi-threaded architecture
  • Support for complex data structures

Key Differences:

Architecture Changes:
- No LSM-tree, pure in-memory hash table
- Optional AOF (Append-Only File) for persistence
- Optional RDB snapshots for backup
- Single-threaded event loop or thread-per-core

Performance:
- Latency: <100μs (vs <1ms for disk-based)
- Throughput: 100K-1M ops/sec per node
- Memory: Limited by RAM (vs unlimited disk)

Trade-offs:
✅ Ultra-low latency
✅ Simple architecture
❌ Limited by memory
❌ Data loss risk on crash
❌ Higher cost per GB

Follow-up Questions:

  • How do you handle data larger than available RAM?
  • What persistence strategies ensure durability?
  • How do you implement replication for high availability?
  • What eviction policies do you use when memory is full?

2. Time-Series Key-Value Store

Requirements:

  • Optimized for time-series data (metrics, logs, events)
  • Efficient range queries by timestamp
  • Data retention and downsampling
  • High write throughput

Key Differences:

Data Model:
- Keys: metric_name:timestamp
- Values: numeric values or JSON
- Partitioning: Time-based partitioning
- Compaction: Time-window compaction

Optimizations:
- Columnar storage for compression
- Delta encoding for timestamps
- Downsampling for old data
- Automatic data expiration

Example Schema:
  cpu_usage:server1:1704729600 → 45.2
  cpu_usage:server1:1704729601 → 46.1
  cpu_usage:server1:1704729602 → 44.8

Follow-up Questions:

  • How do you handle out-of-order writes?
  • What downsampling strategies do you use?
  • How do you implement efficient range queries?
  • What retention policies do you support?

3. Geo-Distributed Key-Value Store

Requirements:

  • Data replicated across multiple geographic regions
  • Low latency for local reads
  • Conflict resolution for concurrent writes
  • Compliance with data residency requirements

Key Differences:

Architecture:
- Multi-region clusters
- Async cross-region replication
- Regional routing for reads
- Conflict resolution with vector clocks

Consistency:
- Local strong consistency
- Global eventual consistency
- Configurable per-key replication
- Conflict-free replicated data types (CRDTs)

Latency:
- Local reads: <5ms
- Cross-region reads: <100ms
- Cross-region writes: <200ms

Follow-up Questions:

  • How do you handle network partitions between regions?
  • What conflict resolution strategies do you use?
  • How do you ensure data residency compliance?
  • What's the replication lag between regions?

4. Strongly Consistent Key-Value Store (Spanner-like)

Requirements:

  • Linearizable reads and writes
  • Global transactions with ACID guarantees
  • Distributed consensus (Paxos/Raft)
  • External consistency using synchronized clocks

Key Differences:

Consensus:
- Paxos or Raft for leader election
- Quorum-based writes
- Synchronized clocks (TrueTime)
- Two-phase commit for transactions

Performance:
- Higher write latency (consensus overhead)
- Strong consistency guarantees
- Serializable isolation
- Global transactions

Trade-offs:
✅ Strong consistency
✅ ACID transactions
❌ Higher latency
❌ Lower availability during partitions
❌ Complex implementation

Follow-up Questions:

  • How do you implement distributed transactions?
  • What consensus algorithm do you use?
  • How do you handle clock skew?
  • What's the performance impact of strong consistency?

5. Document Store (MongoDB-like)

Requirements:

  • Store JSON/BSON documents instead of simple values
  • Support for nested fields and arrays
  • Secondary indexes on document fields
  • Rich query language

Key Differences:

Data Model:
- Values: JSON documents (not just bytes)
- Schema: Flexible, schema-less
- Indexes: Multiple secondary indexes
- Queries: Complex queries with filters, projections

Example:
  Key: user:123
  Value: {
    "name": "John Doe",
    "email": "john@example.com",
    "address": {
      "city": "New York",
      "zip": "10001"
    },
    "tags": ["premium", "verified"]
  }

Query: Find users in New York with "premium" tag

Follow-up Questions:

  • How do you implement secondary indexes?
  • What query optimization techniques do you use?
  • How do you handle schema evolution?
  • What's the performance impact of complex queries?

Interview Follow-up Questions

Capacity and Scaling

Q: How would you handle 10x traffic growth?

Approach:
1. Horizontal Scaling:
   - Add 10x nodes (10 → 100 nodes)
   - Rebalance data across new nodes
   - Update routing tables

2. Vertical Scaling:
   - Upgrade to larger instances
   - More RAM for caching
   - Faster SSDs for storage

3. Optimization:
   - Increase cache hit rate
   - Optimize hot paths
   - Batch operations
   - Compression

Cost Analysis:
- Current: 10 nodes × $500 = $5,000/month
- 10x traffic: 100 nodes × $500 = $50,000/month
- With optimization: 50 nodes × $500 = $25,000/month

Q: How do you handle hot keys?

Detection:
- Monitor access patterns
- Track requests per key
- Alert on threshold (>10K req/min)

Mitigation:
1. Replication: Replicate hot key to more nodes
2. Caching: Cache hot key on all nodes
3. Sharding: Split hot key into sub-keys
4. Rate Limiting: Limit requests per client
5. Load Balancing: Distribute reads across replicas

Example:
  Hot key: "trending_topics"
  Solution: Replicate to 20 nodes, cache on all nodes
  Result: 100K req/sec → 5K req/sec per node

Consistency and Availability

Q: How do you handle network partitions?

Scenario: Network partition splits cluster into two groups

Group A (2 nodes):
- Can't reach Group B
- Continues serving reads (stale data possible)
- Rejects writes (can't achieve quorum W=2)

Group B (1 node):
- Can't reach Group A
- Continues serving reads (stale data possible)
- Rejects writes (can't achieve quorum W=2)

After Partition Heals:
1. Detect partition recovery
2. Run anti-entropy repair
3. Resolve conflicts using vector clocks
4. Resume normal operations

Trade-off:
- Availability: Reads continue during partition
- Consistency: Writes blocked to prevent split-brain

Q: How do you ensure data consistency across replicas?

Mechanisms:
1. Quorum Writes: W=2, R=2, N=3 (R+W>N)
2. Read Repair: Fix inconsistencies during reads
3. Anti-Entropy: Background process to sync replicas
4. Hinted Handoff: Store writes for unavailable nodes
5. Merkle Trees: Efficient comparison of replica data

Example Read Repair:
  Client reads key "user:123" with R=2
  Node A: version=5, value="Alice"
  Node B: version=6, value="Bob"
  
  Coordinator:
  1. Returns latest value ("Bob") to client
  2. Background: Updates Node A to version=6

Performance and Optimization

Q: How do you optimize for read-heavy workloads?

Strategies:
1. Caching:
   - In-memory cache for hot data
   - 95% hit rate → 20x throughput
   - LRU eviction policy

2. Read Replicas:
   - Add more replicas (N=3 → N=5)
   - Route reads to nearest replica
   - Eventual consistency acceptable

3. Bloom Filters:
   - Reduce unnecessary disk reads
   - 1% false positive rate
   - 90% reduction in disk I/O

4. Compression:
   - Reduce I/O with Snappy/LZ4
   - 3x compression ratio
   - Lower network bandwidth

Result:
- Before: 10K reads/sec, 10ms latency
- After: 200K reads/sec, 1ms latency

Q: How do you handle large values (>1MB)?

Approaches:
1. Chunking:
   - Split large value into chunks
   - Store chunks as separate keys
   - Reassemble on read
   
   Example:
     Original: "large_file"10MB data
     Chunked:
       "large_file:chunk:0"1MB
       "large_file:chunk:1"1MB
       ...
       "large_file:chunk:9"1MB
       "large_file:metadata" → {"chunks": 10, "size": 10MB}

2. External Storage:
   - Store large values in object storage (S3)
   - Store reference in key-value store
   
   Example:
     Key: "large_file"
     Value: {"s3_url": "s3://bucket/large_file", "size": 10MB}

3. Compression:
   - Compress large values
   - Store compressed data
   - Decompress on read

Trade-offs:
- Chunking: More operations, complex logic
- External: Network latency, dependency
- Compression: CPU overhead, latency

Failure Scenarios

Q: What happens when a node fails?

Failure Detection:
1. Heartbeat timeout (10 seconds)
2. Mark node as suspected
3. Confirm failure (30 seconds)
4. Mark node as down

Failover Process:
1. Identify affected partitions
2. Promote replicas to primary
3. Update routing tables
4. Reroute traffic to healthy nodes
5. Trigger hinted handoff replay

Recovery Process:
1. Node comes back online
2. Rejoin cluster
3. Receive hinted handoff data
4. Run repair to sync data
5. Resume normal operations

Impact:
- Availability: No downtime (replicas available)
- Performance: Slight increase in latency
- Data Loss: Zero (with quorum writes)

Q: How do you handle data corruption?

Detection:
1. Checksums: Verify data integrity on read
2. Scrubbing: Background process to check data
3. Replication: Compare replicas for consistency

Recovery:
1. Detect corrupted data (checksum mismatch)
2. Fetch data from healthy replica
3. Overwrite corrupted data
4. Log corruption event
5. Alert monitoring system

Prevention:
- ECC memory
- Checksums on disk
- Replication (N=3)
- Regular backups
- Monitoring and alerts

Advanced Topics

Q: How would you implement transactions?

Two-Phase Commit (2PC):

Phase 1 (Prepare):
1. Coordinator sends PREPARE to all participants
2. Participants lock resources and vote YES/NO
3. Coordinator collects votes

Phase 2 (Commit):
1. If all YES: Coordinator sends COMMIT
2. If any NO: Coordinator sends ABORT
3. Participants commit/abort and release locks

Example:
  Transaction: Transfer $100 from A to B
  
  Prepare:
    - Lock account A
    - Lock account B
    - Check balance A >= $100
    - Vote YES
  
  Commit:
    - Deduct $100 from A
    - Add $100 to B
    - Release locks

Trade-offs:
✅ ACID guarantees
✅ Strong consistency
❌ Higher latency
❌ Blocking protocol
❌ Coordinator failure risk

Q: How do you implement pub/sub?

Architecture:
1. Channel Registry:
   - Map channel → subscribers
   - Store in distributed hash table

2. Message Broker:
   - Receive publish requests
   - Lookup subscribers
   - Deliver messages

3. Subscription Management:
   - Add/remove subscribers
   - Pattern matching (e.g., "user:*")
   - Persistent subscriptions

Example:
  Channel: "notifications"
  Subscribers: [client1, client2, client3]
  
  Publish:
    1. Client publishes to "notifications"
    2. Broker looks up subscribers
    3. Broker delivers to all subscribers
    4. Acknowledge delivery

Scaling:
- Partition channels across nodes
- Use consistent hashing
- Replicate channel registry
- Load balance subscribers

This comprehensive guide covers common variations and follow-up questions that help demonstrate deep understanding of distributed key-value store design and implementation.