Distributed Counter - Interview Tips
Interview Approach (45-60 minutes)
Time Management
1. Requirements (5-8 min)
- Clarify use case (page views, likes, analytics)
- Understand scale (ops/second, total counters)
- Define accuracy requirements
2. High-Level Design (10-15 min)
- Draw system architecture
- Explain sharding strategy
- Discuss storage choices
3. Deep Dives (20-25 min)
- Counter implementation (simple vs sharded)
- Synchronization strategy
- Time-windowed counters
- Failure handling
4. Trade-offs (5-10 min)
- Accuracy vs performance
- Consistency models
- Storage options
5. Follow-ups (5-10 min)
- Scaling strategies
- Unique counting
- Real-time analyticsEssential Questions to Ask
Functional Requirements
Critical Questions:
1. "What are we counting?"
- Page views, likes, API requests, unique visitors?
2. "What accuracy do we need?"
- 100% exact, 99% eventual, 95% approximate?
3. "What operations are needed?"
- Increment only, or also decrement, reset?
4. "Do we need time windows?"
- Last hour, last day, all-time?
5. "What about unique counting?"
- Total counts or unique items?
Good Follow-ups:
- "Should counters expire automatically?"
- "Do we need historical data?"
- "What about aggregations across counters?"Scale Requirements
Key Metrics:
1. "How many increments per second?"
- 1K → Single server
- 100K → Small cluster
- 1M+ → Sharded counters
2. "How many total counters?"
- 1K → In-memory only
- 1M → Redis + database
- 100M+ → Distributed storage
3. "What's the acceptable latency?"
- <1ms → In-memory, eventual consistency
- <10ms → Hybrid approach
- <100ms → Database-backed
4. "How long to retain data?"
- 1 day → Memory only
- 30 days → Database
- 1 year → Archival storageCommon Pitfalls to Avoid
1. Not Discussing Sharding
❌ Bad: "We'll use a single counter in Redis"
✅ Good: "For high-traffic counters (>10K ops/s), we'll use
sharded counters. Split the counter into N shards, increment
a random shard, and aggregate on read. This eliminates
contention and scales linearly."
Key Points:
- Explain when to shard (traffic threshold)
- Discuss shard count selection
- Address aggregation overhead2. Ignoring Consistency Trade-offs
❌ Bad: "The counter will always be accurate"
✅ Good: "We have three options:
1. Strong consistency (100% accurate):
- Synchronous updates, distributed locks
- 5-10ms latency, 100K ops/s
- Use for: Financial, voting
2. Eventual consistency (99%+ accurate):
- Async replication, CRDT merging
- <1ms latency, 1M+ ops/s
- Use for: Page views, analytics
3. Approximate (95%+ accurate):
- HyperLogLog, probabilistic
- <0.5ms latency, 10M+ ops/s
- Use for: Unique counts, dashboards
I'd recommend eventual consistency for most use cases."
Key Points:
- Present multiple options
- Explain trade-offs clearly
- Provide specific recommendations3. Not Addressing Hot Counters
❌ Bad: "All counters are treated the same"
✅ Good: "We need to handle hot counters differently:
Detection:
- Monitor ops/second per counter
- Alert if >10K ops/s
Mitigation:
1. Increase shard count (4 → 16 shards)
2. Dedicated server for hot counter
3. Use approximate counting (HyperLogLog)
4. Rate limiting if abuse detected
This prevents a single hot counter from becoming a
bottleneck for the entire system."
Key Points:
- Show awareness of hot spots
- Provide detection mechanism
- Offer multiple solutions4. Overlooking Time-Windowed Counters
❌ Bad: "We'll just store the total count"
✅ Good: "For time-windowed counters (e.g., requests per hour),
we'll use a sliding window approach:
1. Store counts in 1-minute buckets
2. Use Redis sorted set with timestamp as score
3. On read, sum buckets in the window
4. Auto-expire old buckets
This provides accurate sliding windows without boundary
issues that fixed windows have."
Key Points:
- Explain sliding vs fixed windows
- Discuss bucket granularity
- Address automatic cleanupStrong Talking Points
1. Sharded Counter Implementation
Strong Answer:
"For high-throughput counters, I'll implement sharded counters:
Structure:
counter:page_views:article_123
├── shard:0 → 123,456
├── shard:1 → 234,567
├── shard:2 → 345,678
└── shard:3 → 456,789
Total: 1,160,490
Increment:
1. Hash counter_id to determine server
2. Select random shard (0-3)
3. Increment shard atomically
4. Return immediately (no aggregation)
Read:
1. Query all shards in parallel
2. Sum values
3. Cache result for 1 second
4. Return total
Benefits:
- No contention (random shard selection)
- Linear scaling (add more shards)
- Fast writes (<1ms)
- Acceptable read latency (<5ms)
Shard Count Selection:
- Start with 4 shards
- Increase if ops/s > 10K per shard
- Max 64 shards (diminishing returns)
Why This Works:
- Shows deep understanding
- Provides implementation details
- Discusses scaling strategy
- Addresses trade-offs2. CRDT for Distributed Counters
Strong Answer:
"For multi-region deployment, I'll use CRDTs
(Conflict-free Replicated Data Types):
G-Counter (Grow-only):
{
"us-east": 1000,
"us-west": 2000,
"eu-west": 3000
}
Total: 6000
Operations:
- Increment (us-east): us-east += delta
- Read: sum(all regions)
- Merge: for each region, take max(local, remote)
PN-Counter (Positive-Negative):
{
"positive": {"us-east": 1000, "us-west": 2000},
"negative": {"us-east": 100, "us-west": 200}
}
Total: (1000 + 2000) - (100 + 200) = 2700
Benefits:
- No coordination needed
- Partition-tolerant
- Eventually consistent
- Guaranteed convergence
Trade-offs:
- Memory grows with regions
- Cannot enforce limits (e.g., non-negative)
- Eventual consistency only
Use When:
- Multi-region deployment
- High availability required
- Eventual consistency acceptable
Why This Works:
- Advanced technique
- Shows distributed systems knowledge
- Explains limitations
- Practical application3. HyperLogLog for Unique Counts
Strong Answer:
"For counting unique items (e.g., unique visitors),
I'll use HyperLogLog:
Problem:
- Billions of unique users
- Cannot store all user IDs
- Need approximate count
HyperLogLog Solution:
- Fixed memory: 12KB per counter
- Accuracy: 0.81% standard error
- Capacity: Billions of unique items
How it works:
1. Hash user ID
2. Count leading zeros in hash
3. Update register with max value
4. Estimate cardinality from registers
Operations:
ADD(user_id):
hash = hash(user_id)
register = hash & 0xFFFF
value = leading_zeros(hash >> 16)
registers[register] = max(registers[register], value)
COUNT():
return 2^(average(registers)) × correction_factor
Benefits:
- Constant memory (12KB)
- Very fast operations
- Mergeable (combine multiple HLLs)
- Good enough accuracy (0.81% error)
Trade-offs:
- Approximate (not exact)
- Cannot decrement
- Cannot list items
- Cannot check membership
Use Cases:
- Unique page visitors
- Unique video viewers
- Daily/Monthly active users
- Cardinality estimation
Why This Works:
- Solves specific problem
- Explains algorithm clearly
- Discusses trade-offs
- Provides use casesHow to Handle Follow-ups
"How would you count unique visitors?"
Strong Answer:
"Great question. For unique counting, I have two approaches:
Approach 1: Exact Count (Small Scale)
- Store user IDs in Redis Set
- SADD unique_visitors:article_123 user_id
- SCARD to get count
- Memory: 8 bytes per user
- Limit: ~10M unique users per counter
Approach 2: Approximate Count (Large Scale)
- Use HyperLogLog
- PFADD unique_visitors:article_123 user_id
- PFCOUNT to get estimate
- Memory: 12KB fixed
- Capacity: Billions of users
- Error: 0.81%
Recommendation:
- Use exact for <10M unique items
- Use HyperLogLog for >10M unique items
- Trade 0.81% accuracy for 1000x memory savings
For most use cases (analytics, dashboards), the 0.81%
error is acceptable and the memory savings are huge."
Why This Works:
- Two solutions for different scales
- Clear recommendation
- Quantifies trade-offs
- Practical guidance"How would you handle counter overflow?"
Strong Answer:
"Counter overflow is rarely an issue in practice:
64-bit Integer Max: 9,223,372,036,854,775,807
At 1M increments/second: 292,471 years to overflow
However, if we need arbitrary precision:
Solution 1: BigInteger
- Use arbitrary precision arithmetic
- Store as string in Redis
- Slower but no overflow
Solution 2: Hierarchical Counters
- counter_low: 0 to 2^32-1
- counter_high: overflow count
- total = counter_high × 2^32 + counter_low
Solution 3: Scientific Notation
- Store as mantissa × 10^exponent
- Example: 1.23 × 10^15
Recommendation:
- 64-bit is sufficient for 99.99% of use cases
- Only use arbitrary precision if truly needed
- Monitor for approaching overflow (>2^60)
The key insight is that overflow is a theoretical concern
but rarely a practical problem."
Why This Works:
- Addresses concern directly
- Provides context (time to overflow)
- Offers solutions if needed
- Practical recommendationRed Flags to Avoid
❌ Don't Say:
1. "We'll use a database counter"
→ Too slow for high throughput
2. "All counters will be 100% accurate"
→ Ignores distributed systems challenges
3. "We don't need to worry about hot counters"
→ Shows lack of practical experience
4. "We'll lock the counter for updates"
→ Terrible for performance
5. "Overflow isn't a problem"
→ Should at least acknowledge it✅ Do Say:
1. "Let me clarify the accuracy requirements"
→ Shows structured thinking
2. "Here are three approaches with trade-offs"
→ Demonstrates options analysis
3. "For high-traffic counters, we'll use sharding"
→ Shows scalability awareness
4. "We'll use CRDTs for eventual consistency"
→ Advanced distributed systems knowledge
5. "HyperLogLog for unique counts"
→ Knows appropriate algorithmsClosing Strong
Good Summary
"To summarize, I've designed a distributed counter system that:
1. Uses sharded counters for high throughput (1M+ ops/s)
2. Implements eventual consistency with CRDTs
3. Supports time-windowed counting with sliding windows
4. Uses HyperLogLog for unique counting
5. Handles hot counters with dynamic sharding
Key design decisions:
- Eventual consistency (99%+ accuracy)
- Redis for hot data, database for persistence
- Sharding threshold at 10K ops/s per counter
- 1-minute buckets for time windows
Trade-offs made:
- Accuracy for performance (99% vs 100%)
- Read latency for write throughput
- Memory for speed (caching)
I'm happy to dive deeper into any specific area."
Why This Works:
- Concise summary
- Highlights key decisions
- Acknowledges trade-offs
- Invites discussionRemember: Focus on demonstrating systematic thinking, understanding of trade-offs, and practical engineering judgment. Good luck!