Design Reddit - Interview Tips
Interview Approach
Step 1: Requirements (5-10 min)
Questions:
- "Core features: Subreddits, posts, comments, voting?"
- "Scale: How many users and subreddits?"
- "Ranking: Hot, new, top, controversial?"
- "Moderation: Community-driven or centralized?"
Sample Dialog:
You: "Let me clarify:
- Subreddits: Topic-based communities?
- Posts: Text, links, images, videos?
- Comments: Threaded discussions?
- Voting: Upvote/downvote system?
- Scale: Reddit's scale (500M users, 3M subreddits)?"
Step 2: Calculations (5 min)
Users:
- Registered: 500M
- DAU: 50M (10%)
- Subreddits: 3M
Content:
- Posts/day: 2M (23 TPS)
- Comments/day: 20M (231 TPS)
- Votes/day: 500M (5,787 TPS)
Storage (5 years):
- Posts: 2M × 365 × 5 × 1KB = 3.65TB
- Comments: 20M × 365 × 5 × 500B = 18.25TB
- Total: ~22TB
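The throughput and storage figures above can be checked with a short script. This is a sketch using the numbers from the estimate; the helper names are illustrative, and decimal units (1KB = 1,000 bytes, 1TB = 10^12 bytes) are an assumption.

```python
SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 365 * 5  # five-year window from the estimate above


def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def storage_tb(items_per_day: int, bytes_per_item: int, days: int = RETENTION_DAYS) -> float:
    """Total storage in decimal terabytes (assumes 1TB = 10**12 bytes)."""
    return items_per_day * days * bytes_per_item / 1e12


posts_tps = per_second(2_000_000)
comments_tps = per_second(20_000_000)
votes_tps = per_second(500_000_000)

posts_tb = storage_tb(2_000_000, 1_000)   # 1KB per post
comments_tb = storage_tb(20_000_000, 500)  # 500B per comment

print(f"posts/s={posts_tps:.0f} comments/s={comments_tps:.0f} votes/s={votes_tps:.0f}")
print(f"posts={posts_tb:.2f}TB comments={comments_tb:.2f}TB total={posts_tb + comments_tb:.1f}TB")
```

These are daily averages. Peak load is higher, so it is worth telling the interviewer that you would provision above these numbers.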
Traffic:
- Page views: 20B/month
- Read:Write: 100:1
Step 3: High-Level Design (10-15 min)
Components:
- Post Service
- Comment Service
- Vote Service
- Ranking Service
- Subreddit Service
- Moderation Service
- PostgreSQL (sharded)
- Redis (caching)
- Elasticsearch (search)
Step 4: Deep Dive (15-20 min)
Critical Decision 1: Voting System
Interviewer: "How do you implement voting?"
Strong Answer:
"I'd use eventual consistency:
1. Vote Registration:
- Store vote in votes table
- Publish event to Kafka
- Return success immediately
2. Count Aggregation:
- Batch process every 5 seconds
- Aggregate votes per post/comment
- Update score in database
- Invalidate cache
3. Anti-Manipulation:
- Rate limit: 1000 votes/hour per user
- Vote fuzzing: Add random noise to displayed counts
- ML detection: Identify bots
- Shadow banning: Accept but silently ignore votes from flagged accounts
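The registration and aggregation steps can be sketched as follows. This is a minimal in-memory illustration, not a production design: the `VoteAggregator` class and its method names are hypothetical, a plain list stands in for Kafka, and dictionaries stand in for the votes table, the score column, and the cache.

```python
from collections import defaultdict


class VoteAggregator:
    """In-memory stand-in for the votes table, event queue, score store, and cache."""

    def __init__(self):
        self.votes = {}                  # (user_id, item_id) -> +1 or -1 (votes table)
        self.pending = []                # (item_id, delta) events awaiting aggregation (queue)
        self.scores = defaultdict(int)   # item_id -> aggregated score (database column)
        self.cache = {}                  # item_id -> cached score

    def register_vote(self, user_id, item_id, direction):
        """Record the vote and enqueue a score delta, then return immediately."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        previous = self.votes.get((user_id, item_id), 0)
        delta = direction - previous     # switching an upvote to a downvote yields -2
        self.votes[(user_id, item_id)] = direction
        if delta:
            self.pending.append((item_id, delta))
        return True

    def flush(self):
        """Batch step, intended to run on a timer (every 5 seconds in the answer above)."""
        batch, self.pending = self.pending, []
        totals = defaultdict(int)
        for item_id, delta in batch:
            totals[item_id] += delta
        for item_id, delta in totals.items():
            self.scores[item_id] += delta    # one write per item, not one per vote
            self.cache.pop(item_id, None)    # invalidate the cached score
        return dict(totals)


agg = VoteAggregator()
agg.register_vote("alice", "post1", 1)
agg.register_vote("bob", "post1", 1)
agg.register_vote("alice", "post1", -1)  # alice changes her vote
agg.flush()
print(agg.scores["post1"])  # 0
```

Keying the votes table on (user, item) makes a repeated vote idempotent, and the batch step collapses many vote events into a single score write per post or comment.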
This provides scalability while preventing manipulation."
Critical Decision 2: Hot Algorithm
Interviewer: "How do you rank posts?"
Strong Answer:
"I'd use Reddit's Hot algorithm:
Formula:
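score = sign × log10(max(|ups-downs|, 1)) + (created_at - epoch)/45000
Components:
- sign: +1, 0, or -1 depending on whether ups-downs is positive, zero, or negative
- Logarithmic: Diminishing returns for votes (the first 10 net votes count as much as the next 90)
- Recency: created_at is the submission time in seconds and epoch is a fixed reference date, so newer posts start with a higher score and older posts sink relative to them
- 45000 seconds: 12.5 hours; a post needs 10× the net votes to match one submitted 12.5 hours later
Implementation:
- Recompute a post's score when its vote total changes (the time term is fixed at submission)
- Rebuild each subreddit's ranked list every 5 minutes
- Cache rankings in Redis (5 min TTL)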
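A short Python sketch of the score function follows. It is modeled on the ranking code Reddit once open-sourced. The `REDDIT_EPOCH` constant is recalled from that code and should be treated as an assumption, and the function and variable names are illustrative. Note that the time term uses the submission time relative to a fixed epoch, not a growing age, so a post's score only changes when its votes change.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Fixed reference time (Unix seconds, December 2005). Assumed from Reddit's
# formerly open-sourced ranking code; any fixed constant gives the same ordering.
REDDIT_EPOCH = 1134028003


def hot(ups: int, downs: int, created_at: datetime) -> float:
    """Hot score: log-scaled net votes plus a bonus for later submission time."""
    net = ups - downs
    order = log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)            # +1, 0, or -1
    seconds = created_at.timestamp() - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = t0 + timedelta(hours=12.5)             # 45000 seconds later

older_popular = hot(10, 0, t0)   # 10 net upvotes, submitted earlier
newer_fresh = hot(1, 0, t1)      # 1 net upvote, submitted 12.5 hours later
print(abs(older_popular - newer_fresh) < 1e-6)  # True: a 10x vote lead offsets 12.5 hours
```

Because the score is a plain number that never decays on its own, it can be stored in a Redis sorted set and read back in rank order without recomputation.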
This balances popularity with recency."
Critical Decision 3: Comment Threading
Interviewer: "How do you implement threaded comments?"
Strong Answer:
"I'd use a materialized path model, with a parent pointer on each comment:
Schema:
- parent_comment_id: Link to parent
- depth: Nesting level (max 10)
- path: Hierarchical path (e.g., '1.2.5')
Query:
SELECT * FROM comments
WHERE post_id = ? AND path LIKE '1.2.%'
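ORDER BY path;
-- path is unique per comment, so a secondary sort key has no effect;
-- sort siblings by score in the application after loading the subtree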
Optimization:
- Cache entire comment trees in Redis
- Lazy load deeply nested comments
- Pagination for large threads
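The schema and subtree query can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `add_comment` and `subtree` helpers are hypothetical, SQLite stands in for PostgreSQL, and path segments are zero-padded comment IDs so that string ordering matches numeric ordering (with unpadded segments, '1.10' would sort before '1.2').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        parent_comment_id INTEGER,
        depth INTEGER NOT NULL,
        path TEXT NOT NULL,
        body TEXT NOT NULL
    )
""")


def add_comment(post_id, body, parent_id=None):
    """Insert a comment, then derive its depth and path from the parent row."""
    cur = conn.execute(
        "INSERT INTO comments (post_id, parent_comment_id, depth, path, body) "
        "VALUES (?, ?, 0, '', ?)",
        (post_id, parent_id, body),
    )
    comment_id = cur.lastrowid
    segment = f"{comment_id:06d}"  # zero-padded so string order matches numeric order
    if parent_id is None:
        depth, path = 0, segment
    else:
        parent_depth, parent_path = conn.execute(
            "SELECT depth, path FROM comments WHERE id = ?", (parent_id,)
        ).fetchone()
        depth, path = parent_depth + 1, f"{parent_path}.{segment}"
    conn.execute(
        "UPDATE comments SET depth = ?, path = ? WHERE id = ?", (depth, path, comment_id)
    )
    return comment_id


def subtree(post_id, root_id):
    """Return (depth, body) for every descendant of root_id, in thread order."""
    (root_path,) = conn.execute(
        "SELECT path FROM comments WHERE id = ?", (root_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT depth, body FROM comments WHERE post_id = ? AND path LIKE ? ORDER BY path",
        (post_id, root_path + ".%"),
    )
    return rows.fetchall()


root = add_comment(1, "top-level")
reply = add_comment(1, "reply", root)
add_comment(1, "nested reply", reply)
add_comment(1, "second top-level")
for depth, body in subtree(1, root):
    print("  " * depth + body)
```

One prefix query returns a whole subtree already in display order, which is the main advantage over walking parent_comment_id links level by level.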
This provides efficient threaded discussions."
Common Pitfalls
Pitfall 1: Ignoring Vote Manipulation
❌ Wrong: "Just count votes in real-time"
✅ Right: "Batch updates, vote fuzzing, rate limiting"
Pitfall 2: Not Considering Ranking
❌ Wrong: "Sort by timestamp"
✅ Right: "Hot algorithm balances popularity and recency"
Pitfall 3: Forgetting Moderation
❌ Wrong: "No moderation tools"
✅ Right: "AutoModerator + human moderators"
Pitfall 4: Ignoring Comment Threading
❌ Wrong: "Flat comment structure"
✅ Right: "Nested with parent_comment_id"
Impressive Points
Technical Depth
1. Hot Algorithm:
"Reddit's Hot algorithm uses logarithmic scaling
and time decay to balance popularity with recency"
2. Wilson Confidence Interval:
"The 'Best' comment sort ranks by the lower bound of the
Wilson score confidence interval on the upvote ratio"
3. Vote Fuzzing:
"Add random noise to displayed vote counts so manipulators
cannot tell whether their votes were counted"
Time Management
0-5 min: Requirements
5-10 min: Calculations
10-25 min: High-level design
25-40 min: Deep dive (voting, ranking, threading)
40-45 min: Wrap-up
Final Tips
Do's ✅
- Discuss voting system
- Explain Hot algorithm
- Consider comment threading
- Mention moderation tools
- Think about vote manipulation
Don'ts ❌
- Don't ignore ranking algorithms
- Don't forget moderation
- Don't overlook threading
- Don't skip vote manipulation
Remember: Show your understanding of Reddit's unique features (voting, ranking, threading) and how they scale!