Scaling Considerations

📖 2 min read 📄 Part 6 of 10

Design Reddit - Scaling Considerations

Horizontal Scaling

  • Application Servers: 5,000 stateless servers
  • Database Sharding: 500 shards by subreddit_id
  • Read Replicas: 5 replicas per shard
  • Auto-scaling: Based on CPU/memory

Caching Strategy

  • Hot Posts: Redis (1 hour TTL, 90% hit rate)
  • Comment Trees: Redis (30 min TTL)
  • Vote Counts: Redis (5 min TTL)
  • Subreddit Info: Redis (1 day TTL)

Ranking Algorithms

Hot Algorithm

def hot(ups, downs, date):
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = date - 1134028003
    return round(sign * order + seconds / 45000, 7)

Best Algorithm (Wilson Score)

def confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
    z = 1.96  # 95% confidence
    phat = ups / n
    return (phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)

Vote Counting

  • Eventual Consistency: Vote counts updated asynchronously
  • Batch Updates: Aggregate votes every 5 seconds
  • Cache Invalidation: Invalidate on vote
  • Anti-Cheating: Rate limiting, vote fuzzing

Comment Threading

  • Nested Storage: parent_comment_id for hierarchy
  • Depth Limit: Max 10 levels deep
  • Load More: Pagination for large threads
  • Caching: Cache entire comment trees

Performance Optimization

  • Database Indexes: Composite indexes on (subreddit_id, score)
  • Query Optimization: Limit result sets, use covering indexes
  • CDN: Serve static content from CDN
  • Lazy Loading: Load comments on demand

Auto-Scaling

  • Triggers: CPU >70%, Memory >80%, Queue depth >1000
  • Policies: Target tracking at 60% CPU
  • Scheduled: Scale for peak hours (6-11 PM)

Monitoring

  • Metrics: Response time, error rate, cache hit rate
  • Alerting: Critical >5% error rate, Warning >1%
  • Dashboards: Real-time system health

This scaling strategy ensures Reddit can handle millions of communities and billions of votes.