Design Twitter - Problem Statement

Overview

Design a microblogging platform similar to Twitter that allows users to post short messages (tweets), follow other users, and see a personalized timeline of tweets from people they follow. The system should handle hundreds of millions of users with real-time tweet delivery and high availability.

Functional Requirements

Core Tweet Features

  • Post Tweets: Users can post text messages up to 280 characters
  • Media Attachments: Support images (up to 4 per tweet), videos (up to 2 minutes 20 seconds), GIFs
  • Tweet Interactions: Like, retweet, quote tweet, reply to tweets
  • Tweet Threading: Create threaded conversations with multiple connected tweets
  • Tweet Deletion: Users can delete their own tweets
  • Tweet Editing: Edit tweets within 30 minutes with edit history visible
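
The limits listed above can be captured as a small validation layer. The following is a minimal sketch, not a definitive implementation: the names `TweetDraft`, `validate_draft`, and `can_edit` are hypothetical, and counting characters with `len()` is a simplifying assumption (a production system would likely use a weighted character count).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Limits taken from the Core Tweet Features list.
MAX_TEXT_CHARS = 280
MAX_IMAGES = 4
MAX_VIDEO_SECONDS = 2 * 60 + 20          # 2 minutes 20 seconds
EDIT_WINDOW = timedelta(minutes=30)


@dataclass
class TweetDraft:
    """Hypothetical container for a tweet that has not been posted yet."""
    text: str
    image_count: int = 0
    video_seconds: int = 0


def validate_draft(draft: TweetDraft) -> list:
    """Return a list of rule violations; an empty list means the draft is valid."""
    errors = []
    if not draft.text and draft.image_count == 0 and draft.video_seconds == 0:
        errors.append("tweet is empty")
    # Assumption: plain code-point count, not a weighted character count.
    if len(draft.text) > MAX_TEXT_CHARS:
        errors.append("text exceeds 280 characters")
    if draft.image_count > MAX_IMAGES:
        errors.append("more than 4 images attached")
    if draft.video_seconds > MAX_VIDEO_SECONDS:
        errors.append("video longer than 2 minutes 20 seconds")
    return errors


def can_edit(created_at: datetime, now: datetime) -> bool:
    """A tweet stays editable for 30 minutes after it was created."""
    return now - created_at <= EDIT_WINDOW


if __name__ == "__main__":
    posted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(validate_draft(TweetDraft("hello world")))
    print(can_edit(posted, posted + timedelta(minutes=45)))
```

Returning a list of violations rather than raising on the first one lets a client show every problem with a draft at once.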

Timeline Features

  • Home Timeline: Personalized feed of tweets from followed users
  • User Timeline: View all tweets from a specific user
  • Mentions Timeline: See tweets that mention the user
  • Trending Topics: Display trending hashtags and topics
  • Search: Full-text search for tweets, users, and hashtags
  • Bookmarks: Save tweets for later viewing

Social Features

  • Follow/Unfollow: Follow other users to see their tweets
  • Followers/Following Lists: View who follows you and who you follow
  • User Profiles: Display name, bio, profile picture, banner, verification badge
  • Direct Messages: Private messaging between users (out of scope for initial design)
  • Notifications: Real-time notifications for likes, retweets, mentions, new followers
  • Lists: Create curated lists of users to follow specific topics

Content Discovery

  • Trending Section: Real-time trending topics and hashtags
  • Explore Page: Discover new content based on interests
  • Recommendations: Suggested users to follow
  • Hashtags: Click hashtags to see related tweets
  • Moments: Curated collections of tweets about events
  • Topics: Follow specific topics of interest

User Management

  • Registration: Sign up with email/phone, username, password
  • Authentication: Secure login with optional 2FA
  • Profile Management: Update profile information, settings
  • Privacy Controls: Public or protected (private) accounts, plus the ability to block other accounts
  • Verification: Blue checkmark for verified accounts
  • Account Suspension: Moderation and suspension capabilities

Non-Functional Requirements

Performance Requirements

  • Tweet Posting Latency: <200ms for tweet creation
  • Timeline Load Time: <1 second for home timeline (50 tweets)
  • Search Response Time: <500ms for search queries
  • Media Upload: Images under 5MB upload within 5 seconds
  • Real-time Updates: New tweets appear within 5 seconds
  • API Response Time: 95th percentile <300ms

Scalability Requirements

  • Registered Users: Support 500 million registered users
  • Daily Active Users: 200 million DAU
  • Tweets per Day: 500 million tweets posted daily
  • Timeline Requests: 10 billion timeline requests per day
  • Peak Load: Handle 3x normal load during major events
  • Concurrent Users: 50 million concurrent active users

Reliability Requirements

  • System Uptime: 99.9% availability (8.76 hours downtime per year)
  • Data Durability: 99.999999999% (11 9's) for tweet data
  • Tweet Delivery: 99.5% successful tweet delivery to followers
  • Disaster Recovery: <2 hours RTO, <15 minutes RPO
  • Graceful Degradation: Maintain core features during partial outages
  • Zero-Downtime Deployments: Rolling updates without service interruption
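
The downtime budget implied by an availability target is simple arithmetic, sketched below. The function name `downtime_hours_per_year` is illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year the system may be unavailable at a given availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% availability allows {downtime_hours_per_year(target):.2f} hours/year")
```

Each additional nine cuts the budget by a factor of ten, which is why the choice between 99.9% and 99.99% has a large effect on operational cost.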

Security Requirements

  • Authentication: OAuth 2.0, JWT tokens, optional 2FA
  • Authorization: Role-based access control for tweets and profiles
  • Data Encryption: All data encrypted at rest and in transit (TLS 1.3)
  • Privacy Compliance: GDPR, CCPA compliance for user data
  • Content Moderation: Automated and manual content review
  • Rate Limiting: Prevent spam and API abuse
  • Audit Logging: Comprehensive security event logging
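
The rate-limiting requirement does not name an algorithm. One common choice is a token bucket, sketched below under that assumption; the class name `TokenBucket` and its parameters are hypothetical, and a real deployment would keep the bucket state in a shared store rather than in process memory.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_sec` requests per second."""

    def __init__(self, capacity: float, refill_per_sec: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits in the budget."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, refill_per_sec=1.0, now=0.0)
    print([bucket.allow(now=0.0) for _ in range(3)])  # third request exceeds the burst
    print(bucket.allow(now=1.0))                      # one token has refilled
```

Passing `now` explicitly keeps the sketch deterministic in tests; callers that omit it fall back to the monotonic clock.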

Consistency Requirements

  • Tweet Ordering: Eventual consistency for timeline delivery
  • Follower Counts: Eventually consistent follower/following counts
  • Like Counts: Eventually consistent engagement metrics
  • User Profile: Strong consistency for profile updates
  • Tweet Deletion: Eventually consistent deletion across timelines
  • Trending Topics: Near real-time consistency (5-minute lag acceptable)

Scale Estimates

User Metrics

  • Total Users: 500 million registered users
  • Daily Active Users: 200 million (40% of registered)
  • Monthly Active Users: 350 million (70% of registered)
  • Average Followers: 200 followers per user
  • Power Users: Fewer than 0.2% of users have >100K followers (a 200-follower average caps this share at 200 ÷ 100K = 0.2%)

Content Metrics

  • Tweets per Day: 500 million tweets
  • Tweets per Second: ~6,000 average, ~20,000 peak
  • Average Tweet Size: 200 bytes of text, plus about 500KB of media on average for tweets that include media
  • Media Tweets: 50% of tweets include media
  • Retweets: 30% of timeline content is retweets

Traffic Metrics

  • Timeline Requests: 10 billion per day (~115K per second)
  • Read:Write Ratio: 100:1 (read-heavy)
  • Search Queries: 500 million per day
  • API Calls: 5 billion per day from third-party apps
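
The per-second rates quoted in this section follow from dividing daily volumes by 86,400 seconds. The sketch below reproduces that conversion; the dictionary keys are illustrative labels, and the 3x peak multiplier is the one stated under Scalability Requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_MULTIPLIER = 3             # "3x normal load during major events"

# Daily volumes as stated in the scale estimates.
DAILY_VOLUME = {
    "tweets": 500_000_000,
    "timeline_requests": 10_000_000_000,
    "search_queries": 500_000_000,
    "api_calls": 5_000_000_000,
}


def per_second(daily_count: float) -> float:
    """Average events per second for a given daily volume."""
    return daily_count / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, daily in DAILY_VOLUME.items():
        avg = per_second(daily)
        print(f"{name}: {avg:,.0f}/s average, {avg * PEAK_MULTIPLIER:,.0f}/s at peak")
```

The averages land slightly below the rounded figures in the text (for example, about 5,800 tweets per second rather than 6,000), which is the usual margin for a back-of-the-envelope estimate.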

Storage Estimates

  • Tweet Storage: 500M tweets/day × 200 bytes = 100GB/day (text only)
  • Media Storage: 250M media tweets/day × 500KB = 125TB/day
  • Total Daily Storage: ~125TB per day
  • 5-Year Storage: ~230PB raw (125TB/day × 365 days × 5 years, before compression and deduplication)
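
The storage figures can be checked with a few lines of arithmetic. This is a sketch using decimal units (1TB = 10^12 bytes) and the inputs stated in this section; the variable names are illustrative, and replication and index overhead are deliberately ignored.

```python
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units

TWEETS_PER_DAY = 500_000_000
MEDIA_TWEETS_PER_DAY = 250_000_000   # 50% of tweets include media
TEXT_BYTES = 200
MEDIA_BYTES = 500 * KB

text_per_day = TWEETS_PER_DAY * TEXT_BYTES            # text payload only
media_per_day = MEDIA_TWEETS_PER_DAY * MEDIA_BYTES    # media payload only
total_per_day = text_per_day + media_per_day
five_year_raw = total_per_day * 365 * 5               # no compression, no replication

if __name__ == "__main__":
    print(f"text:   {text_per_day / GB:,.0f} GB/day")
    print(f"media:  {media_per_day / TB:,.0f} TB/day")
    print(f"total:  {total_per_day / TB:,.1f} TB/day")
    print(f"5 year: {five_year_raw / PB:,.0f} PB raw")
```

Media dominates by three orders of magnitude, so the text store and the media store can reasonably be sized and scaled independently.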

Bandwidth Estimates

  • Incoming: 250M media tweets/day × 500KB = 125TB/day ≈ 1.4GB/s (text adds only ~100GB/day)
  • Outgoing: 10B timeline requests × 50 tweets × 200 bytes = 100TB/day ≈ 1.2GB/s (text only, excluding media)
  • Peak Bandwidth: 3x average ≈ 8GB/s during major events
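
Incoming bandwidth scales almost linearly with the share of tweets that carry media, so the sketch below keeps that share as a parameter. The function names are illustrative, units are decimal (1GB = 10^9 bytes), and the outgoing figure counts tweet text only.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9


def incoming_gb_per_s(tweets_per_day: float, media_share: float,
                      media_bytes: int = 500_000, text_bytes: int = 200) -> float:
    """Average upload bandwidth; media_share is the fraction of tweets with media."""
    daily_bytes = tweets_per_day * (text_bytes + media_share * media_bytes)
    return daily_bytes / SECONDS_PER_DAY / GB


def outgoing_gb_per_s(timeline_requests_per_day: float, tweets_per_page: int = 50,
                      text_bytes: int = 200) -> float:
    """Average download bandwidth for timeline text (media excluded)."""
    daily_bytes = timeline_requests_per_day * tweets_per_page * text_bytes
    return daily_bytes / SECONDS_PER_DAY / GB


if __name__ == "__main__":
    out = outgoing_gb_per_s(10_000_000_000)
    for share in (0.5, 1.0):
        inc = incoming_gb_per_s(500_000_000, share)
        print(f"media share {share:.0%}: in {inc:.2f} GB/s, out {out:.2f} GB/s, "
              f"3x peak {3 * (inc + out):.1f} GB/s")
```

Running it for both a 50% and a 100% media share shows how sensitive the estimate is to that single assumption.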

Edge Cases and Constraints

Celebrity User Problem

  • Challenge: Users with millions of followers cause fan-out explosion
  • Impact: Single tweet generates millions of timeline writes
  • Solution: Separate handling for users with >1M followers
  • Approach: Pull-based timeline generation for celebrity tweets
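
The split between push and pull can be illustrated with an in-memory sketch. It assumes a hybrid model: tweets from ordinary users are written to follower inboxes at post time, while tweets from accounts above the follower threshold are merged in at read time. `TimelineService` and its methods are hypothetical names, and the dictionaries stand in for what would be a cache and a tweet store.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # ">1M followers" from the text


class TimelineService:
    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.inbox = defaultdict(list)      # user -> (ts, tweet_id) pushed at write time
        self.outbox = defaultdict(list)     # author -> (ts, tweet_id) of their own tweets

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.threshold

    def post(self, author: str, tweet_id: str, ts: int) -> None:
        self.outbox[author].append((ts, tweet_id))
        if self.is_celebrity(author):
            return  # skip fan-out; followers pull these tweets at read time
        for follower in self.followers[author]:
            self.inbox[follower].append((ts, tweet_id))

    def home_timeline(self, user: str, limit: int = 50) -> list:
        # A set removes duplicates if an author crossed the threshold after posting.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.outbox[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]


if __name__ == "__main__":
    svc = TimelineService(celebrity_threshold=2)  # tiny threshold for the demo
    for fan in ("alice", "bob", "carol"):
        svc.follow(fan, "star")
    svc.follow("alice", "dave")
    svc.post("dave", "t1", ts=1)
    svc.post("star", "t2", ts=2)
    print(svc.home_timeline("alice"))
```

The write cost for a celebrity tweet becomes a single outbox append, at the price of an extra merge on each read by their followers.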

Viral Content

  • Challenge: Tweets that go viral cause sudden traffic spikes
  • Impact: Rapid increase in retweets, likes, and timeline requests
  • Solution: Caching, rate limiting, and load shedding
  • Approach: Pre-compute popular content, use CDN aggressively

Trending Topics

  • Challenge: Real-time computation of trending hashtags
  • Impact: High computational cost for aggregation
  • Solution: Approximate counting algorithms (Count-Min Sketch)
  • Approach: Sliding window with decay for recency
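
A Count-Min Sketch with periodic decay can be sketched as follows. This is an illustration under stated assumptions: exponential decay applied once per time step stands in for a true sliding window, SHA-256 is used only as a convenient row hash, and the class name and default sizes are hypothetical.

```python
import hashlib


class DecayingCountMinSketch:
    """Approximate counter: fixed memory, never underestimates, old hits fade."""

    def __init__(self, width: int = 2048, depth: int = 4, decay: float = 0.5):
        self.width = width
        self.depth = depth
        self.decay = decay
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # One independent-looking hash per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: float = 1.0) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> float:
        # Collisions only inflate cells, so the minimum across rows is the best estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def tick(self) -> None:
        """Call once per time step (e.g. every minute) to age all counts."""
        for row in self.table:
            for i in range(self.width):
                row[i] *= self.decay


if __name__ == "__main__":
    cms = DecayingCountMinSketch()
    for _ in range(10):
        cms.add("#worldcup")
    cms.tick()                      # older hits now count half
    for _ in range(3):
        cms.add("#newalbum")
    print(cms.estimate("#worldcup"), cms.estimate("#newalbum"))
```

Memory stays at `width × depth` counters regardless of how many distinct hashtags appear, which is the property that makes the structure attractive for trend detection.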

Timeline Consistency

  • Challenge: Showing consistent timeline across devices
  • Impact: User sees different tweets on web vs mobile
  • Solution: Timeline versioning with cursor-based pagination
  • Approach: Eventual consistency with conflict resolution
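
Cursor-based pagination keeps pages stable when new tweets arrive between requests, because each page is defined relative to the last item seen rather than a numeric offset. The sketch below assumes a cursor that encodes a `(timestamp, tweet_id)` pair; the helper names are hypothetical, and timeline versioning is not modeled.

```python
import base64
import json


def encode_cursor(ts: int, tweet_id: int) -> str:
    """Opaque cursor pointing at the last tweet the client has seen."""
    raw = json.dumps([ts, tweet_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    ts, tweet_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return ts, tweet_id


def fetch_page(timeline: list, cursor: str = None, limit: int = 50) -> tuple:
    """timeline is a list of (ts, tweet_id) tuples sorted newest first."""
    if cursor is None:
        candidates = timeline
    else:
        boundary = decode_cursor(cursor)
        # Only tweets strictly older than the cursor; newer arrivals are ignored.
        candidates = [entry for entry in timeline if tuple(entry) < boundary]
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return page, next_cursor


if __name__ == "__main__":
    timeline = [(5, 50), (4, 40), (3, 30), (2, 20), (1, 10)]
    page, cursor = fetch_page(timeline, limit=2)
    timeline = [(6, 60)] + timeline          # a new tweet arrives mid-scroll
    next_page, cursor = fetch_page(timeline, cursor, limit=2)
    print(page, next_page)
```

With offset-based pagination the new arrival would shift every item by one and the second page would repeat a tweet; with a cursor the second page continues exactly where the first ended.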

Deleted Tweet Propagation

  • Challenge: Removing deleted tweets from all timelines
  • Impact: Deleted tweets may still appear temporarily
  • Solution: Lazy deletion with tombstone markers
  • Approach: Background cleanup jobs with eventual consistency
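
Lazy deletion with tombstones can be sketched as a constant-time delete, a read-time filter, and a background cleanup pass. The class and method names below are hypothetical, and the in-memory sets and dictionaries stand in for distributed stores, where tombstone propagation itself takes time.

```python
class TweetStore:
    def __init__(self):
        self.tweets = {}         # tweet_id -> text
        self.tombstones = set()  # ids of deleted tweets
        self.timelines = {}      # user -> cached list of tweet ids (may be stale)

    def delete(self, tweet_id: str) -> None:
        """O(1) delete: mark the tweet, do not touch any cached timeline."""
        self.tombstones.add(tweet_id)
        self.tweets.pop(tweet_id, None)

    def read_timeline(self, user: str) -> list:
        """Filter tombstoned ids at read time so deleted tweets are hidden."""
        return [t for t in self.timelines.get(user, []) if t not in self.tombstones]

    def cleanup(self, user: str) -> int:
        """Background job: physically remove tombstoned ids from one cached timeline.

        Returns the number of entries removed.
        """
        before = self.timelines.get(user, [])
        after = [t for t in before if t not in self.tombstones]
        self.timelines[user] = after
        return len(before) - len(after)


if __name__ == "__main__":
    store = TweetStore()
    store.tweets = {"t1": "first", "t2": "second"}
    store.timelines = {"alice": ["t2", "t1"]}
    store.delete("t1")
    print(store.read_timeline("alice"))   # deleted tweet is filtered out
    print(store.timelines["alice"])       # cached list is still stale
    print(store.cleanup("alice"))         # background pass removes it
```

The delete path stays cheap even for a tweet that was fanned out to millions of timelines, since the expensive rewrite is deferred to the cleanup job.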

Network Partitions

  • Challenge: Data center isolation during network failures
  • Impact: Users in different regions see different data
  • Solution: Multi-region active-active deployment
  • Approach: Conflict resolution with last-write-wins
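
Last-write-wins resolution can be sketched as a merge function over timestamped values. The `Versioned` type and the region-name tie-breaker are assumptions made for illustration; any deterministic tie-breaker works as long as every region applies the same rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A value tagged with the write time and the region that accepted the write."""
    value: str
    timestamp: float
    region: str


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the later write; break timestamp ties by region name so all replicas agree."""
    return max(a, b, key=lambda v: (v.timestamp, v.region))


if __name__ == "__main__":
    us = Versioned("bio: engineer", timestamp=100.0, region="us-east")
    eu = Versioned("bio: engineer and runner", timestamp=101.0, region="eu-west")
    # Merge order does not matter, so both regions converge on the same value.
    print(merge(us, eu) == merge(eu, us))
    print(merge(us, eu).value)
```

The trade-off is that the losing write is discarded silently and the outcome depends on clock accuracy across regions, which is acceptable for data such as counters or cached timelines but deserves care for anything users would notice losing.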

Success Metrics

User Engagement

  • Daily Active Users: Target 200 million DAU
  • Tweet Volume: 500+ million tweets per day
  • Session Duration: Average 15+ minutes per session
  • Retention Rate: 80%+ day-1 retention, 60%+ day-30 retention
  • Engagement Rate: 5%+ of users tweet daily, 50%+ engage (like/retweet)
  • Timeline Refresh: Average 20+ timeline refreshes per session

Performance Metrics

  • Tweet Posting Success: 99.9%+ successful tweet posts
  • Timeline Load Time: 95th percentile <1 second
  • Search Performance: 95th percentile <500ms
  • Media Upload Success: 99%+ successful media uploads
  • API Availability: 99.9%+ API uptime
  • Real-time Delivery: 95%+ of tweets delivered within 5 seconds

Business Metrics

  • Revenue per User: Increase through promoted tweets and ads
  • Ad Engagement: 2%+ click-through rate on promoted content
  • API Usage: 5 billion+ API calls per day from ecosystem
  • Support Ticket Volume: <0.05% of DAU requiring support
  • Infrastructure Cost: <$0.02 per user per month
  • Content Moderation: <1% of tweets requiring manual review

Technical Metrics

  • System Uptime: 99.9%+ availability
  • Database Performance: <10ms p95 query latency
  • Cache Hit Rate: >90% for timeline requests
  • CDN Offload: >80% of media served from CDN
  • Error Rate: <0.1% of requests result in errors
  • Deployment Frequency: Multiple deployments per day with zero downtime

Out of Scope

Features Not Included in Initial Design

  • Direct Messages: Private messaging system (separate design)
  • Spaces: Live audio conversations (separate design)
  • Fleets: Temporary stories feature
  • Twitter Blue: Premium subscription features
  • Advanced Analytics: Detailed analytics dashboard for users
  • Third-Party Apps: OAuth app management and developer portal
  • Advertising Platform: Ad creation and campaign management
  • Content Moderation Tools: Advanced moderation dashboard
  • Machine Learning Models: Recommendation algorithm details
  • Video Live Streaming: Live video broadcasting (Periscope)

This problem statement provides the foundation for designing a scalable, reliable microblogging platform that meets the performance, availability, and engagement targets defined above.