Design Twitter - Interview Tips

Interview Approach Strategy

Step 1: Requirements Clarification (5-10 minutes)

Essential Questions to Ask

Functional Requirements:
- "What are the core features? Just tweets and timelines, or also DMs, search, trending?"
- "Do we need to support media (images, videos)?"
- "Should we implement retweets, likes, and replies?"
- "Do we need real-time notifications?"

Scale Requirements:
- "How many users are we designing for? 100M, 500M, 1B?"
- "How many tweets per day? 100M, 500M, 1B?"
- "What's the read-to-write ratio? 100:1, 1000:1?"
- "What's the acceptable latency for timeline loads?"

Non-Functional Requirements:
- "What's the availability requirement? 99.9%, 99.99%?"
- "Do we need to support multiple regions?"
- "Are there any compliance requirements (GDPR, CCPA)?"
- "What's the budget constraint?"

Sample Clarification Dialog

Interviewer: "Design Twitter"

You: "Great! Let me clarify the requirements:

Core Features:
- Users can post tweets (280 characters)
- Users can follow other users
- Users see a timeline of tweets from people they follow
- Users can like, retweet, and reply to tweets
- Should I include DMs, search, and trending topics?

Scale:
- Are we targeting Twitter's scale (500M users, 200M DAU)?
- Should I assume 500M tweets per day?
- Is the read-to-write ratio around 100:1?

Performance:
- Is <1 second timeline load acceptable?
- Should tweet posting be <200ms?
- Do we need real-time updates or is 5-second delay okay?

Interviewer: "Yes, focus on core features. Twitter scale is good. 
Real-time is nice-to-have but not critical."

You: "Perfect! Let me start with high-level architecture..."

Step 2: Back-of-the-Envelope Calculations (5 minutes)

Calculate Scale

Users:
- Total users: 500M
- DAU: 200M (40%)
- Average followers: 200

Tweets:
- Tweets per day: 500M
- Tweets per second: 500M / 86400 = ~6K TPS
- Peak TPS: 3x = 18K TPS

Timeline Requests:
- Each user checks timeline 10 times/day
- Total requests: 200M × 10 = 2B requests/day
- Requests per second: 2B / 86400 = ~23K RPS
- Peak RPS: 3x = 70K RPS

Storage:
- Tweet size: 200 bytes (text) + 500 bytes (metadata) = 700 bytes
- Daily storage: 500M × 700 bytes = 350GB/day
- With media (50% have media, avg 200KB): 250M × 200KB = 50TB/day
- Total: ~50TB/day

Bandwidth:
- Ingress: 50TB/day = 580MB/s
- Egress: 2B requests × 50 tweets × 700 bytes = 70TB/day = 810MB/s
- With CDN: 85% offload, origin serves 120MB/s

Pro Tip: Write these calculations on the whiteboard. Shows you understand scale and can do quick math.

Step 3: High-Level Design (10-15 minutes)

Start Simple

Step 1: Basic Architecture
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Clients │────▶│   API   │────▶│Database │
└─────────┘     │ Gateway │     └─────────┘
                └─────────┘

Step 2: Add Core Services
┌─────────┐     ┌─────────┐     ┌──────────┐     ┌─────────┐
│ Clients │────▶│   API   │────▶│  Tweet   │────▶│Database │
└─────────┘     │ Gateway │     │ Service  │     └─────────┘
                └─────────┘     └──────────┘
                      │
                      ▼
                ┌──────────┐
                │ Timeline │
                │ Service  │
                └──────────┘

Step 3: Add Caching and Queues
[Full architecture with cache, message queue, CDN, etc.]

Pro Tip: Draw incrementally. Don't overwhelm with complexity upfront.

Key Components to Mention

1. API Gateway: Authentication, rate limiting, routing
2. Tweet Service: Create, read, delete tweets
3. Timeline Service: Generate personalized timelines
4. Fan-out Service: Distribute tweets to followers
5. User Service: User profiles and authentication
6. Social Graph Service: Follow relationships
7. Media Service: Handle images and videos
8. Search Service: Full-text search
9. Notification Service: Push notifications
10. Cache Layer: Redis for performance
11. Message Queue: Kafka for async processing
12. CDN: Media delivery

Step 4: Deep Dive (15-20 minutes)

Critical Design Decisions

Decision 1: Fan-out Strategy

Interviewer: "How do you deliver tweets to followers?"

Strong Answer:
"I'd use a hybrid fan-out approach:

For regular users (<10K followers):
- Fan-out on write (push model)
- Pre-compute timelines when tweet is posted
- Fast reads, slower writes
- Store in timeline table partitioned by user_id

For celebrity users (>1M followers):
- Fan-out on read (pull model)
- Fetch tweets on-demand when timeline requested
- Fast writes, slower reads
- Cache aggressively to mitigate read cost

For medium users (10K-1M):
- Partial fan-out to active followers only
- Pull for inactive followers

This balances write and read performance across different user types."

Decision 2: Database Choice

Interviewer: "What database would you use?"

Strong Answer:
"I'd use polyglot persistence:

Cassandra for tweets and timelines:
- High write throughput (6K TPS)
- Time-series data model
- Horizontal scalability
- Partition by tweet_id or user_id

PostgreSQL for user data:
- ACID compliance for critical data
- Relational model for user profiles
- Master-replica for read scaling

Redis for caching:
- In-memory for fast access
- Cache timelines, user profiles, social graph
- 90%+ cache hit rate target

This optimizes for each data access pattern."

Decision 3: Handling Celebrity Users

Interviewer: "What if a user has 150M followers?"

Strong Answer:
"This is the 'hot user problem'. Solutions:

1. Skip fan-out for celebrity users:
   - Don't write to 150M timelines
   - Fetch celebrity tweets on-demand
   - Merge with pre-computed timeline

2. Separate infrastructure:
   - Dedicated cache for celebrity tweets
   - Higher cache TTL (24 hours vs 5 minutes)
   - CDN caching for popular content

3. Rate limiting:
   - Limit fan-out to 10K writes/second
   - Process in batches of 1000
   - Use priority queue (active users first)

4. Async processing:
   - Fan-out via message queue
   - Non-blocking for user
   - Eventual consistency acceptable

This prevents celebrity tweets from overwhelming the system."

Common Pitfalls to Avoid

Pitfall 1: Jumping to Implementation Details

❌ Wrong: "I'll use Kafka with 100 partitions and..."
✅ Right: "I need a message queue for async processing. Kafka would work well because..."

Explain WHY before HOW.

Pitfall 2: Ignoring Scale

❌ Wrong: "I'll use a single database server..."
✅ Right: "With 6K writes/second, I need database sharding..."

Always consider scale implications.

Pitfall 3: Over-Engineering

❌ Wrong: "I'll implement CQRS, event sourcing, saga pattern..."
✅ Right: "I'll start with microservices and add complexity as needed..."

Start simple, add complexity when justified.

Pitfall 4: Not Discussing Tradeoffs

❌ Wrong: "I'll use eventual consistency."
✅ Right: "I'll use eventual consistency for timelines because 5-second delay is acceptable, but strong consistency for follow operations because..."

Always explain tradeoffs.

Pitfall 5: Forgetting Edge Cases

❌ Wrong: Only discussing happy path
✅ Right: "What if a tweet goes viral? What if a user deletes a tweet after it's been retweeted?"

Proactively mention edge cases.

Impressive Points to Mention

Technical Depth

1. Snowflake ID Generation:
   "For tweet IDs, I'd use Snowflake algorithm: 
   - 41 bits timestamp
   - 10 bits machine ID
   - 12 bits sequence number
   - Sortable, unique, distributed generation"

2. Consistent Hashing:
   "For cache distribution, I'd use consistent hashing:
   - Minimize cache misses during scaling
   - Virtual nodes for even distribution
   - Handles node failures gracefully"

3. Count-Min Sketch:
   "For trending topics, I'd use Count-Min Sketch:
   - Probabilistic data structure
   - Space-efficient counting
   - Acceptable error rate for trends"

4. Bloom Filters:
   "For spam detection, I'd use Bloom filters:
   - Check if tweet is duplicate
   - Space-efficient
   - False positives acceptable"

System Design Patterns

1. Circuit Breaker:
   "Prevent cascading failures with circuit breakers:
   - Open circuit after N failures
   - Fail fast instead of waiting
   - Periodic health checks to close circuit"

2. Bulkhead Pattern:
   "Isolate resources to prevent total failure:
   - Separate thread pools per service
   - Limit connections per dependency
   - One service failure doesn't affect others"

3. CQRS (if asked about advanced topics):
   "Separate read and write models:
   - Optimize writes for tweet creation
   - Optimize reads for timeline generation
   - Different databases for each"

Real-World Considerations

1. Cost Optimization:
   "To reduce costs:
   - Tiered storage (hot/warm/cold)
   - Aggressive CDN caching (85% hit rate)
   - Compress media files
   - Archive old tweets to S3 Glacier
   - Target: <$0.11 per DAU per month"

2. Monitoring:
   "Key metrics to monitor:
   - Tweet posting latency (p95 <200ms)
   - Timeline load time (p95 <1s)
   - Error rate (<0.1%)
   - Cache hit rate (>90%)
   - Database query latency (p95 <10ms)"

3. Disaster Recovery:
   "Multi-region active-active:
   - Async replication between regions
   - RPO: 15 minutes
   - RTO: 2 hours
   - Automatic failover with health checks"

Time Management

45-Minute Interview Breakdown

0-5 min: Requirements clarification
5-10 min: Back-of-envelope calculations
10-25 min: High-level design and core components
25-40 min: Deep dive into 2-3 components
40-45 min: Wrap-up, edge cases, questions

Adjust based on interviewer's focus.

What to Prioritize

Must Cover:
- High-level architecture
- Database design
- Fan-out strategy
- Caching strategy
- Scaling approach

Nice to Have:
- Security and privacy
- Monitoring and alerting
- Cost optimization
- Disaster recovery

Skip if Time Limited:
- Detailed API design
- Specific code implementations
- Advanced ML algorithms

Sample Interview Questions and Answers

Q: "How would you implement the home timeline?"

Strong Answer:

"I'd use a hybrid approach:

1. For regular users:
   - Pre-compute timeline using fan-out on write
   - Store in Cassandra partitioned by user_id
   - Cache in Redis for fast access

2. For celebrity users:
   - Fetch tweets on-demand (fan-out on read)
   - Merge with pre-computed timeline
   - Cache aggressively

3. Timeline generation:
   - Fetch from cache (Redis)
   - If miss, query database
   - Merge celebrity tweets
   - Sort by timestamp
   - Return top 50 tweets
   - Cache result (TTL: 5 minutes)

4. Optimization:
   - Cursor-based pagination
   - Lazy loading for media
   - Prefetch next page

This provides <1 second load time for 95% of requests."

Q: "How do you handle a viral tweet?"

Strong Answer:

"Viral tweets require special handling:

1. Detection:
   - Monitor engagement velocity
   - Threshold: >1000 likes/minute

2. Actions:
   - Increase cache TTL (5 min → 1 hour)
   - Pre-warm CDN cache globally
   - Scale up read replicas
   - Enable aggressive caching

3. Optimization:
   - Serve from CDN edge locations
   - Use stale-while-revalidate
   - Implement request coalescing
   - Add circuit breakers

4. Monitoring:
   - Track cache hit rate
   - Monitor database load
   - Alert on anomalies

This prevents viral content from overwhelming the system."

Final Tips

Do's ✅

Ask clarifying questions upfront
Start with simple design, add complexity gradually
Explain your reasoning for each decision
Discuss tradeoffs explicitly
Consider scale at every step
Mention edge cases proactively
Be prepared to dive deep into any component
Show enthusiasm and engagement

Don'ts ❌

Don't jump straight to implementation
Don't ignore scale requirements
Don't over-engineer from the start
Don't forget about failures and edge cases
Don't be afraid to say "I don't know"
Don't argue with interviewer
Don't spend too long on one component
Don't forget to manage time

If You Get Stuck

1. Ask for hints: "Could you give me a hint about X?"
2. Think out loud: "I'm thinking about two approaches..."
3. Discuss tradeoffs: "Approach A has X benefit but Y drawback..."
4. Relate to experience: "In my previous project, we used..."
5. Be honest: "I'm not familiar with X, but I would approach it by..."

Remember: The goal is to demonstrate your system design thinking process, not to create a perfect solution. Show how you approach complex problems, consider tradeoffs, and make informed decisions.