📖 11 min read 📄 Part 10 of 10

Design Instagram - Interview Tips

Interview Approach Strategy

Step 1: Requirements Clarification (~5 minutes)

Essential Questions to Ask

Functional Requirements:
- "What are the core features? Just photo sharing, or also stories, DMs, shopping?"
- "Do we need to support videos?"
- "Should we implement filters and effects?"
- "Do we need real-time features like live streaming?"

Scale Requirements:
- "How many users? 500M, 1B, 2B?"
- "How many photos/videos uploaded per day?"
- "What's the read-to-write ratio?"
- "What's acceptable latency for feed loads and uploads?"

Non-Functional Requirements:
- "What's the availability requirement? 99.9%, 99.95%?"
- "Do we need to support multiple regions?"
- "Are there compliance requirements (GDPR, CCPA)?"
- "What's the budget constraint?"

Sample Clarification Dialog

Interviewer: "Design Instagram"

You: "Great! Let me clarify the requirements:

Core Features:
- Users can upload photos and videos
- Users can follow other users
- Users see a feed of posts from people they follow
- Users can like, comment, and share posts
- Should I include stories (24-hour content)?

Scale:
- Are we targeting Instagram's scale (2B users, 500M DAU)?
- Should I assume 100M photos and 50M videos per day?
- Is the read-to-write ratio around 500:1?

Performance:
- Is <1 second feed load acceptable?
- Should photo upload be <5 seconds?
- Do we need real-time updates or is eventual consistency okay?"

Interviewer: "Yes, focus on core photo/video sharing and feed. 
Instagram scale is good. Real-time is nice-to-have."

You: "Perfect! Let me start with back-of-envelope calculations..."

Step 2: Back-of-the-Envelope Calculations (5 minutes)

Calculate Scale

Users:
- Total users: 2B
- DAU: 500M (25%)
- Average followers: 150

Content:
- Photos per day: 100M
- Videos per day: 50M
- Stories per day: 500M
- Photos per second: 100M / 86400 = ~1,157 TPS
- Peak TPS: 3x = 3,500 TPS

Feed Requests:
- Each user checks feed 10 times/day
- Total requests: 500M × 10 = 5B requests/day
- Requests per second: 5B / 86400 = ~58K RPS
- Peak RPS: 3x = 174K RPS

Storage:
- Photo size: 2MB (original), 200KB (compressed)
- Video size: 20MB (original), 5MB (compressed)
- Daily storage: 100M × 2MB + 50M × 20MB = 1.2PB/day
- With compression: 100M × 200KB + 50M × 5MB = 270TB/day

Bandwidth:
- Ingress: 1.2PB/day = 14GB/s
- Egress: 5B requests × 20 posts × 200KB = 20PB/day = 231GB/s
- With CDN (90% hit rate): Origin serves 23GB/s
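The arithmetic above is easy to slip on under pressure. A quick script like this one, using decimal units and exactly the assumptions listed above, reproduces each figure so you can sanity-check your whiteboard numbers while practicing.

```python
# Decimal units, matching the estimates above.
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

dau = 500_000_000
photos_per_day, videos_per_day = 100_000_000, 50_000_000
feed_loads_per_user, posts_per_feed_load = 10, 20
peak_factor, cdn_hit_rate = 3, 0.90

photo_uploads_per_s = photos_per_day / SECONDS_PER_DAY
feed_rps = dau * feed_loads_per_user / SECONDS_PER_DAY

# Originals vs compressed copies (2MB/20MB vs 200KB/5MB per photo/video).
raw_bytes_per_day = photos_per_day * 2 * MB + videos_per_day * 20 * MB
compressed_bytes_per_day = photos_per_day * 200 * KB + videos_per_day * 5 * MB

ingress_bytes_per_s = raw_bytes_per_day / SECONDS_PER_DAY
egress_bytes_per_day = dau * feed_loads_per_user * posts_per_feed_load * 200 * KB
egress_bytes_per_s = egress_bytes_per_day / SECONDS_PER_DAY
origin_bytes_per_s = egress_bytes_per_s * (1 - cdn_hit_rate)  # CDN misses only

print(f"Photo uploads: {photo_uploads_per_s:,.0f}/s, peak {photo_uploads_per_s * peak_factor:,.0f}/s")
print(f"Feed reads:    {feed_rps:,.0f}/s, peak {feed_rps * peak_factor:,.0f}/s")
print(f"Storage/day:   {raw_bytes_per_day / PB:.1f} PB raw, {compressed_bytes_per_day / TB:.0f} TB compressed")
print(f"Ingress {ingress_bytes_per_s / GB:.1f} GB/s, egress {egress_bytes_per_s / GB:.0f} GB/s, "
      f"origin after CDN {origin_bytes_per_s / GB:.0f} GB/s")
```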

Pro Tip: Write these calculations on the whiteboard; doing so shows you understand scale.

Step 3: High-Level Design (10-15 minutes)

Start Simple

Step 1: Basic Architecture
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Clients │────▶│   API   │────▶│Database │
└─────────┘     │ Gateway │     └─────────┘
                └─────────┘

Step 2: Add Core Services
┌─────────┐     ┌─────────┐     ┌──────────┐     ┌─────────┐
│ Clients │────▶│   API   │────▶│  Upload  │────▶│   S3    │
└─────────┘     │ Gateway │     │ Service  │     └─────────┘
                └─────────┘     └──────────┘
                      │
                      ▼
                ┌──────────┐
                │   Feed   │
                │ Service  │
                └──────────┘

Step 3: Add Processing and CDN
[Full architecture with media processing, CDN, caching, etc.]

Pro Tip: Draw incrementally. Don't overwhelm with complexity upfront.

Key Components to Mention

1. API Gateway: Authentication, rate limiting, routing
2. Upload Service: Handle photo/video uploads
3. Media Processing Service: Resize, compress, transcode
4. Feed Service: Generate personalized feeds
5. Story Service: Handle 24-hour ephemeral content
6. User Service: User profiles and authentication
7. Social Graph Service: Follow relationships
8. Search Service: Full-text search
9. Notification Service: Push notifications
10. S3 + CDN: Media storage and delivery
11. Cache Layer: Redis for performance
12. Message Queue: Kafka for async processing

Step 4: Deep Dive (15-20 minutes)

Critical Design Decisions

Decision 1: Media Storage Strategy

Interviewer: "How do you store and serve billions of photos?"

Strong Answer:
"I'd use a multi-tier storage and delivery strategy:

1. Storage (S3):
   - Store original and compressed versions
   - Original: 2MB (for editing, downloads)
   - Compressed: 200KB (for feed display)
   - Thumbnail: 20KB (for grid view)
   - Lifecycle policies for archival

2. CDN (CloudFront):
   - 90% cache hit rate
   - 200+ edge locations globally
   - Image optimization at the edge (e.g., resizing or format conversion via edge functions)
   - Reduce origin load to 10%

3. Optimization:
   - Compress images (JPEG quality 85, WebP)
   - Multiple sizes for different contexts
   - Lazy loading for images
   - Progressive JPEG for faster rendering

This provides fast delivery (<500ms) while minimizing storage costs."
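One way to sketch the "multiple sizes for different contexts" step is a function that plans which variants to generate for each upload. Only the three-tier split (original, feed, thumbnail) comes from the answer above; the target widths (1080 and 320 pixels) and the `media/{post_id}/{variant}.jpg` key layout are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical target widths; None means "keep the source dimensions".
VARIANTS = {
    "original": None,  # editing, downloads
    "feed": 1080,      # assumed feed display width
    "thumb": 320,      # assumed profile-grid thumbnail width
}

@dataclass(frozen=True)
class Variant:
    name: str
    key: str
    width: int
    height: int

def plan_variants(post_id: str, width: int, height: int) -> list[Variant]:
    """Return the object keys and pixel sizes to generate for one uploaded photo."""
    plans = []
    for name, target_width in VARIANTS.items():
        if target_width is None or width <= target_width:
            w, h = width, height  # never upscale small sources
        else:
            w = target_width
            h = max(1, round(height * target_width / width))  # keep aspect ratio
        # Hypothetical key layout: one prefix per post, one object per variant.
        plans.append(Variant(name, f"media/{post_id}/{name}.jpg", w, h))
    return plans

for v in plan_variants("p123", 4032, 3024):
    print(v.name, v.key, v.width, v.height)
```

Keeping all variants under one per-post prefix makes lifecycle rules (e.g., moving originals to colder storage) easy to target.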

Decision 2: Feed Generation Strategy

Interviewer: "How do you generate personalized feeds?"

Strong Answer:
"I'd use a hybrid fan-out approach:

1. Regular Users (<10K followers):
   - Fan-out on write (push model)
   - Pre-compute feeds when post is created
   - Store in Cassandra partitioned by user_id
   - Fast reads (<100ms)

2. Celebrity Users (≥10K followers):
   - Fan-out on read (pull model)
   - Fetch posts on-demand
   - Merge with pre-computed feed
   - Cache the merged feed aggressively (TTL: 5 minutes)

3. Feed Ranking:
   - Fetch candidate posts (1000+)
   - Score using ML model (engagement prediction)
   - Rank by score
   - Apply business rules (diversity, freshness)
   - Return top 20 posts

This balances write and read performance across different user types."
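The read path of the hybrid approach can be sketched as a merge of the pushed feed with pulled celebrity posts, followed by ranking. This assumes a single 10K-follower cutoff and uses a stored `score` field as a placeholder for the ML engagement model; `Post`, `build_feed`, and the in-memory inputs are hypothetical (in the design above, the precomputed feed would come from Cassandra and celebrity posts from a cache).

```python
import heapq
from dataclasses import dataclass

PUSH_THRESHOLD = 10_000  # authors below this many followers get fan-out on write

@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    created_at: float  # Unix seconds
    score: float       # placeholder for an ML engagement prediction

def use_fan_out_on_write(follower_count: int) -> bool:
    """Push model for regular users, pull model for celebrities."""
    return follower_count < PUSH_THRESHOLD

def build_feed(precomputed: list[Post],
               celebrity_posts: dict[str, list[Post]],
               followed_celebrities: list[str],
               limit: int = 20) -> list[Post]:
    """Merge pushed and pulled candidates, rank them, and return the top `limit`."""
    candidates = {p.post_id: p for p in precomputed}
    for author in followed_celebrities:
        for post in celebrity_posts.get(author, []):
            candidates.setdefault(post.post_id, post)  # de-duplicate by post_id
    # Rank by predicted score, breaking ties by recency.
    # Business rules (diversity, freshness) would be applied around this step.
    return heapq.nlargest(limit, candidates.values(), key=lambda p: (p.score, p.created_at))
```

`heapq.nlargest` avoids fully sorting 1000+ candidates when only the top 20 are returned.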

Decision 3: Handling Celebrity Posts

Interviewer: "What if a celebrity with 600M followers posts a photo?"

Strong Answer:
"This is the 'hot user problem'. Solutions:

1. Skip fan-out for celebrity users:
   - Don't write to 600M feeds
   - Fetch celebrity posts on-demand
   - Merge with pre-computed feed

2. Separate infrastructure:
   - Dedicated cache for celebrity posts
   - Higher cache TTL (1 hour vs 5 minutes)
   - CDN caching for popular content

3. Rate limiting (for any fan-out that still happens, e.g., to active followers only):
   - Limit fan-out to 20K writes/second
   - Process in batches of 1000
   - Use priority queue (active users first)

4. Async processing:
   - Fan-out via Kafka
   - Non-blocking for user
   - Eventual consistency acceptable

This prevents celebrity posts from overwhelming the system."
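Options 3 and 4 can be sketched as a batched, paced fan-out that writes to the most recently active followers first. The batch size and write cap come from the answer above; the function names, the `last_active` map, and the `write_batch` callback are hypothetical, and in practice this loop would run inside Kafka consumers. The helper at the end also shows why option 1 matters: at 20K writes/second, a full push to 600M followers would take about 8.3 hours.

```python
import time
from typing import Callable

BATCH_SIZE = 1_000           # process in batches of 1000
MAX_WRITES_PER_SEC = 20_000  # cap fan-out at 20K writes/second

def order_followers(followers: list[str], last_active: dict[str, float]) -> list[str]:
    """Most recently active followers first (a sorted list standing in for a priority queue)."""
    return sorted(followers, key=lambda f: last_active.get(f, 0.0), reverse=True)

def fan_out(post_id: str, followers: list[str],
            write_batch: Callable[[str, list[str]], None],
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Insert post_id into follower feeds, pacing batches to stay under the write cap."""
    seconds_per_batch = BATCH_SIZE / MAX_WRITES_PER_SEC  # 0.05 s per full batch
    batches = 0
    for start in range(0, len(followers), BATCH_SIZE):
        write_batch(post_id, followers[start:start + BATCH_SIZE])
        batches += 1
        sleep(seconds_per_batch)  # crude fixed pacing; a token bucket would smooth bursts
    return batches

def fan_out_hours(follower_count: int) -> float:
    """How long a complete fan-out takes at the write cap."""
    return follower_count / MAX_WRITES_PER_SEC / 3600
```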

Common Pitfalls to Avoid

Pitfall 1: Ignoring Media Processing

❌ Wrong: "Store photos directly in database"
✅ Right: "Upload to S3, process asynchronously, store multiple sizes"

Always consider media processing pipeline.

Pitfall 2: Not Considering Storage Costs

❌ Wrong: "Store all photos at original resolution"
✅ Right: "Compress to 200KB for feed, store original for editing"

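Storage costs are significant at scale: at the estimates above, compressed media alone adds ~270TB/day (roughly 490PB over 5 years), before counting originals.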

Pitfall 3: Forgetting CDN

❌ Wrong: "Serve all media from origin servers"
✅ Right: "Use CDN with 90% hit rate to reduce origin load"

CDN is critical for media-heavy applications.

Pitfall 4: Not Handling Celebrity Users

❌ Wrong: "Fan-out to all followers for every post"
✅ Right: "Use hybrid approach based on follower count"

Celebrity users require special handling.

Pitfall 5: Ignoring Stories

❌ Wrong: Only discussing permanent posts
✅ Right: "Stories require 24-hour expiration, different storage strategy"

Stories are a major feature of Instagram.

Impressive Points to Mention

Technical Depth

1. Perceptual Hashing:
   "For deduplication, I'd use perceptual hashing (pHash):
   - Hash images based on visual content
   - Detect duplicate uploads
   - Confirm exact matches with a cryptographic hash (e.g., SHA-256), then store once and reference multiple times
   - Estimated ~20% storage savings"

2. Adaptive Bitrate Streaming:
   "For videos, I'd use adaptive bitrate streaming:
   - Transcode to multiple quality levels (1080p, 720p, 480p)
   - HLS/DASH protocol
   - Client selects quality based on bandwidth
   - Better user experience"

3. Image Optimization:
   "For image delivery, I'd use:
   - WebP format for modern browsers (30% smaller)
   - Progressive JPEG for faster rendering
   - Responsive images (srcset)
   - Lazy loading for images below the fold"

4. Content ID Matching:
   "For copyright protection, I'd use:
   - Hash-based matching for images
   - Audio fingerprinting for videos
   - Visual similarity detection using ML
   - Automated DMCA takedown process"
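To illustrate the perceptual-hashing idea from item 1, the sketch below uses average hashing (aHash), a simpler relative of pHash (pHash works on a DCT of the image rather than raw pixel brightness). It assumes the image has already been converted to grayscale and shrunk to 8x8, and the 5-bit distance cutoff is an assumed value to tune. Because visually similar images are not byte-identical, exact deduplication uses a SHA-256 digest instead.

```python
import hashlib

def average_hash(pixels: list[list[int]]) -> int:
    """aHash over a pre-shrunk 8x8 grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_near_duplicate(h1: int, h2: int, max_bits: int = 5) -> bool:
    """Visually similar if the hashes differ in at most `max_bits` bits (assumed cutoff)."""
    return hamming_distance(h1, h2) <= max_bits

def content_key(data: bytes) -> str:
    """Exact-duplicate key: identical bytes give identical SHA-256 digests."""
    return hashlib.sha256(data).hexdigest()
```

A uniform brightness change leaves the aHash unchanged, which is why it flags re-encoded or lightly edited copies that a byte-level hash would miss.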

System Design Patterns

1. Circuit Breaker:
   "Prevent cascading failures with circuit breakers:
   - Open circuit after N failures
   - Fail fast instead of waiting
   - After a cooldown, let a trial request through (half-open) and close the circuit if it succeeds"

2. Bulkhead Pattern:
   "Isolate resources to prevent total failure:
   - Separate thread pools per service
   - Limit connections per dependency
   - One service failure doesn't affect others"

3. Cache-Aside Pattern:
   "For feed caching:
   - Check cache first
   - If miss, query database
   - Update cache with result
   - Set appropriate TTL (5 minutes)"
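The circuit breaker from item 1 can be sketched as a small state machine: closed, open after repeated failures, then half-open after a cooldown. This is a single-threaded illustration with assumed defaults (5 failures, 30-second cooldown) and an injectable clock for testing; production libraries also limit how many trial calls run concurrently while half-open.

```python
import time

class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cooldown elapsed: allow a trial call
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("failing fast; dependency marked unhealthy")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the cooldown
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```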

Real-World Considerations

1. Cost Optimization:
   "To reduce costs:
   - Aggressive image compression (2MB → 200KB)
   - CDN caching (90% hit rate)
   - Tiered storage (hot/warm/cold)
   - Deduplication (20% savings)
   - Target: <$0.52 per DAU per month"

2. Monitoring:
   "Key metrics to monitor:
   - Upload success rate (>99.9%)
   - Feed load time (p95 <1s)
   - Image load time (p95 <500ms)
   - CDN hit rate (>90%)
   - Error rate (<0.1%)"

3. Disaster Recovery:
   "Multi-region active-active:
   - Async replication between regions
   - RPO: 1 hour
   - RTO: 4 hours
   - Automatic failover with health checks"
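The monitoring targets in item 2 can be checked mechanically once metrics are aggregated. This sketch uses the nearest-rank definition of a percentile; the metric names are hypothetical, while the thresholds are the ones listed above.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Targets from the list above, as (metric name, condition that must hold).
SLO_RULES = [
    ("upload_success_rate", lambda v: v > 0.999),
    ("feed_load_p95_s", lambda v: v < 1.0),
    ("image_load_p95_s", lambda v: v < 0.5),
    ("cdn_hit_rate", lambda v: v > 0.90),
    ("error_rate", lambda v: v < 0.001),
]

def breached_slos(metrics: dict[str, float]) -> list[str]:
    """Names of metrics that miss their target."""
    return [name for name, ok in SLO_RULES if not ok(metrics[name])]
```

For example, `feed_load_p95_s` would be `percentile(feed_latencies_s, 95)` over a recent window of feed-load timings.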

Time Management

45-Minute Interview Breakdown

0-5 min: Requirements clarification
5-10 min: Back-of-envelope calculations
10-25 min: High-level design and core components
25-40 min: Deep dive into 2-3 components
40-45 min: Wrap-up, edge cases, questions

Adjust based on the interviewer's focus.

What to Prioritize

Must Cover:
- High-level architecture
- Media storage and delivery (S3 + CDN)
- Feed generation strategy
- Database design
- Scaling approach

Nice to Have:
- Stories feature
- Security and privacy
- Monitoring and alerting
- Cost optimization

Skip if Time Limited:
- Detailed API design
- Specific code implementations
- Advanced ML algorithms

Sample Interview Questions and Answers

Q: "How would you implement the photo upload flow?"

Strong Answer:

"I'd use an asynchronous upload flow:

1. Client requests upload URL:
   - POST /api/v1/posts/upload
   - Server generates pre-signed S3 URL
   - Returns URL to client

2. Client uploads directly to S3:
   - Direct upload to S3 (no server bottleneck)
   - S3 triggers SNS notification
   - Quick acknowledgement once the upload completes (<1 second); processing happens afterward

3. Async processing:
   - SQS queue picks up notification
   - Media processing workers:
     a. Resize to multiple sizes
     b. Compress (2MB → 200KB)
     c. Generate thumbnail
     d. Extract metadata
   - Store processed images in S3
   - Update metadata in database

4. Completion:
   - Warm CDN cache
   - Publish event to Kafka
   - Fan-out to followers
   - Notify user of completion

This provides fast upload response while handling processing asynchronously."
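The hand-off between steps 2 and 3 is where processing workers read S3 notifications from SQS. A minimal parsing sketch, assuming SNS raw message delivery is disabled (so the S3 event arrives as a JSON string inside the SNS envelope's `Message` field); `process_upload` is a hypothetical callback standing in for the resize, compress, thumbnail, and metadata steps.

```python
import json
from urllib.parse import unquote_plus

def extract_uploads(sqs_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS message carrying an S3 event."""
    payload = json.loads(sqs_body)
    if "Message" in payload:  # unwrap the SNS envelope
        payload = json.loads(payload["Message"])
    uploads = []
    # S3's one-off test event has no "Records", so it yields nothing.
    for record in payload.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads

def process_message(sqs_body: str, process_upload) -> None:
    """Worker loop body: hand each uploaded object to the processing pipeline."""
    for bucket, key in extract_uploads(sqs_body):
        process_upload(bucket, key)
```

Since SQS delivers at least once, `process_upload` should be idempotent (e.g., deterministic output keys per post) so a redelivered message does no harm.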

Q: "How do you handle stories that expire after 24 hours?"

Strong Answer:

"I'd use multiple mechanisms for story expiration:

1. S3 Lifecycle Policy:
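   - Expiration rule of 1 day on story objects (the smallest value S3 lifecycle allows)
   - Deletion by S3 is automatic but asynchronous, so objects can linger past 24 hours
   - No manual cleanup needed; this is a storage backstop, not what hides the story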

2. Redis TTL:
   - Store story metadata in Redis
   - Set 24-hour TTL
   - Automatic expiration

3. Database Cleanup:
   - Mark stories as expired
   - Background job runs hourly
   - Delete expired stories
   - Update user stats

4. CDN Cache Invalidation:
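   - Set Cache-Control max-age to the story's remaining lifetime (at most 86400 seconds), because max-age counts from when the CDN fetched the object, not from when the story was posted
   - Prevents serving expired stories
   - Invalidate explicitly if a story is deleted before it expires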

This ensures stories disappear after 24 hours with eventual consistency."
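A sketch of how the expiry layers can share one notion of remaining lifetime, so the CDN header and the Redis TTL never outlive the story. The function names and the `stories/` prefix are hypothetical; the lifecycle dictionary follows the shape that S3's PutBucketLifecycleConfiguration API accepts.

```python
import math

STORY_TTL_S = 24 * 60 * 60  # stories live for 24 hours

def story_remaining_ttl(created_at: float, now: float) -> int:
    """Whole seconds until the story expires (0 once expired)."""
    return max(0, math.ceil(created_at + STORY_TTL_S - now))

def story_cache_control(created_at: float, now: float) -> str:
    """Cache a story asset only for the story's remaining lifetime."""
    remaining = story_remaining_ttl(created_at, now)
    if remaining == 0:
        return "no-store"
    return f"public, max-age={remaining}"

# Lifecycle rule for a hypothetical "stories/" prefix. Days must be a whole
# number >= 1, so this is a cleanup backstop rather than the visibility cutoff.
STORIES_LIFECYCLE = {
    "Rules": [{
        "ID": "expire-stories",
        "Filter": {"Prefix": "stories/"},
        "Status": "Enabled",
        "Expiration": {"Days": 1},
    }]
}
```

With redis-py, the metadata write could be `r.set(key, value, ex=story_remaining_ttl(created_at, now))`, skipping the write when the result is 0, since Redis rejects a non-positive expire time.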

Final Tips

Do's ✅

  • Ask clarifying questions upfront
  • Start with simple design, add complexity gradually
  • Explain reasoning for each decision
  • Discuss tradeoffs explicitly
  • Consider media storage and delivery
  • Mention CDN and caching strategies
  • Think about celebrity users
  • Be prepared to dive deep into any component

Don'ts ❌

  • Don't jump straight to implementation
  • Don't ignore storage costs
  • Don't forget about CDN
  • Don't overlook media processing
  • Don't ignore the celebrity-user problem
  • Don't forget about stories
  • Don't assume unlimited resources
  • Don't forget to manage time

If You Get Stuck

1. Ask for hints: "Could you give me a hint about X?"
2. Think out loud: "I'm considering two approaches..."
3. Discuss tradeoffs: "Approach A has X benefit but Y drawback..."
4. Relate to experience: "In my previous project, we used..."
5. Be honest: "I'm not familiar with X, but I would approach it by..."

Remember: The goal is to demonstrate your system design thinking process, not to create a perfect solution. Show how you approach complex problems, consider tradeoffs, and make informed decisions.