Facebook Messenger - Interview Tips
System Design Interview Approach
Step 1: Requirements Clarification (5-10 minutes)
Key Questions to Ask
- Scale: How many users? Daily active users? Messages per day?
- Features: One-on-one vs group chat? Media sharing? Voice/video calls?
- Platforms: Web, mobile, desktop? Cross-platform sync required?
- Real-time: What's acceptable latency? <100ms, <1s?
- Reliability: Uptime requirements? Message delivery guarantees?
- Geography: Global system? Regional data requirements?
Sample Clarification Dialog
Interviewer: "Design a messaging system like Facebook Messenger"
You: "Great! Let me clarify the requirements:
- Are we building for global scale like Facebook's 1.3B users?
- Do we need both individual and group messaging?
- Should we support media sharing (images, videos, files)?
- What's the acceptable message delivery latency?
- Do we need features like read receipts and online presence?
- Any specific compliance requirements like GDPR?"Step 2: High-Level Architecture (10-15 minutes)
Start with Simple Architecture
[Client Apps] → [Load Balancer] → [API Gateway] → [Message Service] → [Database]
↓
[WebSocket Service]Gradually Add Components
- Authentication Service for user management
- Notification Service for offline users
- Media Service for file handling
- Presence Service for online status
- Cache Layer for performance
Architecture Presentation Tips
- Draw boxes and arrows clearly
- Label each component with its purpose
- Explain data flow with numbered steps
- Start simple, then add complexity
- Ask "Does this make sense?" frequently
Step 3: Deep Dive into Core Components (15-20 minutes)
WebSocket vs HTTP Trade-offs Discussion
Interviewer: "Why choose WebSocket over HTTP for real-time messaging?"
Strong Answer:
"WebSocket provides several advantages for messaging:
1. Full-duplex communication - both client and server can initiate
2. Lower latency - no HTTP request/response overhead
3. Persistent connections - avoid connection establishment cost
4. Real-time updates - instant message delivery and typing indicators
However, WebSocket has challenges:
1. Connection management complexity at scale
2. Load balancing requires sticky sessions
3. Some firewalls/proxies block WebSocket
4. Higher server resource usage per connection
For fallback, I'd implement Server-Sent Events or long polling for environments that block WebSocket."Message Ordering and Consistency
Interviewer: "How do you ensure message ordering in group chats?"
Strong Answer:
"Message ordering is critical for user experience. I'd use:
1. **Sequence Numbers**: Assign incrementing sequence numbers per conversation
2. **Single Writer**: Route all messages for a conversation through one server
3. **Message Queue**: Use Kafka with conversation_id as partition key
4. **Client-side Ordering**: Clients sort messages by sequence number
5. **Gap Detection**: Clients detect missing sequence numbers and request them
For global ordering across all conversations, I'd use:
- Vector clocks for causal ordering
- Lamport timestamps for total ordering
- But this adds complexity, so I'd only implement if required"Step 4: Scaling Considerations (10-15 minutes)
Database Scaling Strategy
Interviewer: "How would you scale the database for billions of messages?"
Strong Answer:
"I'd use a multi-database approach:
1. **Message Storage (Cassandra)**:
- Partition by conversation_id for locality
- Time-series data model with TIMEUUID
- 3x replication across data centers
- Handles high write volume efficiently
2. **User Data (PostgreSQL)**:
- Master-replica setup for read scaling
- Shard by user_id if needed
- ACID compliance for critical user data
3. **Caching (Redis)**:
- Recent messages cached for fast access
- User sessions and presence data
- Distributed cache with consistent hashing
4. **Search (Elasticsearch)**:
- Full-text search across message content
- Separate index for better performance"Connection Scaling
Interviewer: "How do you handle 100 million concurrent WebSocket connections?"
Strong Answer:
"Connection scaling requires several strategies:
1. **Horizontal Scaling**:
- 2,000-5,000 connections per server
- 20,000+ WebSocket servers globally
- Auto-scaling based on connection count
2. **Load Balancing**:
- Consistent hashing for sticky sessions
- Health checks and failover mechanisms
- Geographic routing to nearest data center
3. **Connection Pooling**:
- Reuse connections across conversations
- Connection multiplexing where possible
- Graceful connection migration during updates
4. **Resource Optimization**:
- 8KB memory per connection
- Efficient serialization (Protocol Buffers)
- Connection compression and heartbeat optimization"Common Interview Pitfalls and How to Avoid Them
Pitfall 1: Jumping to Implementation Details
Wrong Approach: Starting with specific technologies or code Right Approach: Begin with requirements and high-level design
Pitfall 2: Ignoring Scale Requirements
Wrong Approach: Designing for small scale then trying to retrofit Right Approach: Design for target scale from the beginning
Pitfall 3: Over-engineering Early
Wrong Approach: Adding every possible feature and optimization Right Approach: Start simple, then add complexity when needed
Pitfall 4: Not Considering Trade-offs
Wrong Approach: Presenting only one solution Right Approach: Discuss alternatives and explain trade-offs
Advanced Topics to Discuss
Real-time System Design Patterns
Event-Driven Architecture
"For real-time messaging, I'd use event-driven architecture:
1. **Message Events**: message_sent, message_delivered, message_read
2. **Presence Events**: user_online, user_offline, user_typing
3. **System Events**: connection_established, connection_lost
Benefits:
- Loose coupling between services
- Easy to add new features
- Natural fit for real-time systems
- Scalable event processing"CQRS (Command Query Responsibility Segregation)
"For high-scale messaging, I'd separate read and write operations:
Write Side (Commands):
- Send message
- Create conversation
- Update user profile
Read Side (Queries):
- Get message history
- Search messages
- Get conversation list
This allows:
- Independent scaling of reads vs writes
- Optimized data models for each operation
- Better performance and availability"Performance Optimization Strategies
Caching Strategies
"I'd implement multi-layer caching:
L1 (Application Cache):
- In-memory cache for frequently accessed data
- 1-minute TTL to balance freshness and performance
L2 (Distributed Cache - Redis):
- Recent messages per conversation
- User presence and session data
- 1-hour TTL with cache warming
L3 (Database Query Cache):
- Query result caching
- Invalidated on data changes"Database Optimization
"Database optimization strategies:
1. **Indexing**:
- Composite indexes on (conversation_id, timestamp)
- Partial indexes for active conversations only
2. **Partitioning**:
- Time-based partitioning for message tables
- Hash partitioning for user tables
3. **Connection Pooling**:
- PgBouncer for PostgreSQL connections
- Connection pool sizing based on load
4. **Read Replicas**:
- Route read queries to replicas
- Async replication for better write performance"Sample Interview Questions and Answers
Q: "How would you implement typing indicators?"
Strong Answer:
"Typing indicators require real-time updates with specific characteristics:
1. **Client Side**:
- Detect typing events (keypress, focus)
- Debounce to avoid excessive updates (500ms)
- Send typing_start when user begins typing
- Send typing_stop after 3 seconds of inactivity
2. **Server Side**:
- Broadcast typing events to conversation participants
- Use Redis with 5-second TTL for typing state
- Don't persist typing events (ephemeral data)
3. **Optimization**:
- Rate limit typing events (max 1 per second)
- Only show typing for active conversation
- Batch multiple typing updates
This provides responsive UX while minimizing server load."Q: "How do you handle message delivery when users are offline?"
Strong Answer:
"Offline message delivery requires multiple strategies:
1. **Message Queuing**:
- Store messages in persistent queue (Kafka/RabbitMQ)
- Partition by user_id for ordered delivery
- Retry logic with exponential backoff
2. **Push Notifications**:
- APNs for iOS, FCM for Android
- Rich notifications with message preview
- Badge count updates
3. **Synchronization**:
- When user comes online, sync missed messages
- Use sequence numbers to detect gaps
- Incremental sync to minimize data transfer
4. **Storage**:
- Offline messages stored in database
- TTL for cleanup (30 days)
- Compression for large message backlogs
This ensures reliable message delivery regardless of connectivity."Q: "How would you implement end-to-end encryption?"
Strong Answer:
"E2E encryption requires careful key management and protocol design:
1. **Protocol Choice**:
- Signal Protocol for proven security
- Double Ratchet for forward secrecy
- X3DH for initial key exchange
2. **Key Management**:
- Identity keys stored locally
- Pre-keys uploaded to server
- Session keys derived per conversation
3. **Message Flow**:
- Encrypt message with session key
- Include ephemeral key for ratcheting
- Server routes encrypted envelope
- Recipient decrypts with session key
4. **Challenges**:
- Key verification (QR codes, safety numbers)
- Multi-device synchronization
- Backup and recovery
- Performance impact
This provides strong security while maintaining usability."Final Interview Tips
Do's
- ✅ Ask clarifying questions upfront
- ✅ Start with simple design, then add complexity
- ✅ Explain your reasoning for each decision
- ✅ Discuss trade-offs and alternatives
- ✅ Consider both functional and non-functional requirements
- ✅ Think about failure scenarios and edge cases
- ✅ Be prepared to dive deep into any component
Don'ts
- ❌ Jump straight into implementation details
- ❌ Ignore scale requirements
- ❌ Design everything as microservices from start
- ❌ Forget about data consistency and reliability
- ❌ Overlook security and privacy concerns
- ❌ Assume unlimited resources
- ❌ Get stuck on one approach without considering alternatives
Time Management
- 5-10 min: Requirements clarification
- 10-15 min: High-level architecture
- 15-20 min: Deep dive into core components
- 10-15 min: Scaling and optimization
- 5-10 min: Wrap-up and additional questions
Remember: The goal is to demonstrate your system design thinking process, not to create a perfect solution. Show how you approach complex problems, consider trade-offs, and make informed decisions.