Design Twitter - System Architecture

High-Level Architecture Overview

Architecture Principles

Microservices Architecture: 100+ independent services
Event-Driven Design: Asynchronous communication via message queues
Horizontal Scalability: Scale services independently based on load
Multi-Region Deployment: Active-active across 10+ regions globally
Fault Tolerance: Circuit breakers, retries, graceful degradation
Read-Optimized: Heavy caching and denormalization for read performance
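
The fault-tolerance principle above can be illustrated with a minimal circuit breaker. This is only a sketch: the class name, the thresholds, and the injectable `clock` parameter are assumptions made for illustration, not details from the design.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        # While the circuit is open, skip the dependency and degrade gracefully.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # a success closes the circuit again
        return result
```

A caller would wrap a remote call, for example `breaker.call(fetch_from_service, serve_cached_copy)`, where both function names are hypothetical.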

System Architecture Diagram

Figure: Twitter high-level architecture, contrasting the write path (fan-out on write) with the read path (pre-computed timelines)

Core Services Architecture

1. Tweet Service

Responsibilities:

Create, read, update, delete tweets
Validate tweet content and media
Generate unique tweet IDs (Snowflake algorithm)
Store tweets in database
Publish tweet events to message queue
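
As an illustration of Snowflake-style ID generation, the sketch below packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one 64-bit integer. The 41/10/12 bit split and the epoch constant follow the commonly described Snowflake layout and are assumptions here; the class name and the `clock_ms` parameter are hypothetical.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch (commonly cited for Snowflake)


class SnowflakeGenerator:
    """Assumed layout: 41-bit timestamp | 10-bit worker id | 12-bit sequence."""

    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time, which is why later sketches can order tweets by ID alone.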

Technology Stack:

Language: Java/Go for high performance
Framework: Spring Boot / gRPC
Database: Cassandra (partitioned by tweet_id)
Cache: Redis for recent tweets
Message Queue: Kafka for event publishing

API Endpoints:

POST   /api/v1/tweets              - Create tweet
GET    /api/v1/tweets/:id          - Get tweet by ID
DELETE /api/v1/tweets/:id          - Delete tweet
PUT    /api/v1/tweets/:id          - Edit tweet
POST   /api/v1/tweets/:id/like     - Like tweet
POST   /api/v1/tweets/:id/retweet  - Retweet

2. Timeline Service

Responsibilities:

Generate personalized home timelines
Fetch user timelines
Merge and rank tweets from multiple sources
Handle pagination and cursor-based navigation
Cache timeline results
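
Cursor-based navigation can be sketched as follows, assuming tweet IDs are time-ordered so that the last ID on a page can serve as the cursor for the next one. The function name and return shape are hypothetical.

```python
def page_timeline(tweet_ids, limit, cursor=None):
    """Return (page, next_cursor) from tweet IDs sorted newest first.

    `cursor` is the last tweet ID the client has already seen; only
    strictly older tweets are returned.
    """
    if cursor is not None:
        tweet_ids = [t for t in tweet_ids if t < cursor]
    page = tweet_ids[:limit]
    # A next cursor is only handed out when older tweets remain.
    next_cursor = page[-1] if len(tweet_ids) > limit else None
    return page, next_cursor
```

Unlike offset-based pagination, a cursor stays stable when new tweets are inserted at the head of the timeline between requests.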

Timeline Generation Strategies:

Fan-out on Write (Push Model):

When user posts tweet:
1. Get list of all followers
2. Insert tweet into each follower's timeline
3. Pre-compute timelines for fast reads

Pros: Fast reads, simple implementation
Cons: Slow writes for celebrity users, storage overhead
Use for: Regular users (<10K followers)
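
A minimal in-memory sketch of the push model described above. The dictionaries stand in for the social graph and the timeline store, and the 800-entry cap per timeline is an assumed value, not one given in this design.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on stored entries per timeline

followers = defaultdict(set)  # author_id -> ids of users who follow them
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def post_tweet(author_id, tweet_id):
    # Write cost grows with follower count: one insert per follower.
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)


def read_timeline(user_id, limit=20):
    # Reads are cheap: the timeline is already materialized, newest first.
    return list(timelines[user_id])[:limit]
```

The loop in `post_tweet` is what makes this model expensive for accounts with millions of followers.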

Fan-out on Read (Pull Model):

When user requests timeline:
1. Get list of users they follow
2. Fetch recent tweets from each user
3. Merge and sort tweets by timestamp
4. Cache result

Pros: Fast writes, no storage overhead
Cons: Slow reads, complex merge logic
Use for: Celebrity users (>1M followers)
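
The pull model's merge step can be sketched with a k-way merge. The sketch assumes each author's tweets are already stored newest first as `(timestamp, tweet_id)` pairs; the function and parameter names are hypothetical.

```python
import heapq
from itertools import islice


def pull_timeline(following, tweets_by_author, limit=20):
    """Merge per-author tweet lists (each newest first) at read time."""
    streams = [tweets_by_author.get(author, []) for author in following]
    # heapq.merge lazily merges pre-sorted inputs; reverse=True expects
    # every input to be sorted from largest to smallest key.
    merged = heapq.merge(*streams, key=lambda tweet: tweet[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

Nothing is written when a tweet is posted, but every read touches one list per followed account, which is the source of the slow reads noted above.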

Hybrid Approach:

Regular users: Fan-out on write
Celebrity users: Fan-out on read
Medium users (10K-1M): Mixed approach

Timeline = Pre-computed timeline + Celebrity tweets (fetched on-demand)
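
The hybrid rule can be sketched as a routing decision plus a read-time merge. The thresholds come from the follower ranges above; the function names are hypothetical, and ordering by tweet ID assumes time-ordered (Snowflake-style) IDs.

```python
PUSH_MAX_FOLLOWERS = 10_000      # below this: fan-out on write
PULL_MIN_FOLLOWERS = 1_000_000   # above this: fan-out on read


def fanout_strategy(follower_count):
    if follower_count < PUSH_MAX_FOLLOWERS:
        return "push"
    if follower_count > PULL_MIN_FOLLOWERS:
        return "pull"
    return "mixed"


def home_timeline(precomputed_ids, celebrity_ids, limit=20):
    # Combine the stored timeline with celebrity tweets fetched on demand,
    # dropping duplicates and sorting newest first by ID.
    combined = sorted(set(precomputed_ids) | set(celebrity_ids), reverse=True)
    return combined[:limit]
```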

3. Fan-out Service

Responsibilities:

Distribute tweets to followers' timelines
Handle fan-out for regular users
Rate limit fan-out for celebrity users
Batch timeline writes for efficiency
Handle fan-out failures and retries

Fan-out Architecture:

Figure: Fan-out service architecture, distributing tweets to follower timelines in batches

Fan-out Optimization:

Batch writes: 1000 timeline inserts per batch
Parallel processing: 100 workers processing fan-out
Rate limiting: Max 10K fan-outs per second per user
Celebrity handling: Skip fan-out for users with >1M followers
Async processing: Non-blocking fan-out via message queue
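
The batching and celebrity-skip rules above can be sketched as a planning step that turns one tweet into a list of batch write jobs. The batch size and threshold come from the list above; the function names and the job shape are hypothetical.

```python
from itertools import islice

BATCH_SIZE = 1000
CELEBRITY_THRESHOLD = 1_000_000


def batches(items, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_fanout(follower_ids, tweet_id):
    # Celebrity tweets are not fanned out; they are merged at read time.
    if len(follower_ids) > CELEBRITY_THRESHOLD:
        return []
    return [(tweet_id, batch) for batch in batches(follower_ids)]
```

Each `(tweet_id, batch)` job could then be placed on the message queue and picked up by one of the parallel workers.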

4. User Service

Responsibilities:

User registration and authentication
Profile management (name, bio, avatar)
User settings and preferences
Account verification and suspension
User search and discovery

Database Schema:

CREATE TABLE users (
  user_id BIGINT PRIMARY KEY,
  username VARCHAR(15) UNIQUE,
  email VARCHAR(255) UNIQUE,
  password_hash VARCHAR(255),  -- hash only, never the plaintext password
  display_name VARCHAR(50),
  bio TEXT,
  profile_image_url VARCHAR(255),
  verified BOOLEAN,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

-- Per-user counters, keyed by the same user_id as users
CREATE TABLE user_stats (
  user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
  followers_count INT,
  following_count INT,
  tweets_count INT,
  likes_count INT,
  updated_at TIMESTAMP
);

5. Social Graph Service

Responsibilities:

Manage follow/unfollow relationships
Get followers and following lists
Check if user A follows user B
Suggest users to follow
Handle blocking and muting

Graph Storage:

Following Relationship:
Key: user_id
Value: Set of user_ids they follow

Followers Relationship:
Key: user_id
Value: Set of user_ids who follow them

Technology: Redis (for hot data) + Cassandra (for persistence)
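
The two key-to-set mappings above can be sketched in memory as follows. Plain Python sets stand in for the Redis sets; the class and method names are hypothetical.

```python
from collections import defaultdict


class SocialGraph:
    """Keeps both directions of the follow relationship in sync."""

    def __init__(self):
        self.following = defaultdict(set)  # user_id -> ids they follow
        self.followers = defaultdict(set)  # user_id -> ids who follow them

    def follow(self, user_id, target_id):
        if user_id == target_id:
            raise ValueError("users cannot follow themselves")
        self.following[user_id].add(target_id)
        self.followers[target_id].add(user_id)

    def unfollow(self, user_id, target_id):
        self.following[user_id].discard(target_id)
        self.followers[target_id].discard(user_id)

    def is_following(self, user_id, target_id):
        # .get avoids creating empty sets for users that were never seen.
        return target_id in self.following.get(user_id, ())
```

Storing both directions doubles the writes but makes "who follows X" (needed for fan-out) and "whom does X follow" (needed for the pull model) single-key lookups.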

API Design:

POST   /api/v1/users/:id/follow      - Follow user
DELETE /api/v1/users/:id/follow      - Unfollow user
GET    /api/v1/users/:id/followers   - Get followers
GET    /api/v1/users/:id/following   - Get following
GET    /api/v1/users/:id/follows/:target_id - Check relationship

6. Media Service

Responsibilities:

Upload and process images/videos
Generate multiple image sizes (thumbnail, medium, large)
Transcode videos to multiple formats
Store media in object storage (S3)
Serve media via CDN
Handle media deletion

Media Processing Pipeline:

Figure: Media processing pipeline, covering upload, transformation, and storage of media assets with parallel image/video processing

7. Search Service

Responsibilities:

Index tweets for full-text search
Search tweets by keywords, hashtags, users
Autocomplete for search queries
Trending topics computation
Real-time index updates

Search Architecture:

Figure: Search indexing pipeline, with real-time tweet indexing via Kafka for full-text search

Index Structure:

{
  "tweet_id": "123456789",
  "user_id": "987654321",
  "username": "johndoe",
  "text": "Hello @janedoe #greeting",
  "hashtags": ["greeting"],
  "mentions": ["@janedoe"],
  "created_at": "2026-01-08T10:00:00Z",
  "likes_count": 42,
  "retweets_count": 10,
  "language": "en"
}
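
A document of this shape could be derived from a raw tweet roughly as follows. The regular expressions are a simplification (real hashtag and mention rules are stricter), and the function name and its parameters are hypothetical. The sketch mirrors the example document's convention: hashtags are stored without `#`, mentions with `@`.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")   # captures the tag without '#'
MENTION_RE = re.compile(r"@\w+")     # keeps the leading '@'


def build_index_doc(tweet_id, user_id, username, text, created_at, language="en"):
    return {
        "tweet_id": tweet_id,
        "user_id": user_id,
        "username": username,
        "text": text,
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "mentions": MENTION_RE.findall(text),
        "created_at": created_at,
        "likes_count": 0,       # counters start at zero for a new tweet
        "retweets_count": 0,
        "language": language,
    }
```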

8. Notification Service

Responsibilities:

Send push notifications to mobile devices
Send email notifications
In-app notification badges
Notification preferences management
Batch notifications to reduce spam

Notification Types:

New follower
Tweet liked
Tweet retweeted
Mentioned in tweet
Reply to tweet
Direct message (out of scope)

Notification Pipeline:

Event → Kafka → Notification Worker → Filter by Preferences → Push Service (APNs/FCM) → Device
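
The "Filter by Preferences" stage might look like the sketch below. The event fields, the preference layout, and the default channels are all assumptions made for illustration.

```python
DEFAULT_CHANNELS = frozenset({"push", "in_app"})  # assumed defaults


def channels_for(event, preferences):
    """Return the set of channels on which to deliver `event`."""
    user_prefs = preferences.get(event["recipient_id"], {})
    # Events from muted accounts are dropped entirely.
    if event["actor_id"] in user_prefs.get("muted", set()):
        return set()
    per_type = user_prefs.get("channels", {})
    return set(per_type.get(event["type"], DEFAULT_CHANNELS))
```

A worker would call this once per event and hand the event to the push, email, or in-app sender for each returned channel.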

9. Trending Service

Responsibilities:

Compute trending hashtags in real-time
Detect trending topics and events
Personalized trending based on location
Trending tweets and moments
Spam and bot detection for trends

Trending Algorithm:

Trending Score = (Tweet Count × Recency Factor) / Time Window

Components:
- Tweet Count: Number of tweets with hashtag
- Recency Factor: Exponential decay (recent tweets weighted higher)
- Time Window: Sliding window of 1 hour
- Velocity: Rate of increase in tweet count (a separate signal; it does not appear in the score formula above)

Implementation:
- Count-Min Sketch for approximate counting
- Sliding window with 5-minute buckets
- Top-K algorithm for ranking (Heap)
- Update frequency: Every 5 minutes
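
The sliding window and top-K steps can be sketched as below. Exact `Counter` objects are used in place of the Count-Min Sketch to keep the example short, the decay base of 0.8 per bucket is an assumed value, and timestamps are assumed to arrive in non-decreasing order. The class name is hypothetical.

```python
import heapq
from collections import Counter, deque

BUCKET_SECONDS = 300   # 5-minute buckets
WINDOW_BUCKETS = 12    # 12 buckets = 1-hour sliding window
DECAY = 0.8            # assumed recency weight per bucket of age


class TrendingTracker:
    def __init__(self):
        self.buckets = deque()  # (bucket_index, Counter), oldest first

    def record(self, hashtag, timestamp):
        index = int(timestamp // BUCKET_SECONDS)
        if not self.buckets or self.buckets[-1][0] != index:
            self.buckets.append((index, Counter()))
        self.buckets[-1][1][hashtag] += 1
        self._evict(index)

    def _evict(self, current_index):
        # Drop buckets that have slid out of the one-hour window.
        while self.buckets and self.buckets[0][0] <= current_index - WINDOW_BUCKETS:
            self.buckets.popleft()

    def top_k(self, k, now):
        current_index = int(now // BUCKET_SECONDS)
        self._evict(current_index)
        weighted = Counter()
        for index, counts in self.buckets:
            weight = DECAY ** (current_index - index)  # recency factor
            for tag, count in counts.items():
                weighted[tag] += count * weight
        window_hours = WINDOW_BUCKETS * BUCKET_SECONDS / 3600
        scored = ((score / window_hours, tag) for tag, score in weighted.items())
        return [tag for _, tag in heapq.nlargest(k, scored)]
```

With these settings an older hashtag with more total tweets can rank below a newer one with fewer, which is the intended effect of the recency factor.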

Data Flow Diagrams

Tweet Posting Flow

1. User posts tweet via mobile app
2. API Gateway authenticates request
3. Tweet Service validates content
4. Generate unique tweet_id (Snowflake)
5. Store tweet in Cassandra
6. Publish tweet event to Kafka
7. Return success response to client

The remaining steps run asynchronously once the event is published, so they do not delay the response:

8. Fan-out Service consumes event
9. Get followers from Social Graph Service
10. Write tweet to followers' timelines (batch); skipped for celebrity authors
11. Update user stats (tweet count)
12. Index tweet in Elasticsearch
13. Send notifications to mentioned users
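
The synchronous portion of this flow (validate, assign an ID, store, publish, respond) can be sketched as follows, assuming fan-out, indexing, and notifications are driven by the published event. A dict stands in for Cassandra and a `queue.Queue` for the Kafka topic; the 280-character limit, the function name, and the event fields are assumptions.

```python
import queue

MAX_TWEET_LENGTH = 280  # assumed limit; the flow above does not state one

tweet_store = {}             # stands in for Cassandra
event_queue = queue.Queue()  # stands in for a Kafka topic


def create_tweet(user_id, text, next_id):
    """Validate, persist, and publish a tweet, then return immediately."""
    text = text.strip()
    if not text or len(text) > MAX_TWEET_LENGTH:
        raise ValueError(f"tweet must be 1 to {MAX_TWEET_LENGTH} characters")
    tweet_id = next_id()  # e.g. a Snowflake-style generator
    tweet_store[tweet_id] = {"user_id": user_id, "text": text}
    # Downstream consumers (fan-out, search, notifications) read this event.
    event_queue.put({"type": "tweet_created", "tweet_id": tweet_id, "user_id": user_id})
    return {"tweet_id": tweet_id, "status": "created"}
```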

Timeline Fetch Flow

1. User requests home timeline
2. API Gateway authenticates request
3. Timeline Service checks cache (Redis)
4. If cache hit: Return cached timeline
5. If cache miss:
   a. Fetch pre-computed timeline from Timeline DB
   b. Get the celebrity accounts the user follows
   c. Fetch their recent tweets and merge them in (pull model)
   d. Rank and sort tweets
   e. Cache result in Redis (TTL: 5 minutes)
6. Return timeline to client
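
The cache check in this flow is a cache-aside pattern with a TTL. In the sketch below a small in-process class stands in for Redis; the class, the function names, and the injectable `clock` are hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds  # 300 s matches the 5-minute TTL above
        self.clock = clock
        self.entries = {}       # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl, value)


def get_home_timeline(user_id, cache, build_timeline):
    cached = cache.get(user_id)
    if cached is not None:
        return cached                    # cache hit
    timeline = build_timeline(user_id)   # cache miss: assemble the timeline
    cache.set(user_id, timeline)
    return timeline
```

Here `build_timeline` represents the cache-miss branch of the flow (fetch, merge, rank), passed in as a callable so the caching logic stays independent of it.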

Technology Stack

Programming Languages

Backend Services: Java (Spring Boot), Go (high-performance services)
Real-time Processing: Scala (Spark Streaming)
Scripts: Python (data analysis, ML)

Databases

Tweet Storage: Cassandra (wide-column, high write throughput)
User Data: PostgreSQL (relational, ACID compliance)
Social Graph: Redis (in-memory, fast lookups) + Cassandra (persistence)
Timeline Storage: Cassandra (time-series data)

Caching

Application Cache: Redis Cluster (10TB capacity)
CDN: CloudFront / Cloudflare (media delivery)
Browser Cache: HTTP caching headers

Message Queue

Event Streaming: Apache Kafka (high throughput, durability)
Task Queue: RabbitMQ (job processing)

Search

Full-text Search: Elasticsearch (distributed search)
Autocomplete: Redis (sorted sets)
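
Prefix autocomplete over a sorted set can be illustrated with a sorted list and binary search, which is the same idea as a lexicographic range query on a Redis sorted set whose members share one score. The list stands in for Redis here, and the function name is hypothetical.

```python
import bisect


def autocomplete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`.

    `sorted_terms` must be sorted lexicographically.
    """
    # All terms sharing the prefix are contiguous, starting at this position.
    start = bisect.bisect_left(sorted_terms, prefix)
    results = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break
        results.append(term)
    return results
```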

Storage

Object Storage: Amazon S3 (media files)
Block Storage: EBS (database volumes)

Monitoring and Observability

Metrics: Prometheus + Grafana
Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
Tracing: Jaeger (distributed tracing)
Alerting: PagerDuty

Infrastructure

Container Orchestration: Kubernetes
Service Mesh: Istio
Load Balancing: NGINX, HAProxy
DNS: Route 53 (GeoDNS)

This architecture is designed to serve hundreds of millions of users with real-time tweet delivery and high availability: writes fan out asynchronously through Kafka, and reads are served from pre-computed, cached timelines.