Design Instagram - Scale and Constraints
User Scale Analysis
User Base Metrics
- Total Registered Users: 2 billion
- Daily Active Users (DAU): 500 million (25% of registered)
- Monthly Active Users (MAU): 1.5 billion (75% of registered)
- Peak Concurrent Users: 100 million during major events
- New User Signups: 1 million per day
- User Growth Rate: 20% year-over-year
User Distribution
- Geographic Distribution (share of DAU):
  - North America: 20% (100M users)
  - Europe: 15% (75M users)
  - Asia: 45% (225M users)
  - Latin America: 12% (60M users)
  - Rest of World: 8% (40M users)
- Platform Distribution:
  - Mobile (iOS/Android): 95%
  - Web: 5%
- User Activity Levels (share of DAU):
  - Power users (>5 posts/week): 10% (50M users)
  - Active users (1-5 posts/week): 30% (150M users)
  - Casual users (<1 post/week): 60% (300M users)
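The segment counts above are shares of the 500M DAU, not of all 2B registered users (20% of 500M is the 100M listed for North America). A minimal sketch that converts the listed percentages into user counts and checks that each breakdown sums to 100%; the dictionary and function names are illustrative:

```python
# Daily active users, from the User Base Metrics list.
DAU = 500_000_000

# Shares of DAU as listed under User Distribution (names are illustrative).
REGIONS = {
    "North America": 0.20,
    "Europe": 0.15,
    "Asia": 0.45,
    "Latin America": 0.12,
    "Rest of World": 0.08,
}
ACTIVITY = {"power": 0.10, "active": 0.30, "casual": 0.60}


def users_per_segment(shares: dict[str, float], total: int) -> dict[str, int]:
    """Convert fractional shares of a population into rounded user counts."""
    # The shares must partition the population; allow for float rounding error.
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 100%")
    return {name: round(share * total) for name, share in shares.items()}


if __name__ == "__main__":
    for name, count in users_per_segment(REGIONS, DAU).items():
        print(f"{name:<15} {count / 1e6:,.0f}M")
```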
Influencer and Celebrity Impact
- Celebrity Users: 50,000 users with >1M followers
- Influencers: 500,000 users with >100K followers
- Average User: 150 followers
- Median User: 80 followers
- Top User: 600M followers (Instagram's own account)
Content Volume Analysis
Photo Generation
- Total Photos per Day: 100 million
- Photos per Second:
  - Average: 1,157 photos/second
  - Peak: 3,500 photos/second (during events)
  - Off-peak: 500 photos/second
- Photo Distribution (share of the 100M daily photos):
  - Photos in single-photo posts: 60% (60M/day)
  - Photos in carousel posts (2-10 photos each): 40% (40M/day)
  - Average photos per carousel: 3 (~13M carousel posts/day)
Video Generation
- Total Videos per Day: 50 million
- Videos per Second:
  - Average: 579 videos/second
  - Peak: 1,700 videos/second
  - Off-peak: 250 videos/second
- Video Duration:
  - Average: 30 seconds
  - Feed videos: 15-60 seconds
  - IGTV: 10-60 minutes
Stories Generation
- Total Stories per Day: 500 million
- Stories per Second:
  - Average: 5,787 stories/second
  - Peak: 17,000 stories/second
  - Off-peak: 2,500 stories/second
- Story Retention: 24 hours only
- Story Highlights: 10% saved permanently
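Each per-second figure is the daily volume divided by the 86,400 seconds in a day, and the listed peaks are roughly 3x those averages. A short sketch of that conversion; the single 3x multiplier is an assumption read off the listed numbers, not a stated rule:

```python
SECONDS_PER_DAY = 86_400

# Daily volumes from the Content Volume Analysis.
DAILY_VOLUME = {
    "photos": 100_000_000,
    "videos": 50_000_000,
    "stories": 500_000_000,
}

# Assumption: peak is about 3x the daily average, which is roughly the ratio
# of the listed peak figures to their averages (e.g. 3,500 / 1,157).
PEAK_MULTIPLIER = 3


def per_second(daily_count: int) -> float:
    """Average events per second for a given daily count."""
    return daily_count / SECONDS_PER_DAY


if __name__ == "__main__":
    for kind, daily in DAILY_VOLUME.items():
        avg = per_second(daily)
        print(f"{kind:<8} avg {avg:,.0f}/s, est. peak {avg * PEAK_MULTIPLIER:,.0f}/s")
```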
Content Size Estimates
- Original Photo: 2MB average (JPEG, 4000×3000)
- Compressed Photo: 200KB (optimized for feed)
- Thumbnail: 20KB (profile grid)
- Original Video: 20MB average (1080p, 30s)
- Compressed Video: 5MB (720p, optimized)
- Story Photo: 1MB average
- Story Video: 10MB average (15s)
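Multiplying each per-item size by its daily volume yields the daily storage figures listed next. A minimal sketch in decimal units (1TB = 10^12 bytes), which is what the document's arithmetic uses; the 100KB video thumbnail and the 300M/200M photo/video split of daily stories are taken from that breakdown rather than from the size list:

```python
# Decimal units, matching the document's arithmetic.
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

# (daily count, bytes per item). Assumption: the 100KB video thumbnail and the
# 300M photo / 200M video story split come from the daily storage breakdown.
DAILY_ITEMS = {
    "photo_original": (100_000_000, 2 * MB),
    "photo_compressed": (100_000_000, 200 * KB),
    "photo_thumbnail": (100_000_000, 20 * KB),
    "video_original": (50_000_000, 20 * MB),
    "video_compressed": (50_000_000, 5 * MB),
    "video_thumbnail": (50_000_000, 100 * KB),
    "story_photo": (300_000_000, 1 * MB),
    "story_video": (200_000_000, 10 * MB),
}


def daily_bytes(items: dict[str, tuple[int, int]]) -> int:
    """Total bytes written per day across all media variants."""
    return sum(count * size for count, size in items.values())


if __name__ == "__main__":
    total = daily_bytes(DAILY_ITEMS)
    print(f"daily: {total / PB:.3f} PB")
    print(f"month: {total * 30 / PB:.0f} PB")
    print(f"year:  {total * 365 / PB:.0f} PB")
```

Unrounded, the total is about 3.78PB/day, so 30 days gives roughly 113PB and a year roughly 1.38EB, slightly below the ~114PB and ~1.4EB obtained from the rounded 3.8PB/day.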
Daily Storage Requirements
Photos:
Original: 100M × 2MB = 200TB/day
Compressed: 100M × 200KB = 20TB/day
Thumbnails: 100M × 20KB = 2TB/day
Total Photos: 222TB/day
Videos:
Original: 50M × 20MB = 1PB/day
Compressed: 50M × 5MB = 250TB/day
Thumbnails: 50M × 100KB = 5TB/day
Total Videos: 1.255PB/day
Stories (24h retention):
Photos: 300M × 1MB = 300TB/day
Videos: 200M × 10MB = 2PB/day
Total Stories: 2.3PB/day
Total Daily Storage: ~3.8PB/day
Monthly Storage: ~114PB/month
Yearly Storage: ~1.4EB/year
Traffic Patterns and Load
Read Traffic
- Feed Requests: 50 billion per day
  - Average: ~578,704 requests/second
  - Peak: 1.7 million requests/second
- Photo Views: 100 billion per day
- Video Views: 20 billion per day
- Story Views: 10 billion per day
- Profile Views: 5 billion per day
- Search Queries: 2 billion per day
Write Traffic
- Photo Uploads: 100 million per day (1,157 TPS)
- Video Uploads: 50 million per day (579 TPS)
- Story Uploads: 500 million per day (5,787 TPS)
- Likes: 4 billion per day (46,296 TPS)
- Comments: 1 billion per day (11,574 TPS)
- Follows: 100 million per day (1,157 TPS)
Read:Write Ratio
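- Feed-to-Upload Ratio: 500:1 (extremely read-heavy)
- Feed Reads: 50B feed requests vs 100M photo uploads = 500:1
- All Listed Traffic: ~187B reads/day vs ~5.75B writes/day (including likes, comments, and follows) = ~33:1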
- Engagement Reads: High read amplification for popular posts
- Cache Hit Rate: Target >90% for media content
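The 500:1 figure compares feed requests with photo uploads only. Counting every read and write listed above gives a much lower, though still read-heavy, ratio, because likes and comments add billions of daily writes. A quick sketch of both calculations:

```python
# Daily volumes from the Read Traffic and Write Traffic lists.
READS = {
    "feed_requests": 50_000_000_000,
    "photo_views": 100_000_000_000,
    "video_views": 20_000_000_000,
    "story_views": 10_000_000_000,
    "profile_views": 5_000_000_000,
    "search_queries": 2_000_000_000,
}
WRITES = {
    "photo_uploads": 100_000_000,
    "video_uploads": 50_000_000,
    "story_uploads": 500_000_000,
    "likes": 4_000_000_000,
    "comments": 1_000_000_000,
    "follows": 100_000_000,
}

# Narrow ratio: feed requests vs. photo uploads (the 500:1 figure).
feed_to_upload = READS["feed_requests"] / WRITES["photo_uploads"]

# Broad ratio: every listed read vs. every listed write.
all_reads = sum(READS.values())    # 187 billion
all_writes = sum(WRITES.values())  # 5.75 billion
overall = all_reads / all_writes

if __name__ == "__main__":
    print(f"feed:upload {feed_to_upload:,.0f}:1")
    print(f"all traffic {overall:,.1f}:1")
```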
Peak Load Scenarios
- Major Events: World Cup, Olympics, Celebrity posts
- Traffic Spike: 3-5x normal load
- Duration: 2-6 hours sustained peak
- Geographic Concentration: 70% traffic from event region
- Example: Celebrity wedding generates 50M photo views in 1 hour
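The celebrity example is modest in aggregate but concentrated on a single post. A quick comparison, assuming the 50M views are spread evenly over the hour and the 3-5x spike applies to the 100B/day photo-view baseline:

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def rate_per_second(events: float, window_s: float) -> float:
    """Average events per second over a time window (assumes an even spread)."""
    return events / window_s


# One post receiving 50M views in one hour (the celebrity wedding example).
hot_post_rps = rate_per_second(50_000_000, SECONDS_PER_HOUR)

# Platform-wide photo views (100B/day) and the 3-5x event spike.
baseline_rps = rate_per_second(100_000_000_000, SECONDS_PER_DAY)
spike_low, spike_high = 3 * baseline_rps, 5 * baseline_rps

if __name__ == "__main__":
    print(f"hot post:     {hot_post_rps:,.0f} views/s on one object")
    print(f"all photos:   {baseline_rps:,.0f} views/s average")
    print(f"during spike: {spike_low:,.0f}-{spike_high:,.0f} views/s")
```

Roughly 14,000 requests/second against one object is a hot-key load: served from a single cache node or database shard, it would all land on that one node, which is one reason CDN and edge caching matter for these events.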
Performance Targets
Latency Requirements
- Photo Upload: <5 seconds p95 (including processing)
- Video Upload: <30 seconds p95 (including transcoding)
- Feed Load: <1 second p95 (20 posts)
- Image Load: <500ms p95 (compressed)
- Video Playback Start: <2 seconds p95
- Story Load: <800ms p95
- Search Results: <300ms p95
- Profile Load: <600ms p95
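Each target above is a p95: 95% of requests must complete within the stated time. A minimal sketch of checking a batch of latency samples against one target with the nearest-rank percentile; monitoring systems often estimate percentiles from histograms rather than sorting raw samples, and the function names here are hypothetical:

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Smallest rank r such that at least p% of samples are <= ordered[r - 1].
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_target(samples_ms: list[float], target_ms: float, p: float = 95) -> bool:
    """True if the p-th percentile latency is below the target (e.g. feed load < 1,000ms)."""
    return percentile_nearest_rank(samples_ms, p) < target_ms


if __name__ == "__main__":
    # Hypothetical feed-load samples: 95 fast requests and 5 slow ones.
    feed_ms = [300.0] * 95 + [1_800.0] * 5
    print(meets_target(feed_ms, target_ms=1_000))
```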
Throughput Requirements
- API Gateway: 2 million requests/second peak
- Photo Ingestion: 3,500 photos/second peak
- Video Ingestion: 1,700 videos/second peak
- Feed Generation: 1.7M feeds/second peak
- Database Reads: 5 million queries/second
- Database Writes: 100,000 writes/second
- Cache Operations: 10 million ops/second
Availability Targets
- System Uptime: 99.95% (4.38 hours downtime/year)
- API Availability: 99.99% for critical endpoints
- Data Durability: 99.999999999% (eleven 9s)
- Regional Failover: <10 minutes to failover
- Disaster Recovery: RTO <4 hours, RPO <1 hour
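Each availability percentage translates into a yearly downtime budget: the unavailable fraction times the 8,760 hours in a (non-leap) year. A short sketch:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_budget_hours(availability_pct: float) -> float:
    """Allowed downtime per year for an availability target given in percent."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (99.95, 99.99):
        hours = downtime_budget_hours(target)
        print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

99.95% gives the 4.38 hours/year quoted above; the 99.99% target for critical endpoints leaves about 53 minutes per year.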
Data Storage Constraints
Database Storage
User Data:
2B users × 5KB per user = 10TB
Post Metadata:
100M posts/day × 365 days × 5 years × 2KB = 365TB
Social Graph:
2B users × 150 followers × 16 bytes = 4.8TB
Engagement Data (likes, comments):
4B likes/day × 365 days × 1 year × 24 bytes = 35TB/year
Total Database Storage: ~400TB
Media Storage
Photos (5 years):
Original: 100M/day × 365 × 5 × 2MB = 365PB
Compressed: 100M/day × 365 × 5 × 200KB = 36.5PB
Thumbnails: 100M/day × 365 × 5 × 20KB = 3.65PB
Videos (5 years):
Original: 50M/day × 365 × 5 × 20MB = 1,825PB
Compressed: 50M/day × 365 × 5 × 5MB = 456PB
Stories (30 days rolling):
500M/day × 30 × 2MB = 30PB
Total Media Storage: ~2,700PB
With deduplication (20% savings): ~2,160PB
Cache Storage
- Redis Cluster: 50TB for hot data
  - Feed cache: 30TB
  - User profiles: 10TB
  - Post metadata: 5TB
  - Session data: 5TB
Search Index Storage
- Elasticsearch Cluster: 100TB
  - User index: 20TB
  - Post index: 60TB
  - Hashtag index: 10TB
  - Location index: 10TB
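The database and media totals in this section can be recomputed from their inputs. A sketch in decimal units, assuming (as the text does) one year of like data and 2MB per story for the 30-day rolling window:

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15
DAYS_5Y = 365 * 5

# Database Storage inputs in bytes (assumption: one year of likes, as in the text).
DATABASE = {
    "user_data": 2_000_000_000 * 5 * KB,
    "post_metadata": 100_000_000 * DAYS_5Y * 2 * KB,
    "social_graph": 2_000_000_000 * 150 * 16,
    "likes_1_year": 4_000_000_000 * 365 * 24,
}

# Media Storage inputs in bytes: five years of photos and videos plus 30 days of stories.
MEDIA = {
    "photo_original": 100_000_000 * DAYS_5Y * 2 * MB,
    "photo_compressed": 100_000_000 * DAYS_5Y * 200 * KB,
    "photo_thumbnail": 100_000_000 * DAYS_5Y * 20 * KB,
    "video_original": 50_000_000 * DAYS_5Y * 20 * MB,
    "video_compressed": 50_000_000 * DAYS_5Y * 5 * MB,
    "stories_30_days": 500_000_000 * 30 * 2 * MB,
}

DEDUP_SAVINGS = 0.20  # 20% savings, as stated above

db_total = sum(DATABASE.values())
media_total = sum(MEDIA.values())
media_after_dedup = media_total * (1 - DEDUP_SAVINGS)

if __name__ == "__main__":
    print(f"database: {db_total / TB:,.1f} TB")
    print(f"media:    {media_total / PB:,.1f} PB ({media_after_dedup / PB:,.0f} PB after dedup)")
```

Unrounded, the database comes to about 415TB and media to about 2,716PB, or about 2,173PB after deduplication; the text rounds media to ~2,700PB before applying the 20% savings, which is why it arrives at ~2,160PB.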
Network Bandwidth
Ingress Bandwidth
Photo Upload:
100M photos/day × 2MB = 200TB/day = 2.3GB/s average
Video Upload:
50M videos/day × 20MB = 1PB/day = 11.6GB/s average
Story Upload:
500M stories/day × 2MB = 1PB/day = 11.6GB/s average
Total Ingress: ~25GB/s average, ~75GB/s peak
Egress Bandwidth
Photo Views:
100B views/day × 200KB = 20PB/day = 231GB/s average
Video Views:
20B views/day × 5MB = 100PB/day = 1,157GB/s average
Story Views:
10B views/day × 2MB = 20PB/day = 231GB/s average
Total Egress: ~1,600GB/s average, ~4,800GB/s peak
CDN Offload
- CDN Hit Rate: 90% for media content
- Origin Traffic: 10% of total = 160GB/s average
- CDN Bandwidth: 1,440GB/s average, 4,320GB/s peak
- CDN PoPs: 200+ locations globally
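Every bandwidth figure above is bytes per day divided by 86,400 seconds, and the CDN split applies the 90% hit rate to egress. A sketch that reproduces them; like the text, it assumes each view transfers the full compressed asset and uses 2MB per story:

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB = 10**3, 10**6, 10**9
CDN_HIT_RATE = 0.90
PEAK_MULTIPLIER = 3  # the peak figures above are ~3x the average


def gb_per_second(items_per_day: int, bytes_each: int) -> float:
    """Average transfer rate in GB/s for a daily item count of a given size."""
    return items_per_day * bytes_each / SECONDS_PER_DAY / GB


INGRESS = {
    "photo_upload": gb_per_second(100_000_000, 2 * MB),
    "video_upload": gb_per_second(50_000_000, 20 * MB),
    "story_upload": gb_per_second(500_000_000, 2 * MB),
}
# Assumption (as in the text): every view transfers the full compressed asset.
EGRESS = {
    "photo_views": gb_per_second(100_000_000_000, 200 * KB),
    "video_views": gb_per_second(20_000_000_000, 5 * MB),
    "story_views": gb_per_second(10_000_000_000, 2 * MB),
}

ingress_avg = sum(INGRESS.values())
egress_avg = sum(EGRESS.values())
cdn_avg = egress_avg * CDN_HIT_RATE   # served from CDN edge caches
origin_avg = egress_avg - cdn_avg     # cache misses that reach origin

if __name__ == "__main__":
    print(f"ingress: {ingress_avg:,.1f} GB/s avg, {ingress_avg * PEAK_MULTIPLIER:,.0f} GB/s peak")
    print(f"egress:  {egress_avg:,.0f} GB/s avg, {egress_avg * PEAK_MULTIPLIER:,.0f} GB/s peak")
    print(f"CDN: {cdn_avg:,.0f} GB/s, origin: {origin_avg:,.0f} GB/s")
```

The unrounded egress average is about 1,620GB/s (roughly 13Tbps), so the ~1,600GB/s used in the text and the 160GB/s origin figure are slightly rounded down.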
Compute Resources
Application Servers
- Total Servers: 100,000 application servers
- Server Specs: 32 vCPU, 128GB RAM per server
- Requests per Server: 20,000 requests/second
- Auto-scaling: Scale up to 300,000 servers during peaks
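Dividing the API Gateway peak by the per-server rate is a quick consistency check on the fleet size. Taken at face value, 20,000 requests/second per server covers the 2M requests/second peak with about 100 servers (200 at a hypothetical 50% target utilization), whereas 100,000 servers works out to about 20 requests/second each. Either the fleet carries substantial load beyond the API Gateway figure, or one of these numbers needs revisiting:

```python
import math

PEAK_API_RPS = 2_000_000      # API Gateway peak (Throughput Requirements)
FLEET_SIZE = 100_000          # application servers
PER_SERVER_RPS = 20_000       # stated requests/second per server


def servers_needed(peak_rps: int, per_server_rps: int, target_utilization: float = 0.5) -> int:
    """Servers needed so peak load uses at most `target_utilization` of capacity.

    target_utilization is a hypothetical planning parameter, not from the source.
    """
    return math.ceil(peak_rps / (per_server_rps * target_utilization))


# Load each server actually carries if the gateway peak is spread over the fleet.
load_per_server = PEAK_API_RPS / FLEET_SIZE

if __name__ == "__main__":
    print("servers needed at 50% utilization:", servers_needed(PEAK_API_RPS, PER_SERVER_RPS))
    print("requests/second per server at peak:", load_per_server)
```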
Database Servers
- Primary Databases: 2,000 shards
- Read Replicas: 10,000 replicas (5 per shard)
- Server Specs: 64 vCPU, 512GB RAM, 20TB NVMe SSD
- Connections per Server: 2,000 connections
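With 2,000 primaries and five replicas each, the database targets spread to modest per-server rates, provided keys are distributed evenly. A sketch of that arithmetic plus a stable hash-based shard lookup; sharding by user ID and the `shard_for` helper are hypothetical, since the section does not name a shard key:

```python
import hashlib

NUM_SHARDS = 2_000
REPLICAS_PER_SHARD = 5
DB_READS_QPS = 5_000_000    # Database Reads target
DB_WRITES_QPS = 100_000     # Database Writes target


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard using a stable hash (hypothetical shard key)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


# Average load per server, assuming reads go to replicas and writes to primaries.
reads_per_replica = DB_READS_QPS / (NUM_SHARDS * REPLICAS_PER_SHARD)
writes_per_primary = DB_WRITES_QPS / NUM_SHARDS

if __name__ == "__main__":
    print(f"reads/s per replica:  {reads_per_replica:,.0f}")
    print(f"writes/s per primary: {writes_per_primary:,.0f}")
    print("user 42 -> shard", shard_for(42))
```

Plain modulo hashing remaps most keys when the shard count changes; consistent hashing or a lookup directory are common alternatives when resharding is expected. Hot keys such as celebrity accounts can still overload a single shard regardless of the hash.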
Cache Servers
- Redis Nodes: 1,000 nodes
- Node Specs: 32 vCPU, 256GB RAM
- Cache Capacity: 50TB total
- Operations per Node: 100,000 ops/second
Media Processing Servers
- Image Processing: 5,000 workers
- Video Transcoding: 10,000 workers
- Worker Specs: 16 vCPU, 64GB RAM, GPU for video
- Processing Queue: Kafka with 1PB/day throughput
Message Queue Servers
- Kafka Brokers: 500 brokers
- Broker Specs: 32 vCPU, 128GB RAM, 40TB SSD
- Throughput per Broker: 200MB/s
- Total Throughput: 100GB/s
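Comparing the cache and message-queue fleets with the throughput they must sustain shows how much headroom the listed sizes leave. A sketch using the section's own numbers; Kafka load is assumed to be the 1PB/day media-processing stream spread evenly over the day:

```python
TB, GB, MB = 10**12, 10**9, 10**6


def utilization(required: float, capacity: float) -> float:
    """Fraction of nominal capacity consumed by the required load."""
    return required / capacity


# Redis operations: 10M ops/s required vs. 1,000 nodes x 100,000 ops/s.
cache_ops = utilization(10_000_000, 1_000 * 100_000)

# Redis memory: 50TB of hot data vs. 1,000 nodes x 256GB RAM.
cache_mem = utilization(50 * TB, 1_000 * 256 * GB)

# Kafka (assumption: 1PB/day spread evenly) vs. 500 brokers x 200MB/s.
kafka = utilization(10**15 / 86_400, 500 * 200 * MB)

if __name__ == "__main__":
    print(f"cache ops:    {cache_ops:.0%}")
    print(f"cache memory: {cache_mem:.0%}")
    print(f"kafka:        {kafka:.0%}")
```

Under these figures both tiers run at roughly 10-20% of nominal capacity. Replication (for example Redis replicas or a Kafka replication factor above 1) and the ~3x event peaks would consume part of that headroom; neither is accounted for in these numbers.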
Cost Estimates
Infrastructure Costs (Monthly)
Compute (Application Servers):
100,000 servers × $200/month = $20M/month
Database Servers:
12,000 servers × $500/month = $6M/month
Cache Servers:
1,000 servers × $300/month = $300K/month
Media Processing:
15,000 workers × $150/month = $2.25M/month
Storage (Database):
400TB × $100/TB/month = $40K/month
Storage (Media - S3):
2,160PB × $20/TB/month = $43.2M/month
CDN Bandwidth:
1,440GB/s × 2,592,000 s/month (30 days) ≈ 3,732,480TB/month × $50/TB ≈ $187M/month
Total Monthly Cost: ~$259M/month
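The monthly total can be reproduced line by line. For the CDN entry, 1,440GB/s sustained over a 30-day month (2,592,000 seconds) is about 3.73EB, priced at $50/TB. A sketch using the unit prices above:

```python
SECONDS_PER_MONTH = 30 * 86_400  # assumption: 30-day month (2,592,000 seconds)
DAU = 500_000_000

# Monthly cost in US dollars, using the unit prices listed above.
MONTHLY_COST = {
    "app_servers": 100_000 * 200,
    "db_servers": 12_000 * 500,          # 2,000 primaries + 10,000 replicas
    "cache_servers": 1_000 * 300,
    "media_processing": 15_000 * 150,
    "db_storage": 400 * 100,             # 400TB at $100/TB-month
    "media_storage": 2_160_000 * 20,     # 2,160PB = 2,160,000TB at $20/TB-month
    # 1,440GB/s for a full month, converted to TB, at $50/TB.
    "cdn_bandwidth": 1_440 * SECONDS_PER_MONTH / 1_000 * 50,
}

total = sum(MONTHLY_COST.values())
cost_per_dau = total / DAU

if __name__ == "__main__":
    for item, dollars in MONTHLY_COST.items():
        print(f"{item:<17} ${dollars / 1e6:8.2f}M ({dollars / total:5.1%})")
    print(f"total ${total / 1e6:.2f}M, ${cost_per_dau:.2f} per DAU")
```

This gives about $258.4M/month and $0.52 per DAU. The CDN line alone is roughly 72% of the total, so media egress volume and the per-TB CDN price dominate the cost per DAU far more than server counts do.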
Cost per DAU: $259M / 500M = $0.52 per DAU/month
Scaling Bottlenecks
Write Bottlenecks
- Photo Upload: 3,500 TPS peak requires distributed processing
- Video Transcoding: Compute-intensive; benefits from GPU acceleration
- Database Writes: 100,000 writes/second requires sharding
- Solution: Horizontal scaling, async processing, GPU clusters
Read Bottlenecks
- Feed Generation: 1.7M requests/second peak
- Media Delivery: 1,600GB/s egress bandwidth
- Database Reads: 5M queries/second requires replicas
- Solution: Aggressive caching, CDN, read replicas
Storage Bottlenecks
- Media Storage Growth: 3.8PB/day requires object storage
- Database Growth: 400TB requires sharding and archival
- Index Size: Search index grows with content
- Solution: Tiered storage, compression, deduplication
Network Bottlenecks
- Egress Bandwidth: 1,600GB/s requires CDN
- Cross-Region Traffic: Global users need regional data centers
- Media Delivery: 90% CDN hit rate critical
- Solution: Multi-region deployment, CDN, edge caching
Constraints and Limitations
Technical Constraints
- Photo Size: Max 8MB per photo
- Video Duration: Max 60 seconds for feed, 60 minutes for IGTV
- Carousel Limit: Max 10 photos/videos per post
- Caption Length: Max 2,200 characters
- Hashtag Limit: Max 30 hashtags per post
- Story Duration: 24 hours before expiration
- API Rate Limits: 200 requests per hour per user
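One natural place to enforce the per-post limits above is at upload time, before any media processing starts. A minimal validation sketch; the `MediaItem`/`PostDraft` types and `validate` function are hypothetical, sizes use decimal megabytes like the rest of the document, and hashtags are counted by a simple whitespace split:

```python
from dataclasses import dataclass

# Limits from the Technical Constraints list.
MAX_PHOTO_BYTES = 8 * 10**6          # 8MB, decimal like the rest of the document
MAX_FEED_VIDEO_SECONDS = 60
MAX_IGTV_SECONDS = 60 * 60
MAX_CAROUSEL_ITEMS = 10
MAX_CAPTION_CHARS = 2_200
MAX_HASHTAGS = 30


@dataclass
class MediaItem:  # hypothetical upload model
    kind: str                 # "photo" or "video"
    size_bytes: int
    duration_s: float = 0.0


@dataclass
class PostDraft:  # hypothetical upload model
    items: list[MediaItem]
    caption: str = ""
    is_igtv: bool = False


def validate(post: PostDraft) -> list[str]:
    """Return human-readable violations; an empty list means the post is accepted."""
    errors = []
    if not 1 <= len(post.items) <= MAX_CAROUSEL_ITEMS:
        errors.append(f"post must have 1-{MAX_CAROUSEL_ITEMS} items")
    max_video = MAX_IGTV_SECONDS if post.is_igtv else MAX_FEED_VIDEO_SECONDS
    for i, item in enumerate(post.items):
        if item.kind == "photo" and item.size_bytes > MAX_PHOTO_BYTES:
            errors.append(f"item {i}: photo exceeds 8MB")
        if item.kind == "video" and item.duration_s > max_video:
            errors.append(f"item {i}: video exceeds {max_video}s")
    if len(post.caption) > MAX_CAPTION_CHARS:
        errors.append("caption exceeds 2,200 characters")
    # Simplification: any whitespace-separated word starting with '#' is a hashtag.
    hashtags = [word for word in post.caption.split() if word.startswith("#")]
    if len(hashtags) > MAX_HASHTAGS:
        errors.append("more than 30 hashtags")
    return errors
```

Returning every violation rather than stopping at the first lets a client show all problems with a draft in one round trip.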
Business Constraints
- Content Moderation: <0.5% of content requires manual review
- Compliance: GDPR, CCPA, COPPA data retention and deletion
- Uptime SLA: 99.95% availability guarantee
- Data Residency: Regional data storage requirements
- Cost Optimization: <$0.52 per DAU per month target
Operational Constraints
- Deployment Frequency: Multiple deployments per day
- Monitoring: <5 minute detection for critical issues
- Incident Response: <15 minute response time
- Backup Frequency: Continuous replication, hourly snapshots
- Disaster Recovery: <4 hour RTO, <1 hour RPO
This comprehensive scale analysis provides the foundation for designing a system that can handle Instagram's massive scale while maintaining performance, reliability, and cost-effectiveness.