Back-of-Envelope Estimation: Complete Guide for System Design
Overview
Every system design interview expects you to ground your design in real numbers. Back-of-envelope estimation demonstrates that you can reason quantitatively about scale, identify bottlenecks before they happen, and make informed capacity decisions.
The goal isn't precision; it's demonstrating structured thinking and arriving at the right order of magnitude.
1. Powers of 2 Reference Table
Memorize these. They come up constantly.
| Power | Exact Value | Approximate | Common Name |
|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | 1 KB |
| 2^20 | 1,048,576 | ~1 Million | 1 MB |
| 2^30 | 1,073,741,824 | ~1 Billion | 1 GB |
| 2^40 | 1,099,511,627,776 | ~1 Trillion | 1 TB |
| 2^50 | ~1.13 × 10^15 | ~1 Quadrillion | 1 PB |
Quick Conversions
1 KB = 1,000 bytes (for estimation, use 10^3)
1 MB = 1,000 KB = 10^6 bytes
1 GB = 1,000 MB = 10^9 bytes
1 TB = 1,000 GB = 10^12 bytes
1 PB = 1,000 TB = 10^15 bytes
Time:
1 day = 86,400 seconds ≈ 10^5 seconds (use 100K)
1 month ≈ 2.5 × 10^6 seconds (use 2.5M)
1 year ≈ 3 × 10^7 seconds (use 30M)
Useful Multipliers
Seconds in a day: 86,400 ≈ 100,000 (10^5)
Seconds in a month: 2,592,000 ≈ 2.5 × 10^6
Seconds in a year: 31,536,000 ≈ 3 × 10^7
For QPS calculations, use:
Daily requests / 100,000 = average QPS
Peak QPS ≈ 2-5 × average QPS
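To see how much accuracy the 100,000-second shortcut gives up, the sketch below compares the exact and rounded conversions. The helper name `average_qps` and the sample volume of one billion requests per day are assumptions made for illustration.

```python
SECONDS_PER_DAY = 86_400           # exact
SECONDS_PER_DAY_ROUNDED = 100_000  # estimation shortcut (10^5)


def average_qps(daily_requests: float, seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average queries per second for a given daily request volume."""
    return daily_requests / seconds_per_day


daily_requests = 1_000_000_000  # hypothetical: 1 billion requests/day
exact = average_qps(daily_requests)
rounded = average_qps(daily_requests, SECONDS_PER_DAY_ROUNDED)

# The shortcut underestimates by 1 - 86,400/100,000 = 13.6%, which is small
# next to the 2-5x peak factor applied afterwards.
underestimate = 1 - rounded / exact

print(f"exact: {exact:,.0f} QPS, rounded: {rounded:,.0f} QPS, "
      f"underestimate: {underestimate:.1%}")
```

The rounded figure is always on the low side, so the peak multiplier is what protects the estimate, not the division itself.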
2. Latency Numbers Every Programmer Should Know
The Complete Table (2024 Numbers)
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 1 ns | |
| Branch mispredict | 3 ns | |
| L2 cache reference | 4 ns | |
| Mutex lock/unlock | 17 ns | |
| L3 cache reference | 12 ns | |
| Main memory reference | 100 ns | |
| Compress 1KB (Snappy) | 3,000 ns (3 μs) | |
| Read 1MB sequentially (memory) | 3,000 ns (3 μs) | |
| SSD random read | 16,000 ns (16 μs) | |
| Read 1MB sequentially (SSD) | 49,000 ns (49 μs) | |
| Round trip same datacenter | 500,000 ns (500 μs) | 0.5 ms |
| Read 1MB sequentially (HDD) | 825,000 ns (825 μs) | |
| Disk seek (HDD) | 2,000,000 ns (2 ms) | |
| Redis GET (same DC) | 500,000-1,000,000 ns | 0.5-1 ms |
| Database query (indexed) | 1-5 ms | |
| Send packet CA→Netherlands→CA | 150,000,000 ns | 150 ms |
| TLS handshake | 2-10 ms | Depends on version |
| TCP handshake (same region) | 0.5-1 ms | |
| TCP handshake (cross-region) | 50-150 ms | |
Visual Scale
1 ns    ████ L1 cache
4 ns    ████████ L2 cache
100 ns  ████████████████████████████████████████ RAM
        ↓ bar scale resets: 1 μs = 1000 ns ↓
16 μs   ████ SSD random read
49 μs   ████████████ SSD sequential 1 MB
500 μs  ████████████████████████████████████████████████████ Network (same DC)
        ↓ bar scale resets: 1 ms = 1000 μs ↓
2 ms    ████ HDD seek
5 ms    ████████████ DB query
150 ms  ████████████████████████████████████████████████████ Cross-continent
Key Takeaways for Design
Memory is over 100x faster than SSD (100 ns vs 16 μs) → Cache aggressively
An SSD random read is ~30x faster than a same-DC network round trip (16 μs vs 500 μs) → Minimize network hops
Same-DC network is 300x faster than cross-continent → Co-locate services
Sequential reads can be up to ~50x faster than random reads, depending on the medium → Design for sequential access
Compression is cheap → Compress before sending over network
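The ratios behind these takeaways can be recomputed directly from the latency table. This is a minimal sketch using the table's figures; the dictionary keys and the helper `times_slower` are names chosen here for illustration.

```python
# Latencies in nanoseconds, copied from the latency table.
LATENCY_NS = {
    "main_memory": 100,
    "ssd_random_read": 16_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 2_000_000,
    "cross_continent_round_trip": 150_000_000,
}


def times_slower(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(times_slower("ssd_random_read", "main_memory"))                    # 160.0
print(times_slower("same_dc_round_trip", "ssd_random_read"))             # 31.25
print(times_slower("cross_continent_round_trip", "same_dc_round_trip"))  # 300.0
```

Keeping the numbers in one lookup makes it easy to sanity-check any "X is N times faster than Y" claim before stating it in an interview.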
3. How to Estimate QPS from DAU
Framework
Given: DAU (Daily Active Users)
Step 1: Estimate actions per user per day
Step 2: Calculate daily requests
Step 3: Convert to QPS (÷ 86,400 ≈ ÷ 100,000)
Step 4: Estimate peak (2-5× average)
Formula:
Average QPS = (DAU × actions_per_user) / 86,400
Peak QPS = Average QPS × peak_factor (typically 2-5×)
Common Ratios
| Platform Type | Actions/User/Day | Read:Write Ratio |
|---|---|---|
| Social media (read-heavy) | 20-50 reads, 1-2 writes | 100:1 |
| Messaging | 20-40 messages sent | 1:1 (send/receive) |
| E-commerce | 5-10 page views, 0.1 purchases | 100:1 |
| Search engine | 5-10 searches | Read-only |
| Video streaming | 2-5 videos watched | 1000:1 (views:uploads) |
| File storage | 3-5 reads, 1-2 writes | 5:1 |
Worked Example: Twitter-like Service
Given: 300M DAU
Tweets:
- Average user posts 0.5 tweets/day
- Write QPS = 300M × 0.5 / 100K = 1,500 QPS
- Peak write QPS = 1,500 × 3 = 4,500 QPS
Timeline reads:
- Average user reads timeline 10 times/day
- Read QPS = 300M × 10 / 100K = 30,000 QPS
- Peak read QPS = 30,000 × 3 = 90,000 QPS
Fanout:
- Average 200 followers per user
- Fanout writes = 1,500 × 200 = 300,000 writes/sec to timelines
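The framework and the Twitter-like example can be expressed as a small helper. This is a sketch under the example's own assumptions (rounded 100,000 seconds per day, 3× peak factor); `QpsEstimate` and `estimate_qps` are hypothetical names introduced here.

```python
from dataclasses import dataclass

SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for mental math


@dataclass
class QpsEstimate:
    average: float
    peak: float


def estimate_qps(dau: float, actions_per_user: float, peak_factor: float = 3.0) -> QpsEstimate:
    """Steps 2-4 of the framework: daily requests -> average QPS -> peak QPS."""
    daily_requests = dau * actions_per_user
    average = daily_requests / SECONDS_PER_DAY_ROUNDED
    return QpsEstimate(average=average, peak=average * peak_factor)


dau = 300_000_000
writes = estimate_qps(dau, actions_per_user=0.5)  # tweets posted
reads = estimate_qps(dau, actions_per_user=10)    # timeline loads
fanout_writes = writes.average * 200              # 200 followers per user

print(writes)         # QpsEstimate(average=1500.0, peak=4500.0)
print(reads)          # QpsEstimate(average=30000.0, peak=90000.0)
print(fanout_writes)  # 300000.0
```

Passing `peak_factor` explicitly makes the assumption visible, which is the part an interviewer is most likely to challenge.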
4. Storage Estimation Framework
Formula
Total Storage = per_item_size × items_per_user_per_day × total_users × retention_days
Growth rate = new_items_per_day × per_item_size
Common Data Sizes
| Data Type | Typical Size |
|---|---|
| User ID (UUID) | 16 bytes |
| Timestamp | 8 bytes |
| Short text (tweet) | 280 bytes (UTF-8) |
| Email/message | 5-50 KB |
| User profile (JSON) | 1-5 KB |
| Thumbnail image | 10-50 KB |
| Standard photo | 200 KB - 2 MB |
| High-res photo | 2-10 MB |
| 1 min video (720p) | 10-20 MB |
| 1 min video (1080p) | 30-60 MB |
| 1 min video (4K) | 100-300 MB |
| 1 hour video (1080p) | 2-4 GB |
| Database row (typical) | 200 bytes - 2 KB |
| Log entry | 200-500 bytes |
Worked Example: Instagram-like Service
Given: 500M DAU, 2B total users
Photos:
- 10% of DAU upload daily = 50M photos/day
- Average photo size: 2 MB (original) + 200 KB (thumbnails) ≈ 2.2 MB
- Daily storage: 50M × 2.2 MB = 110 TB/day
- Annual storage: 110 TB × 365 = ~40 PB/year
Metadata:
- Per photo: user_id (8 B) + timestamp (8 B) + location (16 B) + caption (500 B) + tags (200 B) ≈ 1 KB
- Daily metadata: 50M × 1 KB = 50 GB/day
- Annual metadata: 50 GB × 365 = ~18 TB/year
Total after 5 years:
- Photos: ~200 PB
- Metadata: ~90 TB
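The Instagram-like numbers can be reproduced with one multiplication per line item. The sketch below uses decimal units and assumes steady uploads with no deletion or replication; `storage_growth` is a hypothetical helper name.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15  # decimal units, in bytes


def storage_growth(items_per_day: float, bytes_per_item: float, days: int) -> float:
    """Bytes accumulated after `days` of steady uploads (no deletion, no replication)."""
    return items_per_day * bytes_per_item * days


photos_per_day = 500_000_000 * 0.10  # 10% of 500M DAU upload one photo

photo_bytes_per_day = storage_growth(photos_per_day, 2.2 * MB, days=1)
photo_bytes_5y = storage_growth(photos_per_day, 2.2 * MB, days=5 * 365)
metadata_bytes_5y = storage_growth(photos_per_day, 1 * KB, days=5 * 365)

print(f"{photo_bytes_per_day / TB:.0f} TB/day")     # 110 TB/day
print(f"{photo_bytes_5y / PB:.0f} PB in 5 years")   # 201 PB in 5 years
print(f"{metadata_bytes_5y / TB:.0f} TB metadata")  # 91 TB metadata
```

The unrounded results (201 PB, 91 TB) land within a percent or two of the hand-rounded figures, which is the level of agreement an estimate needs.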
5. Bandwidth Estimation
Formula
Bandwidth = QPS × average_response_size
Ingress (incoming) = write_QPS × request_payload_size
Egress (outgoing) = read_QPS × response_payload_size
Worked Example: Video Streaming Service
Given: 100M DAU, average 1 hour of video/day
Streaming bandwidth:
- Concurrent viewers (assume 10% at peak): 10M
- Bitrate: 5 Mbps (1080p adaptive)
- Peak egress: 10M × 5 Mbps = 50 Tbps (50 terabits/sec)
This is why CDNs are essential!
With CDN (90% cache hit): Origin serves 5 Tbps
Upload bandwidth:
- 500K videos uploaded/day
- Average video: 500 MB
- Upload bandwidth: 500K × 500 MB / 86,400 s = ~3 GB/s = 24 Gbps
Bandwidth Cost Reference
AWS Data Transfer (2024 approximate):
- Inbound: Free
- Outbound (first 10 TB): $0.09/GB
- Outbound (next 40 TB): $0.085/GB
- Outbound (100 TB+): $0.07/GB
- CloudFront: $0.02-0.085/GB (cheaper than direct)
- Same-region transfer: $0.01/GB
- Cross-region: $0.02/GB
Example: Serving 1 PB/month outbound
= 1,000,000 GB × $0.07 = $70,000/month
With CloudFront: ~$50,000/month
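The bandwidth and egress-cost arithmetic in this section can be checked with a few one-line functions. This is a sketch, not a pricing calculator: it applies a single flat rate as the 1 PB example does, and all function names are assumptions introduced here.

```python
MBPS, GBPS, TBPS = 10**6, 10**9, 10**12  # bits per second


def peak_egress_bps(concurrent_viewers: float, bitrate_bps: float) -> float:
    """Peak egress in bits per second."""
    return concurrent_viewers * bitrate_bps


def origin_bps(total_bps: float, cdn_hit_ratio: float) -> float:
    """Traffic that still reaches the origin after CDN cache hits."""
    return total_bps * (1 - cdn_hit_ratio)


def upload_bytes_per_sec(uploads_per_day: float, bytes_per_upload: float) -> float:
    """Average ingress in bytes per second, using the exact 86,400 s/day."""
    return uploads_per_day * bytes_per_upload / 86_400


def flat_egress_cost(gb_per_month: float, usd_per_gb: float) -> float:
    """Monthly egress cost at one flat rate (ignores tiered pricing)."""
    return gb_per_month * usd_per_gb


peak = peak_egress_bps(100_000_000 * 0.10, 5 * MBPS)
origin = origin_bps(peak, cdn_hit_ratio=0.90)
ingress_bps = upload_bytes_per_sec(500_000, 500 * 10**6) * 8  # bytes -> bits

print(f"peak egress: {peak / TBPS:.0f} Tbps")            # 50 Tbps
print(f"origin:      {origin / TBPS:.0f} Tbps")          # 5 Tbps
print(f"ingress:     {ingress_bps / GBPS:.0f} Gbps")     # 23 Gbps (about 2.9 GB/s)
print(f"egress cost: ${flat_egress_cost(1_000_000, 0.07):,.0f}/month")  # $70,000/month
```

Note the factor of 8 between bytes and bits: storage sizes are quoted in bytes, link speeds in bits, and mixing them is a common source of 8× errors.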
6. Cost Estimation
Compute Costs (2024 Approximate)
| Resource | Cost | Notes |
|---|---|---|
| AWS EC2 m5.xlarge (4 vCPU, 16GB) | $0.192/hr ≈ $140/month | General purpose |
| AWS EC2 c5.2xlarge (8 vCPU, 16GB) | $0.34/hr ≈ $245/month | Compute optimized |
| AWS EC2 r5.2xlarge (8 vCPU, 64GB) | $0.504/hr ≈ $363/month | Memory optimized |
| AWS Lambda | $0.20 per 1M requests + duration | Serverless |
| Kubernetes pod (1 vCPU, 2GB) | ~$50-70/month | Managed K8s |
Storage Costs
| Storage Type | Cost (per GB/month unless noted) | Use Case |
|---|---|---|
| S3 Standard | $0.023 | Frequently accessed |
| S3 Infrequent Access | $0.0125 | Monthly access |
| S3 Glacier | $0.004 | Archival |
| EBS gp3 (SSD) | $0.08 | Database volumes |
| EBS io2 (high IOPS) | $0.125 + IOPS cost | High-performance DB |
| RDS PostgreSQL (db.r5.xlarge) | ~$500/month per instance | Managed database |
| ElastiCache Redis (r5.large) | ~$200/month per node | In-memory cache |
| DynamoDB (on-demand) | $1.25/M writes, $0.25/M reads | NoSQL |
Quick Cost Estimation Template
Monthly cost estimate for a service with:
- 10M DAU, 1000 QPS average, 3000 QPS peak
Compute (handle 3000 QPS):
- Each server handles ~500 QPS
- Need: 6 servers + 2 redundancy = 8 servers
- 8 × m5.xlarge = 8 × $140 = $1,120/month
Database:
- Primary + replica: 2 × db.r5.xlarge = $1,000/month
- Storage (500GB): 500 × $0.08 = $40/month
Cache:
- Redis cluster (3 nodes): 3 × $200 = $600/month
Storage (S3):
- 10TB media: 10,000 × $0.023 = $230/month
CDN:
- 50TB egress: ~$4,000/month
Load Balancer:
- ALB: ~$50/month
Total: ~$7,040/month ≈ $85K/year
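The template's line items can be tallied in a few lines, which makes it easy to see which component dominates. This sketch uses the approximate monthly prices quoted in this section; `servers_needed` and the dictionary keys are illustrative names, not a real pricing API.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, spares: int = 2) -> int:
    """Servers to carry peak load, plus a fixed number of spares."""
    return math.ceil(peak_qps / qps_per_server) + spares


# Monthly prices in USD; treat every figure as approximate.
line_items = {
    "compute": servers_needed(3_000, 500) * 140,  # m5.xlarge at ~$140/month
    "database": 2 * 500,                          # primary + replica
    "db_storage": 500 * 0.08,                     # 500 GB on gp3
    "cache": 3 * 200,                             # 3 Redis nodes
    "object_storage": 10_000 * 0.023,             # 10 TB in S3 Standard
    "cdn": 4_000,                                 # ~50 TB egress
    "load_balancer": 50,
}

monthly = sum(line_items.values())
largest = max(line_items, key=line_items.get)

print(f"monthly: ${monthly:,.0f}, yearly: ${monthly * 12:,.0f}")
print(f"largest line item: {largest}")
```

In this example egress is more than half the bill, which is typical for media-heavy services and worth calling out when discussing cost.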
7. Common Estimation Examples
Example 1: Twitter/X
Scale:
- 400M DAU
- 500M tweets/day
- Average tweet: 280 chars + metadata ≈ 500 bytes
- Average user follows 200 accounts
- Timeline: 200 tweets shown
QPS:
- Tweet writes: 500M / 100K = 5,000 QPS
- Timeline reads: 400M × 10 reads/day / 100K = 40,000 QPS
- Peak: 5× = 200,000 read QPS
Storage (tweets only):
- Daily: 500M × 500 B = 250 GB/day
- Annual: 250 GB × 365 = ~90 TB/year
- 5 years: ~450 TB
Fanout:
- 5,000 tweets/sec × 200 avg followers = 1M timeline writes/sec
- Celebrity tweet (50M followers): single tweet → 50M writes
Example 2: YouTube
Scale:
- 2B monthly users, 500M DAU
- 500 hours of video uploaded per minute
- Average video: 5 minutes, 50 MB (compressed)
Upload:
- 500 hours/min × 1,440 min/day = 720,000 hours/day
- 720,000 hours × 60 min × (50 MB / 5 min) ≈ 430 TB/day raw
- With transcoding (5 resolutions): 430 TB × 5 ≈ 2.2 PB/day
Storage:
- Daily new content: ~2.2 PB (all resolutions)
- Annual: ~800 PB (about 0.8 EB)
Streaming:
- 500M DAU × 40 min average watch time
- Concurrent viewers (peak): ~50M
- Bandwidth: 50M × 5 Mbps = 250 Tbps peak
- CDN handles 95%: Origin = 12.5 Tbps
QPS:
- Video views: 500M × 8 videos/day / 100K = 40,000 QPS
- Search: 500M × 3 searches/day / 100K = 15,000 QPS
Example 3: WhatsApp
Scale:
- 2B users, 500M DAU
- 100B messages/day
QPS:
- Messages: 100B / 100K = 1,000,000 QPS (1M QPS!)
- Peak: 3M QPS
Storage:
- Average message: 100 bytes (text)
- Daily text: 100B messages × 100 bytes = 10 TB/day
- Media messages (20% of total): 20B × 200 KB avg = 4 PB/day
- 30-day retention: 120 PB media
Connection management:
- 500M concurrent connections (WebSocket/MQTT)
- Each connection: ~10 KB memory
- Total memory for connections: 500M × 10 KB = 5 TB RAM
- At 64 GB per server: ~80 servers by memory alone
- In practice the per-server connection limit dominates: ~2M connections per server with efficient protocols
- Need: ~250 connection servers
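Connection-server sizing has two independent constraints, memory and connection count, and the fleet must satisfy whichever is tighter. The sketch below checks both with the example's figures (10 KB per connection, 64 GB per server, 2M connections per server); the variable names are illustrative.

```python
import math

KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, in bytes

connections = 500_000_000
memory_per_connection = 10 * KB

total_memory = connections * memory_per_connection
servers_by_memory = math.ceil(total_memory / (64 * GB))
servers_by_connections = math.ceil(connections / 2_000_000)

# Provision for whichever constraint demands more servers.
servers = max(servers_by_memory, servers_by_connections)

print(total_memory / TB)       # 5.0
print(servers_by_memory)       # 79
print(servers_by_connections)  # 250
print(servers)                 # 250
```

Here the connection limit, not RAM, sets the fleet size, which is why per-connection efficiency matters more than adding memory.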
8. Rules of Thumb
The 80/20 Rule (Pareto Principle)
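Roughly 20% of the items account for about 80% of the activity.
Application:
- Cache size: roughly the hottest 20% of the data
- Hot partition: expect roughly 20% of partitions to take about 80% of the traffic
Read/Write Ratios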
| System Type | Read:Write | Implication |
|---|---|---|
| Social media | 100:1 to 1000:1 | Optimize for reads, cache heavily |
| Messaging | 1:1 | Balance read/write paths |
| Logging/Analytics | 1:100 | Optimize for writes (LSM-tree) |
| E-commerce catalog | 100:1 | Cache product pages |
| Financial trading | 1:1 | Low latency both directions |
Server Capacity Rules of Thumb
Single server can handle:
- Web server (Nginx): 10,000-100,000 concurrent connections
- Application server: 500-2,000 QPS (depends on complexity)
- Database (PostgreSQL): 5,000-20,000 QPS (simple queries)
- Redis: 100,000-200,000 QPS
- Kafka broker: 100,000-200,000 messages/sec
Memory:
- Modern server: 64-512 GB RAM
- Redis: 25 GB usable per instance (leave room for overhead)
- JVM application: 4-32 GB heap typical
Disk:
- SSD IOPS: 10,000-100,000 (depends on drive)
- SSD throughput: 500 MB/s - 3 GB/s
- HDD IOPS: 100-200
- HDD throughput: 100-200 MB/s
Network Rules of Thumb
- 1 Gbps link: ~125 MB/s throughput
- 10 Gbps link: ~1.25 GB/s throughput
- Typical server NIC: 10-25 Gbps
- Cross-AZ latency: 1-2 ms
- Cross-region latency: 50-200 ms
- CDN edge to user: 5-30 ms
Estimation Process (Interview Template)
Step 1: Clarify scale
"How many users? DAU? Geographic distribution?"
Step 2: Estimate traffic
- Actions per user per day
- Calculate QPS (average and peak)
- Identify read vs write ratio
Step 3: Estimate storage
- Per-item size × items/day × retention
- Separate hot (SSD/cache) from cold (HDD/S3)
Step 4: Estimate bandwidth
- QPS × payload size
- Identify if CDN is needed
Step 5: Estimate compute
- QPS / per-server-capacity = number of servers
- Add redundancy (N+2 or 3× for HA)
Step 6: Identify bottlenecks
- Which resource hits limits first?
- Where do we need to scale horizontally?
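Steps 2 through 5 chain together mechanically, so they can be sketched as one function that turns a handful of stated assumptions into traffic, storage, bandwidth, and server counts. Everything here is illustrative: the `Workload` fields, the default of 1,000 QPS per server, and the sample service are assumptions, not figures from a real system.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    dau: int
    reads_per_user: float
    writes_per_user: float
    bytes_per_write: int   # stored size of one item
    bytes_per_read: int    # response payload size
    retention_days: int
    peak_factor: float = 3.0
    qps_per_server: int = 1_000


def estimate(w: Workload) -> dict:
    seconds = 100_000  # rounded seconds per day
    read_qps = w.dau * w.reads_per_user / seconds
    write_qps = w.dau * w.writes_per_user / seconds
    peak_qps = (read_qps + write_qps) * w.peak_factor
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_qps": peak_qps,
        "storage_bytes": w.dau * w.writes_per_user * w.bytes_per_write * w.retention_days,
        "peak_egress_bytes_per_sec": read_qps * w.peak_factor * w.bytes_per_read,
        "servers": math.ceil(peak_qps / w.qps_per_server) + 2,  # N+2 redundancy
    }


# Hypothetical service: 10M DAU, 20 reads and 2 writes per user per day.
result = estimate(Workload(dau=10_000_000, reads_per_user=20, writes_per_user=2,
                           bytes_per_write=1_000, bytes_per_read=10_000,
                           retention_days=365))
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

Step 6 then becomes a matter of comparing each output against the per-server limits listed earlier to see which resource saturates first.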
Interview Cheat Sheet
| When interviewer asks... | Framework to use |
|---|---|
| "How many servers do we need?" | QPS / per-server-capacity + redundancy |
| "How much storage?" | per_item × items/day × retention |
| "Can a single database handle this?" | Compare QPS to DB limits (5-20K QPS) |
| "Do we need a cache?" | If read QPS > DB capacity, yes |
| "Do we need a CDN?" | If serving media to global users, yes |
| "What's the cost?" | Compute + storage + bandwidth + managed services |
| "How to handle peak traffic?" | Auto-scaling, over-provision 3×, queue overflow |
Common Mistakes to Avoid
Forgetting peak vs average: Design for peak, not average
Ignoring metadata: Indexes, replicas, and overhead add 2-3× raw data size
Forgetting redundancy: Always multiply by replication factor (typically 3×)
Precise numbers: Round aggressively. 86,400 ≈ 100,000. Don't waste time on arithmetic.
Not stating assumptions: Always say "assuming X" so interviewer can correct you
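The first three mistakes all come down to forgetting a multiplier, so it can help to make the multipliers explicit arguments. In this sketch `overhead_factor` is assumed to cover indexes and metadata only, with replicas counted separately through `replication_factor`; both function names are hypothetical.

```python
TB = 10**12  # decimal terabyte, in bytes


def provisioned_bytes(raw_bytes: float, overhead_factor: float = 2.0,
                      replication_factor: int = 3) -> float:
    """Raw data size scaled by index/metadata overhead and replica count."""
    return raw_bytes * overhead_factor * replication_factor


def provisioned_qps(average_qps: float, peak_factor: float = 3.0) -> float:
    """Capacity target: size for peak, not average."""
    return average_qps * peak_factor


print(provisioned_bytes(100 * TB) / TB)  # 600.0
print(provisioned_qps(1_000))            # 3000.0
```

With these defaults, 100 TB of raw data becomes 600 TB of provisioned storage, a 6× gap that is easy to miss when only the raw figure is stated.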