Back-of-Envelope Estimation: Complete Guide for System Design

Overview

Every system design interview expects you to ground your design in real numbers. Back-of-envelope estimation demonstrates that you can reason quantitatively about scale, identify bottlenecks before they happen, and make informed capacity decisions.

The goal isn't precision; it's demonstrating structured thinking and arriving at the right order of magnitude.


1. Powers of 2 Reference Table

Memorize these. They come up constantly.

Power   Exact Value                  Approximate       Common Name
2^10    1,024                        ~1 Thousand       1 KB
2^20    1,048,576                    ~1 Million        1 MB
2^30    1,073,741,824                ~1 Billion        1 GB
2^40    1,099,511,627,776            ~1 Trillion       1 TB
2^50    1,125,899,906,842,624        ~1 Quadrillion    1 PB

Quick Conversions

1 KB = 1,000 bytes (for estimation, use 10^3)
1 MB = 1,000 KB = 10^6 bytes
1 GB = 1,000 MB = 10^9 bytes
1 TB = 1,000 GB = 10^12 bytes
1 PB = 1,000 TB = 10^15 bytes
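
To see how much the decimal shortcut gives away, the short sketch below compares each power of 2 with its power-of-10 stand-in. The helper name `rounding_error` is made up for illustration and is not part of any library.

```python
# Compare each binary unit (2^(10k)) with the decimal shortcut (10^(3k)).
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def rounding_error(exponent: int) -> float:
    """Relative error of using 10^(3k) in place of 2^(10k), where exponent = 10k."""
    exact = 2 ** exponent
    approx = 10 ** (3 * exponent // 10)
    return (exact - approx) / approx


for k, unit in enumerate(UNITS, start=1):
    exponent = 10 * k
    print(f"1 {unit}: 2^{exponent} = {2 ** exponent:,} "
          f"({rounding_error(exponent):+.1%} vs 10^{3 * k})")
```

The gap grows from about 2.4% at KB to roughly 12.6% at PB, which is still well inside order-of-magnitude accuracy.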

Time:
1 day   = 86,400 seconds      ≈ 10^5 seconds (use 100K)
1 month = 2,592,000 seconds   ≈ 2.5 × 10^6 seconds (use 2.5M)
1 year  = 31,536,000 seconds  ≈ 3 × 10^7 seconds (use 30M)

Useful Multipliers

For QPS calculations, use:
  Daily requests / 100,000 = average QPS
  Peak QPS ≈ 2-5× average QPS

Rounding 86,400 up to 100,000 understates average QPS by about 14%, which is acceptable at order-of-magnitude precision.

2. Latency Numbers Every Programmer Should Know

The Complete Table (2024 Numbers)

Operation                          Latency                   Notes
L1 cache reference                 1 ns
Branch mispredict                  3 ns
L2 cache reference                 4 ns
L3 cache reference                 12 ns
Mutex lock/unlock                  17 ns
Main memory reference              100 ns
Compress 1KB (Snappy)              3,000 ns (3 μs)
Read 1MB sequentially (memory)     3,000 ns (3 μs)
SSD random read                    16,000 ns (16 μs)
Read 1MB sequentially (SSD)        49,000 ns (49 μs)
Round trip same datacenter         500,000 ns (500 μs)       0.5 ms
Read 1MB sequentially (HDD)        825,000 ns (825 μs)
Disk seek (HDD)                    2,000,000 ns (2 ms)
Redis GET (same DC)                500,000-1,000,000 ns      0.5-1 ms
Database query (indexed)           1-5 ms
Send packet CA→Netherlands→CA      150,000,000 ns            150 ms
TLS handshake                      2-10 ms                   Depends on version
TCP handshake (same region)        0.5-1 ms
TCP handshake (cross-region)       50-150 ms

Visual Scale

1 ns    ████ L1 cache
4 ns    ████████ L2 cache
100 ns  ████████████████████████████████████████ RAM

          ↓ units change: ns to μs (1000x) ↓

16 μs   ████ SSD random read
49 μs   ████████████ SSD sequential 1MB
500 μs  ████████████████████████████████████████████████████ Network (same DC)

          ↓ units change: μs to ms (1000x) ↓

2 ms    ████ HDD seek
5 ms    ████████████ DB query
150 ms  ████████████████████████████████████████████████████ Cross-continent

Key Takeaways for Design

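  1. Memory is over 100x faster than SSD (100 ns vs 16 μs random read) → Cache aggressively
  2. An SSD random read is ~30x faster than a same-DC round trip (16 μs vs 500 μs) → Minimize network hops
  3. Same-DC network is 300x faster than cross-continent (0.5 ms vs 150 ms) → Co-locate services
  4. Sequential reads are much faster than random reads, most of all on HDD where every random read pays a ~2 ms seek → Design for sequential access
  5. Compression is cheap (~3 μs per KB) → Compress before sending over network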
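
The ratios behind these takeaways can be checked directly against the latency table. The snippet below is a minimal sketch; the dictionary keys and the `slowdown` helper are illustrative names, and the values are the table's approximate figures.

```python
# Latencies in nanoseconds, copied from the latency table above.
LATENCY_NS = {
    "main_memory": 100,
    "ssd_random_read": 16_000,
    "same_dc_round_trip": 500_000,
    "cross_continent_round_trip": 150_000_000,
}


def slowdown(faster: str, slower: str) -> float:
    """How many times slower `slower` is than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


steps = list(LATENCY_NS)
# Walk the hierarchy one hop at a time: memory -> SSD -> same DC -> cross-continent.
for faster, slower in zip(steps, steps[1:]):
    print(f"{slower} is {slowdown(faster, slower):,.0f}x slower than {faster}")
```

Each hop down the hierarchy costs between one and three orders of magnitude, which is the reason caching and co-location pay off.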

3. How to Estimate QPS from DAU

Framework

Given: DAU (Daily Active Users)

Step 1: Estimate actions per user per day
Step 2: Calculate daily requests
Step 3: Convert to QPS (÷ 86,400 ≈ ÷ 100,000)
Step 4: Estimate peak (2-5× average)

Formula:
  Average QPS = (DAU × actions_per_user) / 86,400
  Peak QPS = Average QPS × peak_factor (typically 2-5×)
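
The formula translates into a few lines of code. This is a sketch under stated assumptions: the function name `estimate_qps`, the default `peak_factor` of 3, and the sample inputs (10M DAU, 20 actions per user) are hypothetical values picked for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(dau: int, actions_per_user: float,
                 peak_factor: float = 3.0, rounded: bool = True):
    """Return (average_qps, peak_qps).

    With rounded=True the day is treated as 100,000 seconds, matching the
    mental-math shortcut; with rounded=False the exact 86,400 is used.
    """
    seconds = 100_000 if rounded else SECONDS_PER_DAY
    average = dau * actions_per_user / seconds
    return average, average * peak_factor


# Hypothetical service: 10M DAU, 20 requests per user per day.
avg, peak = estimate_qps(10_000_000, 20)
exact_avg, _ = estimate_qps(10_000_000, 20, rounded=False)
print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS (exact average {exact_avg:,.0f})")
```

The rounded and exact averages differ by about 14%, far less than the uncertainty in the actions-per-user guess.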

Common Ratios

Platform Type                Actions/User/Day                  Read:Write Ratio
Social media (read-heavy)    20-50 reads, 1-2 writes           100:1
Messaging                    20-40 messages sent               1:1 (send/receive)
E-commerce                   5-10 page views, 0.1 purchases    100:1
Search engine                5-10 searches                     Read-only
Video streaming              2-5 videos watched                1000:1 (views:uploads)
File storage                 3-5 reads, 1-2 writes             5:1

Worked Example: Twitter-like Service

Given: 300M DAU

Tweets:
  - Average user posts 0.5 tweets/day
  - Write QPS = 300M × 0.5 / 100K = 1,500 QPS
  - Peak write QPS = 1,500 × 3 = 4,500 QPS

Timeline reads:
  - Average user reads timeline 10 times/day
  - Read QPS = 300M × 10 / 100K = 30,000 QPS
  - Peak read QPS = 30,000 × 3 = 90,000 QPS

Fanout:
  - Average 200 followers per user
  - Fanout writes = 1,500 × 200 = 300,000 writes/sec to timelines

4. Storage Estimation Framework

Formula

Total Storage = per_item_size × items_per_user_per_day × total_users × retention_days

Growth rate = new_items_per_day × per_item_size
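
A small helper makes the units explicit. It is a sketch that assumes items are counted per user per day and retention is given in days; the function names, the optional `replication_factor`, and the sample inputs are illustrative and not from the original text.

```python
def estimate_storage_bytes(item_size_bytes: float, items_per_user_per_day: float,
                           users: int, retention_days: int,
                           replication_factor: int = 1):
    """Return (daily_growth_bytes, total_bytes) for the retention window."""
    daily = item_size_bytes * items_per_user_per_day * users
    total = daily * retention_days * replication_factor
    return daily, total


def human(n_bytes: float) -> str:
    """Format a byte count with decimal units (1 KB = 1,000 bytes)."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} EB"


# Hypothetical inputs: 1 KB items, 2 items per user per day, 50M users, 1 year.
daily, total = estimate_storage_bytes(1_000, 2, 50_000_000, 365)
print(f"growth {human(daily)}/day, total {human(total)} after one year")
```

Passing `replication_factor=3` gives the replicated footprint, which is usually the number that matters for capacity planning.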

Common Data Sizes

Data Type                 Typical Size
User ID (UUID)            16 bytes
Timestamp                 8 bytes
Short text (tweet)        280 bytes (280 ASCII characters; multi-byte UTF-8 text is larger)
Email/message             5-50 KB
User profile (JSON)       1-5 KB
Thumbnail image           10-50 KB
Standard photo            200 KB - 2 MB
High-res photo            2-10 MB
1 min video (720p)        10-20 MB
1 min video (1080p)       30-60 MB
1 min video (4K)          100-300 MB
1 hour video (1080p)      2-4 GB
Database row (typical)    200 bytes - 2 KB
Log entry                 200-500 bytes

Worked Example: Instagram-like Service

Given: 500M DAU, 2B total users

Photos:
  - 10% of DAU upload daily = 50M photos/day
  - Average photo size: 2MB (original) + 200KB (thumbnails) ≈ 2.2MB
  - Daily storage: 50M × 2.2MB = 110TB/day
  - Annual storage: 110TB × 365 = ~40PB/year

Metadata:
  - Per photo: user_id(8B) + timestamp(8B) + location(16B) +
    caption(500B) + tags(200B) ≈ 1KB
  - Daily metadata: 50M × 1KB = 50GB/day
  - Annual metadata: 50GB × 365 = ~18TB/year

Total after 5 years:
  - Photos: ~200PB
  - Metadata: ~90TB

5. Bandwidth Estimation

Formula

Bandwidth = QPS × average_response_size

Ingress (incoming) = write_QPS × request_payload_size
Egress (outgoing) = read_QPS × response_payload_size

Worked Example: Video Streaming Service

Given: 100M DAU, average 1 hour of video/day

Streaming bandwidth:
  - Concurrent viewers (assume 10% of DAU at peak): 10M
  - Bitrate: 5 Mbps (1080p adaptive)
  - Peak egress: 10M × 5 Mbps = 50 Tbps (50 terabits/sec)
  
  This is why CDNs are essential!
  With CDN (90% cache hit): Origin serves 5 Tbps

Upload bandwidth:
  - 500K videos uploaded/day
  - Average video: 500MB
  - Upload bandwidth: 500K × 500MB / 86,400s = ~3 GB/s = 24 Gbps
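
The same arithmetic can be scripted so the bits-versus-bytes conversions are explicit. This is a rough sketch; `peak_egress_tbps` and `upload_gbps` are hypothetical helper names, and the inputs are the assumptions from the example above.

```python
def peak_egress_tbps(concurrent_viewers: int, bitrate_mbps: float,
                     cdn_hit_ratio: float = 0.0):
    """Return (total_tbps, origin_tbps) for streaming egress."""
    total_mbps = concurrent_viewers * bitrate_mbps
    origin_mbps = total_mbps * (1 - cdn_hit_ratio)
    # 1 Tbps = 1,000,000 Mbps
    return total_mbps / 1e6, origin_mbps / 1e6


def upload_gbps(uploads_per_day: int, avg_size_mb: float) -> float:
    """Average ingress in gigabits per second, spread evenly over a day."""
    bytes_per_second = uploads_per_day * avg_size_mb * 1e6 / 86_400
    return bytes_per_second * 8 / 1e9  # bytes -> bits, then bits -> gigabits


total, origin = peak_egress_tbps(10_000_000, 5, cdn_hit_ratio=0.9)
print(f"peak egress {total:.0f} Tbps, origin {origin:.0f} Tbps")
print(f"upload ingress {upload_gbps(500_000, 500):.1f} Gbps")
```

The exact upload figure is about 23 Gbps; the 24 Gbps in the example comes from rounding 2.9 GB/s up to 3 GB/s first.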

Bandwidth Cost Reference

AWS Data Transfer (2024 approximate):
  - Inbound: Free
  - Outbound (first 10TB): $0.09/GB
  - Outbound (next 40TB): $0.085/GB
  - Outbound (next 100TB): $0.07/GB
  - Outbound (beyond 150TB): $0.05/GB
  - CloudFront: $0.02-0.085/GB (cheaper than direct)
  - Same-region transfer (cross-AZ): $0.01/GB
  - Cross-region: $0.02/GB

Example: Serving 1PB/month outbound
  = 1,000,000 GB × $0.07 = $70,000/month (flat $0.07/GB as a conservative shortcut)
  With CloudFront: ~$50,000/month
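
For a closer figure than a flat rate, tiered pricing can be applied band by band. The sketch below is illustrative only: the tier sizes and prices in `EGRESS_TIERS` are assumptions based on approximate 2024 list prices (including an assumed $0.05/GB rate beyond 150TB), so check the provider's current pricing page before relying on them.

```python
# Assumed tiers: (tier size in GB, price per GB). None means "everything beyond".
EGRESS_TIERS = [
    (10_000, 0.09),    # first 10 TB
    (40_000, 0.085),   # next 40 TB
    (100_000, 0.07),   # next 100 TB
    (None, 0.05),      # beyond 150 TB (assumed rate)
]


def egress_cost(total_gb: float) -> float:
    """Monthly outbound transfer cost in USD under the assumed tiers."""
    remaining = total_gb
    cost = 0.0
    for size_gb, price in EGRESS_TIERS:
        billed = remaining if size_gb is None else min(remaining, size_gb)
        cost += billed * price
        remaining -= billed
        if remaining <= 0:
            break
    return cost


print(f"1 PB/month tiered: ${egress_cost(1_000_000):,.0f}")
print(f"1 PB/month flat $0.07/GB: ${1_000_000 * 0.07:,.0f}")
```

Under these assumed tiers 1PB comes to roughly $54K, so the flat-rate shortcut overstates the bill by about a quarter, which is usually acceptable for an interview estimate.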

6. Cost Estimation

Compute Costs (2024 Approximate)

Resource                             Cost                                Notes
AWS EC2 m5.xlarge (4 vCPU, 16GB)     $0.192/hr ≈ $140/month              General purpose
AWS EC2 c5.2xlarge (8 vCPU, 16GB)    $0.34/hr ≈ $245/month               Compute optimized
AWS EC2 r5.2xlarge (8 vCPU, 64GB)    $0.504/hr ≈ $363/month              Memory optimized
AWS Lambda                           $0.20 per 1M requests + duration    Serverless
Kubernetes pod (1 vCPU, 2GB)         ~$50-70/month                       Managed K8s

Storage Costs

Storage Type                     Cost (per GB/month unless noted)    Use Case
S3 Standard                      $0.023                              Frequently accessed
S3 Infrequent Access             $0.0125                             Monthly access
S3 Glacier                       $0.004                              Archival
EBS gp3 (SSD)                    $0.08                               Database volumes
EBS io2 (high IOPS)              $0.125 + IOPS cost                  High-performance DB
RDS PostgreSQL (db.r5.xlarge)    ~$500/month per instance            Managed database
ElastiCache Redis (r5.large)     ~$200/month per node                In-memory cache
DynamoDB (on-demand)             $1.25/M writes, $0.25/M reads       NoSQL

Quick Cost Estimation Template

Monthly cost estimate for a service with:
  - 10M DAU, 1000 QPS average, 3000 QPS peak

Compute (handle 3000 QPS):
  - Each server handles ~500 QPS
  - Need: 6 servers + 2 redundancy = 8 servers
  - 8 × m5.xlarge = 8 × $140 = $1,120/month

Database:
  - Primary + replica: 2 × db.r5.xlarge = $1,000/month
  - Storage (500GB): 500 × $0.08 = $40/month

Cache:
  - Redis cluster (3 nodes): 3 × $200 = $600/month

Storage (S3):
  - 10TB media: 10,000 × $0.023 = $230/month

CDN:
  - 50TB egress: ~$4,000/month

Load Balancer:
  - ALB: ~$50/month

Total: ~$7,040/month ≈ $85K/year
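
The template can be kept as a small script so each assumption is easy to change. This is a sketch using the unit prices quoted in this section; `servers_needed` and the `line_items` keys are hypothetical names.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, spare: int = 2) -> int:
    """Servers required to carry peak load, plus spare capacity for redundancy."""
    return math.ceil(peak_qps / qps_per_server) + spare


# Monthly cost per line item in USD, using the approximate prices listed above.
line_items = {
    "compute": servers_needed(3000, 500) * 140,  # m5.xlarge at ~$140/month
    "database": 2 * 500 + 500 * 0.08,            # primary + replica, 500 GB gp3
    "cache": 3 * 200,                            # 3-node Redis cluster
    "s3": 10_000 * 0.023,                        # 10 TB of media
    "cdn": 4_000,                                # ~50 TB egress
    "load_balancer": 50,
}

total = sum(line_items.values())
for name, cost in line_items.items():
    print(f"{name:>14}: ${cost:>8,.0f}  ({cost / total:.0%})")
print(f"{'total':>14}: ${total:>8,.0f}/month, ${total * 12:,.0f}/year")
```

Printing each item's share of the total shows that CDN egress, not compute, dominates this bill.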

7. Common Estimation Examples

Example 1: Twitter/X

Scale:
  - 400M DAU
  - 500M tweets/day
  - Average tweet: 280 chars + metadata ≈ 500 bytes
  - Average user follows 200 accounts
  - Timeline: 200 tweets shown

QPS:
  - Tweet writes: 500M / 100K = 5,000 QPS
  - Timeline reads: 400M × 10 reads/day / 100K = 40,000 QPS
  - Peak: 5× = 200,000 read QPS

Storage (tweets only):
  - Daily: 500M × 500B = 250GB/day
  - Annual: 250GB × 365 = ~90TB/year
  - 5 years: ~450TB

Fanout:
  - 5,000 tweets/sec × 200 avg followers = 1M timeline writes/sec
  - Celebrity tweet (50M followers): single tweet → 50M writes

Example 2: YouTube

Scale:
  - 2B monthly users, 500M DAU
  - 500 hours of video uploaded per minute
  - Average video: 5 minutes, 50MB (compressed)

Upload:
  - 500 hours/min × 60 × 24 = 720,000 hours/day
  - 720,000 hours × 60 min × 10MB/min (50MB per 5 min) ≈ 430TB/day raw
  - With transcoding (5 resolutions): 430TB × 5 ≈ 2.2PB/day

Storage:
  - Daily new content: ~2.2PB (all resolutions)
  - Annual: 2.2PB × 365 ≈ 800PB (~0.8EB)

Streaming:
  - 500M DAU × 40 min average watch time
  - Concurrent viewers (peak): ~50M
  - Bandwidth: 50M × 5Mbps = 250 Tbps peak
  - CDN handles 95%: Origin = 12.5 Tbps

QPS:
  - Video views: 500M × 8 videos/day / 100K = 40,000 QPS
  - Search: 500M × 3 searches/day / 100K = 15,000 QPS

Example 3: WhatsApp

Scale:
  - 2B users, 500M DAU
  - 100B messages/day

QPS:
  - Messages: 100B / 100K = 1,000,000 QPS (1M QPS!)
  - Peak: 3M QPS

Storage:
  - Average message: 100 bytes (text)
  - Daily text: 100B messages × 100 bytes = 10TB/day
  - Media messages (20% of total): 20B × 200KB avg = 4PB/day
  - 30-day retention: 4PB × 30 = 120PB media

Connection management:
  - 500M concurrent connections (WebSocket/MQTT)
  - Each connection: ~10KB memory
  - Total memory for connections: 500M × 10KB = 5TB RAM
  - At 64GB per server: 5TB / 64GB = ~80 servers by memory alone
  - In practice, per-server connection limits (~2M connections with efficient protocols) bind before memory
  - Need: 500M / 2M = ~250 connection servers
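
Sizing the connection tier means taking the larger of two limits: memory and connections per server. The sketch below uses the figures from this example; `connection_servers` is a hypothetical helper name.

```python
import math


def connection_servers(connections: int, mem_per_conn_kb: float,
                       ram_per_server_gb: float, max_conn_per_server: int):
    """Return (servers_by_memory, servers_by_connection_limit, servers_required)."""
    total_bytes = connections * mem_per_conn_kb * 1e3
    by_memory = math.ceil(total_bytes / (ram_per_server_gb * 1e9))
    by_conn_limit = math.ceil(connections / max_conn_per_server)
    # The binding constraint is whichever limit demands more servers.
    return by_memory, by_conn_limit, max(by_memory, by_conn_limit)


by_mem, by_conn, needed = connection_servers(
    connections=500_000_000,
    mem_per_conn_kb=10,
    ram_per_server_gb=64,
    max_conn_per_server=2_000_000,
)
print(f"by memory: {by_mem}, by connection limit: {by_conn}, required: {needed}")
```

With these inputs memory alone would need about 80 servers, while the connection limit needs 250, so the connection limit decides the fleet size.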

8. Rules of Thumb

The 80/20 Rule (Pareto Principle)

- 20% of users generate 80% of traffic
- 20% of data is accessed 80% of the time (cache this!)
- 20% of features handle 80% of use cases

Application:
  - Cache size: 20% of total data covers 80% of reads
  - Hot partition: 20% of keys get 80% of requests
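
Applied to cache sizing, the rule gives a quick node count. This is a minimal sketch: the 1TB dataset is a hypothetical input, the 25GB usable per Redis instance is this guide's own rule of thumb, and `cache_size_gb` and `cache_nodes` are illustrative names.

```python
import math


def cache_size_gb(total_data_gb: float, hot_percent: int = 20) -> float:
    """Size of the hot set to cache, as a percentage of the full dataset."""
    return total_data_gb * hot_percent / 100


def cache_nodes(cache_gb: float, usable_gb_per_node: float = 25) -> int:
    """Number of cache instances needed to hold the hot set."""
    return math.ceil(cache_gb / usable_gb_per_node)


# Hypothetical dataset of 1 TB (1,000 GB).
hot = cache_size_gb(1_000)
print(f"cache {hot:.0f} GB across {cache_nodes(hot)} nodes")
```

The 80/20 split is a starting assumption; real access distributions should be measured before committing to a cache size.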

Read/Write Ratios

System Type           Read:Write         Implication
Social media          100:1 to 1000:1    Optimize for reads, cache heavily
Messaging             1:1                Balance read/write paths
Logging/Analytics     1:100              Optimize for writes (LSM-tree)
E-commerce catalog    100:1              Cache product pages
Financial trading     1:1                Low latency both directions

Server Capacity Rules of Thumb

Single server can handle:
  - Web server (Nginx): 10,000-100,000 concurrent connections
  - Application server: 500-2,000 QPS (depends on complexity)
  - Database (PostgreSQL): 5,000-20,000 QPS (simple queries)
  - Redis: 100,000-200,000 QPS
  - Kafka broker: 100,000-200,000 messages/sec

Memory:
  - Modern server: 64-512GB RAM
  - Redis: 25GB usable per instance (leave room for overhead)
  - JVM application: 4-32GB heap typical

Disk:
  - SSD IOPS: 10,000-100,000 (depends on drive)
  - SSD throughput: 500MB/s - 3GB/s
  - HDD IOPS: 100-200
  - HDD throughput: 100-200MB/s

Network Rules of Thumb

- 1 Gbps link: ~125 MB/s throughput
- 10 Gbps link: ~1.25 GB/s throughput
- Typical server NIC: 10-25 Gbps
- Cross-AZ latency: 1-2ms
- Cross-region latency: 50-200ms
- CDN edge to user: 5-30ms

Estimation Process (Interview Template)

Step 1: Clarify scale
  "How many users? DAU? Geographic distribution?"

Step 2: Estimate traffic
  - Actions per user per day
  - Calculate QPS (average and peak)
  - Identify read vs write ratio

Step 3: Estimate storage
  - Per-item size × items/day × retention
  - Separate hot (SSD/cache) from cold (HDD/S3)

Step 4: Estimate bandwidth
  - QPS × payload size
  - Identify if CDN is needed

Step 5: Estimate compute
  - QPS / per-server-capacity = number of servers
  - Add redundancy (N+2 or 3× for HA)

Step 6: Identify bottlenecks
  - Which resource hits limits first?
  - Where do we need to scale horizontally?

Interview Cheat Sheet

When interviewer asks...                Framework to use
"How many servers do we need?"          QPS / per-server-capacity + redundancy
"How much storage?"                     per_item × items/day × retention
"Can a single database handle this?"    Compare QPS to DB limits (5-20K QPS)
"Do we need a cache?"                   If read QPS > DB capacity, yes
"Do we need a CDN?"                     If serving media to global users, yes
"What's the cost?"                      Compute + storage + bandwidth + managed services
"How to handle peak traffic?"           Auto-scaling, over-provision 3×, queue overflow

Common Mistakes to Avoid

  1. Forgetting peak vs average: Design for peak, not average
  2. Ignoring metadata: Indexes, replicas, and overhead add 2-3× raw data size
  3. Forgetting redundancy: Always multiply by replication factor (typically 3×)
  4. Over-precise numbers: Round aggressively. 86,400 ≈ 100,000. Don't waste time on arithmetic.
  5. Not stating assumptions: Always say "assuming X" so the interviewer can correct you