Uber Backend - Scale and Constraints

Traffic and Load Estimates

User Base Statistics

Total Registered Users: 150 million riders globally
Monthly Active Users: 130 million MAU (87% of registered)
Daily Active Users: 25 million DAU (19% of MAU)
Active Drivers: 6 million drivers globally
Daily Active Drivers: 3 million drivers online daily
Geographic Distribution: 10,000+ cities across 70+ countries

Trip Volume Analysis

Daily Trips: 50 million trips per day globally
Peak Hour Trips: 8 million trips during peak hours (6-9 PM)
Average Trip Duration: 20 minutes per trip
Concurrent Active Rides: 10 million rides at peak
Trip Requests per Second: 5,000 requests/second average, 15,000 peak
Annual Trip Volume: 18 billion trips per year

Request Patterns

Time Distribution (24-hour cycle):
00:00-06:00: 5% of daily trips   (2.5M trips, 500 req/s)
06:00-09:00: 20% of daily trips  (10M trips, 3,000 req/s)
09:00-12:00: 15% of daily trips  (7.5M trips, 2,000 req/s)
12:00-14:00: 12% of daily trips  (6M trips, 1,500 req/s)
14:00-18:00: 18% of daily trips  (9M trips, 2,500 req/s)
18:00-21:00: 25% of daily trips  (12.5M trips, 4,000 req/s)
21:00-24:00: 5% of daily trips   (2.5M trips, 500 req/s)

Geographic Distribution:
North America: 35% (17.5M trips/day)
Europe: 25% (12.5M trips/day)
Asia-Pacific: 30% (15M trips/day)
Latin America: 8% (4M trips/day)
Other: 2% (1M trips/day)

GPS and Location Data Scale

Location Update Volume

Active Drivers Sending GPS: 3 million drivers during peak
Update Frequency: Every 4 seconds per driver
GPS Updates per Second: 750,000 updates/second
Daily GPS Data Points: 65 billion location updates per day
GPS Data Size: 100 bytes per update
Daily GPS Storage: 6.5 TB of raw GPS data per day

Location Processing Requirements

Real-time Processing: Process 750K GPS updates/second
ETA Calculations: 15,000 ETA calculations/second during peak
Route Calculations: 10,000 route calculations/second
Geospatial Queries: 50,000 nearby driver queries/second
Map Tile Requests: 100,000 map tile requests/second
Geocoding Requests: 20,000 address lookups/second

Geospatial Index Size

Active Driver Locations: 3 million points in real-time index
Historical Trip Routes: 50 million trips × 300 points = 15B points
City Boundaries: 10,000 cities with polygon definitions
Service Areas: 50,000 geofenced zones globally
Heat Map Cells: 100 million grid cells for demand/supply tracking
Total Geospatial Data: 50 TB of geospatial indices

Matching and Dispatch Scale

Matching System Load

Ride Requests per Second: 5,000 average, 15,000 peak
Drivers Searched per Request: Average 50 drivers within search radius
Total Driver Evaluations: 250,000 evaluations/second average
Matching Decisions: 5,000 matches/second
Failed Match Retries: 1,000 retries/second (20% initial failure rate)
Scheduled Rides: 5 million scheduled rides in queue

Driver State Management

Driver Status Updates: 100,000 status changes/second
Driver Availability Checks: 500,000 checks/second
Driver Location Updates: 750,000 updates/second
Driver Acceptance/Rejection: 10,000 responses/second
Driver Routing Updates: 50,000 route recalculations/second
Driver Earnings Updates: 5,000 trip completions/second

Matching Algorithm Complexity

Per Ride Request:
1. Geospatial query: O(log n) for nearby drivers
2. Filter available drivers: O(k) where k = nearby drivers
3. Score and rank drivers: O(k log k)
4. Send requests: O(1) per driver
5. Wait for acceptance: Timeout-based

Total Complexity: O(k log k) per request
With k=50 average: ~300 operations per request
At 15,000 req/s peak: 4.5M operations/second

Payment Processing Scale

Transaction Volume

Daily Transactions: 50 million payment transactions
Transactions per Second: 580 TPS average, 1,800 TPS peak
Payment Methods: 60% cards, 25% digital wallets, 10% cash, 5% corporate
Average Transaction Value: $15 per trip
Daily Payment Volume: $750 million processed daily
Annual GMV: $270 billion gross merchandise value

Payment Processing Requirements

Authorization Requests: 1,800 requests/second peak
Capture Requests: 1,800 captures/second peak
Refund Requests: 100 refunds/second
Payout to Drivers: 3 million payouts per day
Split Payments: 5 million split payment calculations/day
Fraud Checks: 1,800 fraud evaluations/second

Financial Data Storage

Transaction Records: 50M transactions/day × 1KB = 50 GB/day
Payment Methods: 150M users × 2 methods × 500 bytes = 150 GB
Driver Earnings: 6M drivers × 365 days × 2KB = 4.4 TB/year
Invoices and Receipts: 50M receipts/day × 10KB = 500 GB/day
Audit Logs: 100M events/day × 500 bytes = 50 GB/day
Total Financial Data: 600 GB/day, 220 TB/year

Data Storage Requirements

Database Storage Breakdown

User Data:
- Rider Profiles: 150M × 5KB = 750 GB
- Driver Profiles: 6M × 20KB = 120 GB
- Payment Methods: 150M × 1KB = 150 GB
- Preferences: 150M × 2KB = 300 GB
Total User Data: ~1.3 TB

Trip Data:
- Active Trips: 10M × 10KB = 100 GB
- Daily Trip Records: 50M × 5KB = 250 GB/day
- Trip History (2 years): 36.5B × 5KB = 183 TB
- GPS Trails: 50M trips × 300 points × 100 bytes = 1.5 TB/day
Total Trip Data: 200+ TB

Location Data:
- Real-time Driver Locations: 3M × 200 bytes = 600 MB
- Historical GPS Data (90 days): 6.5TB/day × 90 = 585 TB
- Heat Maps: 100M cells × 1KB = 100 GB
Total Location Data: 600+ TB

Payment Data:
- Transaction History: 220 TB/year
- Payment Methods: 150 GB
- Invoices: 500 GB/day × 365 = 183 TB/year
Total Payment Data: 400+ TB

Analytics and Logs:
- Application Logs: 10 TB/day
- Analytics Events: 5 TB/day
- Audit Logs: 50 GB/day
Total Logs: 15 TB/day, 5.5 PB/year

Grand Total Storage: 1.5 PB active data, 6 PB total with history

Cache Requirements

Driver Location Cache: 3M drivers × 500 bytes = 1.5 GB
Rider Session Cache: 10M active sessions × 10KB = 100 GB
Surge Pricing Cache: 100K zones × 1KB = 100 MB
ETA Cache: 50M cached ETAs × 500 bytes = 25 GB
User Profile Cache: 25M hot profiles × 5KB = 125 GB
Total Cache Memory: 250 GB across distributed cache cluster

Network Bandwidth Requirements

Data Transfer Volume

Inbound Traffic:
- GPS Updates: 750K/s × 100 bytes = 75 MB/s = 6.5 TB/day
- API Requests: 50K req/s × 2KB = 100 MB/s = 8.6 TB/day
- Image Uploads: 10K uploads/s × 500KB = 5 GB/s = 432 TB/day
Total Inbound: 5.2 GB/s peak, 450 TB/day

Outbound Traffic:
- Map Tiles: 100K req/s × 50KB = 5 GB/s = 432 TB/day
- API Responses: 50K resp/s × 5KB = 250 MB/s = 21.6 TB/day
- Push Notifications: 100K notif/s × 1KB = 100 MB/s = 8.6 TB/day
- Real-time Updates: 1M updates/s × 500 bytes = 500 MB/s = 43 TB/day
Total Outbound: 5.8 GB/s peak, 505 TB/day

Total Bandwidth: 11 GB/s peak, 955 TB/day

CDN and Edge Requirements

Static Assets: 500 GB of app assets, images, maps
CDN Cache Hit Rate: 95% for static content
Edge Locations: 200+ PoPs globally
CDN Bandwidth: 5 GB/s for cached content
Origin Bandwidth: 250 MB/s for cache misses

Compute Requirements

Application Server Capacity

API Servers: 5,000 instances (16 vCPU, 32GB RAM each)
Matching Service: 2,000 instances (32 vCPU, 64GB RAM each)
Location Service: 3,000 instances (16 vCPU, 32GB RAM each)
Payment Service: 1,000 instances (8 vCPU, 16GB RAM each)
Notification Service: 500 instances (8 vCPU, 16GB RAM each)
Total Compute: 200,000 vCPUs, 400 TB RAM

Database Capacity

Primary Databases: 500 shards (64 vCPU, 256GB RAM each)
Read Replicas: 1,500 replicas (32 vCPU, 128GB RAM each)
Cache Clusters: 200 nodes (16 vCPU, 128GB RAM each)
Analytics Databases: 100 nodes (64 vCPU, 512GB RAM each)
Total Database Compute: 100,000 vCPUs, 250 TB RAM

Message Queue Capacity

Kafka Brokers: 500 brokers (16 vCPU, 64GB RAM each)
Message Throughput: 10 million messages/second
Message Retention: 7 days of message history
Storage per Broker: 10 TB NVMe SSD
Total Queue Storage: 5 PB

Cost Estimates

Infrastructure Costs (Monthly)

Compute:
- Application Servers: $500K/month
- Database Servers: $800K/month
- Cache Clusters: $100K/month
- Message Queues: $200K/month
Total Compute: $1.6M/month

Storage:
- Database Storage: $300K/month
- Object Storage: $200K/month
- Backup Storage: $100K/month
Total Storage: $600K/month

Network:
- Data Transfer: $400K/month
- CDN: $300K/month
- Load Balancers: $50K/month
Total Network: $750K/month

Third-Party Services:
- Maps API: $500K/month
- Payment Processing: $200K/month (+ 2.5% transaction fees)
- SMS/Push Notifications: $100K/month
- Monitoring/Logging: $50K/month
Total Third-Party: $850K/month

Grand Total: $3.8M/month infrastructure
Transaction Fees: $187M/month (2.5% of $7.5B monthly GMV)
Total Operating Cost: $191M/month, $2.3B/year

Cost per Trip

Infrastructure Cost: $0.076 per trip ($3.8M / 50M trips)
Transaction Fees: $0.375 per trip (2.5% of $15 average)
Total Cost per Trip: $0.45
Revenue per Trip: $3.75 (25% take rate on $15 trip)
Gross Margin: $3.30 per trip (88% margin)

Scaling Bottlenecks

Critical Bottlenecks

Geospatial Queries: 250K driver evaluations/second
GPS Data Ingestion: 750K location updates/second
Real-time Matching: 15K matching decisions/second
Payment Processing: 1,800 transactions/second
Database Writes: 100K writes/second across all shards
Cache Invalidation: Coordinating cache updates across regions

Mitigation Strategies

Geospatial: Distributed geospatial indices with regional sharding
GPS Ingestion: Kafka-based buffering with batch processing
Matching: Parallel matching with pre-filtered driver pools
Payments: Async payment processing with eventual consistency
Database: Aggressive sharding and read replica scaling
Cache: Regional cache clusters with lazy invalidation

This scale analysis demonstrates the massive infrastructure required to operate a global ride-sharing platform at Uber's scale, processing billions of events daily while maintaining sub-second response times.