Uber Backend - System Architecture

High-Level Architecture Overview

System Architecture Principles

Microservices Architecture: 2,000+ independent services
Event-Driven Design: Async communication via message queues
Geographic Sharding: Data partitioned by city/region
Multi-Region Deployment: Active-active across 10+ regions
Real-time First: Optimized for sub-second latency
Fault Tolerance: Graceful degradation and automatic recovery

Core Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        Client Layer                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Rider Apps   │  │ Driver Apps  │  │  Web Portal  │         │
│  │ (iOS/Android)│  │ (iOS/Android)│  │   (Browser)  │         │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘         │
└─────────┼──────────────────┼──────────────────┼─────────────────┘
          │                  │                  │
          └──────────────────┼──────────────────┘
                            │
          ┌─────────────────┴─────────────────┐
          │      Global Load Balancer         │
          │    (GeoDNS + Anycast Routing)     │
          └─────────────────┬─────────────────┘
                            │
          ┌─────────────────┴─────────────────┐
          │         API Gateway               │
          │  (Auth, Rate Limit, Routing)      │
          └─────────────────┬─────────────────┘
                            │
    ┌───────────────────────┼───────────────────────┐
    │                       │                       │
┌───┴────────┐    ┌─────────┴─────────┐    ┌───────┴────────┐
│  Matching  │    │   Location        │    │   Payment      │
│  Service   │    │   Service         │    │   Service      │
└───┬────────┘    └─────────┬─────────┘    └───────┬────────┘
    │                       │                       │
    └───────────────────────┼───────────────────────┘
                            │
          ┌─────────────────┴─────────────────┐
          │      Message Queue (Kafka)        │
          │    (Event Streaming Platform)     │
          └─────────────────┬─────────────────┘
                            │
    ┌───────────────────────┼───────────────────────┐
    │                       │                       │
┌───┴────────┐    ┌─────────┴─────────┐    ┌───────┴────────┐
│   Trip     │    │   Notification    │    │   Analytics    │
│  Service   │    │   Service         │    │   Service      │
└───┬────────┘    └─────────┬─────────┘    └───────┬────────┘
    │                       │                       │
    └───────────────────────┼───────────────────────┘
                            │
          ┌─────────────────┴─────────────────┐
          │         Data Layer                │
          │  ┌──────────┐  ┌──────────┐      │
          │  │PostgreSQL│  │  Redis   │      │
          │  │ Clusters │  │ Clusters │      │
          │  └──────────┘  └──────────┘      │
          └───────────────────────────────────┘

Core Service Architecture

1. Matching Service (DISCO - Dispatch Optimization)

┌─────────────────────────────────────────────────────────────┐
│                    Matching Service                         │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Request    │  │   Driver     │  │   Matching   │     │
│  │   Handler    │  │   Finder     │  │   Algorithm  │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  Validation  │  │  Geospatial  │  │   Scoring    │     │
│  │   & Queue    │  │    Index     │  │   Engine     │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Responsibilities:

Receive ride requests from riders
Query geospatial index for nearby available drivers
Score and rank drivers based on multiple factors
Send ride offers to selected drivers
Handle driver acceptance/rejection
Retry matching if initial attempts fail
Optimize for ETA, driver earnings, and rider experience

Matching Algorithm:

1. Receive ride request (pickup, dropoff, ride type)
2. Determine search radius (start at 0.5 miles, expand up to 5 miles)
3. Query geospatial index for drivers in radius
4. Filter drivers:
   - Available status
   - Correct vehicle type
   - Minimum rating threshold
   - Not recently rejected by rider
5. Score each driver:
   - Distance to pickup (40% weight)
   - Driver rating (20% weight)
   - Acceptance rate (15% weight)
   - Driver earnings balance (15% weight)
   - Time since last trip (10% weight)
6. Rank drivers by score
7. Send offer to top 3 drivers simultaneously
8. First to accept gets the trip
9. If no acceptance in 15 seconds, expand search and retry
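
The scoring and ranking steps (5 to 7) can be sketched as below. This is a minimal illustration, not Uber's actual implementation: the weights come from the list above, while the Candidate fields, the normalization of each factor to the range 0 to 1, and the max_idle cap are assumptions made for the example.

```python
from dataclasses import dataclass

# Weights taken from step 5 above; they sum to 1.0.
WEIGHTS = {
    "distance": 0.40,
    "rating": 0.20,
    "acceptance": 0.15,
    "earnings_balance": 0.15,
    "idle_time": 0.10,
}


@dataclass
class Candidate:
    """Hypothetical view of a driver who already passed the filters in step 4."""
    driver_id: str
    distance_miles: float   # distance to pickup
    rating: float           # 1.0 to 5.0
    acceptance_rate: float  # 0.0 to 1.0
    earnings_need: float    # 0.0 to 1.0 (assumed: higher means fewer recent earnings)
    idle_minutes: float     # time since last trip


def score(c: Candidate, radius_miles: float = 5.0, max_idle: float = 60.0) -> float:
    """Normalize each factor to [0, 1] and combine them with the weights."""
    factors = {
        "distance": max(0.0, 1.0 - c.distance_miles / radius_miles),
        "rating": (c.rating - 1.0) / 4.0,
        "acceptance": c.acceptance_rate,
        "earnings_balance": c.earnings_need,
        "idle_time": min(c.idle_minutes, max_idle) / max_idle,
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())


def top_offers(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Rank by score, highest first, and keep the k drivers who receive offers."""
    return sorted(candidates, key=score, reverse=True)[:k]
```

Keeping every factor on the same 0 to 1 scale is what lets the percentages act as real weights; without normalization, a factor measured in miles or minutes would dominate the sum.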

Geospatial Indexing:

Technology: S2 Geometry library for spatial indexing
Cell Levels: Level 13 cells (~1km²) for driver indexing
Update Frequency: Real-time updates as drivers move
Query Performance: O(log n) for nearby driver queries
Sharding: Partition by city/region for horizontal scaling
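
To illustrate the idea of a cell-based driver index, here is a small sketch that uses a uniform latitude/longitude grid as a stand-in for S2 cells. It is not the S2 library and does not reproduce its cell hierarchy; CELL_DEG, DriverGrid, and the "rings" search parameter are hypothetical names chosen for the example.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees (about 1.1 km of latitude)


def cell_of(lat: float, lng: float) -> tuple[int, int]:
    """Map a coordinate to the integer id of the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


class DriverGrid:
    """In-memory index from grid cell to the drivers currently inside it."""

    def __init__(self) -> None:
        self._cells: dict[tuple[int, int], set[str]] = defaultdict(set)
        self._where: dict[str, tuple[int, int]] = {}

    def update(self, driver_id: str, lat: float, lng: float) -> None:
        """Move a driver to the cell for the new position, if the cell changed."""
        new_cell = cell_of(lat, lng)
        old_cell = self._where.get(driver_id)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self._cells[old_cell].discard(driver_id)
        self._cells[new_cell].add(driver_id)
        self._where[driver_id] = new_cell

    def nearby(self, lat: float, lng: float, rings: int = 1) -> set[str]:
        """Return drivers in the pickup cell and in `rings` layers of neighbors."""
        cx, cy = cell_of(lat, lng)
        found: set[str] = set()
        for dx in range(-rings, rings + 1):
            for dy in range(-rings, rings + 1):
                found |= self._cells.get((cx + dx, cy + dy), set())
        return found
```

Expanding the search radius then corresponds to calling nearby with a larger rings value. A production index would still filter the returned candidates by true distance, since cells only approximate a circle.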

2. Location Service (Real-time GPS Tracking)

┌─────────────────────────────────────────────────────────────┐
│                    Location Service                         │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   GPS Data   │  │   Location   │  │     ETA      │     │
│  │   Ingestion  │  │   Storage    │  │  Calculator  │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Kafka      │  │  Geospatial  │  │   Routing    │     │
│  │   Stream     │  │   Database   │  │   Engine     │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Responsibilities:

Ingest 750K GPS updates per second from driver apps
Store real-time driver locations in geospatial index
Calculate ETAs using traffic data and routing algorithms
Provide location history for trip reconstruction
Detect geofence events (arrival at pickup/dropoff)
Monitor driver movement patterns for fraud detection

GPS Data Pipeline:

Driver App → Load Balancer → Location API → Kafka Topic
                                                  ↓
                                          Stream Processor
                                                  ↓
                                    ┌─────────────┴─────────────┐
                                    ↓                           ↓
                            Geospatial Index              Time-Series DB
                            (Real-time queries)           (Historical data)

ETA Calculation:

Routing Engine: Google Maps API / Mapbox / Internal routing
Traffic Data: Real-time traffic conditions from multiple sources
Historical Patterns: ML models trained on historical trip data
Dynamic Updates: Recalculate ETA every 30 seconds during trip
Accuracy Target: Predicted ETA within 2 minutes of actual arrival for 90% of trips

3. Payment Service

┌─────────────────────────────────────────────────────────────┐
│                    Payment Service                          │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Payment    │  │    Fraud     │  │   Billing    │     │
│  │  Processing  │  │  Detection   │  │   Engine     │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Stripe/    │  │   ML Fraud   │  │   Invoice    │     │
│  │   Braintree  │  │   Models     │  │  Generator   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Responsibilities:

Process 1,800 payment transactions per second at peak
Support multiple payment methods (cards, wallets, cash)
Calculate trip fares with surge pricing
Handle split payments and promotions
Detect and prevent fraudulent transactions
Generate invoices and receipts
Process driver payouts

Payment Flow:

1. Trip Completion Event
2. Calculate Fare:
   - Base fare + (distance × per-mile rate) + (time × per-minute rate)
   - Apply surge multiplier
   - Apply promotions/discounts
   - Calculate taxes and fees
3. Fraud Check:
   - Verify payment method validity
   - Check user fraud score
   - Validate trip legitimacy
4. Process Payment:
   - Authorize payment method
   - Capture funds
   - Handle payment failures with retry logic
5. Distribute Funds:
   - Rider charged
   - Driver credited (minus Uber commission)
   - Generate receipt
6. Async Processing:
   - Update analytics
   - Trigger notifications
   - Archive transaction
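
The fare calculation in step 2 can be sketched as follows. This is an illustrative version under stated assumptions: the function name, parameter names, flat discount, and single tax rate are hypothetical, and the order of operations (surge, then discount, then tax) simply follows the order of the list above.

```python
from decimal import Decimal, ROUND_HALF_UP


def calculate_fare(
    base: Decimal,
    distance_miles: Decimal,
    per_mile: Decimal,
    minutes: Decimal,
    per_minute: Decimal,
    surge: Decimal = Decimal("1.0"),
    discount: Decimal = Decimal("0"),
    tax_rate: Decimal = Decimal("0"),
) -> Decimal:
    """Return the rider's total, rounded to cents."""
    # Base fare + (distance x per-mile rate) + (time x per-minute rate)
    subtotal = base + distance_miles * per_mile + minutes * per_minute
    # Apply the surge multiplier to the metered subtotal
    surged = subtotal * surge
    # Apply promotions; a discount never pushes the fare below zero
    discounted = max(surged - discount, Decimal("0"))
    # Add taxes and fees (modeled here as one percentage, an assumption)
    total = discounted * (Decimal("1") + tax_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Example with assumed rates: 2.50 base, 4 miles at 2.00, 14 minutes at 0.25,
# and a 1.5x surge.
example = calculate_fare(
    Decimal("2.50"), Decimal("4"), Decimal("2.00"),
    Decimal("14"), Decimal("0.25"), surge=Decimal("1.5"),
)
```

Decimal is used instead of float so that currency amounts round predictably; binary floating point cannot represent values such as 0.10 exactly.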

4. Trip Service (Ride Lifecycle Management)

┌─────────────────────────────────────────────────────────────┐
│                      Trip Service                           │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │     Trip     │  │    State     │  │   History    │     │
│  │  Management  │  │   Machine    │  │   Storage    │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Trip DB    │  │   Event      │  │   Analytics  │     │
│  │   (Sharded)  │  │   Stream     │  │   Pipeline   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Trip State Machine:

REQUESTED → MATCHED → ACCEPTED → ARRIVING → ARRIVED → 
STARTED → IN_PROGRESS → COMPLETED → PAID

Cancellation Transitions:
REQUESTED → CANCELLED_BY_RIDER
MATCHED → CANCELLED_BY_DRIVER
ACCEPTED → CANCELLED_BY_RIDER (with fee)
ARRIVING → CANCELLED_BY_DRIVER (with penalty)
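
One way to enforce these transitions is a lookup table of allowed next states, sketched below. The state names come from the lists above; the TRANSITIONS table, the advance function, and the InvalidTransition exception are hypothetical names, and fees or penalties on cancellation are left out.

```python
# Allowed next states for each state, taken from the lifecycle above.
TRANSITIONS: dict[str, set[str]] = {
    "REQUESTED": {"MATCHED", "CANCELLED_BY_RIDER"},
    "MATCHED": {"ACCEPTED", "CANCELLED_BY_DRIVER"},
    "ACCEPTED": {"ARRIVING", "CANCELLED_BY_RIDER"},
    "ARRIVING": {"ARRIVED", "CANCELLED_BY_DRIVER"},
    "ARRIVED": {"STARTED"},
    "STARTED": {"IN_PROGRESS"},
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED": {"PAID"},
    # PAID and the CANCELLED_* states are terminal: no outgoing transitions.
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target}")
    return target
```

Rejecting unknown transitions in one place keeps out-of-order or duplicated events from corrupting a trip's status.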

Trip Data Model:

{
  "trip_id": "uuid",
  "rider_id": "uuid",
  "driver_id": "uuid",
  "status": "IN_PROGRESS",
  "ride_type": "UBER_X",
  "pickup": {
    "lat": 37.7749,
    "lng": -122.4194,
    "address": "123 Market St, SF",
    "timestamp": "2026-01-08T10:00:00Z"
  },
  "dropoff": {
    "lat": 37.7849,
    "lng": -122.4094,
    "address": "456 Mission St, SF",
    "timestamp": "2026-01-08T10:20:00Z"
  },
  "fare": {
    "base_fare": 2.50,
    "distance_fare": 8.00,
    "time_fare": 3.50,
    "surge_multiplier": 1.5,
    "total": 21.00,
    "currency": "USD"
  },
  "route": {
    "distance_miles": 4.2,
    "duration_minutes": 18,
    "gps_trail": [...]
  },
  "created_at": "2026-01-08T09:55:00Z",
  "updated_at": "2026-01-08T10:20:00Z"
}

5. Surge Pricing Service (Dynamic Pricing)

┌─────────────────────────────────────────────────────────────┐
│                  Surge Pricing Service                      │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Supply/    │  │   Surge      │  │   Price      │     │
│  │   Demand     │  │  Calculator  │  │  Optimizer   │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Heat Map   │  │   ML Price   │  │   Cache      │     │
│  │   Generator  │  │   Models     │  │   Layer      │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Surge Calculation Algorithm:

1. Divide city into hexagonal grid cells (H3 geospatial index)
2. For each cell, calculate:
   - Active ride requests (demand)
   - Available drivers (supply)
   - Supply/Demand ratio (available drivers ÷ active ride requests)
3. Determine surge multiplier:
   - Ratio > 2.0: No surge (1.0x)
   - Ratio 1.5 to 2.0: Low surge (1.2x)
   - Ratio 1.0 to <1.5: Medium surge (1.5x)
   - Ratio 0.5 to <1.0: High surge (2.0x)
   - Ratio < 0.5: Extreme surge (3.0x-5.0x)
4. Apply smoothing:
   - Gradual changes to avoid price shocks
   - Neighboring cell influence
   - Time-based decay
5. Update surge map every 1-2 minutes
6. Cache surge values for fast lookups
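
Steps 3 and 4 can be sketched for a single cell as shown below. The band thresholds follow the list above; everything else is an assumption made for illustration: a ratio that lands exactly on a shared boundary goes to the lower-surge band, the extreme band scales linearly from 3.0x to 5.0x as supply approaches zero, a cell with no demand gets no surge, and smoothing is modeled as a simple cap on the change per update.

```python
def surge_multiplier(supply: int, demand: int) -> float:
    """Map a cell's supply/demand ratio to a surge multiplier."""
    if demand == 0:
        return 1.0  # assumed: no requests means no surge
    ratio = supply / demand
    if ratio > 2.0:
        return 1.0
    if ratio >= 1.5:
        return 1.2
    if ratio >= 1.0:
        return 1.5
    if ratio >= 0.5:
        return 2.0
    # Extreme band: assumed linear ramp from 3.0x (ratio 0.5) to 5.0x (ratio 0).
    return 3.0 + (0.5 - ratio) / 0.5 * 2.0


def smooth(previous: float, target: float, max_step: float = 0.25) -> float:
    """Move toward the target multiplier by at most max_step per update."""
    delta = max(-max_step, min(max_step, target - previous))
    return previous + delta
```

With updates every 1-2 minutes, a cap such as max_step limits how fast a rider sees the price change; neighboring-cell influence and time-based decay would be additional terms and are not modeled here.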

ML-Based Price Optimization:

Features: Time of day, day of week, weather, events, historical patterns
Model: Gradient boosting for demand prediction
Training: Continuous learning from trip data
Objective: Maximize rider acceptance rate while balancing supply

Data Architecture

Database Sharding Strategy

┌─────────────────────────────────────────────────────────────┐
│                  Database Sharding                          │
├─────────────────────────────────────────────────────────────┤
│  Geographic Sharding (Primary Strategy)                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   US-West    │  │   US-East    │  │   Europe     │     │
│  │   Shard      │  │   Shard      │  │   Shard      │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
│                                                             │
│  User ID Sharding (Secondary Strategy)                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  Shard 0-99  │  │ Shard 100-199│  │ Shard 200-299│     │
│  │  (User IDs)  │  │  (User IDs)  │  │  (User IDs)  │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────┘

Sharding Keys:

Trips: Shard by city_id (geographic locality)
Users: Shard by user_id hash (even distribution)
Drivers: Shard by home_city_id (driver's primary city)
Payments: Shard by transaction_id (time-based UUID)
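
A minimal sketch of how a request might be routed to a shard under these keys. The shard count of 300 is an assumption loosely based on the 0-299 range in the diagram, and user_shard, trip_shard, and the city-to-region mapping are hypothetical names.

```python
import hashlib

NUM_USER_SHARDS = 300  # assumed shard count for illustration


def user_shard(user_id: str, num_shards: int = NUM_USER_SHARDS) -> int:
    """Hash the user id to a shard number for even distribution.

    A cryptographic digest is used because Python's built-in hash() is
    salted per process, so it would not route consistently across servers.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def trip_shard(city_id: str, city_to_region: dict[str, str]) -> str:
    """Route a trip to the regional shard that owns its city."""
    return city_to_region[city_id]
```

Note that plain modulo routing remaps most keys when the shard count changes; systems that reshard often use consistent hashing or a directory lookup instead.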

Caching Strategy

┌─────────────────────────────────────────────────────────────┐
│                    Cache Architecture                       │
├─────────────────────────────────────────────────────────────┤
│  L1 Cache (Application Memory)                              │
│  - Driver locations (5-second TTL)                          │
│  - Surge pricing (60-second TTL)                            │
│  - User sessions (30-minute TTL)                            │
│                                                             │
│  L2 Cache (Redis Cluster)                                   │
│  - User profiles (1-hour TTL)                               │
│  - Driver profiles (1-hour TTL)                             │
│  - Trip history (5-minute TTL)                              │
│  - Payment methods (10-minute TTL)                          │
│                                                             │
│  L3 Cache (CDN)                                             │
│  - Static assets (24-hour TTL)                              │
│  - Map tiles (7-day TTL)                                    │
│  - Profile images (1-day TTL)                               │
└─────────────────────────────────────────────────────────────┘
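
The L1/L2 lookup path can be sketched as a read-through cache, shown below. This is an in-process illustration only: the second TTLCache stands in for the Redis cluster, and TTLCache, read_through, and the load callback are hypothetical names.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop the stale entry on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


def read_through(key: str, l1: TTLCache, l2: TTLCache,
                 load: Callable[[str], Any]) -> Any:
    """Check L1, then L2, then the source of truth, filling caches on the way back."""
    value = l1.get(key)
    if value is not None:
        return value
    value = l2.get(key)
    if value is None:
        value = load(key)  # e.g. a database query
        l2.set(key, value)
    l1.set(key, value)
    return value
```

The shorter the TTL, the fresher the data and the more often the slower tier is hit, which is why fast-changing values such as driver locations get a TTL of seconds while profiles get an hour. This sketch treats None as a miss, so None values cannot be cached.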

Event-Driven Architecture

Event Streaming with Kafka

┌─────────────────────────────────────────────────────────────┐
│                    Kafka Topics                             │
├─────────────────────────────────────────────────────────────┤
│  trip-events          : Trip lifecycle events               │
│  location-updates     : GPS location updates                │
│  payment-events       : Payment transactions                │
│  driver-status-events : Driver availability changes         │
│  surge-updates        : Surge pricing changes               │
│  notification-events  : Push notification triggers          │
│  analytics-events     : User behavior and metrics           │
└─────────────────────────────────────────────────────────────┘

Event Flow Example (Trip Completion):

1. Driver marks trip as completed
2. Trip Service publishes "trip.completed" event to Kafka
3. Multiple consumers process event:
   - Payment Service: Calculate and charge fare
   - Notification Service: Send receipt to rider
   - Analytics Service: Update metrics and ML models
   - Rating Service: Prompt rider/driver to rate each other
   - Earnings Service: Update driver earnings
   - Fraud Service: Analyze trip for anomalies
4. Each service publishes its own events
5. Eventual consistency achieved across all services
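
The fan-out in step 3 and the follow-on events in step 4 can be illustrated with a small in-memory publish/subscribe bus. This is a synchronous stand-in, not Kafka: with Kafka, each service would consume the topic asynchronously in its own consumer group. EventBus and the handler functions are hypothetical; the topic names come from the table above.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """Delivers each published event to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = EventBus()
processed: list[str] = []


def payment_service(event: Event) -> None:
    """Charge the fare, then publish an event of its own (step 4)."""
    processed.append(f"payment:{event['trip_id']}")
    bus.publish("payment-events",
                {"type": "payment.completed", "trip_id": event["trip_id"]})


def notification_service(event: Event) -> None:
    processed.append(f"notification:{event['trip_id']}")


def earnings_service(event: Event) -> None:
    processed.append(f"earnings:{event['trip_id']}")


bus.subscribe("trip-events", payment_service)
bus.subscribe("trip-events", notification_service)
bus.subscribe("payment-events", earnings_service)
bus.publish("trip-events", {"type": "trip.completed", "trip_id": "t1"})
```

The Trip Service does not need to know which services react to a completed trip; adding a new consumer is a new subscription rather than a change to the publisher.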

Multi-Region Architecture

Active-Active Deployment

┌─────────────────────────────────────────────────────────────┐
│                  Global Architecture                        │
├─────────────────────────────────────────────────────────────┤
│  Region: US-West (Primary)                                  │
│  - Serves: California, Nevada, Oregon, Washington          │
│  - Data Centers: San Francisco, Los Angeles, Seattle       │
│                                                             │
│  Region: US-East (Primary)                                  │
│  - Serves: New York, Boston, DC, Florida                   │
│  - Data Centers: Virginia, New York, Atlanta               │
│                                                             │
│  Region: Europe (Primary)                                   │
│  - Serves: UK, France, Germany, Netherlands                │
│  - Data Centers: London, Amsterdam, Frankfurt              │
│                                                             │
│  Region: Asia-Pacific (Primary)                             │
│  - Serves: India, Singapore, Australia, Japan              │
│  - Data Centers: Mumbai, Singapore, Sydney, Tokyo          │
└─────────────────────────────────────────────────────────────┘

Cross-Region Replication:

User Data: Async replication with 5-minute lag
Trip Data: Replicated to backup region only
Payment Data: Sync replication for compliance
Analytics Data: Async replication with 1-hour lag

Through geographic distribution, microservices, and event-driven design, this architecture is intended to support millions of concurrent rides globally while keeping response times under one second and maintaining high reliability.