Scaling Considerations

📖 9 min read 📄 Part 6 of 10

Digital Wallet - Scaling Considerations

Reading Time: 20 minutes

Horizontal Scaling

Application Tier Scaling

Auto-Scaling Configuration:
├── Min Instances: 10
├── Max Instances: 500
├── Target CPU: 70%
├── Target Memory: 80%
├── Scale-Up: 50% per minute
└── Scale-Down: 10% per minute

Load Distribution:
├── Round-robin for stateless services
├── Consistent hashing for cache
├── Geo-routing for multi-region
└── Weighted routing for canary

Service-Specific Scaling

Transaction Service:
- Peak: 10,000 TPS
- Instances: 200 pods
- CPU: 4 cores per pod
- Memory: 8 GB per pod

Fraud Detection Service:
- Peak: 10,000 checks/sec
- Instances: 100 pods
- CPU: 8 cores per pod (ML inference)
- Memory: 16 GB per pod

Notification Service:
- Peak: 50,000 notifications/sec
- Instances: 50 pods
- Queue-based processing
- Async workers

Database Scaling

Sharding Strategy

Shard by user_id:
- Total Shards: 1000
- Calculation: user_id % 1000
- Shard Size: 100K users per shard

Shard Distribution:
Region US-East: Shards 0-249
Region US-West: Shards 250-499
Region EU-West: Shards 500-749
Region APAC: Shards 750-999

Benefits:
- Even distribution
- Predictable routing
- Easy to add shards (re-shard)

Read Replicas

Configuration per Shard:
├── 1 Primary (writes)
├── 3 Replicas (reads)
├── Replication Lag: < 100ms
└── Failover: Automatic

Read Distribution:
- Transaction history: 90% reads → Replicas
- Balance checks: 80% reads → Replicas
- Payment authorization: 100% writes → Primary
- P2P transfers: 100% writes → Primary

Connection Pooling:
- Primary: 100 connections
- Each Replica: 100 connections
- Total per shard: 400 connections

Partitioning

Transactions Table (Time-Based):
├── Partition by month
├── Retention: 90 days hot, 5 years warm
├── Auto-create new partitions
└── Auto-archive old partitions

CREATE TABLE transactions_2026_01 PARTITION OF transactions
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

Benefits:
- Faster queries (scan only relevant partitions)
- Easy archival (drop old partitions)
- Better index performance

Caching Strategy

Multi-Layer Caching

L1: Application Cache (In-Memory)
├── Size: 1 GB per instance
├── TTL: 1 minute
├── Data: User session, hot config
└── Hit Rate: 95%

L2: Redis Cache (Distributed)
├── Size: 384 GB cluster
├── TTL: 5-60 minutes
├── Data: User profile, payment methods, recent transactions
└── Hit Rate: 90%

L3: CDN Cache (Edge)
├── Size: Unlimited
├── TTL: 1 hour - 1 day
├── Data: Static assets, receipts
└── Hit Rate: 99%

Cache Warming

Pre-Warm Strategy:
1. Identify top 10% active users
2. Load their data into cache at startup
3. Refresh every 5 minutes
4. Predictive pre-loading based on patterns

Example:
- Morning commute: Pre-load transit payment methods
- Lunch time: Pre-load restaurant payment history
- Evening: Pre-load grocery store data

Cache Invalidation

Strategies:
1. Write-Through: Update DB → Update Cache
2. Cache-Aside: Read Cache → Miss → Read DB → Write Cache
3. Event-Based: Kafka event → Invalidate cache

Invalidation Events:
- User profile update → user:{user_id}
- Payment method addedpayment_methods:{user_id}
- Transaction completed → transactions:{user_id}:recent
- Balance changed → balance:{user_id}

Message Queue Scaling

Kafka Configuration

Topics:
├── transaction-events (10 partitions)
├── fraud-alerts (5 partitions)
├── notification-events (20 partitions)
└── audit-logs (5 partitions)

Partitioning:
- Key: user_id
- Partitions: 10-20 per topic
- Replication Factor: 3
- Min In-Sync Replicas: 2

Consumer Groups:
- Transaction Service: 10 consumers
- Fraud Detection: 5 consumers
- Notification Service: 20 consumers
- Analytics: 5 consumers

Throughput:
- Peak: 100K messages/sec
- Average: 20K messages/sec
- Retention: 7 days

Queue-Based Processing

SQS Queues:
├── High Priority (Payments): 10K msgs/sec
├── Medium Priority (Notifications): 50K msgs/sec
└── Low Priority (Analytics): 5K msgs/sec

Worker Scaling:
- Auto-scale based on queue depth
- Target: < 1000 messages in queue
- Scale-up: Add 10 workers per 1000 messages
- Scale-down: Remove workers when queue < 100

CDN and Edge Optimization

CloudFront Configuration

Distribution:
├── Origins: 3 regions (US, EU, APAC)
├── Edge Locations: 200+ worldwide
├── Cache Behaviors:
│   ├── Static assets: 1 day TTL
│   ├── Receipts: 1 hour TTL
│   └── API responses: No cache
└── Compression: Gzip, Brotli

Benefits:
- Reduced latency (< 50ms)
- Offload 80% of static traffic
- DDoS protection
- SSL termination at edge

Edge Computing

Lambda@Edge Functions:
├── Authentication validation
├── Request routing (geo-based)
├── Response transformation
└── A/B testing

Use Cases:
- Route US users to US-East
- Route EU users to EU-West
- Validate JWT at edge
- Add security headers

Multi-Region Architecture

Active-Active Deployment

Region Configuration:
US-East (Primary):
├── 40% traffic
├── 100 K8s nodes
├── 250 database shards
└── Full service deployment

US-West (Primary):
├── 30% traffic
├── 100 K8s nodes
├── 250 database shards
└── Full service deployment

EU-West (Primary):
├── 20% traffic
├── 80 K8s nodes
├── 250 database shards
└── Full service deployment

APAC (Primary):
├── 10% traffic
├── 50 K8s nodes
├── 250 database shards
└── Full service deployment

Cross-Region Replication

Database Replication:
├── Async replication between regions
├── Replication lag: < 1 second
├── Conflict resolution: Last-write-wins
└── Failover: Automatic

Data Consistency:
- Strong consistency within region
- Eventual consistency across regions
- Critical data: Synchronous replication
- Non-critical: Asynchronous replication

Geo-Routing

Route 53 Configuration:
├── Geolocation routing
├── Latency-based routing
├── Health checks every 30 seconds
└── Automatic failover

Routing Rules:
- US users → US-East (primary) or US-West (failover)
- EU users → EU-West (primary) or US-East (failover)
- APAC users → APAC (primary) or US-West (failover)

Performance Optimization

Database Query Optimization

Indexing Strategy:
- B-tree indexes for equality/range queries
- Hash indexes for exact matches
- Partial indexes for filtered queries
- Covering indexes to avoid table lookups

Example:
CREATE INDEX idx_transactions_user_date 
ON transactions(user_id, created_at DESC)
WHERE status = 'completed';

Query Optimization:
- Use EXPLAIN ANALYZE
- Avoid N+1 queries
- Batch queries where possible
- Use connection pooling

API Response Optimization

Techniques:
1. Pagination (limit + offset)
2. Field filtering (only return requested fields)
3. Response compression (gzip)
4. HTTP/2 multiplexing
5. GraphQL for flexible queries

Example:
GET /transactions?fields=id,amount,status&limit=20

Response:
{
  "transactions": [
    {"id": "123", "amount": 50.00, "status": "completed"}
  ]
}

Connection Pooling

Database Connection Pool:
├── Min Connections: 10
├── Max Connections: 100
├── Idle Timeout: 10 minutes
├── Max Lifetime: 30 minutes
└── Connection Validation: On borrow

Redis Connection Pool:
├── Min Connections: 5
├── Max Connections: 50
├── Idle Timeout: 5 minutes
└── Connection Validation: On borrow

Monitoring and Observability

Metrics Collection

Application Metrics:
- Request rate (requests/sec)
- Error rate (errors/sec)
- Latency (P50, P95, P99)
- Throughput (transactions/sec)

Infrastructure Metrics:
- CPU utilization
- Memory utilization
- Disk I/O
- Network I/O

Business Metrics:
- Transaction volume
- Transaction value
- Success rate
- Fraud detection rate

Distributed Tracing

Jaeger Configuration:
├── Trace every request
├── Sample rate: 1% (production)
├── Retention: 7 days
└── Span tags: user_id, transaction_id

Trace Example:
API Gateway (50ms)
  → User Service (100ms)
    → Token Vault (50ms)
  → Fraud Detection (300ms)
  → Transaction Service (200ms)
    → Payment Network (1200ms)
  → Notification Service (50ms)
Total: 1950ms

Alerting

Critical Alerts (Page Immediately):
- API error rate > 1%
- Payment success rate < 99%
- Latency P95 > 3 seconds
- Database CPU > 90%
- Fraud detection down

Warning Alerts (Notify Team):
- API error rate > 0.5%
- Payment success rate < 99.5%
- Latency P95 > 2 seconds
- Database CPU > 80%
- Cache hit rate < 85%

Capacity Planning

Traffic Forecasting

Historical Analysis:
- Daily patterns (peak at lunch, evening)
- Weekly patterns (higher on weekends)
- Seasonal patterns (holidays, Black Friday)
- Growth trends (20% YoY)

Capacity Planning:
- Current: 10K TPS
- Peak: 20K TPS (2x buffer)
- Growth: +20% per year
- Plan for: 30K TPS (3x current)

Resource Allocation:
- Compute: 500 instances (current 200)
- Database: 1500 shards (current 1000)
- Cache: 500 GB (current 384 GB)

Cost Optimization

Strategies:
1. Right-sizing instances (use smaller instances)
2. Spot instances for non-critical workloads
3. Reserved instances for baseline capacity
4. Auto-scaling to match demand
5. Data archival to cheaper storage

Cost Breakdown:
- Compute: 40% ($160K/month)
- Database: 20% ($80K/month)
- Network: 15% ($65K/month)
- Third-party: 25% ($100K/month)
Total: $405K/month

Optimization Potential:
- Spot instances: Save 30% on compute
- Reserved instances: Save 40% on baseline
- Data archival: Save 50% on storage
- Total savings: ~$100K/month (25%)

Disaster Recovery

Backup Strategy

Database Backups:
├── Full backup: Daily
├── Incremental backup: Hourly
├── Point-in-time recovery: 5 minutes
├── Retention: 30 days
└── Cross-region replication

Application State:
├── Stateless services (no backup needed)
├── Configuration: Version controlled
├── Secrets: Backed up in Vault
└── Logs: Retained for 90 days

Failover Procedures

Automatic Failover:
1. Health check fails (3 consecutive)
2. Route 53 removes unhealthy endpoint
3. Traffic routed to healthy region
4. Failover time: < 30 seconds

Manual Failover:
1. Incident detected
2. On-call engineer notified
3. Runbook executed
4. Traffic manually shifted
5. Failover time: < 5 minutes

Interview Discussion Points

Q: How handle 10x traffic spike?

Strategy:
1. Auto-scale to 2000 instances (from 200)
2. Increase database connections
3. Add read replicas
4. Enable aggressive caching
5. Rate limit non-critical endpoints
6. Queue non-urgent processing

Q: How optimize database queries?

Techniques:
1. Proper indexing
2. Query optimization (EXPLAIN)
3. Connection pooling
4. Read replicas for reads
5. Caching hot data
6. Partitioning large tables

Q: How ensure low latency globally?

Strategy:
1. Multi-region deployment
2. CDN for static content
3. Edge computing (Lambda@Edge)
4. Geo-routing
5. Local caching
6. Optimized database queries

Estimated Reading Time: 20 minutes