Digital Wallet - Scaling Considerations
Reading Time: 20 minutes
Horizontal Scaling
Application Tier Scaling
Auto-Scaling Configuration:
├── Min Instances: 10
├── Max Instances: 500
├── Target CPU: 70%
├── Target Memory: 80%
├── Scale-Up: 50% per minute
└── Scale-Down: 10% per minute
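The limits above interact: a desired size is derived from the CPU target, then bounded by the per-minute rate limits, then clamped to the instance range. The following is a minimal sketch of one scaling step, assuming a proportional rule (fleet size scales with the ratio of observed to target CPU). The names `desired_for_cpu` and `next_instance_count` are hypothetical and do not correspond to any particular autoscaler's API.

```python
MIN_INSTANCES = 10
MAX_INSTANCES = 500
TARGET_CPU_PCT = 70
SCALE_UP_PCT = 50     # grow by at most 50% per minute
SCALE_DOWN_PCT = 10   # shrink by at most 10% per minute


def desired_for_cpu(current: int, cpu_pct: int) -> int:
    """Instances needed to bring average CPU back to target (assumed proportional rule)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(current * cpu_pct) // TARGET_CPU_PCT)


def next_instance_count(current: int, desired: int) -> int:
    """Move toward `desired` for one minute, honoring rate limits and min/max."""
    upper = current * (100 + SCALE_UP_PCT) // 100           # rounded down
    lower = -(-(current * (100 - SCALE_DOWN_PCT)) // 100)   # rounded up
    step = max(lower, min(upper, desired))
    return max(MIN_INSTANCES, min(MAX_INSTANCES, step))
```

Under these limits a fleet of 200 instances needs three steps to reach the 500-instance ceiling (200 → 300 → 450 → 500), so a sudden spike is absorbed over minutes, not seconds.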
Load Distribution:
├── Round-robin for stateless services
├── Consistent hashing for cache
├── Geo-routing for multi-region
└── Weighted routing for canary
Service-Specific Scaling
Transaction Service:
- Peak: 10,000 TPS
- Instances: 200 pods
- CPU: 4 cores per pod
- Memory: 8 GB per pod
Fraud Detection Service:
- Peak: 10,000 checks/sec
- Instances: 100 pods
- CPU: 8 cores per pod (ML inference)
- Memory: 16 GB per pod
Notification Service:
- Peak: 50,000 notifications/sec
- Instances: 50 pods
- Queue-based processing
- Async workers
Database Scaling
Sharding Strategy
Shard by user_id:
- Total Shards: 1000
- Calculation: user_id % 1000
- Shard Size: 100K users per shard
Shard Distribution:
Region US-East: Shards 0-249
Region US-West: Shards 250-499
Region EU-West: Shards 500-749
Region APAC: Shards 750-999
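The routing described above is two lookups: user to shard, then shard to region. Here is a minimal sketch, assuming integer user IDs; the function names are hypothetical, and the region table mirrors the distribution listed above.

```python
TOTAL_SHARDS = 1000

# Shard ranges per region, taken from the distribution above.
# range(0, 250) covers shards 0-249 inclusive, and so on.
REGION_SHARDS = {
    "US-East": range(0, 250),
    "US-West": range(250, 500),
    "EU-West": range(500, 750),
    "APAC": range(750, 1000),
}


def shard_for_user(user_id: int) -> int:
    """Map a user to a shard with the modulo rule."""
    return user_id % TOTAL_SHARDS


def region_for_shard(shard: int) -> str:
    """Find the region that hosts a given shard."""
    for region, shards in REGION_SHARDS.items():
        if shard in shards:
            return region
    raise ValueError(f"shard {shard} is outside 0-{TOTAL_SHARDS - 1}")
```

Because the region follows from the shard and the shard from `user_id`, a user's home region is fixed by their ID, not their location: user 1234567 lands on shard 567 in EU-West wherever they connect from.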
Benefits:
- Even distribution
- Predictable routing
- Simple scheme to operate (note: adding shards changes the modulus, so most users must be re-sharded)
Read Replicas
Configuration per Shard:
├── 1 Primary (writes)
├── 3 Replicas (reads)
├── Replication Lag: < 100ms
└── Failover: Automatic
Read Distribution:
- Transaction history: 90% reads → Replicas
- Balance checks: 80% reads → Replicas
- Payment authorization: 100% writes → Primary
- P2P transfers: 100% writes → Primary
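The split above can be sketched as a small router: writes always go to the primary, reads rotate across replicas. The class name `ShardRouter` and the `needs_fresh_read` flag are hypothetical. The flag is an assumption about why a share of reads still hits the primary: a read issued right after a write may not tolerate replica lag.

```python
from itertools import cycle
from typing import Iterator, List


class ShardRouter:
    """Pick a database node for one shard: primary for writes, replicas for reads."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Round-robin over the replicas, repeating forever.
        self._replicas: Iterator[str] = cycle(replicas)

    def route(self, is_write: bool, needs_fresh_read: bool = False) -> str:
        # Writes, and reads that cannot tolerate replication lag, use the primary.
        if is_write or needs_fresh_read:
            return self.primary
        return next(self._replicas)
```

For example, a payment authorization calls `route(is_write=True)`, while a transaction-history page calls `route(is_write=False)` and is served by the next replica in rotation.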
Connection Pooling:
- Primary: 100 connections
- Each Replica: 100 connections
- Total per shard: 400 connections
Partitioning
Transactions Table (Time-Based):
├── Partition by month
├── Retention: 90 days hot, 5 years warm
├── Auto-create new partitions
└── Auto-archive old partitions
CREATE TABLE transactions_2026_01 PARTITION OF transactions
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
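Auto-creating partitions can be as simple as a scheduled job that emits the DDL for upcoming months. The sketch below only builds the statement shown above and does not connect to a database. The helper name `month_partition_ddl` is hypothetical, and `IF NOT EXISTS` is added here as an assumption so that the job can be re-run safely.

```python
from datetime import date


def month_partition_ddl(year: int, month: int) -> str:
    """Build the CREATE TABLE statement for one monthly partition."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"transactions_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF transactions\n"
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )
```

Generating the statement a month or more ahead of time avoids inserts failing because the target partition does not exist yet.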
Benefits:
- Faster queries (scan only relevant partitions)
- Easy archival (drop old partitions)
- Better index performance
Caching Strategy
Multi-Layer Caching
L1: Application Cache (In-Memory)
├── Size: 1 GB per instance
├── TTL: 1 minute
├── Data: User session, hot config
└── Hit Rate: 95%
L2: Redis Cache (Distributed)
├── Size: 384 GB cluster
├── TTL: 5-60 minutes
├── Data: User profile, payment methods, recent transactions
└── Hit Rate: 90%
L3: CDN Cache (Edge)
├── Size: Unlimited
├── TTL: 1 hour - 1 day
├── Data: Static assets, receipts
└── Hit Rate: 99%
Cache Warming
Pre-Warm Strategy:
1. Identify top 10% active users
2. Load their data into cache at startup
3. Refresh every 5 minutes
4. Predictive pre-loading based on patterns
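Step 1 above can be sketched as a ranking over recent activity counts. This is a minimal illustration that assumes activity is available as a mapping from user ID to request count; the function name `top_active_users` and the input shape are hypothetical.

```python
from typing import Dict, List


def top_active_users(activity: Dict[str, int], fraction_pct: int = 10) -> List[str]:
    """Return the most active users, limited to `fraction_pct` percent of all users."""
    # Keep at least one slot so a small population still warms something.
    count = max(1, len(activity) * fraction_pct // 100)
    # Sort by activity (highest first); ties are broken by user ID for stable output.
    ranked = sorted(activity, key=lambda user: (-activity[user], user))
    return ranked[:count]
```

The returned IDs would then feed steps 2 and 3: load each user's data into the cache at startup and refresh it on the 5-minute schedule.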
Example:
- Morning commute: Pre-load transit payment methods
- Lunch time: Pre-load restaurant payment history
- Evening: Pre-load grocery store data
Cache Invalidation
Strategies:
1. Write-Through: Update DB → Update Cache
2. Cache-Aside: Read Cache → Miss → Read DB → Write Cache
3. Event-Based: Kafka event → Invalidate cache
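The three strategies above differ only in who touches the cache and when. The sketch below uses plain dictionaries as stand-ins for Redis and the database, and integer cents for balances; both are assumptions made to keep the example self-contained. The class name `BalanceStore` is hypothetical.

```python
from typing import Dict, Optional


class BalanceStore:
    """Toy store showing cache-aside reads, write-through writes, and invalidation."""

    def __init__(self, db: Dict[str, int]) -> None:
        self.db = db                      # stand-in for the database
        self.cache: Dict[str, int] = {}   # stand-in for the distributed cache
        self.db_reads = 0                 # counts cache misses that reached the DB

    def read(self, user_id: str) -> Optional[int]:
        # Cache-aside: read cache, on a miss read the DB and populate the cache.
        key = f"balance:{user_id}"
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(user_id)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, user_id: str, cents: int) -> None:
        # Write-through: update the DB first, then the cache.
        self.db[user_id] = cents
        self.cache[f"balance:{user_id}"] = cents

    def invalidate(self, user_id: str) -> None:
        # Event-based: an event handler drops the key; the next read reloads it.
        self.cache.pop(f"balance:{user_id}", None)
```

In practice the strategies are combined: cache-aside covers reads, while write-through or event-based invalidation keeps cached values from going stale after writes.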
Invalidation Events:
- User profile update → user:{user_id}
- Payment method added → payment_methods:{user_id}
- Transaction completed → transactions:{user_id}:recent
- Balance changed → balance:{user_id}
Message Queue Scaling
Kafka Configuration
Topics:
├── transaction-events (10 partitions)
├── fraud-alerts (5 partitions)
├── notification-events (20 partitions)
└── audit-logs (5 partitions)
Partitioning:
- Key: user_id
- Partitions: 5-20 per topic
- Replication Factor: 3
- Min In-Sync Replicas: 2
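Keying by `user_id` means every event for a given user hashes to the same partition, which preserves per-user ordering. The sketch below shows the idea only: it uses CRC32 from the standard library as a stand-in hash, whereas Kafka's default Java partitioner hashes the key bytes with murmur2, so the partition numbers here will not match a real cluster. The function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a message key to a partition (illustrative hash, not Kafka's murmur2)."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions
```

The same caveat applies here as with modulo sharding: changing a topic's partition count changes where existing keys land, so ordering guarantees only hold while the count is stable.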
Consumer Groups:
- Transaction Service: 10 consumers
- Fraud Detection: 5 consumers
- Notification Service: 20 consumers
- Analytics: 5 consumers
Throughput:
- Peak: 100K messages/sec
- Average: 20K messages/sec
- Retention: 7 days
Queue-Based Processing
SQS Queues:
├── High Priority (Payments): 10K msgs/sec
├── Medium Priority (Notifications): 50K msgs/sec
└── Low Priority (Analytics): 5K msgs/sec
Worker Scaling:
- Auto-scale based on queue depth
- Target: < 1000 messages in queue
- Scale-up: Add 10 workers per 1000 messages
- Scale-down: Remove workers when queue < 100
CDN and Edge Optimization
CloudFront Configuration
Distribution:
├── Origins: 3 regions (US, EU, APAC)
├── Edge Locations: 200+ worldwide
├── Cache Behaviors:
│ ├── Static assets: 1 day TTL
│ ├── Receipts: 1 hour TTL
│ └── API responses: No cache
└── Compression: Gzip, Brotli
Benefits:
- Reduced latency (< 50ms)
- Offload 80% of static traffic
- DDoS protection
- SSL termination at edge
Edge Computing
Lambda@Edge Functions:
├── Authentication validation
├── Request routing (geo-based)
├── Response transformation
└── A/B testing
Use Cases:
- Route US users to US-East
- Route EU users to EU-West
- Validate JWT at edge
- Add security headers
Multi-Region Architecture
Active-Active Deployment
Region Configuration:
US-East (Primary):
├── 40% traffic
├── 100 K8s nodes
├── 250 database shards
└── Full service deployment
US-West (Primary):
├── 30% traffic
├── 100 K8s nodes
├── 250 database shards
└── Full service deployment
EU-West (Primary):
├── 20% traffic
├── 80 K8s nodes
├── 250 database shards
└── Full service deployment
APAC (Primary):
├── 10% traffic
├── 50 K8s nodes
├── 250 database shards
└── Full service deployment
Cross-Region Replication
Database Replication:
├── Async replication between regions (non-critical data; see Data Consistency)
├── Replication lag: < 1 second
├── Conflict resolution: Last-write-wins
└── Failover: Automatic
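Last-write-wins can be sketched as picking the version with the newest timestamp. The example assumes each write carries a millisecond timestamp and its origin region; the `Version` type and the tie-break on region name are hypothetical choices made so that every region resolves a conflict the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    updated_at_ms: int
    region: str


def last_write_wins(a: Version, b: Version) -> Version:
    """Return the surviving version of a record written in two regions."""
    # Newest timestamp wins; equal timestamps fall back to region name
    # so that all regions converge on the same winner.
    return max(a, b, key=lambda v: (v.updated_at_ms, v.region))
```

The losing write is discarded silently, and the outcome depends on clock agreement between regions. That is acceptable for data such as profile fields, and less so for balances.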
Data Consistency:
- Strong consistency within region
- Eventual consistency across regions
- Critical data: Synchronous replication
- Non-critical: Asynchronous replication
Geo-Routing
Route 53 Configuration:
├── Geolocation routing
├── Latency-based routing
├── Health checks every 30 seconds
└── Automatic failover
Routing Rules:
- US users → US-East (primary) or US-West (failover)
- EU users → EU-West (primary) or US-East (failover)
- APAC users → APAC (primary) or US-West (failover)
Performance Optimization
Database Query Optimization
Indexing Strategy:
- B-tree indexes for equality/range queries
- Hash indexes for exact matches
- Partial indexes for filtered queries
- Covering indexes to avoid table lookups
Example:
CREATE INDEX idx_transactions_user_date
ON transactions(user_id, created_at DESC)
WHERE status = 'completed';
Query Optimization:
- Use EXPLAIN ANALYZE
- Avoid N+1 queries
- Batch queries where possible
- Use connection pooling
API Response Optimization
Techniques:
1. Pagination (limit + offset)
2. Field filtering (only return requested fields)
3. Response compression (gzip)
4. HTTP/2 multiplexing
5. GraphQL for flexible queries
Example:
GET /transactions?fields=id,amount,status&limit=20
Response:
{
"transactions": [
{"id": "123", "amount": 50.00, "status": "completed"}
]
}
Connection Pooling
Database Connection Pool:
├── Min Connections: 10
├── Max Connections: 100
├── Idle Timeout: 10 minutes
├── Max Lifetime: 30 minutes
└── Connection Validation: On borrow
Redis Connection Pool:
├── Min Connections: 5
├── Max Connections: 50
├── Idle Timeout: 5 minutes
└── Connection Validation: On borrow
Monitoring and Observability
Metrics Collection
Application Metrics:
- Request rate (requests/sec)
- Error rate (errors/sec)
- Latency (P50, P95, P99)
- Throughput (transactions/sec)
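P50, P95, and P99 are percentiles of the observed latency distribution. The sketch below uses the nearest-rank method over raw samples; this is an assumption for illustration, since production metrics systems commonly approximate percentiles from histograms instead. The function name `percentile` is hypothetical.

```python
from typing import List


def percentile(samples_ms: List[int], p: int) -> int:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Ceiling of p * n / 100, computed with integers to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]
```

With the four samples `[120, 80, 300, 95]`, the P50 is 95 ms while the P95 is 300 ms, which shows why tail percentiles are tracked separately from the median.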
Infrastructure Metrics:
- CPU utilization
- Memory utilization
- Disk I/O
- Network I/O
Business Metrics:
- Transaction volume
- Transaction value
- Success rate
- Fraud detection rate
Distributed Tracing
Jaeger Configuration:
├── Instrument every request
├── Sample rate: 1% (production)
├── Retention: 7 days
└── Span tags: user_id, transaction_id
Trace Example:
API Gateway (50ms)
→ User Service (100ms)
→ Token Vault (50ms)
→ Fraud Detection (300ms)
→ Transaction Service (200ms)
→ Payment Network (1200ms)
→ Notification Service (50ms)
Total: 1950ms
Alerting
Critical Alerts (Page Immediately):
- API error rate > 1%
- Payment success rate < 99%
- Latency P95 > 3 seconds
- Database CPU > 90%
- Fraud detection down
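The critical thresholds above can be expressed as a single check over a metrics snapshot. This is a minimal sketch; the function name and parameter names are hypothetical, and a real setup would normally define these rules in the alerting system itself.

```python
from typing import List


def critical_alerts(
    error_rate_pct: float,
    payment_success_pct: float,
    latency_p95_s: float,
    db_cpu_pct: float,
    fraud_detection_up: bool,
) -> List[str]:
    """Return the critical alerts that should page on-call for this snapshot."""
    alerts: List[str] = []
    if error_rate_pct > 1.0:
        alerts.append("API error rate > 1%")
    if payment_success_pct < 99.0:
        alerts.append("Payment success rate < 99%")
    if latency_p95_s > 3.0:
        alerts.append("Latency P95 > 3 seconds")
    if db_cpu_pct > 90.0:
        alerts.append("Database CPU > 90%")
    if not fraud_detection_up:
        alerts.append("Fraud detection down")
    return alerts
```

An empty list means nothing pages; any non-empty result would be routed to the on-call rotation.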
Warning Alerts (Notify Team):
- API error rate > 0.5%
- Payment success rate < 99.5%
- Latency P95 > 2 seconds
- Database CPU > 80%
- Cache hit rate < 85%
Capacity Planning
Traffic Forecasting
Historical Analysis:
- Daily patterns (peak at lunch, evening)
- Weekly patterns (higher on weekends)
- Seasonal patterns (holidays, Black Friday)
- Growth trends (20% YoY)
Capacity Planning:
- Current: 10K TPS
- Peak: 20K TPS (2x buffer)
- Growth: +20% per year
- Plan for: 30K TPS (3x current)
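The figures above can be checked with compound growth. The sketch counts how many full years of 20% growth it takes for a starting load to reach a target; the function name `years_until` is hypothetical.

```python
def years_until(current_tps: int, target_tps: int, growth_pct: int = 20) -> int:
    """Full years of compound growth until load reaches or exceeds the target."""
    years = 0
    tps = float(current_tps)
    while tps < target_tps:
        tps *= 1 + growth_pct / 100
        years += 1
    return years
```

At 20% per year, average load grows from 10K to 30K TPS in 7 years (`years_until(10_000, 30_000)`), but the 20K peak reaches 30K in 3 years (`years_until(20_000, 30_000)`). If the 2x peak buffer is to be preserved, the 30K plan covers roughly two years of growth.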
Resource Allocation:
- Compute: 500 instances (current 200)
- Database: 1500 shards (current 1000)
- Cache: 500 GB (current 384 GB)
Cost Optimization
Strategies:
1. Right-sizing instances (match instance size to measured usage)
2. Spot instances for non-critical workloads
3. Reserved instances for baseline capacity
4. Auto-scaling to match demand
5. Data archival to cheaper storage
Cost Breakdown:
- Compute: 40% ($160K/month)
- Database: 20% ($80K/month)
- Network: 16% ($65K/month)
- Third-party: 25% ($100K/month)
Total: $405K/month
Optimization Potential:
- Spot instances: Save 30% on compute
- Reserved instances: Save 40% on baseline
- Data archival: Save 50% on storage
- Total savings: ~$100K/month (25%)
Disaster Recovery
Backup Strategy
Database Backups:
├── Full backup: Daily
├── Incremental backup: Hourly
├── Point-in-time recovery: 5 minutes
├── Retention: 30 days
└── Cross-region replication
Application State:
├── Stateless services (no backup needed)
├── Configuration: Version controlled
├── Secrets: Backed up in Vault
└── Logs: Retained for 90 days
Failover Procedures
Automatic Failover:
1. Health check fails (3 consecutive)
2. Route 53 removes unhealthy endpoint
3. Traffic routed to healthy region
4. Failover time: < 30 seconds after the endpoint is marked unhealthy
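The automatic path above can be sketched as a failure counter per region plus a primary/failover lookup. The routing table mirrors the Geo-Routing rules in this document; the class and function names are hypothetical, and in a real deployment DNS health checks perform this logic, not application code.

```python
from typing import Dict, Tuple

FAILURE_THRESHOLD = 3  # consecutive failed health checks before removal

# (primary, failover) per user geography, from the routing rules in this document.
ROUTES: Dict[str, Tuple[str, str]] = {
    "US": ("US-East", "US-West"),
    "EU": ("EU-West", "US-East"),
    "APAC": ("APAC", "US-West"),
}


class HealthTracker:
    """Track consecutive health-check failures per region."""

    def __init__(self) -> None:
        self.failures: Dict[str, int] = {}

    def record(self, region: str, healthy: bool) -> None:
        # A passing check resets the counter; a failing one increments it.
        self.failures[region] = 0 if healthy else self.failures.get(region, 0) + 1

    def is_healthy(self, region: str) -> bool:
        return self.failures.get(region, 0) < FAILURE_THRESHOLD


def target_region(user_geo: str, tracker: HealthTracker) -> str:
    """Route to the primary region unless it has been marked unhealthy."""
    primary, failover = ROUTES[user_geo]
    return primary if tracker.is_healthy(primary) else failover
```

Two failed checks leave traffic on the primary; the third shifts it to the failover region, and a single passing check shifts it back.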
Manual Failover:
1. Incident detected
2. On-call engineer notified
3. Runbook executed
4. Traffic manually shifted
5. Failover time: < 5 minutes
Interview Discussion Points
Q: How do you handle a 10x traffic spike?
Strategy:
1. Auto-scale toward 2000 instances (from 200); this requires raising the 500-instance maximum first
2. Increase database connections
3. Add read replicas
4. Enable aggressive caching
5. Rate limit non-critical endpoints
6. Queue non-urgent processing
Q: How do you optimize database queries?
Techniques:
1. Proper indexing
2. Query optimization (EXPLAIN)
3. Connection pooling
4. Read replicas for reads
5. Caching hot data
6. Partitioning large tables
Q: How do you ensure low latency globally?
Strategy:
1. Multi-region deployment
2. CDN for static content
3. Edge computing (Lambda@Edge)
4. Geo-routing
5. Local caching
6. Optimized database queries