Scaling Considerations

📖 6 min read 📄 Part 6 of 10

Stock Trading Platform - Scaling Considerations

Vertical Scaling (Performance Optimization)

Hardware Optimization

CPU:
- High-frequency processors (5+ GHz)
- Large L3 cache (30+ MB)
- CPU pinning for critical threads
- NUMA-aware memory allocation

Memory:
- Low-latency RAM (DDR5)
- Huge pages (2 MB)
- Pre-allocated memory pools
- Cache-line alignment

Network:
- 100 Gbps NICs
- Kernel bypass (DPDK)
- RDMA for inter-server communication
- Direct exchange connections

Storage:
- NVMe SSDs (1M IOPS)
- Battery-backed write cache
- RAID 10 for redundancy
- Separate drives for WAL

Software Optimization

Lock-Free Programming:
- Atomic operations
- Compare-and-swap (CAS)
- Memory barriers
- Lock-free queues and skip lists

Zero-Copy:
- Avoid memory copies
- Direct buffer access
- Memory-mapped files
- Shared memory IPC

Kernel Bypass:
- DPDK for networking
- SPDK for storage
- Polling instead of interrupts
- Dedicated CPU cores

Compiler Optimization:
- Profile-guided optimization (PGO)
- Link-time optimization (LTO)
- CPU-specific instructions (AVX-512)
- Inline critical functions

Horizontal Scaling

Matching Engine Scaling

Sharding by Symbol:
- Each symbol on dedicated matching engine
- 10,000 symbols / 100 servers = 100 symbols per server
- Hot symbols get dedicated servers

Load Distribution:
- AAPL, GOOGL, MSFT: Dedicated servers
- Medium-volume stocks: 10 per server
- Low-volume stocks: 100 per server

Benefits:
- Independent scaling
- No cross-symbol contention
- Isolated failures
- Easy to add capacity

Market Data Scaling

Multicast Distribution:
- Single send reaches all subscribers
- No per-subscriber overhead
- 1 Gbps → 10 Gbps with multicast

Regional Distribution:
- Edge servers in major cities
- <10ms latency to users
- Reduce backbone traffic
- Local caching

Compression:
- Delta encoding (price changes)
- Dictionary compression
- 10x bandwidth reduction

Co-Location Strategy

Exchange Co-Location

Physical Setup:
- Servers in exchange data center
- <50 microsecond network latency
- Direct fiber connections
- Redundant power and cooling

Benefits:
- Minimize latency
- Reliable connectivity
- Competitive advantage
- Reduced network costs

Cost:
- $10K-50K per rack per month
- Worth it for HFT

Cross-Connect

Direct Connections:
- Fiber cross-connect to exchange
- Bypass internet routing
- Dedicated bandwidth
- Guaranteed latency

Performance:
- Internet: 10-50ms
- Cross-connect: <1ms
- Improvement: 10-50x faster

Latency Optimization

Network Latency

Techniques:
1. Co-location: 50 microseconds
2. Kernel bypass (DPDK): 10 microseconds
3. RDMA: 5 microseconds
4. Custom FPGA: 1 microsecond

Total: <100 microseconds

Processing Latency

Techniques:
1. Lock-free data structures: No contention
2. CPU pinning: No context switches
3. Huge pages: Fewer TLB misses
4. Pre-allocated memory: No malloc overhead

Matching Engine:
- Order validation: 20 μs
- Risk check: 100 μs
- Matching: 500 μs
- Total: <1 millisecond

Database Latency

Techniques:
1. In-memory processing: No disk I/O
2. Async writes: Don't block
3. Batch commits: Reduce overhead
4. Write-ahead log: Sequential writes

Write Latency:
- Sync write: 10ms (unacceptable)
- Async write: 200 μs (acceptable)
- Improvement: 50x faster

High Availability

Active-Active Architecture

Primary Site:
- Handles 100% of traffic
- Real-time matching
- Sub-millisecond latency

Secondary Site:
- Hot standby
- Receives all events
- Ready in <100ms

Failover:
1. Detect failure: <10ms
2. Promote secondary: <50ms
3. Redirect traffic: <40ms
Total: <100ms

State Replication

Replication Strategy:
- Snapshot every 1 second
- Event log for replay
- Async replication

Recovery:
1. Load snapshot: <500ms
2. Replay events: <500ms
Total: <1 second

Data Loss: Zero (RPO = 0)

Load Balancing

Order Gateway Load Balancing

Strategy: Consistent Hashing

Benefits:
- Session affinity
- Even distribution
- Minimal disruption on scale

Implementation:
hash(user_id) % num_servers = server_id

Scaling:
- Add server: Rebalance 1/N traffic
- Remove server: Redistribute to others

Market Data Load Balancing

Strategy: Multicast + Regional Servers

Distribution:
- Exchange → Multicast → Regional Servers → Users
- No single bottleneck
- Linear scaling

Capacity:
- 1 server: 10K users
- 100 servers: 1M users
- Easy to add capacity

Caching Strategy

Multi-Level Cache

L1 (CPU Cache):
- Order book top-of-book
- 64 bytes per symbol
- <1 nanosecond access

L2 (Application Memory):
- Full order book
- 1 MB per symbol
- <100 nanoseconds access

L3 (Redis):
- User sessions
- Account balances
- <1 millisecond access

Benefits:
- Minimize latency
- Reduce database load
- Improve throughput

Monitoring and Auto-Scaling

Real-Time Monitoring

Metrics:
- Order latency (p50, p95, p99, p999)
- Matching engine throughput
- Market data latency
- Error rates

Alerting:
- Latency p99 > 1ms: Warning
- Latency p99 > 5ms: Critical
- Error rate > 0.01%: Critical
- Failover event: Critical

Dashboard:
- Real-time latency graphs
- Throughput charts
- Error rate trends
- System health

Capacity Planning

Metrics:
- CPU utilization
- Memory usage
- Network bandwidth
- Disk I/O

Thresholds:
- CPU > 70%: Add capacity
- Memory > 80%: Add capacity
- Network > 80%: Add capacity

Scaling:
- Vertical: Upgrade hardware
- Horizontal: Add servers
- Lead time: 1-2 weeks

Cost Optimization

Hardware Costs

Matching Engines:
- 100 servers × $50K = $5M
- Amortized over 3 years: $1.67M/year

Co-Location:
- 10 racks × $30K/month = $300K/month = $3.6M/year

Network:
- Cross-connects: $500K/year
- Bandwidth: $1M/year

Total: $6.77M/year

Cost per Trade

Infrastructure: $6.77M / 50M trades/day / 252 days = $0.0054 per trade
Revenue: $2 per trade (commission)
Gross Margin: $1.9946 per trade (99.73%)

Optimization Strategies

1. Efficient Hardware:
   - Buy vs lease analysis
   - Bulk purchasing discounts
   - Refurbished equipment

2. Software Optimization:
   - Reduce CPU usage
   - Optimize memory
   - Minimize network traffic

3. Operational Efficiency:
   - Automate operations
   - Reduce manual intervention
   - Predictive maintenance

Disaster Recovery

Backup Strategy

Hot Backup:
- Real-time replication
- Secondary site ready
- <100ms failover

Warm Backup:
- Hourly snapshots
- 4-hour recovery
- Cost-effective

Cold Backup:
- Daily backups
- 24-hour recovery
- Regulatory compliance

Recovery Procedures

Scenario 1: Server Failure
- Detection: <10ms
- Failover: Automatic
- Recovery: <100ms

Scenario 2: Data Center Failure
- Detection: <100ms
- Failover: Automatic
- Recovery: <1 second

Scenario 3: Regional Failure
- Detection: <1 second
- Failover: Manual
- Recovery: <5 minutes

This comprehensive scaling strategy enables the trading platform to handle millions of orders per second with microsecond latencies through vertical optimization, horizontal scaling, and strategic co-location.