Scaling Considerations

📖 8 min read 📄 Part 6 of 10

Video Conferencing System - Scaling Considerations

WebRTC Connection Scaling

Connection Management Architecture

┌─────────────────────────────────────────────────────────────┐
│                Connection Load Balancer                     │
│  • Sticky Sessions (WebSocket Affinity)                    │
│  • Health Checks & Failover                                │
│  • Geographic Routing                                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Signaling   │ │ Signaling   │ │ Signaling   │
│ Server 1    │ │ Server 2    │ │ Server 3    │
│ (10K conn)  │ │ (10K conn)  │ │ (10K conn)  │
└─────────────┘ └─────────────┘ └─────────────┘

WebSocket Connection Limits

Per Server Limits:

  • Linux: ~65K concurrent connections (file descriptor limit)
  • Practical Limit: 10K connections per server (CPU/memory constraints)
  • Memory Usage: ~8KB per WebSocket connection
  • CPU Usage: ~0.1% per active connection

Scaling Strategy:

// Connection pooling with sticky sessions
const connectionPool = {
    maxConnectionsPerServer: 10000,
    servers: [
        { id: 'server1', connections: 8500, cpu: 65% },
        { id: 'server2', connections: 9200, cpu: 72% },
        { id: 'server3', connections: 7800, cpu: 58% }
    ],
    
    // Route new connections to least loaded server
    routeConnection(userId) {
        const server = this.servers
            .filter(s => s.connections < this.maxConnectionsPerServer)
            .sort((a, b) => a.cpu - b.cpu)[0];
        return server?.id || this.scaleUp();
    }
};

Auto-Scaling WebSocket Servers

# Kubernetes HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: signaling-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: signaling-server
  minReplicas: 10
  maxReplicas: 1000
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: websocket_connections
      target:
        type: AverageValue
        averageValue: "8000"  # Scale when >8K connections per pod

Media Server Scaling (SFU Architecture)

SFU Server Capacity

Single SFU Server Limits:

  • CPU-bound: ~500 participants (1080p video)
  • Network-bound: ~10 Gbps bandwidth
  • Memory: ~2GB for 500 participants
  • Encoding: Hardware acceleration recommended

Multi-SFU Architecture

Meeting with 2000 participants:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    SFU 1    │    │    SFU 2    │    │    SFU 3    │
│ 500 users   │    │ 500 users   │    │ 500 users   │
└─────────────┘    └─────────────┘    └─────────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                  ┌─────────────┐
                  │ SFU Bridge  │
                  │ (Cascading) │
                  └─────────────┘

SFU Cascading Implementation

class SFUCluster {
    constructor() {
        this.sfuNodes = new Map();
        this.maxParticipantsPerSFU = 500;
    }
    
    async addParticipant(meetingId, participantId) {
        let sfu = this.findOptimalSFU(meetingId);
        
        if (!sfu || sfu.participantCount >= this.maxParticipantsPerSFU) {
            sfu = await this.createNewSFU(meetingId);
            await this.setupCascading(meetingId, sfu);
        }
        
        return sfu.addParticipant(participantId);
    }
    
    async setupCascading(meetingId, newSFU) {
        const existingSFUs = this.getSFUsForMeeting(meetingId);
        
        // Create bridge connections between SFUs
        for (const existingSFU of existingSFUs) {
            await this.createBridge(existingSFU, newSFU);
        }
    }
}

Database Scaling Strategies

PostgreSQL Scaling

Read Replicas:

# Master-Slave Configuration
Master (Write):
  - All INSERT, UPDATE, DELETE operations
  - Meeting creation, user updates
  - Real-time participant state changes

Read Replicas (3x):
  - Meeting history queries
  - User profile lookups
  - Recording metadata
  - Analytics queries

Sharding Strategy:

-- Shard by user_id hash
CREATE TABLE meetings_shard_0 (
    LIKE meetings INCLUDING ALL
) INHERITS (meetings);

CREATE TABLE meetings_shard_1 (
    LIKE meetings INCLUDING ALL  
) INHERITS (meetings);

-- Partition function
CREATE OR REPLACE FUNCTION get_shard_id(user_uuid UUID)
RETURNS INTEGER AS $$
BEGIN
    RETURN abs(hashtext(user_uuid::text)) % 16;
END;
$$ LANGUAGE plpgsql;

Redis Cluster Scaling

# Redis Cluster Configuration
Cluster Nodes: 6 (3 masters, 3 slaves)
Hash Slots: 16384 (distributed across masters)
Memory per Node: 32GB
Replication Factor: 1 (each master has 1 slave)

# Session Distribution
Master 1 (slots 0-5460):     ~33% of sessions
Master 2 (slots 5461-10922): ~33% of sessions  
Master 3 (slots 10923-16383): ~34% of sessions

# Failover: Automatic promotion of slaves to masters

Cassandra Scaling

# Multi-datacenter deployment
Datacenter US-East:
  - 9 nodes (RF=3)
  - Handles US traffic
  
Datacenter EU-West:
  - 9 nodes (RF=3)  
  - Handles EU traffic
  
Datacenter Asia-Pacific:
  - 6 nodes (RF=3)
  - Handles APAC traffic

# Cross-DC replication for analytics
Consistency Level: LOCAL_QUORUM (within DC)
Cross-DC Sync: Eventual consistency

CDN and Edge Computing

Global CDN Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Global CDN Network                       │
├─────────────────┬─────────────────┬─────────────────────────┤
│   US Regions    │   EU Regions    │   APAC Regions          │
│                 │                 │                         │
│ • us-east-1     │ • eu-west-1     │ • ap-southeast-1        │
│ • us-west-2     │ • eu-central-1  │ • ap-northeast-1        │
│ • us-central-1  │ • eu-north-1    │ • ap-south-1            │
└─────────────────┴─────────────────┴─────────────────────────┘

Edge Functions for Media Optimization

// CloudFlare Worker for WebRTC optimization
addEventListener('fetch', event => {
    if (event.request.url.includes('/webrtc/offer')) {
        event.respondWith(handleWebRTCOffer(event.request));
    }
});

async function handleWebRTCOffer(request) {
    const offer = await request.json();
    
    // Optimize SDP for regional servers
    const optimizedOffer = await optimizeSDPForRegion(
        offer, 
        request.cf.country
    );
    
    // Route to nearest media server
    const mediaServer = await findNearestMediaServer(
        request.cf.colo
    );
    
    return fetch(mediaServer.url + '/offer', {
        method: 'POST',
        body: JSON.stringify(optimizedOffer)
    });
}

Auto-Scaling Policies

Meeting-Based Scaling

# Scale based on active meetings
Metrics:
  - active_meetings_count
  - average_participants_per_meeting
  - cpu_utilization
  - memory_utilization

Scaling Rules:
  Scale Up When:
    - active_meetings > 1000 AND cpu > 70%
    - average_participants > 50 AND memory > 80%
    - queue_depth > 100 (waiting to join)
  
  Scale Down When:
    - active_meetings < 500 AND cpu < 30%
    - average_participants < 20 AND memory < 40%
    - No scale-down during business hours (9 AM - 6 PM)

Predictive Scaling

import pandas as pd
from sklearn.linear_model import LinearRegression

class MeetingDemandPredictor:
    def __init__(self):
        self.model = LinearRegression()
        
    def predict_demand(self, timestamp):
        # Features: hour, day_of_week, month, historical_avg
        features = self.extract_features(timestamp)
        predicted_meetings = self.model.predict([features])[0]
        
        # Pre-scale infrastructure 15 minutes before predicted peak
        required_capacity = self.calculate_capacity(predicted_meetings)
        return required_capacity
    
    def calculate_capacity(self, predicted_meetings):
        # Each meeting server handles ~200 concurrent meetings
        meeting_servers = math.ceil(predicted_meetings / 200)
        
        # Each media server handles ~50 concurrent meetings  
        media_servers = math.ceil(predicted_meetings / 50)
        
        return {
            'meeting_servers': meeting_servers,
            'media_servers': media_servers,
            'scale_at': datetime.now() + timedelta(minutes=15)
        }

Network Optimization

TURN Server Scaling

# Global TURN Server Deployment
Regions: 20+ worldwide
Servers per Region: 5-10 (based on user density)
Capacity per Server: 10,000 concurrent relays
Total Capacity: 1M+ concurrent TURN sessions

# TURN Server Selection Algorithm
Selection Criteria:
  1. Geographic proximity (lowest latency)
  2. Server load (CPU/bandwidth utilization)
  3. Network path quality (packet loss, jitter)
  4. Cost optimization (prefer direct P2P when possible)

Bandwidth Optimization

class AdaptiveBitrateController {
    constructor() {
        this.qualityLevels = [
            { name: '4K',    width: 3840, height: 2160, bitrate: 8000000 },
            { name: '1080p', width: 1920, height: 1080, bitrate: 4000000 },
            { name: '720p',  width: 1280, height: 720,  bitrate: 2000000 },
            { name: '480p',  width: 854,  height: 480,  bitrate: 1000000 },
            { name: '240p',  width: 426,  height: 240,  bitrate: 500000 }
        ];
    }
    
    adaptQuality(networkStats) {
        const { bandwidth, packetLoss, rtt } = networkStats;
        
        // Downgrade if network conditions are poor
        if (packetLoss > 5 || rtt > 200) {
            return this.qualityLevels[4]; // 240p
        }
        
        // Select quality based on available bandwidth
        for (const quality of this.qualityLevels) {
            if (bandwidth * 0.8 >= quality.bitrate) { // 80% safety margin
                return quality;
            }
        }
        
        return this.qualityLevels[4]; // Fallback to lowest quality
    }
}

Storage Scaling

Recording Storage Architecture

# Tiered Storage Strategy
Hot Storage (S3 Standard):
  - Duration: 30 days
  - Use Case: Recent recordings, frequent access
  - Cost: $0.023/GB/month
  
Warm Storage (S3 IA):
  - Duration: 31-365 days  
  - Use Case: Occasional access
  - Cost: $0.0125/GB/month
  
Cold Storage (S3 Glacier):
  - Duration: 1-7 years
  - Use Case: Compliance, archival
  - Cost: $0.004/GB/month
  
Archive (S3 Deep Archive):
  - Duration: 7+ years
  - Use Case: Long-term retention
  - Cost: $0.00099/GB/month

Storage Optimization

class RecordingStorageManager:
    def __init__(self):
        self.compression_ratios = {
            'h264_high': 0.1,    # 10:1 compression
            'h264_medium': 0.15, # 6.7:1 compression  
            'h265_high': 0.05,   # 20:1 compression
            'av1_high': 0.03     # 33:1 compression
        }
    
    def optimize_storage(self, recording):
        # Choose codec based on content type
        if recording.content_type == 'screen_share':
            codec = 'h265_high'  # Better for screen content
        elif recording.duration > 3600:  # >1 hour
            codec = 'av1_high'   # Best compression for long recordings
        else:
            codec = 'h264_high'  # Fast encoding for short recordings
            
        compressed_size = recording.raw_size * self.compression_ratios[codec]
        
        return {
            'codec': codec,
            'compressed_size': compressed_size,
            'storage_tier': self.select_storage_tier(recording.access_pattern)
        }

Performance Monitoring

Real-time Metrics

# Key Performance Indicators
Connection Metrics:
  - websocket_connections_active
  - webrtc_peer_connections_active
  - connection_establishment_time_p95
  - connection_failure_rate

Media Quality Metrics:
  - video_bitrate_avg
  - audio_packet_loss_p95
  - video_frame_rate_avg
  - end_to_end_latency_p95

Server Performance:
  - cpu_utilization_avg
  - memory_utilization_avg
  - network_bandwidth_utilization
  - disk_io_utilization

Business Metrics:
  - meetings_active_count
  - participants_total_count
  - meeting_join_success_rate
  - user_satisfaction_score

Alerting Thresholds

Critical Alerts (PagerDuty):
  - connection_failure_rate > 5%
  - meeting_join_success_rate < 95%
  - end_to_end_latency_p95 > 500ms
  - service_availability < 99.9%

Warning Alerts (Slack):
  - cpu_utilization > 80%
  - memory_utilization > 85%
  - video_packet_loss_p95 > 3%
  - storage_utilization > 90%

Capacity Alerts:
  - websocket_connections > 80% of capacity
  - media_servers_utilization > 75%
  - database_connections > 70% of pool
  - cdn_bandwidth > 80% of limit

Cost Optimization

Resource Right-Sizing

class CostOptimizer:
    def analyze_usage_patterns(self, timeframe_days=30):
        usage_data = self.get_usage_metrics(timeframe_days)
        
        recommendations = []
        
        # Identify over-provisioned servers
        for server in usage_data['servers']:
            if server['avg_cpu'] < 30 and server['avg_memory'] < 40:
                recommendations.append({
                    'type': 'downsize',
                    'server_id': server['id'],
                    'current_type': server['instance_type'],
                    'recommended_type': self.get_smaller_instance(server['instance_type']),
                    'monthly_savings': server['monthly_cost'] * 0.5
                })
        
        # Identify underutilized regions
        for region in usage_data['regions']:
            if region['avg_utilization'] < 20:
                recommendations.append({
                    'type': 'consolidate',
                    'region': region['name'],
                    'monthly_savings': region['monthly_cost'] * 0.8
                })
        
        return recommendations

Reserved Instance Strategy

# 3-Year Reserved Instance Plan
Compute (EC2):
  - 70% Reserved Instances (3-year term)
  - 20% Spot Instances (non-critical workloads)
  - 10% On-Demand (peak scaling)
  
Storage (S3):
  - Intelligent Tiering enabled
  - Lifecycle policies for automatic archival
  - Cross-region replication optimization
  
Network:
  - CloudFront reserved capacity
  - Direct Connect for high-volume regions
  - Data transfer optimization