Video Conferencing System - Scaling Considerations
WebRTC Connection Scaling
Connection Management Architecture
┌─────────────────────────────────────────────────────────────┐
│ Connection Load Balancer │
│ • Sticky Sessions (WebSocket Affinity) │
│ • Health Checks & Failover │
│ • Geographic Routing │
└─────────────────────┬───────────────────────────────────────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Signaling │ │ Signaling │ │ Signaling │
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ (10K conn) │ │ (10K conn) │ │ (10K conn) │
└─────────────┘ └─────────────┘ └─────────────┘WebSocket Connection Limits
Per Server Limits:
- Linux: ~65K concurrent connections (file descriptor limit)
- Practical Limit: 10K connections per server (CPU/memory constraints)
- Memory Usage: ~8KB per WebSocket connection
- CPU Usage: ~0.1% per active connection
Scaling Strategy:
// Connection pooling with sticky sessions
const connectionPool = {
maxConnectionsPerServer: 10000,
servers: [
{ id: 'server1', connections: 8500, cpu: 65% },
{ id: 'server2', connections: 9200, cpu: 72% },
{ id: 'server3', connections: 7800, cpu: 58% }
],
// Route new connections to least loaded server
routeConnection(userId) {
const server = this.servers
.filter(s => s.connections < this.maxConnectionsPerServer)
.sort((a, b) => a.cpu - b.cpu)[0];
return server?.id || this.scaleUp();
}
};Auto-Scaling WebSocket Servers
# Kubernetes HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: signaling-server-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: signaling-server
minReplicas: 10
maxReplicas: 1000
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: websocket_connections
target:
type: AverageValue
averageValue: "8000" # Scale when >8K connections per podMedia Server Scaling (SFU Architecture)
SFU Server Capacity
Single SFU Server Limits:
- CPU-bound: ~500 participants (1080p video)
- Network-bound: ~10 Gbps bandwidth
- Memory: ~2GB for 500 participants
- Encoding: Hardware acceleration recommended
Multi-SFU Architecture
Meeting with 2000 participants:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ SFU 1 │ │ SFU 2 │ │ SFU 3 │
│ 500 users │ │ 500 users │ │ 500 users │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌─────────────┐
│ SFU Bridge │
│ (Cascading) │
└─────────────┘SFU Cascading Implementation
class SFUCluster {
constructor() {
this.sfuNodes = new Map();
this.maxParticipantsPerSFU = 500;
}
async addParticipant(meetingId, participantId) {
let sfu = this.findOptimalSFU(meetingId);
if (!sfu || sfu.participantCount >= this.maxParticipantsPerSFU) {
sfu = await this.createNewSFU(meetingId);
await this.setupCascading(meetingId, sfu);
}
return sfu.addParticipant(participantId);
}
async setupCascading(meetingId, newSFU) {
const existingSFUs = this.getSFUsForMeeting(meetingId);
// Create bridge connections between SFUs
for (const existingSFU of existingSFUs) {
await this.createBridge(existingSFU, newSFU);
}
}
}Database Scaling Strategies
PostgreSQL Scaling
Read Replicas:
# Master-Slave Configuration
Master (Write):
- All INSERT, UPDATE, DELETE operations
- Meeting creation, user updates
- Real-time participant state changes
Read Replicas (3x):
- Meeting history queries
- User profile lookups
- Recording metadata
- Analytics queriesSharding Strategy:
-- Shard by user_id hash
CREATE TABLE meetings_shard_0 (
LIKE meetings INCLUDING ALL
) INHERITS (meetings);
CREATE TABLE meetings_shard_1 (
LIKE meetings INCLUDING ALL
) INHERITS (meetings);
-- Partition function
CREATE OR REPLACE FUNCTION get_shard_id(user_uuid UUID)
RETURNS INTEGER AS $$
BEGIN
RETURN abs(hashtext(user_uuid::text)) % 16;
END;
$$ LANGUAGE plpgsql;Redis Cluster Scaling
# Redis Cluster Configuration
Cluster Nodes: 6 (3 masters, 3 slaves)
Hash Slots: 16384 (distributed across masters)
Memory per Node: 32GB
Replication Factor: 1 (each master has 1 slave)
# Session Distribution
Master 1 (slots 0-5460): ~33% of sessions
Master 2 (slots 5461-10922): ~33% of sessions
Master 3 (slots 10923-16383): ~34% of sessions
# Failover: Automatic promotion of slaves to mastersCassandra Scaling
# Multi-datacenter deployment
Datacenter US-East:
- 9 nodes (RF=3)
- Handles US traffic
Datacenter EU-West:
- 9 nodes (RF=3)
- Handles EU traffic
Datacenter Asia-Pacific:
- 6 nodes (RF=3)
- Handles APAC traffic
# Cross-DC replication for analytics
Consistency Level: LOCAL_QUORUM (within DC)
Cross-DC Sync: Eventual consistencyCDN and Edge Computing
Global CDN Architecture
┌─────────────────────────────────────────────────────────────┐
│ Global CDN Network │
├─────────────────┬─────────────────┬─────────────────────────┤
│ US Regions │ EU Regions │ APAC Regions │
│ │ │ │
│ • us-east-1 │ • eu-west-1 │ • ap-southeast-1 │
│ • us-west-2 │ • eu-central-1 │ • ap-northeast-1 │
│ • us-central-1 │ • eu-north-1 │ • ap-south-1 │
└─────────────────┴─────────────────┴─────────────────────────┘Edge Functions for Media Optimization
// CloudFlare Worker for WebRTC optimization
addEventListener('fetch', event => {
if (event.request.url.includes('/webrtc/offer')) {
event.respondWith(handleWebRTCOffer(event.request));
}
});
async function handleWebRTCOffer(request) {
const offer = await request.json();
// Optimize SDP for regional servers
const optimizedOffer = await optimizeSDPForRegion(
offer,
request.cf.country
);
// Route to nearest media server
const mediaServer = await findNearestMediaServer(
request.cf.colo
);
return fetch(mediaServer.url + '/offer', {
method: 'POST',
body: JSON.stringify(optimizedOffer)
});
}Auto-Scaling Policies
Meeting-Based Scaling
# Scale based on active meetings
Metrics:
- active_meetings_count
- average_participants_per_meeting
- cpu_utilization
- memory_utilization
Scaling Rules:
Scale Up When:
- active_meetings > 1000 AND cpu > 70%
- average_participants > 50 AND memory > 80%
- queue_depth > 100 (waiting to join)
Scale Down When:
- active_meetings < 500 AND cpu < 30%
- average_participants < 20 AND memory < 40%
- No scale-down during business hours (9 AM - 6 PM)Predictive Scaling
import pandas as pd
from sklearn.linear_model import LinearRegression
class MeetingDemandPredictor:
def __init__(self):
self.model = LinearRegression()
def predict_demand(self, timestamp):
# Features: hour, day_of_week, month, historical_avg
features = self.extract_features(timestamp)
predicted_meetings = self.model.predict([features])[0]
# Pre-scale infrastructure 15 minutes before predicted peak
required_capacity = self.calculate_capacity(predicted_meetings)
return required_capacity
def calculate_capacity(self, predicted_meetings):
# Each meeting server handles ~200 concurrent meetings
meeting_servers = math.ceil(predicted_meetings / 200)
# Each media server handles ~50 concurrent meetings
media_servers = math.ceil(predicted_meetings / 50)
return {
'meeting_servers': meeting_servers,
'media_servers': media_servers,
'scale_at': datetime.now() + timedelta(minutes=15)
}Network Optimization
TURN Server Scaling
# Global TURN Server Deployment
Regions: 20+ worldwide
Servers per Region: 5-10 (based on user density)
Capacity per Server: 10,000 concurrent relays
Total Capacity: 1M+ concurrent TURN sessions
# TURN Server Selection Algorithm
Selection Criteria:
1. Geographic proximity (lowest latency)
2. Server load (CPU/bandwidth utilization)
3. Network path quality (packet loss, jitter)
4. Cost optimization (prefer direct P2P when possible)Bandwidth Optimization
class AdaptiveBitrateController {
constructor() {
this.qualityLevels = [
{ name: '4K', width: 3840, height: 2160, bitrate: 8000000 },
{ name: '1080p', width: 1920, height: 1080, bitrate: 4000000 },
{ name: '720p', width: 1280, height: 720, bitrate: 2000000 },
{ name: '480p', width: 854, height: 480, bitrate: 1000000 },
{ name: '240p', width: 426, height: 240, bitrate: 500000 }
];
}
adaptQuality(networkStats) {
const { bandwidth, packetLoss, rtt } = networkStats;
// Downgrade if network conditions are poor
if (packetLoss > 5 || rtt > 200) {
return this.qualityLevels[4]; // 240p
}
// Select quality based on available bandwidth
for (const quality of this.qualityLevels) {
if (bandwidth * 0.8 >= quality.bitrate) { // 80% safety margin
return quality;
}
}
return this.qualityLevels[4]; // Fallback to lowest quality
}
}Storage Scaling
Recording Storage Architecture
# Tiered Storage Strategy
Hot Storage (S3 Standard):
- Duration: 30 days
- Use Case: Recent recordings, frequent access
- Cost: $0.023/GB/month
Warm Storage (S3 IA):
- Duration: 31-365 days
- Use Case: Occasional access
- Cost: $0.0125/GB/month
Cold Storage (S3 Glacier):
- Duration: 1-7 years
- Use Case: Compliance, archival
- Cost: $0.004/GB/month
Archive (S3 Deep Archive):
- Duration: 7+ years
- Use Case: Long-term retention
- Cost: $0.00099/GB/monthStorage Optimization
class RecordingStorageManager:
def __init__(self):
self.compression_ratios = {
'h264_high': 0.1, # 10:1 compression
'h264_medium': 0.15, # 6.7:1 compression
'h265_high': 0.05, # 20:1 compression
'av1_high': 0.03 # 33:1 compression
}
def optimize_storage(self, recording):
# Choose codec based on content type
if recording.content_type == 'screen_share':
codec = 'h265_high' # Better for screen content
elif recording.duration > 3600: # >1 hour
codec = 'av1_high' # Best compression for long recordings
else:
codec = 'h264_high' # Fast encoding for short recordings
compressed_size = recording.raw_size * self.compression_ratios[codec]
return {
'codec': codec,
'compressed_size': compressed_size,
'storage_tier': self.select_storage_tier(recording.access_pattern)
}Performance Monitoring
Real-time Metrics
# Key Performance Indicators
Connection Metrics:
- websocket_connections_active
- webrtc_peer_connections_active
- connection_establishment_time_p95
- connection_failure_rate
Media Quality Metrics:
- video_bitrate_avg
- audio_packet_loss_p95
- video_frame_rate_avg
- end_to_end_latency_p95
Server Performance:
- cpu_utilization_avg
- memory_utilization_avg
- network_bandwidth_utilization
- disk_io_utilization
Business Metrics:
- meetings_active_count
- participants_total_count
- meeting_join_success_rate
- user_satisfaction_scoreAlerting Thresholds
Critical Alerts (PagerDuty):
- connection_failure_rate > 5%
- meeting_join_success_rate < 95%
- end_to_end_latency_p95 > 500ms
- service_availability < 99.9%
Warning Alerts (Slack):
- cpu_utilization > 80%
- memory_utilization > 85%
- video_packet_loss_p95 > 3%
- storage_utilization > 90%
Capacity Alerts:
- websocket_connections > 80% of capacity
- media_servers_utilization > 75%
- database_connections > 70% of pool
- cdn_bandwidth > 80% of limitCost Optimization
Resource Right-Sizing
class CostOptimizer:
def analyze_usage_patterns(self, timeframe_days=30):
usage_data = self.get_usage_metrics(timeframe_days)
recommendations = []
# Identify over-provisioned servers
for server in usage_data['servers']:
if server['avg_cpu'] < 30 and server['avg_memory'] < 40:
recommendations.append({
'type': 'downsize',
'server_id': server['id'],
'current_type': server['instance_type'],
'recommended_type': self.get_smaller_instance(server['instance_type']),
'monthly_savings': server['monthly_cost'] * 0.5
})
# Identify underutilized regions
for region in usage_data['regions']:
if region['avg_utilization'] < 20:
recommendations.append({
'type': 'consolidate',
'region': region['name'],
'monthly_savings': region['monthly_cost'] * 0.8
})
return recommendationsReserved Instance Strategy
# 3-Year Reserved Instance Plan
Compute (EC2):
- 70% Reserved Instances (3-year term)
- 20% Spot Instances (non-critical workloads)
- 10% On-Demand (peak scaling)
Storage (S3):
- Intelligent Tiering enabled
- Lifecycle policies for automatic archival
- Cross-region replication optimization
Network:
- CloudFront reserved capacity
- Direct Connect for high-volume regions
- Data transfer optimization