Architecture

📖 6 min read 📄 Part 3 of 10

Video Conferencing System - Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Client Applications                       │
├─────────────┬─────────────┬─────────────┬─────────────────────┤
│ Web Browser │ Desktop App │ Mobile App  │ Room System         │
│ (WebRTC)    │ (Native)    │ (Native)    │ (Hardware)          │
└─────────────┴─────────────┴─────────────┴─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Load Balancer   │
                    │   (Global CDN)    │
                    └─────────┬─────────┘
                              │
┌─────────────────────────────┼─────────────────────────────┐
│                    Edge Layer                             │
├─────────────────────────────┼─────────────────────────────┤
│  ┌─────────────┐  ┌─────────┴─────────┐  ┌─────────────┐ │
│  │ STUN/TURN   │  │  Media Gateway    │  │ Edge Cache  │ │
│  │ Servers     │  │  (WebRTC Proxy)   │  │ (Static)    │ │
│  └─────────────┘  └─────────┬─────────┘  └─────────────┘ │
└─────────────────────────────┼─────────────────────────────┘
                              │
┌─────────────────────────────┼─────────────────────────────┐
│                   Core Services Layer                     │
├─────────────────────────────┼─────────────────────────────┤
│  ┌─────────────┐  ┌─────────┴─────────┐  ┌─────────────┐ │
│  │ Signaling   │  │  Media Servers    │  │ Recording   │ │
│  │ Service     │  │  (SFU/MCU)        │  │ Service     │ │
│  │             │  │                   │  │             │ │
│  └─────────────┘  └─────────┬─────────┘  └─────────────┘ │
│                              │                           │
│  ┌─────────────┐  ┌─────────┴─────────┐  ┌─────────────┐ │
│  │ Meeting     │  │  Chat Service     │  │ User        │ │
│  │ Service     │  │                   │  │ Service     │ │
│  └─────────────┘  └───────────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────┼─────────────────────────────┐
│                    Data Layer                             │
├─────────────────────────────┼─────────────────────────────┤
│  ┌─────────────┐  ┌─────────┴─────────┐  ┌─────────────┐ │
│  │ PostgreSQL  │  │     Redis         │  │ Cassandra   │ │
│  │ (Metadata)  │  │   (Sessions)      │  │ (Analytics) │ │
│  └─────────────┘  └───────────────────┘  └─────────────┘ │
│                                                          │
│  ┌─────────────┐  ┌───────────────────┐  ┌─────────────┐ │
│  │ Object      │  │    Elasticsearch  │  │ Time Series │ │
│  │ Storage     │  │    (Search/Logs)  │  │ DB (Metrics)│ │
│  │ (S3)        │  │                   │  │ (InfluxDB)  │ │
│  └─────────────┘  └───────────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────┘

Core Components

1. Signaling Service

Purpose: WebRTC signaling, session management, and call orchestration

Responsibilities:

  • WebRTC offer/answer exchange
  • ICE candidate negotiation
  • Session state management
  • Participant join/leave coordination
  • Meeting room management

Technology Stack:

  • Node.js with Socket.IO for WebSocket connections
  • Redis for session state storage
  • PostgreSQL for persistent meeting data
  • Load balancing with sticky sessions

2. Media Servers (SFU Architecture)

Purpose: Selective Forwarding Unit for efficient media routing

SFU Benefits:

  • Lower server CPU usage vs MCU
  • Better quality preservation
  • Scalable to large meetings
  • Client-side mixing flexibility

Media Processing:

Participant A ──┐
                ├──► SFU ──┐──► Participant B
Participant C ──┘         ├──► Participant D
                          └──► Participant E

Features:

  • Adaptive bitrate streaming
  • Simulcast support (multiple quality streams)
  • Bandwidth optimization
  • Packet loss recovery (NACK, FEC)

3. Recording Service

Purpose: Meeting recording and playback functionality

Recording Pipeline:

Media StreamsCompositorEncoderStoragePlayback API

Components:

  • Media Compositor: Combines audio/video streams
  • Encoder: H.264/VP8 video, Opus audio encoding
  • Storage: Distributed file system (S3/GCS)
  • Transcoding: Multiple format generation
  • Metadata: Recording index and search

4. Chat Service

Purpose: Real-time messaging during meetings

Architecture:

  • WebSocket connections for real-time delivery
  • Message persistence in Cassandra
  • Redis pub/sub for message routing
  • Rich media support (files, images, reactions)

5. User Service

Purpose: Authentication, authorization, and user management

Features:

  • OAuth 2.0 / SAML integration
  • JWT token management
  • User profile and preferences
  • Meeting permissions and roles
  • Calendar integration

Media Architecture

WebRTC Media Flow

┌─────────────┐    Signaling     ┌─────────────┐
│  Client A   │◄────────────────►│  Client B   │
└─────────────┘                  └─────────────┘
       │                                │
       │ Media (P2P or via SFU)        │
       └────────────────────────────────┘

SFU Media Routing

┌─────────────┐                  ┌─────────────┐
│  Client A   │                  │  Client B   │
│             │                  │             │
│ ┌─────────┐ │    Upload        │ ┌─────────┐ │
│ │ Camera  │ ├──────────┐       │ │ Display │ │
│ └─────────┘ │          │       │ └─────────┘ │
│ ┌─────────┐ │          ▼       │ ┌─────────┐ │
│ │   Mic   │ │     ┌─────────┐  │ │ Speaker │ │
│ └─────────┘ │     │   SFU   │  │ └─────────┘ │
└─────────────┘     │ Server  │  └─────────────┘
                    └─────────┘
                         │
                    Download
                         ▼
                  ┌─────────────┐
                  │  Client C   │
                  │             │
                  │ ┌─────────┐ │
                  │ │ Display │ │
                  │ └─────────┘ │
                  │ ┌─────────┐ │
                  │ │ Speaker │ │
                  │ └─────────┘ │
                  └─────────────┘

Adaptive Bitrate Streaming

Client Bandwidth Assessment
           │
           ▼
┌─────────────────────┐
│ Quality Selection   │
│ - 4K (8 Mbps)      │
│ - 1080p (4 Mbps)   │
│ - 720p (2 Mbps)    │
│ - 480p (1 Mbps)    │
│ - 240p (500 kbps)  │
└─────────────────────┘
           │
           ▼
    Stream Delivery

Scaling Architecture

Horizontal Scaling

┌─────────────────────────────────────────────────────────┐
│                    Load Balancer                        │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│   Region    │ │   Region    │ │   Region    │
│   US-East   │ │   EU-West   │ │  Asia-Pac   │
└─────────────┘ └─────────────┘ └─────────────┘
        │             │             │
        ▼             ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Media       │ │ Media       │ │ Media       │
│ Servers     │ │ Servers     │ │ Servers     │
│ (Auto-scale)│ │ (Auto-scale)│ │ (Auto-scale)│
└─────────────┘ └─────────────┘ └─────────────┘

Auto-Scaling Strategy

  • CPU-based: Scale when CPU > 70%
  • Connection-based: Scale when connections > 1000 per server
  • Latency-based: Scale when P95 latency > 100ms
  • Predictive: Scale before peak hours
  • Geographic: Scale based on regional demand

Security Architecture

End-to-End Encryption

┌─────────────┐    Encrypted     ┌─────────────┐
│  Client A   │    Media Stream  │  Client B   │
│             │◄────────────────►│             │
│ Private Key │                  │ Private Key │
└─────────────┘                  └─────────────┘
       │                                │
       └────── Key Exchange ────────────┘
              (via Signaling Server)

Security Layers

  1. Transport Security: TLS 1.3 for all connections
  2. Media Encryption: SRTP with AES-256
  3. Signaling Encryption: WSS (WebSocket Secure)
  4. Authentication: OAuth 2.0 / JWT tokens
  5. Authorization: Role-based access control

Monitoring and Observability

Metrics Collection

┌─────────────┐    Metrics     ┌─────────────┐
│ Application │──────────────►│  Prometheus │
│  Services   │               │   Server    │
└─────────────┘               └─────────────┘
                                     │
                              ┌─────────────┐
                              │  Grafana    │
                              │ Dashboard   │
                              └─────────────┘

Key Metrics

  • Connection Success Rate: % of successful meeting joins
  • Audio/Video Quality: Packet loss, jitter, latency
  • Server Performance: CPU, memory, network utilization
  • User Experience: Join time, call duration, drop rate
  • Business Metrics: Active users, meeting minutes, revenue

Alerting System

  • Critical: Service outages, high error rates
  • Warning: Performance degradation, capacity limits
  • Info: Deployment notifications, maintenance windows
  • Escalation: PagerDuty integration for 24/7 support

Disaster Recovery

Multi-Region Deployment

  • Active-Active: Multiple regions serving traffic
  • Failover: Automatic traffic routing during outages
  • Data Replication: Cross-region database replication
  • Backup: Regular backups with point-in-time recovery
  • Testing: Regular disaster recovery drills

Recovery Time Objectives

  • RTO: 5 minutes for service restoration
  • RPO: 1 minute for data loss tolerance
  • Availability: 99.99% uptime SLA
  • Monitoring: Real-time health checks and alerts