🌐 Networking

Networking Protocols

πŸ“– 13 min read 🧠 Complete Guide

Networking Protocols: Complete Guide for System Design

Overview

Every system design interview involves services communicating over a network. Understanding protocols at a deep level lets you make informed decisions about latency, reliability, and scalability.


1. TCP vs UDP

TCP (Transmission Control Protocol)

How it works: Connection-oriented, reliable, ordered delivery.

Three-Way Handshake

Client                    Server
  │                         │
  │──── SYN (seq=x) ───────►│   1. Client initiates
  │                         │
  │◄─── SYN-ACK ────────────│   2. Server acknowledges + initiates
  │     (seq=y, ack=x+1)    │
  │                         │
  │──── ACK (ack=y+1) ─────►│   3. Client confirms
  │                         │
  │     Connection Open     │

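Cost: 1 RTT before the client can send data (the first request can follow the final ACK immediately); the full exchange takes 1.5 RTT to reach the server. With TLS, add another 1-2 RTT (1 for TLS 1.3, 2 for TLS 1.2).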

Flow Control (Sliding Window)

  • Receiver advertises a window size (how much data it can buffer)
  • Sender never sends more than the window allows
  • Window shrinks as data arrives, grows as application reads it
  • Prevents fast sender from overwhelming slow receiver
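
The window behaviour described above can be sketched with a toy simulation. This is only an illustration under simplifying assumptions: time advances in discrete ticks, there is no loss or delay, and the function name and parameters (`simulate_flow_control`, `send_per_tick`, `read_per_tick`) are hypothetical rather than part of any real TCP API.

```python
def simulate_flow_control(total_bytes, buffer_size, send_per_tick, read_per_tick):
    """Return the window the receiver advertises at the start of each tick."""
    if buffer_size <= 0 or read_per_tick <= 0:
        raise ValueError("buffer_size and read_per_tick must be positive")

    buffered = 0      # bytes waiting in the receive buffer
    sent = 0          # bytes the sender has transmitted so far
    advertised = []   # window size seen by the sender, one entry per tick

    while sent < total_bytes or buffered > 0:
        # The advertised window is the free space left in the receive buffer.
        window = buffer_size - buffered
        advertised.append(window)

        # The sender never sends more than the window allows.
        chunk = min(send_per_tick, window, total_bytes - sent)
        sent += chunk
        buffered += chunk

        # The application drains the buffer, which reopens the window.
        buffered -= min(read_per_tick, buffered)

    return advertised


windows = simulate_flow_control(total_bytes=10, buffer_size=4,
                                send_per_tick=3, read_per_tick=1)
print(windows)
```

With a sender that is faster than the reading application, the advertised window quickly shrinks to what the application frees per tick, and only grows again once the sender has nothing left to send.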

Congestion Control

| Algorithm | Behavior |
| --- | --- |
| Slow Start | Exponential growth until threshold |
| Congestion Avoidance | Linear growth after threshold |
| Fast Retransmit | Retransmit after 3 duplicate ACKs |
| Fast Recovery | Don't reset to slow start on fast retransmit |
| BBR (Google) | Model-based, measures bandwidth and RTT |

Key insight for interviews: TCP slow start means new connections are slow. This is why connection pooling and HTTP/2 multiplexing matter.
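
To make the cost of slow start concrete, here is a rough sketch that counts how many round trips a fresh connection needs to deliver a payload. It is a simplified model, not a real TCP implementation: it assumes no loss, an initial window of 10 segments, an MSS of 1460 bytes, and a slow-start threshold of 64 segments. All three values, and the function name `rtts_to_send`, are assumptions chosen for illustration.

```python
import math


def rtts_to_send(total_bytes, mss=1460, init_cwnd=10, ssthresh=64):
    """Count round trips needed to send total_bytes on a new connection."""
    remaining = math.ceil(total_bytes / mss)  # segments still to send
    cwnd = init_cwnd                          # congestion window, in segments
    rtts = 0
    while remaining > 0:
        remaining -= min(cwnd, remaining)     # send one window per round trip
        rtts += 1
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # slow start: exponential growth
        else:
            cwnd += 1                         # congestion avoidance: linear growth
    return rtts


for size in (14_600, 100_000, 1_000_000):
    print(size, "bytes ->", rtts_to_send(size), "RTTs")
```

A pooled or multiplexed connection has already grown its window, so later requests skip most of these round trips.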

Connection Teardown (Four-Way)

Client                    Server
  │──── FIN ───────────────►│
  │◄─── ACK ────────────────│
  │◄─── FIN ────────────────│
  │──── ACK ───────────────►│
  │                         │
  │  TIME_WAIT (2×MSL)      │   Client waits ~60s before port reuse

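TIME_WAIT problem: The side that closes first holds the socket in TIME_WAIT, so a client or proxy that opens many short-lived outbound connections can exhaust its ephemeral ports. Solutions: connection pooling or long-lived connections (fewer closes), a wider ephemeral port range, or, on Linux, net.ipv4.tcp_tw_reuse for outbound connections. SO_REUSEADDR addresses a related problem: it lets a restarted server bind its listening port while old connections are still in TIME_WAIT.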
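
As a minimal sketch of the SO_REUSEADDR option mentioned above, the following uses Python's standard `socket` module. The helper name `make_listener` is hypothetical, and binding to port 0 (let the OS pick a free port) is only there to keep the example runnable anywhere.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket that can rebind despite TIME_WAIT sockets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); otherwise a restart may fail with
    # "Address already in use" while old connections sit in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


listener = make_listener()
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
bound_port = listener.getsockname()[1]
listener.close()
```

Note that this helps the listening side restart cleanly; it does not by itself free ephemeral ports on a client that closes many outbound connections.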

UDP (User Datagram Protocol)

How it works: Connectionless, unreliable, unordered. Just sends packets.

  • No handshake (0 RTT to start sending)
  • No guaranteed delivery or ordering
  • No flow/congestion control (application must handle)
  • Smaller header (8 bytes vs TCP's 20+ bytes)
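
The "no handshake" property is easy to see in code. The sketch below sends one datagram over the loopback interface using Python's standard `socket` module; the function name `udp_round_trip` is made up for this example, and loopback is used only so it runs without a network.

```python
import socket


def udp_round_trip(payload: bytes, timeout: float = 1.0) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose
        receiver.settimeout(timeout)      # UDP gives no delivery guarantee

        # No connect(), no handshake: the first packet already carries data.
        sender.sendto(payload, receiver.getsockname())

        data, _addr = receiver.recvfrom(2048)
        return data


echoed = udp_round_trip(b"ping")
```

On a real network the `recvfrom` call could time out, because a lost datagram is never retransmitted unless the application does it.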

When to Use Which

| Use Case | Protocol | Why |
| --- | --- | --- |
| Web APIs | TCP | Need reliability, ordering |
| Live video streaming | UDP | Tolerate loss, need low latency |
| Gaming | UDP | Real-time, stale data useless |
| DNS queries | UDP | Small payload, speed matters |
| File transfer | TCP | Must have complete, ordered data |
| VoIP | UDP | Real-time, retransmission too slow |
| IoT telemetry | UDP | Lightweight, high volume |

Interview tip: "We'd use TCP here because we need guaranteed delivery of financial transactions" or "UDP for live video because a retransmitted frame arrives too late to be useful."


2. HTTP/1.1 vs HTTP/2 vs HTTP/3

HTTP/1.1 (1997)

Key characteristics:

  • Text-based protocol
  • One request per TCP connection at a time (head-of-line blocking)
  • Workaround: browsers open 6-8 parallel connections per domain
  • Keep-Alive reuses connections but still sequential

Head-of-Line (HOL) Blocking:

Connection 1: [Request A]────────[Response A]────[Request C]────[Response C]
Connection 2: [Request B]──[Response B]──────────[idle]─────────────────────

If Response A is slow, Request C waits even though the server could serve it.

HTTP/2 (2015)

Key improvements:

  • Binary framing layer (more efficient parsing)
  • Multiplexing: Multiple streams over single TCP connection
  • Header compression (HPACK): Reduces redundant headers by 85-90%
  • Server push: Server sends resources before client requests them (rarely used in practice; Chrome has since removed support)
  • Stream prioritization: Client hints which resources matter most

Single TCP Connection:
┌─────────────────────────────────────────────┐
│ Stream 1: [Headers][Data][Data]             │
│ Stream 2: [Headers][Data]                   │
│ Stream 3: [Headers][Data][Data][Data]       │
│ (interleaved frames on the wire)            │
└─────────────────────────────────────────────┘

Remaining problem: TCP-level HOL blocking. If a TCP packet is lost, ALL streams wait for retransmission, even unaffected ones.

Performance numbers (commonly cited; actual gains depend heavily on the workload):

  • 50-70% reduction in page load time for asset-heavy pages
  • Single connection vs 6-8 connections reduces server memory
  • Header compression saves 10-30KB per page load

HTTP/3 (2022): QUIC

Key innovation: Runs over UDP instead of TCP and implements its own reliability, ordering, and congestion control in QUIC.

┌───────────────────────────────────────┐
│           HTTP/3 (Application)        │
├───────────────────────────────────────┤
│           QUIC (Transport)            │
│  • Stream multiplexing                │
│  • Per-stream flow control            │
│  • Connection migration               │
│  • 0-RTT connection establishment     │
├───────────────────────────────────────┤
│           UDP (Datagram layer)        │
└───────────────────────────────────────┘

Advantages over HTTP/2:

  • No HOL blocking: Lost packet only affects its stream
  • 0-RTT resumption: Returning clients send data immediately
  • Connection migration: Survives IP changes (WiFi → cellular)
  • Built-in encryption: TLS 1.3 integrated into handshake

Connection establishment comparison:

HTTP/1.1 + TLS 1.2:  3 RTT (TCP 1 + TLS 2), then the request
HTTP/2 + TLS 1.3:    2 RTT (TCP 1 + TLS 1), then the request
HTTP/3 (new):        1 RTT (QUIC handshake includes crypto)
HTTP/3 (resumption): 0 RTT (send data with first packet)
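
These round-trip counts translate directly into user-visible latency. The sketch below assumes the figures above are the round trips spent before the first request can be sent (TCP 1 RTT, TLS 1.2 2 RTT, TLS 1.3 1 RTT), adds one more round trip for the request and response, and ignores server processing time. The names `SETUP_RTTS` and `time_to_first_response_ms` are hypothetical.

```python
# Round trips spent on connection setup before the first request can be sent.
SETUP_RTTS = {
    "HTTP/1.1 + TLS 1.2": 3,
    "HTTP/2 + TLS 1.3": 2,
    "HTTP/3 (new)": 1,
    "HTTP/3 (resumption)": 0,
}


def time_to_first_response_ms(stack: str, rtt_ms: float) -> float:
    """Setup round trips plus one round trip for the request itself."""
    return (SETUP_RTTS[stack] + 1) * rtt_ms


for stack in SETUP_RTTS:
    print(f"{stack:22s} {time_to_first_response_ms(stack, rtt_ms=100):.0f} ms")
```

On a 100 ms mobile link the gap between the first and last row is 300 ms, which is why setup cost tends to dominate short requests.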

Comparison Table

| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
| --- | --- | --- | --- |
| Transport | TCP | TCP | QUIC (UDP) |
| Multiplexing | No | Yes | Yes |
| HOL Blocking | Application + TCP | TCP only | None |
| Header Compression | None | HPACK | QPACK |
| Connection Setup | 2-3 RTT | 2 RTT | 0-1 RTT |
| Connection Migration | No | No | Yes |
| Encryption | Optional | Effectively required | Mandatory |

3. WebSocket vs SSE vs Long Polling

Long Polling

Client                         Server
  │── GET /updates ───────────►│
  │                            │  (holds connection open)
  │                            │  ... waits for data ...
  │◄── 200 OK + data ──────────│  (responds when data available)
  │                            │
  │── GET /updates ───────────►│  (immediately reconnects)
  │                            │  ... waits again ...

Characteristics:

  • Compatible with all infrastructure (proxies, load balancers)
  • Each response requires a new HTTP request
  • Timeout handling needed (30-60s typical)
  • Server holds connections open (resource intensive)
  • Adds roughly one extra round trip of latency per message (~100ms is typical), because the client must re-issue the request after every response

Server-Sent Events (SSE)

Client                         Server
  │── GET /stream ────────────►│
  │   Accept: text/event-stream│
  │                            │
  │◄── HTTP 200 ───────────────│
  │    Content-Type:           │
  │    text/event-stream       │
  │                            │
  │◄── data: message 1\n\n ────│  (server pushes)
  │◄── data: message 2\n\n ────│  (server pushes)
  │◄── data: message 3\n\n ────│  (server pushes)
  │         ...                │

Characteristics:

  • Unidirectional (server → client only)
  • Built-in reconnection: the browser reconnects automatically and sends Last-Event-ID
  • Text-based (UTF-8 only)
  • Works over standard HTTP (proxy-friendly)
  • Limited to ~6 connections per domain in HTTP/1.1
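
The wire format is simple enough to parse by hand, which is part of SSE's appeal. The following is a simplified sketch of a parser for the `text/event-stream` format: it handles only the `data` and `id` fields and `\n` line endings, and the function name `parse_sse` is hypothetical. A browser's built-in EventSource does this for you.

```python
def parse_sse(stream_text: str) -> list[dict]:
    """Parse a text/event-stream body into a list of events."""
    events = []
    data_lines = []
    last_event_id = None   # persists across events, like Last-Event-ID

    for line in stream_text.split("\n"):
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                events.append({"id": last_event_id, "data": "\n".join(data_lines)})
                data_lines = []
            continue
        if line.startswith(":"):
            continue   # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]   # a single leading space is not part of the value
        if field == "data":
            data_lines.append(value)
        elif field == "id":
            last_event_id = value

    return events


parsed = parse_sse("id: 1\ndata: message 1\n\ndata: message 2\n\n")
print(parsed)
```

On reconnect, the client would send the most recent `id` back in the Last-Event-ID header so the server can resume from that point.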

WebSocket

Client                         Server
  │── HTTP Upgrade Request ───►│
  │   Upgrade: websocket       │
  │   Connection: Upgrade      │
  │                            │
  │◄── 101 Switching ──────────│
  │    Protocols               │
  │                            │
  │◄──► Full-duplex binary ◄──►│  (bidirectional frames)
  │◄──► communication      ◄──►│

Characteristics:

  • Full-duplex (both directions simultaneously)
  • Binary and text frames
  • Low overhead per message (2-14 bytes framing vs HTTP headers)
  • Persistent connection
  • Requires WebSocket-aware load balancers
  • No built-in reconnection (application must handle)
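
The 2-14 byte framing figure follows from the WebSocket frame layout: a 2-byte base header, an optional 2- or 8-byte extended length, and a 4-byte masking key on client-to-server frames. A small sketch of that arithmetic (the function name `frame_header_size` is made up for this example):

```python
def frame_header_size(payload_len: int, masked: bool) -> int:
    """Return the WebSocket frame header size in bytes for one frame."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")

    size = 2                      # FIN/opcode byte + mask bit/7-bit length byte
    if payload_len >= 65536:
        size += 8                 # 64-bit extended payload length
    elif payload_len >= 126:
        size += 2                 # 16-bit extended payload length
    if masked:
        size += 4                 # masking key (client-to-server frames)
    return size


smallest = frame_header_size(5, masked=False)
largest = frame_header_size(1_000_000, masked=True)
print(smallest, largest)
```

Compare that with the several hundred bytes of headers a typical HTTP request carries, and the per-message saving for chatty protocols is clear.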

Decision Framework

| Requirement | Best Choice | Why |
| --- | --- | --- |
| Real-time chat | WebSocket | Bidirectional, low latency |
| Live sports scores | SSE | Server-push only, auto-reconnect |
| Stock ticker | WebSocket | High frequency, bidirectional |
| News feed updates | SSE | Infrequent server pushes |
| Collaborative editing | WebSocket | Bidirectional, binary data |
| Simple notifications | SSE | One-way, simple implementation |
| Legacy system support | Long Polling | Works everywhere |
| IoT device commands | WebSocket | Bidirectional, persistent |

Interview tip: "For a notification system, SSE is simpler and sufficient since we only push from server to client. WebSocket adds complexity we don't need."


4. DNS Resolution Flow

Complete Resolution Process

User types "www.example.com"
         │
         ▼
┌─────────────────┐     Cache hit?
│  Browser Cache  │────────────────► Done (TTL-based)
└────────┬────────┘
         │ Cache miss
         ▼
┌─────────────────┐     Cache hit?
│    OS Cache     │────────────────► Done
│  (stub resolver)│
└────────┬────────┘
         │ Cache miss
         ▼
┌─────────────────┐     Cache hit?
│ Recursive DNS   │────────────────► Done
│ (ISP/8.8.8.8)   │
└────────┬────────┘
         │ Cache miss (iterative queries begin)
         ▼
┌─────────────────┐
│  Root Server    │──► "Ask .com TLD server at 192.5.6.30"
│  (13 clusters)  │
└────────┬────────┘
         ▼
┌─────────────────┐
│  TLD Server     │──► "Ask example.com NS at 205.251.192.1"
│  (.com, .org)   │
└────────┬────────┘
         ▼
┌──────────────────────┐
│ Authoritative Server │──► "www.example.com = 93.184.216.34"
│ (example.com)        │
└──────────────────────┘

Record Types

| Type | Purpose | Example |
| --- | --- | --- |
| A | IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 address | example.com → 2606:2800:220:1:... |
| CNAME | Alias to another name | www → example.com |
| MX | Mail server | example.com → mail.example.com |
| NS | Nameserver delegation | example.com → ns1.example.com |
| TXT | Arbitrary text | SPF, DKIM, domain verification |
| SRV | Service location | _http._tcp.example.com |

TTL and Caching Strategy

  • Short TTL (60-300s): Enables fast failover, more DNS traffic
  • Long TTL (3600-86400s): Reduces DNS load, slower failover
  • TTL=0: Requests no caching (sometimes used during migrations); some resolvers enforce a minimum TTL, so it is not guaranteed to be honored
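
The trade-off above comes down to how long a resolver keeps serving a cached answer. Here is a minimal sketch of a TTL-based cache; the class name `TtlCache` and its injectable `clock` parameter are assumptions made for this example (the injected clock just makes expiry easy to demonstrate without sleeping).

```python
import time


class TtlCache:
    """Tiny DNS-style cache: answers expire ttl_seconds after insertion."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # name -> (address, expires_at)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None                    # cache miss: must query upstream
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]        # expired: treat as a miss
            return None
        return address


# Demonstration with a fake clock so no real waiting is needed.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "93.184.216.34", ttl_seconds=60)

fresh = cache.get("www.example.com")
now[0] = 60.0
stale = cache.get("www.example.com")
```

With a 60-second TTL, a failover can take up to a minute to reach clients holding the old answer; with 86400 seconds it can take a day.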

System design implications:

  • DNS-based load balancing uses short TTLs for health-check responsiveness
  • CDN providers often use short TTLs (around 60s) for quick origin switching
  • During migrations, lower TTL days in advance, then switch

DNS in System Design

  • Global load balancing: GeoDNS routes users to nearest datacenter
  • Service discovery: Internal DNS for microservice endpoints
  • Failover: Health-checked DNS removes unhealthy endpoints
  • Blue-green deployments: Switch DNS to new environment

5. TLS Handshake

TLS 1.2 Handshake (2 RTT)

Client                              Server
  │                                   │
  │── ClientHello ───────────────────►│  Supported ciphers, random
  │                                   │
  │◄── ServerHello ───────────────────│  Chosen cipher, random
  │◄── Certificate ───────────────────│  Server's X.509 cert
  │◄── ServerKeyExchange ─────────────│  DH parameters
  │◄── ServerHelloDone ───────────────│
  │                                   │
  │── ClientKeyExchange ─────────────►│  Client's DH public key
  │── ChangeCipherSpec ──────────────►│  "Switching to encrypted"
  │── Finished ──────────────────────►│  Encrypted verification
  │                                   │
  │◄── ChangeCipherSpec ──────────────│
  │◄── Finished ──────────────────────│
  │                                   │
  │◄══► Encrypted Application Data ◄═►│

TLS 1.3 Handshake (1 RTT)

Client                              Server
  │                                   │
  │── ClientHello ───────────────────►│
  │    + supported_versions           │
  │    + key_share                    │  Client's DH public key
  │    + signature_algorithms         │
  │                                   │
  │◄── ServerHello ───────────────────│  + key_share (server's DH public key)
  │◄── EncryptedExtensions ───────────│  (encrypted from here)
  │◄── Certificate ───────────────────│
  │◄── CertificateVerify ─────────────│
  │◄── Finished ──────────────────────│
  │                                   │
  │── Finished ──────────────────────►│
  │                                   │
  │◄══► Encrypted Application Data ◄═►│

Key improvements in TLS 1.3:

  • 1 RTT handshake (vs 2 RTT in 1.2)
  • 0-RTT resumption (send data with first message)
  • Removed insecure algorithms (RSA key exchange, CBC, RC4, SHA-1)
  • Forward secrecy mandatory (ephemeral DH only)
  • Encrypted more of the handshake (hides certificate from observers)

Certificate Chain Verification

┌──────────────────────┐
│   Root CA (trusted)  │  Pre-installed in OS/browser
│   Self-signed        │  ~150 root CAs trusted globally
└──────────┬───────────┘
           │ signs
           ▼
┌──────────────────────┐
│  Intermediate CA     │  Issued by Root CA
│                      │  Used for day-to-day signing
└──────────┬───────────┘
           │ signs
           ▼
┌──────────────────────┐
│  Server Certificate  │  Your domain's certificate
│  (leaf cert)         │  Contains public key + domain name
└──────────────────────┘

ALPN (Application-Layer Protocol Negotiation)

  • Negotiates application protocol during TLS handshake
  • Client sends list: ["h2", "http/1.1"]
  • Server picks: "h2"
  • Avoids extra round trip for protocol upgrade
  • Essential for HTTP/2 and HTTP/3 negotiation
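
On the client side, both the TLS version floor and the ALPN offer are a few lines of configuration. The sketch below uses Python's standard `ssl` module; the helper names `make_client_context` and `negotiated_protocol` are hypothetical, and the second function needs network access, so it is defined but not called here.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client context that requires TLS 1.3 and offers h2, then http/1.1."""
    ctx = ssl.create_default_context()            # verifies the certificate chain
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse TLS 1.2 and older
    ctx.set_alpn_protocols(["h2", "http/1.1"])    # client's ALPN list, in preference order
    return ctx


def negotiated_protocol(host: str, port: int = 443):
    """Connect and report (TLS version, ALPN protocol chosen by the server)."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.selected_alpn_protocol()


context = make_client_context()
```

Calling `negotiated_protocol` against a host that supports HTTP/2 would be expected to report "h2" as the selected protocol, chosen during the handshake with no extra round trip.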

6. gRPC and Protocol Buffers

Protocol Buffers (Protobuf)

Schema definition:

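syntax = "proto3";

// Required for google.protobuf.Timestamp below.
import "google/protobuf/timestamp.proto";

message User {
  string id = 1;           // field number, not value
  string name = 2;
  string email = 3;
  repeated string roles = 4;
  google.protobuf.Timestamp created_at = 5;
}

message GetUserRequest {
  string user_id = 1;
}

// Referenced by ListUsers; left empty here as a placeholder.
message ListUsersRequest {}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);  // server streaming
  rpc CreateUser(User) returns (User);
}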

Binary encoding advantages:

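  • Often 3-10x smaller than JSON (field names are replaced by small numeric tags)
  • Considerably faster serialization/deserialization (the exact factor depends on language and payload shape)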
  • Schema evolution with backward/forward compatibility
  • Strongly typed (catches errors at compile time)
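
The size advantage comes from the wire format: each field is a small numeric tag followed by a compact value, with integers stored as varints. The sketch below hand-encodes two fields to show the idea. It is an illustration only, with hypothetical helper names; real code should use the generated classes from the protobuf compiler.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field: tag, then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = encode_varint((field_number << 3) | 2)   # wire type 2 = length-delimited
    return tag + encode_varint(len(payload)) + payload


encoded_number = encode_varint(150)
encoded_name = encode_string_field(2, "testing")
print(encoded_number.hex(), encoded_name.hex())
```

The field name never appears on the wire, only its number, which is why renaming a field is safe but renumbering it is not.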

gRPC Communication Patterns

┌───────────────────────────────────────────────────────┐
│                      gRPC Modes                       │
├───────────────────────────────────────────────────────┤
│                                                       │
│  1. Unary RPC (request-response)                      │
│     Client ──[Request]──► Server ──[Response]──►      │
│                                                       │
│  2. Server Streaming                                  │
│     Client ──[Request]──► Server ──[R1][R2][R3]──►    │
│                                                       │
│  3. Client Streaming                                  │
│     Client ──[R1][R2][R3]──► Server ──[Response]──►   │
│                                                       │
│  4. Bidirectional Streaming                           │
│     Client ◄──[messages]──► Server                    │
│                                                       │
└───────────────────────────────────────────────────────┘

gRPC vs REST Comparison

| Aspect | gRPC | REST |
| --- | --- | --- |
| Protocol | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Payload | Protobuf (binary) | JSON (text) |
| Contract | .proto file (strict) | OpenAPI (optional) |
| Streaming | Native (4 modes) | Limited (SSE, WebSocket) |
| Browser support | Via grpc-web proxy | Native |
| Code generation | Built-in | Third-party tools |
| Latency | Lower (binary + HTTP/2) | Higher (text + overhead) |
| Debugging | Harder (binary) | Easier (human-readable) |
| Load balancing | L7 required (HTTP/2) | L4 or L7 |

When to Use gRPC

Use gRPC for:

  • Internal microservice communication (performance critical)
  • Polyglot environments (code gen for 10+ languages)
  • Streaming data (real-time feeds, event streams)
  • Mobile clients with bandwidth constraints

Use REST for:

  • Public APIs (browser compatibility, developer familiarity)
  • Simple CRUD operations
  • When human readability matters for debugging
  • Third-party integrations

gRPC Performance Characteristics

Benchmark (illustrative, order-of-magnitude figures): 1000 requests, 1KB payload

REST/JSON:
  Serialization:   ~500μs
  Payload size:    ~1,200 bytes
  Total latency:   ~2ms

gRPC/Protobuf:
  Serialization:   ~50μs
  Payload size:    ~400 bytes
  Total latency:   ~0.5ms

Interview tip: "For service-to-service communication within our backend, gRPC gives us type safety, streaming, and 3-5x better performance. For our public API, REST is more accessible to third-party developers."


Quick Reference: Protocol Selection

Need reliable delivery?
├── Yes → TCP-based
│   ├── Request-response? → HTTP (REST or gRPC)
│   ├── Server push only? → SSE
│   ├── Bidirectional real-time? → WebSocket
│   └── High-performance internal? → gRPC
└── No → UDP-based
    ├── Real-time media? → RTP/WebRTC
    ├── Fast queries? → DNS, QUIC
    └── IoT/lightweight? → MQTT-SN, CoAP (standard MQTT runs over TCP)

Interview Cheat Sheet

| When interviewer asks... | Key points to mention |
| --- | --- |
| "How do services communicate?" | gRPC for internal, REST for external, async via message queues |
| "How to handle real-time updates?" | WebSocket for bidirectional, SSE for server-push, consider scale |
| "Why is the first request slow?" | DNS resolution + TCP handshake + TLS handshake = cold start |
| "How to reduce latency?" | Connection pooling, HTTP/2 multiplexing, 0-RTT with TLS 1.3 |
| "How does HTTPS work?" | TLS handshake, certificate verification, key exchange to derive symmetric session keys |
| "HTTP/2 vs HTTP/3?" | HTTP/3 eliminates TCP HOL blocking, enables connection migration |