Security & Privacy

Part 9 of 10

Load Balancer - Security and Privacy

DDoS Protection

SYN Flood Protection

Attack: Attacker sends massive volume of TCP SYN packets without completing handshake
Impact: Exhausts connection table memory, prevents legitimate connections
Scale: Modern attacks reach 100M+ SYN packets/second

Defense layers:

1. SYN Cookies (Kernel-level):
   - Don't allocate state until handshake completes
   - Encode connection info in initial sequence number
   - Verify on ACK receipt (stateless until connection established)
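   - Capacity: No per-SYN memory, so the connection table cannot be exhausted (CPU and bandwidth remain the limits)
   - Trade-off: Only a few MSS values can be encoded; window scaling and SACK are lost for the connection unless TCP timestamps carry them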

2. SYN Proxy (LB-level):
   - LB completes TCP handshake with client
   - Only forwards to backend after valid ACK received
   - Absorbs SYN flood without backend impact
   - Memory: Small per-SYN state (64 bytes vs 512 bytes full connection)

3. Rate Limiting SYNs per source IP:
   - Threshold: 100 SYNs/second per IP (configurable)
   - Action: DROP excess SYNs, log source IP
   - Whitelist: Known good IPs exempt from limit

4. XDP/eBPF SYN validation:
   - Process SYN packets at NIC driver level
   - Drop invalid SYNs before kernel processing
   - Capacity: 10M+ packets/sec per core
   - Used by: Cloudflare, Facebook (Katran)
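The SYN-cookie idea in defense layer 1 can be sketched in Python. This is a simplified illustration, not the Linux kernel's encoding: the secret, the four-entry MSS table, the 64-second counter, and the truncated HMAC-SHA256 are assumptions chosen for readability, and a real implementation also mixes in the client's initial sequence number.

```python
import hashlib
import hmac
import time

# Assumptions for this sketch (not kernel values): per-host secret, MSS table, 64 s ticks.
SECRET = b"hypothetical-per-host-secret"
MSS_TABLE = (536, 1300, 1440, 1460)
TICK_SECONDS = 64
MAX_AGE_TICKS = 1  # accept cookies from the current or previous tick


def _tick(now):
    return int(now // TICK_SECONDS) & 0x1F  # 5-bit wrapping time counter


def _mac24(src, sport, dst, dport, tick, mss_idx):
    msg = f"{src}|{sport}|{dst}|{dport}|{tick}|{mss_idx}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(src, sport, dst, dport, client_mss, now=None):
    """Server ISN layout: [5-bit tick | 3-bit MSS index | 24-bit MAC]. No state is stored."""
    now = time.time() if now is None else now
    tick = _tick(now)
    # Largest encodable MSS not above the client's MSS (fall back to the smallest entry).
    mss_idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return (tick << 27) | (mss_idx << 24) | _mac24(src, sport, dst, dport, tick, mss_idx)


def check_cookie(src, sport, dst, dport, ack_number, now=None):
    """Validate the handshake ACK; return the encoded MSS, or None to drop it."""
    now = time.time() if now is None else now
    cookie = (ack_number - 1) & 0xFFFFFFFF  # client acknowledges our ISN + 1
    tick, mss_idx, mac = cookie >> 27, (cookie >> 24) & 0x07, cookie & 0xFFFFFF
    if ((_tick(now) - tick) & 0x1F) > MAX_AGE_TICKS or mss_idx >= len(MSS_TABLE):
        return None
    expected = _mac24(src, sport, dst, dport, tick, mss_idx)
    if not hmac.compare_digest(mac.to_bytes(3, "big"), expected.to_bytes(3, "big")):
        return None
    return MSS_TABLE[mss_idx]
```

Connection state is allocated only after `check_cookie` succeeds, which is why unanswered SYNs cost no memory; the price is that only the small set of MSS values in the table survives the round trip.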

Configuration example:
  syn_flood_protection:
    enabled: true
    syn_cookies: true
    syn_proxy: true
    max_syn_backlog: 65536
    syn_rate_limit_per_ip: 100
    syn_rate_limit_global: 1000000
    xdp_validation: true

Amplification Attack Protection

Attack: Attacker spoofs victim's IP, sends requests to amplifiers (DNS, NTP, memcached)
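Impact: Victim receives massive amplified response traffic (amplification factors range from roughly 28-54x for DNS to ~557x for NTP monlist and up to ~51,000x for exposed memcached servers)
Scale: 3.47 Tbps attack mitigated by Microsoft Azure (reported in 2022); larger attacks have been reported since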

Defense at LB layer:

1. Ingress filtering (BCP38):
   - Verify source IP is routable and not spoofed
   - Drop packets with source IPs from own network (reflection)
   - Implement uRPF (Unicast Reverse Path Forwarding)

2. Protocol-specific rate limiting:
   - DNS response rate limiting: 1000 responses/sec per destination
   - NTP monlist blocking: Drop NTP mode 7 packets
   - Memcached: Block UDP port 11211 from internet

3. Traffic scrubbing:
   - Divert suspicious traffic through scrubbing center
   - Analyze traffic patterns, drop attack traffic
   - Forward clean traffic to LB
   - Providers: Cloudflare, AWS Shield, Akamai Prolexic

4. Anycast absorption:
   - Distribute attack across multiple PoPs
   - Each PoP absorbs fraction of attack
   - 100 PoPs × 10 Gbps each = 1 Tbps absorption capacity

5. Blackhole routing (last resort):
   - Advertise /32 route to null for attacked IP
   - Drops ALL traffic (attack + legitimate)
   - Used when attack threatens network infrastructure
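Several of the LB-side filters above (bogon and own-network source checks, the memcached block, NTP mode 7 drops) can be expressed as one per-packet predicate. The sketch below is Python for readability; a production LB would apply equivalent rules in a packet-filtering layer. The reflection-port list and the `expected_ports` parameter (UDP services this LB legitimately serves) are assumptions, and uRPF and scrubbing are not modeled.

```python
import ipaddress

# Assumption: source ports of common UDP reflection vectors
# (chargen, DNS, NTP, CLDAP, SSDP, memcached).
REFLECTION_PORTS = frozenset({19, 53, 123, 389, 1900, 11211})


def is_bogon(ip):
    """Addresses that should never arrive as sources from the public internet."""
    return (ip.is_private or ip.is_loopback or ip.is_multicast
            or ip.is_reserved or ip.is_link_local or ip.is_unspecified)


def should_drop_udp(src_ip, src_port, dst_port, payload, own_networks,
                    expected_ports=frozenset()):
    """Return True if an inbound UDP datagram matches one of the filters."""
    ip = ipaddress.ip_address(src_ip)
    # Ingress filtering: bogon or spoofed own-network source addresses.
    if is_bogon(ip) or any(ip in net for net in own_networks):
        return True
    # Memcached: no UDP 11211 from the internet.
    if 11211 in (src_port, dst_port):
        return True
    # NTP mode 7 (monlist): mode is the low 3 bits of the first payload byte.
    if 123 in (src_port, dst_port) and payload and (payload[0] & 0x07) == 7:
        return True
    # Unsolicited "responses" from reflection ports to a service that doesn't use them.
    return src_port in REFLECTION_PORTS and dst_port not in expected_ports
```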

Slowloris and Slow HTTP Attacks

Attack: Open many connections, send data very slowly to exhaust connection slots
Impact: Ties up all available connections, denies service to legitimate users
Variants: Slow headers, slow POST body, slow read

Defense mechanisms:

1. Connection timeouts:
   - Header timeout: 10 seconds (must receive complete headers)
   - Body timeout: 30 seconds (must receive complete body)
   - Idle timeout: 60 seconds (no data = close connection)
   - Minimum data rate: 100 bytes/second (below = close)

2. Connection limits per IP:
   - Max concurrent connections per IP: 100
   - Max new connections per IP per second: 20
   - Exempt: Known good IPs, internal services

3. Request size limits:
   - Max header size: 8KB
   - Max request line: 4KB
   - Max number of headers: 100
   - Max body size: 10MB (configurable per route)

4. Behavioral detection:
   - Track connection duration vs data transferred
   - Flag connections with abnormally low throughput
   - Score-based system: multiple slow indicators = block

Configuration:
  slowloris_protection:
    header_timeout_seconds: 10
    body_timeout_seconds: 30
    idle_timeout_seconds: 60
    min_data_rate_bytes_per_second: 100
    max_connections_per_ip: 100
    max_new_connections_per_ip_per_second: 20
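The timeouts and minimum data rate above need only a little per-connection state. This Python sketch assumes the event loop calls `on_data` whenever bytes arrive and polls `close_reason` periodically; the class and method names and the 5-second grace period before the rate check are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlowConnectionGuard:
    """Per-connection timers mirroring slowloris_protection (times in seconds)."""
    opened_at: float
    header_timeout: float = 10.0
    body_timeout: float = 30.0
    idle_timeout: float = 60.0
    min_rate_bps: float = 100.0
    rate_grace: float = 5.0  # assumption: don't judge throughput during the first 5 s
    bytes_received: int = 0
    last_data_at: Optional[float] = None
    headers_done_at: Optional[float] = None
    body_done: bool = False

    def on_data(self, now, nbytes, headers_complete=False, body_complete=False):
        self.bytes_received += nbytes
        self.last_data_at = now
        if headers_complete and self.headers_done_at is None:
            self.headers_done_at = now
        self.body_done = self.body_done or body_complete

    def close_reason(self, now):
        """Return why the connection should be closed, or None to keep it open."""
        last = self.opened_at if self.last_data_at is None else self.last_data_at
        if now - last > self.idle_timeout:
            return "idle_timeout"
        if self.headers_done_at is None:
            if now - self.opened_at > self.header_timeout:
                return "header_timeout"
        elif not self.body_done and now - self.headers_done_at > self.body_timeout:
            return "body_timeout"
        elapsed = now - self.opened_at
        # Throughput is only judged while a request is still arriving.
        if not self.body_done and elapsed >= self.rate_grace:
            if self.bytes_received / elapsed < self.min_rate_bps:
                return "min_data_rate"
        return None
```

Without a grace period, a client whose first packet is merely delayed would fail the 100 bytes/second check immediately.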

HTTP Flood (Layer 7 DDoS)

Attack: Legitimate-looking HTTP requests at massive scale
Impact: Overwhelms application layer (harder to distinguish from real traffic)
Scale: 10M+ requests/second from botnets

Defense mechanisms:

1. Rate limiting (progressive):
   - Tier 1: 100 req/sec per IP (soft limit, add CAPTCHA)
   - Tier 2: 500 req/sec per IP (hard limit, block)
   - Tier 3: 10,000 req/sec global per URL (protect specific endpoints)

2. Challenge-response:
   - JavaScript challenge (blocks simple bots)
   - CAPTCHA for suspicious traffic
   - Proof-of-work challenge (computational cost for client)

3. Behavioral analysis:
   - Request pattern analysis (uniform timing = bot)
   - Browser fingerprinting (headless browsers detected)
   - TLS fingerprinting (JA3/JA4 hash identifies bot libraries)
   - Mouse/keyboard interaction tracking (for web applications)

4. Reputation-based blocking:
   - IP reputation databases (Spamhaus, AbuseIPDB)
   - ASN reputation (hosting providers used by botnets)
   - Geographic anomaly detection (sudden traffic from unusual regions)
   - Device fingerprint reputation

5. Adaptive rate limiting:
   - Normal: 100 req/sec per IP
   - Under attack: 10 req/sec per IP (tighten automatically)
   - Detection: Global request rate exceeds 3x normal for 60 seconds
   - Recovery: Gradually relax limits after attack subsides
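Adaptive rate limiting (item 5) is essentially a small state machine driven by the global request rate. A Python sketch, assuming one global rate sample per second and a known baseline ("normal") rate; the class name and the linear relax step of 10 req/sec per calm second are assumptions, since the text only says limits relax gradually.

```python
class AdaptivePerIpLimit:
    """Per-IP request limit that tightens under sustained global overload (hypothetical)."""

    def __init__(self, baseline_rps, normal=100, under_attack=10,
                 trigger_ratio=3.0, sustain_seconds=60, relax_step=10):
        self.baseline_rps = baseline_rps
        self.normal = normal
        self.under_attack = under_attack
        self.trigger_ratio = trigger_ratio
        self.sustain_seconds = sustain_seconds
        self.relax_step = relax_step  # assumption: +10 req/s per calm second
        self.limit = normal
        self.hot_seconds = 0

    def observe_second(self, global_rps):
        """Feed one per-second global rate sample; return the per-IP limit to enforce next."""
        if global_rps > self.trigger_ratio * self.baseline_rps:
            self.hot_seconds += 1
            if self.hot_seconds >= self.sustain_seconds:
                self.limit = self.under_attack  # attack detected: tighten
        else:
            self.hot_seconds = 0
            # Recovery: relax gradually instead of jumping straight back to normal.
            self.limit = min(self.normal, self.limit + self.relax_step)
        return self.limit
```

With a 1,000 req/sec baseline, 60 consecutive seconds above 3,000 req/sec drop the per-IP limit to 10, and nine calm seconds bring it back to 100.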

SSL/TLS Termination and Re-encryption

TLS Termination at Load Balancer

Architecture:
  Client <--[TLS 1.3]--> LB <--[plaintext or TLS]--> Backend

Benefits:
  - Centralized certificate management (one place to update certs)
  - Offload CPU-intensive crypto from backends
  - Enable L7 inspection (routing, WAF, logging)
  - Better TLS configuration (enforce modern ciphers centrally)
  - Session resumption independent of backend choice (session cache and ticket keys shared across LB instances)

TLS Configuration (production-grade):
  tls:
    min_version: TLSv1.2
    max_version: TLSv1.3
    cipher_suites:
      # TLS 1.3 (always preferred)
      - TLS_AES_256_GCM_SHA384
      - TLS_CHACHA20_POLY1305_SHA256
      - TLS_AES_128_GCM_SHA256
      # TLS 1.2 (for compatibility)
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
    ecdh_curves:
      - X25519
      - P-256
      - P-384
    session_tickets: true
    session_timeout: 3600
    ocsp_stapling: true
    hsts:
      enabled: true
      max_age: 31536000
      include_subdomains: true
      preload: true

Performance impact:
  - RSA-2048 handshake: ~1ms CPU time
  - ECDSA P-256 handshake: ~0.2ms CPU time
  - Session resumption: ~0.05ms (skip key exchange)
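  - TLS 1.3 0-RTT: Early data on resumed connections skips the handshake round trip; it can be replayed, so accept it only for idempotent requests
  - Bulk encryption (AES-GCM): Multiple GB/s per core with AES-NI (raw cipher speed; full TLS record processing is slower)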

Backend Re-encryption (TLS to Backend)

Architecture:
  Client <--[TLS 1.3]--> LB <--[TLS 1.2/1.3]--> Backend

When to use:
  - Compliance requirements (data encrypted in transit everywhere)
  - Zero-trust network model
  - Backend in different security zone
  - Regulatory requirements (PCI-DSS, HIPAA)

Configuration:
  backend_tls:
    enabled: true
    verify_certificate: true
    ca_certificate: "/certs/internal-ca.pem"
    client_certificate: "/certs/lb-client.pem"  # mTLS
    client_key: "/certs/lb-client-key.pem"
    sni_hostname: "backend.internal.example.com"
    min_version: TLSv1.2
    cipher_suites:
      - ECDHE-ECDSA-AES256-GCM-SHA384

Performance impact:
  - Additional ~0.5ms latency per new backend connection (backend TLS handshake)
  - Mitigated with connection pooling (reuse TLS connections)
  - ~10% CPU overhead on LB for re-encryption

SNI-Based Routing

How: Use TLS Server Name Indication to route before decryption

Use case: Multiple domains on same IP, route to different backends

Process:
  1. Client sends ClientHello with SNI = "api.example.com"
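  2. LB reads SNI (plaintext in both TLS 1.2 and TLS 1.3; only Encrypted Client Hello (ECH) hides it, in which case the LB must hold the ECH key to recover the inner SNI)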
  3. LB selects certificate and backend pool based on SNI
  4. TLS handshake completes with correct certificate

Configuration:
  sni_routing:
    - sni: "api.example.com"
      certificate: "cert-api"
      pool: "pool-api-production"
    - sni: "web.example.com"
      certificate: "cert-web"
      pool: "pool-web-production"
    - sni: "*.example.com"
      certificate: "cert-wildcard"
      pool: "pool-default"
    - default:
        action: "reject"  # No matching SNI = TLS handshake aborted
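The matching order implied by this table (exact name, then wildcard, then the default rule) can be sketched in Python. Exact-over-wildcard precedence and the one-label wildcard rule are assumptions about how this LB resolves the table, though both match common practice; `route_for_sni` is a hypothetical name.

```python
from typing import Optional, Tuple

# Mirrors the sni_routing table: name -> (certificate, pool).
SNI_ROUTES = {
    "api.example.com": ("cert-api", "pool-api-production"),
    "web.example.com": ("cert-web", "pool-web-production"),
    "*.example.com": ("cert-wildcard", "pool-default"),
}


def route_for_sni(sni: Optional[str]) -> Optional[Tuple[str, str]]:
    """Exact match first, then a one-label wildcard; None means the default 'reject'."""
    if not sni:
        return None  # client sent no SNI
    name = sni.rstrip(".").lower()  # hostnames are case-insensitive; drop trailing dot
    if name in SNI_ROUTES:
        return SNI_ROUTES[name]
    _, dot, parent = name.partition(".")
    if dot and parent:
        # "*.example.com" covers exactly one extra label, not "a.b.example.com".
        return SNI_ROUTES.get("*." + parent)
    return None
```

In a Python TLS server, a function like this could be called from `ssl.SSLContext.sni_callback`, which receives the requested server name and may switch the connection to a context holding the selected certificate.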

WAF (Web Application Firewall) Integration

WAF at Load Balancer Layer

Architecture:
  Client -> LB (TLS termination) -> WAF Engine -> Routing -> Backend

Inspection points:
  - Request headers (Host, User-Agent, Cookie, Authorization)
  - Request URL and query parameters
  - Request body (POST data, JSON, XML, multipart)
  - Response headers and body (optional, for data leak prevention)

Rule categories:

1. OWASP Core Rule Set (CRS):
   - SQL Injection detection (pattern matching + libinjection)
   - Cross-Site Scripting (XSS) detection
   - Remote Code Execution patterns
   - Local/Remote File Inclusion
   - Command Injection

2. Protocol enforcement:
   - Valid HTTP method (block TRACE, CONNECT for web apps)
   - Content-Type validation
   - Request size limits
   - Character encoding validation
   - Multipart form validation

3. Bot detection:
   - Known bad User-Agents
   - Missing expected headers (Accept, Accept-Language)
   - TLS fingerprint analysis (JA3)
   - Request timing analysis

4. Custom rules:
   - Business logic protection (rate limit login attempts)
   - API schema validation (reject malformed JSON)
   - Geographic restrictions
   - Time-based access control

Performance impact:
  - Simple pattern matching: <0.1ms per request
  - Full CRS evaluation: 1-5ms per request
  - Body inspection (large payloads): 5-20ms per request
  - Recommendation: Inspect headers always, body selectively

Configuration:
  waf:
    enabled: true
    mode: "BLOCK"  # DETECT (log only) or BLOCK (reject)
    rule_sets:
      - owasp_crs_v4
      - custom_api_rules
    exclusions:
      - path: "/api/upload"
        rules: ["body_size_limit"]
      - path: "/webhooks/*"
        rules: ["sql_injection"]  # Webhook payloads trigger false positives
    anomaly_scoring:
      threshold: 5  # Block if cumulative score >= 5
      severity_scores:  # Score added per matched rule (OWASP CRS defaults)
        critical: 5
        error: 4
        warning: 3
        notice: 2
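Anomaly scoring means matches accumulate until they reach the threshold instead of each match blocking on its own. A minimal Python sketch, assuming CRS-style severity scores (critical 5, error 4, warning 3, notice 2) and glob-style path exclusions; the three regex rules are toy stand-ins, not CRS rules, and `inspect` is a hypothetical name.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatch

SEVERITY_SCORES = {"critical": 5, "error": 4, "warning": 3, "notice": 2}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    category: str  # matched against exclusion rule names
    severity: str
    pattern: "re.Pattern[str]"


# Toy rules for illustration only; they are not OWASP CRS rules.
RULES = (
    Rule("sqli-union", "sql_injection", "critical", re.compile(r"union\s+select", re.I)),
    Rule("xss-script", "xss", "critical", re.compile(r"<script\b", re.I)),
    Rule("path-traversal", "lfi", "error", re.compile(r"\.\./")),
)

EXCLUSIONS = {"/webhooks/*": {"sql_injection"}}


def inspect(path, payload, mode="BLOCK", threshold=5):
    """Sum per-rule severity scores; block only in BLOCK mode once the threshold is hit."""
    skipped = set()
    for pattern, categories in EXCLUSIONS.items():
        if fnmatch(path, pattern):
            skipped |= categories
    score, matched = 0, []
    for rule in RULES:
        if rule.category not in skipped and rule.pattern.search(payload):
            score += SEVERITY_SCORES[rule.severity]
            matched.append(rule.rule_id)
    return {"blocked": mode == "BLOCK" and score >= threshold,
            "score": score, "rules_matched": matched}
```

With a threshold of 5, a single critical match blocks on its own, while a lone error-level match (score 4) does not.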

IP Allowlisting and Blocklisting

Implementation Architecture

Data structures for fast IP lookup:

1. Exact IP match: Hash set
   - O(1) lookup
   - Memory: 10M IPs × 4 bytes = 40 MB raw (IPv4); hash-table overhead adds to this

2. CIDR range match: Radix tree (Patricia trie)
   - O(32) lookup for IPv4, O(128) for IPv6
   - Memory: 100K ranges × 64 bytes = 6.4 MB

3. Country/ASN match: Pre-computed GeoIP database
   - O(log n) lookup (binary search on sorted ranges)
   - Memory: ~50 MB (MaxMind database)
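Range lookup over a GeoIP-style table is a binary search over sorted start addresses. A Python sketch, assuming non-overlapping IPv4 ranges only; the class name and the labels attached to the documentation prefixes are hypothetical, not real GeoIP data.

```python
import bisect
import ipaddress


class RangeTable:
    """Sorted, non-overlapping IPv4 ranges with binary-search lookup (O(log n))."""

    def __init__(self, entries):
        rows = []
        for cidr, label in entries:
            net = ipaddress.IPv4Network(cidr)
            rows.append((int(net.network_address), int(net.broadcast_address), label))
        rows.sort()
        self._starts = [start for start, _, _ in rows]
        self._rows = rows

    def lookup(self, ip):
        addr = int(ipaddress.IPv4Address(ip))
        i = bisect.bisect_right(self._starts, addr) - 1  # last range starting <= addr
        if i >= 0 and addr <= self._rows[i][1]:
            return self._rows[i][2]
        return None


# Hypothetical data: documentation prefixes with made-up labels.
geo = RangeTable([
    ("198.51.100.0/24", "XB"),
    ("192.0.2.0/24", "XA"),
    ("203.0.113.0/24", "XC"),
])
```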

Processing order (short-circuit evaluation):
  1. Check allowlist (if match -> ALLOW, skip remaining checks)
  2. Check blocklist (if match -> DENY)
  3. Check rate limits
  4. Check WAF rules
  5. Default: ALLOW
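The CIDR structure and the short-circuit order can be combined in a short Python sketch. It uses a plain binary trie rather than a path-compressed Patricia trie to stay brief, returns the value of the most specific (longest) matching prefix, and treats the rate-limit and WAF checks as hypothetical callables.

```python
import ipaddress


class PrefixTrie:
    """Binary trie over address bits (no Patricia path compression, for brevity).

    Lookups walk at most 32 levels for IPv4 or 128 for IPv6.
    """

    def __init__(self):
        self._roots = {4: {}, 6: {}}

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        node, bits, width = self._roots[net.version], int(net.network_address), net.max_prefixlen
        for i in range(net.prefixlen):
            node = node.setdefault((bits >> (width - 1 - i)) & 1, {})
        node["value"] = value

    def longest_match(self, ip):
        addr = ipaddress.ip_address(ip)
        node, bits, width = self._roots[addr.version], int(addr), addr.max_prefixlen
        best = node.get("value")
        for i in range(width):
            node = node.get((bits >> (width - 1 - i)) & 1)
            if node is None:
                break
            best = node.get("value", best)
        return best


def decide(ip, allowlist, blocklist, over_rate_limit, waf_blocks):
    """Short-circuit order from the text; the two callables are hypothetical hooks."""
    if allowlist.longest_match(ip) is not None:
        return "ALLOW"
    if blocklist.longest_match(ip) is not None:
        return "DENY"
    if over_rate_limit(ip):
        return "RATE_LIMITED"
    if waf_blocks(ip):
        return "WAF_BLOCK"
    return "ALLOW"
```

An allowlisted address returns before the rate-limit and WAF checks run, exactly as step 1 specifies.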

Dynamic Blocklist Management

Sources of blocklist entries:

1. Automated detection:
   - Rate limit violations (auto-block after 3 violations in 1 hour)
   - WAF rule triggers (auto-block after 10 attacks in 5 minutes)
   - Failed authentication attempts (auto-block after 50 failures)
   - Port scanning detection

2. Threat intelligence feeds:
   - Spamhaus DROP/EDROP lists (updated hourly)
   - AbuseIPDB (community-reported IPs)
   - Internal threat intelligence
   - Tor exit nodes (optional, context-dependent)

3. Manual entries:
   - Operator-added blocks (with expiry)
   - Incident response blocks (immediate, reviewed within 24h)

Auto-expiry:
  - Rate limit blocks: 1 hour (escalating: 1h, 4h, 24h, 7d)
  - WAF blocks: 24 hours
  - Threat intel: Until removed from feed
  - Manual blocks: Configurable (default 30 days)
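The expiry rules, including the escalating ladder for repeat rate-limit offenders, reduce to a small lookup. A Python sketch; the source names and the `prior_offenses` counter (how many earlier rate-limit blocks this IP has had) are hypothetical inputs, not fields of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RATE_LIMIT_LADDER = (timedelta(hours=1), timedelta(hours=4),
                     timedelta(hours=24), timedelta(days=7))
WAF_BLOCK_TTL = timedelta(hours=24)
MANUAL_DEFAULT_TTL = timedelta(days=30)


def block_expiry(source: str, now: datetime, prior_offenses: int = 0,
                 manual_ttl: Optional[timedelta] = None) -> Optional[datetime]:
    """Return when a new block should expire; None means 'until removed from the feed'."""
    if source == "rate_limit":
        # Escalate 1h -> 4h -> 24h -> 7d, then stay at 7d for repeat offenders.
        step = min(prior_offenses, len(RATE_LIMIT_LADDER) - 1)
        return now + RATE_LIMIT_LADDER[step]
    if source == "waf":
        return now + WAF_BLOCK_TTL
    if source == "manual":
        return now + (manual_ttl or MANUAL_DEFAULT_TTL)
    if source == "threat_intel":
        return None
    raise ValueError(f"unknown block source: {source}")
```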

API for blocklist management:
  POST /api/v1/acls/blocklist
  {
    "cidr": "203.0.113.0/24",
    "reason": "Automated: rate_limit_violation",
    "source": "auto_detection",
    "expires_at": "2024-01-21T15:00:00Z",
    "severity": "medium"
  }
  Use "ip" for a single address (e.g. "203.0.113.50") or "cidr" for a range.

Rate Limiting at LB Layer

Multi-Tier Rate Limiting

Tier 1: Global rate limit (protect infrastructure)
  - 5M requests/second total capacity
  - Action: Return 503 when exceeded
  - Purpose: Prevent total system overload

Tier 2: Per-IP rate limit (prevent abuse)
  - 100 requests/second per source IP
  - Burst: 200 requests (token bucket)
  - Action: Return 429 with Retry-After header
  - Purpose: Prevent single-source abuse

Tier 3: Per-endpoint rate limit (protect specific APIs)
  - /api/login: 5 requests/minute per IP
  - /api/search: 30 requests/minute per IP
  - /api/upload: 10 requests/minute per user
  - Action: Return 429 with specific error message

Tier 4: Per-user rate limit (authenticated)
  - Free tier: 1000 requests/hour
  - Pro tier: 10000 requests/hour
  - Enterprise: Custom limits
  - Key: API key or JWT subject claim

Implementation:
  Algorithm: Token bucket (per-IP) + sliding window (per-endpoint)
  Storage: In-memory hash map with LRU eviction
  Distributed: Approximate local counting + periodic Redis sync
  Accuracy: ±10% (acceptable for rate limiting)
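Tier 2's per-IP token bucket (100 requests/second, burst 200) with LRU eviction can be sketched in Python. This covers only the local, single-instance bucket; the per-endpoint sliding window and the Redis sync are not shown, and the class name and `max_keys` bound are assumptions.

```python
import math
import time
from collections import OrderedDict


class TokenBucketLimiter:
    """Per-key token bucket (e.g. key = client IP) with LRU eviction of idle keys."""

    def __init__(self, rate=100.0, burst=200, max_keys=100_000, clock=time.monotonic):
        self.rate, self.burst, self.max_keys, self.clock = rate, burst, max_keys, clock
        self._buckets = OrderedDict()  # key -> (tokens, last_refill_time), oldest first

    def _refill(self, key, now):
        tokens, last = self._buckets.pop(key, (float(self.burst), now))
        return min(float(self.burst), tokens + (now - last) * self.rate)

    def allow(self, key):
        now = self.clock()
        tokens = self._refill(key, now)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)  # re-insert as most recently used
        if len(self._buckets) > self.max_keys:
            self._buckets.popitem(last=False)  # evict least recently used key
        return allowed

    def retry_after(self, key):
        """Whole seconds until the next token, suitable for a Retry-After header."""
        if key not in self._buckets:
            return 0
        now = self.clock()
        tokens = self._refill(key, now)
        self._buckets[key] = (tokens, now)
        return 0 if tokens >= 1.0 else math.ceil((1.0 - tokens) / self.rate)
```

A client can burst 200 requests at once, after which it is held to the refill rate of 100 per second.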

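Response headers (429 status per RFC 6585; X-RateLimit-* headers are a de facto convention, not standardized; Retry-After is defined in RFC 9110):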
  X-RateLimit-Limit: 100
  X-RateLimit-Remaining: 45
  X-RateLimit-Reset: 1705766400
  Retry-After: 30

Distributed Rate Limiting

Challenge: 200 LB instances must enforce global rate limits

Approach 1: Local approximation
  - Each instance enforces limit/N (where N = instance count)
  - Simple but inaccurate (±50% if traffic unevenly distributed)
  - No coordination overhead

Approach 2: Periodic sync via Redis
  - Each instance maintains local counter
  - Every 1 second, sync to Redis (INCRBY)
  - Read global count from Redis
  - Accuracy: ±10% (1-second window of drift)
  - Latency: No impact on request path (async sync)

Approach 3: Token bucket in Redis (exact)
  - Every request checks Redis (Lua script for atomicity)
  - Exact enforcement across all instances
  - Latency: +0.5ms per request (Redis round-trip)
  - Use only for critical limits (login, payment)

Recommended: Approach 2 for most limits, Approach 3 for security-critical endpoints
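Approach 2 can be sketched in Python with the shared counter abstracted away. `InMemoryCounter` is a hypothetical stand-in for Redis so the example runs on its own; its `incrby` mirrors the name and argument order of the redis-py client method.

```python
class InMemoryCounter:
    """Stand-in for Redis INCRBY; same call shape as redis-py's Redis.incrby(name, amount)."""

    def __init__(self):
        self._values = {}

    def incrby(self, name, amount=1):
        self._values[name] = self._values.get(name, 0) + amount
        return self._values[name]


class PeriodicSyncLimiter:
    """Approach 2: decide locally, reconcile with the shared counter about once per second."""

    def __init__(self, store, key, global_limit):
        self.store, self.key, self.global_limit = store, key, global_limit
        self.pending = 0      # requests admitted locally since the last sync
        self.global_seen = 0  # fleet-wide count as of the last sync

    def allow(self):
        # No network I/O on the request path: compare against the last known total.
        if self.global_seen + self.pending >= self.global_limit:
            return False
        self.pending += 1
        return True

    def sync(self):
        # Run off the request path (e.g. a 1 s timer): push our delta, read back the total.
        self.global_seen = self.store.incrby(self.key, self.pending)
        self.pending = 0
```

The drift the text describes is visible here: two instances can each admit 6 requests against a fleet limit of 10 before either syncs. Real deployments also need per-window keys with a TTL (Redis `EXPIRE`), which this sketch leaves out.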

mTLS for Backend Communication

Mutual TLS Architecture

Purpose: Verify both LB and backend identity (zero-trust networking)

Certificate hierarchy:
  Root CA (offline, HSM-protected)
    └── Intermediate CA (online, short-lived)
        ├── LB Client Certificate (identifies LB to backends)
        ├── Backend Server Certificate (identifies backend to LB)
        └── Service Certificates (for service-to-service)

Handshake flow:
  1. LB connects to backend, presents client certificate
  2. Backend verifies LB certificate against trusted CA
  3. Backend presents server certificate
  4. LB verifies backend certificate against trusted CA
  5. Both parties authenticated, encrypted channel established

Configuration (LB side):
  backend_mtls:
    client_certificate: "/certs/lb-client.pem"
    client_key: "/certs/lb-client-key.pem"
    ca_certificate: "/certs/internal-ca-bundle.pem"
    verify_backend: true
    allowed_sans:  # Only connect to backends with these SANs
      - "*.api.internal.example.com"
      - "*.worker.internal.example.com"
    crl_url: "http://pki.internal/crl.pem"
    ocsp_url: "http://pki.internal/ocsp"
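On the LB side, the `backend_mtls` settings map onto a client TLS context plus an extra SAN policy check. A Python sketch using the standard `ssl` module; the function names are hypothetical, CRL/OCSP checking is not shown, and the SAN check is assumed to run after normal chain and hostname verification has succeeded.

```python
import ssl


def build_backend_context(ca_file, client_cert, client_key):
    """Client-side TLS context for LB -> backend mTLS (file paths as in backend_mtls)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # create_default_context already sets CERT_REQUIRED and hostname checking.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)  # LB's client identity
    return ctx


def _matches(pattern, name):
    """Single-label wildcard match: '*.api.x' matches 'b1.api.x' but not 'a.b1.api.x'."""
    pattern, name = pattern.lower(), name.lower()
    if not pattern.startswith("*."):
        return pattern == name
    head, _, rest = name.partition(".")
    return bool(head) and rest == pattern[2:]


def san_allowed(peer_cert, allowed_sans):
    """Check the verified peer certificate (SSLSocket.getpeercert() dict) against allowed_sans."""
    dns_names = [value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"]
    return any(_matches(p, n) for p in allowed_sans for n in dns_names)
```

With hostname checking enabled, the socket must be wrapped with `server_hostname` set to the backend's name (for example the configured `sni_hostname`); `getpeercert()` then returns the dictionary that `san_allowed` inspects.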

Benefits:
  - Prevents unauthorized backends from receiving traffic
  - Prevents unauthorized LBs from connecting to backends
  - Encrypted communication even on internal network
  - Audit trail (certificate identity in logs)
  - Compliance (PCI-DSS, SOC2, HIPAA)

Certificate Rotation for mTLS

Rotation strategy: Dual-certificate overlap period

Timeline:
  Day 0:  Generate new certificate (valid from Day 0)
  Day 0:  Deploy new cert alongside old cert (both valid)
  Day 1-7: Gradually shift to new certificate
  Day 7:  Remove old certificate
  Day 30: Old certificate expires (safety margin)

Automated rotation:
  - Certificate lifetime: 90 days
  - Rotation trigger: 30 days before expiry
  - Rotation method: Rolling deployment (no downtime)
  - Monitoring: Alert if cert expires within 14 days
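The timeline and the automated-rotation parameters line up: triggering 30 days before expiry makes that moment Day 0, the old certificate is removed on Day 7, and it expires on Day 30. A Python sketch of that calendar, with hypothetical function names and the values above as constants.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)
RENEW_BEFORE_EXPIRY = timedelta(days=30)
OVERLAP = timedelta(days=7)
ALERT_WINDOW = timedelta(days=14)


def rotation_plan(old_not_after: datetime) -> dict:
    """Milestones for the dual-certificate overlap, keyed to the old cert's expiry."""
    day0 = old_not_after - RENEW_BEFORE_EXPIRY  # rotation trigger: issue + deploy alongside old
    return {
        "issue_and_deploy_new": day0,
        "remove_old": day0 + OVERLAP,  # Day 7
        "old_expires": old_not_after,  # Day 30 (safety margin)
        "alert_if_old_still_served": old_not_after - ALERT_WINDOW,
        "new_expires": day0 + CERT_LIFETIME,
    }


def needs_alert(not_after: datetime, now: datetime) -> bool:
    """True once a certificate is within the 14-day alert window."""
    return not_after - now <= ALERT_WINDOW
```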

Tools:
  - cert-manager (Kubernetes): Automatic issuance and rotation from configured issuers (internal CA, Vault, or ACME such as Let's Encrypt for public names)
  - Vault PKI: Internal CA with short-lived certificates (24h)
  - SPIFFE/SPIRE: Identity framework with automatic cert rotation

Certificate Management and Rotation

Certificate Lifecycle Management

Stages:
  1. Generation/Procurement
     - ACME (Let's Encrypt): Automated, free, 90-day validity
     - Commercial CA: Manual, paid, 1-year validity
     - Internal CA: Automated, internal only, configurable validity

  2. Deployment
     - Push to all LB instances (configuration management)
     - Verify deployment (check all instances serving new cert)
     - Rollback plan (keep old cert available for 24h)

  3. Monitoring
     - Expiry tracking (alert at 30, 14, 7, 1 days)
     - Certificate transparency log monitoring
     - OCSP response monitoring
     - Certificate revocation monitoring

  4. Renewal
     - Automated renewal 30 days before expiry
     - Validation: HTTP-01, DNS-01, or TLS-ALPN-01 challenge
     - Deployment: Rolling update across LB fleet
     - Verification: Confirm new cert served on all instances

  5. Revocation (emergency)
     - Publish to CRL (Certificate Revocation List)
     - Update OCSP responder
     - Deploy replacement certificate immediately
     - Notify affected parties

Automation:
  certificate_management:
    provider: "letsencrypt"
    auto_renew: true
    renewal_days_before_expiry: 30
    challenge_type: "dns-01"
    dns_provider: "route53"
    deployment_strategy: "rolling"
    rollback_on_failure: true
    monitoring:
      expiry_warning_days: [30, 14, 7, 1]
      alert_channel: "pagerduty"

Multi-Domain and Wildcard Certificates

Strategy for large-scale deployments:

Option 1: Wildcard certificate
  - *.example.com covers all subdomains
  - Single cert for api.example.com, web.example.com, etc.
  - Simpler management (one cert to rotate)
  - Risk: Compromise affects all subdomains

Option 2: Per-service certificates
  - api.example.com has its own certificate
  - web.example.com has its own certificate
  - Better isolation (compromise limited to one service)
  - More complex management (many certs to track)

Option 3: SAN (Subject Alternative Name) certificates
  - Single cert with multiple domains listed
  - api.example.com, web.example.com, admin.example.com
  - Blast radius: Compromise exposes every listed domain (narrower than a wildcard, broader than per-service certs)
  - Limitation: Must reissue to add/remove domains

Recommendation for production:
  - Wildcard for internal services (*.internal.example.com)
  - Per-service certs for external-facing services
  - Separate certs for different security zones
  - Short-lived certs (90 days) to limit exposure window

Access Logging and Audit Trails

Access Log Format

Log entry per request (structured JSON):
{
  "timestamp": "2024-01-20T15:00:00.123Z",
  "request_id": "req-abc123-def456",
  "client_ip": "203.0.113.50",
  "client_port": 54321,
  "server_ip": "10.0.42.100",
  "server_port": 8080,
  "method": "POST",
  "host": "api.example.com",
  "path": "/v1/users",
  "query": "page=1",
  "protocol": "HTTP/2",
  "status_code": 201,
  "request_size_bytes": 1024,
  "response_size_bytes": 256,
  "request_time_ms": 45.2,
  "upstream_time_ms": 42.1,
  "ssl_protocol": "TLSv1.3",
  "ssl_cipher": "TLS_AES_256_GCM_SHA384",
  "user_agent": "Mozilla/5.0...",
  "referer": "https://web.example.com/dashboard",
  "x_forwarded_for": "203.0.113.50",
  "lb_instance": "lb-us-east-1a-042",
  "pool_id": "pool-api-production",
  "backend_server": "backend-us-east-1a-api-0042",
  "rate_limited": false,
  "waf_action": "PASS",
  "waf_rules_matched": [],
  "geo_country": "US",
  "geo_city": "New York",
  "asn": 15169,
  "connection_reused": true,
  "compression": "gzip"
}

Volume: 1.16M entries/second × 1KB = 1.16 GB/s of logs
Storage: ~100 TB/day uncompressed (1.16 GB/s × 86,400 s), ~10 TB/day compressed (assuming ~10:1 compression)
Retention: 30 days hot (Elasticsearch), 1 year cold (S3)

Security Audit Trail

Audit events (control plane actions):

{
  "event_id": "audit-abc123",
  "timestamp": "2024-01-20T15:00:00Z",
  "actor": {
    "type": "user",
    "id": "admin@example.com",
    "ip": "10.0.1.50",
    "auth_method": "mTLS"
  },
  "action": "backend.remove",
  "resource": {
    "type": "backend_server",
    "id": "backend-us-east-1a-api-0042",
    "pool": "pool-api-production"
  },
  "details": {
    "drain_timeout": 300,
    "active_connections_at_removal": 0,
    "reason": "Scheduled decommission"
  },
  "result": "SUCCESS",
  "changes": {
    "before": {"status": "DRAINING", "connections": 0},
    "after": {"status": "REMOVED"}
  }
}

Audit events to capture:
  - Configuration changes (routing rules, ACLs, rate limits)
  - Backend additions/removals
  - Certificate uploads/rotations
  - Health check overrides
  - Rate limit exemptions
  - Emergency blocks/unblocks
  - Scaling events
  - Failover events

Storage:
  - Immutable append-only log
  - Cryptographically signed entries
  - Retention: 7 years (compliance)
  - Access: Read-only for most users, append-only for system
  - Tamper detection: Hash chain (each entry includes hash of previous)
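Hash-chain tamper detection can be sketched in Python with `hashlib`. Each record stores the previous record's hash, so editing an earlier entry breaks every link after it. This covers only the chain; canonical JSON (sorted keys) as the hashing input is an assumption, and signing of entries is left out.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _entry_hash(entry, prev_hash):
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


class AuditChain:
    """Append-only list where each record commits to the hash of the one before it."""

    def __init__(self):
        self.records = []

    def append(self, entry):
        prev = self.records[-1]["hash"] if self.records else GENESIS_HASH
        record = {"entry": entry, "prev_hash": prev, "hash": _entry_hash(entry, prev)}
        self.records.append(record)
        return record["hash"]

    def verify(self):
        """Return the index of the first record that fails verification, or None."""
        prev = GENESIS_HASH
        for i, record in enumerate(self.records):
            if record["prev_hash"] != prev or record["hash"] != _entry_hash(record["entry"], prev):
                return i
            prev = record["hash"]
        return None
```

A chain alone does not stop someone from rewriting the whole tail and recomputing every hash; signed entries, or periodically recording the latest hash in separate append-only storage, close that gap.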

Privacy Considerations

Data minimization:
  - Log client IPs (required for security)
  - Do NOT log request bodies (may contain PII)
  - Do NOT log Authorization header values
  - Do NOT log cookie values (session tokens)
  - Truncate User-Agent to 256 characters
  - Hash or mask sensitive query parameters

GDPR compliance:
  - Right to erasure: Ability to purge logs for specific IP/user
  - Data retention: Automatic deletion after retention period
  - Data access: Provide logs related to specific user on request
  - Data minimization: Only log what's necessary for security/operations

PCI-DSS compliance:
  - Never log full credit card numbers
  - Mask PAN in any logged data
  - Encrypt logs at rest
  - Restrict log access to authorized personnel
  - Retain logs for minimum 1 year

Log sanitization pipeline:
  1. Raw log generated (full data)
  2. Sanitization filter removes/masks sensitive fields
  3. Sanitized log written to storage
  4. Raw log discarded (never persisted)
  
  Sanitization rules:
    - Authorization header: Replace value with "[REDACTED]"
    - Cookie header: Replace value with "[REDACTED]"
    - Query params matching /password|token|secret|key/: Mask value
    - Request body: Never logged (only size)
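The sanitization rules translate directly into a filter applied before anything is written. A Python sketch, assuming each raw record is a dict with `headers`, `query`, and `body` fields (hypothetical field names) and using "[MASKED]" as the mask for query values; it also applies the 256-character User-Agent truncation from the data-minimization list.

```python
import re

SENSITIVE_HEADERS = {"authorization", "cookie"}
SENSITIVE_PARAM = re.compile(r"password|token|secret|key", re.IGNORECASE)
MAX_USER_AGENT = 256


def sanitize_query(query):
    """Mask values of query parameters whose names look sensitive."""
    pairs = []
    for pair in query.split("&") if query else []:
        name, sep, _value = pair.partition("=")
        pairs.append(f"{name}=[MASKED]" if sep and SENSITIVE_PARAM.search(name) else pair)
    return "&".join(pairs)


def sanitize(raw):
    """Return the record that may be persisted; the raw record itself is never stored."""
    clean = {k: v for k, v in raw.items() if k not in ("headers", "query", "body")}
    headers = {}
    for name, value in raw.get("headers", {}).items():
        if name.lower() in SENSITIVE_HEADERS:
            value = "[REDACTED]"
        elif name.lower() == "user-agent":
            value = value[:MAX_USER_AGENT]
        headers[name] = value
    clean["headers"] = headers
    clean["query"] = sanitize_query(raw.get("query", ""))
    clean["request_body_bytes"] = len(raw.get("body", b""))  # size only, never content
    return clean
```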

Security Hardening Checklist

Network Security

□ LB management interface on separate network (not internet-facing)
□ Backend servers not directly accessible from internet
□ Management API requires mTLS or VPN access
□ ICMP rate limited (prevent ping flood)
□ Unused ports closed (only 80, 443 exposed)
□ IPv6 security equivalent to IPv4
□ BGP session authentication (MD5 or TCP-AO)
□ ARP spoofing protection on LB network segment

Application Security

□ TLS 1.2+ only (TLS 1.0/1.1 disabled)
□ Strong cipher suites only (no RC4, DES, 3DES, NULL)
□ HSTS enabled with long max-age
□ X-Frame-Options: DENY
□ X-Content-Type-Options: nosniff
□ Content-Security-Policy headers added
□ Server header removed or generic
□ Error pages don't leak internal information
□ Request size limits enforced
□ Timeout values configured (prevent resource exhaustion)

Operational Security

□ Principle of least privilege for all access
□ Multi-factor authentication for management access
□ API keys rotated every 90 days
□ Audit logging enabled and monitored
□ Automated vulnerability scanning
□ Security patches applied within 24 hours (critical)
□ Incident response plan documented and tested
□ Regular penetration testing (quarterly)
□ Configuration drift detection
□ Secrets stored in vault (not in config files)

This security design ensures the load balancer serves as a robust security boundary, protecting backend services from external threats while maintaining high performance and operational visibility.