Part 5 of 10

Load Balancer - API Design

Overview

The load balancer exposes two distinct surfaces:

  1. Management API (Control Plane): RESTful API for configuration, monitoring, and administration
  2. Data Plane: The actual traffic forwarding path (not an API per se, but configurable behavior)

All management APIs require authentication (an API key presented as a Bearer token, or an mTLS client certificate), accept and return JSON bodies, and follow REST conventions, including standard HTTP status codes.

Base URL and Versioning

Base URL: https://lb-management.internal.example.com/api/v1
Authentication: Bearer token or mTLS client certificate
Content-Type: application/json
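Rate Limit: 1000 requests/minute per API key

Endpoint paths in this document are written from the host root, so they already include the /api/v1 prefix.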
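As a rough illustration of these conventions, the sketch below builds (but does not send) an authenticated management request with Python's standard library. The helper name `build_request` and the token value are placeholders chosen for this example, not part of the API.

```python
import json
import urllib.request

# Host taken from the Base URL above. Endpoint paths in this document
# already start with /api/v1, so they are appended to the host only.
HOST = "https://lb-management.internal.example.com"


def build_request(method, path, token, body=None):
    """Build (without sending) a JSON request that carries a Bearer token."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(HOST + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req


req = build_request(
    "PATCH",
    "/api/v1/pools/pool-api-production/backends/backend-us-east-1a-api-0042",
    "example-token",  # placeholder credential
    {"weight": 50},
)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose so the sketch stays free of network side effects.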

1. Backend Server Management API

Register a Backend Server

POST /api/v1/pools/{pool_id}/backends

Request:
{
  "hostname": "api-server-042.us-east-1a.internal",
  "ip_address": "10.0.42.100",
  "port": 8080,
  "weight": 100,
  "max_connections": 10000,
  "metadata": {
    "availability_zone": "us-east-1a",
    "instance_id": "i-0abc123def456",
    "version": "v2.3.1"
  },
  "health_check": {
    "path": "/health",
    "interval_seconds": 5,
    "timeout_seconds": 2
  },
  "tls": {
    "enabled": true,
    "verify": true,
    "sni_hostname": "api.internal.example.com"
  },
  "slow_start_seconds": 30
}

Response: 201 Created
{
  "server_id": "backend-us-east-1a-api-0042",
  "pool_id": "pool-api-production",
  "status": "STARTING",
  "created_at": "2024-01-20T14:30:00Z",
  "health_check_status": "UNKNOWN",
  "effective_weight": 0,
  "message": "Server registered. Will receive traffic after passing health checks."
}
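The response reports an effective_weight of 0 until the backend passes health checks, and the request sets slow_start_seconds to 30. One plausible reading is a linear ramp from 0 to the configured weight over that window. The sketch below assumes that linear curve; the actual ramp shape is not specified here, and the function name is illustrative.

```python
def effective_weight(weight, seconds_since_healthy, slow_start_seconds):
    """Ramp linearly from 0 to `weight` over the slow-start window (assumed curve)."""
    if seconds_since_healthy is None:
        # Backend has not passed its health checks yet (status STARTING).
        return 0
    if slow_start_seconds <= 0 or seconds_since_healthy >= slow_start_seconds:
        return weight
    return int(weight * seconds_since_healthy / slow_start_seconds)


print(effective_weight(100, None, 30))  # not yet healthy
print(effective_weight(100, 15, 30))    # halfway through slow start
print(effective_weight(100, 45, 30))    # ramp finished
```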

Update Backend Server Weight

PATCH /api/v1/pools/{pool_id}/backends/{server_id}

Request:
{
  "weight": 50,
  "max_connections": 5000
}

Response: 200 OK
{
  "server_id": "backend-us-east-1a-api-0042",
  "weight": 50,
  "previous_weight": 100,
  "max_connections": 5000,
  "updated_at": "2024-01-20T14:35:00Z"
}

Drain a Backend Server (Graceful Removal)

POST /api/v1/pools/{pool_id}/backends/{server_id}/drain

Request:
{
  "timeout_seconds": 300,
  "close_idle_connections": true,
  "reason": "Scheduled maintenance window"
}

Response: 202 Accepted
{
  "server_id": "backend-us-east-1a-api-0042",
  "status": "DRAINING",
  "active_connections": 1523,
  "drain_started_at": "2024-01-20T14:40:00Z",
  "estimated_completion": "2024-01-20T14:45:00Z",
  "drain_id": "drain-abc123"
}

Remove a Backend Server

DELETE /api/v1/pools/{pool_id}/backends/{server_id}?force=false

Response: 200 OK
{
  "server_id": "backend-us-east-1a-api-0042",
  "status": "REMOVED",
  "connections_terminated": 0,
  "removed_at": "2024-01-20T14:50:00Z"
}

// With force=true, all active connections are terminated immediately.
// With force=false (the default), the request returns 409 Conflict (BACKEND_HAS_CONNECTIONS) while connections are still active.
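Taken together, the drain and remove endpoints suggest a two-step removal flow: start a drain, then retry a non-forced DELETE until it stops returning 409. The sketch below shows one way a client might do that. The `call(method, path, body)` transport is a hypothetical callable returning `(status_code, parsed_json)`, injected so the flow can be exercised without a network; the polling interval is also an assumption.

```python
import time


def remove_backend(call, pool_id, server_id, drain_timeout=300,
                   poll_seconds=5, sleep=time.sleep):
    """Drain a backend, then delete it once DELETE no longer returns 409."""
    base = f"/api/v1/pools/{pool_id}/backends/{server_id}"
    status, _ = call("POST", base + "/drain", {
        "timeout_seconds": drain_timeout,
        "close_idle_connections": True,
        "reason": "Scheduled maintenance window",
    })
    if status != 202:
        raise RuntimeError(f"drain was not accepted (HTTP {status})")

    waited = 0
    while True:
        status, body = call("DELETE", base + "?force=false", None)
        if status == 200:
            return body
        # 409 means connections are still active; anything else is unexpected.
        if status != 409 or waited >= drain_timeout:
            raise RuntimeError(f"could not remove backend (HTTP {status})")
        sleep(poll_seconds)
        waited += poll_seconds
```

A caller that cannot wait for the drain could fall back to `force=true`, at the cost of terminating the remaining connections.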

List Backend Servers

GET /api/v1/pools/{pool_id}/backends?status=HEALTHY&page=1&per_page=50

Response: 200 OK
{
  "backends": [
    {
      "server_id": "backend-us-east-1a-api-0042",
      "hostname": "api-server-042.us-east-1a.internal",
      "ip_address": "10.0.42.100",
      "port": 8080,
      "weight": 100,
      "effective_weight": 100,
      "status": "HEALTHY",
      "admin_status": "ENABLED",
      "active_connections": 1523,
      "requests_per_second": 450,
      "avg_response_time_ms": 12.5,
      "error_rate_percent": 0.02,
      "last_health_check": "2024-01-20T14:55:00Z",
      "uptime_seconds": 864000
    }
  ],
  "pagination": {
    "page": 1,
    "per_page": 50,
    "total": 200,
    "total_pages": 4
  },
  "summary": {
    "total_backends": 200,
    "healthy": 195,
    "unhealthy": 3,
    "draining": 2,
    "disabled": 0
  }
}
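Because the listing is paginated, a client that needs every backend has to walk the pages. A minimal sketch follows, assuming a hypothetical `fetch_page(params)` callable that performs the GET and returns the parsed JSON body in the shape shown above.

```python
def iter_backends(fetch_page, status=None, per_page=50):
    """Yield every backend in the pool, following the pagination block."""
    page = 1
    while True:
        params = {"page": page, "per_page": per_page}
        if status:
            params["status"] = status
        body = fetch_page(params)
        yield from body["backends"]
        # Stop once the last page reported by the server has been read.
        if page >= body["pagination"]["total_pages"]:
            break
        page += 1
```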

2. Server Pool Management API

Create a Server Pool

POST /api/v1/pools

Request:
{
  "name": "API Production Pool",
  "algorithm": "LEAST_CONNECTIONS",
  "algorithm_config": {
    "weighted": true,
    "power_of_two_choices": true,
    "slow_start_duration_seconds": 30
  },
  "health_check": {
    "protocol": "HTTP",
    "path": "/health",
    "port": 8080,
    "interval_seconds": 5,
    "timeout_seconds": 2,
    "unhealthy_threshold": 3,
    "healthy_threshold": 2,
    "expected_status_codes": [200]
  },
  "session_persistence": {
    "enabled": true,
    "type": "COOKIE",
    "cookie_name": "SERVERID",
    "ttl_seconds": 3600
  },
  "connection_limits": {
    "max_connections_per_backend": 10000,
    "connection_timeout_ms": 5000,
    "idle_timeout_seconds": 300
  },
  "circuit_breaker": {
    "enabled": true,
    "error_threshold_percent": 50,
    "evaluation_window_seconds": 30,
    "recovery_timeout_seconds": 60
  }
}

Response: 201 Created
{
  "pool_id": "pool-api-production",
  "name": "API Production Pool",
  "status": "ACTIVE",
  "backend_count": 0,
  "created_at": "2024-01-20T10:00:00Z"
}
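The pool above combines LEAST_CONNECTIONS with `weighted` and `power_of_two_choices`. A common way to implement that combination is to sample two backends at random and keep the one with the lower load relative to its weight. The sketch below assumes that interpretation; the scoring formula (connections divided by effective weight) is an assumption, not something this API defines.

```python
import random


def pick_backend(backends, rng=random):
    """Power of two choices: sample two backends, keep the less loaded one."""
    # Backends still in STARTING state have effective_weight 0 and are skipped.
    eligible = [b for b in backends if b["effective_weight"] > 0]
    if not eligible:
        raise ValueError("no eligible backends")
    candidates = rng.sample(eligible, 2) if len(eligible) >= 2 else eligible
    # Normalising by weight lets a heavier backend carry more connections.
    return min(
        candidates,
        key=lambda b: b["active_connections"] / b["effective_weight"],
    )
```

Sampling two candidates rather than scanning the whole pool keeps each decision cheap while still steering traffic away from the busiest backends.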

Update Pool Algorithm

PUT /api/v1/pools/{pool_id}/algorithm

Request:
{
  "algorithm": "WEIGHTED_ROUND_ROBIN",
  "algorithm_config": {
    "weighted": true,
    "slow_start_duration_seconds": 60
  }
}

Response: 200 OK
{
  "pool_id": "pool-api-production",
  "algorithm": "WEIGHTED_ROUND_ROBIN",
  "previous_algorithm": "LEAST_CONNECTIONS",
  "updated_at": "2024-01-20T15:00:00Z",
  "message": "Algorithm change applied. Traffic distribution will adjust gradually."
}

3. Health Check Configuration API

Get Health Check Configuration

GET /api/v1/pools/{pool_id}/health-check

Response: 200 OK
{
  "pool_id": "pool-api-production",
  "health_check": {
    "protocol": "HTTP",
    "path": "/health",
    "port": 8080,
    "method": "GET",
    "interval_seconds": 5,
    "timeout_seconds": 2,
    "unhealthy_threshold": 3,
    "healthy_threshold": 2,
    "expected_status_codes": [200],
    "expected_body_regex": "\"status\":\"ok\"",
    "headers": {
      "User-Agent": "LB-HealthCheck/1.0",
      "X-Health-Check": "true"
    },
    "tls_enabled": true,
    "tls_verify": true
  },
  "stats": {
    "last_check_time": "2024-01-20T15:00:05Z",
    "healthy_backends": 195,
    "unhealthy_backends": 3,
    "avg_check_duration_ms": 5.2
  }
}
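The threshold fields imply a small state machine: a backend turns UNHEALTHY after `unhealthy_threshold` consecutive failures and HEALTHY again after `healthy_threshold` consecutive successes. The sketch below illustrates that logic under those assumptions, together with a pass/fail test based on `expected_status_codes` and `expected_body_regex`. Class and function names are illustrative.

```python
import re


def check_passed(config, status_code, body):
    """A probe passes when the status code is expected and the body regex matches."""
    if status_code not in config["expected_status_codes"]:
        return False
    pattern = config.get("expected_body_regex")
    return pattern is None or re.search(pattern, body) is not None


class HealthTracker:
    """Track one backend's status from consecutive probe results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.status = "UNKNOWN"
        self.successes = 0
        self.failures = 0

    def record(self, passed):
        if passed:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.healthy_threshold:
                self.status = "HEALTHY"
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.unhealthy_threshold:
                self.status = "UNHEALTHY"
        return self.status
```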

Update Health Check Configuration

PUT /api/v1/pools/{pool_id}/health-check

Request:
{
  "protocol": "HTTP",
  "path": "/health/deep",
  "port": 8080,
  "method": "GET",
  "interval_seconds": 10,
  "timeout_seconds": 5,
  "unhealthy_threshold": 5,
  "healthy_threshold": 3,
  "expected_status_codes": [200, 204],
  "expected_body_regex": "\"database\":\"connected\"",
  "headers": {
    "Authorization": "Bearer ${health_check_token}"
  }
}

Response: 200 OK
{
  "pool_id": "pool-api-production",
  "health_check": { ... },
  "updated_at": "2024-01-20T15:05:00Z",
  "message": "Health check updated. New configuration effective immediately.",
  "warning": "Increasing unhealthy_threshold to 5 means backends will take 50s to be marked unhealthy."
}
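The 50s figure in the warning is the check interval multiplied by the unhealthy threshold (10 s x 5). The helper below reproduces that arithmetic. The optional timeout term is an assumption about worst-case behavior, since the document does not say whether the final probe's timeout adds to the delay.

```python
def detection_window_seconds(interval_seconds, unhealthy_threshold, timeout_seconds=0):
    """Approximate time for a failing backend to be marked unhealthy."""
    return interval_seconds * unhealthy_threshold + timeout_seconds


print(detection_window_seconds(10, 5))  # updated configuration
print(detection_window_seconds(5, 3))   # previous configuration
```

With the previous settings (5 s interval, threshold 3) the same formula gives 15 seconds, which makes the trade-off between flapping and detection speed explicit.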

Get Health Check Results for All Backends

GET /api/v1/pools/{pool_id}/health-check/results?include_history=true&history_minutes=60

Response: 200 OK
{
  "pool_id": "pool-api-production",
  "results": [
    {
      "server_id": "backend-us-east-1a-api-0042",
      "status": "HEALTHY",
      "consecutive_successes": 431,
      "consecutive_failures": 0,
      "last_check": {
        "time": "2024-01-20T15:00:05Z",
        "duration_ms": 4.2,
        "status_code": 200,
        "body_match": true
      },
      "history": {
        "checks_total": 720,
        "checks_passed": 718,
        "checks_failed": 2,
        "availability_percent": 99.72,
        "avg_response_time_ms": 5.1,
        "p99_response_time_ms": 15.3
      }
    }
  ]
}

Manually Override Health Status

POST /api/v1/pools/{pool_id}/backends/{server_id}/health-override

Request:
{
  "status": "UNHEALTHY",
  "reason": "Manual override for emergency maintenance",
  "duration_seconds": 3600,
  "operator": "oncall-engineer@example.com"
}

Response: 200 OK
{
  "server_id": "backend-us-east-1a-api-0042",
  "status": "UNHEALTHY",
  "override_active": true,
  "override_expires": "2024-01-20T16:00:00Z",
  "connections_draining": 1523
}

4. Monitoring and Metrics API

Get Real-Time Statistics

GET /api/v1/stats

Response: 200 OK
{
  "timestamp": "2024-01-20T15:00:00Z",
  "cluster": {
    "total_instances": 200,
    "healthy_instances": 200,
    "total_requests_per_second": 1160000,
    "total_active_connections": 10000000,
    "total_bandwidth_gbps": 62.5,
    "avg_cpu_utilization_percent": 45.2,
    "avg_memory_utilization_percent": 68.1
  },
  "traffic": {
    "requests_per_second": 1160000,
    "new_connections_per_second": 100000,
    "bytes_in_per_second": 67108864000,
    "bytes_out_per_second": 331350000000,
    "ssl_handshakes_per_second": 50000
  },
  "errors": {
    "4xx_per_second": 5800,
    "5xx_per_second": 116,
    "connection_errors_per_second": 23,
    "timeout_errors_per_second": 58,
    "rate_limited_per_second": 2900
  },
  "latency": {
    "p50_ms": 0.8,
    "p90_ms": 1.5,
    "p95_ms": 2.1,
    "p99_ms": 4.8,
    "p999_ms": 12.3
  }
}
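The per-second counters are easier to alert on when converted to percentages of total traffic. A small sketch of that conversion follows, assuming the response shape above; the function name and output keys are illustrative.

```python
def error_rates(stats):
    """Express each error counter as a percentage of requests per second."""
    rps = stats["traffic"]["requests_per_second"]
    if rps == 0:
        return {}
    errors = stats["errors"]
    return {
        "4xx_percent": 100 * errors["4xx_per_second"] / rps,
        "5xx_percent": 100 * errors["5xx_per_second"] / rps,
        "rate_limited_percent": 100 * errors["rate_limited_per_second"] / rps,
    }


sample = {
    "traffic": {"requests_per_second": 1160000},
    "errors": {
        "4xx_per_second": 5800,
        "5xx_per_second": 116,
        "rate_limited_per_second": 2900,
    },
}
rates = error_rates(sample)
```

For the sample values above this works out to roughly 0.5% 4xx, 0.01% 5xx, and 0.25% rate limited.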

Get Per-Pool Metrics

GET /api/v1/pools/{pool_id}/metrics?window=5m&resolution=10s

Response: 200 OK
{
  "pool_id": "pool-api-production",
  "window": "5m",
  "resolution": "10s",
  "data_points": [
    {
      "timestamp": "2024-01-20T14:55:00Z",
      "requests_per_second": 45000,
      "active_connections": 500000,
      "error_rate_percent": 0.01,
      "avg_response_time_ms": 12.5,
      "p99_response_time_ms": 45.2,
      "bytes_in_per_second": 90000000,
      "bytes_out_per_second": 225000000,
      "healthy_backends": 195,
      "total_backends": 200
    }
  ],
  "aggregates": {
    "avg_rps": 44500,
    "max_rps": 52000,
    "avg_error_rate": 0.012,
    "avg_latency_ms": 12.8
  }
}

Get Per-Backend Metrics

GET /api/v1/pools/{pool_id}/backends/{server_id}/metrics?window=1h

Response: 200 OK
{
  "server_id": "backend-us-east-1a-api-0042",
  "window": "1h",
  "current": {
    "active_connections": 1523,
    "requests_per_second": 450,
    "bytes_in_per_second": 900000,
    "bytes_out_per_second": 2250000,
    "avg_response_time_ms": 12.5,
    "p99_response_time_ms": 45.0,
    "error_rate_4xx_percent": 0.5,
    "error_rate_5xx_percent": 0.01,
    "connection_errors": 0,
    "timeouts": 2
  },
  "totals": {
    "total_requests": 1620000,
    "total_bytes_in": 3240000000,
    "total_bytes_out": 8100000000,
    "total_errors": 162
  }
}

5. SSL/TLS Certificate Management API

Upload a Certificate

POST /api/v1/certificates

Request:
{
  "name": "api-example-com-2024",
  "domains": ["api.example.com", "*.api.example.com"],
  "certificate_pem": "-----BEGIN CERTIFICATE-----\nMIIE...\n-----END CERTIFICATE-----",
  "private_key_pem": "-----BEGIN EC PRIVATE KEY-----\nMHQC...\n-----END EC PRIVATE KEY-----",
  "chain_pem": "-----BEGIN CERTIFICATE-----\nMIID...\n-----END CERTIFICATE-----",
  "auto_renew": true,
  "notify_before_expiry_days": [30, 14, 7, 1]
}

Response: 201 Created
{
  "cert_id": "cert-api-example-com-2024",
  "domains": ["api.example.com", "*.api.example.com"],
  "type": "WILDCARD",
  "key_type": "ECDSA_P256",
  "issuer": "Let's Encrypt Authority X3",
  "valid_from": "2024-01-01T00:00:00Z",
  "valid_until": "2024-04-01T00:00:00Z",
  "fingerprint_sha256": "AB:CD:EF:12:34:...",
  "status": "ACTIVE",
  "deployed_to_instances": 200,
  "created_at": "2024-01-20T10:00:00Z"
}

List Certificates

GET /api/v1/certificates?expiring_within_days=90

Response: 200 OK
{
  "certificates": [
    {
      "cert_id": "cert-api-example-com-2024",
      "domains": ["api.example.com", "*.api.example.com"],
      "status": "ACTIVE",
      "valid_until": "2024-04-01T00:00:00Z",
      "days_until_expiry": 71,
      "auto_renew": true,
      "in_use_by_pools": ["pool-api-production"]
    }
  ],
  "summary": {
    "total": 25,
    "active": 23,
    "expiring_soon": 2,
    "expired": 0
  }
}
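The `days_until_expiry` field can be reproduced on the client from `valid_until`. The sketch below assumes whole days are counted by truncation and that the listing was generated at 2024-01-20T10:00:00Z (the `created_at` time used elsewhere in these examples); both are assumptions.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse timestamps such as 2024-04-01T00:00:00Z into aware datetimes."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def days_until_expiry(valid_until, now):
    """Whole days remaining before the certificate expires."""
    return (parse_utc(valid_until) - now).days


def expiring_within(certs, days, now):
    """Client-side equivalent of the expiring_within_days filter."""
    return [
        c for c in certs
        if 0 <= days_until_expiry(c["valid_until"], now) <= days
    ]


now = parse_utc("2024-01-20T10:00:00Z")  # assumed time of the request
remaining = days_until_expiry("2024-04-01T00:00:00Z", now)
```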

Rotate a Certificate

POST /api/v1/certificates/{cert_id}/rotate

Request:
{
  "new_certificate_pem": "-----BEGIN CERTIFICATE-----\n...",
  "new_private_key_pem": "-----BEGIN EC PRIVATE KEY-----\n...",
  "new_chain_pem": "-----BEGIN CERTIFICATE-----\n...",
  "rollout_strategy": "ROLLING",
  "rollout_percent_per_minute": 10
}

Response: 202 Accepted
{
  "rotation_id": "rotate-abc123",
  "status": "IN_PROGRESS",
  "old_cert_id": "cert-api-example-com-2024",
  "new_cert_fingerprint": "12:34:56:...",
  "rollout_progress_percent": 0,
  "estimated_completion": "2024-01-20T10:10:00Z"
}
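With a ROLLING strategy at 10 percent per minute, a full rollout takes about ten minutes, which matches the `estimated_completion` above if the rotation started at 10:00:00Z. The helper below sketches that estimate; the assumption is that the rollout proceeds at a constant rate.

```python
import math
from datetime import datetime, timedelta, timezone


def rollout_eta(started_at, percent_per_minute, progress_percent=0):
    """Estimate completion time for a constant-rate rolling rollout."""
    remaining = max(0, 100 - progress_percent)
    minutes = math.ceil(remaining / percent_per_minute)
    return started_at + timedelta(minutes=minutes)


start = datetime(2024, 1, 20, 10, 0, tzinfo=timezone.utc)  # assumed start time
eta = rollout_eta(start, percent_per_minute=10)
```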

6. Routing Rules API

Create a Routing Rule

POST /api/v1/routing-rules

Request:
{
  "name": "API v2 Canary Route",
  "priority": 100,
  "match": {
    "hosts": ["api.example.com"],
    "paths": ["/v2/*"],
    "methods": ["GET", "POST", "PUT", "DELETE"],
    "headers": {
      "X-Canary": ["true"]
    }
  },
  "action": {
    "type": "WEIGHTED_ROUTE",
    "targets": [
      {"pool_id": "pool-api-v2-canary", "weight": 10},
      {"pool_id": "pool-api-v2-stable", "weight": 90}
    ]
  },
  "transforms": {
    "add_request_headers": {
      "X-Forwarded-Proto": "https",
      "X-Request-ID": "${uuid}",
      "X-Real-IP": "${client_ip}"
    },
    "remove_response_headers": ["Server", "X-Powered-By"],
    "url_rewrite": "/v2/(.*) -> /api/$1"
  },
  "enabled": true
}

Response: 201 Created
{
  "rule_id": "rule-api-v2-canary",
  "name": "API v2 Canary Route",
  "priority": 100,
  "status": "ACTIVE",
  "created_at": "2024-01-20T10:00:00Z",
  "version": 1
}
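To make the rule semantics concrete, the sketch below evaluates a request against the `match` block, picks a target pool in proportion to the weights, and applies the `url_rewrite` expression. It assumes glob-style path patterns, exact-match header values with case-sensitive names, and a rewrite syntax of `pattern -> replacement` with `$1`-style groups. None of these details are confirmed by the API description.

```python
import random
import re
from fnmatch import fnmatchcase


def rule_matches(match, request):
    """True when host, path, method, and every required header match."""
    if request["host"] not in match["hosts"]:
        return False
    if not any(fnmatchcase(request["path"], p) for p in match["paths"]):
        return False
    if request["method"] not in match["methods"]:
        return False
    headers = request.get("headers", {})
    return all(
        headers.get(name) in allowed
        for name, allowed in match.get("headers", {}).items()
    )


def pick_target(targets, rng=random):
    """Choose a pool with probability proportional to its weight."""
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]["pool_id"]


def rewrite_path(spec, path):
    """Apply a 'pattern -> replacement' rewrite, translating $N group references."""
    pattern, replacement = [part.strip() for part in spec.split("->")]
    replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
    return re.sub("^" + pattern + "$", replacement, path)
```

With weights of 10 and 90, roughly one request in ten that matches the rule would be sent to the canary pool.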

List Routing Rules (Ordered by Priority)

GET /api/v1/routing-rules?pool_id=pool-api-production

Response: 200 OK
{
  "rules": [
    {
      "rule_id": "rule-api-v2-canary",
      "name": "API v2 Canary Route",
      "priority": 100,
      "match_summary": "Host: api.example.com, Path: /v2/*, Header: X-Canary=true",
      "action_summary": "Weighted: canary(10%) + stable(90%)",
      "enabled": true,
      "hit_count_last_hour": 45000,
      "version": 3
    }
  ],
  "total_rules": 150
}

Test a Routing Rule (Dry Run)

POST /api/v1/routing-rules/test

Request:
{
  "method": "GET",
  "host": "api.example.com",
  "path": "/v2/users/123",
  "headers": {
    "X-Canary": "true",
    "Authorization": "Bearer token123"
  },
  "source_ip": "203.0.113.50"
}

Response: 200 OK
{
  "matched_rule": {
    "rule_id": "rule-api-v2-canary",
    "name": "API v2 Canary Route",
    "priority": 100
  },
  "selected_pool": "pool-api-v2-canary",
  "selected_backend": "backend-us-east-1a-api-0042",
  "transforms_applied": {
    "headers_added": ["X-Forwarded-Proto", "X-Request-ID", "X-Real-IP"],
    "headers_removed": [],
    "url_rewritten": "/v2/users/123 -> /api/users/123"
  },
  "rate_limit_status": "ALLOWED",
  "acl_status": "ALLOWED"
}

7. Rate Limiting Configuration API

Create a Rate Limit Rule

POST /api/v1/rate-limits

Request:
{
  "name": "Per-IP API Rate Limit",
  "scope": "PER_SOURCE_IP",
  "match": {
    "pools": ["pool-api-production"],
    "paths": ["/api/*"]
  },
  "limits": {
    "requests_per_second": 100,
    "burst_size": 200,
    "requests_per_minute": 3000,
    "requests_per_hour": 100000
  },
  "action_on_limit": {
    "status_code": 429,
    "headers": {
      "Retry-After": "${retry_after_seconds}",
      "X-RateLimit-Limit": "${limit}",
      "X-RateLimit-Remaining": "${remaining}",
      "X-RateLimit-Reset": "${reset_timestamp}"
    },
    "body": "{\"error\": \"rate_limit_exceeded\", \"retry_after\": ${retry_after_seconds}}"
  },
  "exemptions": {
    "source_ips": ["10.0.0.0/8"],
    "headers": {"X-Internal-Service": ["true"]},
    "api_keys": ["key-internal-service-*"]
  },
  "enabled": true
}

Response: 201 Created
{
  "rate_limit_id": "rl-per-ip-api",
  "name": "Per-IP API Rate Limit",
  "status": "ACTIVE",
  "created_at": "2024-01-20T10:00:00Z"
}
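The `requests_per_second` and `burst_size` pair reads like a token bucket: tokens refill at 100 per second up to a cap of 200, and each request spends one. The sketch below assumes that model; the per-minute and per-hour limits would need separate counters and are not covered. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Token bucket limiter: refill at a fixed rate, cap at the burst size."""

    def __init__(self, rate_per_second, burst_size, now=0.0):
        self.rate = rate_per_second
        self.capacity = burst_size
        self.tokens = float(burst_size)  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now):
        """Return True and spend a token if the request is within the limit."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=100, burst_size=200)
```

In a per-IP rule, one such bucket would be kept per source address, and a rejected request would receive the configured 429 response.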

Get Rate Limit Status for a Client

GET /api/v1/rate-limits/status?source_ip=203.0.113.50&pool_id=pool-api-production

Response: 200 OK
{
  "source_ip": "203.0.113.50",
  "limits": [
    {
      "rule_id": "rl-per-ip-api",
      "name": "Per-IP API Rate Limit",
      "current_rate_rps": 45,
      "limit_rps": 100,
      "remaining_burst": 155,
      "requests_this_minute": 1200,
      "limit_per_minute": 3000,
      "requests_this_hour": 25000,
      "limit_per_hour": 100000,
      "is_limited": false,
      "next_reset": "2024-01-20T15:01:00Z"
    }
  ]
}

Get Rate Limiting Metrics

GET /api/v1/rate-limits/metrics?window=1h

Response: 200 OK
{
  "window": "1h",
  "total_requests_evaluated": 4176000,
  "total_requests_limited": 12500,
  "limit_rate_percent": 0.3,
  "top_limited_ips": [
    {"ip": "203.0.113.100", "limited_count": 5000, "total_requests": 50000},
    {"ip": "198.51.100.50", "limited_count": 3000, "total_requests": 35000}
  ],
  "rules_triggered": [
    {"rule_id": "rl-per-ip-api", "triggered_count": 10000},
    {"rule_id": "rl-global-burst", "triggered_count": 2500}
  ]
}

8. Access Control (ACL) API

Create an ACL Rule

POST /api/v1/acls

Request:
{
  "name": "Block Known Bad Actors",
  "priority": 1,
  "action": "DENY",
  "match": {
    "source_ips": ["192.0.2.0/24", "198.51.100.0/24"],
    "source_countries": ["XX", "YY"],
    "user_agents": ["BadBot/*", "Scrapy/*"]
  },
  "response": {
    "status_code": 403,
    "body": "{\"error\": \"forbidden\"}"
  },
  "logging": true,
  "expires_at": "2024-06-01T00:00:00Z"
}

Response: 201 Created
{
  "acl_id": "acl-block-bad-actors",
  "name": "Block Known Bad Actors",
  "status": "ACTIVE",
  "matched_entries": 512,
  "created_at": "2024-01-20T10:00:00Z"
}
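A minimal sketch of how such a rule could be evaluated is shown below. It assumes the match conditions are combined with OR (any one condition triggers the rule) and that user-agent patterns are globs; the API description does not state either. The address count for the two /24 ranges happens to equal the `matched_entries` value above, though whether that field is computed this way is not stated.

```python
import ipaddress
from fnmatch import fnmatchcase


def acl_matches(match, source_ip, user_agent="", country=None):
    """True if the client hits any condition in the ACL match block (assumed OR)."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in ipaddress.ip_network(cidr) for cidr in match.get("source_ips", [])):
        return True
    if country is not None and country in match.get("source_countries", []):
        return True
    return any(fnmatchcase(user_agent, pat) for pat in match.get("user_agents", []))


def address_count(match):
    """Total number of addresses covered by the rule's CIDR ranges."""
    return sum(
        ipaddress.ip_network(cidr).num_addresses
        for cidr in match.get("source_ips", [])
    )


rule_match = {
    "source_ips": ["192.0.2.0/24", "198.51.100.0/24"],
    "source_countries": ["XX", "YY"],
    "user_agents": ["BadBot/*", "Scrapy/*"],
}
```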

9. Cluster Management API

Get Cluster Status

GET /api/v1/cluster/status

Response: 200 OK
{
  "cluster_id": "lb-cluster-us-east-1",
  "status": "HEALTHY",
  "instances": {
    "total": 200,
    "healthy": 200,
    "draining": 0,
    "unhealthy": 0
  },
  "capacity": {
    "current_rps": 1160000,
    "max_rps": 3000000,
    "utilization_percent": 38.7,
    "headroom_rps": 1840000
  },
  "auto_scaling": {
    "enabled": true,
    "min_instances": 130,
    "max_instances": 400,
    "target_cpu_percent": 60,
    "cooldown_seconds": 300
  },
  "version": "lb-v4.2.1",
  "last_config_update": "2024-01-20T14:00:00Z",
  "config_version": 1042
}
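The capacity block is derived from two inputs, current and maximum requests per second. The helper below reproduces the utilization and headroom figures, assuming utilization is rounded to one decimal place.

```python
def capacity_summary(current_rps, max_rps):
    """Derive utilization and headroom from current and maximum throughput."""
    return {
        "utilization_percent": round(100 * current_rps / max_rps, 1),
        "headroom_rps": max_rps - current_rps,
    }


summary = capacity_summary(1160000, 3000000)
```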

Trigger Manual Scale

POST /api/v1/cluster/scale

Request:
{
  "target_instances": 300,
  "reason": "Preparing for product launch traffic spike",
  "duration_hours": 4
}

Response: 202 Accepted
{
  "scale_operation_id": "scale-abc123",
  "current_instances": 200,
  "target_instances": 300,
  "status": "SCALING_UP",
  "estimated_completion": "2024-01-20T15:05:00Z",
  "auto_revert_at": "2024-01-20T19:00:00Z"
}

Error Responses

Standard Error Format

{
  "error": {
    "code": "BACKEND_NOT_FOUND",
    "message": "Backend server 'backend-xyz' not found in pool 'pool-api-production'",
    "details": {
      "pool_id": "pool-api-production",
      "server_id": "backend-xyz"
    },
    "request_id": "req-abc123-def456",
    "timestamp": "2024-01-20T15:00:00Z",
    "documentation_url": "https://docs.example.com/lb/errors#BACKEND_NOT_FOUND"
  }
}

Common Error Codes

400 Bad Request:            INVALID_PARAMETER, VALIDATION_ERROR
401 Unauthorized:           AUTHENTICATION_REQUIRED, TOKEN_EXPIRED
403 Forbidden:              INSUFFICIENT_PERMISSIONS, IP_NOT_ALLOWED
404 Not Found:              BACKEND_NOT_FOUND, POOL_NOT_FOUND, RULE_NOT_FOUND
409 Conflict:               BACKEND_HAS_CONNECTIONS, DUPLICATE_RULE, VERSION_CONFLICT
422 Unprocessable Entity:   INVALID_CERTIFICATE, HEALTH_CHECK_FAILED
429 Too Many Requests:      API_RATE_LIMITED
500 Internal Server Error:  INTERNAL_ERROR
503 Service Unavailable:    CLUSTER_OVERLOADED, MAINTENANCE_MODE
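On the client side, the standard error format maps naturally onto a single exception type. The sketch below is one possible shape. Treating only 429 and 503 as retryable is an assumption made for this example, and the class and function names are illustrative.

```python
class LoadBalancerApiError(Exception):
    """Raised for a non-2xx response that carries the standard error body."""

    # Assumption: only throttling and temporary unavailability are retried.
    RETRYABLE_STATUSES = frozenset({429, 503})

    def __init__(self, status, code, message, request_id=None, details=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.request_id = request_id
        self.details = details or {}

    @property
    def retryable(self):
        return self.status in self.RETRYABLE_STATUSES


def error_from_response(status, body):
    """Build an exception from an HTTP status and a parsed error body."""
    err = body.get("error", {})
    return LoadBalancerApiError(
        status,
        err.get("code", "UNKNOWN"),
        err.get("message", ""),
        err.get("request_id"),
        err.get("details"),
    )
```

Keeping `request_id` on the exception makes it easy to include in logs or support requests.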

Webhook Notifications

Configure Webhooks

POST /api/v1/webhooks

Request:
{
  "name": "PagerDuty Alerts",
  "url": "https://events.pagerduty.com/v2/enqueue",
  "events": [
    "backend.unhealthy",
    "backend.recovered",
    "pool.degraded",
    "certificate.expiring",
    "rate_limit.triggered",
    "cluster.scaling"
  ],
  "headers": {
    "Authorization": "Token token=pd-routing-key-123"
  },
  "retry_policy": {
    "max_retries": 3,
    "backoff_seconds": [5, 30, 120]
  }
}

Response: 201 Created
{
  "webhook_id": "wh-pagerduty-alerts",
  "status": "ACTIVE",
  "created_at": "2024-01-20T10:00:00Z"
}
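The `retry_policy` above implies one initial delivery attempt followed by up to three retries, waiting 5, 30, and 120 seconds in turn. The sketch below illustrates that schedule. The `send(payload)` callable is hypothetical and returns True on success; `sleep` is injectable so the schedule can be checked without waiting.

```python
import time


def deliver(send, payload, backoff_seconds=(5, 30, 120), max_retries=3,
            sleep=time.sleep):
    """Try once, then retry after each backoff delay.

    Returns the number of attempts made; raises if every attempt fails.
    """
    delays = list(backoff_seconds)[:max_retries]
    for attempt in range(len(delays) + 1):
        if send(payload):
            return attempt + 1
        if attempt < len(delays):
            sleep(delays[attempt])
    raise RuntimeError("webhook delivery failed after all retries")
```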

Webhook Event Payload Example

{
  "event_id": "evt-abc123",
  "event_type": "backend.unhealthy",
  "timestamp": "2024-01-20T15:00:00Z",
  "severity": "WARNING",
  "data": {
    "server_id": "backend-us-east-1a-api-0042",
    "pool_id": "pool-api-production",
    "previous_status": "HEALTHY",
    "current_status": "UNHEALTHY",
    "reason": "3 consecutive health check failures",
    "last_error": "Connection refused",
    "active_connections_at_failure": 1523
  }
}

This API design covers backend, pool, health check, certificate, routing, rate limiting, ACL, and cluster management with consistent REST semantics, request/response examples for each endpoint, and a uniform error format. Separating the control plane (management API) from the data plane (traffic forwarding) is intended to keep administrative operations from degrading request processing performance.