Container Orchestration System - Scale and Constraints
Cluster Scale
Cluster Size Limits
Nodes per Cluster: 5,000 nodes
Pods per Cluster: 150,000 pods
Containers per Cluster: 300,000 containers
Pods per Node: 110 pods (default limit)
Services per Cluster: 10,000 services
Namespaces per Cluster: 10,000 namespacesResource Capacity
Per Node:
- CPU: 64 cores
- Memory: 256 GB
- Disk: 1 TB SSD
- Network: 10 Gbps
Total Cluster:
- CPU: 320,000 cores
- Memory: 1.28 PB
- Disk: 5 PB
- Network: 50 TbpsAPI Server Load
API Request Volume
API Requests per Second: 10,000 average, 50,000 peak
Request Types:
- List operations: 40% (4K req/s)
- Watch operations: 30% (3K req/s)
- Create/Update: 20% (2K req/s)
- Delete: 10% (1K req/s)
Response Times:
- List: <100ms
- Get: <10ms
- Create: <50ms
- Update: <50ms
- Delete: <50msetcd Load
Writes per Second: 1,000 writes/second
Reads per Second: 10,000 reads/second
Watch Streams: 10,000 active watches
Database Size: 8 GB (default limit)
Key Count: 1 million keysScheduling Scale
Scheduling Throughput
Pods to Schedule: 1,000 pods/second during scale-up
Scheduling Latency: <1 second per pod
Node Evaluation: 5,000 nodes per scheduling decision
Scheduling Decisions: 1,000 decisions/second
Scheduling Algorithm Complexity:
- Filter nodes: O(n) where n = 5,000 nodes
- Score nodes: O(n log n)
- Select best: O(1)
Total: O(n log n) per podRescheduling Load
Failed Pods: 100 pods/minute (0.1% failure rate)
Node Failures: 5 nodes/hour (0.1% failure rate)
Pods to Reschedule: 550 pods/hour (110 pods/node × 5 nodes)
Evictions: 50 pods/minute (resource pressure)Network Scale
Service Discovery
Service Lookups: 100,000 lookups/second
DNS Queries: 50,000 queries/second
Service Endpoints: 10,000 services × 10 pods = 100K endpoints
Endpoint Updates: 1,000 updates/secondNetwork Traffic
Pod-to-Pod Traffic:
- 300K containers
- 10% active communication
- 30K active connections
- 1 Gbps per connection
- Total: 30 Tbps internal traffic
Ingress Traffic:
- 10K services exposed
- 100K requests/second
- 10 MB average response
- Total: 1 TB/second = 8 Tbps
Egress Traffic:
- External API calls
- Database connections
- Similar to ingress: 8 TbpsStorage Scale
Persistent Volumes
Total Volumes: 50,000 persistent volumes
Volume Sizes: 100 GB average
Total Storage: 5 PB
Volume Types:
- SSD (fast): 20% = 1 PB
- HDD (standard): 80% = 4 PB
Volume Operations:
- Create: 100 volumes/minute
- Delete: 50 volumes/minute
- Attach: 200 operations/minute
- Detach: 200 operations/minuteContainer Images
Unique Images: 10,000 images
Average Image Size: 500 MB
Total Image Storage: 5 TB
Image Pulls: 10,000 pulls/minute during scale-up
Image Registry Bandwidth: 5 GB/s = 40 GbpsControl Plane Scale
etcd Cluster
Nodes: 5 nodes (HA)
Database Size: 8 GB
Keys: 1 million keys
Writes: 1,000 writes/second
Reads: 10,000 reads/second
Watch Streams: 10,000 active watches
Performance:
- Write latency: <10ms
- Read latency: <1ms
- Consensus latency: <50ms (Raft)API Server
Instances: 10 API servers (HA)
Requests: 10,000 req/second total = 1,000 req/s per instance
Connections: 50,000 concurrent connections
Memory: 32 GB per instance
CPU: 16 cores per instanceController Manager
Controllers: 50+ controllers
Reconciliation Loops: 100 loops/second per controller
Total Reconciliations: 5,000 reconciliations/second
CPU: 8 cores
Memory: 16 GBScheduler
Instances: 5 schedulers (HA, leader election)
Scheduling Rate: 1,000 pods/second
Node Evaluations: 5,000 nodes per pod
CPU: 16 cores
Memory: 32 GBData Plane Scale
Kubelet (per node)
Pods per Node: 110 pods
Containers per Node: 220 containers
CPU: 2 cores reserved for kubelet
Memory: 4 GB reserved
API Calls: 10 calls/second to API serverContainer Runtime
Runtime: containerd / CRI-O
Container Starts: 10 containers/second per node
Container Stops: 5 containers/second per node
Image Pulls: 1 image/minute per node
CPU Overhead: 1 core
Memory Overhead: 2 GBNetwork Plugin (CNI)
Plugin: Calico / Cilium / Flannel
IP Addresses: 110 IPs per node
Network Policies: 1,000 policies per cluster
Throughput: 10 Gbps per node
Latency Overhead: <1msCost Estimates
Infrastructure Costs (Monthly)
Compute (5,000 nodes):
- Node cost: $500/month per node
- Total: $2.5M/month
Control Plane:
- API servers: $50K/month
- etcd cluster: $20K/month
- Controllers: $30K/month
Total Control Plane: $100K/month
Network:
- Load balancers: $50K/month
- Bandwidth: $100K/month
Total Network: $150K/month
Storage (5 PB):
- SSD (1 PB): $100K/month
- HDD (4 PB): $40K/month
Total Storage: $140K/month
Grand Total: $2.89M/month = $34.7M/yearCost per Pod
Infrastructure: $34.7M / 150K pods / 12 months = $19.26 per pod per month
Savings vs VMs: 60% (container density)
ROI: 3x through automation and efficiencyPerformance Bottlenecks
Critical Bottlenecks
- etcd Write Throughput: 1,000 writes/second limit
- API Server: 10,000 requests/second per instance
- Scheduler: 1,000 pods/second scheduling rate
- Network: 10 Gbps per node bandwidth
- Storage I/O: 100K IOPS per node
Mitigation Strategies
etcd:
- Limit watch streams
- Aggregate updates
- Pagination for large lists
- Separate etcd for events
API Server:
- Horizontal scaling (10+ instances)
- Caching layer
- Request prioritization
- Rate limiting
Scheduler:
- Parallel scheduling
- Pre-filtering nodes
- Caching node info
- Batch scheduling
Network:
- Network policies optimization
- Service mesh for advanced routing
- eBPF for performance
Storage:
- Local SSD for performance
- Distributed storage for durability
- Volume snapshots for backupThis scale analysis demonstrates the massive infrastructure required to orchestrate hundreds of thousands of containers across thousands of nodes while maintaining performance and reliability.