Distributed Task Scheduler - Scale and Constraints
Traffic Scale Analysis
Task Volume
- Total Scheduled Tasks: 100 million active tasks
- New Tasks per Day: 1 million new schedules
- Task Executions per Day: 10 million executions
- Peak Executions: 10,000 tasks per second
- Average Executions: 3,000 tasks per second
- Concurrent Running Tasks: 100,000 at peak
Task Distribution
- One-Time Tasks: 30% (30M tasks)
- Recurring Tasks: 50% (50M tasks)
- Cron Tasks: 20% (20M tasks)
- High Priority: 10% (10M tasks)
- Medium Priority: 60% (60M tasks)
- Low Priority: 30% (30M tasks)
Storage Requirements
Task Metadata Storage
Per Task:
- Task ID: 16 bytes (UUID)
- Schedule: 100 bytes (cron expression, timezone)
- Payload: 1 KB average (task data)
- Metadata: 200 bytes (status, timestamps, retries)
- Total per task: ~1.3 KB
Total Tasks: 100M × 1.3 KB = 130 GB
With indexes (2x): 260 GB
With replication (3x): 780 GBExecution History Storage
Per Execution:
- Execution ID: 16 bytes
- Task ID: 16 bytes
- Start/End time: 16 bytes
- Status: 1 byte
- Duration: 8 bytes
- Logs: 10 KB average
- Total per execution: ~10 KB
Daily Executions: 10M × 10 KB = 100 GB/day
Monthly: 3 TB
With compression (10:1): 300 GB/month
Retention (90 days): 900 GBWorker State Storage
Per Worker:
- Worker ID: 16 bytes
- Heartbeat: 8 bytes (timestamp)
- Capacity: 4 bytes
- Current tasks: 100 bytes (list of task IDs)
- Total per worker: ~128 bytes
Total Workers: 10,000 × 128 bytes = 1.28 MB
Negligible storageNetwork Bandwidth
Inbound Traffic (Task Scheduling)
Task Scheduling:
- 1M tasks/day ÷ 86400s = 12 tasks/s average
- Peak: 120 tasks/s (10x average)
- Payload: 1.3 KB per task
- Total: 120 × 1.3 KB = 156 KB/s peakOutbound Traffic (Task Execution)
Task Distribution to Workers:
- 10K tasks/s × 1.3 KB = 13 MB/s
- Execution results: 10K tasks/s × 10 KB = 100 MB/s
- Total outbound: ~113 MB/s peakInternal Traffic (Coordination)
Worker Heartbeats:
- 10K workers × 100 bytes × 1/second = 1 MB/s
Task Status Updates:
- 10K tasks/s × 200 bytes = 2 MB/s
Leader Election:
- Negligible (<1 MB/s)
Total Internal: ~3 MB/sCompute Requirements
Scheduler Service
Per Scheduler Instance:
- Task scanning: 2 CPU cores
- Task matching: 2 CPU cores
- Task distribution: 1 CPU core
- Coordination: 1 CPU core
- Total: 6 CPU cores per instance
Cluster: 10 instances × 6 cores = 60 CPU coresWorker Nodes
Per Worker:
- Task execution: 4 CPU cores
- Communication: 0.5 CPU cores
- Monitoring: 0.5 CPU cores
- Total: 5 CPU cores per worker
Cluster: 10,000 workers × 5 cores = 50,000 CPU coresCoordinator Service
Per Coordinator:
- Leader election: 1 CPU core
- State management: 2 CPU cores
- Health monitoring: 1 CPU core
- Total: 4 CPU cores
Cluster: 3 coordinators × 4 cores = 12 CPU coresMemory Requirements
Scheduler Service
Per Instance:
- Task queue: 10 GB (upcoming tasks)
- Task cache: 5 GB (hot tasks)
- Worker registry: 100 MB
- Application: 2 GB
- Total: ~17 GB per instance
Cluster: 10 instances × 17 GB = 170 GBWorker Nodes
Per Worker:
- Task execution: 2 GB
- Task queue: 500 MB
- Application: 500 MB
- Total: ~3 GB per worker
Cluster: 10,000 workers × 3 GB = 30 TBDatabase
Task Database:
- Active data: 260 GB
- Indexes: 260 GB
- Buffer pool: 100 GB
- Total: ~620 GB
Execution History:
- Recent data: 900 GB
- Indexes: 300 GB
- Total: ~1.2 TBLatency Constraints
Scheduling Latency Breakdown
Client → API Gateway: 10ms
API Gateway → Scheduler: 5ms
Scheduler Processing: 20ms
Database Write: 30ms
Response: 10ms
Total: 75ms (P50), 150ms (P99)Execution Latency Breakdown
Scheduled Time Reached: 0ms
Scheduler Detects: 100ms (scan interval)
Task Queued: 10ms
Worker Picks Up: 50ms (poll interval)
Task Starts: 100ms (worker processing)
Total: 260ms (P50), 1000ms (P99)Query Latency
Task Status Query:
- Cache hit: 5ms
- Database query: 50ms
- Total: 5-50ms
Task History Query:
- Database query: 100ms
- Aggregation: 50ms
- Total: 150msTiming Precision
Clock Synchronization
NTP Synchronization:
- Sync interval: 60 seconds
- Accuracy: ±10ms typical
- Max drift: ±100ms before resync
Clock Skew Handling:
- Monitor skew across servers
- Alert if skew >1 second
- Reject tasks if skew >5 secondsScheduling Precision
Precision Tiers:
High Precision (±100ms):
- 10% of tasks
- Scan every 100ms
- Higher resource usage
Normal Precision (±1s):
- 80% of tasks
- Scan every 1 second
- Standard resource usage
Low Precision (±10s):
- 10% of tasks
- Scan every 10 seconds
- Lower resource usageFailure Scenarios
Scheduler Failures
Single Scheduler Failure:
- Impact: 10% capacity loss
- Recovery: <10 seconds (failover)
- Data loss: 0 (replicated state)
All Schedulers Failure:
- Impact: No new task executions
- Recovery: <30 seconds (restart)
- Data loss: 0 (persistent storage)Worker Failures
Single Worker Failure:
- Impact: 0.01% capacity loss
- Recovery: Immediate (task reassignment)
- Data loss: In-flight tasks (retry)
Mass Worker Failure (10%):
- Impact: 10% capacity loss
- Recovery: <5 minutes (auto-scaling)
- Data loss: In-flight tasks (retry)Database Failures
Primary Database Failure:
- Impact: Read-only mode
- Recovery: <30 seconds (replica promotion)
- Data loss: <1 second of writes
Complete Database Failure:
- Impact: System halt
- Recovery: <15 minutes (restore from backup)
- Data loss: <1 minute (WAL replay)Cost Analysis
Infrastructure Costs (Monthly)
Schedulers:
- 10 instances × $200 = $2,000
Workers:
- 10,000 instances × $100 = $1,000,000
Coordinators:
- 3 instances × $200 = $600
Database:
- 2 TB storage × $0.10/GB = $200
- Compute: $5,000
Load Balancers:
- 5 × $50 = $250
Total Infrastructure: ~$1,008,000/monthOperational Costs (Monthly)
Engineering: $50,000
Monitoring: $5,000
Support: $10,000
Total Operational: $65,000
Total Monthly Cost: $1,073,000Cost Per Task
Monthly Executions: 10M × 30 = 300M tasks
Cost per task: $1,073,000 ÷ 300M = $0.0036
Cost per 1000 tasks: $3.60Scaling Strategies
Horizontal Scaling
Add Schedulers:
- Linear scaling up to 100 instances
- Partition tasks by hash
- No coordination overhead
Add Workers:
- Linear scaling up to 100,000 workers
- Auto-scaling based on queue depth
- Elastic capacity
Add Database Shards:
- Shard by task ID
- 10 shards → 100 shards
- Linear scalingVertical Scaling
Larger Schedulers:
- Up to 32 cores, 128 GB RAM
- 10x capacity per instance
- Reduce cluster size
Larger Workers:
- Up to 64 cores, 256 GB RAM
- Run more concurrent tasks
- Better resource utilizationBottleneck Analysis
Primary Bottlenecks
1. Database Write Throughput:
- 10K writes/second limit
- Mitigation: Batch writes, sharding
2. Task Scanning:
- O(n) scan of upcoming tasks
- Mitigation: Time-based indexing, partitioning
3. Worker Communication:
- Network latency to workers
- Mitigation: Regional deployment, batching
4. Clock Synchronization:
- Drift causes timing issues
- Mitigation: NTP, monitoring, tolerance
5. Coordination Overhead:
- Leader election, state sync
- Mitigation: Raft, efficient protocolsThis scale analysis provides the foundation for designing a task scheduler that handles millions of tasks while maintaining precise timing and reliability.