Scale & Constraints

📖 7 min read 📄 Part 2 of 10

Distributed Task Scheduler - Scale and Constraints

Traffic Scale Analysis

Task Volume

  • Total Scheduled Tasks: 100 million active tasks
  • New Tasks per Day: 1 million new schedules
  • Task Executions per Day: 10 million executions
  • Peak Executions: 10,000 tasks per second
  • Average Executions: 3,000 tasks per second
  • Concurrent Running Tasks: 100,000 at peak

Task Distribution

  • One-Time Tasks: 30% (30M tasks)
  • Recurring Tasks: 50% (50M tasks)
  • Cron Tasks: 20% (20M tasks)
  • High Priority: 10% (10M tasks)
  • Medium Priority: 60% (60M tasks)
  • Low Priority: 30% (30M tasks)

Storage Requirements

Task Metadata Storage

Per Task:
- Task ID: 16 bytes (UUID)
- Schedule: 100 bytes (cron expression, timezone)
- Payload: 1 KB average (task data)
- Metadata: 200 bytes (status, timestamps, retries)
- Total per task: ~1.3 KB

Total Tasks: 100M × 1.3 KB = 130 GB

With indexes (2x): 260 GB
With replication (3x): 780 GB

Execution History Storage

Per Execution:
- Execution ID: 16 bytes
- Task ID: 16 bytes
- Start/End time: 16 bytes
- Status: 1 byte
- Duration: 8 bytes
- Logs: 10 KB average
- Total per execution: ~10 KB

Daily Executions: 10M × 10 KB = 100 GB/day
Monthly: 3 TB
With compression (10:1): 300 GB/month
Retention (90 days): 900 GB

Worker State Storage

Per Worker:
- Worker ID: 16 bytes
- Heartbeat: 8 bytes (timestamp)
- Capacity: 4 bytes
- Current tasks: 100 bytes (list of task IDs)
- Total per worker: ~128 bytes

Total Workers: 10,000 × 128 bytes = 1.28 MB
Negligible storage

Network Bandwidth

Inbound Traffic (Task Scheduling)

Task Scheduling:
- 1M tasks/day ÷ 86400s = 12 tasks/s average
- Peak: 120 tasks/s (10x average)
- Payload: 1.3 KB per task
- Total: 120 × 1.3 KB = 156 KB/s peak

Outbound Traffic (Task Execution)

Task Distribution to Workers:
- 10K tasks/s × 1.3 KB = 13 MB/s
- Execution results: 10K tasks/s × 10 KB = 100 MB/s
- Total outbound: ~113 MB/s peak

Internal Traffic (Coordination)

Worker Heartbeats:
- 10K workers × 100 bytes × 1/second = 1 MB/s

Task Status Updates:
- 10K tasks/s × 200 bytes = 2 MB/s

Leader Election:
- Negligible (<1 MB/s)

Total Internal: ~3 MB/s

Compute Requirements

Scheduler Service

Per Scheduler Instance:
- Task scanning: 2 CPU cores
- Task matching: 2 CPU cores
- Task distribution: 1 CPU core
- Coordination: 1 CPU core
- Total: 6 CPU cores per instance

Cluster: 10 instances × 6 cores = 60 CPU cores

Worker Nodes

Per Worker:
- Task execution: 4 CPU cores
- Communication: 0.5 CPU cores
- Monitoring: 0.5 CPU cores
- Total: 5 CPU cores per worker

Cluster: 10,000 workers × 5 cores = 50,000 CPU cores

Coordinator Service

Per Coordinator:
- Leader election: 1 CPU core
- State management: 2 CPU cores
- Health monitoring: 1 CPU core
- Total: 4 CPU cores

Cluster: 3 coordinators × 4 cores = 12 CPU cores

Memory Requirements

Scheduler Service

Per Instance:
- Task queue: 10 GB (upcoming tasks)
- Task cache: 5 GB (hot tasks)
- Worker registry: 100 MB
- Application: 2 GB
- Total: ~17 GB per instance

Cluster: 10 instances × 17 GB = 170 GB

Worker Nodes

Per Worker:
- Task execution: 2 GB
- Task queue: 500 MB
- Application: 500 MB
- Total: ~3 GB per worker

Cluster: 10,000 workers × 3 GB = 30 TB

Database

Task Database:
- Active data: 260 GB
- Indexes: 260 GB
- Buffer pool: 100 GB
- Total: ~620 GB

Execution History:
- Recent data: 900 GB
- Indexes: 300 GB
- Total: ~1.2 TB

Latency Constraints

Scheduling Latency Breakdown

Client → API Gateway: 10ms
API Gateway → Scheduler: 5ms
Scheduler Processing: 20ms
Database Write: 30ms
Response: 10ms
Total: 75ms (P50), 150ms (P99)

Execution Latency Breakdown

Scheduled Time Reached: 0ms
Scheduler Detects: 100ms (scan interval)
Task Queued: 10ms
Worker Picks Up: 50ms (poll interval)
Task Starts: 100ms (worker processing)
Total: 260ms (P50), 1000ms (P99)

Query Latency

Task Status Query:
- Cache hit: 5ms
- Database query: 50ms
- Total: 5-50ms

Task History Query:
- Database query: 100ms
- Aggregation: 50ms
- Total: 150ms

Timing Precision

Clock Synchronization

NTP Synchronization:
- Sync interval: 60 seconds
- Accuracy: ±10ms typical
- Max drift: ±100ms before resync

Clock Skew Handling:
- Monitor skew across servers
- Alert if skew >1 second
- Reject tasks if skew >5 seconds

Scheduling Precision

Precision Tiers:

High Precision (±100ms):
- 10% of tasks
- Scan every 100ms
- Higher resource usage

Normal Precision (±1s):
- 80% of tasks
- Scan every 1 second
- Standard resource usage

Low Precision (±10s):
- 10% of tasks
- Scan every 10 seconds
- Lower resource usage

Failure Scenarios

Scheduler Failures

Single Scheduler Failure:
- Impact: 10% capacity loss
- Recovery: <10 seconds (failover)
- Data loss: 0 (replicated state)

All Schedulers Failure:
- Impact: No new task executions
- Recovery: <30 seconds (restart)
- Data loss: 0 (persistent storage)

Worker Failures

Single Worker Failure:
- Impact: 0.01% capacity loss
- Recovery: Immediate (task reassignment)
- Data loss: In-flight tasks (retry)

Mass Worker Failure (10%):
- Impact: 10% capacity loss
- Recovery: <5 minutes (auto-scaling)
- Data loss: In-flight tasks (retry)

Database Failures

Primary Database Failure:
- Impact: Read-only mode
- Recovery: <30 seconds (replica promotion)
- Data loss: <1 second of writes

Complete Database Failure:
- Impact: System halt
- Recovery: <15 minutes (restore from backup)
- Data loss: <1 minute (WAL replay)

Cost Analysis

Infrastructure Costs (Monthly)

Schedulers:
- 10 instances × $200 = $2,000

Workers:
- 10,000 instances × $100 = $1,000,000

Coordinators:
- 3 instances × $200 = $600

Database:
- 2 TB storage × $0.10/GB = $200
- Compute: $5,000

Load Balancers:
- 5 × $50 = $250

Total Infrastructure: ~$1,008,000/month

Operational Costs (Monthly)

Engineering: $50,000
Monitoring: $5,000
Support: $10,000
Total Operational: $65,000

Total Monthly Cost: $1,073,000

Cost Per Task

Monthly Executions: 10M × 30 = 300M tasks
Cost per task: $1,073,000 ÷ 300M = $0.0036
Cost per 1000 tasks: $3.60

Scaling Strategies

Horizontal Scaling

Add Schedulers:
- Linear scaling up to 100 instances
- Partition tasks by hash
- No coordination overhead

Add Workers:
- Linear scaling up to 100,000 workers
- Auto-scaling based on queue depth
- Elastic capacity

Add Database Shards:
- Shard by task ID
- 10 shards 100 shards
- Linear scaling

Vertical Scaling

Larger Schedulers:
- Up to 32 cores, 128 GB RAM
- 10x capacity per instance
- Reduce cluster size

Larger Workers:
- Up to 64 cores, 256 GB RAM
- Run more concurrent tasks
- Better resource utilization

Bottleneck Analysis

Primary Bottlenecks

1. Database Write Throughput:
   - 10K writes/second limit
   - Mitigation: Batch writes, sharding

2. Task Scanning:
   - O(n) scan of upcoming tasks
   - Mitigation: Time-based indexing, partitioning

3. Worker Communication:
   - Network latency to workers
   - Mitigation: Regional deployment, batching

4. Clock Synchronization:
   - Drift causes timing issues
   - Mitigation: NTP, monitoring, tolerance

5. Coordination Overhead:
   - Leader election, state sync
   - Mitigation: Raft, efficient protocols

This scale analysis provides the foundation for designing a task scheduler that handles millions of tasks while maintaining precise timing and reliability.