Scaling Considerations for Google Docs
Overview
Scaling a collaborative document editing system to support 100M+ daily active users with real-time synchronization requires careful consideration of operational transform servers, document distribution, and global infrastructure.
1. Operational Transform Server Scaling
Challenge: OT Computation Complexity
- Problem: O(n²) complexity for n concurrent operations
- Impact: 100 concurrent editors = 10,000 transform operations
- Bottleneck: Single OT server becomes CPU-bound at 50-100 concurrent editors
Solution: Hierarchical OT Architecture
┌─────────────────────────────────────────────────────────┐
│ Global OT Coordinator │
│ (Handles cross-region conflicts) │
└─────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Regional OT │ │ Regional OT │ │ Regional OT │
│ Server │ │ Server │ │ Server │
│ (US-East) │ │ (EU-West) │ │ (AP-South) │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
┌───┴───┐ ┌───┴───┐ ┌───┴───┐
▼ ▼ ▼ ▼ ▼ ▼
OT-1 OT-2 OT-3 OT-4 OT-5 OT-6
(50 (50 (50 (50 (50 (50
users) users) users) users) users) users)Implementation Strategy:
- Document Affinity: Pin document to specific OT server
- Load-Based Routing: Route new editors to least-loaded server
- Hot Document Detection: Identify documents with >50 concurrent editors
- Dynamic Splitting: Split hot documents across multiple OT servers with coordination layer
OT Server Capacity Planning
Single OT Server Capacity:
- 50 concurrent editors (optimal)
- 100 concurrent editors (maximum)
- 500 operations/second processing rate
- 16 CPU cores, 64GB RAM
Cluster Sizing:
- 100M DAU × 5% concurrent = 5M concurrent users
- Average 3 users per document = 1.67M active documents
- 1% hot documents (>10 editors) = 16,700 hot documents
- Hot documents need 16,700 × 2 = 33,400 OT servers
- Regular documents need 1.65M / 50 = 33,000 OT servers
- Total: ~66,000 OT servers globally2. Document Sharding and Distribution
Sharding Strategy: Document ID-Based
-- Shard key: document_id
-- Sharding function: HASH(document_id) % num_shards
Shard Distribution:
- 1 billion documents
- 10,000 shards
- 100,000 documents per shard
- Shard size: ~50GB (500KB avg document)Hot Shard Problem
Detection:
- Monitor QPS per shard (threshold: 10,000 QPS)
- Track concurrent editors per shard
- Identify celebrity documents (viral docs with 1000+ editors)
Mitigation:
┌─────────────────────────────────────────────────────────┐
│ Hot Document Handling │
└─────────────────────────────────────────────────────────┘
1. Read Replicas:
Primary Shard → Read Replica 1 (read-only)
→ Read Replica 2 (read-only)
→ Read Replica 3 (read-only)
2. Caching Layer:
- Cache document content in Redis (TTL: 5 minutes)
- Cache recent operations in memory
- Serve 90% of reads from cache
3. Rate Limiting:
- Limit new editors to 1000 per document
- Queue additional editors with "view-only" mode
- Upgrade to edit mode when slots available3. Global Distribution Architecture
Multi-Region Deployment
┌──────────────────────────────────────────────────────────┐
│ Global Load Balancer │
│ (GeoDNS + Anycast Routing) │
└──────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ US Region │ │ EU Region │ │ Asia Region │
│ │ │ │ │ │
│ • 5 zones │ │ • 4 zones │ │ • 6 zones │
│ • 40% users │ │ • 30% users │ │ • 30% users │
└──────────────┘ └──────────────┘ └──────────────┘Cross-Region Synchronization
Challenge: Speed of light latency
- US-East to EU-West: ~80ms RTT
- US-East to Asia-Pacific: ~180ms RTT
- EU-West to Asia-Pacific: ~200ms RTT
Strategy: Regional OT with Global Coordination
User in US edits document:
1. Operation sent to US Regional OT Server (5ms)
2. Operation applied locally and acknowledged (10ms)
3. Operation replicated to Spanner (cross-region) (50-100ms)
4. Other regions pull updates asynchronously (100-200ms)
Result:
- Local users see changes in 15ms
- Remote users see changes in 100-200ms
- Acceptable for collaborative editing4. WebSocket Connection Scaling
Connection Distribution
Target: 5M concurrent WebSocket connections
Per Server Capacity:
- 50,000 connections per server (C5.4xlarge)
- 8 vCPUs, 32GB RAM
- Network bandwidth: 5 Gbps
Required Servers:
- 5M / 50K = 100 WebSocket servers per region
- 3 regions × 100 = 300 servers globally
- Add 50% buffer = 450 servers totalConnection Management Strategy
// Sticky session routing
const getWebSocketServer = (userId) => {
const hash = murmurhash(userId);
const serverIndex = hash % numServers;
return webSocketServers[serverIndex];
};
// Connection pooling
class WebSocketPool {
constructor(maxConnections = 50000) {
this.connections = new Map();
this.maxConnections = maxConnections;
}
addConnection(userId, ws) {
if (this.connections.size >= this.maxConnections) {
throw new Error('Connection limit reached');
}
this.connections.set(userId, ws);
}
removeConnection(userId) {
this.connections.delete(userId);
}
broadcast(documentId, operation) {
// Send to all users editing this document
const editors = this.getDocumentEditors(documentId);
editors.forEach(userId => {
const ws = this.connections.get(userId);
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify(operation));
}
});
}
}5. Database Scaling
Spanner Scaling for Document State
Configuration:
- 3 regions (US, EU, Asia)
- 5 nodes per region
- 15 nodes total
- 10TB storage per node
- 150TB total capacity
Capacity:
- 1 billion documents × 100KB avg = 100TB
- Fits within 150TB capacity
- Headroom for growth: 50TB
Performance:
- 10,000 QPS per node
- 15 nodes × 10K = 150,000 QPS total
- Actual load: 50,000 QPS (33% utilization)Bigtable Scaling for Revision History
Configuration:
- 3 clusters (US, EU, Asia)
- 30 nodes per cluster
- 90 nodes total
- 8TB SSD per node
- 720TB total capacity
Capacity:
- 1 billion documents × 50 revisions avg = 50B revisions
- 50B × 10KB avg = 500TB
- Fits within 720TB capacity
Performance:
- 10,000 QPS per node (reads)
- 90 nodes × 10K = 900,000 QPS total
- Actual load: 100,000 QPS (11% utilization)Redis Scaling for OT State
Configuration:
- Redis Cluster with 100 shards
- 3 replicas per shard
- 300 Redis instances total
- 64GB RAM per instance
- 19.2TB total memory
Capacity:
- 1.67M active documents × 10KB state = 16.7GB
- Fits easily in 19.2TB capacity
- 99.9% memory available for caching
Performance:
- 100,000 ops/sec per instance
- 300 instances × 100K = 30M ops/sec
- Actual load: 5M ops/sec (17% utilization)6. Caching Strategy
Multi-Level Cache Hierarchy
L1 Cache (Client-Side):
- IndexedDB for offline document storage
- 100MB per user
- Cache hit rate: 60%
L2 Cache (Redis):
- Document content (TTL: 5 minutes)
- Recent operations (TTL: 1 minute)
- User presence (TTL: 30 seconds)
- Cache hit rate: 35%
L3 Cache (CDN):
- Static assets (images, fonts)
- Document exports (PDF, DOCX)
- Cache hit rate: 90%
Overall Cache Hit Rate: 95%
Database Load Reduction: 20xCache Invalidation Strategy
// Write-through cache for document updates
async function updateDocument(docId, operation) {
// 1. Apply operation to database
await spanner.applyOperation(docId, operation);
// 2. Invalidate cache
await redis.del(`doc:${docId}`);
// 3. Broadcast to active editors
await broadcastOperation(docId, operation);
// 4. Update cache with new version (optional)
const newDoc = await spanner.getDocument(docId);
await redis.setex(`doc:${docId}`, 300, JSON.stringify(newDoc));
}7. Autoscaling Configuration
OT Server Autoscaling
autoscaling:
metric: cpu_utilization
target: 70%
min_instances: 1000
max_instances: 100000
scale_up:
threshold: 80%
cooldown: 60s
increment: 10%
scale_down:
threshold: 50%
cooldown: 300s
decrement: 5%WebSocket Server Autoscaling
autoscaling:
metric: connection_count
target: 40000 # 80% of 50K capacity
min_instances: 100
max_instances: 1000
scale_up:
threshold: 45000
cooldown: 30s
increment: 20%
scale_down:
threshold: 30000
cooldown: 600s
decrement: 10%8. Performance Optimization
Operation Batching
// Batch multiple operations to reduce network overhead
class OperationBatcher {
constructor(flushInterval = 50) {
this.operations = [];
this.flushInterval = flushInterval;
this.timer = null;
}
addOperation(op) {
this.operations.push(op);
if (!this.timer) {
this.timer = setTimeout(() => this.flush(), this.flushInterval);
}
// Flush immediately if batch is large
if (this.operations.length >= 10) {
this.flush();
}
}
flush() {
if (this.operations.length === 0) return;
const batch = this.operations;
this.operations = [];
clearTimeout(this.timer);
this.timer = null;
// Send batch to server
this.sendBatch(batch);
}
}Compression
// Compress operations before transmission
const compressOperation = (operation) => {
const json = JSON.stringify(operation);
const compressed = pako.deflate(json);
return compressed;
};
// Typical compression ratios:
// - Text operations: 5:1
// - JSON metadata: 3:1
// - Network bandwidth savings: 70%9. Monitoring and Alerting
Key Metrics
Performance Metrics:
- Operation latency (p50, p95, p99)
- WebSocket connection count
- OT server CPU utilization
- Database query latency
- Cache hit rate
Business Metrics:
- Active documents
- Concurrent editors per document
- Operations per second
- Document creation rate
- User engagement time
Alerts:
- OT server CPU > 80% (scale up)
- WebSocket connections > 45K per server (scale up)
- Operation latency p99 > 500ms (investigate)
- Cache hit rate < 90% (tune cache)
- Database errors > 0.1% (critical)10. Cost Optimization
Infrastructure Costs (Monthly)
Compute:
- OT servers: 66,000 × $150 = $9.9M
- WebSocket servers: 450 × $150 = $67.5K
- API servers: 1,000 × $150 = $150K
Total Compute: $10.1M/month
Storage:
- Spanner: 150TB × $300/TB = $45K
- Bigtable: 720TB × $170/TB = $122K
- Redis: 19.2TB × $500/TB = $9.6M
Total Storage: $9.8M/month
Network:
- Egress: 10PB × $50/TB = $500K
- Cross-region: 5PB × $20/TB = $100K
Total Network: $600K/month
Grand Total: $20.5M/month
Cost per DAU: $0.205Optimization Strategies
- Reserved Instances: 40% savings on compute
- Spot Instances: Use for non-critical OT servers
- Compression: Reduce storage and network costs by 70%
- Tiered Storage: Move old revisions to cold storage
- Regional Optimization: Deploy more capacity in high-usage regions
Summary
Scaling Google Docs to 100M+ DAU requires:
- 66,000 OT servers for real-time collaboration
- Hierarchical OT architecture to handle hot documents
- Multi-region deployment with regional OT coordination
- 450 WebSocket servers for 5M concurrent connections
- Spanner + Bigtable for globally distributed storage
- Multi-level caching achieving 95% cache hit rate
- Autoscaling based on CPU and connection metrics
- $20.5M/month infrastructure cost ($0.205 per DAU)
The key to scaling is distributing OT computation, using regional coordination for global documents, and aggressive caching to reduce database load.