API Rate Limiter - Variations and Follow-up Questions
Common Variations
1. Distributed Rate Limiter Across Multiple Data Centers
Question: "How would you design a rate limiter that works consistently across multiple geographic regions?"
Key Considerations:
- Cross-region latency (100-300ms)
- Network partitions
- Clock synchronization
- Data consistency
Solution Approach:
Option 1: Global Counter (Strong Consistency)
- Single global Redis cluster
- All regions check same counter
- Pros: Exact counts (every region sees the same counter)
- Cons: High latency (100-300ms cross-region round trip on each check)
Option 2: Regional Counters (Eventual Consistency)
- Each region has local counter
- Quota divided among regions
- Periodic sync and rebalancing
- Pros: Low latency (<5ms)
- Cons: Approximate (95-99% accurate), because regional quotas lag behind actual traffic between syncs
Option 3: Hybrid (Recommended)
- Local enforcement (fast path)
- Global coordination (slow path)
- Adaptive quota allocation
- Pros: Balance of speed and accuracy
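A minimal sketch of how the regional and hybrid options could fit together, assuming each region holds a local share of the global limit and a background job recomputes the shares from recent usage. The names `RegionQuota`, `rebalance`, and `floor_percent` are hypothetical, and the proportional split with a guaranteed floor is only one possible rebalancing policy.

```python
from dataclasses import dataclass


@dataclass
class RegionQuota:
    """One region's share of the global limit for the current window."""
    allocated: int
    used: int = 0

    def try_acquire(self) -> bool:
        # Fast path: a purely local decision, no cross-region call.
        if self.used < self.allocated:
            self.used += 1
            return True
        return False


def rebalance(global_limit: int, usage: dict[str, int], floor_percent: int = 10) -> dict[str, int]:
    """Slow path: split the global limit in proportion to recent usage.

    Each region keeps at least floor_percent of the global limit (an assumed
    policy) so a quiet region is not starved before the next sync.
    """
    reserved = global_limit * floor_percent // 100
    shared = global_limit - reserved * len(usage)
    total = sum(usage.values())
    quotas = {}
    for region, used in usage.items():
        if total:
            extra = shared * used // total
        else:
            extra = shared // len(usage)
        quotas[region] = reserved + extra
    return quotas


quotas = rebalance(1000, {"us": 600, "eu": 300, "asia": 100})
print(quotas)  # {'us': 520, 'eu': 310, 'asia': 170}
```

Integer division keeps the sum of the regional quotas at or below the global limit, so rebalancing alone can never admit more than the global budget.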
Implementation:
1. Allocate quota per region (e.g., 40% US, 30% EU, 30% Asia)
2. Each region enforces locally
3. Background sync every 10 seconds
4. Rebalance quotas based on usage patterns
5. Handle edge cases (one region exhausts quota early)
2. Hierarchical Rate Limiting
Question: "How would you implement rate limiting at multiple levels (user, organization, API key, endpoint)?"
Solution:
Hierarchy:
Global Limit (10M req/s)
└─ Organization Limit (100K req/s)
   └─ User Limit (1K req/s)
      └─ API Key Limit (100 req/s)
         └─ Endpoint Limit (10 req/s)
Algorithm:
1. Check all applicable limits, most specific first (bottom-up)
2. Deny if any limit is exhausted; the most restrictive limit wins
3. Increment all counters only when the request is allowed
4. Return decision
Example:
Request from user_123, org_456, endpoint /api/users
Checks:
✓ Global: 8M/10M (allow)
✓ Org 456: 80K/100K (allow)
✓ User 123: 950/1000 (allow)
✗ Endpoint /api/users: 11/10 (deny)
Result: Deny (endpoint limit exceeded)
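The algorithm and example above can be sketched as a two-pass check. This is an in-memory illustration only: the `Limit` class and `check_hierarchy` function are hypothetical names, and a real deployment would need the check and the increments to be atomic in the shared store.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Limit:
    name: str
    used: int
    capacity: int


def check_hierarchy(limits: List[Limit]) -> Tuple[bool, Optional[str]]:
    """Return (allowed, name of the limit that blocked the request)."""
    # Pass 1: look for an exhausted limit and stop at the first one found.
    for limit in limits:
        if limit.used >= limit.capacity:
            return False, limit.name
    # Pass 2: the request is allowed, so charge it to every level.
    for limit in limits:
        limit.used += 1
    return True, None


# Most specific limit first, using the numbers from the example above.
limits = [
    Limit("endpoint:/api/users", used=10, capacity=10),
    Limit("user:123", used=950, capacity=1_000),
    Limit("org:456", used=80_000, capacity=100_000),
    Limit("global", used=8_000_000, capacity=10_000_000),
]
print(check_hierarchy(limits))  # (False, 'endpoint:/api/users')
```

Because no counter is touched until every level has passed, a denied request does not consume quota at the user, organization, or global level.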
Optimization:
- Cache limit hierarchy
- Short-circuit on first failure
- Batch counter updates
3. Dynamic Rate Limiting Based on System Load
Question: "How would you adjust rate limits dynamically based on backend system health?"
Solution:
Adaptive Rate Limiting:
System Health Metrics:
- CPU utilization
- Memory usage
- Database connection pool
- Response time (P99)
- Error rate
Adjustment Algorithm:
# system_health is a score from 0 to 100
if system_health < 50:
    rate_limit = base_limit * 0.5   # Reduce by 50%
elif system_health < 70:
    rate_limit = base_limit * 0.75  # Reduce by 25%
elif system_health > 90:
    rate_limit = base_limit * 1.2   # Increase by 20%
else:
    rate_limit = base_limit
Implementation:
1. Monitor system health every 10 seconds
2. Calculate health score (0-100)
3. Adjust rate limits proportionally
4. Propagate changes to all rate limiters
5. Gradual adjustment (avoid oscillation)
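One way to implement the gradual adjustment in step 5 is to cap how far the limit may move per monitoring tick. The sketch below reuses the thresholds from the adjustment algorithm; the function names and the `max_step` value of 10% of the base limit per tick are assumptions chosen for illustration.

```python
def target_multiplier(health: float) -> float:
    """Map a 0-100 health score to a multiplier, using the thresholds above."""
    if health < 50:
        return 0.5
    if health < 70:
        return 0.75
    if health > 90:
        return 1.2
    return 1.0


def next_limit(current: float, base: float, health: float, max_step: float = 0.1) -> float:
    """Move the current limit toward its target by at most max_step * base."""
    target = base * target_multiplier(health)
    step = base * max_step
    if target > current:
        return min(current + step, target)
    return max(current - step, target)


# A sustained health score of 40 walks the limit down over several ticks.
limit = 1000.0
history = []
for _ in range(6):
    limit = next_limit(limit, base=1000.0, health=40)
    history.append(limit)
print(history)  # [900.0, 800.0, 700.0, 600.0, 500.0, 500.0]
```

Bounding the step means a single noisy health reading changes the limit by at most 10% of the base, which damps oscillation at the cost of reacting more slowly to a real overload.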
Circuit Breaker Integration:
- Open: Reduce limits to 10%
- Half-Open: Reduce limits to 50%
- Closed: Normal limits
4. Rate Limiting with Priorities
Question: "How would you implement rate limiting that prioritizes premium users over free users?"
Solution:
Priority Tiers:
1. Enterprise (highest priority)
2. Premium
3. Free (lowest priority)
Weighted Fair Queuing:
- Enterprise: 50% of capacity
- Premium: 30% of capacity
- Free: 20% of capacity
Implementation:
class PriorityRateLimiter:
    def check_limit(self, user):
        tier = user.tier
        # Check tier-specific limit
        if not self.check_tier_limit(tier):
            # Tier limit exceeded
            if tier == 'free':
                return deny()
            elif tier == 'premium':
                # Try to borrow from free tier
                if self.borrow_capacity('free'):
                    return allow()
                return deny()
            elif tier == 'enterprise':
                # Try to borrow from premium or free
                if self.borrow_capacity('premium', 'free'):
                    return allow()
                return deny()
        return allow()
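The borrowing logic can be made concrete with a small in-memory model. This is a sketch under stated assumptions: `TierCapacity` and its `BORROW_FROM` table are hypothetical, capacity is tracked per window as plain integers, and borrowed capacity is never returned.

```python
class TierCapacity:
    """Per-window capacity shares; a tier may borrow only from lower tiers."""

    # Lower tiers each tier may borrow from, in the order they are tried.
    BORROW_FROM = {
        "enterprise": ("premium", "free"),
        "premium": ("free",),
        "free": (),
    }

    def __init__(self, total: int, shares: dict[str, int]):
        # shares maps tier name to a percentage of the total capacity.
        self.remaining = {tier: total * pct // 100 for tier, pct in shares.items()}

    def try_acquire(self, tier: str) -> bool:
        # Use the tier's own share first, then fall back to lower tiers.
        for source in (tier, *self.BORROW_FROM[tier]):
            if self.remaining[source] > 0:
                self.remaining[source] -= 1
                return True
        return False


cap = TierCapacity(10, {"enterprise": 50, "premium": 30, "free": 20})
# Six enterprise requests: five from its own share, one borrowed from premium.
results = [cap.try_acquire("enterprise") for _ in range(6)]
print(results, cap.remaining)
```

Borrowing runs strictly downward, so free-tier traffic can never consume capacity reserved for paying tiers.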
Benefits:
- Premium users rarely hit limits
- Free users absorb overflow
- Fair resource allocation
- Revenue protection
5. Rate Limiting for Batch Operations
Question: "How would you rate limit batch API requests (e.g., bulk upload of 1000 items)?"
Solution:
Approaches:
Option 1: Count as Single Request
- Simple but unfair
- 1 batch = 1 request
- Allows abuse
Option 2: Count Each Item
- Fair but complex
- 1000 items = 1000 requests
- May exceed limit immediately
Option 3: Weighted Counting (Recommended)
- Balance fairness and usability
- Cost = base_cost + (items * item_cost)
- Example: 1 + (1000 * 0.1) = 101 requests
Implementation:
MAX_BATCH_COST = 1000  # Max cost per batch (see Limits)

def calculate_cost(request):
    if request.is_batch:
        base_cost = 1
        item_cost = 0.1
        total_cost = base_cost + (request.item_count * item_cost)
        return min(total_cost, MAX_BATCH_COST)
    return 1

# Check if user has enough quota
if user.remaining_quota >= calculate_cost(request):
    allow_request()
else:
    deny_request()
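A self-contained variant of the weighted cost that also enforces the batch size cap and deducts quota. It is a sketch: the function names are hypothetical, and it uses integer arithmetic (one unit per 10 items, rounded up) in place of the 0.1 per-item float, which is an assumption made to keep costs exact.

```python
from typing import Tuple

MAX_BATCH_ITEMS = 10_000  # Max batch size
MAX_BATCH_COST = 1_000    # Max cost per batch
ITEMS_PER_UNIT = 10       # Equivalent to item_cost = 0.1, rounded up


def batch_cost(item_count: int) -> int:
    """Cost of a batch request: 1 base unit plus 1 unit per 10 items."""
    if item_count > MAX_BATCH_ITEMS:
        raise ValueError("batch exceeds maximum size")
    item_units = (item_count + ITEMS_PER_UNIT - 1) // ITEMS_PER_UNIT
    return min(1 + item_units, MAX_BATCH_COST)


def try_consume(remaining_quota: int, item_count: int) -> Tuple[bool, int]:
    """Return (allowed, new remaining quota); quota is unchanged on denial."""
    cost = batch_cost(item_count)
    if cost > remaining_quota:
        return False, remaining_quota
    return True, remaining_quota - cost


print(batch_cost(1_000))        # 101
print(batch_cost(10_000))       # 1000 (1001 before the cap)
print(try_consume(500, 1_000))  # (True, 399)
```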
Limits:
- Max batch size: 10,000 items
- Max cost per batch: 1000 requests
- Prevents abuse while allowing legitimate use
Advanced Follow-up Questions
6. Rate Limiting with Quotas and Billing
Question: "How would you integrate rate limiting with a billing system where users pay for API calls?"
Solution:
Quota Management:
- Prepaid quota (e.g., 1M requests/month)
- Pay-as-you-go (e.g., $0.001 per request)
- Overage charges (e.g., 2x price after quota)
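As a worked example of the overage arithmetic, the sketch below assumes a plan that combines a prepaid quota with overage billed at twice the pay-as-you-go price. The function name and the way the three pricing models are combined are illustrative assumptions; `Decimal` is used so that currency amounts stay exact.

```python
from decimal import Decimal


def overage_charge(
    usage: int,
    included: int,
    unit_price: Decimal,
    overage_multiplier: Decimal = Decimal(2),
) -> Decimal:
    """Charge for requests beyond the prepaid quota."""
    over = max(0, usage - included)
    return over * unit_price * overage_multiplier


# 1.25M requests against a 1M prepaid quota at $0.001 per request:
# 250,000 overage requests * $0.001 * 2 = $500.
charge = overage_charge(1_250_000, 1_000_000, Decimal("0.001"))
print(charge)
```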
Implementation:
class BillingRateLimiter:
    def check_limit(self, user):
        quota = self.get_user_quota(user)
        usage = self.get_current_usage(user)
        if usage < quota.included:
            # Within prepaid quota
            return allow()
        elif usage < quota.included + quota.overage_limit:
            # Overage allowed, charge extra
            self.charge_overage(user, quota.overage_rate)
            return allow()
        else:
            # Hard limit reached
            return deny_with_upgrade_prompt()

    def charge_overage(self, user, rate):
        # Async billing event
        billing_queue.publish({
            'user_id': user.id,
            'charge_amount': rate,
            'timestamp': now()
        })
Quota Reset:
- Monthly: Reset on 1st of month
- Rolling: Reset 30 days from first request
- Flexible: User chooses reset date
Notifications:
- 80% quota used: Warning email
- 90% quota used: Urgent email
- 100% quota used: Upgrade prompt
7. Rate Limiting for WebSocket Connections
Question: "How would you rate limit WebSocket connections and messages?"
Solution:
Two-Level Rate Limiting:
Level 1: Connection Rate Limiting
- Limit new connections per user
- Prevent connection spam
- Example: 10 connections per minute
Level 2: Message Rate Limiting
- Limit messages per connection
- Prevent message spam
- Example: 100 messages per second
Implementation:
class WebSocketRateLimiter:
    def on_connection(self, user):
        # Check connection rate limit
        if not self.check_connection_limit(user):
            return reject_connection()
        # Track active connections
        self.increment_connection_count(user)
        return accept_connection()

    def on_message(self, user, connection):
        # Check message rate limit
        if not self.check_message_limit(user, connection):
            # Throttle message
            return send_rate_limit_warning()
        # Process message
        return process_message()

    def on_disconnect(self, user):
        # Decrement connection count
        self.decrement_connection_count(user)
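The message-level check is left abstract above. One common choice for long-lived connections is a token bucket per connection, since it tolerates short bursts while holding the average rate. The sketch below is one possible implementation, not the only one; the `TokenBucket` class and its injectable `clock` parameter (used to make the behavior testable) are assumptions.

```python
import time


class TokenBucket:
    """Allows bursts of up to `burst` messages, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        # Refill based on the time elapsed since the previous call.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock makes the refill behavior easy to see.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, burst=5, clock=lambda: fake_now[0])
burst_results = [bucket.allow() for _ in range(6)]  # five pass, the sixth is throttled
fake_now[0] = 1.0                                   # one second later: two new tokens
later_results = [bucket.allow() for _ in range(3)]
print(burst_results, later_results)
```

For the 100 messages per second example above, each connection would hold its own `TokenBucket(rate=100, burst=...)`, with the burst size chosen to fit the application.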
Challenges:
- Long-lived connections
- Burst messages
- Reconnection storms
- Fair allocation across connections
8. Rate Limiting with Machine Learning
Question: "How would you use ML to detect and prevent abuse patterns?"
Solution:
ML-Based Anomaly Detection:
Features:
- Request rate (requests per second)
- Request patterns (endpoints, timing)
- Geographic distribution
- User agent diversity
- Success/error ratio
- Payload characteristics
Model:
- Isolation Forest (anomaly detection)
- LSTM (sequence prediction)
- Clustering (behavior grouping)
Implementation:
class MLRateLimiter:
    def check_limit(self, user, request):
        # Traditional rate limiting
        if not self.check_static_limit(user):
            return deny()
        # ML-based anomaly detection
        features = self.extract_features(user, request)
        anomaly_score = self.ml_model.predict(features)
        if anomaly_score > threshold:
            # Suspicious behavior
            self.flag_for_review(user)
            self.apply_stricter_limits(user)
            return allow_with_monitoring()
        return allow()

    def extract_features(self, user, request):
        return {
            'request_rate': self.get_request_rate(user),
            'endpoint_diversity': self.get_endpoint_diversity(user),
            'geographic_spread': self.get_geo_spread(user),
            'time_pattern': self.get_time_pattern(user),
            'error_rate': self.get_error_rate(user)
        }
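A sketch of how an anomaly score might translate into an effective limit, assuming the model's output has been normalized to the range 0 to 1 and using the 0.3 and 0.7 risk bands from the Actions list. The function name and the specific reductions (half the base limit for medium risk, a tenth for high risk) are illustrative assumptions; the source only says "reduced" and "strict".

```python
from typing import Tuple


def limit_for_score(score: float, base_limit: int) -> Tuple[str, int]:
    """Map a normalized anomaly score to (risk band, effective limit)."""
    if score < 0.3:
        return "low", base_limit
    if score <= 0.7:
        # Assumed reduction for medium risk: half the base limit.
        return "medium", base_limit // 2
    # Assumed reduction for high risk: a tenth of the base limit.
    return "high", base_limit // 10


for score in (0.1, 0.5, 0.9):
    print(score, limit_for_score(score, base_limit=1000))
```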
Actions:
- Low risk (score < 0.3): Normal limits
- Medium risk (0.3-0.7): Reduced limits, monitoring
- High risk (> 0.7): Strict limits, CAPTCHA, manual review
9. Rate Limiting for GraphQL APIs
Question: "How would you rate limit GraphQL queries with varying complexity?"
Solution:
Query Complexity Calculation:
Simple Query (cost = 1):
query {
  user(id: 123) {
    name
    email
  }
}
Complex Query (cost > 10,000):
query {
  users(limit: 100) {          # 100 users
    posts(limit: 10) {         # 10 posts each = 1,000 posts
      comments(limit: 10) {    # 10 comments each = 10,000 comments
        author {
          name
        }
      }
    }
  }
}
Complexity Calculation:
cost = base_cost + sum(field_costs * multipliers)
Implementation:
class GraphQLRateLimiter:
    def calculate_complexity(self, fields):
        complexity = 0
        for field in fields:
            field_cost = self.get_field_cost(field)
            multiplier = field.arguments.get('limit', 1)
            complexity += field_cost * multiplier
            # Recursive for nested fields
            if field.has_children:
                child_complexity = self.calculate_complexity(field.children)
                complexity += child_complexity * multiplier
        return complexity

    def check_limit(self, user, query):
        complexity = self.calculate_complexity(query.fields)
        if complexity > user.max_query_complexity:
            return deny_with_error("Query too complex")
        if not self.check_quota(user, complexity):
            return deny_with_error("Rate limit exceeded")
        self.consume_quota(user, complexity)
        return allow()
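To see how the recursion behaves on the nested example query, here is a runnable sketch over a hand-built field tree. The `Field` dataclass stands in for a parsed query and is hypothetical, as is the assumption that every field (scalars included) costs one point; other cost models will give different totals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str
    limit: int = 1  # value of the `limit` argument, 1 if absent
    children: List["Field"] = field(default_factory=list)


def complexity(fields: List[Field], field_cost: int = 1) -> int:
    """Each field costs field_cost plus its children, multiplied by its limit."""
    total = 0
    for f in fields:
        total += f.limit * (field_cost + complexity(f.children, field_cost))
    return total


simple = Field("user", children=[Field("name"), Field("email")])
nested = Field("users", 100, [
    Field("posts", 10, [
        Field("comments", 10, [
            Field("author", children=[Field("name")]),
        ]),
    ]),
])

print(complexity([simple]))  # 3
print(complexity([nested]))  # 31100
```

Under this cost model the nested query scores 100 × (1 + 10 × (1 + 10 × (1 + 2))) = 31,100, so it would be rejected by a maximum query complexity of 10,000 before any resolver runs.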
Limits:
- Max query complexity: 10,000
- Quota: 1,000,000 complexity points per hour
- Prevents expensive queries from exhausting quota
10. Rate Limiting During Migrations
Question: "How would you handle rate limiting during a system migration or upgrade?"
Solution:
Migration Strategy:
Phase 1: Dual Write
- Write to both old and new systems
- Read from old system
- Validate consistency
Phase 2: Dual Read
- Write to both systems
- Read from new system (with fallback)
- Monitor for issues
Phase 3: Cutover
- Write to new system only
- Read from new system
- Decommission old system
Rate Limiting During Migration:
class MigrationRateLimiter:
    def check_limit(self, user):
        try:
            # Try new system first
            result = self.new_rate_limiter.check(user)
            # Dual write for validation
            self.old_rate_limiter.check(user)
            return result
        except NewSystemError:
            # Fallback to old system
            logger.warning("New system failed, using old")
            return self.old_rate_limiter.check(user)

    def validate_consistency(self):
        # Compare counters between systems
        for user in sample_users:
            old_count = self.old_rate_limiter.get_count(user)
            new_count = self.new_rate_limiter.get_count(user)
            if abs(old_count - new_count) > threshold:
                alert("Inconsistency detected")
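A gradual traffic shift needs a stable way to decide which users the new limiter serves. One possible approach, sketched below, hashes the user ID into one of 100 buckets. The function name `uses_new_limiter` is hypothetical and the choice of SHA-256 is an assumption; any stable, well-distributed hash would work.

```python
import hashlib


def uses_new_limiter(user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in the first rollout_percent buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


users = [f"user_{i}" for i in range(1000)]
for percent in (10, 50, 100):
    moved = sum(uses_new_limiter(u, percent) for u in users)
    print(percent, moved)
```

Because a user's bucket never changes, everyone moved at 10% stays on the new system at 50% and 100%, and a rollback only needs to lower the percentage.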
Rollback Plan:
- Keep old system running for 30 days
- Monitor error rates and latency
- Quick rollback if issues detected
- Gradual traffic shift (10% → 50% → 100%)
These variations and follow-ups demonstrate a deep understanding of rate limiting challenges and solutions in real-world scenarios.