Ad Click Aggregation System - Problem Statement
Overview
Design a real-time ad click aggregation system that processes millions of ad clicks per second for billing and analytics. The system must provide accurate counts, prevent fraud, handle deduplication, and support real-time dashboards while maintaining exactly-once semantics for billing.
Functional Requirements
Core Features
- Click Tracking: Record every ad click with metadata
- Real-time Aggregation: Aggregate clicks by advertiser, campaign, ad
- Billing Accuracy: Exactly-once counting for billing
- Fraud Detection: Identify and filter fraudulent clicks
- Analytics Dashboard: Real-time and historical analytics
- Click Attribution: Attribute clicks to conversions
Click Data
- Click ID: Unique identifier
- Timestamp: Click time (millisecond precision)
- Ad ID: Advertisement identifier
- Campaign ID: Campaign identifier
- Advertiser ID: Advertiser identifier
- User ID: User identifier (cookie/device ID)
- IP Address: Source IP
- User Agent: Browser/device info
- Referrer: Source page
- Landing Page: Destination URL
Aggregation Dimensions
- By Time: Minute, hour, day aggregates
- By Advertiser: Total clicks per advertiser
- By Campaign: Clicks per campaign
- By Ad: Clicks per individual ad
- By Geography: Clicks by country, region, city
- By Device: Desktop, mobile, tablet
- By Platform: iOS, Android, Web
Fraud Detection
- Click Fraud: Repeated clicks from same user/IP
- Bot Detection: Automated bot traffic
- Invalid Traffic: Non-human traffic
- Click Farms: Coordinated fraudulent activity
- Duplicate Clicks: Same click reported multiple times
Non-Functional Requirements
Performance
- Ingestion Rate: 10 million clicks/second peak
- Aggregation Latency: <5 seconds for real-time
- Query Latency: <1 second for dashboards
- Billing Accuracy: 100% accuracy (exactly-once)
- Fraud Detection: <100ms per click
Scalability
- Daily Clicks: 500 billion clicks/day
- Advertisers: 1 million active advertisers
- Campaigns: 10 million active campaigns
- Ads: 100 million active ads
- Storage: 100 TB for 90 days
Reliability
- Uptime: 99.99% availability
- Data Durability: 99.999999999% (11 9's)
- No Data Loss: Guaranteed click delivery
- Exactly-Once: For billing aggregates
- At-Least-Once: For analytics
Scale Constraints
Click Volume
Peak Clicks/Second: 10 million
Daily Clicks: 500 billion
Click Size: 200 bytes
Daily Data: 100 TB raw
Compressed: 20 TB (5:1)Aggregation Volume
Aggregation Windows: 1min, 1hour, 1day
Dimensions: 10 dimensions
Unique Combinations: 1 billion
Aggregates/Second: 100,000 updates/secSuccess Metrics
Accuracy
- Billing accuracy: 100%
- Fraud detection rate: >95%
- False positive rate: <1%
- Deduplication accuracy: >99.9%
Performance
- Ingestion latency: p95 <50ms
- Aggregation latency: p95 <5s
- Query latency: p95 <1s
- System uptime: >99.99%
This problem statement establishes the foundation for a high-accuracy, high-throughput ad click aggregation system.