Problem Statement

📖 3 min read 📄 Part 1 of 10

Ad Click Aggregation System - Problem Statement

Overview

Design a real-time ad click aggregation system that processes millions of ad clicks per second for billing and analytics. The system must provide accurate counts, prevent fraud, handle deduplication, and support real-time dashboards while maintaining exactly-once semantics for billing.

Functional Requirements

Core Features

  • Click Tracking: Record every ad click with metadata
  • Real-time Aggregation: Aggregate clicks by advertiser, campaign, ad
  • Billing Accuracy: Exactly-once counting for billing
  • Fraud Detection: Identify and filter fraudulent clicks
  • Analytics Dashboard: Real-time and historical analytics
  • Click Attribution: Attribute clicks to conversions

Click Data

  • Click ID: Unique identifier
  • Timestamp: Click time (millisecond precision)
  • Ad ID: Advertisement identifier
  • Campaign ID: Campaign identifier
  • Advertiser ID: Advertiser identifier
  • User ID: User identifier (cookie/device ID)
  • IP Address: Source IP
  • User Agent: Browser/device info
  • Referrer: Source page
  • Landing Page: Destination URL

Aggregation Dimensions

  • By Time: Minute, hour, day aggregates
  • By Advertiser: Total clicks per advertiser
  • By Campaign: Clicks per campaign
  • By Ad: Clicks per individual ad
  • By Geography: Clicks by country, region, city
  • By Device: Desktop, mobile, tablet
  • By Platform: iOS, Android, Web

Fraud Detection

  • Click Fraud: Repeated clicks from same user/IP
  • Bot Detection: Automated bot traffic
  • Invalid Traffic: Non-human traffic
  • Click Farms: Coordinated fraudulent activity
  • Duplicate Clicks: Same click reported multiple times

Non-Functional Requirements

Performance

  • Ingestion Rate: 10 million clicks/second peak
  • Aggregation Latency: <5 seconds for real-time
  • Query Latency: <1 second for dashboards
  • Billing Accuracy: 100% accuracy (exactly-once)
  • Fraud Detection: <100ms per click

Scalability

  • Daily Clicks: 500 billion clicks/day
  • Advertisers: 1 million active advertisers
  • Campaigns: 10 million active campaigns
  • Ads: 100 million active ads
  • Storage: 100 TB for 90 days

Reliability

  • Uptime: 99.99% availability
  • Data Durability: 99.999999999% (11 9's)
  • No Data Loss: Guaranteed click delivery
  • Exactly-Once: For billing aggregates
  • At-Least-Once: For analytics

Scale Constraints

Click Volume

Peak Clicks/Second: 10 million
Daily Clicks: 500 billion
Click Size: 200 bytes
Daily Data: 100 TB raw
Compressed: 20 TB (5:1)

Aggregation Volume

Aggregation Windows: 1min, 1hour, 1day
Dimensions: 10 dimensions
Unique Combinations: 1 billion
Aggregates/Second: 100,000 updates/sec

Success Metrics

Accuracy

  • Billing accuracy: 100%
  • Fraud detection rate: >95%
  • False positive rate: <1%
  • Deduplication accuracy: >99.9%

Performance

  • Ingestion latency: p95 <50ms
  • Aggregation latency: p95 <5s
  • Query latency: p95 <1s
  • System uptime: >99.99%

This problem statement establishes the foundation for a high-accuracy, high-throughput ad click aggregation system.