Distributed File System - Problem Statement
Overview
Design a distributed file system similar to HDFS (Hadoop Distributed File System) or GFS (Google File System) that can store and manage petabytes of data across thousands of machines with high throughput, fault tolerance, and scalability for big data processing workloads.
Functional Requirements
Core File Operations
- Write Files: Store large files (GB to TB) across multiple nodes
- Read Files: Retrieve files with high throughput
- Append Data: Append-only writes for log files
- Delete Files: Remove files and reclaim storage
- List Directory: List files and directories
- File Metadata: Store file size, timestamps, permissions, replication factor
Data Management
- Large File Support: Optimize for files >100MB (typical: 1GB-10TB)
- Block-Based Storage: Split files into blocks (64MB-128MB)
- Replication: Replicate blocks across nodes (default 3x)
- Data Locality: Co-locate computation with data
- Rack Awareness: Distribute replicas across racks
- Automatic Rebalancing: Balance storage across nodes
Fault Tolerance
- Node Failure Handling: Detect and recover from node failures
- Data Integrity: Checksums for corruption detection
- Automatic Re-replication: Restore replication factor on failures
- Heartbeat Monitoring: Continuous health checking
- Graceful Degradation: Continue operations with reduced capacity
Non-Functional Requirements
Performance
- Write Throughput: 100MB/s per client
- Read Throughput: 200MB/s per client
- Aggregate Throughput: 10GB/s+ cluster-wide
- Latency: <100ms for metadata operations
- Concurrent Clients: 10,000+ simultaneous clients
Scalability
- Storage Capacity: Petabytes of data
- Number of Files: Billions of files
- Cluster Size: 1,000-10,000 nodes
- Horizontal Scaling: Add nodes without downtime
- Namespace Scaling: Support deep directory hierarchies
Reliability
- Data Durability: 99.999999999% (11 9's)
- System Availability: 99.9% uptime
- No Single Point of Failure: Redundant metadata servers
- Automatic Recovery: Self-healing on failures
- Zero Data Loss: With proper replication
Consistency
- Write-Once-Read-Many: Immutable files after write
- Strong Consistency: Metadata operations are consistent
- Eventual Consistency: Block replication is eventual
- Atomic Operations: File creation and deletion are atomic
Success Metrics
- Storage Efficiency: >90% disk utilization
- Replication Overhead: 3x storage for 3x replication
- Failure Recovery: <5 minutes to detect and re-replicate
- Throughput: 100MB/s+ per node
- Cost: <$0.02 per GB per month
This problem statement establishes the foundation for designing a production-grade distributed file system for big data workloads.