Part 3 of 10

Distributed File System - Architecture

High-Level Architecture

The design follows the HDFS/GFS style: a NameNode (master) manages metadata, and DataNodes (workers) store replicated file blocks.

NameNode (Master)

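Namespace Management: Maintains the directory tree and per-file metadata
Block Mapping: Maps each file to its blocks and each block to the DataNodes that hold it
Replication Policy: Keeps every block at its configured replication factor
Heartbeat Processing: Monitors DataNode health through periodic heartbeats
Block Reports: Tracks block locations from the inventories that DataNodes report
Metadata Persistence: Persists the namespace as an FSImage checkpoint plus an edit log of later changes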
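
The responsibilities above can be sketched as a small in-memory model. This is an illustrative toy, not the HDFS API: the class name NameNodeSketch, its method names, and the 30-second heartbeat timeout are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Toy in-memory model of NameNode metadata (hypothetical, not the HDFS API)."""
    replication_factor: int = 3
    heartbeat_timeout_s: float = 30.0  # assumed value, for illustration only
    files: dict = field(default_factory=dict)            # path -> [block_id, ...]
    block_locations: dict = field(default_factory=dict)  # block_id -> {datanode_id, ...}
    last_heartbeat: dict = field(default_factory=dict)   # datanode_id -> timestamp
    edit_log: list = field(default_factory=list)         # append-only namespace changes

    def create_file(self, path, block_ids):
        # Record the change in the edit log before applying it to the namespace.
        self.edit_log.append(("create", path, tuple(block_ids)))
        self.files[path] = list(block_ids)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def heartbeat(self, datanode_id, now=None):
        self.last_heartbeat[datanode_id] = time.time() if now is None else now

    def block_report(self, datanode_id, block_ids):
        # Replace whatever this DataNode reported earlier with its current inventory.
        for locations in self.block_locations.values():
            locations.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)

    def live_datanodes(self, now=None):
        now = time.time() if now is None else now
        return {dn for dn, ts in self.last_heartbeat.items()
                if now - ts <= self.heartbeat_timeout_s}

    def under_replicated(self, now=None):
        # A block is under-replicated if too few of its replicas sit on live nodes.
        live = self.live_datanodes(now)
        return sorted(b for b, locs in self.block_locations.items()
                      if len(locs & live) < self.replication_factor)
```

In this sketch, a DataNode that stops sending heartbeats drops out of live_datanodes(), so its blocks show up in under_replicated() and become candidates for re-replication.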

DataNode (Worker)

Block Storage: Store file blocks on local disks
Block Serving: Serve blocks to clients
Heartbeat: Send periodic heartbeats to NameNode
Block Reports: Report block inventory
Replication: Copy blocks to other DataNodes to maintain the replication factor
Checksums: Verify data integrity of stored and served blocks
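
As a rough illustration of the checksum duty, the sketch below computes one CRC32 per fixed-size chunk of a block and reports which chunks no longer match. The 512-byte chunk size, the use of zlib.crc32, and the function names are assumptions for the example, not details taken from this design.

```python
import zlib

CHUNK_SIZE = 512  # assumed checksum chunk size, chosen for illustration


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return one CRC32 value per chunk of the block."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def find_corrupt_chunks(data: bytes, expected: list,
                        chunk_size: int = CHUNK_SIZE) -> list:
    """Return the indexes of chunks whose checksum differs from the stored one."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("chunk count mismatch")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

Checksumming per chunk rather than per block lets a reader pinpoint the damaged region instead of discarding the whole block.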

Write Flow

Client asks the NameNode to allocate blocks for the file
NameNode allocates blocks and returns the list of DataNodes that will hold each block's replicas
Client writes to first DataNode
First DataNode pipelines to second DataNode
Second DataNode pipelines to third DataNode
Acknowledgments flow back through the pipeline to the client
Client notifies NameNode of completion
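
The pipelined write can be sketched as a chain of nodes that each store the block, forward it downstream, and pass acknowledgments back. This is a simplified model under stated assumptions: DataNodeSketch and write_block are hypothetical names, and the whole block is forwarded in one call, whereas a real pipeline would stream it in smaller pieces.

```python
class DataNodeSketch:
    """Toy DataNode that keeps blocks in a dict (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, data, downstream):
        """Store the block, forward it down the pipeline, and return the acks."""
        self.blocks[block_id] = data
        acks = []
        if downstream:
            next_node, rest = downstream[0], downstream[1:]
            acks = next_node.write(block_id, data, rest)
        # Acks accumulate on the way back, so the last node's ack comes first.
        return acks + [self.name]


def write_block(block_id, data, pipeline):
    """Client side: send the block to the first DataNode only."""
    first, rest = pipeline[0], pipeline[1:]
    acks = first.write(block_id, data, rest)
    if len(acks) != len(pipeline):
        raise IOError("pipeline did not fully acknowledge the block")
    return acks
```

The client talks only to the first DataNode, so its outbound bandwidth is spent once per block rather than once per replica.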

Read Flow

Client asks the NameNode for the file's block locations
NameNode returns, for each block, the list of DataNodes holding a replica
Client reads the block directly from the nearest DataNode
Client verifies checksums
Client repeats for the next block if more data is needed
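
A minimal sketch of the client side of a read follows: pick the nearest replica, verify its checksum, and return the data. The distance values (0 for same node, 2 for same rack, 4 for a different rack), the (rack, host) location tuples, and the fallback to the next-nearest replica on a checksum mismatch are assumptions of this example; the steps above do not specify what happens when verification fails.

```python
import zlib


def distance(client_loc, node_loc):
    """Assumed topology metric: 0 same node, 2 same rack, 4 different rack."""
    (c_rack, c_host), (n_rack, n_host) = client_loc, node_loc
    if c_rack == n_rack:
        return 0 if c_host == n_host else 2
    return 4


def read_block(client_loc, replicas, expected_crc):
    """Read one block from the nearest replica that passes verification.

    replicas: list of (location, data) pairs, where location is (rack, host).
    """
    for loc, data in sorted(replicas, key=lambda r: distance(client_loc, r[0])):
        if zlib.crc32(data) == expected_crc:
            return loc, data
        # Assumed behavior: skip a corrupt replica and try the next nearest.
    raise IOError("no replica passed checksum verification")
```

A whole-block CRC32 keeps the example short; a per-chunk scheme would slot into the same place.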

Replication Strategy

Rack-Aware: First replica on the writer's local rack, second on a different rack, third on a different node in the same rack as the second
Load Balancing: Distribute replicas evenly
Network Topology: Minimize cross-rack traffic
Failure Domains: Protect against rack failures
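
The rack-aware rule can be sketched as a small placement function. The function name place_replicas, the topology format, and the requirement that the remote rack contain at least two nodes are simplifying assumptions of this example; load balancing across nodes is reduced to a random choice.

```python
import random


def place_replicas(writer, nodes_by_rack, rng=random):
    """Pick three (rack, host) targets following the rack-aware rule.

    writer: (rack, host) of the DataNode local to the client.
    nodes_by_rack: dict mapping rack name -> list of host names.
    """
    local_rack, local_host = writer
    first = (local_rack, local_host)

    # The second and third replicas share one remote rack, so that rack
    # needs at least two nodes (a simplification made by this sketch).
    remote_racks = [rack for rack, hosts in nodes_by_rack.items()
                    if rack != local_rack and len(hosts) >= 2]
    if not remote_racks:
        raise ValueError("no remote rack with two or more nodes")

    remote_rack = rng.choice(remote_racks)
    # sample() returns distinct hosts, so the two replicas land on different nodes.
    second_host, third_host = rng.sample(nodes_by_rack[remote_rack], 2)
    return [first, (remote_rack, second_host), (remote_rack, third_host)]
```

The result always spans exactly two racks: losing either rack leaves at least one replica, while only one copy of the data has to cross racks during the write.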

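This architecture delivers high throughput because clients exchange data directly with DataNodes while the NameNode handles only metadata, fault tolerance through block replication across racks, and scalability for big data workloads by adding DataNodes.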