Distributed File System - Variations and Follow-ups

Common Variations

1. Object Storage (S3-like)

Key Differences:

Flat namespace (buckets + keys)
HTTP API instead of POSIX
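Consistency model varies by system (often eventual; Amazon S3 has provided strong read-after-write consistency since December 2020)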
Versioning support
Lifecycle policies
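
The flat namespace can be illustrated with a minimal in-memory sketch. The `Bucket` class and its method names are hypothetical, not any real object store's API; the point is that "directories" are only key prefixes.

```python
from bisect import bisect_left


class Bucket:
    """Toy in-memory bucket: a flat map from key to bytes, no real directories."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_prefix(self, prefix):
        # "Directories" are emulated by scanning sorted keys for a shared prefix.
        keys = sorted(self._objects)
        start = bisect_left(keys, prefix)
        result = []
        for key in keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


bucket = Bucket()
bucket.put("logs/2024/01/app.log", b"a")
bucket.put("logs/2024/02/app.log", b"b")
bucket.put("images/cat.png", b"c")
listing = bucket.list_prefix("logs/2024/")
```

Listing by prefix is a scan over keys rather than a directory lookup, which is one reason renaming a "folder" in an object store is usually expensive.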

2. Distributed Block Storage

Key Differences:

Block-level access
Attached to VMs
Lower latency
Smaller block sizes (4 KB to 64 KB, versus 128 MB by default in HDFS)
Snapshot support
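
Block-level access means a volume read is translated into fixed-size block operations. The sketch below shows that arithmetic, assuming a 4 KB block; `blocks_for_range` is an illustrative helper, not part of any real block storage API.

```python
BLOCK_SIZE = 4096  # assumed 4 KB block size


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return (block_index, start_in_block, end_in_block) for each block touched."""
    if length <= 0:
        return []
    spans = []
    end = offset + length
    pos = offset
    while pos < end:
        index = pos // block_size
        start = pos % block_size
        # Stop at the block boundary or at the end of the requested range.
        stop = min(block_size, start + (end - pos))
        spans.append((index, start, stop))
        pos += stop - start
    return spans


spans = blocks_for_range(offset=4000, length=200)
```

Here a 200-byte read that starts at offset 4000 straddles two blocks, so the volume has to fetch both.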

3. Distributed Database Storage

Key Differences:

Structured data
ACID transactions
Query support
Indexes
Small records rather than large files

Interview Follow-up Questions

Q: How do you handle NameNode failure?

Answer:

Active-Standby HA setup
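Shared edit log (NFS or the Quorum Journal Manager, QJM)
Automatic failover, typically under 30 s depending on the ZooKeeper session timeout
ZooKeeper for leader election and failure detection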
Clients retry on failover
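
Client-side retry on failover can be sketched as follows. This is a simplified model, not the Hadoop client: `FailoverClient` and `StandbyError` are hypothetical names, and plain callables stand in for RPC endpoints.

```python
class StandbyError(Exception):
    """Raised by a NameNode that is not currently active (hypothetical)."""


class FailoverClient:
    def __init__(self, namenodes, max_attempts=4):
        self.namenodes = namenodes      # callables standing in for RPC endpoints
        self.max_attempts = max_attempts
        self.current = 0                # index of the endpoint believed active

    def call(self, request):
        last_error = None
        for _ in range(self.max_attempts):
            try:
                return self.namenodes[self.current](request)
            except (StandbyError, ConnectionError) as err:
                last_error = err
                # Rotate to the next configured NameNode and try again.
                self.current = (self.current + 1) % len(self.namenodes)
        raise RuntimeError("no active NameNode found") from last_error


def nn1(request):
    raise ConnectionError("nn1 is down")


def nn2(request):
    return f"nn2 handled {request}"


client = FailoverClient([nn1, nn2])
reply = client.call("getBlockLocations /data/file")
```

A production client would normally add a backoff delay between attempts; it is omitted here to keep the sketch short.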

Q: How do you ensure data locality?

Answer:

Track block locations in NameNode
Schedule tasks on nodes with data
Rack-aware placement
Network topology awareness
Fallback to remote reads if needed
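
Replica selection by network distance can be sketched like this, assuming a two-level topology (rack, host) and the common convention of distance 0 for the same node, 2 for the same rack, and 4 for a different rack. The function names are illustrative.

```python
def distance(reader, replica):
    """Tree distance between two (rack, host) locations."""
    if reader == replica:
        return 0          # same node
    if reader[0] == replica[0]:
        return 2          # same rack, different node
    return 4              # different rack


def pick_replica(reader, replicas):
    # Prefer the closest replica; min() returns the first one on ties.
    return min(replicas, key=lambda replica: distance(reader, replica))


replicas = [("rack2", "host7"), ("rack1", "host3"), ("rack1", "host1")]
local = pick_replica(("rack1", "host1"), replicas)
same_rack = pick_replica(("rack1", "host9"), replicas)
remote = pick_replica(("rack3", "host5"), replicas)
```

The last call shows the fallback case: no replica is local or on the same rack, so the reader accepts a remote read.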

Q: How do you handle small files?

Answer:

Combine small files into larger files
Use HAR (Hadoop Archive)
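SequenceFiles that store many small files as key-value pairs (file name as key, contents as value)
Larger block sizes reduce block count only for big files; they do not help files already smaller than one block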
Consider object storage instead
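
The idea behind combining small files can be sketched as one blob plus an index of offsets. This is a simplified illustration, not the HAR or SequenceFile on-disk format; `pack` and `read` are hypothetical helpers.

```python
import io


def pack(files):
    """Concatenate small files into one blob and return (blob, index)."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))   # (offset, length) in the blob
        blob.write(data)
    return blob.getvalue(), index


def read(blob, index, name):
    offset, length = index[name]
    return blob[offset:offset + length]


small_files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
blob, index = pack(small_files)
```

The file system then tracks one large file instead of three tiny ones, at the cost of an extra index lookup on every read.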

Q: How do you recover from disk failure?

Answer:

Detect via heartbeat/block report
Mark disk as failed
Re-replicate blocks from other replicas
Recovery runs automatically until the target replication factor is restored
Alert administrators
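
Re-replication planning can be sketched as below. For simplicity the sketch tracks replicas per node and assumes the failed disk held all of that node's copies of the listed blocks; `plan_rereplication` and the node names are illustrative.

```python
def plan_rereplication(block_locations, failed, live_nodes, target=3):
    """Return {block_id: (source, [new targets])} for under-replicated blocks."""
    plan = {}
    for block_id, holders in sorted(block_locations.items()):
        survivors = [n for n in holders if n != failed]
        missing = target - len(survivors)
        if missing <= 0 or not survivors:
            continue      # fully replicated, or no surviving copy to read from
        candidates = [n for n in live_nodes if n not in survivors and n != failed]
        plan[block_id] = (survivors[0], candidates[:missing])
    return plan


locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
plan = plan_rereplication(
    locations, failed="dn1", live_nodes=["dn2", "dn3", "dn4", "dn5"]
)
```

Only `blk_1` lost a replica, so it is the only block scheduled for copying; a real scheduler would also weigh rack placement and node load when choosing targets.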

Q: How do you handle network partitions?

Answer:

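Edit log writes need a majority quorum of JournalNodes, so at most one NameNode stays active
DataNodes cut off from the active NameNode are marked dead after missed heartbeats
Their blocks are re-replicated from replicas on the reachable side
Clients retry on failure
After the partition heals, DataNodes re-register and send block reports, and excess replicas are removed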
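
The majority-quorum rule can be sketched in a few lines, assuming a QJM-style setup in which an edit is committed only after a strict majority of journal nodes acknowledge it. The function names are illustrative.

```python
def has_quorum(acks, total_journal_nodes):
    """True when a strict majority of journal nodes acknowledged the write."""
    return acks > total_journal_nodes // 2


def can_stay_active(reachable, total_journal_nodes):
    # A NameNode that cannot reach a majority cannot commit edits,
    # so it must stop serving as the active node.
    return has_quorum(reachable, total_journal_nodes)


majority_side = can_stay_active(reachable=2, total_journal_nodes=3)
minority_side = can_stay_active(reachable=1, total_journal_nodes=3)
```

Because two disjoint groups cannot both hold a strict majority, at most one side of a partition can keep writing, which is what prevents split-brain. An even node count adds no tolerance: 2 of 4 is not a majority.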

Q: How do you optimize for read-heavy workloads?

Answer:

Increase replication factor
Cache frequently accessed blocks
Short-circuit local reads
Use SSDs for hot data
CDN for static content
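
Caching frequently accessed blocks can be sketched with a small LRU cache. `BlockCache` is a hypothetical class, and the `fetch` callable stands in for a disk or remote read.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU cache keyed by block ID."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch              # fallback loader, e.g. a disk or remote read
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data


cache = BlockCache(capacity=2, fetch=lambda block_id: f"data-{block_id}".encode())
for block_id in ["b1", "b2", "b1", "b3", "b2"]:
    cache.read(block_id)
```

With a capacity of two, the access pattern above evicts `b2` just before it is needed again, which shows why cache size should be matched to the hot working set.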

This guide covers common variations and deep-dive questions for distributed file system interviews.