How Does Namenode Handle Datanode Failure in Hadoop Distributed File System?
The Hadoop file system is a master/slave file system in which the Namenode works as the master and the Datanodes work as slaves. The Namenode is critical to the Hadoop file system because it acts as the central component of HDFS. If the Namenode goes down, the whole Hadoop cluster becomes inaccessible and is considered dead. Datanodes store the actual data and work as instructed by the Namenode. A Hadoop file system can have multiple Datanodes but only one active Namenode.
Basic operations of Namenode:
- Namenode maintains and manages the Datanodes and assigns tasks to them.
- Namenode does not store the actual file data.
- Namenode stores metadata about the actual data, such as file name, path, number of data blocks, block IDs, block locations, number of replicas and other Datanode-related information.
- Namenode manages all client requests (read, write) for the actual data files.
- Namenode executes file system namespace operations such as opening and closing files and renaming files and directories.
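To make the metadata-only role concrete, here is a minimal Python sketch of the kind of bookkeeping listed above. It is a toy model, not Hadoop's actual implementation: the names `BlockInfo`, `FileEntry` and `ToyNamespace` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BlockInfo:
    block_id: int
    # Names of the Datanodes that hold a replica of this block
    locations: Set[str] = field(default_factory=set)


@dataclass
class FileEntry:
    path: str
    replication: int
    blocks: List[BlockInfo] = field(default_factory=list)


class ToyNamespace:
    """Hypothetical in-memory namespace: path -> FileEntry.

    Only metadata lives here; no file bytes are stored.
    """

    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}

    def create(self, path: str, replication: int = 3) -> FileEntry:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = FileEntry(path, replication)
        return self.files[path]

    def add_block(self, path: str, block_id: int, locations) -> None:
        self.files[path].blocks.append(BlockInfo(block_id, set(locations)))

    def rename(self, src: str, dst: str) -> None:
        # A rename touches metadata only; block data stays on the Datanodes
        entry = self.files.pop(src)
        entry.path = dst
        self.files[dst] = entry

    def block_locations(self, path: str):
        # What a client would need before reading: block IDs and where they live
        return [(b.block_id, sorted(b.locations)) for b in self.files[path].blocks]
```

For example, after `ns.create("/logs/a.txt")` and `ns.add_block("/logs/a.txt", 1, ["dn1", "dn2", "dn3"])`, a call to `ns.rename("/logs/a.txt", "/logs/b.txt")` changes only the path key, while the block list and its locations stay the same.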
Basic operations of Datanode:
- Datanodes are responsible for storing the actual data.
- Upon instruction from the Namenode, a Datanode performs operations such as creation, replication and deletion of data blocks.
- When one Datanode goes down, the Hadoop cluster keeps working because its blocks are replicated on other Datanodes.
- All Datanodes in the Hadoop cluster are synchronized in such a way that they can communicate with each other for various operations.
What happens if one of the Datanodes fails in HDFS?
Namenode periodically receives a heartbeat and a block report from each Datanode in the cluster. By default, every Datanode sends a heartbeat message to the Namenode every 3 seconds. The heartbeat simply tells the Namenode whether that Datanode is working properly; in other words, whether it is alive or not.
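The heartbeat bookkeeping can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class `HeartbeatMonitor` and its methods are hypothetical names, timestamps are plain numbers of seconds, and the timeout is passed in by the caller rather than read from any Hadoop configuration.

```python
HEARTBEAT_INTERVAL_S = 3  # heartbeat interval stated in the text


class HeartbeatMonitor:
    """Toy bookkeeping: remembers when each Datanode was last heard from."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # Datanode name -> time of last heartbeat (seconds)

    def record_heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def is_alive(self, datanode, now):
        # A node never heard from is not alive; otherwise it is alive as long
        # as its silence has not exceeded the timeout
        last = self.last_seen.get(datanode)
        return last is not None and (now - last) <= self.timeout_s

    def dead_nodes(self, now):
        return sorted(dn for dn in self.last_seen if not self.is_alive(dn, now))
```

With `monitor = HeartbeatMonitor(timeout_s=600)`, a Datanode that keeps calling `record_heartbeat` every `HEARTBEAT_INTERVAL_S` seconds stays alive, while one that falls silent shows up in `dead_nodes(now)` once its last heartbeat is more than 600 seconds old.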
A block report contains information about all the blocks that reside on the corresponding Datanode. When the Namenode doesn’t receive any heartbeat message from a particular Datanode for about 10 minutes (by default; 10 minutes 30 seconds with the stock settings), it considers that Datanode dead or failed. Since the blocks that Datanode held are now under-replicated, the system starts re-replicating them from one Datanode to another, using the block information from the failed Datanode's block reports to determine which blocks are affected. The replicated data is transferred directly from one Datanode to another without passing through the Namenode.
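The recovery step can be illustrated with a small Python sketch. It assumes a simplified model: the function name `plan_rereplication`, the dictionary layout and the "first available node" target choice are all hypothetical, and real HDFS block placement takes more into account (rack awareness, for instance). The function only produces copy instructions, mirroring the point that the data itself moves between Datanodes.

```python
def plan_rereplication(block_locations, live_nodes, dead_node, target_replicas=3):
    """Return (block_id, source, target) copy tasks after dead_node is declared dead.

    block_locations: dict mapping block_id -> set of Datanode names holding a replica
    live_nodes: iterable of Datanode names still sending heartbeats
    """
    live = set(live_nodes) - {dead_node}
    tasks = []
    for block_id in sorted(block_locations):
        holders = block_locations[block_id] & live  # replicas that survived
        missing = target_replicas - len(holders)
        if missing <= 0 or not holders:
            # Either fully replicated, or no surviving replica to copy from
            continue
        source = min(holders)  # any surviving holder can serve as the source
        candidates = sorted(live - holders)  # live nodes without this block
        for target in candidates[:missing]:
            tasks.append((block_id, source, target))
    return tasks
```

For instance, if block 1 lives on `dn1`, `dn2` and `dn3` and `dn1` is declared dead in a cluster that also has `dn4` and `dn5`, the plan contains one task asking `dn2` to copy block 1 to `dn4`. Blocks that never had a replica on the failed node produce no tasks.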