Trending September 2023 # Architecture Of Hdfs Federation With Benefits # Suggested October 2023 # Top 12 Popular |


This article, Architecture of HDFS Federation with Benefits, was updated in September 2023.

What is HDFS Federation?


HDFS Federation Architecture

HDFS is the file system of Hadoop, designed to store very large files. Its original architecture follows a master/slave model: the master is the name node and the slaves are the data nodes. The name node stores the metadata, that is, the information about files, directories, data blocks, and the data nodes that hold them. Because a single name node, and therefore a single namespace, controls all the data nodes, it is a single point of failure: if it fails, the entire system is down.

HDFS Federation was introduced to overcome the limitations of this design, which include limited namespace scalability, lack of isolation, tight coupling between the namespace and block storage, and limited performance. It enhances the existing HDFS architecture by supporting multiple name nodes, so the namespace can be expanded horizontally. The name nodes are independent of one another, so even if one name node fails, the namespaces served by the other name nodes stay available.

In this architecture, the data nodes sit at the bottom layer and serve as common storage for the blocks of all the name nodes. Each data node registers with every name node in the cluster and periodically sends heartbeats and block reports to them. Each namespace has its own set of blocks, called a block pool.
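The relationship between name nodes, block pools, and data nodes can be illustrated with a small toy model. This is only a sketch in Python for intuition, not how Hadoop is implemented: the class names, the `alive` flag, the block pool IDs, and the `/user` and `/logs` namespaces are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy name node: owns one namespace and the block pool that backs it."""
    namespace: str
    block_pool_id: str
    registered_datanodes: set = field(default_factory=set)
    alive: bool = True


@dataclass
class DataNode:
    """Toy data node: stores blocks for every block pool in the cluster."""
    name: str
    # block pool id -> set of block ids held on this data node
    blocks: dict = field(default_factory=dict)

    def register_with_all(self, namenodes):
        """Register with every name node, not just one."""
        for nn in namenodes:
            nn.registered_datanodes.add(self.name)
            self.blocks.setdefault(nn.block_pool_id, set())


def available_namespaces(namenodes):
    """Namespaces that can still be served; name nodes fail independently."""
    return [nn.namespace for nn in namenodes if nn.alive]


# Hypothetical two-namespace cluster with three shared data nodes.
namenodes = [NameNode("/user", "BP-1"), NameNode("/logs", "BP-2")]
datanodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
for dn in datanodes:
    dn.register_with_all(namenodes)

namenodes[0].alive = False  # simulate losing the name node that serves /user
print(available_namespaces(namenodes))  # ['/logs']
```

In the model, losing one name node removes only its own namespace from the list, while every data node keeps a slot for each block pool.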

HDFS Federation is built around two main components, the block pool and the namespace. These and the related concepts are explained below.

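1. Block Pool

A block pool is the set of blocks that belongs to a single namespace. The data nodes store blocks for all the block pools in the cluster.

2. Namespace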

The namespace is managed by a name node and holds the metadata: information about the directories, files, and blocks in the file system. It supports all namespace-related file system operations, such as creating, deleting, renaming, and listing files and directories.

3. Block Storage Service

The block storage service has two parts: block management, which is performed in the name node, and storage.

Block management: It handles data node registration and processes heartbeats and other signals from the data nodes. It maintains block information, such as the location where each block is stored, and supports block operations such as create, delete, modify, and get block location. It also manages replica placement, re-replicates blocks that have become under-replicated (for example after a failure), and deletes over-replicated blocks.

Storage: It is provided by the data nodes, which store the blocks of the block pools and provide read/write (I/O) access to them.
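The replication bookkeeping described above can be sketched as a comparison between the number of live replicas of each block and a target. This is a simplified illustration, not Hadoop's actual algorithm: the function name, the block IDs, and the target of 3 replicas are assumptions chosen for the example.

```python
def replication_actions(block_locations, target_replicas=3):
    """Decide, per block, whether replicas must be added or removed.

    block_locations maps a block id to the data nodes that currently
    report holding a replica of that block.
    """
    actions = {}
    for block_id, datanodes in block_locations.items():
        count = len(set(datanodes))  # ignore duplicate reports
        if count < target_replicas:
            actions[block_id] = ("replicate", target_replicas - count)
        elif count > target_replicas:
            actions[block_id] = ("delete_excess", count - target_replicas)
    return actions


# Hypothetical block reports collected from the data nodes.
locations = {
    "blk_1": ["dn1", "dn2", "dn3"],         # exactly at target
    "blk_2": ["dn1"],                       # under-replicated
    "blk_3": ["dn1", "dn2", "dn3", "dn4"],  # over-replicated
}
print(replication_actions(locations))
# {'blk_2': ('replicate', 2), 'blk_3': ('delete_excess', 1)}
```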

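4. Cluster-ID

A ClusterID identifies all the nodes in the cluster. It is either provided or generated automatically when a name node is formatted, and the same ID is used when formatting the other name nodes of the cluster.

5. Configuration of Federation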

Federation is designed to be backward compatible: an existing single-name-node setup keeps working without any configuration changes. The main idea is that all the nodes in the cluster can share the same configuration, so there is no need to deploy different configurations for different types of nodes. To make this possible, Federation uses the NameServiceID, an identifier under which a name node and its secondary or backup nodes are configured in the form of parameters.

In the first step, add the parameter dfs.nameservices and set it to a comma-separated list of NameServiceIDs. Data nodes use this list to determine the name nodes in the cluster. In the second step, for each name node and its secondary/backup node, add the corresponding configuration parameters, suffixed with the NameServiceID, to the common configuration file.
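As a rough illustration of the two steps, the Python sketch below builds the property set for a federated cluster and renders it in the hdfs-site.xml property format. The property names (dfs.nameservices, dfs.namenode.rpc-address, dfs.namenode.http-address, dfs.namenode.secondary.http-address) are standard HDFS parameters. The helper functions, the host names, and the port numbers are assumptions for the example and should be adapted to the Hadoop version in use.

```python
from xml.sax.saxutils import escape


def federation_properties(namenodes, rpc_port=8020, http_port=9870,
                          secondary_http_port=9868):
    """Build federation properties from {NameServiceID: {"namenode": host, ...}}.

    Ports are assumed defaults; check them against your Hadoop version.
    """
    # Step 1: list every NameServiceID in dfs.nameservices.
    props = {"dfs.nameservices": ",".join(namenodes)}
    # Step 2: add per-name-node parameters suffixed with the NameServiceID.
    for ns_id, hosts in namenodes.items():
        props[f"dfs.namenode.rpc-address.{ns_id}"] = f"{hosts['namenode']}:{rpc_port}"
        props[f"dfs.namenode.http-address.{ns_id}"] = f"{hosts['namenode']}:{http_port}"
        if "secondary" in hosts:
            props[f"dfs.namenode.secondary.http-address.{ns_id}"] = (
                f"{hosts['secondary']}:{secondary_http_port}"
            )
    return props


def to_hdfs_site_xml(props):
    """Render the properties as an hdfs-site.xml style document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical cluster with two name services; host names are placeholders.
cluster = {
    "ns1": {"namenode": "nn-host1", "secondary": "snn-host1"},
    "ns2": {"namenode": "nn-host2"},
}
props = federation_properties(cluster)
print(to_hdfs_site_xml(props))
```

Because every parameter carries its NameServiceID as a suffix, the same generated file can be distributed to all nodes in the cluster.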

Benefits of HDFS Federation

HDFS Federation overcomes limitations of the single-name-node HDFS architecture in areas such as scalability, isolation, tight coupling, availability, and performance. Its main benefits are as follows:

Namespace scalability: Federation allows more than one name node, so whenever the need arises the namespace can be scaled horizontally by adding name nodes, and with them namespaces, to an existing cluster.

Isolation: A single name node offers no isolation when it is shared by multiple users. With multiple name nodes, different users and applications can be isolated in different namespaces according to a mapping.

Performance: File system throughput is no longer limited by a single name node. Because read/write operations are spread across multiple name nodes, adding name nodes increases overall throughput.
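The mapping mentioned under isolation can be pictured as a table from path prefixes to namespaces, resolved by longest matching prefix. The sketch below is a toy model of that idea, not the code of any HDFS component: the function name, the mount points, and the NameServiceIDs are assumptions for illustration.

```python
def resolve_namespace(path, mount_table):
    """Return the NameServiceID whose mount point is the longest prefix of path."""
    best = None
    for mount_point, ns_id in mount_table.items():
        prefix = mount_point.rstrip("/")
        # Match whole path components only, so /user does not capture /username.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, ns_id)
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return best[1]


# Hypothetical mapping of directory trees to name services.
mount_table = {
    "/user": "ns1",
    "/logs": "ns2",
    "/user/analytics": "ns3",
}
print(resolve_namespace("/user/alice/data.csv", mount_table))  # ns1
print(resolve_namespace("/user/analytics/q1", mount_table))    # ns3
print(resolve_namespace("/logs", mount_table))                 # ns2
```

With such a table, the applications working under /user/analytics are served by a different name node than the rest of /user, so heavy load in one namespace does not slow down the others.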


To wrap up, this article discussed the HDFS Federation architecture: why it was introduced, what its components are, what benefits it offers over the original architecture, and how it reduces the impact of a single name node failure. I hope you now have a better understanding of where to apply HDFS Federation, how to configure it, and what its benefits are.
