How to Set Up a Multi-Node Hadoop 2.x Cluster on CentOS 6.3

In this tutorial we are going to install a multi-node Hadoop 2.x cluster on CentOS 6.3. We need at least three nodes: one will be the master node and the others will be slave nodes. I'm using three nodes to keep this guide as simple as possible. The master node will run the NameNode and the YARN ResourceManager, and each slave node will run a DataNode and a NodeManager (in Hadoop 2.x these replace the old JobTracker and TaskTracker). With the default configuration the Secondary NameNode also starts on the master.


Setup Details:

Hadoop Master: 192.168.0.8 (namenode)
Hadoop Slave:  192.168.0.9 (datanode1)
Hadoop Slave:  192.168.0.10 (datanode2)

Step 1: Installing Java (On All Nodes: namenode, datanode1, datanode2)

Download and install Java on CentOS first if it is not already present.
Java is the primary requirement for running Hadoop on any system, so confirm that Java is installed using the following command.

#java -version
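If Java is not installed, one option is the OpenJDK package from the CentOS repositories (the package name below is an assumption; use whichever JDK your environment standardizes on, and note that Hadoop 2.7.1 requires Java 7 or later):

#yum install java-1.8.0-openjdk-devel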

Step 2: Creating Hadoop User (On All Nodes: namenode, datanode1, datanode2)

We recommend creating a dedicated user account for Hadoop. Create the account and set its password with the following commands.

#adduser hadoop
#passwd hadoop
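You can verify that the account was created with id (an optional quick check):

#id hadoop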
Step 3: Enable sshd from Setup (On All Nodes: namenode, datanode1, datanode2)
# Go to a terminal and type setup
# Choose System Services
# Select sshd
# Choose OK

Then type the command below to start the sshd service.
# service sshd start
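If you also want sshd to start automatically after a reboot (optional, not strictly required for this tutorial):
# chkconfig sshd on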

Step 4: Add FQDN Mapping (On All Nodes: namenode, datanode1, datanode2)

Edit the /etc/hosts file on the master and both slave servers and add the following entries.

#vi /etc/hosts

192.168.0.8 namenode 
192.168.0.9  datanode1 
192.168.0.10 datanode2 
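To verify the mapping, ping each node by name from every machine; each hostname should resolve to the address above:

$ping -c 1 namenode
$ping -c 1 datanode1
$ping -c 1 datanode2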


Step 5: Configuring Key-Based Login (On All Nodes: namenode, datanode1, datanode2)

The hadoop user must be able to ssh to every node (including itself) without a password. Use the following commands to configure passwordless login between all Hadoop cluster servers. Note: run the key-generation and scp steps on all three nodes before appending the keys to authorized_keys, since each node needs the other nodes' public keys on disk first.
NAMENODE :
$su - hadoop
$ssh-keygen -t rsa
$ cd .ssh
$ cp id_rsa.pub id_rsa_namenode.pub
$scp id_rsa_namenode.pub hadoop@datanode1:/home/hadoop/.ssh
$scp id_rsa_namenode.pub hadoop@datanode2:/home/hadoop/.ssh
$cat id_rsa_namenode.pub >> authorized_keys
$cat id_rsa_datanode1.pub >> authorized_keys
$cat id_rsa_datanode2.pub >> authorized_keys
$chmod 640 authorized_keys
$ cd ..
$chmod 700 .ssh
 
DATANODE1:
 
$su - hadoop
$ssh-keygen -t rsa
$ cd .ssh
$ cp id_rsa.pub id_rsa_datanode1.pub
$scp id_rsa_datanode1.pub hadoop@namenode:/home/hadoop/.ssh
$scp id_rsa_datanode1.pub hadoop@datanode2:/home/hadoop/.ssh
$cat id_rsa_namenode.pub >> authorized_keys
$cat id_rsa_datanode1.pub >> authorized_keys
$cat id_rsa_datanode2.pub >> authorized_keys
$chmod 640 authorized_keys
$ cd ..
$chmod 700 .ssh
 
DATANODE2:
 
$su - hadoop
$ssh-keygen -t rsa
$ cd .ssh
$ cp id_rsa.pub id_rsa_datanode2.pub
$scp id_rsa_datanode2.pub hadoop@namenode:/home/hadoop/.ssh
$scp id_rsa_datanode2.pub hadoop@datanode1:/home/hadoop/.ssh
$cat id_rsa_namenode.pub >> authorized_keys
$cat id_rsa_datanode1.pub >> authorized_keys
$cat id_rsa_datanode2.pub >> authorized_keys
$chmod 640 authorized_keys
$ cd ..
$chmod 700 .ssh
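Once all three nodes are done, verify passwordless login from each node; the commands below should print the remote hostname without prompting for a password:

$ssh datanode1 hostname
$ssh datanode2 hostname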
 
 
Step 6: Downloading Hadoop 2.7.1 (On namenode Only)
Select your desired Hadoop version from the Apache downloads page, and mind that you select the "binary" link, not the source.

Download the Hadoop 2.7.1 binary tarball (hadoop-2.7.1.tar.gz) to your Downloads directory.
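Alternatively, you can fetch the tarball from the command line (the Apache archive URL below is an assumption; confirm it against the official downloads page):

$wget -P ~/Downloads https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz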
$cd Downloads
$tar xzf hadoop-2.7.1.tar.gz
$mv hadoop-2.7.1 /home/hadoop/hadoop
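If you performed the extraction as root rather than as the hadoop user, hand the tree over to the hadoop account (optional; skip this if you ran the commands above as hadoop):

#chown -R hadoop:hadoop /home/hadoop/hadoop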
 
Step 7: Configure the Hadoop Environment (On All Nodes: namenode, datanode1, datanode2)
 

7.1 Set Up Hadoop Environment Variables

First we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following lines at the end of the file.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply the changes to the current running environment:
$source ~/.bashrc
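A quick way to confirm that the variables took effect:
$echo $HADOOP_HOME
/home/hadoop/hadoop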
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. Change the Java path to match the installation on your system; it may vary with your operating system version and installation source, so make sure you use the correct path.
export JAVA_HOME=/usr/java/jdk1.8.0_65
# Command to find Java path.
#update-alternatives --display java
Step 8: Configure Hadoop (namenode)
First edit the Hadoop configuration files and make the following changes on the master node, which is namenode.
Create the NameNode data directory first (it must match the dfs.namenode.name.dir value below):
$ mkdir -p ~/hadoop_data/hdfs/namenode
Then navigate to the location below:
$cd $HADOOP_HOME/etc/hadoop
8.1 Edit hdfs-site.xml (add the properties inside the <configuration> element; replication is set to 2 because this cluster has two datanodes)
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
</property>
8.2 Edit core-site.xml (replace namenode below with your Hadoop master node's hostname)
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value>
</property>
8.3 Edit mapred-site.xml
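Hadoop 2.x ships only a template for this file, so create mapred-site.xml from the template before editing it:
$ cp mapred-site.xml.template mapred-site.xml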
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
8.4 Edit yarn-site.xml (replace namenode below with your Hadoop master node's hostname)
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>namenode:8025</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>namenode:8030</value>
</property>
<property>
        <name>yarn.resourcemanager.address</name>
        <value>namenode:8050</value>
</property>
Step 9: Configure Master and Slave Lists (On namenode Only)
Go to the Hadoop configuration folder on namenode and make the following settings.
#su - hadoop
$ cd /home/hadoop/hadoop/etc/hadoop
$ vi masters
namenode
$vi slaves
datanode1
datanode2
Step 10: Copy Hadoop Source to Slave Servers (datanode1, datanode2)
#su - hadoop
$cd /home/hadoop
$scp -r hadoop datanode1:/home/hadoop
$scp -r hadoop datanode2:/home/hadoop
Step 11: Configure Hadoop (datanode1, datanode2)
Create the DataNode data directory first on both datanodes (it must match the dfs.datanode.data.dir value below):
$ mkdir -p ~/hadoop_data/hdfs/datanode
Then navigate to the location below:

$cd $HADOOP_HOME/etc/hadoop
11.1 Edit hdfs-site.xml (note that on the datanodes the property is dfs.datanode.data.dir, not the NameNode directory)
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
</property>
Step 12: Format the NameNode (On namenode Only)
#su - hadoop
$cd /home/hadoop/hadoop
$bin/hdfs namenode -format

The command prints a series of STARTUP_MSG and INFO lines; on success the output ends with messages like:

INFO common.Storage: Storage directory /home/hadoop/hadoop_data/hdfs/namenode has been successfully formatted.
INFO namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at namenode/192.168.0.8

Step 13: Start Hadoop Services

Let's start the Hadoop cluster using the scripts provided by Hadoop. Just navigate to your Hadoop sbin directory and execute the scripts one by one.
$cd $HADOOP_HOME/sbin
Now start the HDFS and YARN services. In Hadoop 2.x the combined start-all.sh script still works but is deprecated, so start them separately:
$start-dfs.sh
$start-yarn.sh
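To confirm the daemons are running, use jps on each node; with the configuration above you should see roughly NameNode, SecondaryNameNode, and ResourceManager on the master, and DataNode and NodeManager on each slave. You can also browse the NameNode web UI at http://namenode:50070 and the ResourceManager UI at http://namenode:8088 (the default ports in Hadoop 2.7):

$jps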
I hope this tutorial helps you. If you have any questions or problems, please let me know.
Happy Hadooping with Patrick!
