How to Setup Hadoop 2.7.1 [Any Stable Version] on CentOS


This tutorial will guide you through setting up Hadoop 2.7.1 (or any recent stable version) as a single-node cluster on CentOS/RHEL 7/6/5 or Ubuntu. This article has been tested with CentOS 6 and Ubuntu 14.04 LTS. It does not cover the full range of Hadoop configuration; it covers only the basic configuration required to start working with Hadoop.

Step 1: Installing Java

Java is the primary requirement for running Hadoop on any system, so make sure Java is installed on your Linux OS (CentOS or Ubuntu) before proceeding.
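If Java is not installed yet, one common option is the OpenJDK package from your distribution's package manager (package names vary by release; the sample output below happens to come from an Oracle JDK 8 install, but OpenJDK 7/8 works just as well):
# yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel    (CentOS/RHEL)
# apt-get install openjdk-7-jdk    (Ubuntu)
Then verify the installation with the following command.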
# java -version 

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

Step 2: Creating Hadoop User

We recommend creating a dedicated user account for Hadoop. Create the account using the following commands.
# adduser hadoop
# passwd hadoop
Step 3. Enable sshd from setup (in the root account)
# In a terminal, run: setup
# Choose System services
# Select sshd

# Choose OK.

Then run the command below to start the sshd service.

# service sshd start
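If you also want sshd to start automatically at boot, the CentOS 6 / SysV way is chkconfig (on systemd-based releases such as CentOS 7 the equivalent is systemctl enable sshd):
# chkconfig sshd on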

After creating the account and enabling sshd, you also need to set up key-based (passwordless) SSH for the hadoop user itself. To do this, execute the following commands (starting from the root account):

  
$ su - hadoop 
$ ssh-keygen -t rsa
$ cd .ssh/ 
$ vi id_rsa.pub 
$ vi authorized_keys 
$ chmod 600 id_rsa.pub authorized_keys
$ ssh localhost
Explanation of the above commands:
$ su - hadoop [Switch to the hadoop user]
$ ssh-keygen -t rsa [Generate an ssh key to allow passwordless login to the hadoop user on localhost]
   (Press Enter to accept the default values)
$ vi id_rsa.pub [Copy the contents of id_rsa.pub (the public key)]
$ vi authorized_keys [Create a file named authorized_keys and paste the copied contents]
$ chmod 600 id_rsa.pub authorized_keys [Restrict permissions with chmod 600]
$ ssh localhost [Passwordless ssh login is established]
# Verify the key-based login: ssh localhost should not ask for a password, but the first time it will prompt you to add the host's RSA key to the list of known hosts.
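As an alternative to copying the key by hand with vi, you can append the public key to authorized_keys in one step; the two commands below achieve the same result as the vi steps above:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys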
Step 4. Downloading Hadoop 2.7.1 (in the hadoop user account)

Select your desired Hadoop version from the Apache Hadoop downloads page, and make sure you select the "binary" link, not the source package. Download the Hadoop 2.7.1 tarball from there.
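For example, the Hadoop 2.7.1 binary tarball can be fetched from the Apache archive into the Downloads directory used in the next commands (the URL below assumes the 2.7.1 release is still hosted in the archive):
$ wget -P ~/Downloads https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz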
 
$ cd Downloads
$ tar xzf hadoop-2.7.1.tar.gz
$ mv hadoop-2.7.1 /home/hadoop/hadoop
Step 5. Configure Hadoop Pseudo-Distributed Mode
 

5.1. Setup Hadoop Environment Variables

First we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following lines at the end of the file.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply the changes to the current running environment:
$ source ~/.bashrc
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. Change the Java path to match the installation on your system; it may vary depending on your operating system version and installation source, so make sure you use the correct path.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Command to find Java path.
# update-alternatives --display java
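At this point you can quickly sanity-check the environment by asking Hadoop for its version; if the paths above are correct, this should report 2.7.1:
$ hadoop version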

5.2 Setup Hadoop Configuration Files

Hadoop has many configuration files, which need to be configured according to the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic single-node cluster. First, navigate to the following location:
$ cd $HADOOP_HOME/etc/hadoop

Edit core-site.xml

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
</configuration>

Edit hdfs-site.xml

<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

Copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml (Hadoop reads mapred-site.xml, not the .template file)
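Assuming you are still in the $HADOOP_HOME/etc/hadoop directory from the previous step:
$ cp mapred-site.xml.template mapred-site.xml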

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

Edit yarn-site.xml

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
</configuration>

5.3. Format Namenode

Now format the NameNode using the following command, and make sure the storage directory is reported as successfully formatted in the output.
$ hdfs namenode -format
Sample output:
15/11/13 14:48:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
...
...
...
15/11/13 14:48:41 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
15/11/13 14:48:42 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/11/13 14:48:42 INFO util.ExitUtil: Exiting with status 0
15/11/13 14:48:42 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Step 6. Start Hadoop Cluster

Let's start the Hadoop cluster using the scripts provided by Hadoop. Navigate to the Hadoop sbin directory and execute the scripts one by one.
$ cd $HADOOP_HOME/sbin
Now start all the Hadoop services with the start-all.sh script.
$ start-all.sh
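In Hadoop 2.x, start-all.sh is deprecated and simply invokes the per-service scripts, so you can equivalently start HDFS and YARN explicitly. Afterwards, jps is a quick way to confirm the daemons are running; on a single-node setup like this one it should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager processes.
$ start-dfs.sh
$ start-yarn.sh
$ jps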

Step 7. Access Hadoop Services in Browser

By default, the Hadoop NameNode web interface starts on port 50070. Access your server on port 50070 in your favorite web browser.
http://localhost:50070/
[Screenshot: NameNode web UI on port 50070]
Now access port 8088 (the ResourceManager web UI) to get information about the cluster and all applications.
http://localhost:8088/
[Screenshot: cluster and applications web UI on port 8088]
Access port 50090 to get details about the Secondary NameNode.
http://localhost:50090/
[Screenshot: Secondary NameNode web UI on port 50090]
Access port 50075 to get details about the DataNode.
http://localhost:50075/

Step 8. Test Hadoop Single Node Setup

8.1 Make the required HDFS directories using the following commands.
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hadoop
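To confirm that HDFS is usable, you can copy a local file into the new directory and list it; the file chosen below is just an example:
$ hdfs dfs -put ~/.bashrc /user/hadoop/
$ hdfs dfs -ls /user/hadoop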
I hope this tutorial helps you. If you have any questions or problems, please let me know.
Happy Hadooping with Patrick!
