Overview
Hadoop is an open source framework for writing and running distributed computing programs. It comprises HDFS (the Hadoop Distributed File System) and MapReduce (a programming framework written in Java). In Hadoop 1, only MapReduce programs (written in Java, or in other languages such as Python via Hadoop Streaming) could run on the data stored in HDFS, so Hadoop was suited only for batch-processing computations.
In Hadoop 2, YARN (Yet Another Resource Negotiator) was introduced, which provides APIs for requesting and allocating resources in the cluster. These APIs let applications such as Spark, Tez, and Storm process large-scale, fault-tolerant data stored in HDFS. The Hadoop ecosystem now fits batch, near-real-time, and real-time processing computations.
Today, I will walk through the steps to set up Hadoop 2 in pseudo-distributed mode on an Ubuntu machine.
Prerequisites
- Hardware: an Ubuntu machine (or VM) with enough memory and disk for a single-node cluster.
- Java: a working JDK (Hadoop 2.8 requires Java 7 or later).
You can check the Java version with the command below.
$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
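If the java command is not found, install a JDK first. The paths later in this post suggest Oracle Java 8 installed via a PPA; as a sketch, on recent Ubuntu releases you can instead install OpenJDK 8 from the default repositories (the exact package name depends on your Ubuntu version):
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk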
Steps for Hadoop Setup on a Single Machine
Step 1: Create a dedicated hadoop user.
1.1 Create a group hadoop.
pooja@prod01:~$ sudo groupadd hadoop
1.2 Create a user hduser in the group hadoop.
pooja@prod01:~$ sudo useradd -G hadoop -m hduser
Note: -m creates the user's home directory.
1.3 Make sure the home directory for hduser was created.
pooja@prod01:~$ sudo ls -ltr /home/
total 8
drwxr-xr-x 28 pooja pooja 4096 Aug 24 09:23 pooja
drwxr-xr-x 2 hduser hduser 4096 Aug 24 13:34 hduser
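You can also confirm that hduser was added to the hadoop group (the uid and gid values will differ on your machine):
pooja@prod01:~$ id hduser
The output should list hadoop among the user's groups.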
1.4 Define password for hduser.
pooja@prod01:~$ sudo passwd hduser
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
1.5 Log-in as hduser
pooja@prod01:~$ su - hduser
Password:
hduser@prod01:~$ pwd
/home/hduser
Step 2: Set up Passwordless SSH
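Hadoop's start-up scripts (start-dfs.sh, start-yarn.sh) launch the daemons over ssh, even on a single machine, so hduser must be able to ssh to localhost without a password.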
2.1 Generate an SSH key pair without a passphrase.
hduser@prod01:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
6c:c0:f4:c2:d1:d8:40:41:2b:e8:7b:8d:d4:c7:2c:62 hduser@prod01
The key's randomart image is:
+--[ RSA 2048]----+
| oB* |
| . +.+o |
| . . * . |
| . o * |
| . E o S |
| + ++ |
| . o . |
| . |
| |
+-----------------+
2.2 Add the generated public key to the authorized keys.
hduser@prod01:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
2.3 Restrict the authorized keys file to owner read/write.
hduser@prod01:~$ chmod 0600 ~/.ssh/authorized_keys
2.4 Verify that passwordless ssh is working.
Note: At the host-authenticity prompt, type yes as shown below.
hduser@prod01:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is ad:3c:12:c3:b1:d2:60:a4:8f:76:00:1e:15:b3:f4:41.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 4.2.0-27-generic x86_64)
...Snippet
$
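Type exit to close this nested SSH session and return to your original shell:
hduser@prod01:~$ exit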
Step 3: Download Hadoop 2.8.1
3.1 Download the Hadoop 2.8.1 tar file from an Apache download mirror, or use the command below.
hduser@prod01:~$ wget http://apache.claz.org/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
--2017-08-24 14:01:31-- http://apache.claz.org/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
Resolving apache.claz.org (apache.claz.org)... 74.63.227.45
Connecting to apache.claz.org (apache.claz.org)|74.63.227.45|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 424555111 (405M) [application/x-gzip]
Saving to: ‘hadoop-2.8.1.tar.gz’
100%[=====================================================================================================>] 424,555,111 1.51MB/s in 2m 48s
2017-08-24 14:04:19 (2.41 MB/s) - ‘hadoop-2.8.1.tar.gz’ saved [424555111/424555111]
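Optionally, verify the integrity of the download. Apache publishes checksum files alongside each release (for Hadoop 2.8.1, typically a .mds file on the mirror or archive site); compute the local checksum and compare it with the published value:
hduser@prod01:~$ sha256sum hadoop-2.8.1.tar.gz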
3.2 Untar the downloaded tar file.
hduser@prod01:~$ tar -xvf hadoop-2.8.1.tar.gz
...Snippet
hadoop-2.8.1/share/doc/hadoop/images/external.png
hadoop-2.8.1/share/doc/hadoop/images/h5.jpg
hadoop-2.8.1/share/doc/hadoop/index.html
hadoop-2.8.1/share/doc/hadoop/project-reports.html
hadoop-2.8.1/include/
hadoop-2.8.1/include/hdfs.h
hadoop-2.8.1/include/Pipes.hh
hadoop-2.8.1/include/TemplateFactory.hh
hadoop-2.8.1/include/StringUtils.hh
hadoop-2.8.1/include/SerialUtils.hh
hadoop-2.8.1/LICENSE.txt
hadoop-2.8.1/NOTICE.txt
hadoop-2.8.1/README.txt
3.3 Create the soft link.
hduser@prod01:~$ ln -s hadoop-2.8.1 hadoop
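The soft link lets you upgrade Hadoop later by re-pointing the link instead of editing every path that references /home/hduser/hadoop. Verify it with ls -l hadoop; the listing should show hadoop -> hadoop-2.8.1.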
Step 4: Configure Hadoop in Pseudo-Distributed Mode.
In the Hadoop configuration below, we add only the minimum required properties; you can add more as needed.
4.1 Set up the environment variable.
4.1.1 Edit .bashrc and add Hadoop to the PATH as shown below:
hduser@pooja:~$ vi .bashrc
#Add below lines to .bashrc
export HADOOP_HOME=/home/hduser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
4.1.2 Source .bashrc in the current login session.
hduser@pooja:~$ source ~/.bashrc
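To confirm the environment is set, check that the hadoop binary is now on the PATH and reports its version:
hduser@pooja:~$ hadoop version
The first line of output should read Hadoop 2.8.1.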
4.2 Hadoop configuration file changes
4.2.1 Changes to hadoop-env.sh (set $JAVA_HOME to installation directory)
4.2.1.1 Find JAVA_HOME on machine.
hduser@pooja:~$ which java
/usr/bin/java
hduser@pooja:~$ readlink -f /usr/bin/java
/usr/lib/jvm/java-8-oracle/jre/bin/java
Note: /usr/lib/jvm/java-8-oracle is the JAVA_HOME directory (the resolved path minus the trailing /jre/bin/java).
4.2.1.2 Edit hadoop-env.sh and set $JAVA_HOME.
hduser@prod01:~$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Edit the file and change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Note: use the JAVA_HOME path found in step 4.2.1.1.
4.2.2 Changes to core-site.xml
hduser@prod01:~$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
Add the configuration property (NameNode URI: fs.default.name).
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
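Note: fs.default.name still works in Hadoop 2 but is deprecated; the current name for this property is fs.defaultFS.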
4.2.3 Changes to hdfs-site.xml
hduser@prod01:~$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the configuration properties (replication factor: dfs.replication; NameNode directory: dfs.name.dir; DataNode directory: dfs.data.dir).
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hduser/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hduser/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
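Note: dfs.replication is set to 1 because a pseudo-distributed cluster has only one DataNode. The two directory properties also have newer Hadoop 2 names, dfs.namenode.name.dir and dfs.datanode.data.dir; the old names still work but are deprecated.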
4.2.4 Changes to mapred-site.xml
Here, we first copy mapred-site.xml.template to mapred-site.xml and then add the property to it.
hduser@prod01:~$ cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
hduser@prod01:~$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the configuration property (mapreduce.framework.name)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Note: If you don't set this, MapReduce jobs run with the local job runner and the ResourceManager UI (http://localhost:8088) will not show any jobs.
4.2.5 Changes to yarn-site.xml
hduser@prod01:~$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the configuration property
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
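The mapreduce_shuffle auxiliary service lets the NodeManager serve map outputs to reduce tasks during the shuffle phase; without it, MapReduce jobs submitted to YARN will fail.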
Step 5: Format and Verify the HDFS File System
5.1 Format HDFS file system
hduser@pooja:~$ hdfs namenode -format
...Snippet
17/08/24 16:08:36 INFO util.GSet: capacity = 2^15 = 32768 entries
17/08/24 16:08:36 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1601791069-127.0.1.1-1503616
17/08/24 16:08:37 INFO common.Storage: Storage directory /home/hduser/hadoopdata/hdfs/namenode has been successfully formatted.
17/08/24 16:08:37 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hduser/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
17/08/24 16:08:37 INFO namenode.FSImageFormatProtobuf: Image file /home/hduser/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
17/08/24 16:08:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/08/24 16:08:37 INFO util.ExitUtil: Exiting with status 0
17/08/24 16:08:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at pooja/127.0.1.1
************************************************************/
5.2 Verify the format (make sure the hadoopdata/hdfs/* folders were created).
hduser@prod01:~$ ls -ltr hadoopdata/hdfs/
total 4
drwxrwxr-x 3 hduser hduser 4096 Aug 24 16:09 namenode
Note: This is the same path specified in the hdfs-site.xml property dfs.name.dir.
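Only the namenode directory exists at this point; the datanode directory (from dfs.data.dir) is created when the DataNode starts in the next step.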
Step 6: Start single node cluster
We will start the Hadoop cluster using the Hadoop start-up scripts.
6.1 Start HDFS
hduser@prod01:~$ start-dfs.sh
17/08/24 16:38:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hduser/hadoop-2.8.1/logs/hadoop-hduser-namenode-prod01.out
localhost: starting datanode, logging to /home/hduser/hadoop-2.8.1/logs/hadoop-hduser-datanode-prod01.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is be:b3:7d:41:89:03:15:04:1c:84:e3:d9:69:1f:c8:5d.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hduser/hadoop-2.8.1/logs/hadoop-hduser-secondarynamenode-prod01.out
17/08/24 16:39:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
6.2 Start yarn
hduser@prod01:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hduser/hadoop-2.8.1/logs/yarn-hduser-resourcemanager-prod01.out
localhost: starting nodemanager, logging to /home/hduser/hadoop-2.8.1/logs/yarn-hduser-nodemanager-prod01.out
6.3 Verify that all processes started.
hduser@prod01:~$ jps
6775 DataNode
7209 ResourceManager
7017 SecondaryNameNode
6651 NameNode
7339 NodeManager
7663 Jps
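If any of these processes is missing, check its logs under $HADOOP_HOME/logs. The start-up output above shows the .out file for each daemon (for example, hadoop-hduser-namenode-prod01.out); the matching .log file usually has the detailed error.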
6.4 Run the PI Mapreduce job from the hadoop-examples jar.
hduser@prod1:~$ yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar pi 4 1000
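Here 4 is the number of map tasks and 1000 is the number of samples per map; the job should finish by printing an estimated value of Pi. You can also smoke-test HDFS directly, for example by creating hduser's home directory:
hduser@prod1:~$ hdfs dfs -mkdir -p /user/hduser
hduser@prod1:~$ hdfs dfs -ls /user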
Step 7: Hadoop Web Interface
NameNode Web UI (http://localhost:50070)
Resource Manager UI (http://localhost:8088).
It shows all running jobs and cluster resource information, which helps you monitor jobs and track their progress.
Step 8: Stopping Hadoop
8.1 Stop YARN processes.
hduser@prod01:~$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
8.2 Stop HDFS processes.
hduser@prod01:~$ stop-dfs.sh
17/08/24 17:11:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
17/08/24 17:12:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I hope you were able to follow these instructions for the Hadoop pseudo-distributed mode setup. Please write to me if you are still facing problems.
Happy Coding!!!!