Tuesday, January 17, 2017

Debugging Apache Hadoop (NameNode, DataNode, SNN, ResourceManager, NodeManager) using IntelliJ

In the previous blogs, I discussed how to set up the environment, download the Apache Hadoop code, build it, and set it up in an IDE (IntelliJ).

In this blog, I will focus on debugging the Apache Hadoop code in order to understand it.

I use remote debugging to connect to and debug any of the Hadoop processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager).

Prerequisites
1. Apache Hadoop code is on your local machine.
2. The code is built (look for the generated hadoop/hadoop-dist directory).
3. The code is set up in IntelliJ.

Let's dive into the steps of the debug process.

Step 1: Look for the hadoop-dist directory in the Hadoop main directory.
Once the Hadoop code is built, the hadoop-dist directory is created in the Hadoop main directory as shown below.

Step 2: Move into the target directory.
 [pooja@localhost hadoop]$ cd hadoop-dist/target/hadoop-3.0.0-alpha2-SNAPSHOT

The directory structure looks as below (it has the same layout as the Apache release tarball).

Step 3: Now, set up the Hadoop configuration.
a. Change hadoop-env.sh to set the JAVA_HOME path.

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hadoop-env.sh 
Add the line below (point it at your JDK installation).
export JAVA_HOME=$JAVA_HOME
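If you are not sure what value to use, a common trick on Linux is to resolve the java binary on the PATH with readlink -f "$(which java)" and strip the trailing /bin/java. A small sketch of that string manipulation (the JDK path below is hypothetical):

```shell
# Hypothetical resolved path of the java binary, e.g. the output of:
#   readlink -f "$(which java)"
JAVA_BIN=/usr/lib/jvm/java-8-openjdk/bin/java
# Strip the trailing /bin/java to get a JAVA_HOME candidate.
JAVA_HOME_CANDIDATE=${JAVA_BIN%/bin/java}
echo "export JAVA_HOME=$JAVA_HOME_CANDIDATE"
```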

b. Add the configuration parameters (note: this is the minimal setup for running the Hadoop processes).

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/core-site.xml
<configuration>
 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
 </property>
</configuration>

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hdfs-site.xml 
<configuration>
 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///home/pooja/hadoopdata/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///home/pooja/hadoopdata/hdfs/datanode</value>
 </property>
</configuration>

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/yarn-site.xml
<configuration>
 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
</configuration>

Place the environment properties in ~/.bashrc:
export HADOOP_HOME=<hadoop source code directory>/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha2-SNAPSHOT
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
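A quick sanity check that the exports took effect; this sketch uses a placeholder HADOOP_HOME (yours points into hadoop-dist/target) and verifies that its bin directory ended up on the PATH:

```shell
# Placeholder install directory -- substitute your real hadoop-dist path.
HADOOP_HOME=/tmp/hadoop-3.0.0-alpha2-SNAPSHOT
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# Check that $HADOOP_HOME/bin is now a PATH component.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH OK" ;;
  *)                      echo "PATH missing $HADOOP_HOME/bin" ;;
esac
```

After running source ~/.bashrc, the hadoop, hdfs, and yarn commands should also resolve from any directory.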

Step 4: Run all the Hadoop processes.
Note: Before the very first start, format the NameNode storage directory once with bin/hdfs namenode -format; otherwise the NameNode will fail to start.

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost.localdomain]
2017-01-17 20:27:44,335 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ jps
25232 SecondaryNameNode
26337 Jps
24839 DataNode
24489 NameNode
25914 NodeManager
25597 ResourceManager

Step 5: Now stop all the processes.
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/stop-yarn.sh
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/stop-dfs.sh

Step 6: Debug a Hadoop process (e.g., NameNode) by making the change below in hadoop-env.sh.

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hadoop-env.sh  
Add the line below.
export HDFS_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"
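The fields in that agent string matter: server=y makes the JVM listen for a debugger, suspend=n lets the NameNode start without waiting for one, and address=5000 is the port IntelliJ will attach to. When juggling several daemons it can help to pull the port back out of an *_OPTS string; a small sed sketch:

```shell
# Extract the JDWP port from a debug-options string.
HDFS_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"
port=$(printf '%s\n' "$HDFS_NAMENODE_OPTS" | sed -n 's/.*address=\([0-9]*\).*/\1/p')
echo "NameNode debug port: $port"
```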

Similarly, we can debug the other processes through the corresponding variables:
YARN_RESOURCEMANAGER_OPTS
YARN_NODEMANAGER_OPTS
HDFS_NAMENODE_OPTS
HDFS_DATANODE_OPTS
HDFS_SECONDARYNAMENODE_OPTS 
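If you enable more than one of these at the same time, each daemon needs its own port; two JVMs cannot listen on the same address. For example (the port numbers here are my own arbitrary choice):

```shell
# One JDWP port per daemon; the ports themselves are arbitrary free ports.
export HDFS_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"
export HDFS_DATANODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5001,server=y,suspend=n"
export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5002,server=y,suspend=n"
```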

Step 7: Enable remote debugging in the IDE (IntelliJ) as shown below.
Note: Identify the main class for the NameNode process by looking in the startup script.

Open the NameNode.java class -> Run/Debug Configurations (+) -> Remote -> change 'Port' to 5000 -> click Apply.


Step 8: Now start the NameNode process and set a breakpoint in the NameNode.java class as shown below.

Start the process:
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-dfs.sh

Start the debugger (Shift+F9):


And now you can debug the code as shown below. Any HDFS client command (for example, bin/hdfs dfs -ls /) will exercise the NameNode RPC path and hit your breakpoint.

I hope everyone is able to set this up; if you hit any problem, please do write, and I will be happy to help.
In the next blog, I will write about the steps for creating a patch for contributing to Apache Hadoop.

Happy Coding and Keep Learning !!!!
