Wednesday, January 25, 2017

Configure IntelliJ for Android Development on CentOS

Mobile Application

In today's world, application development for mobile has grown enormously. Everything from online payment to e-shopping, digital assistance, interactive messaging and many more operations is now just a click away on a mobile device.
A mobile application's user interface can be developed using an array of technologies such as HTML5, CSS, JavaScript, Java, Android or iOS.

In this post, I will discuss setting up the Android environment on an existing IntelliJ installation.

IntelliJ set up for Android Development

Perform the below steps for the setup.

Step 1. Install Java 8 or Java 7 JDK

$ java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

Step 2. Install Android SDK

[user@localhost ~]$ cd /opt
[user@localhost opt]$ sudo wget http://dl.google.com/android/android-sdk_r24.4.1-linux.tgz
[sudo] password for pooja: 
--2017-01-24 22:25:23--  http://dl.google.com/android/android-sdk_r24.4.1-linux.tgz
Resolving dl.google.com (dl.google.com)... 172.217.6.46, 2607:f8b0:4005:805::200e
Connecting to dl.google.com (dl.google.com)|172.217.6.46|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 326412652 (311M) [application/x-tar]
Saving to: ‘android-sdk_r24.4.1-linux.tgz’

100%[============================================================================================================>] 326,412,652  148KB/s   in 29m 58s

2017-01-24 22:55:21 (177 KB/s) - ‘android-sdk_r24.4.1-linux.tgz’ saved [326412652/326412652]

[user@localhost opt]$ sudo tar zxvf android-sdk_r24.4.1-linux.tgz
[user@localhost opt]$ sudo chown -R root:root android-sdk_r24.4.1-linux 
[user@localhost opt]$ sudo ln -s android-sdk_r24.4.1-linux android-sdk-linux 

# If you do not change the ownership to your own user (next command), you will get the error "selected directory is not a valid home for android SDK" while setting the Android SDK path in IntelliJ



[user@localhost opt]$ sudo chown -R user:group /opt/android-sdk-linux/

[user@localhost opt]$ sudo vim /etc/profile.d/android-sdk-env.sh
Add the below lines.
export ANDROID_HOME=/opt/android-sdk-linux
export PATH=$ANDROID_HOME/tools:$ANDROID_HOME/platform-tools:$PATH

[user@localhost opt]$ source /etc/profile.d/android-sdk-env.sh
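
The ownership problem mentioned above can be caught before opening IntelliJ. The following is only a minimal sketch: the function name check_sdk_home is hypothetical, and it assumes that a usable SDK home is simply a directory that exists, is writable by the current user and contains a tools sub-directory. IntelliJ's own validation may check more than this.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory looks usable as an Android SDK home.
# Assumption: "usable" means it exists, is writable by the current user,
# and contains a tools/ sub-directory. IntelliJ may apply further checks.
check_sdk_home() {
    sdk_dir="$1"
    if [ ! -d "$sdk_dir" ]; then
        echo "missing: $sdk_dir"
        return 1
    fi
    if [ ! -w "$sdk_dir" ]; then
        echo "not writable by $(id -un): $sdk_dir"
        return 1
    fi
    if [ ! -d "$sdk_dir/tools" ]; then
        echo "no tools directory in: $sdk_dir"
        return 1
    fi
    echo "ok: $sdk_dir"
}

# Usage, with the location exported in android-sdk-env.sh:
#   check_sdk_home "$ANDROID_HOME"
```

If the function reports "not writable", the chown command above has most likely not been applied to the directory that the symlink points to.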

Step 3: Open the SDK Manager from the Android SDK tools

[user@localhost opt]$ sudo android-sdk-linux/tools/android


Now, select the All Tools option and press "Install 23 packages". The license screen then opens as shown below.

Finally, select the 'Install' button, which starts the download of the packages.


Step 4: Install IntelliJ (if not already installed)

IntelliJ Community Edition is free; download it and untar the file.

Step 5: Open IntelliJ (or close the currently open project) to bring up the below screen.
Now, select 'Create New Project' and then select the project type "Android" as shown below.


Now, select the option "Application Module" and select 'Next'.


Now, select the 'New' button.

A file browser window will then open. Select /opt/android-sdk-linux and press 'OK'.

Lastly, the Android version popup window will be shown as below.



This way, we have configured the existing IntelliJ for an Android development project. Now press the 'Finish' button to create the project.

I hope you are also able to configure your existing IntelliJ for Android development. If you face any problems, please write back; I would love to hear from you.

Tuesday, January 17, 2017

Debugging Apache Hadoop (NameNode, DataNode, SNN, ResourceManager, NodeManager) using IntelliJ

In the previous blogs, I discussed setting up the environment, downloading the Apache Hadoop code, building it and setting it up in the IDE (IntelliJ).

In this blog, I will focus on debugging the Apache Hadoop code in order to understand it.

I used remote debugging to connect to and debug any of the Hadoop processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager).

Prerequisites
1. The Apache Hadoop code is on the local machine.
2. The code is built (look for the hadoop/hadoop-dist directory).
3. The code is set up in IntelliJ.

Let's dive into the steps to understand the debug process.

Step 1: Look for the hadoop-dist directory in the Hadoop main directory.
Once the Hadoop code is built, the directory hadoop-dist is created in the Hadoop main directory as shown below.

Step 2: Move into the target directory.
[pooja@localhost hadoop]$ cd hadoop-dist/target/hadoop-3.0.0-alpha2-SNAPSHOT

The directory structure looks as below (it is the same as the Apache download tar).

Step 3: Now, set up the Hadoop configuration.
a. Change the hadoop-env.sh file to add the JAVA_HOME path

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hadoop-env.sh
Add the below line (replace the path with the location of your own JDK).
export JAVA_HOME=/usr/java/jdk1.8.0_72

b. Add the configuration parameters (Note: I am doing the minimum setup needed for running the Hadoop processes)

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hdfs-site.xml 
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/pooja/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/pooja/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Place the environment properties in ~/.bashrc
export HADOOP_HOME=<hadoop source code directory>/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha2-SNAPSHOT
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
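
Since HADOOP_HOME points into the build output, a wrong path is easy to miss until a script fails. The following is only a rough sketch for a sanity check: the function name check_hadoop_home is hypothetical, and the three files it looks for are just the ones used in this post, chosen as an assumption about what identifies the distribution directory.

```shell
#!/bin/sh
# Hypothetical helper: check that a directory looks like the built Hadoop distribution.
# Assumption: the files used later in this post are enough to identify it.
check_hadoop_home() {
    home_dir="$1"
    for rel in etc/hadoop/hadoop-env.sh sbin/start-dfs.sh sbin/start-yarn.sh; do
        if [ ! -f "$home_dir/$rel" ]; then
            echo "missing $rel under $home_dir"
            return 1
        fi
    done
    echo "looks like a Hadoop distribution: $home_dir"
}

# Usage, after reloading ~/.bashrc:
#   check_hadoop_home "$HADOOP_HOME"
```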

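Step 4: Run all the Hadoop processes
Note: before the very first start, the NameNode storage directory has to be formatted once with bin/hdfs namenode -format.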

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost.localdomain]
2017-01-17 20:27:44,335 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ jps
25232 SecondaryNameNode
26337 Jps
24839 DataNode
24489 NameNode
25914 NodeManager
25597 ResourceManager
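
Checking the jps listing by eye works, but it can also be scripted. The sketch below is one possible way to do it, not part of the Hadoop tooling: the function name missing_daemons is hypothetical, and it assumes the two-column "pid name" format that jps prints above.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print the expected
# Hadoop daemons that are absent from it.
missing_daemons() {
    jps_output="$(cat)"
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second column, so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    # Drop the leading space; an empty line means every daemon was found.
    printf '%s\n' "${missing# }"
}

# Usage (requires a JDK on the PATH):
#   jps | missing_daemons
```

An empty result means all five processes are up; otherwise the printed names tell which log under the Hadoop logs directory to look at first.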

Step 5: Now stop all the processes.
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/stop-yarn.sh
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/stop-dfs.sh

Step 6: To debug a Hadoop process (e.g. NameNode), make the below change in hadoop-env.sh or hdfs-env.sh.

[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ vi etc/hadoop/hadoop-env.sh
Add the below line.
export HDFS_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"

Similarly, each process can be debugged through its own variable:
YARN_RESOURCEMANAGER_OPTS
YARN_NODEMANAGER_OPTS
HDFS_NAMENODE_OPTS
HDFS_DATANODE_OPTS
HDFS_SECONDARYNAMENODE_OPTS 
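
When more than one daemon is debugged on the same host, each one needs its own port. The sketch below shows one way to generate the option string per daemon. The function name jdwp_opts and the ports other than 5000 are assumptions made for illustration, and it uses the -agentlib:jdwp spelling, which the JDK documents as the newer equivalent of -Xdebug -Xrunjdwp.

```shell
#!/bin/sh
# Hypothetical helper: build the JDWP agent options for one daemon.
# $1 = port the JVM listens on
# $2 = y to make the JVM wait for the debugger before running (default n)
jdwp_opts() {
    port="$1"
    suspend="${2:-n}"
    echo "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# One port per daemon so that several can be debugged on the same host.
# The port numbers other than 5000 are arbitrary choices for this sketch.
export HDFS_NAMENODE_OPTS="$(jdwp_opts 5000)"
export HDFS_DATANODE_OPTS="$(jdwp_opts 5001)"
export YARN_RESOURCEMANAGER_OPTS="$(jdwp_opts 5002 y)"
echo "$HDFS_NAMENODE_OPTS"
```

Passing y as the second argument is useful when the code of interest runs during startup, because the process then waits until the IDE attaches.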

Step 7: Enable remote debugging in the IDE (IntelliJ) as shown below.
Note: Identify the main class of the NameNode process by looking in the startup script.

Open the NameNode.java class -> Run/Debug Configurations (+) -> Remote -> change 'Port' to 5000 (textbox) -> 'Apply' button


Step 8: Now start the NameNode process and put a breakpoint in the NameNode.java class as shown below.

Start the process:
[pooja@localhost hadoop-3.0.0-alpha2-SNAPSHOT]$ sbin/start-dfs.sh

Start the debugger (Shift+F9):


And now you can debug the code as shown below.

I hope everyone is able to set up the code. If you face any problem, please do write; I will be happy to help you.
In the next blog, I will write about the steps for making a patch for an Apache Hadoop contribution.

Happy Coding and Keep Learning !!!!

Importing Apache Hadoop (HDFS, YARN) module to IntelliJ

In the previous blog, I wrote about the steps to set up the environment and download the Apache Hadoop code on our machine for understanding and contributing. In this blog, I will walk through setting up the code in an IDE (IntelliJ here).

By now, I presume that the Apache Hadoop code is on our machine and that the code is compiled. If not, follow the previous blog.

Please follow the below steps for importing the HDFS module into IntelliJ.

Step 1: Open IntelliJ (either using the shortcut or idea.sh) and then close the project if one is already open, as shown below.


Step 2: In the below screen, choose 'Import Project' as shown below.

Step 3: Now, browse to the folder you want to import. Select the hadoop/hadoop-hdfs-project/hadoop-hdfs directory and press 'OK'.


Step 4: The below screen will be shown. Please select the option "Import project from external model" and click 'Next'.



Step 5: Now, keep pressing 'Next' on each screen and then 'Finish'. The project will be imported into IntelliJ as shown below.


Now, the Apache Hadoop HDFS module is imported into IntelliJ. You can import the other modules (YARN, Common) similarly.

I hope all readers are able to import the Apache Hadoop project successfully into IntelliJ. If you face any issues, please write; I will be happy to assist you.

In the next tutorial, I will discuss the steps for debugging Hadoop.
   

Thursday, January 12, 2017

Contribute to Apache Hadoop

For a long time, I have wanted to contribute to the open source Apache Hadoop project. Today I was free, so I worked on setting up the Hadoop code on my local machine for development. I am documenting the steps as they may be useful for newcomers.

Below are the steps to set up the Hadoop code for development.

Step 1:  Install Java JDK 8 or above

$ java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

Step 2: Install Apache Maven version 3 or later

$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /usr/local/apache-maven
Java version: 1.8.0_72, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_72/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-514.2.2.el7.x86_64", arch: "amd64", family: "unix"

Step 3:  Install Google Protocol Buffers (version 2.5.0)
Make sure the Protocol Buffers version is exactly 2.5.0.
I had installed a higher version of Google Protocol Buffers (3.1.0), but when compiling the code I got the below error.
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-alpha2-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.1.0', expected version is '2.5.0' -> [Help 1]
[ERROR] 
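
Because the build takes a long time before it reaches this error, it can help to check the protoc version up front. The following is a small sketch under the assumption that protoc --version prints a line of the form "libprotoc X.Y.Z", as the error message above suggests; the function name protoc_version_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given `protoc --version` line
# reports the required version (2.5.0 unless a second argument is given).
protoc_version_ok() {
    version_line="$1"          # e.g. "libprotoc 2.5.0"
    required="${2:-2.5.0}"
    actual="${version_line#libprotoc }"
    [ "$actual" = "$required" ]
}

# Usage: check the installed protoc, if any, before starting the Maven build.
if command -v protoc >/dev/null 2>&1; then
    if protoc_version_ok "$(protoc --version)"; then
        echo "protoc version is fine"
    else
        echo "unexpected protoc version: $(protoc --version)"
    fi
fi
```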

Step 4:  Download the hadoop source code

We can either clone the repository directly or create a fork of the repository and then clone it.

a) Directly cloning the repository.
 $ git clone git://git.apache.org/hadoop.git
b) Create a fork as shown below:


And then download the code as shown below:

 $ git clone https://github.com/poojagpta/hadoop

Sync the forked project with the current upstream changes.
Add the remote link:
 $ git remote add upstream https://github.com/apache/hadoop

 $ git remote -v
origin https://github.com/poojagpta/hadoop (fetch)
origin https://github.com/poojagpta/hadoop (push)
upstream https://github.com/apache/hadoop (fetch)
upstream https://github.com/apache/hadoop (push)

Now, if you want to fetch the latest code and bring it into your local trunk:
$ git fetch upstream
$ git checkout trunk
$ git merge upstream/trunk

Step 5: Compile the downloaded code
$ cd hadoop
$ mvn clean install -Pdist -Dtar -Ptest-patch -DskipTests -Denforcer.skip=true

Output snippet:
[INFO] --- maven-install-plugin:2.5.1:install (default-install) @ hadoop-client-modules ---
[INFO] Installing /home/pooja/dev/hadoop/hadoop-client-modules/pom.xml to /home/pooja/.m2/repository/org/apache/hadoop/hadoop-client-modules/3.0.0-alpha2-SNAPSHOT/hadoop-client-modules-3.0.0-alpha2-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.780 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  2.560 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  2.236 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  4.824 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.314 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.834 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  9.167 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  5.918 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 20.083 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  7.650 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:03 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 12.138 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 13.088 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.138 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [ 54.973 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [01:51 min]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [  1.323 s]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 41.081 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 12.680 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.070 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [  0.073 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [ 35.955 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [01:38 min]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [  0.089 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [ 22.489 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 32.492 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [  8.606 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [ 20.153 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [02:26 min]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [ 55.442 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [  5.479 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [ 17.122 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [  8.654 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [  8.234 s]
[INFO] Apache Hadoop YARN Timeline Service HBase tests .... SUCCESS [02:51 min]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [  0.044 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [  8.076 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [  5.937 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [  0.077 s]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [ 11.366 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [  1.832 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [  8.590 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [  0.225 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [ 43.115 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [ 27.865 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [  9.009 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [ 24.415 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [ 14.692 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [ 29.361 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [  4.828 s]
[INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [ 10.299 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 12.238 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [  4.336 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 17.591 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 13.083 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  6.314 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  6.982 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 12.048 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 12.327 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  5.819 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  5.794 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  0.036 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  8.138 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 53.458 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 20.452 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 11.273 s]
[INFO] Apache Hadoop Client Aggregator .................... SUCCESS [  3.698 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  1.618 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 12.085 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 27.289 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  5.002 s]
[INFO] Apache Hadoop Kafka Library support ................ SUCCESS [  7.041 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.052 s]
[INFO] Apache Hadoop Client API ........................... SUCCESS [02:09 min]
[INFO] Apache Hadoop Client Runtime ....................... SUCCESS [01:21 min]
[INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [  3.431 s]
[INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [03:13 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [  0.329 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  1.542 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 42.013 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.105 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32:42 min
[INFO] Finished at: 2017-01-12T11:30:58-08:00
[INFO] Final Memory: 131M/819M
[INFO] ------------------------------------------------------------------------


I hope you are also able to set up the Hadoop project and are ready to contribute like me. Please let me know if you are still facing issues; I would love to help you.
In the next tutorial, I will set up the code in IntelliJ and cover the steps to debug the code.

Thanks and happy coding !!!

Problems encountered while compiling the code:

1. Some of the JUnit tests are failing.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.minikdc.TestMiniKdc
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.451 sec <<< FAILURE! - in org.apache.hadoop.minikdc.TestMiniKdc
testKeytabGen(org.apache.hadoop.minikdc.TestMiniKdc)  Time elapsed: 1.314 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

testMiniKdcStart(org.apache.hadoop.minikdc.TestMiniKdc)  Time elapsed: 1.002 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

testKerberosLogin(org.apache.hadoop.minikdc.TestMiniKdc)  Time elapsed: 1.008 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

Running org.apache.hadoop.minikdc.TestChangeOrgNameAndDomain
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.289 sec <<< FAILURE! - in org.apache.hadoop.minikdc.TestChangeOrgNameAndDomain
testKeytabGen(org.apache.hadoop.minikdc.TestChangeOrgNameAndDomain)  Time elapsed: 1.18 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

testMiniKdcStart(org.apache.hadoop.minikdc.TestChangeOrgNameAndDomain)  Time elapsed: 1.014 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

testKerberosLogin(org.apache.hadoop.minikdc.TestChangeOrgNameAndDomain)  Time elapsed: 1.009 sec  <<< ERROR!
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
at org.apache.hadoop.minikdc.MiniKdc.start(MiniKdc.java:280)
at org.apache.hadoop.minikdc.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)


Results :

Tests in error: 
  TestMiniKdc>KerberosSecurityTestcase.startMiniKdc:49 » Runtime Unable to parse...
  TestMiniKdc>KerberosSecurityTestcase.startMiniKdc:49 » Runtime Unable to parse...
  TestMiniKdc>KerberosSecurityTestcase.startMiniKdc:49 » Runtime Unable to parse...
  TestChangeOrgNameAndDomain>KerberosSecurityTestcase.startMiniKdc:49 » Runtime ...
  TestChangeOrgNameAndDomain>KerberosSecurityTestcase.startMiniKdc:49 » Runtime ...
  TestChangeOrgNameAndDomain>KerberosSecurityTestcase.startMiniKdc:49 » Runtime ...

Tests run: 6, Failures: 0, Errors: 6, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  2.060 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.584 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  2.018 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  4.161 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.253 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.803 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  9.047 s]
[INFO] Apache Hadoop MiniKDC .............................. FAILURE [ 11.461 s]
[INFO] Apache Hadoop Auth ................................. SKIPPED
[INFO] Apache Hadoop Auth Examples ........................ SKIPPED
......
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-minikdc: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/pooja/dev/hadoop/hadoop-common-project/hadoop-minikdc/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-minikdc

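Solution:
The tests fail because the Kerby parser cannot parse the "includedir /etc/krb5.conf.d/" line in the Kerberos configuration. I worked around the problem by skipping the JUnit tests (-DskipTests) for the entire build and running them only for the module whose code I wanted to fix.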

2. Got the error below for the 'Hadoop HDFS' module.

[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.612 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.507 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  2.033 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  4.937 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.312 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.670 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  8.432 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  5.359 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 14.786 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  5.682 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:04 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 12.310 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 14.775 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.074 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [01:19 min]
[INFO] Apache Hadoop HDFS ................................. FAILURE [  5.697 s]
[INFO] Apache Hadoop HDFS Native Client ................... SKIPPED
.........................................
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:48 min
[INFO] Finished at: 2017-01-11T15:43:27-08:00
[INFO] Final Memory: 90M/533M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve dependencies for project org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-SNAPSHOT, org.apache.hadoop:hadoop-kms:jar:tests:3.0.0-alpha2-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-hdfs

Solution:

The problem suggests an issue with the installation of OpenSSL.
OpenSSL is needed to use the HTTPS protocol in HDFS, curl, and other applications.
openssl (the binary) was installed, but OpenSSL (which is required for the HTTPS protocol) was not.

You can install OpenSSL using the command below:
$ sudo yum install openssl openssl-devel

$ which openssl
/usr/bin/openssl

$ openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013

We can solve the problem using two approaches:

   a. Create a link to the openssl path, as shown below:

      ln -s /usr/bin/openssl /usr/local/openssl

   or

   b. Download OpenSSL and compile it, as shown below:

      $ wget https://www.openssl.org/source/openssl-1.0.1e.tar.gz
      $ tar -xvf openssl-1.0.1e.tar.gz
      $ cd openssl-1.0.1e
      $ ./config --prefix=/usr/local/openssl --openssldir=/usr/local/openssl
      $ make
      $ sudo make install

3. Error with enforcer
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (depcheck) on project hadoop-hdfs-client: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Solution:
I skipped the enforcer (-Denforcer.skip=true), as this constraint allows only Unix and Mac machines.

Tuesday, January 10, 2017

Spark Kafka Integration use case

Apache Spark

Apache Spark is a distributed computing platform that provides near-real-time processing of data from various data sources. The data sources can range from the HDFS file system to Kafka, Flume, or a relational database.

Many Spark components, such as Spark SQL, Spark Streaming, MLlib, and GraphX, facilitate integration with various data sources.

Apache Kafka

Apache Kafka is a distributed, fault-tolerant streaming platform that is used to build real-time data pipelines. It works on a publish-subscribe model.

Use Case

Recently, I worked on Kafka Spark integration for a simple real-time fraud detection data pipeline. In it, we tracked customer activity and customer purchase events on an e-commerce site. Based on the purchase events, we categorized customers suspected of fraud. We then filtered the customer activity for the customers of interest and performed operations whose results are consumed by another stream for further processing.

We considered many implementation plans; one of them is explained below.

Data Model (Just an example)

Suspicious Fraudulent Customer (demo1):



Customer_Id  Receive Flag  Name  Sex  Age  City
1            A             AAA   F    23   Union City
2            A             BBB   M    77   San Mateo
3            F             NNN   F    33   San Francisco

Customer Activity (demo2)


Customer_Id  Page Visit                    Product
1            store/5426/whats-new
1            ip/product-page/product-Desc  16503225
2            ip/product-page/product-Desc  9988334
3            search/?query=battery
3            cp/Gift-Cards
3            account/trackorder



We need to process the data above and keep only the records for active customers. The sample output data will be as follows.


Cus_Id  Flag  Name  Sex  Age  City        Page Visit                    Product
1       A     AAA   F    23   Union City  store/5426/whats-new
1       A     AAA   F    23   Union City  ip/product-page/product-Desc  16503225
2       A     BBB   M    77   San Mateo   ip/product-page/product-Desc  9988334



Implementation Strategy

Kafka streaming:

In this data pipeline, we receive two Kafka input streams and write to one output stream, as described below.

  1. Suspicious Fraudulent Customer (demo1)

  2. Customer Activity (demo2)

  3. Output (test-output)

Spark Streaming component:

The Spark Streaming API integrates with the Kafka topics (demo1, demo2). The demo1 data is cached in memory and updated whenever an active customer changes or a new customer is added. The data from demo1 and demo2 is joined and filtered for active customers, and the result is written to 'test-output'.

  1. Subscribe to Suspicious Fraudulent Customer (demo1).

  2. Subscribe to Customer Activity (demo2).

  3. Update Suspicious Fraudulent Customer in memory (so as to reflect the update in demo1).

  4. Join data from demo1 and demo2, then filter based on flag.

  5. Perform operation on the data.

  6.  Output the result to test-output for further processing.
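
The caching, join, and filter steps above can be sketched without a cluster using plain Scala collections. This only illustrates the logic; it is not the Spark Streaming job itself. The names Customer, Activity, JoinSketch, latestById, and joinActive are hypothetical, and treating flag "A" as active is an assumption based on the sample data.

```scala
// Hypothetical record types mirroring the demo1 and demo2 sample tables.
final case class Customer(id: Int, flag: String, name: String, sex: String, age: Int, city: String)
final case class Activity(customerId: Int, pageVisit: String, product: Option[Long])

object JoinSketch {
  // Keep the latest record per customer id; this stands in for the
  // in-memory cache of demo1 (a later record overwrites an earlier one).
  def latestById(customers: Seq[Customer]): Map[Int, Customer] =
    customers.foldLeft(Map.empty[Int, Customer])((cache, c) => cache.updated(c.id, c))

  // Join each activity with its cached customer and keep only customers
  // whose flag is "A" (assumed to mean active).
  def joinActive(cache: Map[Int, Customer], activities: Seq[Activity]): Seq[(Customer, Activity)] =
    for {
      a <- activities
      c <- cache.get(a.customerId).toSeq
      if c.flag == "A"
    } yield (c, a)
}
```

Applied to the sample tables, joinActive keeps the two activities of customer 1 and the one activity of customer 2, and drops the activities of customer 3, whose flag is F.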

I have implemented the demo code in Scala.

The working model

Let's start the Spark server and submit the Spark job to the Spark cluster, as shown below.

[Screenshot: screenshot-from-2017-01-10-20-46-20]

Note: The application with ID app-20170110204548-0000 is started and running.

Now, start the Kafka server and start three topics: demo1 (producer), demo2 (producer), and test-output (consumer).

For this tutorial, we will enter the data manually to demonstrate the use case.

Kafka Topic (demo1):

[Screenshot from 2017-01-10 20-58-51.png]

Kafka Topic (demo2):

[Screenshot from 2017-01-10 20-59-08.png]

Kafka Topic (test-output): Receive output as shown below:

[Screenshot from 2017-01-10 21-02-32.png]

Note: Customer 3 is inactive, so it will not be shown.

Now, we change demo1: we add a new active customer 4, update customer 2 to inactive, and change customer 3 to active, as shown below:

[kafka@localhost kafka_2.11-0.10.1.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo1
4,A NNN F 33 San Francisco
2,F TTT F 22 XXX
3,A HHH M 56 MMM

Then, input some Customer Activity (demo2)

[kafka@localhost kafka_2.11-0.10.1.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo2
1,store/5426/whats-new
2,ip/product-page/product-Desc 16503225
3,ip/product-page/product-Desc 9988334
4,search/?query=battery
4,cp/Gift-Cards
3,account/trackorder

Finally, the output shows the transactions of all customers that are active in memory: customers 1, 3, and 4.

[kafka@localhost kafka_2.11-0.10.1.0]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-output
(4,(A NNN F 33 San Francisco,search/?query=battery ))
(4,(A NNN F 33 San Francisco,cp/Gift-Cards ))
(3,(A HHH M 56 MMM,ip/product-page/product-Desc 9988334))
(3,(A HHH M 56 MMM,account/trackorder))
(1,(A AAA F 23 Union City,store/5426/whats-new ))
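
The consumer lines above have the shape of the default string form of a Scala pair (key, (customer record, activity record)). Here is a minimal sketch of how such pairs could arise. It assumes each console line is split at its first comma into a key and a value, and that a customer record starting with "A " is active. The names ConsoleLineSketch, parse, applyUpdates, and joinActive are hypothetical and are not taken from the demo code.

```scala
object ConsoleLineSketch {
  // Split "4,A NNN F 33 San Francisco" at the first comma into
  // ("4", "A NNN F 33 San Francisco"). Fails fast if there is no comma.
  def parse(line: String): (String, String) = {
    val idx = line.indexOf(',')
    require(idx >= 0, s"expected 'key,value' but got: $line")
    (line.substring(0, idx), line.substring(idx + 1))
  }

  // Apply demo1 lines to the cache; a later line for the same key
  // overwrites the earlier record.
  def applyUpdates(cache: Map[String, String], lines: Seq[String]): Map[String, String] =
    lines.map(parse).foldLeft(cache) { case (m, (k, v)) => m.updated(k, v) }

  // Pair each demo2 line with its cached customer record and keep only
  // records whose first field is "A" (assumed to mean active).
  def joinActive(cache: Map[String, String], activityLines: Seq[String]): Seq[(String, (String, String))] =
    for {
      (k, visit) <- activityLines.map(parse)
      customer <- cache.get(k).toSeq
      if customer.startsWith("A ")
    } yield (k, (customer, visit))
}
```

Printing each resulting pair with println gives lines in the same (key,(customer,activity)) form. This sketch returns results in input order, so the line order can differ from the consumer output above.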

I hope you were able to follow the use case. If you have any questions, please email me; I would be glad to help.