Showing posts with label Java. Show all posts

Tuesday, August 7, 2018

Pass config to Spark Hadoop

In one of my projects, we needed to migrate Hadoop Java code to Spark. The Spark code was submitted via boto3 on EMR. Configs such as start_date and end_date were required by the InputFormat. We followed the steps below to read the files with CustomInputFormat.

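import org.apache.hadoop.io.LongWritable;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// CustomInputFormat and CustomWritable are the project's own classes (not shown here).
public class MyProcessor {

  private static final Logger logger = LoggerFactory.getLogger(MyProcessor.class);

  // Returns the trimmed value, or null when the input is null or blank.
  public static String isNull(String param) {
    return (param != null && !param.trim().isEmpty()) ? param.trim() : null;
  }

  public static void main(String[] args) {

    logger.info("MyProcessor");

    SparkSession spark = SparkSession
        .builder()
        .appName("MyProcessor")
        .getOrCreate();

    SparkContext sc = spark.sparkContext();

    // Properties passed as spark.hadoop.<key> are available here as <key>.
    logger.info("input.format.start_date : " + sc.hadoopConfiguration().get("input.format.start_date"));
    logger.info("input.format.end_date : " + sc.hadoopConfiguration().get("input.format.end_date"));

    JavaSparkContext jsc = new JavaSparkContext(sc);

    // CustomInputFormat reads start_date and end_date from the Hadoop configuration.
    JavaPairRDD<LongWritable, CustomWritable> rdd = jsc.newAPIHadoopRDD(sc.hadoopConfiguration(), CustomInputFormat.class,
        LongWritable.class, CustomWritable.class);

    // The original code did not show where 'partitions' comes from;
    // the default parallelism is used here as an assumption.
    int partitions = sc.defaultParallelism();

    JavaRDD<Row> rowsRdd = rdd.map(x -> RowFactory.create(isNull(x._2().getA()), isNull(x._2().getB()))).repartition(partitions);

    StructType schema = new StructType(new StructField[]{
        new StructField("a_str", DataTypes.StringType, true, Metadata.empty()),
        new StructField("b_str", DataTypes.StringType, true, Metadata.empty()),
    });

    Dataset<Row> df = spark.createDataFrame(rowsRdd, schema);
    df.cache();

    long recCount = df.count();
    logger.info("recCount : " + recCount);
    spark.close();
  }
}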

Create an uber JAR using the Maven Shade plugin or the Assembly plugin.

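Pass each config by prefixing its key with spark.hadoop. Spark copies every spark.hadoop.* property into the Hadoop configuration with the prefix removed, so spark.hadoop.input.format.start_date is read as input.format.start_date in the code above. For example: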

spark-submit --class MyProcessor --conf spark.hadoop.input.format.start_date=1532131200000 --conf spark.hadoop.input.format.end_date=1532131200000 --master yarn --deploy-mode cluster /home/hadoop/jars/myjar.jar
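
To make the prefix rule concrete, here is a minimal plain-Java sketch of the mapping. It is not Spark's actual implementation: the class name SparkHadoopPrefixSketch and the method toHadoopConf are hypothetical and exist only for illustration. Only the spark.hadoop. prefix and the two keys come from the command above. The sketch also decodes the epoch-millisecond value, which is handy for checking which date you are really passing.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: mimics how properties prefixed
// with "spark.hadoop." show up in the Hadoop configuration without the prefix.
public class SparkHadoopPrefixSketch {

  private static final String PREFIX = "spark.hadoop.";

  // Keeps only the "spark.hadoop.*" entries and drops the prefix from each key.
  public static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
    Map<String, String> hadoopConf = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      if (e.getKey().startsWith(PREFIX)) {
        hadoopConf.put(e.getKey().substring(PREFIX.length()), e.getValue());
      }
    }
    return hadoopConf;
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf = new LinkedHashMap<>();
    sparkConf.put("spark.app.name", "MyProcessor");
    sparkConf.put("spark.hadoop.input.format.start_date", "1532131200000");
    sparkConf.put("spark.hadoop.input.format.end_date", "1532131200000");

    Map<String, String> hadoopConf = toHadoopConf(sparkConf);
    String startDate = hadoopConf.get("input.format.start_date");

    // The values are epoch milliseconds; Instant prints them in UTC.
    System.out.println("input.format.start_date = " + startDate
        + " (" + Instant.ofEpochMilli(Long.parseLong(startDate)) + ")");
  }
}
```

The value 1532131200000 used in the command corresponds to 2018-07-21T00:00:00Z, and spark.app.name is dropped because it does not carry the prefix.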

Happy Coding !

Include config files in shade plugin

In one of our projects, we had to make Maven treat a config folder (`conf` in the snippet below) as a resource directory in place of `src/main/resources`, so that its files are packaged into the shaded JAR.

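Add the lines below to pom.xml. Note that declaring <resources> explicitly replaces the default src/main/resources directory, so list it as well if you still need it: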

<build>
    <resources>
        <resource>
            <directory>${project.basedir}/conf</directory>
        </resource>
    </resources>
    <testResources>
        <testResource>
            <directory>${project.basedir}/conf</directory>
        </testResource>
        <testResource>
            <directory>${project.basedir}/src/test/resources</directory>
        </testResource>
    </testResources>
</build>
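
With conf declared as a resource directory, its files land at the root of the classpath (and of the shaded JAR), so they can be read with getResourceAsStream. The following is a minimal sketch under assumptions: the class name ClasspathConfigSketch, the file name app.properties and the sample key are hypothetical and not part of the original project.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, for illustration only.
public class ClasspathConfigSketch {

  // Parses key=value pairs from any stream.
  public static Properties parse(InputStream in) {
    Properties props = new Properties();
    try {
      props.load(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Loads a properties file from the classpath root, e.g. "app.properties"
  // for a file stored as conf/app.properties in the project (assumed name).
  public static Properties loadFromClasspath(String resourceName) {
    ClassLoader cl = ClasspathConfigSketch.class.getClassLoader();
    try (InputStream in = cl.getResourceAsStream(resourceName)) {
      if (in == null) {
        throw new IllegalArgumentException(resourceName + " is not on the classpath");
      }
      return parse(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Stand-in for the file content so the sketch runs without a real conf folder.
    String sample = "input.format.start_date=1532131200000\n";
    Properties props = parse(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
    System.out.println(props.getProperty("input.format.start_date"));
  }
}
```

In a real build you would call loadFromClasspath with the name of a file that actually exists under conf; a missing file fails fast with an IllegalArgumentException instead of a later NullPointerException.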

Sunday, September 20, 2015

Install JDK 7 on CentOS

This tutorial walks you through installing the JDK (Java Development Kit) on CentOS 6.5 (the steps also apply to CentOS 5 and 7). Java is a platform-independent programming language with an open-source implementation, OpenJDK.

Many Java platforms and versions are available. Here we install Java SE (Standard Edition) in the following two variants:

1. OpenJDK7
2. Oracle Java 7

Prerequisites
You need to be an admin user, or a non-admin user with sudo rights.

Uninstall existing JDK versions
You can have multiple versions of the JDK installed on a machine, but only one of them is used at a time.
If you want, you can remove the Java versions already installed on the system using the steps below:

Note: This step is optional.

[code language="text"]
# List all installed JDK packages
$ rpm -aq | grep -i jdk
java-1.6.0-openjdk-1.6.0.36-1.13.8.1.el6_7.x86_64
java-1.6.0-openjdk-devel-1.6.0.36-1.13.8.1.el6_7.x86_64
java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64

# Now uninstall the Java versions you no longer need, as shown below:
$ sudo yum remove java-1.6.0-openjdk-1.6.0.36-1.13.8.1.el6_7.x86_64
$ sudo yum remove java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64
[/code]

Install OpenJDK7 JDK

To install OpenJDK 7, use the command below.
Note: The RPM package "java-1.7.0-openjdk" installs only the JRE, not the JDK; the JDK comes from the "java-1.7.0-openjdk-devel" package.

[code language="text"]
$ sudo yum install java-1.7.0-openjdk-devel

Loaded plugins: fastestmirror, refresh-packagekit
Setting up Install Process
Loading mirror speeds from cached hostfile
* base: mirror.supremebytes.com
* epel: mirrors.syringanetworks.net
* extras: mirror.supremebytes.com
* nux-dextop: li.nux.ro
* updates: mirror.supremebytes.com
Resolving Dependencies
--> Running transaction check
---> Package java-1.7.0-openjdk-devel.x86_64 1:1.7.0.85-2.6.1.3.el6_7 will be installed
--> Processing Dependency: java-1.7.0-openjdk = 1:1.7.0.85-2.6.1.3.el6_7 for package: 1:java-1.7.0-openjdk-devel-1.7.0.85-2.6.1.3.el6_7.x86_64
--> Running transaction check
---> Package java-1.7.0-openjdk.x86_64 1:1.7.0.85-2.6.1.3.el6_7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
java-1.7.0-openjdk-devel x86_64 1:1.7.0.85-2.6.1.3.el6_7 updates 9.4 M
Installing for dependencies:
java-1.7.0-openjdk x86_64 1:1.7.0.85-2.6.1.3.el6_7 updates 26 M

Transaction Summary
================================================================================
Install 2 Package(s)

Total download size: 35 M
Installed size: 126 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64. | 26 MB 00:07
(2/2): java-1.7.0-openjdk-devel-1.7.0.85-2.6.1.3.el6_7.x | 9.4 MB 00:02
--------------------------------------------------------------------------------
Total 3.8 MB/s | 35 MB 00:09
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : 1:java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64 1/2
Installing : 1:java-1.7.0-openjdk-devel-1.7.0.85-2.6.1.3.el6_7.x86_64 2/2
Verifying : 1:java-1.7.0-openjdk-devel-1.7.0.85-2.6.1.3.el6_7.x86_64 1/2
Verifying : 1:java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64 2/2

Installed:
java-1.7.0-openjdk-devel.x86_64 1:1.7.0.85-2.6.1.3.el6_7

Dependency Installed:
java-1.7.0-openjdk.x86_64 1:1.7.0.85-2.6.1.3.el6_7

Complete!
[/code]

Verify the installation using command:

[code language="text"]
$ java -version
java version "1.7.0_85"
OpenJDK Runtime Environment (rhel-2.6.1.3.el6_7-x86_64 u85-b01)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
[/code]

Install Oracle JDK 7
To install Oracle JDK 7, use the commands below:

[code language="text"]
# Download the Oracle JDK tarball
$ sudo wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.tar.gz"

# untar the downloaded tarball in /opt/ directory
$ sudo tar xvf jdk-7u71-linux-x64.tar.gz -C /opt/

# Change the ownership of the extracted JDK directory
$ sudo chown -R root: /opt/jdk1.7.0_71

# Register the JDK binaries with the alternatives system
$ sudo alternatives --install /usr/bin/java java /opt/jdk1.7.0_71/bin/java 1
$ sudo alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_71/bin/javac 1
$ sudo alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_71/bin/jar 1
[/code]

Verify the installation using command:

[code language="text"]
$ java -version

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
[/code]

Configure Environment Variable
Follow the steps below to configure the Java environment variables:

[code language="text"]
$ sudo vi /etc/profile.d/java.sh

# Add the lines below to the shell script "java.sh"

# Use this JAVA_HOME if Oracle JDK 7 is installed
export JAVA_HOME=/opt/jdk1.7.0_71
# Use the JAVA_HOME below instead if the OpenJDK 7 devel package is installed
#export JAVA_HOME=/usr/lib/jvm/java-openjdk
export JAVA_PATH=$JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
[/code]

Log in to the machine again, or run the commands below in the current shell, to see the changes:
[code language="text"]
$ source /etc/profile.d/java.sh
$ echo $JAVA_HOME
[/code]
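
As an extra end-to-end check, you can compile and run a tiny program: if javac is available, the JDK (and not just the JRE) is installed, and the output shows which runtime and JAVA_HOME are actually picked up. This is only a sketch; the class name JdkCheck is hypothetical.

```java
// Hypothetical file name: JdkCheck.java
public class JdkCheck {

  // Builds a short report of the running Java runtime and the JAVA_HOME variable.
  public static String report() {
    String javaHome = System.getenv("JAVA_HOME");
    return "java.version = " + System.getProperty("java.version") + "\n"
        + "java.vendor  = " + System.getProperty("java.vendor") + "\n"
        + "java.home    = " + System.getProperty("java.home") + "\n"
        + "JAVA_HOME    = " + (javaHome != null ? javaHome : "(not set)");
  }

  public static void main(String[] args) {
    System.out.println(report());
  }
}
```

Compile it with "javac JdkCheck.java" and run it with "java JdkCheck". If JAVA_HOME prints as "(not set)", the profile script has not been loaded in the current shell.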
