Apache Spark
Apache Spark is a powerful analytical engine that processes huge volumes of data using distributed in-memory storage.
Apache Hadoop YARN
Hadoop is a well-known distributed computing system that consists of a distributed file system (HDFS), YARN (the resource management framework), and analytical computing jobs (such as MapReduce, Hive, Pig, Spark, etc.).
A Spark analytical job can run on a standalone Spark cluster, a YARN cluster, or a Mesos cluster.
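For reference, the cluster manager is selected through the --master option of spark-submit; below is a minimal sketch of the three variants (the host names and ports are placeholders, not values from this setup).
[code language="java"]
# Standalone Spark cluster (placeholder master host/port)
bin/spark-submit --master spark://master-host:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.0.2.jar 10
# YARN cluster (the cluster location is read from the Hadoop configuration via HADOOP_CONF_DIR, set up below)
bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.0.2.jar 10
# Mesos cluster (placeholder master host/port)
bin/spark-submit --master mesos://mesos-host:5050 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.0.2.jar 10
[/code]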
In this tutorial, I will go through the detailed steps and the problems I faced while setting up a Spark job to run on a remote YARN cluster. Since I have just one computer, I created two users (sparkuser and hduser): Hadoop is installed as 'hduser' and Spark is installed as 'sparkuser'.
Step 1: Install Hadoop 2.7.0 cluster with hduser
Please refer to the Hadoop standalone setup tutorial to install Hadoop as 'hduser'.
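Before moving on, it is worth confirming that the HDFS and YARN daemons from Step 1 are running; a quick sanity check, assuming a standard single-node Hadoop 2.7 install:
[code language="java"]
# Run as hduser; jps lists the running Java daemons
[hduser@localhost ~]$ jps
# Expect to see (process ids will differ): NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
[/code]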
Step 2: Install Spark with sparkuser
[code language="java"]
#Login to sparkuser
[root@localhost ~]$ su - sparkuser
#Download the Spark tarball using the command below, or pick a mirror from http://spark.apache.org/downloads.html
[sparkuser@localhost ~]$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
#Untar above downloaded tar ball
[sparkuser@localhost ~]$ tar -xvf spark-2.0.2-bin-hadoop2.7.tgz
[/code]
Step 3: Copy Hadoop configuration files
Copy the two Hadoop configuration files, core-site.xml and yarn-site.xml, to the Spark setup as shown below.
[code language="java"]
# As both 'hduser' and 'sparkuser' are on the same machine, we can copy via the /tmp/ folder; if the machine were remote, we could transfer the files with scp/ftp instead.
[hduser@localhost hadoop]$ cp etc/hadoop/core-site.xml /tmp/
[hduser@localhost hadoop]$ cp etc/hadoop/yarn-site.xml /tmp/
# Copy the hadoop configuration to Spark machine
[sparkuser@localhost ~]$ mkdir hadoopConf
[sparkuser@localhost ~]$ cd hadoopConf
[sparkuser@localhost hadoopConf]$ cp /tmp/core-site.xml .
[sparkuser@localhost hadoopConf]$ cp /tmp/yarn-site.xml .
[/code]
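If the Hadoop and Spark installations were on truly separate machines, the same two files could be pulled over the network instead; a sketch using scp (the host name 'hadoop-host' is a placeholder):
[code language="java"]
# Run on the Spark machine as sparkuser; copies the files from the remote Hadoop box
[sparkuser@localhost hadoopConf]$ scp hduser@hadoop-host:/home/hduser/hadoop/etc/hadoop/core-site.xml .
[sparkuser@localhost hadoopConf]$ scp hduser@hadoop-host:/home/hduser/hadoop/etc/hadoop/yarn-site.xml .
[/code]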
Step 4: Set up HADOOP_CONF_DIR
In spark-env.sh, set HADOOP_CONF_DIR to the local path where the Hadoop configuration files are stored, as shown below.
[code language="java"]
# In Spark set up machine, change the <Spark_home>/conf/spark-env.sh
[sparkuser@localhost spark-2.0.2-bin-hadoop2.7]$ nano conf/spark-env.sh
#Point HADOOP_CONF_DIR at the hadoopConf directory created earlier (if conf/spark-env.sh does not exist, copy it from conf/spark-env.sh.template)
export HADOOP_CONF_DIR=/home/sparkuser/hadoopConf/
[/code]
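A quick way to confirm the variable points at the right place (an optional sanity check):
[code language="java"]
# Both XML files should be listed
[sparkuser@localhost spark-2.0.2-bin-hadoop2.7]$ ls /home/sparkuser/hadoopConf/
core-site.xml  yarn-site.xml
[/code]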
Problem Faced: Earlier, I tried to avoid copying the files to 'sparkuser' and set HADOOP_CONF_DIR to '/home/hduser/hadoop/etc/hadoop'.
But when I submitted the Spark job, I got the error below. That is when I realized that 'sparkuser' cannot read files under 'hduser''s home directory.
[code language="java"]
[sparkuser@localhost spark-2.0.2-bin-hadoop2.7]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.0.2.jar 10
Failure Output:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
.....
16/12/16 16:19:38 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/12/16 16:19:41 INFO Client: <strong>Source and destination file systems are the same. Not copying file:/tmp/spark-6700e780-d8fa-443c-aead-7763ed18ca7d/__spark_libs__7158677467450857723.zip</strong>
....16/12/16 16:19:41 INFO SecurityManager: Changing view acls to: sparkuser
16/12/16 16:19:41 INFO Client: Submitting application application_1481925228457_0007 to ResourceManager
....16/12/16 16:19:45 INFO Client: Application report for application_1481925228457_0007 (state: FAILED)
16/12/16 16:19:45 INFO Client:
client token: N/A
diagnostics: Application application_1481925228457_0007 failed 2 times due to AM Container for appattempt_1481925228457_0007_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://localhost:8088/cluster/app/application_1481925228457_0007Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-6700e780-d8fa-443c-aead-7763ed18ca7d/__spark_libs__7158677467450857723.zip does not exist
<strong>java.io.FileNotFoundException</strong>: File file:/tmp/spark-6700e780-d8fa-443c-aead-7763ed18ca7d/__spark_libs__7158677467450857723.zip does not exist
[/code]
Step 5: Change the Hadoop DFS access permissions
When the Spark job is executed on the YARN cluster, it creates a staging directory under the submitting user's home directory on HDFS. Therefore, 'sparkuser' needs write access to /user/sparkuser.
[code language="java"]
#Create /user/sparkuser directory on HDFS and also change permissions
[hduser@localhost ~]$ hadoop fs -mkdir /user/sparkuser
[hduser@localhost ~]$ hadoop fs -chmod 777 /user/sparkuser
# or you can disable permission checking on HDFS: edit hdfs-site.xml and add the property below
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
[/code]
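Instead of opening the directory up with 777, a less permissive alternative is to make 'sparkuser' the owner of its HDFS home directory; a sketch of that option, plus a check of the resulting ownership:
[code language="java"]
# Alternative to chmod 777: hand ownership of the directory to sparkuser
[hduser@localhost ~]$ hadoop fs -chown sparkuser:sparkuser /user/sparkuser
# Verify ownership and permissions
[hduser@localhost ~]$ hadoop fs -ls /user
[/code]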
Problem faced: When I submitted the Spark job, I got the permission error shown below.
[code language="java"]
[sparkuser@localhost spark-2.0.2-bin-hadoop2.7]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g --num-executors 1 examples/jars/spark-examples_2.11-2.0.2.jar 10
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
....
....
16/12/16 17:11:30 INFO Client: Setting up container launch context for our AM
16/12/16 17:11:30 INFO Client: Setting up the launch environment for our AM container
16/12/16 17:11:30 INFO Client: <strong>Preparing resources for our AM container</strong>
<strong>Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=sparkuser, access=WRITE, inode="/user/sparkuser/.sparkStaging/application_1481925228457_0008":hduser:supergroup:drwxr-xr-x</strong>
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
......[/code]
Step 6: Run the Spark job
Now, submit the Spark job as shown below.
[code language="java"]
#Submit the job
[sparkuser@localhost spark-2.0.2-bin-hadoop2.7]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.0.2.jar 10
Output:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
...16/12/16 23:31:51 INFO Client: Submitting application application_1481959535348_0001 to ResourceManager
16/12/16 23:31:52 INFO YarnClientImpl: Submitted application application_1481959535348_0001
16/12/16 23:31:53 INFO Client: Application report for application_1481959535348_0001 (state: ACCEPTED)
...16/12/16 23:33:09 INFO Client: Application report for application_1481959535348_0001 (state: ACCEPTED)
16/12/16 23:33:10 INFO Client: Application report for application_1481959535348_0001 (state: RUNNING)
16/12/16 23:33:10 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.1.142
ApplicationMaster RPC port: 0
queue: default
start time: 1481959911811
final status: UNDEFINED
tracking URL: http://localhost:8088/proxy/application_1481959535348_0001/
user: pooja
16/12/16 23:33:21 INFO Client: Application report for application_1481959535348_0001 (state: RUNNING)
16/12/16 23:33:22 INFO Client: Application report for application_1481959535348_0001 (state: FINISHED)
16/12/16 23:33:22 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.1.142
ApplicationMaster RPC port: 0
queue: default
start time: 1481959911811
final status: SUCCEEDED
tracking URL: http://localhost:8088/proxy/<strong>application_1481959535348_0001</strong>/
user: pooja
16/12/16 23:33:23 INFO Client: Deleting staging directory hdfs://localhost:9000/user/pooja/.sparkStaging/application_1481959535348_0001
16/12/16 23:33:24 INFO ShutdownHookManager: Shutdown hook called
16/12/16 23:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-d61b8ae1-3dec-4380-8a85-0c615c1e4be1
[/code]
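In cluster deploy mode the driver runs inside the YARN ApplicationMaster, so the 'Pi is roughly ...' result of SparkPi ends up in the container logs rather than on the submitting console. Assuming YARN log aggregation is enabled, the logs can be fetched with the yarn CLI:
[code language="java"]
# Run as hduser; fetch the aggregated logs for the finished application
[hduser@localhost ~]$ yarn logs -applicationId application_1481959535348_0001
[/code]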
Step 7: Verify the job on YARN
Make sure the application shown in the output above also appears in the YARN ResourceManager web console (http://localhost:8088/cluster).
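The applications can also be listed from the command line; assuming the Hadoop binaries are on the PATH of 'hduser', something like:
[code language="java"]
# The application id from the output above should appear with final state SUCCEEDED
[hduser@localhost ~]$ yarn application -list -appStates ALL
[/code]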
I hope you have successfully submitted Spark jobs on YARN. Please leave a comment if you are facing any issues.
Happy Coding !!!