Friday, December 9, 2016

Spark Development using SBT in IntelliJ

Apache Spark

Apache Spark is an open-source big data computation system. It is written in the Scala programming language, which runs on the JVM (Java Virtual Machine). Spark's popularity is growing because of its in-memory data storage and real-time processing capabilities. It provides high-level APIs in Java, Scala, and Python, so we can run analytical queries against Spark through these APIs and get the desired insights. Spark can be deployed to a standalone cluster, Hadoop 2 (YARN), or Mesos.

SBT Overview

SBT stands for Simple Build Tool. A build tool helps automate tasks such as compiling, testing, packaging, running, and deploying code. Other tools in this space include Maven, Ant, Gradle, and the Ivy dependency manager. SBT is one such build tool, focused mainly on Scala projects.
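To make "automating tasks" a little more concrete, here is a minimal sketch of a custom task written in a build.sbt file. The task name `hello` and its message are hypothetical and only for illustration; `taskKey` is the sbt 0.13 way to declare a task.

```scala
// build.sbt (sketch): a hypothetical custom task named "hello".
// Note: sbt 0.13.6 expects a blank line between top-level expressions in a .sbt file.
lazy val hello = taskKey[Unit]("Prints a greeting (example task, not part of this project)")

hello := {
  println("Hello from sbt")
}
```

With this in place, running `sbt hello` from the project root should execute the task, in the same way that `sbt compile` or `sbt run` execute the built-in ones.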

Today, I am going to explore writing a basic query using the Spark high-level API in Scala 2.10, with IntelliJ as the IDE for development.

Now we are all set. Let's get our hands dirty with some actual coding.

Prerequisites (make sure your machine has the components below installed):

  1. Install Java JDK 7+.

  2. Install SBT.

  3. Download and unzip IntelliJ.

Working on Code

a. Creating project structure

There are different ways to create the project structure. We could even use an existing project template to generate it automatically. Today, we are going to create it manually. The commands below create the root directory (scalaProjectDemo, which is also the project name), the project folder for the build definition files, and the src/main/scala and src/main/resources folders inside it:

[code language="bash"]
mkdir scalaProjectDemo
cd scalaProjectDemo
mkdir project
mkdir -p src/main/scala
mkdir -p src/main/resources
touch project/
touch project/plugins.sbt
touch project/assembly.sbt
[/code]

b. Creating a build file

We will be creating the build file "build.sbt" in the root directory as shown below:

[code language="java"]

import AssemblyKeys._

// Apply the sbt-assembly settings so that the assembly task is available
assemblySettings

name := "scalaProjectDemo"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.2.0" % "provided"

resolvers ++= Seq(
  "Cloudera Repository" at "",
  "Akka Repository" at "",
  "Spray Repository" at "")
[/code]


The project can be imported into IntelliJ at this point. Please refer to the section "e. Importing the code into IntelliJ".

c. Creating a Scala file for testing

Next, we create a sample Scala file that contains just a single print statement, as shown below:

[code language="scala"]
package com.jbksoft

object HelloWorld {
  // Entry point: prints a greeting to standard output
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }
}
[/code]

d. Run the code

Next, we run the code and make sure it compiles successfully, as shown below:

[code language="java"]
$ cd scalaProjectDemo
$ sbt run
Getting org.scala-sbt sbt 0.13.6 ...

downloading ...
[SUCCESSFUL ] org.scala-sbt#sbt;0.13.6!sbt.jar (1481ms)
downloading ...
[SUCCESSFUL ] org.scala-sbt#main;0.13.6!main.jar (3868ms)
downloading ...
[SUCCESSFUL ] org.scala-sbt#compiler-interface;0.13.6!compiler-interface-bin.jar (1653ms)
[SUCCESSFUL ] org.scala-sbt#test-agent;0.13.6!test-agent.jar (1595ms)
downloading ...
[SUCCESSFUL ] org.scala-sbt#apply-macro;0.13.6!apply-macro.jar (1619ms)
:: retrieving :: org.scala-sbt#boot-app
confs: [default]
44 artifacts copied, 0 already retrieved (13750kB/320ms)
[info] Loading project definition from /home/xxx/dev/scalaProjectDemo/project
[info] Updating {file:/home/xxx/dev/scalaProjectDemo/project/}scalaprojectdemo-build...
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-14d4d23e25f354cd296c73bfff40554
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading ...
[info] [SUCCESSFUL ] com.eed3si9n#sbt-assembly;0.11.2!sbt-assembly.jar (2136ms)
[info] downloading ...
[info] [SUCCESSFUL ] org.slf4j#slf4j-log4j12;1.7.5!slf4j-log4j12.jar (78ms)
1.14.v20131031!jetty-webapp.jar (103ms)
[info] downloading ...
[info] [SUCCESSFUL ];2.4.0a!protobuf-java.jar (387ms)
[info] downloading ...
[info] [SUCCESSFUL ] asm#asm;3.2!asm.jar (195ms)
[info] Done updating.
[info] Compiling 1 Scala source to /home/pooja/dev/scalaProjectDemo/target/scala-2.10/classes...
[info] Running com.jbksoft.HelloWorld
Hello World
[success] Total time: 97 s, completed Dec 8, 2016 2:58:29 PM
[/code]
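Hello World only confirms that the build works. As a next step toward the basic Spark query mentioned at the start, here is a minimal word-count sketch against the Spark 1.1.0 Scala API declared in build.sbt. The object name `WordCount`, the helper `countWords`, the sample sentences, and the `local[2]` master are all assumptions made for illustration; they are not part of the original project.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits such as reduceByKey (needed on Spark 1.1)

// Hypothetical example object; not part of the original project.
object WordCount {

  // Counts how often each whitespace-separated word occurs in the given lines.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)          // distribute the input as an RDD
      .flatMap(_.split("\\s+"))    // split each line into words
      .filter(_.nonEmpty)          // drop empty tokens
      .map(word => (word, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)          // sum the counts per word
      .collect()                   // bring the results back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark inside this JVM with two threads (assumed for illustration).
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("spark with sbt", "sbt in intellij"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop() // always release the Spark context
    }
  }
}
```

One caveat: build.sbt marks spark-core as "provided", and sbt does not normally put provided dependencies on the classpath used by `sbt run`. To try this sketch you would likely either drop the "provided" qualifier temporarily or package the project and launch it with spark-submit.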

e. Importing the code into IntelliJ

Edit the file project/plugins.sbt:


addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.5.2")


Edit the file project/assembly.sbt:


addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")


Edit the


Open IntelliJ.


Then, from the menu, choose File > Open.


The Open dialog appears.


Choose the project path and click OK.


Leave the defaults and click OK.


The import takes some time because it downloads the dependencies. Click OK.


Select New Window so that the project opens in a new window.


The project will be imported into IntelliJ.


Expand src > main > scala.


You can now add more files, and run and debug the code in IntelliJ.


The code will be compiled by the Scala compiler and then executed.


The output is shown in the Run window.

Let me know if you face any issues.

Happy Coding

