Thursday, August 31, 2017

Installing Apache Hive


Apache Hive

Apache Hive is data warehouse software built on top of Hadoop for analyzing distributed HDFS data using HQL (SQL-like commands).

In this tutorial, we will discuss the steps for a Hive installation with a local embedded metastore.

Prerequisites

  • If you need to configure Hive for a cluster, the Hadoop version installed on your local machine must match the Hadoop version installed on the cluster machines.
  • If you are configuring Hive in pseudo-distributed mode, Hadoop must already be configured properly. If it is not, use the earlier article to configure it. A quick check that Hadoop is running is shown below.
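If Hadoop is running in pseudo-distributed mode, the jps command (shipped with the JDK) is a quick sanity check; the exact daemon list depends on your setup.

jps

You should see Hadoop daemons such as NameNode, DataNode, ResourceManager, and NodeManager. Running hadoop version also confirms which Hadoop release is on your PATH.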

Steps for Hive Installation 

In this tutorial, we will discuss a Hive installation where the metadata resides on the local machine using the default embedded Derby database. This is an easy way to start, but its limitation is that only one embedded Derby instance can access the data files at a time. Therefore, only one open Hive session can access the database; a second session will produce an error.

Download and extract the binary tarball 

Download the binary tarball from an Apache mirror, or use wget as shown below.
wget http://mirrors.sonic.net/apache/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz


Extract the tarball

tar -xvf apache-hive-2.2.0-bin.tar.gz

Create symbolic link

ln -s apache-hive-2.2.0-bin hive
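To confirm the link points at the extracted directory, list it:

ls -l hive

It should show hive -> apache-hive-2.2.0-bin.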



Configuration Change


Edit ~/.bashrc and add the lines below.

export HIVE_HOME=<path where the hive tarball was extracted>
export PATH=$PATH:$HIVE_HOME/bin
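For example, assuming the tarball was extracted under /home/hduser and you are using the hive symlink created above (adjust the path to your own layout), the entries might look like this:

export HIVE_HOME=/home/hduser/hive
export PATH=$PATH:$HIVE_HOME/bin

Then reload the shell configuration so the changes take effect in the current session:

source ~/.bashrc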



Create Hive Derby Schema


As previously mentioned, we are using the embedded Derby database, but in production we would install a MySQL database as the metastore and provide the MySQL connection configuration in hive-site.xml.
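For reference only, a minimal sketch of such a MySQL metastore configuration in hive-site.xml is shown below; dbhost, metastore, hiveuser, and hivepassword are placeholders, not values used in this tutorial.

<configuration>
  <!-- dbhost, metastore, hiveuser, hivepassword are placeholders -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>

For this tutorial no such change is needed; we simply initialize the embedded Derby schema with schematool.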


schematool -initSchema -dbType derby



Verify that the database schema was created.

We ran the command from the folder ~/hivedata, so look for the Derby database in that same folder.
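By default, the embedded Derby metastore is created in the directory from which schematool was run, so listing ~/hivedata should show a metastore_db directory (and a derby.log file):

ls ~/hivedata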



Hadoop Changes 

Hive stores table data in the HDFS folder /user/hive/warehouse. Therefore, we need to create this folder on HDFS.

hduser@pooja:~/hivedata$ hadoop fs -mkdir /user/hive/warehouse/

17/08/24 21:38:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
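Depending on your setup, you may need the -p flag to create parent directories. The Hive getting-started guide also suggests creating /tmp and making both directories group-writable:

hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse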


HQL statements
Finally, we will create tables and insert data using HQL. We can perform data aggregations and filtering and generate insights using simple HQL statements (which are similar to SQL).

hduser@pooja:~/hivedata$ hive

create table demo ( id int, firstname string) row format delimited fields terminated by ',';

create table product ( id int, name string, price float);




insert into product values ( 1, "product 1", 10.99);
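As a small illustration of the filtering and aggregation mentioned above, statements like the following can be run in the same Hive session. The CSV path for the demo table is a hypothetical example, and the 5.0 price threshold is arbitrary.

-- /home/hduser/demo.csv is a hypothetical comma-delimited file
load data local inpath '/home/hduser/demo.csv' into table demo;

select * from product where price > 5.0;
select count(*), avg(price) from product;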



The table files can be viewed on HDFS as shown below.
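For example, listing the warehouse directory should show one subdirectory per table (product here, since it was created in the default database):

hadoop fs -ls /user/hive/warehouse/
hadoop fs -ls /user/hive/warehouse/product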



I hope you are able to install Hive without any trouble. If you run into any problems, please write to me.

Happy Coding !!!











