Friday, September 1, 2017

Install Hive with MySQL MetaStore


Apache Hive Metastore

Apache Hive Metastore is central for storing Hive metadata. It consists of relational database for store the data (such as Hive tables schema,partition, bucket) and Metastore Service API for accessing information stored in relational database.

Metastore Deployment modes

Embedded Metastore


This is a default Metastore deployment  modes. In this mode, the Metastore service run in the same JVM as Hive service  and contains an embedded Derby database instance backed by local disk. This mode required least configuration but support only 1 session at a time. Therefore not suited for production. If you need steps to configure this mode, then refer to article.

Local Metastore


In this mode, Metastore service run in the same JVM as Hive service, but Metastore database run on separate process.

Remote Metastore


In this mode, Metastore service run on its own JVM. This brings better manageability and security because the database tier can be completely fire walled off, and the clients no longer need the database credentials. In this, Metastore service communicate with database over JDBC. Hadoop ecosystem software can communicate with Hive using Thrift service.

Assumption

Hive will be installed on machine where Hadoop is configured either in pseudo mode or active node of hadoop cluster.
If not, you need to configure pseudo mode  mode, refer to article
If not, you need to configure hadoop cluster  refer to article

Installing Steps

In this tutorial, we will deploy Hive Metastore in local mode using MySQL as relational database.

Download and extract the binary tarball

Download the binary file from Apache mirror or use wget as shown below.

wget http://mirrors.sonic.net/apache/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz

Extract the tarball

tar -xvf apache-hive-2.2.0-bin.tar.gz

Create symbolic link

ln -s apache-hive-2.2.0-bin hive

Edit ~/.bashrc and add below line.

export HIVE_HOME=<path where hive tar file extracted >
export PATH=$PATH:$HIVE_HOME/bin

Install MySQL and Java Connector

Install MySQL server on Ubuntu server
sudo apt-get install mysql-server

Install MySQL Java Connector

pooja@pooja:~$ sudo apt-get install libmysql-java
[sudo] password for pooja:Reading package lists... Done
Building dependency treeReading state information... Done
The following NEW packages will be installed:libmysql-java0 upgraded, 
1 newly installed, 0 to remove and 358 not upgradedNeed to get 894 kB of archives.
After this operation, 1,060 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ trusty/universe libmysql-java all 5.1.28-1 [894 kB]Fetched 894 kB in 0s (1,092 kB/s)
Selecting previously unselected package libmysql-java.
(Reading database ... 187550 files and directories currently installed.)
Preparing to unpack .../libmysql-java_5.1.28-1_all.deb ...
Unpacking libmysql-java (5.1.28-1) ...
Setting up libmysql-java (5.1.28-1) ..


Create soft link
sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
pooja@pooja:~$ sudo ln -s /usr/share/java/mysql-connector-java.jar /home/hduser/hive/lib/mysql-connector-java.jar
[sudo] password for pooja: 
pooja@pooja:~$
Verify the soft link created
pooja@pooja:~$ ls -l /home/hduser/hive/lib/mysql-connector-java.jar
Note: You can also copy the file mysql-connector-java.jar in $HIVE_HOME/lib/ folder.
Create initial database and user

Create initial database using hive-schema file as shown below:
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-2.1.1000.mysql.sql;
Create a MySQL user account for Hive to access the Metastore database
mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'password';
mysql> GRANT all on *.* to 'hiveuser'@localhost identified by 'password';
mysql> flush privileges;
Configure Metadata service to communicate with MySQL
This steps will show the configuration property yo need to set in hive-site.xml ($HIVE_HOME/conf/hive-site.xml). If the file not present then create a file.
You need to specify property Connection URL,ConnectionDriverName,ConnectionUserName,ConnectionPassword as shown below.
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> <description>user name for connecting to mysql server</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>password</value> <description>password for connecting to mysql server</description> </property> </configuration>
Note: You can also specify property hive.metastore.uris if need to configure hive in remote metastore.
Verify the setup

Start Hive console and create table product
hduser@pooja:~$hive hive> create table product(id int, price float, name string);
On MySQL database, you will see the Hive table 'product' schema
I hope you are able to set up Hive successfully. If still face issue, I will be glad to help you.
Happy Coding !!!

No comments:

Post a Comment