Hive is a prominent open source data warehouse built on Hadoop’s Distributed File System (HDFS). It is used for – data storage, data summarization, data query and analysis on large data systems. Hive is easy to use for novice, as you need to simply write SQL like query. Hive converts SQL queries into Mapreduce / Tez / Spark job depending on admin configured settings.
System Architecture Snapshot:
Organizations big or small use varied range of BI tools which are available in the market such as PowerBI, Tableau, Qlikview etc. Hive can be integrated with these BI tools, caveat being this integration process is not straight forward.
In the subsequent section of the blog post, we are mentioning the steps to be followed while connecting Microsoft Power BI with Hive installed on Hortonworks HDP 3.1.
- Downloading and installing ODBC drivers
- Configuring the ODBC driver
- Configuring PowerBI
- Finally, Connecting Hive with Power BI
Steps to Connect Power BI with Hive:
1. Downloading & installing ODBC drivers
There are two versions of the drivers available for Windows:
HortonworksHiveODBC32.msi for 32-bit processors or HortonworksHiveODBC64.msi for 64-bit. This totally depends on PowerBI desktop system configuration, so choose the driver wisely. For instance, if the application running is 64-bit then you should install the 64-bit driver else go for 32-bit.
After downloading, install the ODBC drivers.
Please note: We are referring to these steps from a 64-bit processor perspective, however the same procedure holds true for 32-bit processor.
2. Configuring the ODBC driver
First thing first, create a Data Source Name (DSN)
1. Click on the Start button
2. Click on All Programs
3. Click on the Hortonworks Hive ODBC Driver 1.2 (64-bit) onprogram group.
4. Click on the 64-bit ODBC Administrator. The ODBC Data Source Administrator window opens.
A window as shown below opens:
In the User DSN tab click on the Add button.
A window opens as shown below.
Select Hortonworks Hive ODBC Driver as highlighted in blue and the click on the finish button.
For configuration purpose, follow the procedure mentioned below:
- Give data source a name / title
- Description (optional though suggested)
- Select ‘hive server 2’
- In service discovery mode select ‘No service Discovery’
- In the host section provide an IP address of the host machine on which hive server is running
- For hive – port is 10000
- In the database section specify the database you want to connect. (optional)
- Choose hive username
- Choose SASL in Thrift Transport
Now click on the test buttonto check the connection is valid or not. If the connection is successful, then it will show the following window:
After successful test click OK to save the configuration. It will show the DSN in the User DSN (User data sources).
3. Configuring PowerBI
Now open Power BI tool and select ODBC from Get data. Select the DSN we defined from the DSN drop down and click on OK.
It will navigate you to a different panel where you have to select the Hive.
4. Connecting Hive with Power BI
Now all the hive database & schema will be visible.
Now you can connect & load the data for analysis & create powerful visualizations. Make sure Hive users have an appropriate permission else you may get permission error.
Thank for reading. Hope you have enjoyed our steps; and of course, this will help you to connect Power BI or any other ODBC client with the Hive easily.