Configure HDP Hive, Tez, and Sqoop

This page describes how to configure Hive, Tez, and Sqoop services in HDP clusters to enable Pulse to collect query statistics, performance metrics, and lineage information.

Configure HDP Hive for Pulse

This section describes how to configure Hive so that Pulse can collect query data and metrics.

Configure HiveServer2 JMX

In the Ambari UI:

  1. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  2. In the hive-env template, choose and add one of the following JMX configurations based on your security requirements.

Enable JMX without Security on JMX Remote Port

To enable the JMX port without any security, add the following parameters at the end of the file:
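
As an illustration only, an unsecured HiveServer2 JMX entry could look like the sketch below. The JVM flags are the standard `com.sun.management.jmxremote` system properties; port 8008 and the `HS2_JMX_PORT` variable name are assumptions, and the snippet assumes the Hive launcher sets `$SERVICE` before it sources hive-env, as the stock Hive scripts do.

```shell
# Sketch for the end of the hive-env template; not the vendor's exact snippet.
# Assumption: 8008 is a placeholder. Use any free port that Pulse can reach.
HS2_JMX_PORT=8008

# Only the HiveServer2 process should receive these flags.
if [ "${SERVICE:-}" = "hiveserver2" ]; then
  export HADOOP_OPTS="${HADOOP_OPTS:-} \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=${HS2_JMX_PORT} \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"
fi
```

Guarding on `$SERVICE` keeps the flags away from the Hive CLI and other processes that source the same file and would otherwise try to bind the same port.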


Avoid JMX Changes for Hive 1.x Using the MR Engine

Do not enable JMX for Hive 1.x when using the MapReduce (MR) execution engine. A known bug can cause query failures if JMX is enabled.

Enable Basic Authentication on JMX Remote Port (Optional)

To enable basic authentication on the JMX remote port, add the following parameters:
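
A hedged sketch of the password-protected variant follows. The flags are standard JVM properties; the port, the variable names, and both file paths are placeholders that are not taken from this page.

```shell
# Sketch for the hive-env template; paths and port are placeholders.
HS2_JMX_PORT=8008
JMX_PASSWORD_FILE=/path/to/jmxremote.password   # hypothetical location
JMX_ACCESS_FILE=/path/to/jmxremote.access       # hypothetical location

if [ "${SERVICE:-}" = "hiveserver2" ]; then
  export HADOOP_OPTS="${HADOOP_OPTS:-} \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=${HS2_JMX_PORT} \
-Dcom.sun.management.jmxremote.authenticate=true \
-Dcom.sun.management.jmxremote.password.file=${JMX_PASSWORD_FILE} \
-Dcom.sun.management.jmxremote.access.file=${JMX_ACCESS_FILE} \
-Dcom.sun.management.jmxremote.ssl=false"
fi
```

The JVM refuses to start the JMX agent unless the password file is readable only by the user that runs the process.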


Enable TLS/SSL on JMX Remote Port (Optional)

To enable TLS/SSL authentication on the JMX remote port, add the following parameters:
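
The TLS/SSL variant might be sketched as below, under the assumption that a Java keystore already exists on the HiveServer2 host. The keystore path, its password, and the port are placeholders; the properties themselves are the standard JVM JMX and `javax.net.ssl` settings.

```shell
# Sketch for the hive-env template; keystore details and port are placeholders.
HS2_JMX_PORT=8008
JMX_KEYSTORE=/path/to/keystore.jks     # hypothetical location
JMX_KEYSTORE_PASSWORD=changeit         # placeholder, replace with your own

if [ "${SERVICE:-}" = "hiveserver2" ]; then
  # authenticate=false keeps this example TLS-only; set it to true and add the
  # password.file and access.file properties to require credentials as well.
  export HADOOP_OPTS="${HADOOP_OPTS:-} \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=${HS2_JMX_PORT} \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=true \
-Dcom.sun.management.jmxremote.registry.ssl=true \
-Djavax.net.ssl.keyStore=${JMX_KEYSTORE} \
-Djavax.net.ssl.keyStorePassword=${JMX_KEYSTORE_PASSWORD}"
fi
```

Passing the keystore password as a JVM flag makes it visible in the process list, so restrict shell access to the host accordingly.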


Configure Hive Metastore JMX

In the Ambari UI:

  1. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  2. In the hive-env template, choose and add one of the following JMX configurations based on your security requirements.

Enable JMX without Security on JMX Remote Port

To enable the JMX port without any security, add the following parameters at the end of the file:
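
Because the Metastore is configured in the same hive-env template as HiveServer2, one way to keep the two JMX ports from colliding on hosts that run both services is to branch on the service name. This is a sketch under assumptions: ports 8008 and 8009 and the `PULSE_JMX_PORT` variable are placeholders, and `$SERVICE` is assumed to be set by the Hive launcher.

```shell
# Sketch for the end of the hive-env template; ports are placeholders.
case "${SERVICE:-}" in
  hiveserver2) PULSE_JMX_PORT=8008 ;;
  metastore)   PULSE_JMX_PORT=8009 ;;
  *)           PULSE_JMX_PORT= ;;      # CLI and other services: no JMX port
esac

if [ -n "$PULSE_JMX_PORT" ]; then
  export HADOOP_OPTS="${HADOOP_OPTS:-} \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=${PULSE_JMX_PORT} \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"
fi
```

If the template already sets JMX flags for HiveServer2, keep only the metastore branch so the flags are not added twice.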


Enable Basic Authentication on JMX Remote Port (Optional)

To enable basic authentication on the JMX remote port, add the following parameters:
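
Basic authentication needs a password file and an access file for the JVM to read. The helper below is a minimal sketch of creating them: the function name, the file names, and the example role are hypothetical. The file formats (`role password` and `role readonly`) and the owner-only permission on the password file follow standard JVM JMX behavior.

```shell
# Create JMX credential files for one read-only role.
# Run as the user that owns the Hive process so the JVM can read the files.
write_jmx_credentials() {
  cred_dir=$1; jmx_user=$2; jmx_password=$3
  (
    umask 077   # new files are created without group or world access
    mkdir -p "$cred_dir" &&
    printf '%s readonly\n' "$jmx_user" > "$cred_dir/jmxremote.access" &&
    printf '%s %s\n' "$jmx_user" "$jmx_password" > "$cred_dir/jmxremote.password" &&
    chmod 600 "$cred_dir/jmxremote.password"
  )
}

# Example (hypothetical directory and role):
#   write_jmx_credentials /path/to/jmx-credentials pulse 'choose-a-password'
```

The subshell keeps the tightened umask from leaking into the calling shell.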


Enable TLS/SSL on JMX Remote Port (Optional)

To enable TLS/SSL authentication on the JMX remote port, add the following parameters:


Place Hive Hook JARs

Pulse uses Hive hooks to capture query statistics.

Hook JAR Mapping

Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name
HDP 2.x | 1.2.x | 0.7.x | ad-hive-hook__hdp__1.2.x-assembly-2.0.0.jar
HDP 2.x | 2.1.x (LLAP) | 0.7.x | ad-hive-hook__hdp__2.1.x-assembly-2.0.0.jar
HDP 3.1.0.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.0.0-78-assembly-2.0.0.jar
HDP 3.1.4.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-2.0.0.jar

Steps to Place Hive Hook JARs

  1. Get the Hive hook JARs from Acceldata (see the Hook JAR Mapping table above).
  2. Place the JARs on all edge nodes, HiveServer2 nodes, and Hive Interactive nodes in:
  3. Ensure the hook directory is readable and executable by all users.
  4. Add the hook JARs to AUX_CLASSPATH. Update the hook JAR name in the following properties to match your installed HDP distribution version.
    • For HiveServer2, navigate to Hive > Configs > Advanced hive-env and add the hook JAR. For example:
    • For Hive Interactive, navigate to Hive > Configs > Advanced hive-interactive-env and add the hook JAR. For example:
  5. Add the Pulse configurations in Hive > Configs > Custom hive-site and Custom hive-interactive-site.
    1. ad.events.streaming.servers=(<Pulse IP>:19009)
    2. ad.cluster=(cluster name as specified in Pulse installation)
  6. Add io.acceldata.hive.AdHiveHook to the following properties in Hive > Configs > General, separating it from any existing value with a comma.
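
The placement, classpath, and hook-property steps above can be sketched as three small shell helpers. This is a minimal sketch rather than the vendor's commands: the function names are hypothetical and the hook directory stays a placeholder because this page does not give it. The firm inputs are the permission requirement, the colon-separated AUX_CLASSPATH, and the io.acceldata.hive.AdHiveHook class name.

```shell
# Copy one hook JAR into a directory that every user can read and traverse.
stage_hook_jar() {
  src_jar=$1; hook_dir=$2
  install -d -m 755 "$hook_dir" &&
  install -m 644 "$src_jar" "$hook_dir/"
}

# Append a JAR to a colon-separated classpath; no separator if it was empty.
append_classpath() {
  current=$1; jar=$2
  printf '%s\n' "${current:+$current:}$jar"
}

# Append a hook class to a comma-separated property value, adding the comma
# only when a value already exists and skipping classes already listed.
append_hook() {
  current=$1; hook=$2
  case ",$current," in
    *",$hook,"*) printf '%s\n' "$current" ;;
    *)           printf '%s\n' "${current:+$current,}$hook" ;;
  esac
}

# Example (hypothetical paths; pick the JAR name from the mapping table):
#   stage_hook_jar ./<hook jar> /path/to/hook-dir
#   export AUX_CLASSPATH="$(append_classpath "${AUX_CLASSPATH:-}" /path/to/hook-dir/<hook jar>)"
#   append_hook "<existing property value>" io.acceldata.hive.AdHiveHook
```

`append_hook` only computes the new property value; the result still has to be entered in Ambari and the affected Hive services restarted.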

Configure HDP Tez for Pulse

This section describes how to configure Tez so that Pulse can collect query events.

Prerequisites

  • Obtain the Hive hook JARs from the Acceldata team.
  • Review the mapping table to identify the correct JAR version for your HDP distribution. For details, see the section Hook JAR Mapping on this page.

Steps for Configuration

  1. Identify the hook JAR location.
    • For Hive 3.x on HDP 3.x (example: HDP 3.1.4): use the JAR ad-hive-hook_hdp_3.1.0.3.1.4.0-315-assembly-2.0.0.jar on the HDFS path:
    • For Hive 1.x on HDP 2.6.x: use the JAR ad-hive-hook_hdp_1.2.x-assembly-2.0.0.jar on the HDFS path.
    • For Hive 2.x (LLAP) on HDP 2.6.x: use the JAR ad-hive-hook_hdp_2.1.x-assembly-2.0.0.jar on the HDFS path:
  2. Update the Tez tarball with the Pulse hook JAR.
  3. Update the tez-site properties.
    • In the Ambari UI, navigate to Tez > Configs > Custom tez-site.
    • Add or update the following properties:
    • Optional (Hive 3.x only): The property ad.hdfs.sink is set to true by default. If set to false, Tez will not publish query metadata proto logs to HDFS.
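
The tarball update step above can be approximated locally as follows. This is a sketch under stated assumptions, not the vendor procedure: the function name is hypothetical, the `lib` subdirectory is an assumed target inside the tarball (pass a fourth argument to change it), and the HDFS paths are left as placeholders because this page does not give them.

```shell
# Add a hook JAR to a local copy of the Tez tarball and write a new tarball.
# Pass the output tarball as an absolute path.
repack_tez_tarball() {
  tarball=$1; hook_jar=$2; out=$3; subdir=${4:-lib}
  work=$(mktemp -d) || return 1
  tar -xzf "$tarball" -C "$work" &&
  mkdir -p "$work/$subdir" &&
  cp "$hook_jar" "$work/$subdir/" &&
  tar -czf "$out" -C "$work" .
  status=$?
  rm -rf "$work"        # always clean up the scratch directory
  return $status
}

# Example (placeholders in angle brackets):
#   hdfs dfs -get <Tez tarball HDFS path> tez.tar.gz
#   repack_tez_tarball "$PWD/tez.tar.gz" <hook jar> "$PWD/tez-pulse.tar.gz"
#   hdfs dfs -put -f tez-pulse.tar.gz <Tez tarball HDFS path>
```

Keeping a backup of the original tarball before the `-put -f` overwrite makes it possible to roll back if Tez jobs fail to start.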

Configure HDP Sqoop for Pulse

This section describes how to configure Sqoop so that Pulse can collect query data.

  1. Copy the Hive hook JARs (from the Hook JAR Mapping table above) into the Sqoop classpath directory. Example path:
  2. For clusters with LLAP (Hive Interactive) enabled, place both the Hive 1.2.x and Hive 2.1.x hook JARs in the classpath.
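
After copying, a quick check that the expected JARs are present can catch a missed file, for example on an LLAP cluster that needs both. The helper below is a hypothetical sketch: the function name is invented, and the Sqoop classpath directory is passed in as a placeholder because this page does not give it.

```shell
# Print each expected hook JAR that is absent from the given directory.
missing_hook_jars() {
  lib_dir=$1; shift
  for jar in "$@"; do
    [ -f "$lib_dir/$jar" ] || printf '%s\n' "$jar"
  done
}

# Example for an LLAP-enabled cluster (placeholder directory):
#   missing_hook_jars /path/to/sqoop/classpath-dir \
#     ad-hive-hook__hdp__1.2.x-assembly-2.0.0.jar \
#     ad-hive-hook__hdp__2.1.x-assembly-2.0.0.jar
```

Empty output means every listed JAR was found.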

Result

  • HiveServer2 and Hive Metastore expose JMX metrics over the configured ports.
  • Hive, Tez, and Sqoop are configured with JMX monitoring, hook JARs, and Pulse integration for query and job statistics.