This page describes how to configure Hive, Tez, and Sqoop services in HDP clusters to enable Pulse to collect query statistics, performance metrics, and lineage information.
Configure HDP Hive for Pulse
This section describes how to configure Hive so that Pulse can collect query data and metrics.
Configure HiveServer2 JMX
In the Ambari UI:
- Navigate to Hive > Configs > `Advanced hive-env` > `hive-env template`.
- In the `hive-env template`, choose and add one of the following JMX configurations based on your security requirements.
Enable JMX without Security on JMX Remote Port
To enable JMX port without any security, add the following parameters at the end of the file:

    if [ "$SERVICE" = "hiveserver2" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
    fi

Avoid JMX Changes for Hive 1.x Using the MR Engine
Do not enable JMX for Hive 1.x when using the MapReduce (MR) execution engine. A known bug can cause query failures if JMX is enabled.
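
The caveat above can be expressed as a small guard to run before editing the template. This is a minimal sketch, not part of Pulse or Hive: `jmx_is_safe` is a hypothetical helper that takes a Hive version string and the value of the `hive.execution.engine` property, and flags only the combination this page warns about.

```shell
# Hypothetical helper (not shipped with Pulse or Hive).
# Usage: jmx_is_safe <hive_version> <execution_engine>
# Returns 1 only for Hive 1.x on the MapReduce engine; returns 0 otherwise.
jmx_is_safe() {
  # "${1%%.*}" keeps the major version, e.g. "1.2.1" -> "1".
  case "${1%%.*}:$2" in
    1:mr) return 1 ;;
    *)    return 0 ;;
  esac
}
```

For example, `jmx_is_safe 1.2.1 mr || echo "skip the JMX change"` prints the message, while `jmx_is_safe 1.2.1 tez` succeeds silently.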
Enable Basic Authentication on JMX Remote Port (Optional)
To enable basic authentication on the JMX remote port, add the following parameters:

    if [ "$SERVICE" = "hiveserver2" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
    fi

Enable TLS/SSL on JMX Remote Port (Optional)
To enable TLS/SSL on the JMX remote port, together with password authentication, add the following parameters:

    if [ "$SERVICE" = "hiveserver2" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8008"
    fi

Configure Hive Metastore JMX
In the Ambari UI:
- Navigate to Hive > Configs > `Advanced hive-env` > `hive-env template`.
- In the `hive-env template`, choose and add one of the following JMX configurations based on your security requirements.
Enable JMX without Security on JMX Remote Port
To enable JMX port without any security, add the following parameters at the end of the file:

    if [ "$SERVICE" = "metastore" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
    fi

Enable Basic Authentication on JMX Remote Port (Optional)
To enable basic authentication on the JMX remote port, add the following parameters:

    if [ "$SERVICE" = "metastore" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
    fi

Enable TLS/SSL on JMX Remote Port (Optional)
To enable TLS/SSL on the JMX remote port, together with password authentication, add the following parameters:

    if [ "$SERVICE" = "metastore" ]; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8009"
    fi

Place Hive Hook JARs
Pulse uses Hive hooks to capture query statistics.
Hook JAR Mapping
| Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name |
|---|---|---|---|
| HDP 2.x | 1.2.x | 0.7.x | ad-hive-hook__hdp__1.2.x-assembly-2.0.0.jar |
| HDP 2.x | 2.1.x (LLAP) | 0.7.x | ad-hive-hook__hdp__2.1.x-assembly-2.0.0.jar |
| HDP 3.1.0.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.0.0-78-assembly-2.0.0.jar |
| HDP 3.1.4.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-2.0.0.jar |
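
The mapping in the table can be encoded as a lookup so that scripts pick the JAR consistently. The sketch below is illustrative only: `hook_jar_for` is a hypothetical helper, the version patterns are an assumption about how HDP and Hive version strings look on your cluster, and the JAR names are copied verbatim from the table, so confirm them against the files Acceldata actually delivers.

```shell
# Hypothetical helper: print the hook JAR name listed in the table above
# for a given HDP version and Hive version.
# Usage: hook_jar_for <hdp_version> <hive_version>
hook_jar_for() {
  case "$1:$2" in
    2.*:1.2.*)     echo "ad-hive-hook__hdp__1.2.x-assembly-2.0.0.jar" ;;
    2.*:2.1.*)     echo "ad-hive-hook__hdp__2.1.x-assembly-2.0.0.jar" ;;
    3.1.0.*:3.1.*) echo "ad-hive-hook__hdp__3.1.0.3.1.0.0-78-assembly-2.0.0.jar" ;;
    3.1.4.*:3.1.*) echo "ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-2.0.0.jar" ;;
    *)
      # Any combination not in the table is reported rather than guessed.
      echo "no hook JAR listed for HDP $1 / Hive $2" >&2
      return 1
      ;;
  esac
}
```

Calling `hook_jar_for 3.1.4.0-315 3.1.0` prints the HDP 3.1.4.x entry; an unlisted combination returns a non-zero status instead of falling back to a default.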
Steps to Place Hive Hook JARs
- Get the Hive hook JARs from Acceldata (see the Hook JAR Mapping table above).
- Place the JARs on all edge nodes, HiveServer2 nodes, and Hive Interactive nodes in `/opt/acceldata`.
- Ensure the hook directory is readable and executable by all users.
- Add the hook JARs to `AUX_CLASSPATH`:
Update the hook JAR name in the following properties to match your installed HDP distribution version.
- For HiveServer2, navigate to Hive > Configs > `Advanced hive-env` and add the following hook JAR. For example:

      export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_1.2.x-assembly-2.0.0.jar

- For Hive Interactive, navigate to Hive > Configs > `Advanced hive-interactive-env` and add the following hook JAR. For example:

      export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_2.1.x-assembly-2.0.0.jar

- Add the Pulse configurations in Hive > Configs > `Custom hive-site` and `Custom hive-interactive-site`.
  - ad.events.streaming.servers=(<Pulse IP>:19009)
  - ad.cluster=(cluster name as specified in Pulse installation)
- Add `io.acceldata.hive.AdHiveHook` to the following properties in Hive > Configs > General. If a property already has a value, append the class after a comma.
  - `hive.exec.failure.hooks`
  - `hive.exec.pre.hooks`
  - `hive.exec.post.hooks`

Configure HDP Tez for Pulse
This section describes how to configure Tez so that Pulse can collect query events.
Prerequisites
- Obtain the Hive hook JARs from the Acceldata team.
- Review the mapping table to identify the correct JAR version for your HDP distribution. For details, see the section Hook JAR Mapping on this page.
Steps for Configuration
- Identify the hook JAR location.
  - For Hive 3.x on HDP 3.x (example: HDP 3.1.4): use the JAR `ad-hive-hook_hdp_3.1.0.3.1.4.0-315-assembly-2.0.0.jar` on the HDFS path `/hdp/apps/${hdp.version}/tez/tez.tar.gz`.
  - For Hive 1.x on HDP 2.6.x: use the JAR `ad-hive-hook_hdp_1.2.x-assembly-2.0.0.jar` on the HDFS path.
  - For Hive 2.x (LLAP) on HDP 2.6.x: use the JAR `ad-hive-hook_hdp_2.1.x-assembly-2.0.0.jar` on the HDFS path `/hdp/apps/${hdp.version}/tez_hive2/tez.tar.gz`.
- Update the Tez tarball with the Pulse hook JAR.

      # Create a working directory
      mkdir -p tez_pack/ && cd tez_pack
      # Back up the existing Tez tarball to /tmp in HDFS
      hdfs dfs -cp /hdp/apps/<cluster_version>/tez/tez.tar.gz /tmp
      # Download the Tez tarball from HDFS to the local directory (switch to a user with access)
      hdfs dfs -get /hdp/apps/<cluster_version>/tez/tez.tar.gz .
      # Unpack the tarball
      tar -zxvf tez.tar.gz
      # Remove the downloaded tarball so it is not packed into the new archive
      rm -f tez.tar.gz
      # Copy the Pulse hook JAR to the Tez lib directory
      cp </location../../pulse_hook.jar> ./lib/
      # Repackage the Tez tarball
      tar -cvzf /tmp/tez.tar.gz .
      # Upload it back and set the right permissions and ownership
      hdfs dfs -put -f /tmp/tez.tar.gz /hdp/apps/<cluster_version>/tez/tez.tar.gz
      hdfs dfs -chown hdfs:hadoop /hdp/apps/<cluster_version>/tez/tez.tar.gz
      hdfs dfs -chmod 755 /hdp/apps/<cluster_version>/tez/tez.tar.gz

- Update the `tez-site` properties.
  - In the Ambari UI, navigate to Tez > Configs > `Custom tez-site`.
  - Add or update the following properties:
    - tez.history.logging.service.class=io.acceldata.hive.AdTezEventsNatsClient
    - ad.events.streaming.servers (PULSE_IP:19009)
    - ad.cluster (your cluster name, ex: ad_hdp3_dev)
  - Optional (Hive 3.x only): The property `ad.hdfs.sink` is set to `true` by default. If set to `false`, Tez will not publish query metadata proto logs to HDFS.
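
Before uploading the repacked tarball from the steps above, it may be worth confirming that the hook JAR actually landed under `lib/`. The following is a minimal sketch under that assumption; `tarball_has_hook_jar` is a hypothetical helper, and it only reads the archive listing.

```shell
# Hypothetical helper: succeed if <tarball> lists an entry whose path
# contains "lib/<jar_name>". Nothing is extracted.
# Usage: tarball_has_hook_jar <tarball> <jar_name>
tarball_has_hook_jar() {
  # -t lists entries, -z reads gzip, -f names the archive.
  # grep -F treats the JAR name as a fixed string, so dots are literal.
  tar -tzf "$1" | grep -F "lib/$2" >/dev/null
}
```

For example, `tarball_has_hook_jar /tmp/tez.tar.gz pulse_hook.jar || echo "hook JAR missing"` can be run just before the `hdfs dfs -put` command, with the JAR name replaced by the one for your distribution.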
Configure HDP Sqoop for Pulse
This section describes how to configure Sqoop so that Pulse can collect query data.
- Copy and place the Hive hook JARs (from the table above) into the Sqoop classpath directory. Example path: `/usr/hdp/current/sqoop-client/lib`
- For clusters with LLAP (Hive Interactive) enabled, place both Hive 1.2.x and Hive 2.1.x hook JARs in the classpath.
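
The copy step above could be scripted roughly as follows. This is a sketch, not an Acceldata-provided tool: `copy_hook_jars` is a hypothetical helper, and the `ad-hive-hook*.jar` glob is an assumption based on the JAR names in the mapping table.

```shell
# Hypothetical helper: copy every hook JAR found in <src_dir> into <dest_dir>.
# Fails if a copy fails or if <src_dir> holds no matching JAR.
# Usage: copy_hook_jars <src_dir> <dest_dir>
copy_hook_jars() {
  found=0
  for jar in "$1"/ad-hive-hook*.jar; do
    # When the glob matches nothing it stays literal, so skip non-files.
    [ -f "$jar" ] || continue
    cp "$jar" "$2"/ || return 1
    found=1
  done
  [ "$found" -eq 1 ]
}
```

With the paths used on this page, the call would be `copy_hook_jars /opt/acceldata /usr/hdp/current/sqoop-client/lib`. On an LLAP cluster this copies both the 1.2.x and 2.1.x JARs, provided both were placed in the source directory.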
Result
- Hive exposes metrics over the configured port.
- Hive, Tez, and Sqoop are configured with JMX monitoring, hook JARs, and Pulse integration for query and job statistics.
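
A couple of these outcomes can be spot-checked from a shell. The helpers below are a hedged sketch with hypothetical names: `jmx_port_from_opts` extracts the JMX port from an options string such as `HADOOP_CLIENT_OPTS`, and `dir_world_rx` checks that a directory such as `/opt/acceldata` carries the read and execute bits for all users.

```shell
# Hypothetical helper: print the JMX remote port found in a JVM options
# string; fail if the flag is absent. If the flag appears more than once,
# the last value is printed.
# Usage: jmx_port_from_opts "$HADOOP_CLIENT_OPTS"
jmx_port_from_opts() {
  printf '%s\n' "$1" | tr ' ' '\n' \
    | sed -n 's/^-Dcom\.sun\.management\.jmxremote\.port=\([0-9][0-9]*\)$/\1/p' \
    | tail -n 1 | grep .
}

# Hypothetical helper: succeed if <dir> is a directory whose mode includes
# read and execute for "other" users (bits 005).
# Usage: dir_world_rx /opt/acceldata
dir_world_rx() {
  # -prune stops find from descending; it prints <dir> only on a match.
  [ -n "$(find "$1" -prune -type d -perm -005 2>/dev/null)" ]
}
```

Once the port is known, a reachability test such as `nc -z <host> <port>` (assuming a netcat build that supports `-z`) can confirm that the listener is up after the Hive services restart.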