This page describes how to configure Hive and Tez in CDP clusters so that Pulse can collect query statistics, performance metrics, and lineage information.
Configure CDP Hive for Pulse
View Hive Table Details in Pulse
To display Hive table details with data in the Pulse UI:
- Enable automatic statistics gathering. In the hive-site.xml file, set the following properties to true:
  - hive.stats.autogather
  - hive.stats.column.autogather
  This allows Hive to compute table statistics automatically.
- Compute statistics manually (optional). You can also run the following Hive command to compute table statistics manually:
ANALYZE TABLE <table_name> COMPUTE STATISTICS;
Enable JMX for Hive Metastore and HiveServer2
For Hive Metastore Server:
- Navigate to Hive > Java Configuration Options.
- Update the property with the following values.
{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009
For HiveServer2:
- Navigate to Hive on Tez > Java Configuration Options.
- Update the property with the following values.
{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008
Place Hive Hook JARs
- Get the Hive hook JARs from the Acceldata team (refer to the version mapping table).
- Place the JARs on all edge and HiveServer2 nodes in the local path:
/opt/acceldata
Hook Version Mapping
| Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name |
|---|---|---|---|
| CDP | 3.1.3 | 0.9.1 | ad-hive-hook_cdp_3.1.3-assembly-2.0.0.jar |
- Ensure the hook directory is readable and executable by all users.
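The directory preparation above can be sketched as a small shell helper. This is only an illustration: `prepare_hook_dir` is a hypothetical name, mode 755 is one way to make the directory readable and executable by all users, and setting the JARs to 644 is an assumption that is not stated in the steps above.

```shell
#!/bin/sh
# prepare_hook_dir is a hypothetical helper, not part of Pulse or CDP.
# It creates the hook directory and opens it up for all users.
prepare_hook_dir() {
    hook_dir=$1
    mkdir -p "$hook_dir" || return 1
    # rwxr-xr-x: readable and executable by all users
    chmod 755 "$hook_dir" || return 1
    # Assumption: the JARs themselves only need to be world-readable
    for jar in "$hook_dir"/*.jar; do
        [ -e "$jar" ] || continue   # the glob matched nothing
        chmod 644 "$jar" || return 1
    done
    # Print the resulting directory permissions for a quick visual check
    ls -ld "$hook_dir"
}
```

On each edge and HiveServer2 node, this would be called as `prepare_hook_dir /opt/acceldata` after the JARs are copied in, typically as root or through sudo.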
Update Hive Environment with Hook JAR
In CM, search for:
- Hive Service Environment Advanced Configuration Snippet (Safety Valve)
- Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve)
Add the following property:
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/acceldata/ad-hive-hook_cdp_3.1.3-assembly-2.0.0.jar
Update Hive Site Properties
In CM, search for:
- Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml
- Hive on Tez Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Switch to XML view and add the following properties:
<property>
  <name>ad.cluster</name>
  <value>[cluster_name]</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value>[PULSE_IP]:19009</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
Restart Hive Services
- Restart affected Hive components.
- Deploy the updated client configuration.
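After the client configuration is deployed, a coarse text check can confirm that the three hook properties reached a given hive-site.xml. The sketch below is an assumption-laden illustration: `check_hive_hooks` is a hypothetical helper, it relies on each property's value appearing within two lines of its name, and it does not parse XML.

```shell
#!/bin/sh
# check_hive_hooks is a hypothetical helper, not part of Pulse or Hive.
# It reports whether each hook property in the given hive-site.xml
# references the Acceldata hook class.
check_hive_hooks() {
    site_file=$1
    status=0
    for prop in hive.exec.pre.hooks hive.exec.post.hooks hive.exec.failure.hooks; do
        # Inspect the <name> line and the two lines after it for the hook class
        if grep -A 2 "<name>$prop</name>" "$site_file" \
            | grep -q "io.acceldata.hive.AdHiveHook"; then
            echo "OK       $prop"
        else
            echo "MISSING  $prop"
            status=1
        fi
    done
    return $status
}
```

A call might look like `check_hive_hooks /etc/hive/conf/hive-site.xml`, where the path is an assumed client configuration location and may differ on your cluster. The function returns non-zero if any of the three properties is missing.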
Configure CDP Tez for Pulse
Place Tez Hook JARs
- Obtain the Hive hook JARs from the Acceldata team (refer to the version mapping table).
- Log in to any HDFS client node.
Do not use the "Upload Tez tar file to HDFS" action available under Tez.
Update Tez Tarball
Update the Tez tarball with the Pulse hook JAR.
During upgrades or hook updates, ensure the Tez tarball contains only the latest Acceldata hook JAR. Remove any older hook files to avoid conflicts.
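The cleanup of older hook files mentioned above can be sketched as a small helper that runs against the unpacked lib/ directory before the tarball is repackaged. This is a sketch under assumptions: `remove_old_hooks` is a hypothetical name, and it assumes hook JARs follow the `ad-hive-hook*.jar` naming seen in the version mapping table.

```shell
#!/bin/sh
# remove_old_hooks is a hypothetical helper, not part of Pulse or Tez.
# It deletes Acceldata hook JARs from a lib directory except the one to keep.
# Assumption: hook JARs are named ad-hive-hook*.jar.
remove_old_hooks() {
    lib_dir=$1      # for example ./lib inside the unpacked tarball
    keep_name=$2    # file name of the hook JAR that should remain
    for jar in "$lib_dir"/ad-hive-hook*.jar; do
        [ -e "$jar" ] || continue   # the glob matched nothing
        if [ "$(basename "$jar")" != "$keep_name" ]; then
            echo "removing $jar"
            rm -f -- "$jar"
        fi
    done
}
```

For example, `remove_old_hooks ./lib ad-hive-hook_cdp_3.1.3-assembly-2.0.0.jar` would leave only that JAR among the hook files and would not touch any other library in lib/.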
# Create a working directory
mkdir -p tez_pack/ && cd tez_pack
# Back up the existing Tez tarball to /tmp in HDFS
hdfs dfs -cp /user/tez/<tez_version>/tez.tar.gz /tmp
# Download the Tez tarball from HDFS to local (switch to a user with access if needed)
hdfs dfs -get /user/tez/<tez_version>/tez.tar.gz .
# Unpack the tarball
tar -zxvf tez.tar.gz
# Remove the downloaded tarball so it is not packed into the new one
rm -f tez.tar.gz
# Copy the Pulse hook JAR to the Tez lib/ directory
cp </location../../pulse_hook.jar> ./lib/
# Package the Tez tarball
tar -cvzf /tmp/tez.tar.gz .
# Upload it back and set the right permissions and ownership
hdfs dfs -put -f /tmp/tez.tar.gz /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chown tez:hadoop /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chmod 755 /user/tez/<tez_version>/tez.tar.gz
Update Tez Site Properties
- In CM, search for:
  Tez Client Advanced Configuration Snippet (Safety Valve) for tez-conf/tez-site.xml
- Switch to XML view and add the required Pulse properties.
<property>
  <name>ad.cluster</name>
  <value>[cluster_name]</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value>[PULSE_IP]:19009</value>
</property>
<property>
  <name>tez.history.logging.service.class</name>
  <value>io.acceldata.hive.AdTezEventsNatsClient</value>
</property>
Result
The CDP cluster services are configured with Pulse hook JARs and JMX properties. Pulse can collect metrics, capture query data, and integrate with Hive and Tez.
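As a final sanity check on the Tez tarball, the number of Acceldata hook JARs inside a local copy can be counted; the expectation from the steps above is exactly one. This is a minimal sketch: `count_hook_jars` is a hypothetical helper, and the `ad-hive-hook*.jar` naming is assumed from the version mapping table.

```shell
#!/bin/sh
# count_hook_jars is a hypothetical helper, not part of Pulse or Tez.
# It prints how many entries in a gzipped tarball look like Acceldata hook JARs.
# Assumption: hook JARs are named ad-hive-hook*.jar.
count_hook_jars() {
    tarball=$1
    # grep -c prints the count; "|| true" keeps a zero count from being an error
    tar -tzf "$tarball" | grep -c 'ad-hive-hook.*\.jar$' || true
}
```

After downloading the tarball with `hdfs dfs -get /user/tez/<tez_version>/tez.tar.gz .`, running `count_hook_jars tez.tar.gz` should print 1. A larger number suggests that older hook files were left in place, and 0 suggests the hook JAR was not packaged.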