Cluster Configuration Changes
Several Pulse connectors and services require cluster configuration changes. The maintenance restarts needed to apply these changes can be performed before, during, or after the Pulse installation.
To enable SSL and Basic Authentication on the remote JMX port, ensure that the jmxremote.password, jmxremote.access, truststore.jks, and keystore.jks files are already placed in their correct directories.
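For reference, the standard JMX access and password files are plain text with one entry per line, and the password file must be readable only by the account that runs the monitored JVM. The user name, password, paths, and ownership below are only examples; substitute your own values.
# Example jmxremote.access and jmxremote.password files (user, password, and paths are placeholders)
cat > /etc/security/jmxremote.access <<'EOF'
pulse readonly
EOF
cat > /etc/security/jmxremote.password <<'EOF'
pulse <strong-password>
EOF
# The JVM refuses to start remote JMX if the password file is readable by other users;
# the owner must be the service account running the JVM (kafka is shown only as an example)
chown kafka:hadoop /etc/security/jmxremote.password
chmod 600 /etc/security/jmxremote.password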
Common Changes
HDFS
For HDFS, if the user is not permitted or does not have access from the Pulse server to query the NameNode API, the following options are available (a quick connectivity check is sketched after this list):
- Under the HDFS configurations, add the property dfs.cluster.administrators in Advanced or Custom hdfs-site.xml with values such as the Pulse Kerberos username.
- Alternatively, provide one NameNode service keytab to the Pulse server.
- Restart all the affected components and deploy the new client configuration.
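A quick way to confirm the Pulse principal can reach the NameNode API is to authenticate with its keytab and query the NameNode JMX endpoint over SPNEGO. The keytab path, principal, host, and port below are examples (50070 is the default HTTP port on Hadoop 2.x, 9870 on Hadoop 3.x); adjust them for your cluster.
# Example connectivity check only; substitute your keytab, principal, NameNode host, and port
kinit -kt /etc/security/keytabs/pulse.service.keytab pulse@EXAMPLE.COM
curl --negotiate -u : "http://active-namenode.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"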
MapReduce
Configure ODP as follows to display MapReduce jobs in YARN > Application Explorer.
Add the HDFS user to the properties listed below in Ambari > MapReduce configuration.
- mapreduce.cluster.administrators
- mapreduce.cluster.acls.enabled (enabled by default)
- mapreduce.job.acl-modify-job
- mapreduce.job.acl-view-job
Add the HDFS user to the property listed below in Ambari > YARN configuration.
- yarn.admin.acl
After completing the configuration, you need to restart the ad-connector service on Pulse Master.
Access privileges
For services managed by Ranger or another authorization tool that requires permission privileges for a non-HDFS user, follow these steps:
For a non-HDFS user, create a policy that grants read and execute permissions on the following HDFS path(s):
Spark v1, v2, and v3 log directories. The following are the default locations; verify them on the respective cluster:
- HDP 2.x: /spark2-history
- CDH, CDP: /user/spark/applicationHistory, /user/spark/spark2ApplicationHistory
Hive query paths. The following are the default locations; verify them on the respective cluster:
- HDP 2.x, CDH 5.x or 6.x: /tmp/ad
- HDP 3.x: /warehouse/tablespace/external/hive/sys.db/dag_data, /warehouse/tablespace/external/hive/sys.db/query_data
- CDP 7.x: /warehouse/tablespace/managed/hive/sys.db/dag_data, /warehouse/tablespace/managed/hive/sys.db/query_data
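As a quick sanity check, you can confirm the Pulse (non-HDFS) user can actually list these paths before restarting the connectors. The keytab, principal, and paths below are examples; use the directories that apply to your distribution.
# Read-access spot check for the Pulse user (keytab, principal, and paths are examples)
kinit -kt /etc/security/keytabs/pulse.service.keytab pulse@EXAMPLE.COM
hdfs dfs -ls /spark2-history | head -5
hdfs dfs -ls /warehouse/tablespace/external/hive/sys.db/query_data | head -5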
If the HDFS user (or any other user) used to connect to the Kafka service does not have all the privileges needed to read metadata from all Kafka topics:
- Add the user to the default access policy with Describe permission on all topics.
Add the SELECT privilege in Ranger on all databases, tables, and columns using the following steps:
- Log on to the Ranger UI.
- Navigate to Hadoop SQL and click it. The list of Hadoop SQL policies appears.
- In the List of Policies: Hadoop SQL, click the edit button under Action for the policy named all - databases, tables, and columns.
- On the Edit Policy page, add the SELECT privilege for the non-HDFS user.
HDP 2.x & 3.x
Kafka
- Login to the Ambari Admin Web UI.
- Navigate to: Kafka > Configs > Advanced kafka-env > kafka-env.
- Go to the end of the file and add the following line:
export JMX_PORT=${JMX_PORT:-9999}
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
Kafka 3
- Login to the Ambari Admin Web UI.
- Navigate to: Kafka > Configs > Advanced kafka3-env > kafka3-env.
- Go to the end of the file and add the appropriate line for your deployment:
For Kafka 3 with Zookeeper:
export JMX_PORT=${JMX_PORT:-8987}
For Kafka 3 with KRaft:
export JMX_PORT=${JMX_PORT:-8988}
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
Set ACLs
To set ACLs, run the following commands.
./kafka-acls.sh --bootstrap-server <broker ip> --command-config client-kerb.prop --add --allow-principal User:hdfs --allow-host '*' --operation All --topic '*'
./kafka-acls.sh --bootstrap-server <broker ip> --command-config client-kerb.prop --add --allow-principal User:hdfs --allow-host '*' --operation All --group '*'
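The client-kerb.prop file referenced above is a standard Kafka client properties file. A minimal Kerberos example is sketched below; the JAAS file path is a placeholder, and on HDP clusters a suitable kafka_client_jaas.conf usually already exists under /etc/kafka/conf.
# Example client-kerb.prop for a Kerberized cluster
cat > client-kerb.prop <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
EOF
# Point the CLI at a JAAS login configuration before running kafka-acls.sh (path is an example)
export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/conf/kafka_client_jaas.conf"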
Enable SCRAM-based authentication
Follow these steps to enable SCRAM-based authentication for Kafka 3.
- Navigate to /data01/acceldata/config/docker/addons/kafka3-connector.yml.
- Update the kafka3-connector.yml file with the following properties.
sasl.mechanism = "SCRAM-SHA-256"
sasl.mechanism = ${?SASL_MECHANISM}
auth.type = ""
auth.type = ${?AUTH_TYPE}
scram {
  jaas.login.conf = "/etc/security/jaas-scram.conf"
  jaas.login.conf = ${?JAAS_LOGIN_CONF_LOCATION}
}
- Update auth.type to "SCRAM"
- Set sasl.mechanism to "SCRAM-SHA-256" or "SCRAM-SHA-512"
- Ensure jaas.login.conf points to the correct JAAS configuration file path.
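For reference, the SCRAM credential for the connector user can be created with kafka-configs.sh, and the JAAS file referenced by jaas.login.conf typically looks like the sketch below. The user name, password, and broker address are placeholders.
# Create a SCRAM-SHA-256 credential for the connector user (user name and password are examples)
./kafka-configs.sh --bootstrap-server <broker ip>:<port> --alter \
  --add-config 'SCRAM-SHA-256=[password=pulse-secret]' \
  --entity-type users --entity-name pulse
# Example /etc/security/jaas-scram.conf referenced by jaas.login.conf
cat > /etc/security/jaas-scram.conf <<'EOF'
KafkaClient {
  org.apache.kafka.common.security.scram.ScramLoginModule required
  username="pulse"
  password="pulse-secret";
};
EOF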
Zookeeper
- Log into the Ambari Admin Web UI.
- Navigate to: Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
- Go to the end of the file and add the following line:
Before including any of the following lines, make sure to add the JMXDISABLE environment variable first.
export JMXDISABLE="true"
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dzookeeper.jmx.log4j.disable=true"
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable SSL on the JMX Remote Port, use the following parameters:
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
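After restarting ZooKeeper, a quick sanity check is to confirm that the JMX flags were picked up by the process and that the port is listening. The sketch below assumes port 8989, as configured above.
# Confirm the ZooKeeper JVM picked up the JMX flags (port 8989 per the settings above)
ps -ef | grep [Q]uorumPeerMain | tr ' ' '\n' | grep jmxremote
# Confirm the JMX port is listening on the ZooKeeper host
ss -ltn | grep 8989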
Spark
- Login to the Ambari Admin Web UI.
- Navigate to: Spark > Configs > Advanced spark2-metrics-properties.
- Go to the end of the file and add the following lines:
# Graphite sink class
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# Location of your graphite instance
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
This change requires the Pulse node agent to be running on all Spark client nodes.
Additional note: for edge nodes where the Spark clients are not managed by Ambari, append the above properties to the file /etc/spark2/conf/metrics.properties.
Make sure the following properties are enabled for any Spark job (spark-defaults.conf):
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///spark2-history/
Update the same properties in the managed configurations for applications running on Spark 1.x and 3.x.
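One way to confirm the event log settings are effective is to run a small sample job and check that a new file appears in the history directory. The example class, jar path, and history location below are HDP-style defaults and may differ on your cluster.
# Run a tiny sample job, then confirm a new event log file appears (paths are examples)
spark-submit --master yarn --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 10
hdfs dfs -ls /spark2-history | tail -5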
Hive
Hive Server 2
- Login to the Ambari Admin Web UI.
- Navigate to: Hive > Configs > Advanced hive-env > hive-env template.
- Go to the end of the file and add the following lines:
Avoid the JMX changes for Hive 1.x using the MR engine, as it has a bug that causes query failures when JMX is enabled.
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
fi
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
fi
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8008"
fi
Replace the <> placeholders with appropriate values.
Hive Meta Store
- Login to the Ambari Admin Web UI.
- Navigate to: Hive > Configs > Advanced hive-env > hive-env template.
- Go to the end of the file and add the following lines:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
fi
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
fi
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8009"
fi
Replace the <> placeholders with appropriate values.
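After restarting HiveServer2 and the Metastore, the JMX settings can be spot-checked on the respective hosts. Ports 8008 and 8009 below match the ports configured in this section.
# On the HiveServer2 host: the JVM arguments should contain the JMX port 8008
ps -ef | grep [H]iveServer2 | tr ' ' '\n' | grep jmxremote.port
# On the Metastore host: port 8009 should be listening
ss -ltn | grep 8009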
Place Hook Jars
Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name |
---|---|---|---|
HDP 2.x | 1.2.x | 0.7.x | ad-hive-hook__hdp__1.2.x-assembly-1.2.3.jar |
HDP 2.x | 2.1.x (LLAP) | 0.7.x | ad-hive-hook__hdp__2.1.x-assembly-1.2.3.jar |
HDP 3.1.0.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.0.0-78-assembly-1.2.3.jar |
HDP 3.1.4.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-1.2.3.jar |
For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:
- Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
- Place the provided hook jars on all edge, HiveServer2, and Hive Interactive nodes at the local path /opt/acceldata.
- The hook directory must be readable and executable by all users.
- Login to the Ambari Admin Web UI. Navigate to: Hive > Configs > Advanced hive-env, go to the end of the file and add the following lines:
Change the hook jar name in the properties below according to the installed HDP distro version.
export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_1.2.x-assembly-1.2.3.jar
- Navigate to: Hive > Configs > Advanced hive-interactive-env, go to the end of the file and add the following lines:
export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_2.1.x-assembly-1.2.3.jar
Navigate to: Hive > Configs > Custom hive-site and Custom hive-interactive-site, and add the following new property values:
- ad.events.streaming.servers=(<Pulse IP>:19009)
- ad.cluster=(cluster name as specified in Pulse installation)
Navigate to: Hive > Configs > General and append io.acceldata.hive.AdHiveHook (comma-separated, if the property already has a value) to the following properties:
- hive.exec.failure.hooks
- hive.exec.pre.hooks
- hive.exec.post.hooks
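To confirm the hook is wired in, run any simple query through Beeline after the restart and check the HiveServer2 log: the AdHiveHook class should appear once a query runs, and a ClassNotFoundException here points to a classpath problem. The JDBC URL and log path below are examples.
# Fire a test query through HiveServer2 (JDBC URL is an example)
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "SHOW DATABASES;"
# Look for the hook class (or classpath errors) in the HiveServer2 log
grep -iE "AdHiveHook|ClassNotFound" /var/log/hive/hiveserver2.log | tail -5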
Tez
Get the same hive-hook jars (shared by the Acceldata team) listed in the mapping table above.
Log in to any HDFS client node and follow the steps below to add the Pulse hook jar inside the Tez tarball.
For Hive 3.x on HDP 3.x, use the following location to update the hook jar for the respective version (the example here is the HDP 3.1.4 hook jar):
- Use ad-hive-hook_hdp_3.1.0.3.1.4.0-315-assembly-1.2.3.jar for Hive 3.x on the HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
For both Hive 1.x and Hive 2.x (LLAP), such as HDP 2.6.x, use the following locations to update the respective hook jars:
- Use ad-hive-hook_hdp_1.2.x-assembly-1.2.3.jar for Hive 1.x on the HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
- Use ad-hive-hook_hdp_2.1.x-assembly-1.2.3.jar for Hive 2.x on the HDFS path /hdp/apps/${hdp.version}/tez_hive2/tez.tar.gz
# Create a directory
mkdir -p tez_pack/ && cd tez_pack
# Take backup of existing tez tarball in HDFS /tmp
hdfs dfs -cp /hdp/apps/<cluster_version>/tez/tez.tar.gz /tmp
# Download tez tarball from HDFS to local; switch to a user with access if needed
hdfs dfs -get /hdp/apps/<cluster_version>/tez/tez.tar.gz .
# Unpack the tarball
tar -zxvf tez.tar.gz
# Copy Pulse hook jar to tez libs/
cp </location../../pulse_hook.jar> ./lib/
# Package tez tarball
tar -cvzf /tmp/tez.tar.gz .
# Upload back and provide right permissions and ownership
hdfs dfs -put -f /tmp/tez.tar.gz /hdp/apps/<cluster_version>/tez/tez.tar.gz
hdfs dfs -chown hdfs:hadoop /hdp/apps/<cluster_version>/tez/tez.tar.gz
hdfs dfs -chmod 755 /hdp/apps/<cluster_version>/tez/tez.tar.gz
- Navigate to: Tez > Configs > Custom tez-site and add or update the following property values:
- tez.history.logging.service.class=io.acceldata.hive.AdTezEventsNatsClient
- ad.events.streaming.servers (PULSE_IP:19009)
- ad.cluster (your cluster name, ex: ad_hdp3_dev)
- [Optional step for Hive 3.x] ad.hdfs.sink is set to true by default; if set to false, Tez will not publish query metadata proto logging details to HDFS.
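Before restarting the services, it is worth confirming that the repacked tarball on HDFS actually contains the Pulse hook jar. The sketch below re-downloads the tarball into a scratch directory and lists the jar.
# Verify the repacked tez.tar.gz on HDFS contains the Pulse hook jar
mkdir -p /tmp/tez_verify && cd /tmp/tez_verify
hdfs dfs -get /hdp/apps/<cluster_version>/tez/tez.tar.gz .
tar -tzf tez.tar.gz | grep ad-hive-hook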
Sqoop
Copy the hook jars specified above to the Sqoop client classpath directory (for example, /usr/hdp/current/sqoop-client/lib). For an LLAP (Hive Interactive) enabled cluster, copy both the Hive v1.2.x and v2.1.x jars to the classpath.
ODP 3.2.x
All Ambari changes, including the hook jars and JMX changes, are available as part of the release except for a few components. Validate the following details as part of the general checks:
ODP Kafka
- Login to the Ambari Admin Web UI.
- Navigate to: Kafka > Configs > Advanced kafka-env > kafka-env.
- Go to the end of the file and add the following line:
export JMX_PORT=${JMX_PORT:-9999}
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
ODP Kafka 3
- Login to the Ambari Admin Web UI.
- Navigate to: Kafka > Configs > Advanced kafka3-env > kafka3-env.
- Go to the end of the file and add the appropriate line for your deployment:
For Kafka 3 with Zookeeper:
export JMX_PORT=${JMX_PORT:-8987}
For Kafka 3 with KRaft:
export JMX_PORT=${JMX_PORT:-8988}
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
Set ACLs
To set ACLs, run the following commands.
./kafka-acls.sh --bootstrap-server <broker ip> --command-config client-kerb.prop --add --allow-principal User:hdfs --allow-host '*' --operation All --topic '*'
./kafka-acls.sh --bootstrap-server <broker ip> --command-config client-kerb.prop --add --allow-principal User:hdfs --allow-host '*' --operation All --group '*'
Enable SCRAM-based authentication
Follow these steps to enable SCRAM-based authentication for Kafka 3.
- Navigate to /data01/acceldata/config/docker/addons/kafka3-connector.yml.
- Update the kafka3-connector.yml file with the following properties.
sasl.mechanism = "SCRAM-SHA-256"
sasl.mechanism = ${?SASL_MECHANISM}
auth.type = ""
auth.type = ${?AUTH_TYPE}
scram {
  jaas.login.conf = "/etc/security/jaas-scram.conf"
  jaas.login.conf = ${?JAAS_LOGIN_CONF_LOCATION}
}
- Update auth.type to "SCRAM"
- Set sasl.mechanism to "SCRAM-SHA-256" or "SCRAM-SHA-512"
- Ensure jaas.login.conf points to the correct JAAS configuration file path.
ODP Zookeeper
- Login to the Ambari Admin Web UI.
- Navigate to: Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
- Go to the end of the file and add the following line:
Before including any of the following lines, make sure to add the JMXDISABLE environment variable first.
export JMXDISABLE="true"
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dzookeeper.jmx.log4j.disable=true"
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"
Replace the <> placeholders with appropriate values.
- To enable SSL on the JMX Remote Port, use the following parameters:
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"
Replace the <> placeholders with appropriate values.
ODP Hive
To see the Hive table details with data in the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.
You can also run the following command manually to compute the table statistics.
ANALYZE TABLE <table name> COMPUTE STATISTICS
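For example, statistics for a single table can be gathered manually through Beeline as shown below; the connection string and table name are placeholders.
# Manually compute table and column statistics for one table (URL and table name are examples)
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" \
  -e "ANALYZE TABLE sales.orders COMPUTE STATISTICS;" \
  -e "ANALYZE TABLE sales.orders COMPUTE STATISTICS FOR COLUMNS;"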
Hive Server 2
- Login to the Ambari Admin Web UI.
- Navigate to: Hive > Configs > Advanced hive-env > hive-env template.
- Go to the end of the file and add the following lines:
Avoid the JMX changes for Hive 1.x using the MR engine, as it has a bug that causes query failures when JMX is enabled.
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
fi
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"
fi
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8008"
fi
Replace the <> placeholders with appropriate values.
Hive Meta Store
- Login to the Ambari Admin Web UI.
- Navigate to: Hive > Configs > Advanced hive-env > hive-env template.
- Go to the end of the file and add the following lines:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
fi
- To enable Basic Authentication in the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"
fi
Replace the <> placeholders with appropriate values.
- To enable TLS/SSL on the JMX Remote Port, use the following parameters:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8009"
fi
Replace the <> placeholders with appropriate values.
Place Hook Jars
Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.
JMX enablement should already be in place, similar to the HDP changes.
Update the following properties as per the installation details:
- ad.events.streaming.servers=(<Pulse IP>:19009)
- ad.cluster=(cluster name as specified in Pulse installation)
Navigate to: Hive > Configs > General and check that io.acceldata.hive.AdHiveHook is appended (comma-separated) to the following properties:
- hive.exec.failure.hooks
- hive.exec.pre.hooks
- hive.exec.post.hooks
ODP Tez
Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.
Update the following properties as per the installation details:
- ad.events.streaming.servers=(<Pulse IP>:19009)
- ad.cluster=(cluster name as specified in Pulse installation)
Navigate to: Tez > Configs and check that the property tez.history.logging.service.class is set to io.acceldata.hive.AdTezEventsNatsClient.
ODP Spark 2 & 3
- Login to the Ambari Admin Web UI.
- Navigate to: Spark > Configs > Advanced spark2-metrics-properties.
- Go to the end of the file and add the following lines:
# Graphite sink class
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# Location of your graphite instance
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Additional Config changes (common for HDP & ODP)
Additional configuration changes are required if ACLs are enabled for services or if older service versions are running:
- YARN ACL: Check whether ACLs are enabled for YARN (yarn.acl.enable). If yes, add the property yarn.timeline-service.read.allowed.users=hdfs in Custom yarn-site.xml and restart the YARN service. Here hdfs is the default user shared by the team; enter any other specific users created for Pulse instead.
- Kafka Protocol: In Ambari > Kafka, change the listener and inter-broker protocol from PLAINTEXTSASL to SASL_PLAINTEXT; also check the listeners and update them to SASL_PLAINTEXT://localhost:6667.
- Kafka ACL: Grant the Kafka ACL permissions to the hdfs user by running the following commands as the kafka user:
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zk_hostname>:2181 --add --allow-principal User:hdfs --operation All --topic '*' --cluster
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zk_hostname>:2181 --add --allow-principal User:hdfs --operation All --group '*' --cluster
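To confirm the ACLs were applied, the same tool can list them back; the ZooKeeper host below is a placeholder.
# List the ACLs currently granted and confirm the hdfs user now has access
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zk_hostname>:2181 --list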
CDH 5.x & 6.x
Kafka and Zookeeper JMX are auto-enabled with CDH-based installation.
CDH Spark
- Under the Spark configurations, search for Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf.
- Add the following properties:
spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
spark.metrics.conf.*.sink.graphite.host=localhost
spark.metrics.conf.*.sink.graphite.port=12003
spark.metrics.conf.*.sink.graphite.protocol=tcp
spark.metrics.conf.*.sink.graphite.prefix=spark.metrics
spark.metrics.conf.*.sink.graphite.period=20
spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
- Repeat the same steps for the Spark2 configurations: search for Spark Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-defaults.conf and add the preceding properties.
Refer to the additional note in the HDP 2.x & 3.x section under the Spark configuration changes.
CDH Hive
Distro Version | Hive Version | Pulse Hook Jar Name |
---|---|---|
CDH 6.2.x | 2.1.x | ad-hive-hook__2.1.1__cdh6.2.1-assembly-1.2.3.jar |
CDH 6.3.4 | 2.1.x | ad-hive-hook__cdh__3.0.0-assembly-1.2.3.jar |
For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:
- Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
- Place the provided hook jars on all edge and HiveServer2 nodes at the local path /opt/acceldata.
- The hook directory must be readable and executable by all users.
- Under the Hive configurations, search for Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh and add the following property:
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/acceldata/<AD HIVE 1.x or HIVE 2.x hook jar name>
- Under the Hive configurations, search for Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml and HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml, change the view to XML, and add the following properties:
<property>
<name>hive.exec.failure.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
<property>
<name>hive.exec.post.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
<property>
<name>hive.exec.pre.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
Add the following new properties under the Advanced hive-site.xml section:
- ad.events.streaming.servers=(<Pulse IP>:19009)
- ad.cluster=(cluster name as specified in Pulse installation)
Restart the affected Hive components and deploy the new client configuration.
CDH Sqoop
Place the hook jar in the Sqoop client classpath libraries on the given edge nodes.
CDP 7.x
Kafka and Zookeeper JMX are auto-enabled with a CDP-based installation. Make the same changes for Spark as described in the CDH section.
CDP Kafka
Under Additional Broker Java Options (broker_java_opts), replace
-Dcom.sun.management.jmxremote.host=127.0.0.1
with
-Dcom.sun.management.jmxremote.host=0.0.0.0
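After the broker restart, a quick check from the Pulse host that the remote JMX port accepts connections can save a redeploy cycle. The broker host and JMX port below are placeholders; use the JMX port configured for your brokers.
# From the Pulse host, confirm the broker JMX port is reachable (host and port are placeholders)
nc -vz kafka-broker.example.com <jmx_port>
# On the broker host, confirm the process now binds JMX to all interfaces
ps -ef | grep [k]afka.Kafka | tr ' ' '\n' | grep jmxremote.host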
CDP Hive
To see the Hive table details with data in the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.
You can also run the following command manually to compute the table statistics.
ANALYZE TABLE <table name> COMPUTE STATISTICS
Under Hive -> Java Configuration Options for Hive Metastore Server, update the property with the following value:
{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009
Under Hive on Tez -> Java Configuration Options for HiveServer2, update the property with the following value:
{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008
Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name |
---|---|---|---|
CDP | 3.1.3 | 0.9.1 | ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar |
For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:
- Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
- Place the provided hook jars on all edge and HiveServer2 nodes at the local path /opt/acceldata.
- The hook directory must be readable and executable by all users.
- Under the Hive component, search for the configuration Hive Service Environment Advanced Configuration Snippet (Safety Valve), and under the Hive on Tez component, search for the configuration Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve), and add the following property:
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/acceldata/ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar
- Under the Hive component, search for Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and under the Hive on Tez component, search for Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, change the view to XML, and add the following properties:
<property>
<name>ad.cluster</name>
<value>[cluster_name]</value>
</property>
<property>
<name>ad.events.streaming.servers</name>
<value>[PULSE_IP]:19009</value>
</property>
<property>
<name>hive.exec.failure.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
<property>
<name>hive.exec.post.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
<property>
<name>hive.exec.pre.hooks</name>
<value>io.acceldata.hive.AdHiveHook</value>
<description>for Acceldata APM</description>
</property>
- Restart the affected Hive components and deploy the new client configuration.
CDP TEZ
- Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
- Log in to any HDFS client node and follow the steps below to add the Pulse hook jar inside the Tez tarball:
Avoid clicking the "Upload Tez tar file to HDFS" action available under Tez.
# Create a directory
mkdir -p tez_pack/ && cd tez_pack
# Take backup of existing tez tarball in HDFS /tmp
hdfs dfs -cp /user/tez/<tez_version>/tez.tar.gz /tmp
# Download tez tarball from HDFS to local; switch to a user with access if needed
hdfs dfs -get /user/tez/<tez_version>/tez.tar.gz .
# Unpack the tarball
tar -zxvf tez.tar.gz
# Copy Pulse hook jar to tez libs/
cp </location../../pulse_hook.jar> ./lib/
# Package tez tarball
tar -cvzf /tmp/tez.tar.gz .
# Upload back and provide right permissions and ownership
hdfs dfs -put -f /tmp/tez.tar.gz /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chown tez:hadoop /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chmod 755 /user/tez/<tez_version>/tez.tar.gz
- Under the Tez component, search for Tez Client Advanced Configuration Snippet (Safety Valve) for tez-conf/tez-site.xml, change the view to XML, and add the following properties:
<property>
<name>ad.cluster</name>
<value>[cluster_name]</value>
</property>
<property>
<name>ad.events.streaming.servers</name>
<value>[PULSE_IP]:19009</value>
</property>
<property>
<name>tez.history.logging.service.class</name>
<value>io.acceldata.hive.AdTezEventsNatsClient</value>
</property>