Cluster Configuration Changes

Several Pulse connectors and services require changes to the cluster configuration. You can perform the maintenance restarts that apply these changes before, during, or after the Pulse installation.

To activate SSL and Basic Authentication on the remote JMX port, ensure that the jmxremote.password, jmxremote.access, truststore.jks, and keystore.jks files are already positioned in their correct directories.
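
Before restarting services with SSL or Basic Authentication enabled, it can help to confirm that the files are actually in place. The following is a minimal sketch, assuming all four files live in a single directory (this guide does not specify the layout); check_jmx_files is a hypothetical helper name.

```shell
# Hypothetical helper: verify that the JMX security files exist in a directory.
# Assumption: all four files live in the same directory; adjust if yours differ.
check_jmx_files() {
  local dir="$1" missing=0 f
  for f in jmxremote.password jmxremote.access truststore.jks keystore.jks; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, calling check_jmx_files with the directory that holds your files prints any missing file and returns a non-zero status. The JVM also generally expects jmxremote.password to be readable only by the user that owns the process, so it is worth checking the file permissions as well.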

Common Changes

HDFS

For HDFS, if the user that Pulse connects as is not permitted to call the NameNode API, or does not have access to it from the Pulse server, the following options are available:

  1. In the HDFS configuration, add the property dfs.cluster.administrators to the advanced or custom hdfs-site.xml, with a value such as the Pulse Kerberos username.
  2. Provide a NameNode service keytab to the Pulse server.
  3. Restart all the affected components and deploy the new client configuration.

MapReduce

Configure ODP as below to show MapReduce jobs in YARN > Application Explorer.

  • Add the HDFS user to the properties listed below in Ambari > MapReduce configuration.

    • mapreduce.cluster.administrators
    • mapreduce.cluster.acls.enabled (enabled by default)
    • mapreduce.job.acl-modify-job
    • mapreduce.job.acl-view-job
  • Add the HDFS user to the property listed below in Ambari > YARN configuration.

    • yarn.admin.acl

After completing the configuration, restart the ad-connector service on the Pulse Master.

Access privileges

For services managed by Ranger or another authorization service, a non-HDFS user requires additional permissions. Follow these steps:

For a non-HDFS user, create a policy that grants read and execute permissions on the following HDFS paths:

  • Spark v1, v2, and v3 log directories. The following are the default locations; verify them on the respective cluster:

    • HDP 2.x - /spark2-history
    • CDH, CDP - /user/spark/applicationHistory, /user/spark/spark2ApplicationHistory
  • Hive query path. The following are the default locations; verify them on the respective cluster:

    • HDP 2.x, CDH 5.x or 6.x - /tmp/ad
    • HDP 3.x - /warehouse/tablespace/external/hive/sys.db/dag_data, /warehouse/tablespace/external/hive/sys.db/query_data
    • CDP 7.x - /warehouse/tablespace/managed/hive/sys.db/dag_data, /warehouse/tablespace/managed/hive/sys.db/query_data
  • If the HDFS user, or any other user that lacks the privileges to read metadata from all Kafka topics, is used to connect to the Kafka service:

    • Add the user to the default access policy with Describe permission on all topics.
  • Add the SELECT privilege in Ranger on all databases, tables, and columns using the following steps:

    1. Log on to the Ranger UI.
    2. Navigate to Hadoop SQL and click it. The list of Hadoop SQL policies appears.
    3. On the List of Hadoop policies: Hadoop SQL page, click the edit button under Action for the policy named all - databases, tables, and column.
    4. On the Edit Policy page, add the SELECT privilege for the non-HDFS user.

Grant MySQL Permissions

To enable Pulse to collect Hive and Oozie metadata stored in MySQL, you must grant the required permissions.

  1. Log in to MySQL as the root or an administrative user.
  2. Create the users (if they do not already exist).
  3. Grant read-only (SELECT) privileges, replacing the placeholders with actual values. The commands vary depending on the MySQL version.

    • hive_database / oozie_database: Names of the Hive and Oozie metadata databases.
    • hive_user / oozie_user: Usernames Pulse uses to access these databases.
    • Pulse_host: Hostname or IP address of the Pulse server. Use % to allow access from any host.
    • Password: Password assigned to the database user.
  4. Apply the changes.
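
As an illustration of the steps above, the following sketch prints the statements for one read-only user. It assumes MySQL 5.7 or later syntax (older versions may need a different CREATE USER form) and treats FLUSH PRIVILEGES as the "apply changes" step, which is an assumption. pulse_grant_sql is a hypothetical helper that only prints SQL; it does not connect to any database.

```shell
# Hypothetical helper: print the SQL for one read-only Pulse user.
# Arguments: database name, user name, Pulse host (or %), password.
# Assumption: MySQL 5.7+ syntax; adapt for older versions.
pulse_grant_sql() {
  local db="$1" user="$2" host="$3" pass="$4"
  cat <<SQL
CREATE USER IF NOT EXISTS '${user}'@'${host}' IDENTIFIED BY '${pass}';
GRANT SELECT ON ${db}.* TO '${user}'@'${host}';
FLUSH PRIVILEGES;
SQL
}
```

The output can be reviewed first and then piped to the mysql client in an administrative session, once for the Hive user and once for the Oozie user.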

HDP 2.x & 3.x

Kafka

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka-env > kafka-env.
  3. Go to the end of the file and add the following line:

export JMX_PORT=${JMX_PORT:-9999}

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"

Replace the placeholders in <> with appropriate values.
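
The JMX variants used throughout this guide (no authentication, Basic Authentication, and TLS/SSL) differ only in a handful of flags. The sketch below composes those flags for a given mode so the relationship is easier to see. jmx_flags is a hypothetical helper, not part of Kafka or Pulse, and the paths and passwords are supplied by the caller.

```shell
# Hypothetical helper: compose JMX flags for a mode (none | basic | tls).
# Arguments after the mode: access file, password file, keystore,
# keystore password, truststore, truststore password (as needed by the mode).
jmx_flags() {
  local mode="$1" access="$2" password="$3"
  local keystore="$4" keypass="$5" truststore="$6" trustpass="$7"
  local p="-Dcom.sun.management.jmxremote"
  case "$mode" in
    none)
      echo "$p.authenticate=false $p.ssl=false" ;;
    basic)
      echo "$p.authenticate=true $p.ssl=false $p.access.file=$access $p.password.file=$password" ;;
    tls)
      echo "$p.authenticate=true $p.access.file=$access $p.password.file=$password $p.ssl=true $p.registry.ssl=true -Djavax.net.ssl.keyStore=$keystore -Djavax.net.ssl.keyStorePassword=$keypass -Djavax.net.ssl.trustStore=$truststore -Djavax.net.ssl.trustStorePassword=$trustpass" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

The printed string can be compared against, or pasted into, the KAFKA_JMX_OPTS line for the chosen mode. The port itself is set separately through JMX_PORT for Kafka.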

Zookeeper

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
  3. Go to the end of the file and add the following lines. Add the JMXDISABLE environment variable first, before any of the other lines:

export JMXDISABLE="true"


export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dzookeeper.jmx.log4j.disable=true"

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"

Replace the placeholders in <> with appropriate values.

  5. To enable SSL on the JMX Remote Port, use the following parameters:

export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"

Replace the placeholders in <> with appropriate values.

Spark

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Spark > Configs > Advanced spark2-metrics-properties.
  3. Go to the end of the file and add the following lines:

# Graphite sink class
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# Location of your graphite instance
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Note: This change requires the Pulse node agent to be running on all Spark clients.

Additional Notes:

  • On edge nodes where the Spark clients are not managed by Ambari, append the above properties to the file /etc/spark2/conf/metrics.properties.

  • Make sure the following properties are enabled for every Spark job (spark-defaults.conf):

    • spark.eventLog.enabled=true
    • spark.eventLog.dir=hdfs:///spark2-history/
  • Update the same properties in the managed configurations for applications running on Spark 1.x and 3.x.
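
For edge nodes that Ambari does not manage, the append step can be scripted. The following is a minimal sketch, assuming the target file is passed as an argument (for example the /etc/spark2/conf/metrics.properties path mentioned above). append_graphite_sink is a hypothetical helper, and the check for an existing sink line is only a simple guard against appending twice.

```shell
# Hypothetical helper: append the Graphite sink settings to a Spark
# metrics.properties file unless a sink class line is already present.
append_graphite_sink() {
  local file="$1"
  if grep -q '^\*\.sink\.graphite\.class=' "$file" 2>/dev/null; then
    echo "graphite sink already configured in $file"
    return 0
  fi
  # Quoted heredoc delimiter: the lines are written literally, no expansion
  cat >> "$file" <<'EOF'
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF
}
```

Running the function a second time against the same file leaves it unchanged, which makes it safe to include in a node bootstrap script.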

Hive

Hive Server 2

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:

Note: Avoid JMX changes for Hive 1.x using the MR engine, because a bug in that version causes queries to fail when JMX is enabled.

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"

fi

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"

fi

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8008"

fi

Replace the placeholders in <> with appropriate values.

Hive Meta Store

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"

fi

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"

fi

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8009"

fi

Replace the placeholders in <> with appropriate values.

Place Hook Jars

Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name
HDP 2.x        | 1.2.x        | 0.7.x       | ad-hive-hook_hdp_1.2.x-assembly-1.2.3.jar
HDP 2.x        | 2.1.x (LLAP) | 0.7.x       | ad-hive-hook_hdp_2.1.x-assembly-1.2.3.jar
HDP 3.1.0.x    | 3.1.x        | 0.9.x       | ad-hive-hook_hdp_3.1.0.3.1.0.0-78-assembly-1.2.3.jar
HDP 3.1.4.x    | 3.1.x        | 0.9.x       | ad-hive-hook_hdp_3.1.0.3.1.4.0-315-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars listed in the table above (the Acceldata team will share them).
  • Place the provided hook jars on all edge, HiveServer2, and Hive interactive nodes under the local path /opt/acceldata.
  • Make sure the hook directory is readable and executable by all users.
  • Log in to the Ambari Admin Web UI. Navigate to Hive > Configs > Advanced hive-env, go to the end of the file, and add the following line. Change the hook jar name in the properties below according to the installed HDP distro version.

export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_1.2.x-assembly-1.2.3.jar

  • Navigate to Hive > Configs > Advanced hive-interactive-env, go to the end of the file, and add the following line:

export AUX_CLASSPATH=/opt/acceldata/ad-hive-hook_hdp_2.1.x-assembly-1.2.3.jar

  • Navigate to Hive > Configs > Custom hive-site and Custom hive-interactive-site, and add the following new properties:

    • ad.events.streaming.servers=(<Pulse IP>:19009)
    • ad.cluster=(cluster name as specified in Pulse installation)
  • Navigate to Hive > Configs > General and append io.acceldata.hive.AdHiveHook, preceded by a comma if the property already has a value, to the following properties:

    • hive.exec.failure.hooks
    • hive.exec.pre.hooks
    • hive.exec.post.hooks
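
The "comma if needed" rule for the hook properties can be sketched as follows. append_hive_hook is a hypothetical helper that takes the current property value and prints the new one; it assumes the existing value is a plain comma-separated list without spaces around the commas.

```shell
# Hypothetical helper: append the Pulse hook class to an existing
# comma-separated hook list, adding a comma only when needed.
append_hive_hook() {
  local current="$1" hook="io.acceldata.hive.AdHiveHook"
  case ",$current," in
    *",$hook,"*) echo "$current" ;;        # already present: leave unchanged
    ",,")        echo "$hook" ;;           # empty value: hook only
    *)           echo "$current,$hook" ;;  # existing hooks: append with comma
  esac
}
```

The same resulting value applies to each of the three properties; because the function leaves a value unchanged when the hook is already listed, it can be re-run safely.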

Tez

  • Get the same hive-hook jars listed in the mapping table above (the Acceldata team will share them).

  • Log in to any HDFS client node and follow the steps below to add the Pulse hook jar to the Tez tarball.

  • For Hive 3.x on HDP 3.x, use the following location to update the hook jar for the version. The example here is for the HDP 3.1.4 hook jar:

    • Use ad-hive-hook_hdp_3.1.0.3.1.4.0-315-assembly-1.2.3.jar for Hive 3.x on HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
  • For clusters with both Hive 1.x and Hive 2.x (LLAP), such as HDP 2.6.x, use the following locations to update the respective hook jars:

    • Use ad-hive-hook_hdp_1.2.x-assembly-1.2.3.jar for Hive 1.x on HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
    • Use ad-hive-hook_hdp_2.1.x-assembly-1.2.3.jar for Hive 2.x on HDFS path /hdp/apps/${hdp.version}/tez_hive2/tez.tar.gz

# Create a working directory
mkdir -p tez_pack/ && cd tez_pack

# Back up the existing Tez tarball to /tmp in HDFS
hdfs dfs -cp /hdp/apps/<cluster_version>/tez/tez.tar.gz /tmp

# Download the Tez tarball from HDFS to the local directory (switch to a user with access)
hdfs dfs -get /hdp/apps/<cluster_version>/tez/tez.tar.gz .

# Unpack the tarball, then remove the downloaded archive so it is not packed into the new one
tar -zxvf tez.tar.gz
rm -f tez.tar.gz

# Copy the Pulse hook jar to the Tez lib/ directory
cp </location../../pulse_hook.jar> ./lib/

# Repackage the Tez tarball
tar -cvzf /tmp/tez.tar.gz .

# Upload it back and set the right permissions and ownership
hdfs dfs -put -f /tmp/tez.tar.gz /hdp/apps/<cluster_version>/tez/tez.tar.gz
hdfs dfs -chown hdfs:hadoop /hdp/apps/<cluster_version>/tez/tez.tar.gz
hdfs dfs -chmod 755 /hdp/apps/<cluster_version>/tez/tez.tar.gz

  • Navigate to Tez > Configs > Custom tez-site and add or update the following properties:
    • tez.history.logging.service.class=io.acceldata.hive.AdTezEventsNatsClient
    • ad.events.streaming.servers (PULSE_IP:19009)
    • ad.cluster (your cluster name, for example ad_hdp3_dev)
    • [Optional step for Hive 3.x] ad.hdfs.sink is set to true by default. If it is set to false, Tez does not publish the query metadata proto logging details to HDFS.
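
After repacking the Tez tarball, it is worth confirming that the hook jar actually made it into the archive before uploading it to HDFS. The following is a minimal sketch, assuming the jar was copied under lib/ as in the repacking steps; tez_tar_has_jar is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the tarball lists the given jar under lib/.
# Entries appear as ./lib/<jar> when the archive was created from "." and
# as lib/<jar> otherwise, so both forms are accepted.
tez_tar_has_jar() {
  local tarball="$1" jar="$2"
  tar -tzf "$tarball" | grep -qxF -e "./lib/$jar" -e "lib/$jar"
}
```

A non-zero exit status means the jar is missing from the archive, in which case the copy and repackage steps should be repeated before the upload.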

Sqoop

Copy the hook jars specified above to the Sqoop classpath directory (for example, /usr/hdp/current/sqoop-client/lib). For LLAP (Hive interactive) enabled clusters, copy both the Hive 1.2.x and 2.1.x jars to the classpath.

ODP 3.2.x

All Ambari changes, including the hook jars and JMX changes, are available as part of the release, except for a few components. Validate the following details once as part of the general checks:

ODP Kafka

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka-env > kafka-env.
  3. Go to the end of the file and add the following line:

export JMX_PORT=${JMX_PORT:-9999}

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

export KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"

Replace the placeholders in <> with appropriate values.

ODP Zookeeper

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
  3. Go to the end of the file and add the following lines. Add the JMXDISABLE environment variable first, before any of the other lines:

export JMXDISABLE="true"


export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dzookeeper.jmx.log4j.disable=true"

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file>"

Replace the placeholders in <> with appropriate values.

  5. To enable SSL on the JMX Remote Port, use the following parameters:

export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8989 -Dzookeeper.jmx.log4j.disable=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password>"

Replace the placeholders in <> with appropriate values.

ODP Hive

To see Hive table details with data on the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.

You can also run the following command manually to compute the table statistics:

ANALYZE TABLE <table name> COMPUTE STATISTICS
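
When several tables need statistics computed manually, the statements can be generated in one pass. The following is a small sketch; analyze_statements is a hypothetical helper that only prints the statements, and the table names passed to it are whatever you supply.

```shell
# Hypothetical helper: print one ANALYZE TABLE statement per table name.
analyze_statements() {
  local t
  for t in "$@"; do
    printf 'ANALYZE TABLE %s COMPUTE STATISTICS;\n' "$t"
  done
}
```

The output can be saved to a file and run through your usual Hive client.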

Hive Server 2

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:

Note: Avoid JMX changes for Hive 1.x using the MR engine, because a bug in that version causes queries to fail when JMX is enabled.

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"

fi

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008"

fi

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "hiveserver2" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8008"

fi

Replace the placeholders in <> with appropriate values.

Hive Meta Store

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"

fi

  4. To enable Basic Authentication on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009"

fi

Replace the placeholders in <> with appropriate values.

  5. To enable TLS/SSL on the JMX Remote Port, use the following parameters:

if [ "$SERVICE" = "metastore" ]; then

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.access.file=</path/to/jmxremote.access/file> -Dcom.sun.management.jmxremote.password.file=</path/to/jmxremote.password/file> -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -Djavax.net.ssl.keyStore=</path/to/keystore.jks/file> -Djavax.net.ssl.keyStorePassword=<Keystore Password> -Djavax.net.ssl.trustStore=</path/to/truststore.jks/file> -Djavax.net.ssl.trustStorePassword=<Truststore Password> -Dcom.sun.management.jmxremote.port=8009"

fi

Replace the placeholders in <> with appropriate values.

Place Hook Jars

Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.

  • JMX enablement should already be in place, similar to the HDP changes.

  • Update the following properties as per the installation details:

    • ad.events.streaming.servers=(<Pulse IP>:19009)
    • ad.cluster=(cluster name as specified in Pulse installation)
  • Navigate to Hive > Configs > General and check that io.acceldata.hive.AdHiveHook is appended, comma-separated, to the following properties:

    • hive.exec.failure.hooks
    • hive.exec.pre.hooks
    • hive.exec.post.hooks

ODP Tez

Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.

  • Update the following properties as per the installation details:

    • ad.events.streaming.servers=(<Pulse IP>:19009)
    • ad.cluster=(cluster name as specified in Pulse installation)
  • Navigate to Tez > Configs and check that the property tez.history.logging.service.class is set to io.acceldata.hive.AdTezEventsNatsClient.

ODP Spark 2 & 3

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Spark > Configs > Advanced spark2-metrics-properties.
  3. Go to the end of the file and add the following lines:

# Graphite sink class
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
# Location of your graphite instance
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Additional Config changes (common for HDP & ODP)

Additional configuration changes are required if ACLs are enabled on the services or if older service versions are running:

  • YARN ACL: Check whether ACLs are enabled for YARN (yarn.acl.enable). If so, add the property yarn.timeline-service.read.allowed.users=hdfs to the custom yarn-site.xml and restart the YARN service. Here hdfs is the default user; if other users were created specifically for Pulse, enter those instead.
  • Kafka Protocol: In Ambari > Kafka, change the PLAINTEXTSASL value of the listeners and inter-broker protocol settings to SASL_PLAINTEXT. Also check the listeners setting and update it to SASL_PLAINTEXT://localhost:6667.
  • Kafka ACL: Grant Kafka ACL permissions to the hdfs user by running the following commands as the kafka user:

/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zk_hostname>:2181 --add --allow-principal User:hdfs --operation All --topic '*' --cluster

/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=<zk_hostname>:2181 --add --allow-principal User:hdfs --operation All --group '*' --cluster


CDH 5.x & 6.x

Kafka and Zookeeper JMX are auto-enabled with CDH-based installation.

CDH Spark

  1. Under the Spark configurations, search for the (Safety Valve) setting for spark-conf/spark-defaults.conf.
  2. Add the following properties:

spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
spark.metrics.conf.*.sink.graphite.host=localhost
spark.metrics.conf.*.sink.graphite.port=12003
spark.metrics.conf.*.sink.graphite.protocol=tcp
spark.metrics.conf.*.sink.graphite.prefix=spark.metrics
spark.metrics.conf.*.sink.graphite.period=20
spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource

  3. Repeat the same steps for the Spark2 configurations: search for Spark Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-defaults.conf and add the preceding properties.

Note: Refer to the additional notes in the HDP 2.x & 3.x section under the Spark configuration changes.

CDH Hive

Distro Version | Hive Version | Pulse Hook Jar Name
CDH 6.2.x      | 2.1.x        | ad-hive-hook_2.1.1_cdh6.2.1-assembly-1.2.3.jar
CDH 6.3.4      | 2.1.x        | ad-hive-hook_cdh_3.0.0-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars listed in the table above (the Acceldata team will share them).
  • Place the provided hook jars on all edge and HiveServer2 nodes under the local path /opt/acceldata.
  • Make sure the hook directory is readable and executable by all users.
  1. Under the Hive configurations, search for Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh and add the following property:

AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/acceldata/<AD HIVE 1.x or HIVE 2.x hook jar name>

  2. Under the Hive configurations, search for Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml and HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml, change the view to XML, and add the following properties:

<property>
  <name>hive.exec.failure.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>

  3. Add the following new properties under the Advanced hive-site XML section:

    1. ad.events.streaming.servers=(<Pulse IP>:19009)
    2. ad.cluster=(cluster name as specified in Pulse installation)
  4. Restart the affected Hive components and deploy the new client configuration.

CDH Sqoop

Place the hook jar in the classpath libraries of the Sqoop client on the given edge nodes.

CDP 7.x

Kafka and Zookeeper JMX are auto-enabled with CDH-based installation. For Spark, make the same changes as described in the CDH Spark section.

CDP Kafka

Under Additional Broker Java Options/broker_java_opts, replace -Dcom.sun.management.jmxremote.host=127.0.0.1 with -Dcom.sun.management.jmxremote.host=0.0.0.0
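
The change is normally made directly in the broker_java_opts field. Purely as an illustration of the substitution, the sketch below rewrites the bind host inside an options string; rebind_jmx_host is a hypothetical helper, and any other options in the string are left untouched.

```shell
# Hypothetical helper: rewrite the JMX bind host in a Java options string
# from 127.0.0.1 to 0.0.0.0 and print the result.
rebind_jmx_host() {
  printf '%s\n' "$1" | sed 's/-Dcom\.sun\.management\.jmxremote\.host=127\.0\.0\.1/-Dcom.sun.management.jmxremote.host=0.0.0.0/'
}
```

Keep in mind that binding to 0.0.0.0 makes the JMX port reachable on all network interfaces of the broker host, so firewall rules or JMX authentication may be worth reviewing at the same time.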

CDP Hive

To see Hive table details with data on the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.

You can also run the following command manually to compute the table statistics:

ANALYZE TABLE <table name> COMPUTE STATISTICS

Under Hive > Java Configuration Options for Hive Metastore Server, update the property with the following value:

{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8009

Under Hive on Tez > Java Configuration Options for HiveServer2, update the property with the following value:

{{JAVA_GC_ARGS}} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008

Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name
CDP            | 3.1.3        | 0.9.1       | ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars listed in the table above (the Acceldata team will share them).
  • Place the provided hook jars on all edge and HiveServer2 nodes under the local path /opt/acceldata.
  • Make sure the hook directory is readable and executable by all users.
  1. Under the Hive component, search for the configuration Hive Service Environment Advanced Configuration Snippet (Safety Valve), and under the Hive on Tez component, search for the configuration Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve). Add the following property to both:

AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/acceldata/ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar

  2. Under the Hive component, search for the configuration Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and under the Hive on Tez component, search for the configuration Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml. Change the view to XML and add the following properties:

<property>
  <name>ad.cluster</name>
  <value>[cluster_name]</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value>[PULSE_IP]:19009</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
  <description>for Acceldata APM</description>
</property>

  3. Restart the affected Hive components and deploy the new client configuration.

CDP TEZ

  • Get the hive-hook jars listed in the table above (the Acceldata team will share them).
  • Log in to any HDFS client node and follow the steps below to add the Pulse hook jar to the Tez tarball:

Note: Do not use the "Upload Tez tar file to HDFS" action available under Tez.

# Create a working directory
mkdir -p tez_pack/ && cd tez_pack

# Back up the existing Tez tarball to /tmp in HDFS
hdfs dfs -cp /user/tez/<tez_version>/tez.tar.gz /tmp

# Download the Tez tarball from HDFS to the local directory (switch to a user with access)
hdfs dfs -get /user/tez/<tez_version>/tez.tar.gz .

# Unpack the tarball, then remove the downloaded archive so it is not packed into the new one
tar -zxvf tez.tar.gz
rm -f tez.tar.gz

# Copy the Pulse hook jar to the Tez lib/ directory
cp </location../../pulse_hook.jar> ./lib/

# Repackage the Tez tarball
tar -cvzf /tmp/tez.tar.gz .

# Upload it back and set the right permissions and ownership
hdfs dfs -put -f /tmp/tez.tar.gz /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chown tez:hadoop /user/tez/<tez_version>/tez.tar.gz
hdfs dfs -chmod 755 /user/tez/<tez_version>/tez.tar.gz

  1. Under the Tez component, search for the configuration Tez Client Advanced Configuration Snippet (Safety Valve) for tez-conf/tez-site.xml, change the view to XML, and add the following properties:

<property>
  <name>ad.cluster</name>
  <value>[cluster_name]</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value>[PULSE_IP]:19009</value>
</property>
<property>
  <name>tez.history.logging.service.class</name>
  <value>io.acceldata.hive.AdTezEventsNatsClient</value>
</property>
