Cluster Configuration Changes

Several Pulse connectors and services require changes to the cluster configuration. The maintenance restarts needed to apply these changes can be performed before, during, or after Pulse installation.

To enable SSL and Basic Authentication on the remote JMX port, ensure that the jmxremote.password, jmxremote.access, truststore.jks, and keystore.jks files are already in place in their respective directories.

Common Changes

HDFS

For HDFS, if the user configured on the Pulse server is not permitted to query the NameNode API, the following options are available:

  1. Under the HDFS configuration, add the dfs.cluster.administrators property to the advanced or custom hdfs-site.xml with the Pulse Kerberos username as its value (see the example after this list).
  2. Alternatively, provide a NameNode service keytab to the Pulse server.
  3. Restart the affected components and deploy the new client configuration.
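For reference, the property in custom hdfs-site.xml would look similar to the following; the pulse user name shown here is only an illustration, and the value follows the standard comma-separated users ACL syntax:

XML
<property>
  <!-- Users granted administrator access to NameNode APIs -->
  <name>dfs.cluster.administrators</name>
  <value>hdfs,pulse</value>
</property>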

MapReduce

Configure ODP as follows to display MapReduce jobs in YARN > Application Explorer.

  • Add the HDFS user to the following properties under Ambari > MapReduce configuration (a sketch of the resulting values follows this list).

    • mapreduce.cluster.administrators
    • mapreduce.cluster.acls.enabled (enabled by default)
    • mapreduce.job.acl-modify-job
    • mapreduce.job.acl-view-job
  • Add the HDFS user to the following property under Ambari > YARN configuration.

    • yarn.admin.acl
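A minimal sketch of the resulting values, assuming the default hdfs user is used for Pulse (substitute the dedicated Pulse user if one was created, and append to any existing values rather than replacing them):

Bash
# MapReduce configuration (mapred-site.xml)
mapreduce.cluster.administrators=hdfs
mapreduce.cluster.acls.enabled=true
mapreduce.job.acl-modify-job=hdfs
mapreduce.job.acl-view-job=hdfs

# YARN configuration (yarn-site.xml)
yarn.admin.acl=hdfs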

After completing the configuration, restart the ad-connector service on the Pulse Master.

Access privileges

For services that are managed by Ranger or another authorization mechanism and that require explicit permissions for a non-HDFS user, follow these steps:

For a non-HDFS user, create a policy that grants read and execute permissions on the following HDFS path(s):

  • Spark v1, v2, and v3 log directories. The following are the default locations; verify them on the respective cluster:

    • HDP 2.x - /spark2-history
    • CDH, CDP - /user/spark/applicationHistory, /user/spark/spark2ApplicationHistory
  • Hive query path. The following are the default locations; verify them on the respective cluster:

    • HDP 2.x, CDH 5.x or 6.x - /tmp/ad
    • HDP 3.x - /warehouse/tablespace/external/hive/sys.db/dag_data, /warehouse/tablespace/external/hive/sys.db/query_data
    • CDP 7.x - /warehouse/tablespace/managed/hive/sys.db/dag_data, /warehouse/tablespace/managed/hive/sys.db/query_data
  • If the HDFS user (or any other user) used to connect to the Kafka service does not have the privileges to read metadata from all Kafka topics:

    • Add the user to the default access policy with Describe permission on all topics.
  • Grant SELECT privileges in Ranger on all databases, tables, and columns using the following steps:

    1. Log on to the Ranger UI.
    2. Navigate to Hadoop SQL and click it. The list of Hadoop SQL policies appears.
    3. On the List of Policies: Hadoop SQL page, click the edit button under Action for the all - database, table, column policy.
    4. On the Edit Policy page, add the SELECT privileges to the non-HDFS user.

HDP 2.x & 3.x

Kafka

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka-env > kafka-env.
  3. Go to the end of the file and add the following line:
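A minimal sketch of that line, assuming the stock kafka-env conventions and the standard JVM remote JMX options (the exact port and flags for your environment may differ):

Bash
# Enable remote JMX on the Kafka broker; adjust the port to your environment
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=<jmx port> \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"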
  4. To enable Basic Authentication on the JMX remote port, use the following parameters:
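A sketch of the standard JVM options for JMX Basic Authentication, using the jmxremote.password and jmxremote.access files mentioned at the beginning of this page (typically appended to the same JMX options variable as above):

Bash
# Require username/password authentication for remote JMX connections
-Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.password.file=<path to jmxremote.password>
-Dcom.sun.management.jmxremote.access.file=<path to jmxremote.access>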

Replace the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, use the following parameters:
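A sketch of the standard JVM options for JMX over TLS/SSL, using the keystore.jks and truststore.jks files mentioned at the beginning of this page:

Bash
# Encrypt remote JMX connections with TLS/SSL
-Dcom.sun.management.jmxremote.ssl=true
-Djavax.net.ssl.keyStore=<path to keystore.jks>
-Djavax.net.ssl.keyStorePassword=<keystore password>
-Djavax.net.ssl.trustStore=<path to truststore.jks>
-Djavax.net.ssl.trustStorePassword=<truststore password>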

Replace the <> placeholders with appropriate values.

Kafka 3

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka3-env > kafka3-env.
  3. Go to the end of the file and add the following line:

For Kafka 3 with Zookeeper:


For Kafka 3 with KRaft:

  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

Set ACLs

To set ACLs, run the following commands.

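The exact commands depend on the authorizer in use; a minimal sketch using the stock kafka-acls.sh tool, granting the Pulse user Describe access on all topics and consumer groups (the user name, tool path, and ZooKeeper address are placeholders):

Bash
# Grant Describe on all topics to the Pulse user
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --authorizer-properties zookeeper.connect=<zookeeper host>:2181 \
  --add --allow-principal User:<pulse user> \
  --operation Describe --topic '*'

# Grant Describe on all consumer groups to the Pulse user
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --authorizer-properties zookeeper.connect=<zookeeper host>:2181 \
  --add --allow-principal User:<pulse user> \
  --operation Describe --group '*'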

Enable SCRAM-based Authentication

Follow these steps to enable SCRAM-based authentication for Kafka 3.

  1. Navigate to /data01/acceldata/config/docker/addons/kafka3-connector.yml.
  2. Update the kafka3-connector.yml file with the following properties (a sketch appears after this list).
  • Update auth.type to "SCRAM"
  • Set sasl.mechanism to "SCRAM-SHA-256" or "SCRAM-SHA-512"
  • Ensure jaas.login.conf points to the correct JAAS configuration file path.
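A sketch of how these keys might appear in kafka3-connector.yml; only the key names and mechanism values come from the list above, while the file layout and the example path are assumptions:

YAML
# kafka3-connector.yml (excerpt)
auth.type: "SCRAM"
sasl.mechanism: "SCRAM-SHA-512"              # or "SCRAM-SHA-256"
jaas.login.conf: "/etc/kafka/conf/kafka_client_jaas.conf"   # path to your JAAS configuration file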

Zookeeper

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
  3. Go to the end of the file and add the following lines:

Before adding any of the following lines, make sure the JMXDISABLE environment variable is set first.

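A sketch of what those lines could look like, assuming the standard zkServer.sh environment variables (JMXDISABLE and SERVER_JVMFLAGS) and a placeholder port:

Bash
# Disable ZooKeeper's built-in local-only JMX setup so custom options take effect
export JMXDISABLE=true
# Enable remote JMX on the ZooKeeper server; adjust the port to your environment
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=<jmx port> \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"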
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

  5. To enable SSL on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

Spark

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Spark > Configs > Advanced spark2-metrics-properties.
  3. Go to the end of the file and add the following lines:

This change requires the Pulse node agent to be running on all Spark clients.

Additional Note:

  • For edge nodes where Spark clients are not managed by Ambari, append the above properties to the file /etc/spark2/conf/metrics.properties.

  • Make sure the following properties are enabled for any Spark job (spark-defaults.conf):

    • spark.eventLog.enabled=true
    • spark.eventLog.dir=hdfs:///spark2-history/
  • Update the same properties in the managed configurations for applications running on Spark 1.x and 3.x.

Hive

Hive Server 2

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:

Avoid JMX changes for Hive 1.x running on the MR engine; a known bug there causes query failures when JMX is enabled.

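A sketch of what those lines could look like, assuming the common hive-env pattern of guarding on the SERVICE variable so the options apply only to the HiveServer2 process (the port is a placeholder):

Bash
# Enable remote JMX only for HiveServer2
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=<jmx port> \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false"
fi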
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

Hive Metastore

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the following lines:
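Similar to the HiveServer2 sketch above, assuming the SERVICE-guard pattern but for the Metastore process (the port is a placeholder):

Bash
# Enable remote JMX only for the Hive Metastore
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=<jmx port> \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false"
fi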
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the Kafka section above, replacing the <> placeholders with appropriate values.

Place Hook Jars

Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name
HDP 2.x | 1.2.x | 0.7.x | ad-hive-hook__hdp__1.2.x-assembly-1.2.3.jar
HDP 2.x | 2.1.x (LLAP) | 0.7.x | ad-hive-hook__hdp__2.1.x-assembly-1.2.3.jar
HDP 3.1.0.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.0.0-78-assembly-1.2.3.jar
HDP 3.1.4.x | 3.1.x | 0.9.x | ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
  • Place the provided hook jars on all edge, HiveServer2, and Hive interactive nodes under the local path /opt/acceldata.
  • The hook directory must be readable and executable by all users.
  • Log in to the Ambari Admin Web UI, navigate to Hive > Configs > Advanced hive-env, go to the end of the file, and add the following lines:

Change the hook jar name in the properties below according to the installed HDP distro version.

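A sketch assuming the hook jar is exposed through the standard HIVE_AUX_JARS_PATH variable; the jar name shown is the HDP 3.1.4 one from the table above, and if HIVE_AUX_JARS_PATH is already set in your template, append the jar using the separator already in use there:

Bash
# Make the Pulse Hive hook jar available to Hive
export HIVE_AUX_JARS_PATH=/opt/acceldata/ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-1.2.3.jar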
  • Navigate to Hive > Configs > Advanced hive-interactive-env, go to the end of the file, and add the following lines:
  • Navigate to Hive > Configs > Custom hive-site and Custom hive-interactive-site and add the following new property values:

    • ad.events.streaming.servers=<Pulse IP>:19009
    • ad.cluster=<cluster name as specified in Pulse installation>
  • Navigate to Hive > Configs > General and append io.acceldata.hive.AdHiveHook (comma-separated, if other hooks are already present) to the following properties:

    • hive.exec.failure.hooks
    • hive.exec.pre.hooks
    • hive.exec.post.hooks
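For example, if the properties already contain the Ambari ATS hook, the resulting values would look similar to the following (the existing values on your cluster may differ):

Bash
hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,io.acceldata.hive.AdHiveHook
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,io.acceldata.hive.AdHiveHook
hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,io.acceldata.hive.AdHiveHook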

Tez

  • Get the same hive-hook jars (shared by the Acceldata team) listed in the mapping table above.

  • Log in to any HDFS client node and follow the steps below to add the Pulse hook jar inside the Tez tar (a sketch of the commands appears after this list).

  • For Hive 3.x on HDP 3.x, use the following locations to update the hook jar for your version; the example here uses the HDP 3.1.4 hook jar:

    • Use ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-1.2.3.jar for Hive 3.x on the HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
  • For both Hive 1.x and Hive 2.x (LLAP), such as HDP 2.6.x, use the following locations to update the respective hook jars for the version:

    • Use ad-hive-hook__hdp__1.2.x-assembly-1.2.3.jar for Hive 1.x on the HDFS path /hdp/apps/${hdp.version}/tez/tez.tar.gz
    • Use ad-hive-hook__hdp__2.1.x-assembly-1.2.3.jar for Hive 2.x on the HDFS path /hdp/apps/${hdp.version}/tez_hive2/tez.tar.gz
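A minimal sketch of the repacking steps, assuming HDP 3.1.4 and the /hdp/apps/${hdp.version}/tez/tez.tar.gz path from the list above; substitute your HDP version, jar, and paths, and back up the original tarball first:

Bash
# Download the current Tez tarball from HDFS
hdfs dfs -get /hdp/apps/3.1.4.0-315/tez/tez.tar.gz .
# Unpack it, add the Pulse hook jar, and repack
mkdir tez_pkg && tar -xzf tez.tar.gz -C tez_pkg
cp /opt/acceldata/ad-hive-hook__hdp__3.1.0.3.1.4.0-315-assembly-1.2.3.jar tez_pkg/lib/
tar -czf tez.tar.gz -C tez_pkg .
# Upload the updated tarball back to the same HDFS location
hdfs dfs -put -f tez.tar.gz /hdp/apps/3.1.4.0-315/tez/tez.tar.gz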
  • Navigate to Tez > Configs > Custom tez-site and add or update the following property values:
    • tez.history.logging.service.class=io.acceldata.hive.AdTezEventsNatsClient
    • ad.events.streaming.servers=<Pulse IP>:19009
    • ad.cluster=<your cluster name, for example ad_hdp3_dev>
    • [Optional step for Hive 3.x] ad.hdfs.sink is set to true by default; if set to false, Tez does not publish query metadata proto logging details to HDFS.

Sqoop

Copy the hook jars specified above into the Sqoop classpath directory (for example, /usr/hdp/current/sqoop-client/lib). For an LLAP (Hive interactive) enabled cluster, copy both the Hive v1.2.x and v2.1.x jars onto the classpath.

ODP 3.2.x

Most Ambari changes, including the hook jars and JMX settings, are available as part of the ODP release, except for a few components. Validate the following details as part of your general checks:

ODP Kafka

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka-env > kafka-env.
  3. Go to the end of the file and add (or confirm) the same JMX line described in the Kafka section under HDP 2.x & 3.x above.
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

ODP Kafka 3

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Kafka > Configs > Advanced kafka3-env > kafka3-env.
  3. Go to the end of the file and add the following line:

For Kafka 3 with Zookeeper:


For Kafka 3 with KRaft:

  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

Set ACLs

To set ACLs, run the same commands shown in the Set ACLs section under HDP 2.x & 3.x above.

Enable SCRAM-based Authentication

Follow these steps to enable SCRAM-based authentication for Kafka 3.

  1. Navigate to /data01/acceldata/config/docker/addons/kafka3-connector.yml.
  2. Update the kafka3-connector.yml file with the following properties (see the sketch in the Enable SCRAM-based Authentication section above).
  • Update auth.type to "SCRAM"
  • Set sasl.mechanism to "SCRAM-SHA-256" or "SCRAM-SHA-512"
  • Ensure jaas.login.conf points to the correct JAAS configuration file path.

ODP Zookeeper

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Zookeeper > Configs > Advanced zookeeper-env > zookeeper-env template.
  3. Go to the end of the file and add the same lines shown in the Zookeeper section under HDP 2.x & 3.x above, making sure the JMXDISABLE environment variable is set before the JMX options.
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

  5. To enable SSL on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

ODP Hive

To see Hive table details with data on the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.
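For reference, the corresponding hive-site.xml entries would look like the following (values assumed to be true, per the note above):

XML
<property>
  <name>hive.stats.autogather</name>
  <value>true</value>
</property>
<property>
  <name>hive.stats.column.autogather</name>
  <value>true</value>
</property>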

You can also compute the table statistics manually by running the following command:

ANALYZE TABLE <table name> COMPUTE STATISTICS

Hive Server 2

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the same lines shown in the Hive Server 2 section under HDP 2.x & 3.x above.

Avoid JMX changes for Hive 1.x running on the MR engine; a known bug there causes query failures when JMX is enabled.

  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

Hive Metastore

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Hive > Configs > Advanced hive-env > hive-env template.
  3. Go to the end of the file and add the same lines shown in the Hive Metastore section under HDP 2.x & 3.x above.
  4. To enable Basic Authentication on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

  5. To enable TLS/SSL on the JMX remote port, add the same parameters described in the HDP Kafka section, replacing the <> placeholders with appropriate values.

Place Hook Jars

Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.

  • JMX enablement should already be in place, similar to the HDP changes.

  • Update the following properties as per the installation details:

    • ad.events.streaming.servers=<Pulse IP>:19009
    • ad.cluster=<cluster name as specified in Pulse installation>
  • Navigate to Hive > Configs > General and check that io.acceldata.hive.AdHiveHook is appended (comma-separated) to the following properties:

    • hive.exec.failure.hooks
    • hive.exec.pre.hooks
    • hive.exec.post.hooks

ODP Tez

Pulse hook JARs are included in the installation package. Additional configuration changes are required as described below.

  • Update the following properties as per the installation details:

    • ad.events.streaming.servers=<Pulse IP>:19009
    • ad.cluster=<cluster name as specified in Pulse installation>
  • Navigate to Tez > Configs and check that the tez.history.logging.service.class property is set to io.acceldata.hive.AdTezEventsNatsClient.

ODP Spark 2 & 3

  1. Log in to the Ambari Admin Web UI.
  2. Navigate to Spark > Configs > Advanced spark2-metrics-properties.
  3. Go to the end of the file and add the following lines:

Additional Configuration Changes (Common for HDP & ODP)

Additional configuration changes are required if ACLs are enabled for services or if older service versions are running:

  • YARN ACL: Check whether ACLs are enabled for YARN (yarn.acl.enable). If yes, add the property yarn.timeline-service.read.allowed.users=hdfs to the custom yarn-site.xml and restart the YARN service. Here hdfs is the default user shared by the team; enter other specific users created for Pulse if applicable.
  • Kafka Protocol: On Ambari > Kafka, modify the PLAINTEXTSASL listeners and inter-broker protocol to SASL_PLAINTEXT; also check the listeners value and update it to SASL_PLAINTEXT://localhost:6667.
  • Kafka ACL: Grant the Kafka ACL permissions to the hdfs user by running the command below as the kafka user:
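A sketch using the stock kafka-acls.sh tool, assuming the hdfs user needs Describe access on all topics; the tool path, ZooKeeper address, and required operations may differ on your cluster:

Bash
# Run this as the kafka user
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --authorizer-properties zookeeper.connect=<zookeeper host>:2181 \
  --add --allow-principal User:hdfs \
  --operation Describe --topic '*'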

CDH 5.x & 6.x

Kafka and Zookeeper JMX are auto-enabled with CDH-based installation.

CDH Spark

  1. Under the Spark configuration, search for Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf.
  2. Add the following properties:
  3. Repeat the same steps for the Spark2 configuration: search for Spark Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-defaults.conf and add the same properties.

Refer to the additional notes in the HDP 2.x & 3.x section under the Spark configuration changes.

CDH Hive

Distro Version | Hive Version | Pulse Hook Jar Name
CDH 6.2.x | 2.1.x | ad-hive-hook__2.1.1__cdh6.2.1-assembly-1.2.3.jar
CDH 6.3.4 | 2.1.x | ad-hive-hook__cdh__3.0.0-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
  • Place the provided hook jars on all edge and HiveServer2 nodes under the local path /opt/acceldata.
  • The hook directory must be readable and executable by all users.
  1. Under the Hive configuration, search for Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh and add the following property:
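A sketch of that property, assuming the hook jar is exposed through the HIVE_AUX_JARS_PATH environment variable and using the CDH 6.2.x jar name from the table above; the exact variable expected by your Pulse version may differ:

Bash
# Make the Pulse Hive hook jar available to Hive clients and HiveServer2
HIVE_AUX_JARS_PATH=/opt/acceldata/ad-hive-hook__2.1.1__cdh6.2.1-assembly-1.2.3.jar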
  2. Under the Hive configuration, search for Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml and HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml, change the view to XML, and add the following properties:
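A sketch of the hook properties, using the hook class and property names referenced elsewhere in this document; merge these with any hook values already configured on the cluster:

XML
<property>
  <name>hive.exec.pre.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>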
  3. Add the following new properties under the Advanced hive-site.xml section:

    1. ad.events.streaming.servers=<Pulse IP>:19009
    2. ad.cluster=<cluster name as specified in Pulse installation>
  4. Restart the affected Hive components and deploy the new client configuration.

CDH Sqoop

Place the hook jar in the classpath libraries of the Sqoop client on the given edge nodes.

CDP 7.x

Kafka and Zookeeper JMX are auto-enabled with a CDP-based installation. Make the same changes for Spark as described in the CDH section.

CDP Kafka

Under Additional Broker Java Options (broker_java_opts), replace -Dcom.sun.management.jmxremote.host=127.0.0.1 with -Dcom.sun.management.jmxremote.host=0.0.0.0.

CDP Hive

To see Hive table details with data on the UI, set hive.stats.autogather and hive.stats.column.autogather to true in the hive-site.xml file so that the statistics are computed automatically.

You can also compute the table statistics manually by running the following command:

ANALYZE TABLE <table name> COMPUTE STATISTICS

Under Hive > Java Configuration Options for Hive Metastore Server, update the property with the following value:
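A sketch of the JMX options to append, assuming the standard JVM flags and a placeholder port (append them to any options already present rather than replacing them):

Bash
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=<jmx port>
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false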

Under Hive on Tez > Java Configuration Options for HiveServer2, update the property with the same JMX options shown above for the Hive Metastore Server, adjusting the port if each service uses its own.
Distro Version | Hive Version | Tez Version | Pulse Hook Jar Name
CDP | 3.1.3 | 0.9.1 | ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar

For the above Hive versions, Pulse uses Hive hooks to capture query statistics, which requires the following configuration changes:

  • Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
  • Place the provided hook jars on all edge and HiveServer2 nodes under the local path /opt/acceldata.
  • The hook directory must be readable and executable by all users.
  1. Under the Hive component, search for the configuration Hive Service Environment Advanced Configuration Snippet (Safety Valve), and under the Hive on Tez component, search for Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve), then add the following property:
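As with the CDH Hive example above, a sketch assuming the hook jar is exposed through HIVE_AUX_JARS_PATH, using the CDP jar name from the table above; the exact variable expected by your Pulse version may differ:

Bash
HIVE_AUX_JARS_PATH=/opt/acceldata/ad-hive-hook_cdp_3.1.3-assembly-1.2.3.jar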
  2. Under the Hive component, search for Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and under the Hive on Tez component, search for Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, change the view to XML, and add the following properties:
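A sketch using the hook class, streaming server, and cluster-name properties referenced elsewhere in this document; the exact set of properties shipped for CDP may differ, and hook values should be merged with any already present:

XML
<property>
  <name>hive.exec.pre.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>io.acceldata.hive.AdHiveHook</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value><Pulse IP>:19009</value>
</property>
<property>
  <name>ad.cluster</name>
  <value><cluster name as specified in Pulse installation></value>
</property>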
  3. Restart the affected Hive components and deploy the new client configuration.

CDP Tez

  • Get the hive-hook jars (shared by the Acceldata team) listed in the table above.
  • Log in to any HDFS client node and follow the steps below to add the Pulse hook jar inside the Tez tar:

Avoid clicking the "Upload Tez tar file to HDFS" action available under Tez.

  1. Under the Tez component, search for Tez Client Advanced Configuration Snippet (Safety Valve) for tez-conf/tez-site.xml, change the view to XML, and add the following properties:
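A sketch using the Tez logging service class and Pulse properties referenced in the HDP Tez section above; the exact set of properties for CDP may differ:

XML
<property>
  <name>tez.history.logging.service.class</name>
  <value>io.acceldata.hive.AdTezEventsNatsClient</value>
</property>
<property>
  <name>ad.events.streaming.servers</name>
  <value><Pulse IP>:19009</value>
</property>
<property>
  <name>ad.cluster</name>
  <value><cluster name as specified in Pulse installation></value>
</property>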