CDP Deployment for Single KDC

This document provides a step-by-step process for deploying a single Pulse instance for Cloudera clusters with a single KDC.

Prerequisites

Keep the following information handy:

  1. CM URL (https://<Alias/FQDN of the CM URL>:<CM Port>)
  2. CM Username
  3. CM Password
  4. Spark History HDFS path & Spark3 History HDFS path
  5. Kafka Version
  6. HBase Version
  7. Hive Version
  8. Hive Metastore DB Connection URL
  9. Hive Metastore Database Name
  10. Hive Metastore DB Username
  11. Hive Metastore DB Password
  12. Oozie DB Name
  13. Oozie DB URL
  14. Oozie DB Username
  15. Oozie DB Password
  16. Kerberos Keytab
  17. krb5.conf file
  18. Principal
  19. Kerberos Username
  20. cacerts/jssecacerts
  21. YARN Scheduler Type
  22. Kafka Interbroker protocol

For enabling HTTPS on the Pulse Web UI, also keep the following ready:

  1. Certificate File: cert.crt
  2. Certificate Key: cert.key
  3. CA Certificate: ca.crt (optional)
  4. Decide whether to keep the HTTP port (default: 4000) open or not
  5. Decide which port to use (default: 443)
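
Before starting, it can help to confirm that the file-based prerequisites (keytab, krb5.conf, cacerts/jssecacerts, certificate files) are actually present on the Pulse node. The following is a minimal sketch and not part of the Pulse tooling: the function name `missing_files` and every path in the usage comment are hypothetical placeholders.

```shell
# Print each path that is missing or not a regular file.
# Returns non-zero if at least one path is missing.
missing_files() {
  status=0
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      printf 'missing: %s\n' "$path"
      status=1
    fi
  done
  return "$status"
}

# Example call with placeholder paths (adjust to where your files really live):
# missing_files /path/to/pulse.keytab /path/to/krb5.conf /path/to/cacerts cert.crt cert.key
```

The function only checks for presence; it does not validate the contents of the keytab or the certificates.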

Uninstallation

  1. To uninstall the agents, follow the Cloudera Parcel Agent Uninstall document.
  2. You must also remove the Pulse JARs and the configuration for Hive and Tez.
  3. Acceldata then runs the following commands to back up and uninstall the existing Pulse setup.

a. Create a backup directory.

Bash
Copy

b. Back up the entire config and work directories.

Bash
Copy

c. Uninstall the existing Pulse setup by running the following command:

Bash
Copy

OUTPUT

Bash
Copy

d. Log out of the terminal session.

Download the Binaries and Docker Images and Load Them

  1. Download the JARs, hystaller, accelo binaries, and Docker images from the download links provided by Acceldata.
  2. Move the Docker images and JARs to the following directory:
Bash
Copy
  1. Copy the binaries and tar files into the /data01/images folder.
Bash
Copy
  1. Change the directory.
Bash
Copy
  1. Extract the single tar file.
Bash
Copy

OUTPUT

Bash
Copy
  1. Load the Docker images by running the following command:
Bash
Copy
  1. Check if all the images are loaded into the server.
Bash
Copy

Config Cluster

  1. Validate all the hosts files.
  2. Create the acceldata directory by running the following command:
Bash
Copy
  1. Copy the Spark hosts and ZooKeeper hosts files to the acceldata directory by running the following command:
Bash
Copy
  1. Place the accelo binary in the /data01/acceldata directory.
Bash
Copy
  1. Rename the accelo.linux binary to accelo.
Bash
Copy
  1. Change the directory.
Bash
Copy
  1. Run the accelo init command:
Bash
Copy
  1. Enter the appropriate answers when prompted.
  2. Source the ad.sh file.
Bash
Copy
  1. Run the init command to provide the Pulse version.
Bash
Copy

OUTPUT

Bash
Copy

Provide the correct Pulse version; in this case, it is 3.3.3.

  1. Run the accelo info command to get the initial information.
Bash
Copy

OUTPUT

Bash
Copy
  1. Run the config cluster command to configure the cluster in Pulse.
Bash
Copy
  1. Provide appropriate answers when prompted.
Bash
Copy

Copy the License

Place the license file provided by Acceldata in the work directory.

Bash
Copy

Deploy Core

Deploy the Pulse core components by running the following command:

Bash
Copy

OUTPUT

Bash
Copy

Configure SSL For Connectors and Streaming

If TLS/SSL is enforced for any of the Hadoop components in the target cluster, copy the cacerts and jssecacerts certificate files to the Pulse node and enter their path when the Accelo CLI asks the following question.

  1. Select Y if the SSL/TLS is enabled.
Bash
Copy
  1. Enter the certificate path.
Bash
Copy
  1. ad-connectors
  2. ad-sparkstats
  3. ad-streaming
  4. ad-kafka-connector
  5. ad-kafka-0-10-2-connector
  6. ad-fsanalyticsv2-connector

For Kafka connectors, first verify the version of Kafka running in your cluster, and then generate the configurations accordingly.

Only these services establish connections to the corresponding Hadoop components of the cluster via the HTTPS URI.

Ensure that the permissions of these files are set to 0655, i.e., readable by all users.

Bash
Copy
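
To confirm the permission requirement above, a check along the following lines may be useful. It is a minimal sketch: the function name `is_world_readable` and the paths in the usage comment are hypothetical, and it only tests the read bits for owner, group, and others rather than the exact mode.

```shell
# Succeeds only when owner, group, and others all have read permission on the file.
is_world_readable() {
  # -prune keeps find from descending; -perm -444 requires all three read bits.
  [ -n "$(find "$1" -prune -perm -444 2>/dev/null)" ]
}

# Example with placeholder paths:
# for f in /path/to/cacerts /path/to/jssecacerts; do
#   is_world_readable "$f" || echo "not readable by all users: $f"
# done
```

A file with mode 0655 passes this check, as does any other mode that grants read access to all three classes.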

It's not obligatory to have both configuration files available for a target cluster. Sometimes, you might only have one of the files accessible. In such cases, you can simply utilize the available file and disregard the other.

AD-CONNECTORS & AD-SPARKSTATS

  1. Generate the ad-core-connectors configuration file if not present:
Bash
Copy
  1. Edit the file in path <$AcceloHome>/config/docker/addons/ad-core-connectors.yml and add the following lines under the volumes section of both ad-connectors and ad-sparkstats service blocks.
Bash
Copy
  1. If you only have the jssecacerts file available and not the cacerts file, you can mount the jssecacerts file as the cacerts file inside the container, as demonstrated below:
Bash
Copy

AD-STREAMING

  1. Generate the ad-core configuration file if not present:
Bash
Copy
  1. Edit the file in path <$AcceloHome>/config/docker/ad-core.yml and add the following lines under the volumes section of ad-streaming service block.
Bash
Copy
  1. If you only have the jssecacerts file available and not the cacerts file, you can mount the jssecacerts file as the cacerts file inside the container, as demonstrated below:
Bash
Copy

AD-FSANALYTICSV2-CONNECTOR

  1. Generate the ad-fsanalyticsv2-connector configuration file if not present:
Bash
Copy
  1. Edit the file in path <$AcceloHome>/config/docker/addons/ad-fsanalyticsv2-connector.yml and add the following lines under the volumes section of ad-fsanalyticsv2-connector.
Bash
Copy
  1. If you only have the jssecacerts file available and not the cacerts file, you can mount the jssecacerts file as the cacerts file inside the container, as demonstrated below:
Bash
Copy

AD-KAFKA-CONNECTOR

  1. Generate the ad-kafka-connector configuration file if not present:
Bash
Copy
  1. Edit the file in path <$AcceloHome>/config/docker/addons/ad-kafka-connector.yml and add the following lines under the volumes section of ad-kafka-connector.
Bash
Copy
  1. If you only have the jssecacerts file available and not the cacerts file, you can mount the jssecacerts file as the cacerts file inside the container, as demonstrated below:
Bash
Copy

AD-KAFKA-0-10-2-CONNECTOR

  1. Generate the ad-kafka-0-10-2-connector configuration file if not present:
Bash
Copy
  1. Edit the file in path <$AcceloHome>/config/docker/addons/ad-kafka-0-10-2-connector.yml and add the following lines under the volumes section of ad-kafka-0-10-2-connector.
Bash
Copy
  1. If you only have the jssecacerts file available and not the cacerts file, you can mount the jssecacerts file as the cacerts file inside the container, as demonstrated below:
Bash
Copy

Deploy Addons

Run the following command to deploy the Pulse addons, and then select the components that are needed for Spark standalone:

Bash
Copy

OUTPUT

Bash
Copy

Configure Alerts Notifications

  1. For setting the active cluster, run the following command:
Bash
Copy
  1. Configure the alerts notifications.
Bash
Copy

OUTPUT

Bash
Copy
  1. Set the cluster2 as the active cluster.
Bash
Copy
  1. Configure the alerts for the second cluster.
Bash
Copy
  1. Set the cluster3 as the active cluster.
Bash
Copy
  1. Configure the alerts for the third cluster.
Bash
Copy
  1. Restart the alerts notifications.
Bash
Copy

OUTPUT

Bash
Copy

Database Push Configuration

Run the following command to push the configuration to the database:

Bash
Copy

Configure Gauntlet

Updating the Gauntlet Crontab Duration

  1. Check if the ad-core.yml file is present or not by running the following command:
Bash
Copy
  1. If the above file is not present, then generate it by running the following command:
Bash
Copy
  1. Edit the ad-core.yml file.

a. Open the file.

Bash
Copy

b. Update the CRON_TAB_DURATION env variable in the ad-gauntlet section.

Bash
Copy

This makes gauntlet run every two days at midnight.

c. The updated file will look something like this:

Bash
Copy

d. Save the file.

  1. Restart gauntlet service by running the following command:
Bash
Copy
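
The schedule described for CRON_TAB_DURATION above ("every two days at midnight") reads like a standard five-field cron expression. The sketch below assumes that format; the value `0 0 */2 * *` and the helper name `describe_cron` are illustrative assumptions, not taken from this page, so confirm the accepted syntax against your Pulse version.

```shell
# Assumed example value: minute 0, hour 0, every 2nd day of the month.
CRON_EXPR='0 0 */2 * *'

# Split a five-field cron expression and label each field.
describe_cron() {
  # read splits on whitespace and does not glob-expand the asterisks.
  read -r minute hour dom month dow <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
}

describe_cron "$CRON_EXPR"
# prints: minute=0 hour=0 day-of-month=*/2 month=* day-of-week=*
```

In standard cron, `*/2` in the day-of-month field matches days 1, 3, 5, and so on, and restarts at day 1 each month, so the gap between runs can be a single day at a month boundary.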

Updating the Gauntlet Dry Run Mode

  1. Check if the ad-core.yml file is present or not by running the following command:
Bash
Copy
  1. If the above file is not present, then generate it by running the following command:
Bash
Copy
  1. Edit the ad-core.yml file.

a. Open the file.

Bash
Copy

b. Update the DRY_RUN_ENABLE env variable in the ad-gauntlet section.

Bash
Copy

With dry run disabled, Gauntlet deletes the older Elasticsearch indices and MongoDB data.

c. The updated file will look something like this:

Bash
Copy

d. Save the file.

  1. Restart gauntlet service by running the following command:
Bash
Copy

Configuring Gauntlet for Multi Node and Multi Cluster Deployment

  1. Run the following command to generate the gauntlet config files:
Bash
Copy
  1. Change to the config/gauntlet/ directory.
Bash
Copy
  1. Check whether the files are present for all the clusters.
Bash
Copy
  1. Modify the gauntlet_elastic_<clustername>.yml file.
Bash
Copy
  1. Edit the elastic address in the file for multi node setup.
Bash
Copy
  1. Modify the elastic address for both clusters.
  2. Push the config to database.
Bash
Copy
  1. Restart the gauntlet service.
Bash
Copy

Updating MongoDB Cleanup and Compaction Frequency in Hours

By default, when dry run is disabled, MongoDB cleanup and compaction run once a day. To configure the frequency, follow the steps listed below:

  1. Run the following command:
Bash
Copy
  1. Answer the prompts. If you’re unsure about how many days you wish to retain, then proceed with the default values.
Bash
Copy
  1. When the following prompt comes up, specify the hours of the day during which you would like MongoDB cleanup and compaction to run. The value must be a comma-separated list (CSV) of hours in 24-hour notation.
Bash
Copy
  1. Run the following command. When gauntlet runs the next time, MongoDB clean up and compaction will run at the specified hours, once per hour.
Bash
Copy
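
The hours value requested in the steps above can be sanity-checked before it is typed into the prompt. This is a minimal sketch under the assumption that the list contains only integers from 0 to 23 separated by commas with no spaces; the function name `is_valid_hours_csv` is hypothetical, and whether the prompt tolerates spaces or leading zeros is not stated on this page.

```shell
# Succeeds when the argument is a comma-separated list of hours, each in the range 0-23.
is_valid_hours_csv() {
  printf '%s\n' "$1" |
    grep -Eq '^([01]?[0-9]|2[0-3])(,([01]?[0-9]|2[0-3]))*$'
}

is_valid_hours_csv '0,6,12,18' && echo 'ok: 0,6,12,18'
is_valid_hours_csv '8,24' || echo 'rejected: 8,24'
```

With a value such as `0,6,12,18`, cleanup and compaction would be scheduled for four hours of the day, once in each of those hours.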

Enabling (TLS) HTTPS for Pulse Web UI Configuration Using ad-proxy

Deployment and Configuration

  1. Copy the cert.crt, cert.key, and ca.crt (optional) files to the $AcceloHome/config/proxy/certs directory.
  2. Check if ad-core.yml file is present or not.
Bash
Copy
  1. If ad-core.yml file is not present, then generate the ad-core.yml file.
Bash
Copy

OUTPUT

Bash
Copy
  1. Modify the ad-core.yml file.

a. Open the ad-core.yml file.

Bash
Copy

b. Remove the ports: field in the ad-graphql section of ad-core.yml .

Bash
Copy

c. The resulting ad-graphql section will look like this:

Bash
Copy

d. Save the file.

  1. Restart the ad-graphql container.
Bash
Copy
  1. Verify that the port is not exposed to the host.
Bash
Copy

OUTPUT

Bash
Copy
  1. Check if there are any errors in the ad-graphql container.
Bash
Copy
  1. To deploy the ad-proxy addon, run the following command, select Proxy from the list, and press Enter.
Bash
Copy
  1. You can now access the Pulse UI at https://<pulse-server-hostname>. By default, the port used is 443.
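
For the port-exposure check in the steps above, one way to interpret Docker's output is sketched here. `docker ps` shows a published port as a host mapping such as `0.0.0.0:4000->4000/tcp`, while a port that is only exposed inside the container network appears without an arrow. The function name `has_host_mapping` is hypothetical, and the container name filter in the usage comment is an assumption about how the container is named on your node.

```shell
# Succeeds when a `docker ps` Ports string contains a host mapping ("->").
has_host_mapping() {
  case $1 in
    *'->'*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example usage (requires Docker; the name filter is an assumption):
# ports=$(docker ps --filter name=ad-graphql --format '{{.Ports}}')
# has_host_mapping "$ports" && echo "still published on the host: $ports"
```

After the ports: field is removed and the container is restarted, the check is expected to fail for ad-graphql, which indicates that the port is no longer published on the host.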

Configuration

If you want to change the SSL port to another port, follow the steps below:

  1. Check if ad-proxy.yml file is present or not.
Bash
Copy
  1. Generate the ad-proxy.yml file if it is not present.
Bash
Copy

OUTPUT

Bash
Copy
  1. Modify the ad-proxy.yml file.

a. Open the ad-proxy.yml file.

Bash
Copy

b. Change the host port in the ports list to the desired port.

Bash
Copy

The final file will look like this if the host port is 6003 :

Bash
Copy

c. Save the file.

  1. Restart the ad-proxy container.
Bash
Copy
  1. Check if there are any errors.
Bash
Copy
  1. Now you can access the Pulse UI using https://<pulse-server-hostname>:6003 .
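
In a Docker Compose ports entry, the short syntax is `HOST:CONTAINER`, so only the left-hand side changes when the host port is moved. The sketch below illustrates that with a hypothetical helper, `set_host_port`; the container-side port 443 in the example is an assumption, so keep whatever container port your generated ad-proxy.yml already contains.

```shell
# Replace the host side of a "HOST:CONTAINER" port mapping and keep the container side.
set_host_port() {
  # $1 = existing mapping, $2 = new host port
  printf '%s:%s\n' "$2" "${1##*:}"
}

set_host_port '443:443' 6003
# prints: 6003:443
```

The container port stays untouched because the service inside the container keeps listening on the same port; only the port published on the host changes.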

Set Up LDAP for Pulse UI

  1. Check if the ldap.conf is present or not.
Bash
Copy
  1. Run the configure command to generate the default ldap.conf if not already present.
Bash
Copy

OUTPUT

Bash
Copy
  1. Edit the file in path $AcceloHome/config/ldap/ldap.conf .
Bash
Copy
  1. Configure the following properties in the file:

    • LDAP FQDN: the FQDN of the host where the LDAP server is running

      • host = [FQDN]
    • If port 389 is being used, set:

      • insecureNoSSL = true
    • SSL root CA certificate

      • rootCA = [CERTIFICATE_FILE_PATH]
    • bindDN: the DN used for the LDAP search; it must be a member of the admin group

    • bindPW: <encrypted-password-string>, the encrypted password to be stored in the database

    • encryptedPassword = true: set this to true to enable the use of an encrypted password

    • baseDN used for the user search

      • Eg: (cn=users, cn=accounts, dc=acceldata, dc=io)
    • Filter used for the user search

      • Eg: (objectClass=person)
    • baseDN used for the group search

      • Eg: (cn=groups, cn=accounts, dc=acceldata, dc=io)
    • Group search: object class used for the group search

      • Eg: (objectClass=posixgroup)

Here is the command to check if user has search entry access and group access in LDAP directory:

Bash
Copy
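
As an illustration of such a check, the sketch below assembles an `ldapsearch` command line (the OpenLDAP client) from the properties listed above. It is not necessarily the command originally shown on this page: the helper name `ldap_search_cmd` is hypothetical, and the URI, bind DN, and base DN are placeholders. The function only prints the command so that it can be reviewed before being run.

```shell
# Placeholder values; replace them with the values from your ldap.conf.
LDAP_URI='ldap://ldap.example.com:389'
BIND_DN='uid=binduser,cn=users,cn=accounts,dc=example,dc=com'
USER_BASE_DN='cn=users,cn=accounts,dc=example,dc=com'
USER_FILTER='(objectClass=person)'

# Print an ldapsearch command: simple bind (-x), prompt for the password (-W).
ldap_search_cmd() {
  # $1 = server URI, $2 = bind DN, $3 = search base, $4 = search filter
  printf "ldapsearch -x -H '%s' -D '%s' -W -b '%s' '%s'\n" "$1" "$2" "$3" "$4"
}

ldap_search_cmd "$LDAP_URI" "$BIND_DN" "$USER_BASE_DN" "$USER_FILTER"
```

Running the printed command once with the user base DN and filter, and once with the group base DN and group object class, exercises both the user search and the group search.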

If the file has already been generated, the configure command asks for the LDAP credentials to validate the connectivity and configuration, as described in the steps below.

  1. Run the configure command.
Bash
Copy
  1. It will ask for the LDAP user credentials.
Bash
Copy
  1. If the configuration is valid, the following confirmation message appears:
Bash
Copy
  1. Type 'y' and press 'Enter'.

OUTPUT

Bash
Copy
  1. Push the LDAP config.
Bash
Copy
  1. Run the deploy add-ons command.
Bash
Copy
  1. Select LDAP from the list shown and press Enter.
Bash
Copy

OUTPUT

Bash
Copy
  1. Run the restart command.
Bash
Copy
  1. Open Pulse Web UI and create default roles.
  2. Create an ops role with the necessary access permissions. Any users who log in via LDAP will automatically be assigned to this role.