Spark Standalone Multi-cluster

This document provides a step-by-step process for installing a single Pulse instance for multiple Spark Standalone clusters.

Prerequisites

Ensure the following are present:

  1. Spark hosts: Refer to steps 1 and 2 mentioned below the note.
  2. Zookeeper hosts files: Refer to step 3 mentioned below the note.
  3. Log locations
  4. Spark history server locations
  5. Certificates (if any for Spark history server)
  6. Docker version
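Prerequisite 6 can be checked mechanically. A minimal sketch, assuming a standard `major.minor.patch` Docker version string (the helper function name is ours, not part of Pulse):

```shell
# Return success iff a Docker version string satisfies the >= 20.10 prerequisite.
docker_version_ok() {
  local major minor rest
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 10 ]; }
}

# Usage (queries the running daemon):
#   docker_version_ok "$(docker version --format '{{.Server.Version}}')" && echo "Docker OK"
```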

Prerequisites for enabling (TLS) HTTPS for the Pulse Web UI using ad-proxy:

  1. Certificate file: cert.crt
  2. Certificate key: cert.key
  3. CA certificate: ca.crt (optional)
  4. Decide whether to keep the HTTP port (default: 4000) open.
  5. Decide which HTTPS port to use (default: 443).

The steps referenced in the prerequisites above are:

  1. Obtain the fully qualified domain names (FQDN) for the Spark Master URLs for both clusters and include them in the spark_<clustername>.hosts file.
  2. Retrieve the fully qualified domain names (FQDN) for the Spark History Server URLs for both clusters, and provide the URL when requested.
  3. Obtain the fully qualified domain names (FQDN) for the Zookeeper Server URLs for both clusters and place them in the zk_<clustername>.hosts file.
  4. Retrieve the log locations for the application and deployment logs, as well as the SPARK_HOME directory, for both clusters.
  5. Ensure that the Docker version is >= 20.10.x.
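The hosts files above can be sanity-checked before installation. A minimal sketch, assuming each file lists one FQDN per line with no blank lines (the function name is ours, not part of Pulse):

```shell
# Validate a Pulse hosts file: every line must be a single, non-empty hostname.
validate_hosts_file() {
  local line
  while IFS= read -r line; do
    case "$line" in
      '')     echo "blank line in $1"; return 1 ;;
      *" "*)  echo "unexpected space in '$line'"; return 1 ;;
    esac
  done < "$1"
  echo "ok"
}

# Example: validate_hosts_file spark_cluster1.hosts
```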

Uninstallation

To uninstall agents, perform the following:

  1. Run the hystaller uninstall command through your Ansible setup.
  2. Remove the Pulse Spark hook JARs, along with the related configurations, from the Spark master and worker nodes.
  3. The Acceldata team must then perform the following steps to back up and uninstall the existing Pulse application:
    1. Create a backup directory: mkdir -p /data01/backup
    2. As a backup, copy the entire config and work directories: cp -R $AcceloHome/config /data01/backup/ and cp -R $AcceloHome/work /data01/backup/
    3. Uninstall the existing Pulse setup by running the following command: accelo uninstall local


Executing this action will remove all files, folders, docker containers, docker images, and the entire Acceldata directory.

  4. Log out of the terminal session.
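The backup in step 3 can be wrapped in a small function so that both copies are verified before `accelo uninstall local` runs (the function name and the final check are ours):

```shell
# Back up the Pulse config and work directories, then confirm both copies exist.
backup_pulse() {
  # $1 = Pulse home (e.g. $AcceloHome), $2 = backup directory (e.g. /data01/backup)
  mkdir -p "$2" &&
  cp -R "$1/config" "$2/" &&
  cp -R "$1/work" "$2/" &&
  [ -d "$2/config" ] && [ -d "$2/work" ] &&
  echo "backup complete"
}

# Usage: backup_pulse "$AcceloHome" /data01/backup
```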

Download and Load Binaries and Docker Images

To download and load binaries and Docker images, perform the following:

When downloading the Pulse all-in-one TAR file, you must also download the hystaller binary separately for Pulse version 3.3.3 and perform the following:

  1. Download all the 3.3.3 binaries.
  2. Replace the hystaller binary with the one from the direct download link provided by the Acceldata team.

Then perform the following steps:

  1. Download the JARs, the hystaller and accelo binaries, and the Docker images from the download links provided by the Acceldata team.
  2. Move the Docker images and JARs into the /data01/images directory.
  3. Copy the binaries and TAR files into the /data01/images folder.
  4. Change to the /data01/images directory.
  5. Extract the single TAR file.
  6. Load the Docker images.
  7. Verify that all the images are loaded on the server.
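The load-and-verify steps above can be scripted. Because the exact image file names ship from Acceldata, the sketch below just emits a `docker load` for every image tarball it finds (dry-run style, so the loop is checkable without a Docker daemon):

```shell
# Emit one `docker load` command per image tarball in the given directory.
emit_docker_loads() {
  local img
  for img in "$1"/*.tar; do
    [ -e "$img" ] || continue     # directory had no tarballs
    echo "docker load -i $img"
  done
}

# Execute for real, then verify with `docker images`:
#   emit_docker_loads /data01/images | sh
```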

Configure the Cluster

To configure the cluster in Pulse, perform the following:

  1. Validate all the host files.
  2. Create the /data01/acceldata directory.
  3. Copy the Spark hosts and Zookeeper hosts files into the /data01/acceldata directory.
  4. Place the accelo binary in the /data01/acceldata directory.
  5. Rename the accelo.linux binary to accelo.
  6. Change to the /data01/acceldata directory.
  7. Run accelo init.
  8. Enter appropriate answers when prompted.
  9. Source the ad.sh file.
  10. Run the init command and provide the Pulse version.

When prompted, provide the correct Pulse version; in this case, it is 3.3.3.

  11. Run the accelo info command.
  12. To configure the cluster in Pulse, run the config cluster command.
  13. Provide the correct information when prompted.
  14. Run the config cluster command for all the clusters and provide the appropriate answers when prompted.
  15. Run the config cluster command for NiFi standalone and select standalone > nifi.
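The directory setup above reduces to a short staging sequence. A sketch (the execute-bit chmod is our assumption, and `accelo init` itself must still be run interactively):

```shell
# Stage the hosts files and the accelo binary into the acceldata directory.
prepare_accelo_dir() {
  # $1 = directory holding the downloads, $2 = target (e.g. /data01/acceldata)
  mkdir -p "$2" &&
  cp "$1"/spark_*.hosts "$1"/zk_*.hosts "$2"/ &&
  cp "$1"/accelo.linux "$2"/accelo &&   # place and rename in one step
  chmod +x "$2"/accelo &&               # assumption: binary needs the execute bit
  echo "staged"
}

# Then: cd /data01/acceldata && ./accelo init
```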

Copy the License

Place the license file provided by the Acceldata team in the Pulse work directory ($AcceloHome/work).

Deploy Pulse Core Components

Deploy the Pulse core components by running the core deploy command, and verify that the deployment completes without errors.

Deploy Add-ons

To deploy the Pulse add-ons, run the deploy addons command and select the required components for Spark standalone. Verify that the deployment completes without errors.

Configure Alerts Notifications

To configure alerts notifications, perform the following:

  1. Set the active cluster.
  2. Configure the alerts notifications.
  3. Set cluster2 as the active cluster.
  4. Configure the alerts for the second cluster.
  5. Set cluster3 as the active cluster.
  6. Configure the alerts for the third cluster.
  7. Restart the alerts notifications.

Database Push Configuration

Run the DB push config command to push the configuration to the database.

Configure the Override

  1. Change to the work/<clustername> directory.
  2. Open the override.yml file for editing.
  3. Add the required override configuration to the file.

Repeat the above steps for all clusters.

Deploy the Pulse Agents

Install the new Pulse version 3.3.3 agents on all cluster nodes. Copy the new hystaller file to /tmp or any other executable location on all cluster nodes, and then run the hystaller command on each node, adjusting it for your environment.
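Copying hystaller to every node is easy to script against a node list. The sketch below emits the transfer commands (dry-run style; the node-list file name is an assumption):

```shell
# Emit scp/ssh commands that stage hystaller on each node listed in a file.
stage_hystaller_cmds() {
  # $1 = file with one node hostname per line, $2 = local path to hystaller
  local host
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    echo "scp $2 $host:/tmp/hystaller && ssh $host chmod +x /tmp/hystaller"
  done < "$1"
}

# Execute for real:  stage_hystaller_cmds cluster_nodes ./hystaller | sh
```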

Reconfig Cluster

  1. After completing the edits to the override files as outlined above, run the accelo reconfig cluster command.
  2. Run the DB push config command.

Adding Edge Nodes for Monitoring

These are edge nodes that are not part of the Spark Standalone cluster.

  1. Change to the work/<clustername> directory.
  2. Open the hydra_hosts_override.yml file for editing.
  3. Add the new host entry to the already existing hosts for Pulse to monitor.
  4. Run the accelo reconfig cluster command for clusters with edge nodes that require monitoring by Pulse. Alternatively, for comprehensive coverage, perform a reconfig cluster on all clusters.
  5. Verify that the hydra_hosts.yml file now contains the new hosts as well.

Configure Gauntlet

Updating the Gauntlet Crontab Duration

  1. Check whether the ad-core.yml file is present.
  2. If the file is not present, generate it.
  3. Edit the ad-core.yml file:

a. Open the file.

b. Update the CRON_TAB_DURATION env variable in the ad-gauntlet section.

This makes gauntlet run every 2 days at midnight.
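Assuming standard five-field cron syntax, "every 2 days at midnight" corresponds to an entry like the following (the surrounding keys are illustrative, not the exact ad-core.yml layout):

```yaml
# Illustrative ad-gauntlet fragment (structure assumed):
ad-gauntlet:
  environment:
    CRON_TAB_DURATION: "0 0 */2 * *"   # minute hour day-of-month month day-of-week
```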

c. Review the updated file; it should contain the new value.

d. Save the file.

  4. Restart the gauntlet service.

Updating the Gauntlet Dry Run Mode

  1. Check whether the ad-core.yml file is present.
  2. If the file is not present, generate it.
  3. Edit the ad-core.yml file:

a. Open the file.

b. Update the DRY_RUN_ENABLE env variable in the ad-gauntlet section. With dry run disabled, gauntlet deletes the older Elastic indices and MongoDB data.

c. Review the updated file; it should contain the new value.

d. Save the file.

  4. Restart the gauntlet service.

Updating MongoDB Cleanup and Compaction Frequency in Hours

By default, when dry run is disabled, MongoDB cleanup and compaction run once a day. To configure the frequency, follow the steps below.

  1. Run the gauntlet configuration command.
  2. Answer the prompts; if you are unsure how many days of data you wish to retain, proceed with the default values.
  3. When prompted, specify the hours of the day during which MongoDB cleanup and compaction should run. The value must be a comma-separated list of hours in 24-hour notation.
  4. Apply the configuration. The next time gauntlet runs, MongoDB cleanup and compaction will run at the specified hours, once per hour.
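The cleanup-hours prompt expects values such as 0,6,12,18 (run at midnight, 06:00, 12:00, and 18:00). A small checker for that format (ours, not part of accelo):

```shell
# Check that a value is a comma-separated list of hours in 24-hour notation.
valid_hours_csv() {
  local h IFS=','
  for h in $1; do
    case "$h" in
      ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$h" -gt 23 ]; then echo "invalid"; return 1; fi
  done
  echo "valid"
}
```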

Enabling (TLS) HTTPS for Pulse Web UI Configuration Using ad-proxy

Deployment and Configuration

  1. Copy the cert.crt, cert.key, and ca.crt (optional) files to the $AcceloHome/config/proxy/certs location.
  2. Check whether the ad-core.yml file is present.
  3. If the ad-core.yml file is not present, generate it.
  4. Modify the ad-core.yml file:

a. Open the ad-core.yml file.

b. Remove the ports: field in the ad-graphql section of ad-core.yml.

c. Review the resulting ad-graphql section; it should no longer contain the ports: field.

d. Save the file.

  5. Restart the ad-graphql container.
  6. Check that the port is no longer exposed to the host.
  7. Check whether there are any errors in the ad-graphql container.
  8. To deploy the ad-proxy add-on, run the deploy addons command, select Proxy from the list, and press Enter.
  9. Check whether there are any errors in the ad-proxy container.
  10. You can now access the Pulse UI at https://<pulse-server-hostname>. By default, port 443 is used.

Configuration

If you want to change the SSL port to another port, follow the below steps:

  1. Check whether the ad-proxy.yml file is present.
  2. Generate the ad-proxy.yml file if it is not present.
  3. Modify the ad-proxy.yml file:

a. Open the ad-proxy.yml file.

b. Change the host port in the ports list to the desired port (for example, 6003).

c. Save the file.

  4. Restart the ad-proxy container.
  5. Check that there are no errors in the ad-proxy container.
  6. You can now access the Pulse UI at https://<pulse-server-hostname>:6003.
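For a host port of 6003, the ports entry would look roughly like this (the container-side port 443 and the surrounding keys are assumptions based on the default behavior described above):

```yaml
# Illustrative ad-proxy.yml fragment (structure assumed):
ad-proxy:
  ports:
    - "6003:443"   # host port 6003 -> container HTTPS port
```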

Set Up LDAP for Pulse UI

  1. Check whether the ldap.conf file is present.
  2. Run the accelo config ldap command to generate the default ldap.conf if it is not already present.
  3. Edit the file at $AcceloHome/config/ldap/ldap.conf.
  4. Configure the following properties in the file:
  • LDAP FQDN: the FQDN where the LDAP server is running

    • host = [FQDN]
  • If port 389 is being used:

    • insecureNoSSL = true
  • SSL root CA certificate:

    • rootCA = [CERTIFICATE_FILE_PATH]
  • bindDN: used for the LDAP search; must be a member of the admin group

  • bindPW: the password for the bindDN user; can be removed later once LDAP is enabled

  • baseDN used for the user search

    • Eg: (cn=users, cn=accounts, dc=acceldata, dc=io)
  • Filter used for the user search

    • Eg: (objectClass=person)
  • baseDN used for the group search

    • Eg: (cn=groups, cn=accounts, dc=acceldata, dc=io)
  • Object class used for the group search

    • Eg: (objectClass=posixgroup)

You can use the ldapsearch utility to check whether a user has search entry access and group access in the LDAP directory.
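A typical access check uses the standard ldapsearch utility. The sketch below builds the invocation; every DN and the filter are placeholders to replace with your directory's values:

```shell
# Print an ldapsearch command for verifying search access (simple bind, port 389).
ldap_check_cmd() {
  # $1 = LDAP host, $2 = bindDN, $3 = baseDN, $4 = search filter
  printf 'ldapsearch -x -H ldap://%s:389 -D "%s" -W -b "%s" "%s"\n' \
    "$1" "$2" "$3" "$4"
}

# Example (placeholders):
#   ldap_check_cmd ldap.example.io "uid=binduser,cn=users,cn=accounts,dc=acceldata,dc=io" \
#     "cn=users,cn=accounts,dc=acceldata,dc=io" "(objectClass=person)"
```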
  5. If the file is already generated, the command asks for the LDAP credentials to validate the connectivity and configuration, as described in the following steps.
  6. Run the accelo config ldap command.
  7. Enter the LDAP user credentials when prompted.
  8. If everything went correctly, a confirmation message is displayed.
  9. Press 'y' and then press 'Enter'.
  10. Push the LDAP config.
  11. Run the deploy addon command.
  12. Select LDAP from the list shown and press 'Enter'.
  13. Run the restart command.
  14. Open the Pulse Web UI and create the default roles.
  15. Add an ops role with the required access; all incoming users who log in via LDAP will automatically come under this role.

Spark Jars Placements and Spark Config Changes

Perform the following steps for all the Spark Cluster Nodes:

  1. Add the required configuration to the metrics.properties file for Spark TimeSeries data.
  2. Add the required configuration to the spark-defaults.conf file for the Events data.
  3. Place the ad-spark-hook.jar file in the designated directory on each node.
  4. Restart all the Spark services.
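Both additions follow Spark's standard configuration shapes. The fragment below shows that shape only; the actual class names and paths come from the Pulse distribution, so every <placeholder> must be replaced with the values Acceldata provides:

```properties
# metrics.properties - a metrics sink entry has the form *.sink.<name>.class=<class>
*.sink.pulse.class=<sink-class-from-pulse-distribution>

# spark-defaults.conf - register the event listener and put the hook jar on the classpath
spark.extraListeners=<listener-class-from-pulse-distribution>
spark.driver.extraClassPath=<path-to>/ad-spark-hook.jar
spark.executor.extraClassPath=<path-to>/ad-spark-hook.jar
```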

DotLog Download

We have introduced a feature that allows downloading of service logs in .log format. This file is not the original server log but an xlsx sheet merged into a .log format.

Perform the following to add a configurable parameter that enables this feature:

  1. Insert the dotLogFileDownload parameter into the feature flags property of the ad-graphql section in the file $Acceldata_Home/config/docker/ad-core.yml.
  2. Restart the ad-graphql service.

Perform the following to enable the new search options:

  1. Locate the "ad-graphql" section in the file $Acceldata_Home/config/docker/ad-core.yml and, under the "environment" key, add the required line.
  2. Restart the ad-graphql service.

Does a user in the Spark Standalone environment still see the Spark option in the left menu even after their access has been revoked from the role?

Create a different role that does not have Spark permission and assign that role to the user. Alternatively, you can leave it as is because even if the Spark entry is visible in the left navigation, the user will not be able to access it if access has been revoked from their role.

Are non-admin users in the Spark Standalone environment able to access Spark even though they have the appropriate role permissions for accessing Spark?

In the role edit window, click on "Select All" just below the Page permissions. Then, remove any permissions that you do not wish to grant and save the role. Any user assigned to this role should now have access to Spark in the Spark Standalone environment.

What is the reason for the absence of the Oozie workflow link between the Oozie workflow and the application ID in PULSE for a Spark job?

The Spark job's application ID is generated by the Oozie service. It appears in the Pulse UI only if it is available in Oozie's Web Service UI; otherwise, it is not displayed in Pulse.
