Configure Pulse to Monitor Standalone Spark
This document provides a step-by-step process for installing a single Pulse instance that monitors multiple Spark Standalone clusters.
Pre-requisites
Ensure the following are present:
- Spark hosts: Refer to steps 1 and 2 mentioned below the note.
- Zookeeper hosts files: Refer to step 3 mentioned below the note.
- Log locations
- Spark history server locations
- Certificates (if any for Spark history server)
- Docker version
Prerequisites for enabling (TLS) HTTPS for Pulse Web UI Configuration using ad-proxy:
- Certificate File: cert.crt
- Certificate Key: cert.key
- CA Certificate: ca.crt (optional)
- Decide whether to keep the HTTP port (Default: 4000) open or not
- Decide on which port to use (default: 443)
- Obtain the fully qualified domain names (FQDN) for the Spark Master URLs for both clusters and include them in the spark_<clustername>.hosts file. The Spark hosts file should be structured as follows:
    SparkMasterURLList:
      - <http/s>://<Alias/FQDN of the Spark Master 1>:<Spark Master UI Port>
      - <http/s>://<Alias/FQDN of the Spark Master 2>:<Spark Master UI Port>
    SparkWorkerURLList:
      - <http/s>://<Alias/FQDN of the Spark Worker 1>:<Spark Worker Port>
      - <http/s>://<Alias/FQDN of the Spark Worker 2>:<Spark Worker Port>
- Retrieve the fully qualified domain names (FQDN) for the Spark History Server URLs for both clusters. When requested, provide the URL in the following format:
    <http/s>://<Alias/FQDN of the Spark History Server>:<Spark History Server Port>
- Obtain the fully qualified domain names (FQDN) for the Zookeeper Server URLs for both clusters and place them in the zk_<clustername>.hosts file. The Zookeeper hosts file should adhere to the following format:
    <http/s>://<Alias/FQDN for the Zookeeper Server>:<Zookeeper Server Port>
- Retrieve the log locations for the application and deployment logs, as well as the SPARK_HOME directory for both clusters.
- Ensure that the Docker version is >= 20.10.x.
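The Docker version requirement can be checked before starting the installation. The following is a minimal sketch, not part of the Pulse tooling: `version_ge` is a hypothetical helper name, and the comparison assumes the version string begins with dotted numeric fields such as `20.10.7`.

```shell
# version_ge A B: succeed when dotted version A is >= dotted version B.
# Only the first three numeric fields are compared; missing fields count as 0.
# (Hypothetical helper, written for illustration only.)
version_ge() {
    a=$1
    b=$2
    for i in 1 2 3; do
        # Append dots so that short versions such as "20.10" yield empty fields,
        # then strip any non-numeric suffix such as "-ce".
        x=$(printf '%s..\n' "$a" | cut -d. -f"$i" | sed 's/[^0-9].*$//')
        y=$(printf '%s..\n' "$b" | cut -d. -f"$i" | sed 's/[^0-9].*$//')
        x=${x:-0}
        y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
    done
    return 0
}

# Example usage on a host where Docker is installed and running:
#   v=$(docker version --format '{{.Server.Version}}')
#   version_ge "$v" 20.10.0 || echo "Docker $v is older than 20.10" >&2
```

The helper avoids `sort -V` so that it does not depend on GNU coreutils being present on the host.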
Uninstallation
To uninstall the existing agents and the Pulse application, perform the following:
- To uninstall agents, you must run the hystaller uninstall command through the existing Ansible setup.
- You must remove the Pulse Spark Hook Jars from their locations, along with the related configurations, on the Spark master and worker nodes.
- The Acceldata team must then back up and uninstall the existing Pulse application using the commands below.
  - Create a backup directory:
        mkdir -p /data01/backup
  - As a backup, copy the entire config and work directories:
        cp -R $AcceloHome/config /data01/backup/
        cp -R $AcceloHome/work /data01/backup/
  - Uninstall the existing Pulse setup by running the following command:
        accelo uninstall local
OUTPUT
    [root@nifihost1:data01 (ad-default)]$ accelo uninstall local
    You're about to uninstall the local AccelData setup. This will also DELETE all persistent data from the current node. However, NONE of the remote nodes will be affected. Please confirm your action [y/n]: : y
    WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
    Uninstalling the AccelData components from local machine ...
Executing this action will remove all files, folders, docker containers, docker images, and the entire Acceldata directory.
- Log out of the terminal session.
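Because the uninstall deletes all persistent data on the node, it may be worth confirming that the backup copy is complete before running it. The sketch below wraps the copy in a hypothetical `backup_dir` helper that compares source and copy with `diff -r`; the helper name is an assumption for illustration, while the paths in the usage comment are the ones used in this section.

```shell
# backup_dir SRC DEST: copy directory SRC into DEST and confirm that the
# copy is identical to the source. (Hypothetical helper for illustration.)
backup_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    cp -R "$src" "$dest"/ || return 1
    # diff -r exits non-zero if any file differs or is missing in the copy.
    diff -r "$src" "$dest/$(basename "$src")" >/dev/null
}

# Example usage, uninstalling only when both backups verify:
#   backup_dir "$AcceloHome/config" /data01/backup &&
#   backup_dir "$AcceloHome/work" /data01/backup &&
#   accelo uninstall local
```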
Download and Load Binaries and Docker Images
To download and load binaries and Docker images, perform the following:
When downloading the Pulse all-in-one TAR file, extract the hystaller binary directly from the package.
- Download the jars, hystaller, accelo binaries, and docker images from the download links provided by the Acceldata team.
- Move the Docker images and jars into the following directory:
    mkdir -p /data01/images
- Copy the binaries and tar files into the /data01/images folder.
    cp </path/to/binaries/tar> /data01/images
- Change the directory
    cd /data01/images
- Extract the single tar file
    tar xvf <name_of_tar_file>.tar
OUTPUT
    [root@nifihost1 images]# tar xvf pulse-333-beta.tar
    ./ad-alerts.tgz
    ./ad-connectors.tgz
    ./ad-dashplots.tgz
    ./ad-database.tgz
    ./ad-deployer.tgz
    ./ad-director.tgz
    ./ad-elastic.tgz
    ./ad-events.tgz
    ./ad-fsanalyticsv2-connector.tgz
    ./ad-gauntlet.tgz
    ./ad-graphql.tgz
    ./ad-hydra.tgz
    ./ad-impala-connector.tgz
    ./ad-kafka-0-10-2-connector.tgz
    ./ad-kafka-connector.tgz
    ./ad-ldap.tgz
    ./ad-logsearch-curator.tgz
    ./ad-logstash.tgz
    ./ad-notifications.tgz
    ./ad-oozie-connector.tgz
    ./ad-pg.tgz
    ./ad-proxy.tgz
    ./ad-pulsemon-ui.tgz
    ./ad-recom.tgz
    ./ad-sparkstats.tgz
    ./ad-sql-analyser.tgz
    ./ad-streaming.tgz
    ./ad-vminsert.tgz
    ./ad-vmselect.tgz
    ./ad-vmstorage.tgz
    ./accelo.linux
    ./admon
    ./hystaller
- To load the Docker images, execute the following command:
    ls -1 *.tgz | xargs --no-run-if-empty -L 1 docker load -i
- Check if all the images are loaded to the server using the following command:
    docker images | grep 4.x.x
Replace 4.x.x with the Pulse version you want to install.
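With around thirty archives, a missed image is easy to overlook in the `grep` output. The following sketch compares the archive names against a saved `docker images` listing. It assumes that each image repository name ends with the archive's base name (for example, an archive `ad-alerts.tgz` producing a repository ending in `/ad-alerts`); that mapping is an assumption, and `missing_images` is a hypothetical helper name.

```shell
# missing_images LISTING TAG: read archive names (one per line) on stdin and
# print the base name of every archive that has no matching image with the
# given tag in LISTING, a file holding the output of `docker images`.
# Assumption: the repository name ends with the archive base name.
missing_images() {
    listing=$1
    tag=$2
    while IFS= read -r archive; do
        name=$(basename "$archive" .tgz)
        if ! awk -v n="$name" -v t="$tag" '
            { repo = $1; sub(/^.*\//, "", repo) }
            repo == n && $2 == t { found = 1 }
            END { exit (found ? 0 : 1) }' "$listing"; then
            echo "$name"
        fi
    done
}

# Example usage from /data01/images (replace 4.x.x with your Pulse version):
#   docker images > /tmp/images.txt
#   ls -1 *.tgz | missing_images /tmp/images.txt 4.x.x
```

An empty result means every archive has a corresponding image with the expected tag.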
Configure the Cluster
To configure the cluster in Pulse, perform the following:
- Validate all the host files.
- Create the acceldata directory by running the following command:
    cd /data01/
    mkdir -p acceldata
- Place the accelo binary in the /data01/acceldata directory:
    cp </path/to/accelo/binary> /data01/acceldata
- Rename the accelo.linux binary to accelo and make it executable.
    mv /data01/acceldata/accelo.linux /data01/acceldata/accelo
    chmod +x /data01/acceldata/accelo
- Change the directory:
    cd /data01/acceldata
- Run the following command to perform accelo init:
    ./accelo init
- Enter appropriate answers when prompted.
- When the Spark master is available, you can add the following parameter in the /etc/profile.d/ad.sh file to sync the Spark worker list from the Spark master URL.
    SYNC_SPARK_MASTER=true
- Run the following command to source the ad.sh file:
    source /etc/profile.d/ad.sh
- Run the init command to provide the Pulse version:
    accelo init
OUTPUT
    [root@nifihost1:~ (ad-default)]$ accelo init
    Enter the AccelData ImageTag: : 4.x.x
    ✓ Done, AccelData Init Successful.
Replace 4.x.x with the Pulse version you want to install.
- Run the accelo info command as follows:
    accelo info
OUTPUT
    [root@nifihost1:~ (ad-default)]$ accelo info
    WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
    [ACCELDATA ASCII-art banner]
    Accelo CLI Version: 4.x.x
    Accelo CLI Build Hash: 8ba4727f11e5b3f3902547585a37611b6ec74e7c
    Accelo CLI Build ID: 1700746329
    Accelo CLI Builder ID: ZEdjMmxrYUdGdWRGOWhZMk5sYkdSaEVLCg==
    Accelo CLI Git Branch Hash: TXdLaTlCVDFBdE56STNvPQo=
    AcceloHome: /data01/acceldata
    AcceloStack: ad-default
    AccelData Registry: 191579300362.dkr.ecr.us-east-1.amazonaws.com/acceldata
    AccelData ImageTag: 4.x.x
    Active Cluster Name: NotFound
    AcceloConfig Mongo DB Retention days: 15
    AcceloConfig Mongo DB HDFS Reports Retention days: 15
    AccelConfig TSDB Retention days: 31d
    Number of AccelData stacks found in this node: 0
Replace 4.x.x with the Pulse version you want to install.
- To configure the cluster in Pulse, run the config cluster command:
    accelo config cluster
- Provide the correct information when prompted. The output must appear as follows:
- Run the config cluster command for all the clusters and provide the appropriate answers when prompted.
- Run the config cluster command for Nifi Stand-Alone and select standalone > nifi.
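The host-file validation step at the start of this procedure can be partly automated. The sketch below flags entries that do not follow the `<http/s>://<Alias/FQDN>:<Port>` pattern given in the prerequisites. `invalid_host_urls` is a hypothetical helper, and the check assumes one URL per line, optionally prefixed by a `-` list marker, with list keys such as `SparkMasterURLList:` on their own lines.

```shell
# invalid_host_urls FILE: print "<line number>: <entry>" for every entry in
# FILE that is not of the form http(s)://<host>:<port>.
# (Hypothetical helper; the accepted host characters are an assumption.)
invalid_host_urls() {
    awk '
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding whitespace
            sub(/^-[ \t]*/, "", line)            # drop a leading list marker
            if (line == "" || line ~ /^[A-Za-z]+:$/) next   # blank line or list key
            if (line !~ /^https?:\/\/[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?:[0-9]+$/)
                print NR ": " line
        }' "$1"
}

# Example usage:
#   invalid_host_urls spark_<clustername>.hosts
#   invalid_host_urls zk_<clustername>.hosts
```

No output means every entry matched the expected pattern. The check is purely syntactic and does not confirm that a host is reachable.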
Copy the License
Place the license file provided by the Acceldata team in the work directory as shown below:
Deploy Pulse Core Components
Deploy the Pulse core components by running the following command:
The output must appear as follows:
Deploy Add-ons
To deploy the Pulse add-ons, run the code below and select the required components for Spark standalone:
The output must appear as follows:
Configure Alerts Notifications
To configure alerts notifications, perform the following:
- Set the active cluster by running the following command:
- Configure the alerts notifications using the following command:
OUTPUT
- Set cluster2 as the active cluster:
- Configure the alerts for second cluster:
- Set cluster3 as the active cluster:
- Configure the alerts for the third cluster:
- Restart the alerts notifications:
OUTPUT
Database Push Configuration
Run the following command to push config to db:
Configure the Override
- Change the directory to work/<clustername>.
- Modify the override.yml file.
- Paste the below config in the file.
Repeat the above steps for all clusters.
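When several clusters are configured, it is easy to miss one. The sketch below lists cluster directories that have no override.yml yet. It assumes every subdirectory of the work directory corresponds to a cluster, which may not hold if other directories are kept there; `clusters_missing_override` is a hypothetical helper name.

```shell
# clusters_missing_override WORK_DIR: print the name of every subdirectory
# of WORK_DIR that does not contain an override.yml file.
# Assumption: each subdirectory of WORK_DIR is a cluster directory.
clusters_missing_override() {
    work_dir=$1
    for d in "$work_dir"/*/; do
        [ -d "$d" ] || continue
        [ -f "${d}override.yml" ] || basename "$d"
    done
    return 0
}

# Example usage:
#   clusters_missing_override "$AcceloHome/work"
```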
Deploy the Pulse Agents
Install the new Pulse version x.x.x agents on all cluster nodes. Make a copy of the new hystaller file to /tmp or any executable location on all cluster nodes and then run the following command on all cluster nodes.
Change the following code snippet as per your environment
Reconfig Cluster
- After completing the edits to the override files as outlined above, the next step is to run the following command:
OUTPUT
- DB Push Config
Adding Edge Nodes for Monitoring
These are edge nodes that are not part of the Spark Standalone cluster.
- Change the directory to work/<clustername>.
- Modify the hydra_hosts_override.yml file.
- Add the following code to add a host to the existing hosts for Pulse to monitor:
- Run the accelo reconfig cluster command for clusters with edge nodes that require monitoring by Pulse. Alternatively, for comprehensive coverage, perform a reconfig cluster on all clusters.
- Check the hydra_hosts.yml file, which will now contain the new hosts as well. For example:
Configure Gauntlet
Updating the Gauntlet Crontab Duration
- Check if the ad-core.yml file is present or not by running the following command:
- If the file above is not present, then generate it by:
- Edit the ad-core.yml file.
a. Open the file:
b. Update the CRON_TAB_DURATION env variable in the ad-gauntlet section:
This makes gauntlet run every 2 days at midnight.
c. The updated file will look something like this:
d. Save the file.
- Restart gauntlet service by running the command:
Updating the Gauntlet Dry Run Mode
- Check if the ad-core.yml file is present or not by running the following command:
- If the file above is not present, then generate it by:
- Edit the ad-core.yml file.
a. Open the file.
b. Update the DRY_RUN_ENABLE env variable in the ad-gauntlet section:
This makes Gauntlet delete the older Elastic indices and MongoDB data.
c. The updated file will look something like this:
d. Save the file.
- Restart gauntlet service by running the command:
Updating MongoDB Cleanup and Compaction Frequency in Hours
By default, when dry run is disabled, MongoDB cleanup and compaction run once a day. To configure the frequency, follow the steps listed below.
- Run the following command:
- Answer the following prompts. If you're unsure how many days of data you wish to retain, proceed with the default values.
- When the following prompt comes up, specify the hours of the day during which you would like MongoDB cleanup and compaction to run. The value must be a CSV of hours as per the 24 hour time notation.
- Run the following command. When gauntlet runs the next time, MongoDB clean up and compaction will run at the specified hours, once per hour.
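The hours value can be sanity-checked before it is entered at the prompt. This sketch assumes that "CSV of hours" means whole hours from 0 to 23 separated by commas with no spaces, for example `0,6,12,18`; `valid_hours_csv` is a hypothetical helper and not part of the accelo CLI.

```shell
# valid_hours_csv VALUE: succeed when VALUE is a comma-separated list of
# whole hours in 24-hour notation (0 to 23), with no empty fields.
# (Hypothetical helper; the accepted format is an assumption.)
valid_hours_csv() {
    case $1 in
        '' | *[!0-9,]* | ,* | *, | *,,*) return 1 ;;
    esac
    for h in $(printf '%s' "$1" | tr ',' ' '); do
        # Reject values longer than two digits, then range-check the hour.
        [ "${#h}" -le 2 ] && [ "$h" -le 23 ] || return 1
    done
    return 0
}

# Example usage:
#   valid_hours_csv "0,6,12,18" && echo "ok"
```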
Enabling (TLS) HTTPS for Pulse Web UI Configuration Using ad-proxy
Deployment and Configuration
- Copy the cert.crt, cert.key and ca.crt (optional) files to the $AcceloHome/config/proxy/certs location.
- Check if the ad-core.yml file is present or not.
- If the ad-core.yml file is not present, then generate the ad-core.yml file.
OUTPUT
- Modify the ad-core.yml file.
a. Open the ad-core.yml file.
b. Remove the ports: field in the ad-graphql section of ad-core.yml.
c. The resulting ad-graphql section will look like this:
d. Save the file.
- Restart the ad-graphql container:
- Check that the port is no longer exposed to the host.
- Check if there are any errors in the ad-graphql container:
- To deploy the ad-proxy add-on, run the following command, select Proxy from the list, and press Enter.
OUTPUT
- Check if there are any errors in the ad-proxy container:
- Now you can access the Pulse UI using https://<pulse-server-hostname>. By default, the port used is 443.
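If the ad-proxy container reports errors, a first thing to rule out is a missing certificate file. The sketch below checks the file names listed in the prerequisites (cert.crt, cert.key, and the optional ca.crt) in a given directory; `check_proxy_certs` is a hypothetical helper, and it only checks that the files exist and are non-empty, not that the certificate and key match.

```shell
# check_proxy_certs DIR: fail when cert.crt or cert.key is missing or empty
# in DIR; ca.crt is optional and only produces a note.
# (Hypothetical helper for illustration.)
check_proxy_certs() {
    dir=$1
    status=0
    for f in cert.crt cert.key; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    [ -f "$dir/ca.crt" ] || echo "note: optional ca.crt not present in $dir" >&2
    return "$status"
}

# Example usage:
#   check_proxy_certs "$AcceloHome/config/proxy/certs"
```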
Configuration
If you want to change the SSL port to another port, follow the below steps:
- Check if the ad-proxy.yml file is present or not.
- Generate the ad-proxy.yml file if it's not present.
OUTPUT
- Modify the ad-proxy.yml file.
a. Open the ad-proxy.yml file.
b. Change the host port in the ports list to the desired port.
The final file will look like this, if the host port is 6003:
c. Save the file.
- Restart the ad-proxy container.
- Check that there are no errors.
- Now you can access the Pulse UI using https://<pulse-server-hostname>:6003.
Set Up LDAP for Pulse UI
- Check if the ldap.conf file is present or not.
- Run the accelo config ldap command to generate the default ldap.conf if not present already.
OUTPUT
- Edit the file in path $AcceloHome/config/ldap/ldap.conf.
- Configure the file for the below properties:
  - LDAP FQDN: FQDN where the LDAP server is running
    - host = [FQDN]
  - If port 389 is being used, then
    - insecureNoSSL = true
  - SSL root CA Certificate
    - rootCA = [CERTIFICATE_FILE_PATH]
  - bindDN: the account to be used for ldapsearch; it needs to be a member of the admin group
  - bindPW: <encrypted-password-string> for entering in the database
  - encryptedPassword = true; set this to true to enable the use of an encrypted password
  - baseDN used for user search
    - Eg: (cn=users, cn=accounts, dc=acceldata, dc=io)
  - Filter used for the user search
    - Eg: (objectClass=person)
  - baseDN used for group search
    - Eg: (cn= groups, cn=accounts, dc=acceldata, dc=io)
  - Group Search: Object class used for group search
    - Eg: (objectClass= posixgroup)
Here is the command to check if a user has search entry access and group access in LDAP directory:
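A minimal sketch of such a check, assuming the OpenLDAP `ldapsearch` client is installed and the LDAP server listens on the plain port 389. The `ldap_check_cmd` helper is hypothetical: it only prints the command line so it can be reviewed before being run, and the host and bind DN in the usage comment are placeholder values. The flags used are standard `ldapsearch` options: `-x` for simple authentication, `-H` for the server URI, `-D` for the bind DN, `-W` to prompt for the password, and `-b` for the search base.

```shell
# ldap_check_cmd HOST BIND_DN BASE_DN FILTER: print an ldapsearch command
# line that binds as BIND_DN and searches BASE_DN with FILTER.
# (Hypothetical helper; it prints the command rather than running it.)
ldap_check_cmd() {
    host=$1
    bind_dn=$2
    base_dn=$3
    filter=$4
    printf "ldapsearch -x -H 'ldap://%s:389' -D '%s' -W -b '%s' '%s'\n" \
        "$host" "$bind_dn" "$base_dn" "$filter"
}

# Example usage with placeholder values, one call for the user search and
# one for the group search:
#   ldap_check_cmd ldap.example.com 'uid=admin,cn=users,cn=accounts,dc=acceldata,dc=io' \
#       'cn=users,cn=accounts,dc=acceldata,dc=io' '(objectClass=person)'
#   ldap_check_cmd ldap.example.com 'uid=admin,cn=users,cn=accounts,dc=acceldata,dc=io' \
#       'cn=groups,cn=accounts,dc=acceldata,dc=io' '(objectClass=posixgroup)'
```

If the printed command returns entries for both searches when run, the bind account can read user and group entries.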
- If the file is already generated, the command asks for the LDAP credentials to validate the connectivity and configurations, as described in the steps below.
- Run the accelo config ldap command.
- It will ask for the LDAP user credentials:
- If everything is configured correctly, the below confirmation message will be displayed:
- Type 'y' and press 'Enter'.
OUTPUT
- Push the ldap config.
- Run the deploy addon command.
- Select the LDAP from the list shown and press 'Enter':
OUTPUT
- Run the restart command.
- Open Pulse Web UI and create default roles.
- Add an ops role with the required access. All incoming users who log in through LDAP will come under this role automatically.
Spark Jars Placements and Spark Config Changes
Perform the following steps for all the Spark Cluster Nodes:
- Add the following configuration in the metrics.properties file for Spark TimeSeries data:
- Add the following configuration in the spark-defaults.conf file for the Events data:
- Take the ad-spark-hook.jar file and put it in the following directory:
- Restart all the Spark services.
DotLog Download
We have introduced a feature that allows downloading of service logs in .log format. This file is not the original server log but an xlsx sheet merged into a .log format.
Perform the following to add a configurable parameter to enable this feature:
- Insert the dotLogFileDownload parameter into the feature flags property of the ad-graphql section found at the file path: $Acceldata_Home/config/docker/ad-core.yml.
- Restart the ad-graphql service using the following command:
New Search Bar
Perform the following to enable new search options:
- Locate the "ad-graphql" section in the file $Acceldata_Home/config/docker/ad-core.yml and under the "environment" key, add the following line:
- Restart the ad-graphql service using the following command:
Does a user in the Spark Standalone environment still see the Spark option in the left menu even after their access has been revoked from the role?
Create a different role that does not have Spark permission and assign that role to the user. Alternatively, you can leave it as is because even if the Spark entry is visible in the left navigation, the user will not be able to access it if access has been revoked from their role.
Are non-admin users in the Spark Standalone environment able to access Spark even though they have the appropriate role permissions for accessing Spark?
In the role edit window, click on "Select All" just below the Page permissions. Then, remove any permissions that you do not wish to grant and save the role. Any user assigned to this role should now have access to Spark in the Spark Standalone environment.
What is the reason for the absence of the Oozie workflow link between the Oozie workflow and the application ID in PULSE for a Spark job?
The Spark job's Application ID is generated by the Oozie service. It will only appear in the Pulse UI if it is available in Oozie's Web Service UI. If it is not present in the Oozie's Web Service UI, it will not be displayed in Pulse.
For additional help, contact www.acceldata.force.com OR call our service desk +1 844 9433282
Copyright © 2026