Spark Standalone Multi-cluster
This document describes, step by step, how to install a single Pulse instance for multiple Spark Standalone clusters.
Prerequisites
Ensure the following are present:
- Spark hosts files: refer to steps 1 and 2 below the note.
- Zookeeper hosts files: refer to step 3 below the note.
- Log locations
- Spark history server locations
- Certificates (if any for Spark history server)
- Docker version
Prerequisites for enabling (TLS) HTTPS for the Pulse Web UI using ad-proxy:
- Certificate File: cert.crt
- Certificate Key: cert.key
- CA Certificate: ca.crt (optional)
- Decide whether to keep the HTTP port (default: 4000) open.
- Decide which port to use for HTTPS (default: 443).
- Obtain the fully qualified domain names (FQDN) for the Spark Master URLs of both clusters and include them in the
spark_<clustername>.hosts
file. The Spark hosts file must be structured as follows:
<http/s>://<Alias/FQDN of Spark Master 1>:<Spark Master 1 UI Port>
<http/s>://<Alias/FQDN of Spark Master 2>:<Spark Master 2 UI Port>
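For example, a hypothetical spark_spark341.hosts file (hostnames are placeholders; 8080 is the default Spark Master UI port) could contain:
http://sparkmaster1.example.com:8080
http://sparkmaster2.example.com:8080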
- Retrieve the fully qualified domain names (FQDN) for the Spark History Server URLs for both clusters. When requested, provide the URL in the following format:
<http/s>://<Alias/FQDN of the Spark History Server>:<Spark History Server Port>
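For instance, a hypothetical value would be https://sparkhs1.example.com:18480, with 18480 being the history server port used in this document's examples.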
- Obtain the fully qualified domain names (FQDN) for the Zookeeper Server URLs for both clusters and place them in the
zk_<clustername>.hosts
file. The Zookeeper Hosts file should adhere to the following format:
<http/s>://<Alias/FQDN for the Zookeeper Server>:<Zookeeper Server Port>
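For example, a hypothetical zk_spark341.hosts file (hostnames are placeholders; 2181 is the default ZooKeeper client port) could contain:
http://zk1.example.com:2181
http://zk2.example.com:2181
http://zk3.example.com:2181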
- Retrieve the log locations for the application and deployment logs, as well as the
SPARK_HOME
directory, for both clusters.
- Ensure that the Docker version is >= 20.10.x.
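To confirm the Docker version on the Pulse host, you can run:
docker --version
The reported version must be 20.10.x or later.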
Uninstallation
To uninstall agents, perform the following:
- Run the
hystaller uninstall
command through your Ansible setup.
- Remove the Pulse Spark hook JARs, along with the related configurations, from the Spark master and worker nodes.
- The Acceldata team must then back up and uninstall the existing Pulse application using the commands below.
- Create a backup directory:
mkdir -p /data01/backup
- As a backup, copy the entire
config
and
work
directories:
cp -R $AcceloHome/config /data01/backup/
cp -R $AcceloHome/work /data01/backup/
- Uninstall the existing Pulse setup by running the following command:
accelo uninstall local
OUTPUT
[root@nifihost1:data01 (ad-default)]$ accelo uninstall local
You're about to uninstall the local AccelData setup. This will also DELETE all persistent data from the current node. However, NONE of the remote nodes will be affected. Please confirm your action [y/n]: : y
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
Uninstalling the AccelData components from local machine ...
Executing this action will remove all files, folders, docker containers, docker images, and the entire Acceldata directory.
- Log out of the terminal session.
Download and Load Binaries and Docker Images
To download and load the binaries and Docker images, perform the following:
For Pulse version 3.3.3, in addition to the Pulse all-in-one TAR file, you must download the hystaller binary separately:
- Download all the 3.3.3 binaries.
- Replace the hystaller binary with the one from the direct download link provided by the Acceldata team.
- Download the JARs, the hystaller and accelo binaries, and the Docker images from the download links provided by the Acceldata team.
- Create the directory that will hold the Docker images and JARs:
mkdir -p /data01/images
- Copy the binaries and TAR files into the
/data01/images
folder.
cp </path/to/binaries/tar> /data01/images
- Change to the directory:
cd /data01/images
- Extract the TAR file:
tar xvf <name_of_tar_file>.tar
OUTPUT
[root@nifihost1 images]# tar xvf pulse-333-beta.tar
./ad-alerts.tgz
./ad-connectors.tgz
./ad-dashplots.tgz
./ad-database.tgz
./ad-deployer.tgz
./ad-director.tgz
./ad-elastic.tgz
./ad-events.tgz
./ad-fsanalyticsv2-connector.tgz
./ad-gauntlet.tgz
./ad-graphql.tgz
./ad-hydra.tgz
./ad-impala-connector.tgz
./ad-kafka-0-10-2-connector.tgz
./ad-kafka-connector.tgz
./ad-ldap.tgz
./ad-logsearch-curator.tgz
./ad-logstash.tgz
./ad-notifications.tgz
./ad-oozie-connector.tgz
./ad-pg.tgz
./ad-proxy.tgz
./ad-pulsemon-ui.tgz
./ad-recom.tgz
./ad-sparkstats.tgz
./ad-sql-analyser.tgz
./ad-streaming.tgz
./ad-vminsert.tgz
./ad-vmselect.tgz
./ad-vmstorage.tgz
./accelo.linux
./admon
./hystaller
- To load the Docker images, execute the following command:
ls -1 *.tgz | xargs --no-run-if-empty -L 1 docker load -i
- Verify that all the images are loaded on the server by using the following command:
docker images | grep 3.3.3
Configure the Cluster
To configure the cluster in Pulse, perform the following:
- Validate all the host files.
- Create the
acceldata
directory by running the following command:
cd /data01/
mkdir -p acceldata
- Copy the Spark hosts and Zookeeper hosts files into the
acceldata
directory by running the following command:
cp </path/to/hosts_files> /data01/acceldata
- Place the
accelo
binary in the
/data01/acceldata
directory:
cp </path/to/accelo/binary> /data01/acceldata
- Rename the
accelo.linux
binary to
accelo
and make it executable:
mv /data01/acceldata/accelo.linux /data01/acceldata/accelo
chmod +x /data01/acceldata/accelo
- Change to the directory:
cd /data01/acceldata
- Run the following command to perform
accelo init:
./accelo init
- Enter appropriate answers when prompted.
- Run the following command to source the
ad.sh
file:
source /etc/profile.d/ad.sh
- Run the
init
command to provide the Pulse version:
accelo init
OUTPUT
[root@nifihost1:~ (ad-default)]$ accelo init
Enter the AccelData ImageTag: : 3.3.3
✓ Done, AccelData Init Successful.
Provide the correct Pulse version; in this case, it is 3.3.3.
- Run
accelo info
command as follows:
accelo info
OUTPUT
[root@nifihost1:~ (ad-default)]$ accelo info
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
___ ____________________ ____ ___ _________
/ | / ____/ ____/ ____/ / / __ \/ |/_ __/ |
/ /| |/ / / / / __/ / / / / / / /| | / / / /| |
/ ___ / /___/ /___/ /___/ /___/ /_/ / ___ |/ / / ___ |
/_/ |_\____/\____/_____/_____/_____/_/ |_/_/ /_/ |_|
Accelo CLI Version: 3.3.3-beta
Accelo CLI Build Hash: 8ba4727f11e5b3f3902547585a37611b6ec74e7c
Accelo CLI Build ID: 1700746329
Accelo CLI Builder ID: ZEdjMmxrYUdGdWRGOWhZMk5sYkdSaEVLCg==
Accelo CLI Git Branch Hash: TXdLaTlCVDFBdE56STNvPQo=
AcceloHome: /data01/acceldata
AcceloStack: ad-default
AccelData Registry: 191579300362.dkr.ecr.us-east-1.amazonaws.com/acceldata
AccelData ImageTag: 3.3.3-beta
Active Cluster Name: NotFound
AcceloConfig Mongo DB Retention days: 15
AcceloConfig Mongo DB HDFS Reports Retention days: 15
AccelConfig TSDB Retention days: 31d
Number of AccelData stacks found in this node: 0
- To configure the cluster in Pulse, run the
config cluster
command:
accelo config cluster
- Provide the correct information when prompted. The output must appear as follows:
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
Is the 'Database Service' up and running? [y/n]: : n
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Spark
Enter Your Cluster's Display Name: : spark341
Enter the cluster name to use (MUST be all lowercase & unique): : spark341
ERROR: stat /data01/acceldata/.activecluster: no such file or directory
INFO: Creating Post dirs.
Enter the hosts file path for Spark-On-StandAlone (MUST formatted, one IP/host per line): : spark_spark341.hosts
The hostname for the spark worker node is : : kafka2.ops.iti.acceldata.dev
The hostname for the spark worker node is : : kafka1.ops.iti.acceldata.dev
✔ The hostname for the spark worker node is : : nifihost2.ops.iti.acceldata.dev█
Is Zookeeper installed in the cluster: [Y/N]: Y
Enter the hosts file path for Zookeeper Hosts (MUST formatted, one IP/host per line): : spark341_zookeeper.hosts
Enter the Spark History URL (with http/https): : https://10.90.6.169:18480
✔ The hostname for the spark history server is : : nifihost2.ops.iti.acceldata.dev█
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : spark341
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
WARN: Cannot find the pulse.yaml file, getting the values from acceldata.conf file
Creating hydra inventory
✔ SSH Key Algorithm used (RSA/DSA)?: : RSA█
Which user should connect over SSH: : root
SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Enter the JMX Port for zookeeper_server: : 8989
✔ Would you like to enable NTP Stats? [y/n]: : y█
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: spark_application, zookeeper, syslog, kern, spark_jobhistoryserver, spark_master, spark_worker
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
- Run the
config cluster
command for all the clusters and provide the appropriate answers when prompted.
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
✔ Is the 'Database Service' up and running? [y/n]: : n█
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Spark
Enter Your Cluster's Display Name: : spark330
Enter the cluster name to use (MUST be all lowercase & unique): : spark330
Enter the hosts file path for Spark-On-StandAlone (MUST formatted, one IP/host per line): : spark_330.hosts
The hostname for the spark worker node is : : sac03.acceldata.dvl
The hostname for the spark worker node is : : sac02.acceldata.dvl
Is Zookeeper installed in the cluster: [Y/N]: Y
Enter the hosts file path for Zookeeper Hosts (MUST formatted, one IP/host per line): : zookeeper.hosts
✔ Enter the Spark History URL (with http/https): : http://sac01.acceldata.dvl:18080█
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : spark330
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
Creating hydra inventory
SSH Key Algorithm used (RSA/DSA)?: : RSA
✔ Which user should connect over SSH: : root█
SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Enter the JMX Port for zookeeper_server: : 8989
✔ Would you like to enable NTP Stats? [y/n]: : y█
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: syslog, kern, spark_jobhistoryserver, spark_master, spark_worker, spark_application, zookeeper
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
- Run the
config cluster
command for NiFi Stand-Alone and select Stand-Alone > Nifi.
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
Is the 'Database Service' up and running? [y/n]: : n
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Nifi
Enter Your Cluster's Display Name: : nifisa
✔ Enter the cluster name to use (MUST be all lowercase & unique): : nifisa█
INFO: Creating Post dirs.
INFO: Getting the Nifi Host List
Enter the hosts file path for Nifi (One Nifi URL per line, Must be formatted): : nifi.hosts
Discovered NIFI Hosts:
✓ nifihost1.ops.iti.acceldata.dev
Would you like to continue with the above NIFI nodes? [y/n]: : Y
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : nifisa
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
Creating hydra inventory
✔ SSH Key Algorithm used (RSA/DSA)?: : RSA█
Which user should connect over SSH: : root
✔ SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa█
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Would you like to enable NTP Stats? [y/n]: : y
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: syslog, nifi, kern
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
Copy the License
Place the license file provided by the Acceldata team in the work directory as shown below:
cp </path/to/license> /data01/acceldata/work
Deploy Pulse Core Components
Deploy the Pulse core components by running the following command:
accelo deploy core
The output must appear as follows:
[root@nifihost1:acceldata (ad-default)]$ accelo deploy core
ERROR: Cannot connect to DB, Because: cannot connect to mongodb
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
Have you verified the acceldata config file at '/data01/acceldata/config/acceldata_spark341.conf' ? [y/n]: : y
✓ accelo.yml file found and parsed
✓ AcceloEvents - events.json file found and parsed
✓ acceldata conf file found and parsed
✓ .dist file found and parsed
✓ hydra_hosts.yml file found and parsed
✓ vars.yml file found and parsed
✓ alerts notification.yml file found and parsed
✓ actions notification.yml file found and parsed
✓ alerts default-endpoints.yml file found and parsed
✓ override.yml file found and parsed
✓ gauntlet_mongo_spark341.yml file found and parsed
✓ gauntlet_elastic.yml file found and parsed
INFO: No existing AccelData networks found. Current stack 'ad-default' is missing.
INFO: Trying to create a new network ..
INFO: If you're setting up AccelData for the first time give 'y' to the below.
Would you like to initiate DB with the config file '/data01/acceldata/config/acceldata'? [y/n]: : y
Creating group monitors [================================================================================================>-------------------] 83.33%INFO: Pushing the hydra_hosts.yml to mongodb
Deployment Completed [==============================================================================================>--------------------] 81.82% 28s
✓ Done, Core services deployment completed.
Now, you can access the AccelData APM Server at the configured port of this node.
To deploy the AccelData addons, Run './accelo deploy addons'
Deploy Add-ons
To deploy the Pulse add-ons, run the code below and select the required components for Spark standalone:
accelo deploy addons
The output must appear as follows:
[root@nifihost1:acceldata (ad-default)]$ accelo deploy addons
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
INFO: Active Cluster: spark341
? Select the components you would like to install: Alerts (Agents MUST be configured), Core Connectors, Dashplot, Director (Agents MUST be configured), HYDRA, LogSearch, Notifications
Starting the deployment ..
Completed [==============================================================================================================================] 137.50% 29s
✓ Done, Addons deployment completed.
Configure Alerts Notifications
To configure alerts notifications, perform the following:
- Set the active cluster by running the following command:
accelo set
- Configure the alerts notifications using the following command:
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultToEmailIds (comma separated list): :
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
- Set cluster2 as the active cluster:
accelo set
- Configure the alerts for second cluster:
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultToEmailIds (comma separated list): :
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
- Set cluster3 as the active cluster:
accelo set
- Configure the alerts for the third cluster:
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultToEmailIds (comma separated list): :
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
- Restart the alerts notifications:
accelo restart ad-alerts
OUTPUT
[root@nifihost1:spark341 (ad-default)]$ accelo restart ad-alerts
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
You're about to restart AccelData services. This will restart all or any specified the service. However, any persistent data will be left untouched. Please confirm your action [y/n]: : y
Completed [===============================================================================================================================] 100.00% 1s
Restart ad-alerts completed ✓
Database Push Configuration
Run the following command to push the configuration to the database:
accelo admin database push-config -a
Configure the Override
- Change to the
work/<clustername>
directory:
cd /data01/acceldata/work/<clustername>
- Modify the
override.yml
file.
vi override.yml
- Paste the following configuration into the file, replacing <SPARK_HOME> with the actual Spark installation path (see the example after the block):
log_locations:
kern:
- path: /var/log/kern.log
type: DATESTAMP
spark_application:
- path: <SPARK_HOME>/work/*/*/stdout,<SPARK_HOME>/work/*/*/stderr
type: SPARK_APPLICATION
spark_jobhistoryserver:
- path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.history.HistoryServer-*-*.out
type: YARN_APP
spark_master:
- path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.master.Master-*-*.out
type: YARN_APP
spark_worker:
- path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.worker.Worker-*-*.out
type: YARN_APP
syslog:
- path: /var/log/syslog,/var/log/messages
type: DATESTAMP
hydra:
hostname_method: ENV
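For example, if SPARK_HOME on the cluster nodes is /opt/spark (a hypothetical path), the spark_master entry would read:
spark_master:
  - path: /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*-*.out
    type: YARN_APP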
Repeat the above steps for all clusters.
Deploy the Pulse Agents
Install the new Pulse version 3.3.3 agents on all cluster nodes. Copy the new hystaller binary to /tmp (or any location it can be executed from) on all cluster nodes, and then run the following command on each node.
Change the following code snippet as per your environment:
PULSE_HOME="/opt/pulse"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
HYDRA_SERVER_URL="http://<PULSE_SERVER_HOSTNAME>:19072"
HYDRA_HEARTBEAT_DURATION="60"
HYDRA_PARCEL_MODE="False"
HYDRA_HOSTNAME_CASE="lower"
HYDRA_HOSTNAME_METHOD="ENV"
HYDRA_HEARTBEAT_JITTER="10"
PULSE_HOSTNAME="<FQDN/Alias of the server where hydra to be installed>"
sudo env "PULSE_HOME=$PULSE_HOME" "PULSE_HOSTNAME=$PULSE_HOSTNAME" "PATH=$PATH" "HYDRA_SERVER_URL=$HYDRA_SERVER_URL" "HYDRA_HEARTBEAT_DURATION=$HYDRA_HEARTBEAT_DURATION" "HYDRA_PARCEL_MODE=$HYDRA_PARCEL_MODE" "HYDRA_HOSTNAME_CASE=$HYDRA_HOSTNAME_CASE" "HYDRA_HOSTNAME_METHOD=$HYDRA_HOSTNAME_METHOD" "HYDRA_HEARTBEAT_JITTER=$HYDRA_HEARTBEAT_JITTER" /tmp/hystaller install
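To avoid running the command manually on every node, a minimal sketch like the following could distribute and run hystaller over SSH. It assumes passwordless SSH as root and a nodes.txt file (one hostname per line); both are assumptions, not part of the Pulse tooling:
#!/usr/bin/env bash
# Hypothetical helper: copy hystaller to each node and run the install
# command from this document. nodes.txt and passwordless SSH are assumed.
set -euo pipefail

while read -r node; do
  echo "Installing Pulse agent on ${node} ..."
  scp /tmp/hystaller "root@${node}:/tmp/hystaller"
  # The remote shell expands $(hostname -f), so each node reports its own FQDN.
  ssh "root@${node}" 'sudo env "PULSE_HOME=/opt/pulse" \
    "PULSE_HOSTNAME=$(hostname -f)" \
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin" \
    "HYDRA_SERVER_URL=http://<PULSE_SERVER_HOSTNAME>:19072" \
    "HYDRA_HEARTBEAT_DURATION=60" "HYDRA_PARCEL_MODE=False" \
    "HYDRA_HOSTNAME_CASE=lower" "HYDRA_HOSTNAME_METHOD=ENV" \
    "HYDRA_HEARTBEAT_JITTER=10" /tmp/hystaller install'
done < nodes.txt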
Reconfig Cluster
- After completing the edits to the override files as outlined above, run the following command:
accelo reconfig cluster -a
OUTPUT
[root@nifihost1:spark341 (ad-default)]$ accelo reconfig cluster -a
INFO: Using default API Version v10 for CM API
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
INFO: Read Cluster Info for spark341 From MongoDB
INFO: Clustername : spark341
INFO: Reconfiguring the spark341 cluster
Zookeeper Enabled
INFO: Getting the Spark Master List
✓ https://nifihost2.ops.iti.acceldata.dev:8480
INFO: Getting the Spark Worker List
✓ nifihost2.ops.iti.acceldata.dev
✓ kafka2.ops.iti.acceldata.dev
✓ kafka1.ops.iti.acceldata.dev
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
INFO: Merging the override.yml with vars struct...
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
INFO: Read Cluster Info for spark330 From MongoDB
INFO: Clustername : spark330
INFO: Reconfiguring the spark330 cluster
Zookeeper Enabled
INFO: Getting the Spark Master List
✓ http://sac01.acceldata.dvl:8080
✓ http://sac02.acceldata.dvl:8080
INFO: Getting the Spark Worker List
✓ sac03.acceldata.dvl
✓ sac02.acceldata.dvl
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
INFO: Merging the override.yml with vars struct...
WARN: /data01/acceldata/config/users/group already being generated
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
INFO: Read Cluster Info for nifisa From MongoDB
INFO: Clustername : nifisa
INFO: Reconfiguring the nifisa cluster
INFO: Getting the Nifi Host List
Discovered NIFI Hosts:
✓ nifihost1.ops.iti.acceldata.dev
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
WARN: /data01/acceldata/config/users/group already being generated
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
- Push the configuration to the database:
accelo admin database push-config -a
Adding Edge Nodes for Monitoring
Edge nodes are nodes that are not part of the Spark Standalone cluster.
- Change to the
work/<clustername>
directory:
cd /data01/acceldata/work/<clustername>
- Modify the
hydra_hosts_override.yml
file.
vi hydra_hosts_override.yml
- Add the following entry to append a host to the existing host list for Pulse to monitor:
hosts:
append:
- <Alias/FQDN>
- Run the
accelo reconfig cluster
command for clusters with edge nodes that require monitoring by Pulse. Alternatively, for comprehensive coverage, perform a reconfig cluster on all clusters.
accelo reconfig cluster -a
- Check the
hydra_hosts.yml
file, which will now contain the new hosts as well. For example:
cluster:
hosts:
sac01.acceldata.dvl: ""
sac02.acceldata.dvl: ""
sac03.acceldata.dvl: ""
sac04.acceldata.dvl: "" ----- NEW HOSTS
Configure Gauntlet
Updating the Gauntlet Crontab Duration
- Check whether the
ad-core.yml
file is present by running the following command:
ls -al $AcceloHome/config/docker/ad-core.yml
- If the file is not present, generate it by running:
accelo admin makeconfig ad-core
- Edit the
ad-core.yml
file
a. Open the file:
vi $AcceloHome/config/docker/ad-core.yml
b. Update the CRON_TAB_DURATION
env variable in the ad-gauntlet
section:
CRON_TAB_DURATION=0 0 */2 * *
This makes gauntlet run every 2 days at midnight.
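For reference, the five fields of the cron expression 0 0 */2 * * are, in order:
- minute: 0
- hour: 0 (midnight)
- day of month: */2 (every 2nd day)
- month: * (every month)
- day of week: * (any day)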
c. The updated file will look something like this:
ad-gauntlet:
image: ad-gauntlet
container_name: ad-gauntlet
environment:
- MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
- MONGO_ENCRYPTED=true
- ELASTIC_ADDRESSES=http://ad-elastic:9200
- DRY_RUN_ENABLE=true
- CRON_TAB_DURATION=0 0 */2 * *
volumes:
- /etc/localtime:/etc/localtime:ro
- /root/acceldata/config/logsearch/gauntlet_elastic.yml:/gauntlet/config/config.yml
- /root/acceldata/logs/logsearch/:/gauntlet/logs/
ulimits: {}
ports: []
depends_on: []
opts: {}
restart: ""
extra_hosts: []
network_alias: []
d. Save the file.
- Restart the Gauntlet service by running the following command:
accelo restart ad-gauntlet
Updating the Gauntlet Dry Run Mode
- Check whether the
ad-core.yml
file is present by running the following command:
ls -al $AcceloHome/config/docker/ad-core.yml
- If the file is not present, generate it by running:
accelo admin makeconfig ad-core
- Edit the
ad-core.yml
file.
a. Open the file.
vi $AcceloHome/config/docker/ad-core.yml
b. Update the DRY_RUN_ENABLE
env variable in the ad-gauntlet
section:
DRY_RUN_ENABLE=false
This makes Gauntlet delete the older Elastic indices and purge MongoDB data.
c. The updated file will look something like this:
ad-gauntlet:
image: ad-gauntlet
container_name: ad-gauntlet
environment:
- MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
- MONGO_ENCRYPTED=true
- ELASTIC_ADDRESSES=http://ad-elastic:9200
- DRY_RUN_ENABLE=false
- CRON_TAB_DURATION=0 0 */2 * *
volumes:
- /etc/localtime:/etc/localtime:ro
- /root/acceldata/config/logsearch/gauntlet_elastic.yml:/gauntlet/config/config.yml
- /root/acceldata/logs/logsearch/:/gauntlet/logs/
ulimits: {}
ports: []
depends_on: []
opts: {}
restart: ""
extra_hosts: []
network_alias: []
d. Save the file.
- Restart the Gauntlet service by running the following command:
accelo restart ad-gauntlet
Updating MongoDB Cleanup and Compaction Frequency in Hours
By default, when dry run is disabled, MongoDB cleanup and compaction run once a day. To configure the frequency, follow the steps listed below.
- Run the following command:
accelo config retention
- Answer the following prompts. If you are unsure how many days of data to retain, proceed with the default values.
✔ How many days of data would you like to retain at Mongo DB ?: 15
✔ How many days of data would you like to retain at Mongo DB for HDFS reports ?: 15
✔ How many days of data would you like to retain at TSDB ?: 31
- When the following prompt appears, specify the hours of the day at which MongoDB cleanup and compaction should run. The value must be a comma-separated list of hours in 24-hour notation.
✔ How often should Mongo DB clean up & compaction run, provide a comma separated string of hours (valid values are [0,23] (Ex. 8,12,15,18)?: 0,6,12,18
- Run the following command. The next time Gauntlet runs, MongoDB cleanup and compaction will run at the specified hours, once per hour.
accelo admin database push-config
Enabling (TLS) HTTPS for Pulse Web UI Configuration Using ad-proxy
Deployment and Configuration
- Copy the
cert.crt
,
cert.key
, and
ca.crt
(optional) files to the
$AcceloHome/config/proxy/certs
location.
- Check whether the
ad-core.yml
file is present:
ls -al $AcceloHome/config/docker/ad-core.yml
- If the
ad-core.yml
file is not present, generate it:
accelo admin makeconfig ad-core
OUTPUT
[root@hostname:addons (ad-default)]$ accelo admin makeconfig ad-core
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/ad-core.yml'.
If the stack is already up and running, use './accelo admin recreate' to recreate the whole environment with the new configuration.
- Modify the
ad-core.yml
file.
a. Open the ad-core.yml
file.
vi $AcceloHome/config/docker/ad-core.yml
b. Remove the ports:
field in the ad-graphql
section of ad-core.yml
.
ports:
- 4000:4000
c. The resulting ad-graphql
section will look like this:
ad-graphql:
image: ad-graphql
container_name: ""
environment:
- MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
- MONGO_ENCRYPTED=true
- MONGO_SECRET=Ah+MqxeIjflxE8u+/wcqWA==
- UI_PORT=4000
- LDAP_HOST=ad-ldap
- LDAP_PORT=19020
- SSL_ENFORCED=false
- SSL_ENABLED=false
- SSL_KEYDIR=/etc/acceldata/ssl/
- SSL_KEYFILE=ssl.key
- SSL_CERTDIR=/etc/acceldata/ssl/
- SSL_CERTFILE=ssl.crt
- SSL_PASSPHRASE=""
- DS_HOST=ad-query-estimation
- DS_PORT=8181
- 'FEATURE_FLAGS={ "ui_regex": { "regex": "ip-([^.]+)", "index": 1 }, "rename_nav_labels":{},
"timezone": "", "experimental": true, "themes": false, "hive_const":{ "HIVE_QUERY_COST_ENABLED":
false, "HIVE_MEMORY_GBHOUR_COST": 0, "HIVE_VCORE_HOUR_COST": 0 }, "spark_const":
{ "SPARK_QUERY_COST_ENABLED": false, "SPARK_MEMORY_GBHOUR_COST": 0, "SPARK_VCORE_HOUR_COST":
0 }, "queryRecommendations": false, "hostIsTrialORLocalhost": false, "data_temp_string":
"" }'
volumes:
- /etc/localtime:/etc/localtime:ro
- /etc/hosts:/etc/hosts:ro
- /data01/acceldata/work/license:/etc/acceldata/license:ro
ulimits: {}
depends_on:
- ad-db
opts: {}
restart: ""
extra_hosts: []
network_alias: []
d. Save the file
- Restart the
ad-graphql
container:
accelo restart ad-graphql
- Verify that the port is no longer exposed on the host:
docker ps
- Check if there are any errors in
ad-graphql
container:
docker logs -f ad-graphql_default
- To deploy the
ad-proxy
addon, run the following command, select
Proxy
from the list, and press Enter:
accelo deploy addons
OUTPUT
[x] Notifications
[x] Oozie Connector
> [x] Proxy
[ ] QUERY ROUTER DB
[ ] SHARD SERVER DB
[ ] StandAlone Connector
- Check if there are any errors in the
ad-proxy
container:
docker logs -f ad-proxy_default
- You can now access the Pulse UI at
https://<pulse-server-hostname>
. By default, port
443
is used.
Configuration
If you want to change the SSL port to another port, follow these steps:
- Check whether the
ad-proxy.yml
file is present:
ls -altrh $AcceloHome/config/docker/addons/ad-proxy.yml
- Generate the
ad-proxy.yml
file if it's not present:
accelo admin makeconfig ad-proxy
OUTPUT
[root@hostname:addons (ad-default)]$ accelo admin makeconfig ad-proxy
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/addons/ad-proxy.yml'.
If the addon is already up and running, use './accelo deploy addons' to remove and recreate the addon service.
- Modify the
ad-proxy.yml
.
a. Open the ad-proxy.yml
file.
vi $AcceloHome/config/docker/addons/ad-proxy.yml
b. Change the host port in the ports list to the desired port.
ports:
- <DESIRED_HOST_PORT>:443
The final file will look like this, if the host port is 6003
:
version: "2"
services:
ad-proxy:
image: ad-proxy
container_name: ""
environment: []
volumes:
- /etc/localtime:/etc/localtime:ro
- /data01/acceldata/config/proxy/traefik.toml:/etc/traefik/traefik.toml
- /data01/acceldata/config/proxy/config.toml:/etc/traefik/conf/config.toml
- /data01/acceldata/config/proxy/certs:/etc/acceldata
ulimits: {}
ports:
- 6003:443
depends_on: []
opts: {}
restart: ""
extra_hosts: []
network_alias: []
label: Proxy
c. Save the file.
- Restart the
ad-proxy
container.
accelo restart ad-proxy
- Check the container logs for errors:
docker logs -f ad-proxy_default
- You can now access the Pulse UI at
https://<pulse-server-hostname>:6003
.
Set Up LDAP for Pulse UI
- Check whether the
ldap.conf
file is present:
ls -al $AcceloHome/config/ldap/ldap.conf
- If
ldap.conf
is not already present, generate the default file by running the following command:
accelo configure ldap
OUTPUT
There is no ldap config file available
Generating a new ldap config file
Please edit '$AcceloHome/config/ldap/ldap.conf' and rerun this command
- Edit the
$AcceloHome/config/ldap/ldap.conf
file.
vi $AcceloHome/config/ldap/ldap.conf
- Configure the file with the following properties:
- host = [FQDN]: the FQDN where the LDAP server is running.
- insecureNoSSL = true: set this if port 389 is being used.
- rootCA = [CERTIFICATE_FILE_PATH]: the SSL root CA certificate.
- bindDN: used for the
ldap
search; it must be a member of the admin group.
- bindPW: the
password
for entering in the database; it can be removed later once
ldap
gets enabled.
- baseDN used for the user search, for example: cn=users,cn=accounts,dc=acceldata,dc=io
- Filter used for the user search, for example: (objectClass=person)
- baseDN used for the group search, for example: cn=groups,cn=accounts,dc=acceldata,dc=io
- Object class used for the group search, for example: (objectClass=posixgroup)
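For illustration, a hypothetical ldap.conf fragment for a server on port 389 might set the properties above as follows (all values are placeholders; the exact file syntax may differ, so follow the generated template):
host = ldap.example.com:389
insecureNoSSL = true
bindDN = uid=admin,cn=users,cn=accounts,dc=acceldata,dc=io
bindPW = <password>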
Use the following command to check whether a user has search entry access and group access in the LDAP directory:
ldapsearch -x -h <hostname> -p 389 -D "uid=admins,cn=users,dc=acceldata,dc=io" -W -b "cn=accounts,dc=acceldata,dc=io" "(&(objectClass=person)(uid=admins))"
- If the file is already generated, the command asks for the LDAP credentials to validate the connectivity and configuration, as described in the steps below.
- Run the
accelo configure ldap
command:
accelo configure ldap
- It will ask for the LDAP user credentials:
Checking LDAP connection
Enter LDAP username: gs
Enter LDAP password: *******
- If everything went correctly, the following confirmation message is displayed:
performing ldap search ou=users,dc=acceldata,dc=io sub (&(objectClass=inetOrgPerson)(uid=gs))
username "gs" mapped to entry cn=gs,ou=users,dc=acceldata,dc=io
✗ Do you want to use this configuration: y
- Press 'y' and then press 'Enter'.
OUTPUT
Ok, Updating login properties.
✓ Done, You can now login using LDAP.
- Push the LDAP configuration:
accelo admin database push-config -a
- Run the deploy addons command:
accelo deploy addons
- Select the LDAP from the list shown and press 'Enter':
[ ] Job Runner
[ ] Kafka 0.10.2 Connector
[ ] Kafka Connector
> [x] LDAP
[ ] Log Reduce
[ ] LogSearch
[ ] Memsql Connector
OUTPUT
Starting the deployment ..
Completed [==================================================================================================] 100.00% 0s
✓ Done, Addons deployment completed.
- Run the restart command.
accelo restart ad-graphql
- Open Pulse Web UI and create default roles.
- Add an ops role with the required access; all incoming users who log in via
ldap
will automatically come under this role.
Spark Jars Placements and Spark Config Changes
Perform the following steps for all the Spark Cluster Nodes:
- Add the following configuration in the
metrics.properties
file for Spark TimeSeries data:
$SPARK_HOME/conf/metrics.properties
[root@sac01 conf]# cat metrics.properties
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
- Add the following configuration in the
spark-defaults.conf
file for the Events data:
$SPARK_HOME/conf/spark-defaults.conf
[root@sac01 conf]# cat spark-defaults.conf
spark.event_wait_period_sec 30
spark.eventLog.enabled true
spark.extraListeners io.acceldata.sparkstats.AdSparkListener
spark.ad.connector.context yarn;<CLUSTERNAME>;admin;test
spark.ad.events.url http://<PULSE SERVER HOSTNAME>:19005/eventsasync
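For instance, with the cluster name spark341 and the Pulse server nifihost1.ops.iti.acceldata.dev used elsewhere in this document, the last two lines would read:
spark.ad.connector.context yarn;spark341;admin;test
spark.ad.events.url http://nifihost1.ops.iti.acceldata.dev:19005/eventsasync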
- Place the
ad-spark-hook.jar
file in the following directory:
$SPARK_HOME/jars/
- Restart all the Spark services.
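On a standalone deployment managed with Spark's bundled scripts, the restart might look like the following (a sketch; adjust to however your Spark services are actually managed):
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh
Run these on the master node; if the History Server runs separately, restart it with $SPARK_HOME/sbin/stop-history-server.sh and $SPARK_HOME/sbin/start-history-server.sh.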
DotLog Download
We have introduced a feature that allows downloading service logs in .log format. The downloaded file is not the original server log but an XLSX sheet merged into a .log format.
Perform the following to add a configurable parameter that enables this feature:
- Insert the dotLogFileDownload parameter into the FEATURE_FLAGS property of the
ad-graphql
section in the $AcceloHome/config/docker/ad-core.yml file:
'FEATURE_FLAGS={ "ui_regex": { "regex": "ip-([^.]+)", "index": 1 }, "rename_nav_labels":{},
"timezone": "", "experimental": true, "themes": false, "hive_const":{ "HIVE_QUERY_COST_ENABLED":
false, "HIVE_MEMORY_GBHOUR_COST": 0, "HIVE_VCORE_HOUR_COST": 0 }, "spark_const":
{ "SPARK_QUERY_COST_ENABLED": false, "SPARK_MEMORY_GBHOUR_COST": 0, "SPARK_VCORE_HOUR_COST":
0 }, "queryRecommendations": false, "hostIsTrialORLocalhost": false, "data_temp_string":
"", "dotLogFileDownload": true }'
- Restart the ad-graphql service using the following command:
accelo restart ad-graphql
New Search Bar
Perform the following to enable new search options:
- Locate the ad-graphql section in the $AcceloHome/config/docker/ad-core.yml file and, under the environment key, add the following line:
- NEW_SEARCH=true
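After the edit, the environment list of the ad-graphql section includes the new entry alongside the existing ones, for example:
environment:
  - UI_PORT=4000
  - NEW_SEARCH=true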
- Restart the ad-graphql service using the following command:
accelo restart ad-graphql
Does a user in the Spark Standalone environment still see the Spark option in the left menu even after their access has been revoked from the role?
Create a different role that does not have the Spark permission and assign it to the user. Alternatively, you can leave it as is: even if the Spark entry is visible in the left navigation, the user cannot access it once access has been revoked from their role.
Why are non-admin users in the Spark Standalone environment unable to access Spark even though they have the appropriate role permissions for accessing Spark?
In the role edit window, click on "Select All" just below the Page permissions. Then, remove any permissions that you do not wish to grant and save the role. Any user assigned to this role should now have access to Spark in the Spark Standalone environment.
What is the reason for the absence of the Oozie workflow link between the Oozie workflow and the application ID in PULSE for a Spark job?
The Spark job's Application ID is generated by the Oozie service. It appears in the Pulse UI only if it is available in Oozie's Web Service UI. If it is not present in Oozie's Web Service UI, it is not displayed in Pulse.