Spark Standalone Multi-cluster
This document provides a step-by-step process for installing a single Pulse instance that monitors multiple Spark Standalone clusters.
Prerequisites
Ensure the following are present:
- Spark hosts files: Refer to steps 1 and 2 below.
- Zookeeper hosts files: Refer to step 3 below.
- Log locations
- Spark history server locations
- Certificates (if any for Spark history server)
- Docker version
Prerequisites for enabling (TLS) HTTPS for Pulse Web UI Configuration using ad-proxy:
- Certificate File: cert.crt
- Certificate Key: cert.key
- CA Certificate: ca.crt (optional)
- Decide whether to keep the HTTP port (default: 4000) open.
- Decide which HTTPS port to use (default: 443).
- Obtain the fully qualified domain names (FQDN) for the Spark Master URLs for both clusters and include them in the `spark_<clustername>.hosts` file. The Spark hosts file should be structured as follows:

For Pulse 3.8.0:
```
<http/s>://<Alias/FQDN of the Spark Master 1>:<Spark Master UI Port>
<http/s>://<Alias/FQDN of the Spark Master 2>:<Spark Master UI Port>
```
For Pulse 3.8.1 or later:
```
SparkMasterURLList:
  - <http/s>://<Alias/FQDN of the Spark Master 1>:<Spark Master UI Port>
  - <http/s>://<Alias/FQDN of the Spark Master 2>:<Spark Master UI Port>
SparkWorkerURLList:
  - <http/s>://<Alias/FQDN of the Spark Worker 1>:<Spark Worker Port>
  - <http/s>://<Alias/FQDN of the Spark Worker 2>:<Spark Worker Port>
```
- Retrieve the fully qualified domain names (FQDN) for the Spark History Server URLs for both clusters. When requested, provide the URL in the following format:
```
<http/s>://<Alias/FQDN of the Spark History Server>:<Spark History Server Port>
```
- Obtain the fully qualified domain names (FQDN) for the Zookeeper Server URLs for both clusters and place them in the `zk_<clustername>.hosts` file. The Zookeeper hosts file should adhere to the following format:
```
<http/s>://<Alias/FQDN for the Zookeeper Server>:<Zookeeper Server Port>
```
- Retrieve the log locations for the application and deployment logs, as well as the `SPARK_HOME` directory, for both clusters.
- Ensure that the Docker version is >= 20.10.x.
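For illustration, the two hosts files for a hypothetical two-node cluster on Pulse 3.8.1 or later might look as follows. All hostnames and ports here are placeholders, not defaults:

```
# spark_<clustername>.hosts (hypothetical values)
SparkMasterURLList:
  - http://spark-master1.example.com:8080
  - http://spark-master2.example.com:8080
SparkWorkerURLList:
  - http://spark-worker1.example.com:8081
  - http://spark-worker2.example.com:8081
```

```
# zk_<clustername>.hosts (hypothetical values)
http://zk1.example.com:2181
http://zk2.example.com:2181
http://zk3.example.com:2181
```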
Uninstallation
To uninstall the agents and the existing Pulse application, perform the following:
- Run the `hystaller uninstall` command through your Ansible setup to uninstall the agents.
- Remove the Pulse Spark hook JARs, along with the related configurations, from the Spark master and worker nodes.
- The Acceldata team must then perform the following steps to back up and uninstall the existing Pulse application.
- Create a backup directory:
```
mkdir -p /data01/backup
```
- As a backup, copy the entire `config` and `work` directories:
```
cp -R $AcceloHome/config /data01/backup/
cp -R $AcceloHome/work /data01/backup/
```
- Uninstall the existing Pulse setup by running the following command:
```
accelo uninstall local
```
OUTPUT
```
[root@nifihost1:data01 (ad-default)]$ accelo uninstall local
You're about to uninstall the local AccelData setup. This will also DELETE all persistent data from the current node. However, NONE of the remote nodes will be affected. Please confirm your action [y/n]: : y
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
Uninstalling the AccelData components from local machine ...
Executing this action will remove all files, folders, docker containers, docker images, and the entire Acceldata directory.
```
- Log out of the terminal session.
Download and Load Binaries and Docker Images
To download and load binaries and Docker images, perform the following:
For Pulse version 3.3.3, when downloading the Pulse all-in-one TAR file you must also download the hystaller binary separately, and perform the following:
- Download all the Pulse 3.3.3 binaries.
- Replace the hystaller binary with the one from the direct download link provided by the Acceldata team.
- Download the jars, hystaller, accelo binaries, and docker images from the download links provided by the Acceldata team.
- Move the Docker images and JARs into the following directory, creating it first:
```
mkdir -p /data01/images
```
- Copy the binaries and TAR files into the `/data01/images` folder:
```
cp </path/to/binaries/tar> /data01/images
```
- Change the directory:
```
cd /data01/images
```
- Extract the single TAR file:
```
tar xvf <name_of_tar_file>.tar
```
OUTPUT
```
[root@nifihost1 images]# tar xvf pulse-333-beta.tar
./ad-alerts.tgz
./ad-connectors.tgz
./ad-dashplots.tgz
./ad-database.tgz
./ad-deployer.tgz
./ad-director.tgz
./ad-elastic.tgz
./ad-events.tgz
./ad-fsanalyticsv2-connector.tgz
./ad-gauntlet.tgz
./ad-graphql.tgz
./ad-hydra.tgz
./ad-impala-connector.tgz
./ad-kafka-0-10-2-connector.tgz
./ad-kafka-connector.tgz
./ad-ldap.tgz
./ad-logsearch-curator.tgz
./ad-logstash.tgz
./ad-notifications.tgz
./ad-oozie-connector.tgz
./ad-pg.tgz
./ad-proxy.tgz
./ad-pulsemon-ui.tgz
./ad-recom.tgz
./ad-sparkstats.tgz
./ad-sql-analyser.tgz
./ad-streaming.tgz
./ad-vminsert.tgz
./ad-vmselect.tgz
./ad-vmstorage.tgz
./accelo.linux
./admon
./hystaller
```
- To load the Docker images, execute the following command:
```
ls -1 *.tgz | xargs --no-run-if-empty -L 1 docker load -i
```
- Check if all the images are loaded to the server using the following command:
```
docker images | grep 3.3.3
```
Configure the Cluster
To configure the cluster in Pulse, perform the following:
- Validate all the hosts files. An optional reachability probe is sketched below.
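The probe below is an optional sketch using standard curl; `spark_spark341.hosts` is the example file name used later in this document:

```
# Probe every URL in a Pulse hosts file; FAIL lines flag entries to fix first.
while read -r url; do
  curl -skI --max-time 5 "$url" >/dev/null && echo "OK   $url" || echo "FAIL $url"
done < spark_spark341.hosts
```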
- Create the `acceldata` directory by running the following commands:
```
cd /data01/
mkdir -p acceldata
```
- Place the `accelo` binary in the `/data01/acceldata` directory:
```
cp </path/to/accelo/binary> /data01/acceldata
```
- Rename the `accelo.linux` binary to `accelo` and make it executable:
```
mv /data01/acceldata/accelo.linux /data01/acceldata/accelo
chmod +x /data01/acceldata/accelo
```
- Change the directory:
```
cd /data01/acceldata
```
- Run the following command to perform `accelo init`:
```
./accelo init
```
- Enter appropriate answers when prompted.
- When the Spark master is available, you can add the following parameter to the /etc/profile.d/ad.sh file to sync the Spark worker list from the Spark master URL:
```
SYNC_SPARK_MASTER=true
```
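After the edit, the file might contain a line such as the following. This is an illustrative excerpt; whether the variable needs an `export` depends on how your environment consumes profile scripts, so treat that as an assumption, and keep any lines already present in the file:

```
# /etc/profile.d/ad.sh (illustrative excerpt)
export SYNC_SPARK_MASTER=true
```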
- Run the following command to source the `ad.sh` file:
```
source /etc/profile.d/ad.sh
```
- Run the `init` command to provide the Pulse version:
```
accelo init
```
OUTPUT
```
[root@nifihost1:~ (ad-default)]$ accelo init
Enter the AccelData ImageTag: : 3.3.3
✓ Done, AccelData Init Successful.
```
Provide the correct Pulse version; in this case it is 3.3.3.
- Run the `accelo info` command as follows:
```
accelo info
```
OUTPUT
```
[root@nifihost1:~ (ad-default)]$ accelo info
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
[ACCELDATA ASCII-art banner]
Accelo CLI Version: 3.3.3-beta
Accelo CLI Build Hash: 8ba4727f11e5b3f3902547585a37611b6ec74e7c
Accelo CLI Build ID: 1700746329
Accelo CLI Builder ID: ZEdjMmxrYUdGdWRGOWhZMk5sYkdSaEVLCg==
Accelo CLI Git Branch Hash: TXdLaTlCVDFBdE56STNvPQo=
AcceloHome: /data01/acceldata
AcceloStack: ad-default
AccelData Registry: 191579300362.dkr.ecr.us-east-1.amazonaws.com/acceldata
AccelData ImageTag: 3.3.3-beta
Active Cluster Name: NotFound
AcceloConfig Mongo DB Retention days: 15
AcceloConfig Mongo DB HDFS Reports Retention days: 15
AccelConfig TSDB Retention days: 31d
Number of AccelData stacks found in this node: 0
```
- To configure the cluster in Pulse, run the `config cluster` command:
```
accelo config cluster
```
- Provide the correct information when prompted. The output must appear as follows:
```
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
Is the 'Database Service' up and running? [y/n]: : n
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Spark
Enter Your Cluster's Display Name: : spark341
Enter the cluster name to use (MUST be all lowercase & unique): : spark341
ERROR: stat /data01/acceldata/.activecluster: no such file or directory
INFO: Creating Post dirs.
Enter the hosts file path for Spark-On-StandAlone (MUST formatted, one IP/host per line): : spark_spark341.hosts
The hostname for the spark worker node is : : kafka2.ops.iti.acceldata.dev
The hostname for the spark worker node is : : kafka1.ops.iti.acceldata.dev
The hostname for the spark worker node is : : nifihost2.ops.iti.acceldata.dev
Is Zookeeper installed in the cluster: [Y/N]: Y
Enter the hosts file path for Zookeeper Hosts (MUST formatted, one IP/host per line): : spark341_zookeeper.hosts
Enter the Spark History URL (with http/https): : https://10.90.6.169:18480
The hostname for the spark history server is : : nifihost2.ops.iti.acceldata.dev
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : spark341
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
WARN: Cannot find the pulse.yaml file, getting the values from acceldata.conf file
Creating hydra inventory
SSH Key Algorithm used (RSA/DSA)?: : RSA
Which user should connect over SSH: : root
SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Enter the JMX Port for zookeeper_server: : 8989
Would you like to enable NTP Stats? [y/n]: : y
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: spark_application, zookeeper, syslog, kern, spark_jobhistoryserver, spark_master, spark_worker
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
```
- Run the `config cluster` command for all the clusters and provide the appropriate answers when prompted.
```
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
Is the 'Database Service' up and running? [y/n]: : n
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Spark
Enter Your Cluster's Display Name: : spark330
Enter the cluster name to use (MUST be all lowercase & unique): : spark330
Enter the hosts file path for Spark-On-StandAlone (MUST be formatted in pulse host file format): : spark_330.hosts
The hostname for the spark worker node is : : sac03.acceldata.dvl
The hostname for the spark worker node is : : sac02.acceldata.dvl
Is Zookeeper installed in the cluster: [Y/N]: Y
Enter the hosts file path for Zookeeper Hosts (MUST formatted, one IP/host per line): : zookeeper.hosts
Enter the Spark History URL (with http/https): : http://sac01.acceldata.dvl:18080
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : spark330
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
Creating hydra inventory
SSH Key Algorithm used (RSA/DSA)?: : RSA
Which user should connect over SSH: : root
SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Enter the JMX Port for zookeeper_server: : 8989
Would you like to enable NTP Stats? [y/n]: : y
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: syslog, kern, spark_jobhistoryserver, spark_master, spark_worker, spark_application, zookeeper
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
```
- Run the `config cluster` command for NiFi Stand-Alone and select Stand-Alone > Nifi.
```
[root@nifihost1:acceldata (ad-default)]$ accelo config cluster
INFO: Configuring the cluster ...
INFO: Using default API Version v10 for CM API
Is the 'Database Service' up and running? [y/n]: : n
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✔ Stand-Alone
✔ Nifi
Enter Your Cluster's Display Name: : nifisa
Enter the cluster name to use (MUST be all lowercase & unique): : nifisa
INFO: Creating Post dirs.
INFO: Getting the Nifi Host List
Enter the hosts file path for Nifi (One Nifi URL per line, Must be formatted): : nifi.hosts
Discovered NIFI Hosts:
✓ nifihost1.ops.iti.acceldata.dev
Would you like to continue with the above NIFI nodes? [y/n]: : Y
INFO: min-reports is set to default value 10
INFO: Purging old config files
✓ acceldata.conf file generated successfully.
INFO: Creating post config files
INFO: Writing the .dist files
INFO: Clustername : nifisa
INFO: Performing PreCheck of Files
INFO: Setting the active cluster
Creating hydra inventory
SSH Key Algorithm used (RSA/DSA)?: : RSA
Which user should connect over SSH: : root
SSH private key file path for connecting to hosts: : /root/.ssh/id_rsa
nifihost1.ops.iti.acceldata.dev is the hostname of the Pulse Server, Is this correct? [Y/N]: : y
Would you like to enable NTP Stats? [y/n]: : y
Would you like to setup LogSearch? [y/n]: : y
? Select the logs for components that are installed/enabled in your target cluster: syslog, nifi, kern
✓ Generated the vars.yml file successfully
Configuring notifications
✓ Generated the notifications.yml file successfully
Configuring notifications
✓ Generated the actions notifications.yml file successfully
INFO: Please run 'accelo deploy core' to deploy APM core using this configuration.
[root@nifihost1:acceldata (ad-default)]$
```
Copy the License
Place the license file provided by the Acceldata team in the work directory as shown below:
```
cp </path/to/license> /data01/acceldata/work
```
Deploy Pulse Core Components
Deploy the Pulse core components by running the following command:
```
accelo deploy core
```
The output must appear as follows:
```
[root@nifihost1:acceldata (ad-default)]$ accelo deploy core
ERROR: Cannot connect to DB, Because: cannot connect to mongodb
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
Have you verified the acceldata config file at '/data01/acceldata/config/acceldata_spark341.conf' ? [y/n]: : y
✓ accelo.yml file found and parsed
✓ AcceloEvents - events.json file found and parsed
✓ acceldata conf file found and parsed
✓ .dist file found and parsed
✓ hydra_hosts.yml file found and parsed
✓ vars.yml file found and parsed
✓ alerts notification.yml file found and parsed
✓ actions notification.yml file found and parsed
✓ alerts default-endpoints.yml file found and parsed
✓ override.yml file found and parsed
✓ gauntlet_mongo_spark341.yml file found and parsed
✓ gauntlet_elastic.yml file found and parsed
INFO: No existing AccelData networks found. Current stack 'ad-default' is missing.
INFO: Trying to create a new network ..
INFO: If you're setting up AccelData for the first time give 'y' to the below.
Would you like to initiate DB with the config file '/data01/acceldata/config/acceldata'? [y/n]: : y
Creating group monitors [=====================>----] 83.33%
INFO: Pushing the hydra_hosts.yml to mongodb
Deployment Completed [====================>-----] 81.82% 28s
✓ Done, Core services deployment completed.
Now, you can access the AccelData APM Server at the configured port of this node.
To deploy the AccelData addons, Run './accelo deploy addons'
```
Deploy Add-ons
To deploy the Pulse add-ons, run the code below and select the required components for Spark standalone:
```
accelo deploy addons
```
The output must appear as follows:
```
[root@nifihost1:acceldata (ad-default)]$ accelo deploy addons
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
INFO: Active Cluster: spark341
? Select the components you would like to install: Alerts (Agents MUST be configured), Core Connectors, Dashplot, Director (Agents MUST be configured), HYDRA, LogSearch, Notifications
Starting the deployment ..
Completed [==========================] 137.50% 29s
✓ Done, Addons deployment completed.
```
Configure Alerts Notifications
To configure alerts notifications, perform the following:
- Set the active cluster by running the following command:
```
accelo set
```
- Configure the alerts notifications using the following command:
```
accelo config alerts notifications
```
OUTPUT
```
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultToEmailIds (comma separated list): :
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
```
- Set cluster2 as the active cluster:
```
accelo set
```
- Configure the alerts for the second cluster:
```
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultToEmailIds (comma separated list): :
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
```
- Set cluster3 as the active cluster:
```
accelo set
```
- Configure the alerts for the third cluster:
```
[root@nifihost1:acceldata (ad-default)]$ accelo config alerts notifications
Enter the JODA Timezone value (Example: Asia/Jakarta): : Asia/Kolkata
? Select the metric groups you would like to enable: druid, nifi, ntpd, anomaly, chrony, customApp
? Select the notifications you would like to enable: email
INFO: Configuring Email Notifications:
Enter Email DefaultSnoozeIntervalInSecs: : 0
Enter Email MaxEmailThreshold: : 1
✓ Done, Alerts Notifications Configuration file generated
✓ Done, Alerts Notifications pushed to Pulse DB
```
- Restart the alerts notifications:
```
accelo restart ad-alerts
```
OUTPUT
```
[root@nifihost1:spark341 (ad-default)]$ accelo restart ad-alerts
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
You're about to restart AccelData services. This will restart all or any specified service. However, any persistent data will be left untouched. Please confirm your action [y/n]: : y
Completed [==========================] 100.00% 1s
Restart ad-alerts completed ✓
```
Database Push Configuration
Run the following command to push the config to the DB:
```
accelo admin database push-config -a
```
Configure the Override
- Change the directory to `work/<clustername>`:
```
cd /data01/acceldata/work/<clustername>
```
- Modify the `override.yml` file:
```
vi override.yml
```
- Paste the below config in the file:
```
log_locations:
  kern:
    - path: /var/log/kern.log
      type: DATESTAMP
  spark_application:
    - path: <SPARK_HOME>/work/*/*/stdout,<SPARK_HOME>/work/*/*/stderr
      type: SPARK_APPLICATION
  spark_jobhistoryserver:
    - path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.history.HistoryServer-*-*.out
      type: YARN_APP
  spark_master:
    - path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.master.Master-*-*.out
      type: YARN_APP
  spark_worker:
    - path: <SPARK_HOME>/logs/spark-*-org.apache.spark.deploy.worker.Worker-*-*.out
      type: YARN_APP
  syslog:
    - path: /var/log/syslog,/var/log/messages
      type: DATESTAMP
hydra:
  hostname_method: ENV
```
Repeat the above steps for all clusters.
Deploy the Pulse Agents
Install the new Pulse version 3.3.3 agents on all cluster nodes. Copy the new hystaller binary to /tmp (or any location from which it can be executed) on all cluster nodes, and then run the following command on each node.
Change the following code snippet as per your environment:
```
PULSE_HOME="/opt/pulse"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
HYDRA_SERVER_URL="http://<PULSE_SERVER_HOSTNAME>:19072"
HYDRA_HEARTBEAT_DURATION="60"
HYDRA_PARCEL_MODE="False"
HYDRA_HOSTNAME_CASE="lower"
HYDRA_HOSTNAME_METHOD="ENV"
HYDRA_HEARTBEAT_JITTER="10"
PULSE_HOSTNAME="<FQDN/Alias of the server where hydra is to be installed>"
sudo env "PULSE_HOME=$PULSE_HOME" "PULSE_HOSTNAME=$PULSE_HOSTNAME" "PATH=$PATH" "HYDRA_SERVER_URL=$HYDRA_SERVER_URL" "HYDRA_HEARTBEAT_DURATION=$HYDRA_HEARTBEAT_DURATION" "HYDRA_PARCEL_MODE=$HYDRA_PARCEL_MODE" "HYDRA_HOSTNAME_CASE=$HYDRA_HOSTNAME_CASE" "HYDRA_HOSTNAME_METHOD=$HYDRA_HOSTNAME_METHOD" "HYDRA_HEARTBEAT_JITTER=$HYDRA_HEARTBEAT_JITTER" /tmp/hystaller install
```
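If all cluster nodes are reachable over SSH, a simple loop can distribute and run the installer. This is an illustrative helper, not part of the product tooling; it assumes a `nodes.txt` file with one FQDN per line and passwordless SSH as root (adjust both to your environment):

```
#!/usr/bin/env bash
# Hypothetical helper: install the Pulse agent on every node listed in nodes.txt.
set -euo pipefail
while read -r node; do
  echo "Installing hystaller on ${node} ..."
  scp /tmp/hystaller "root@${node}:/tmp/hystaller"
  # -n keeps ssh from consuming the rest of nodes.txt on stdin.
  ssh -n "root@${node}" 'chmod +x /tmp/hystaller && sudo env \
    "PULSE_HOME=/opt/pulse" \
    "PULSE_HOSTNAME=$(hostname -f)" \
    "HYDRA_SERVER_URL=http://<PULSE_SERVER_HOSTNAME>:19072" \
    "HYDRA_HEARTBEAT_DURATION=60" "HYDRA_PARCEL_MODE=False" \
    "HYDRA_HOSTNAME_CASE=lower" "HYDRA_HOSTNAME_METHOD=ENV" \
    "HYDRA_HEARTBEAT_JITTER=10" /tmp/hystaller install'
done < nodes.txt
```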
Reconfig Cluster
- After completing the edits to the override files as outlined above, run the following command:
```
accelo reconfig cluster -a
```
OUTPUT
```
[root@nifihost1:spark341 (ad-default)]$ accelo reconfig cluster -a
INFO: Using default API Version v10 for CM API
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
y
INFO: Read Cluster Info for spark341 From MongoDB
INFO: Clustername : spark341
INFO: Reconfiguring the spark341 cluster
Zookeeper Enabled
INFO: Getting the Spark Master List
✓ https://nifihost2.ops.iti.acceldata.dev:8480
INFO: Getting the Spark Worker List
✓ nifihost2.ops.iti.acceldata.dev
✓ kafka2.ops.iti.acceldata.dev
✓ kafka1.ops.iti.acceldata.dev
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
INFO: Merging the override.yml with vars struct...
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
INFO: Read Cluster Info for spark330 From MongoDB
INFO: Clustername : spark330
INFO: Reconfiguring the spark330 cluster
Zookeeper Enabled
INFO: Getting the Spark Master List
✓ http://sac01.acceldata.dvl:8080
✓ http://sac02.acceldata.dvl:8080
INFO: Getting the Spark Worker List
✓ sac03.acceldata.dvl
✓ sac02.acceldata.dvl
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
INFO: Merging the override.yml with vars struct...
WARN: /data01/acceldata/config/users/group already being generated
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
INFO: Read Cluster Info for nifisa From MongoDB
INFO: Clustername : nifisa
INFO: Reconfiguring the nifisa cluster
INFO: Getting the Nifi Host List
Discovered NIFI Hosts:
✓ nifihost1.ops.iti.acceldata.dev
INFO: Purging old config files
INFO: Pushing the hydra_hosts.yml to mongodb
INFO: Regenerating vars.yml
WARN: /data01/acceldata/config/users/group already being generated
WARN: /data01/acceldata/config/users/passwd already being generated
INFO: Pushing vars tar
INFO: Updating the Epoch Time
INFO: Reloading the Hydra Server
```
- Push the config to the DB:
```
accelo admin database push-config -a
```
Adding Edge Nodes for Monitoring
Edge nodes are nodes that are not part of the Spark Standalone cluster.
- Change the directory to `work/<clustername>`:
```
cd /data01/acceldata/work/<clustername>
```
- Modify the `hydra_hosts_override.yml` file:
```
vi hydra_hosts_override.yml
```
- Add the following code to append a host to the already existing hosts for Pulse to monitor:
```
hosts:
  append:
    - <Alias/FQDN>
```
- Run the `accelo reconfig cluster` command for clusters with edge nodes that require monitoring by Pulse. Alternatively, for comprehensive coverage, perform a reconfig cluster on all clusters:
```
accelo reconfig cluster -a
```
- Check the `hydra_hosts.yml` file, which will now contain the new hosts as well. For example:
```
cluster:
  hosts:
    sac01.acceldata.dvl: ""
    sac02.acceldata.dvl: ""
    sac03.acceldata.dvl: ""
    sac04.acceldata.dvl: ""   # <----- NEW HOST
```
Configure Gauntlet
Updating the Gauntlet Crontab Duration
- Check if the `ad-core.yml` file is present or not by running the following command:
```
ls -al $AcceloHome/config/docker/ad-core.yml
```
- If the file above is not present, generate it with:
```
accelo admin makeconfig ad-core
```
- Edit the `ad-core.yml` file:
a. Open the file:
```
vi $AcceloHome/config/docker/ad-core.yml
```
b. Update the CRON_TAB_DURATION env variable in the ad-gauntlet section:
```
CRON_TAB_DURATION=0 0 */2 * *
```
This makes Gauntlet run every two days at midnight.
c. The updated file will look something like this:
```
ad-gauntlet:
  image: ad-gauntlet
  container_name: ad-gauntlet
  environment:
    - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
    - MONGO_ENCRYPTED=true
    - ELASTIC_ADDRESSES=http://ad-elastic:9200
    - DRY_RUN_ENABLE=true
    - CRON_TAB_DURATION=0 0 */2 * *
  volumes:
    - /etc/localtime:/etc/localtime:ro
    - /root/acceldata/config/logsearch/gauntlet_elastic.yml:/gauntlet/config/config.yml
    - /root/acceldata/logs/logsearch/:/gauntlet/logs/
  ulimits: {}
  ports: []
  depends_on: []
  opts: {}
  restart: ""
  extra_hosts: []
  network_alias: []
```
d. Save the file.
- Restart the Gauntlet service by running the command:
```
accelo restart ad-gauntlet
```
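The CRON_TAB_DURATION value appears to follow standard five-field cron syntax; under that assumption, the example above breaks down as follows (confirm with Acceldata if your build differs):

```
# minute  hour  day-of-month  month  day-of-week
  0       0     */2           *      *
# i.e. at 00:00 on every 2nd day of the month
```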
Updating the Gauntlet Dry Run Mode
- Check if the `ad-core.yml` file is present or not by running the following command:
```
ls -al $AcceloHome/config/docker/ad-core.yml
```
- If the file above is not present, generate it with:
```
accelo admin makeconfig ad-core
```
- Edit the `ad-core.yml` file:
a. Open the file:
```
vi $AcceloHome/config/docker/ad-core.yml
```
b. Update the DRY_RUN_ENABLE env variable in the ad-gauntlet section:
```
DRY_RUN_ENABLE=false
```
This allows Gauntlet to delete the older Elastic indices and purge MongoDB data.
c. The updated file will look something like this:
```
ad-gauntlet:
  image: ad-gauntlet
  container_name: ad-gauntlet
  environment:
    - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
    - MONGO_ENCRYPTED=true
    - ELASTIC_ADDRESSES=http://ad-elastic:9200
    - DRY_RUN_ENABLE=false
    - CRON_TAB_DURATION=0 0 */2 * *
  volumes:
    - /etc/localtime:/etc/localtime:ro
    - /root/acceldata/config/logsearch/gauntlet_elastic.yml:/gauntlet/config/config.yml
    - /root/acceldata/logs/logsearch/:/gauntlet/logs/
  ulimits: {}
  ports: []
  depends_on: []
  opts: {}
  restart: ""
  extra_hosts: []
  network_alias: []
```
d. Save the file.
- Restart the Gauntlet service by running the command:
```
accelo restart ad-gauntlet
```
Updating MongoDB Cleanup and Compaction Frequency in Hours
By default, when dry run is disabled, MongoDB cleanup and compaction runs once a day. To configure the frequency, follow the steps below.
- Run the following command:
```
accelo config retention
```
- Answer the following prompts. If you're unsure about how many days to retain, proceed with the default values.
```
✔ How many days of data would you like to retain at Mongo DB ?: 15
✔ How many days of data would you like to retain at Mongo DB for HDFS reports ?: 15
✔ How many days of data would you like to retain at TSDB ?: 31
```
- When the following prompt appears, specify the hours of the day during which MongoDB cleanup and compaction should run. The value must be a comma-separated list of hours in 24-hour notation.
```
✔ How often should Mongo DB clean up & compaction run, provide a comma separated string of hours (valid values are [0,23] (Ex. 8,12,15,18)?: 0,6,12,18
```
- Run the following command. The next time Gauntlet runs, MongoDB cleanup and compaction will run at the specified hours, once per hour.
```
accelo admin database push-config
```
Enabling (TLS) HTTPS for Pulse Web UI Configuration Using ad-proxy
Deployment and Configuration
- Copy the `cert.crt`, `cert.key`, and `ca.crt` (optional) files to the `$AcceloHome/config/proxy/certs` location. An optional consistency check is sketched below.
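Before wiring the certificates into the proxy, it can help to confirm that the certificate and key actually match. This optional sketch uses standard OpenSSL commands and assumes an RSA key (for EC keys, compare the public keys instead):

```
# The two digests must be identical if cert.crt and cert.key belong together.
openssl x509 -noout -modulus -in $AcceloHome/config/proxy/certs/cert.crt | openssl md5
openssl rsa  -noout -modulus -in $AcceloHome/config/proxy/certs/cert.key | openssl md5
```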
- Check if the `ad-core.yml` file is present or not:
```
ls -al $AcceloHome/config/docker/ad-core.yml
```
- If the `ad-core.yml` file is not present, generate it:
```
accelo admin makeconfig ad-core
```
OUTPUT
```
[root@hostname:addons (ad-default)]$ accelo admin makeconfig ad-core
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/ad-core.yml'.
If the stack is already up and running, use './accelo admin recreate' to recreate the whole environment with the new configuration.
```
- Modify the `ad-core.yml` file:
a. Open the ad-core.yml file:
```
vi $AcceloHome/config/docker/ad-core.yml
```
b. Remove the `ports:` field in the ad-graphql section of ad-core.yml:
```
ports:
  - 4000:4000
```
c. The resulting ad-graphql section will look like this:
```
ad-graphql:
  image: ad-graphql
  container_name: ""
  environment:
    - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
    - MONGO_ENCRYPTED=true
    - MONGO_SECRET=Ah+MqxeIjflxE8u+/wcqWA==
    - UI_PORT=4000
    - LDAP_HOST=ad-ldap
    - LDAP_PORT=19020
    - SSL_ENFORCED=false
    - SSL_ENABLED=false
    - SSL_KEYDIR=/etc/acceldata/ssl/
    - SSL_KEYFILE=ssl.key
    - SSL_CERTDIR=/etc/acceldata/ssl/
    - SSL_CERTFILE=ssl.crt
    - SSL_PASSPHRASE=""
    - DS_HOST=ad-query-estimation
    - DS_PORT=8181
    - 'FEATURE_FLAGS={ "ui_regex": { "regex": "ip-([^.]+)", "index": 1 }, "rename_nav_labels":{}, "timezone": "", "experimental": true, "themes": false, "hive_const":{ "HIVE_QUERY_COST_ENABLED": false, "HIVE_MEMORY_GBHOUR_COST": 0, "HIVE_VCORE_HOUR_COST": 0 }, "spark_const": { "SPARK_QUERY_COST_ENABLED": false, "SPARK_MEMORY_GBHOUR_COST": 0, "SPARK_VCORE_HOUR_COST": 0 }, "queryRecommendations": false, "hostIsTrialORLocalhost": false, "data_temp_string": "" }'
  volumes:
    - /etc/localtime:/etc/localtime:ro
    - /etc/hosts:/etc/hosts:ro
    - /data01/acceldata/work/license:/etc/acceldata/license:ro
  ulimits: {}
  depends_on:
    - ad-db
  opts: {}
  restart: ""
  extra_hosts: []
  network_alias: []
```
d. Save the file.
- Restart the `ad-graphql` container:
```
accelo restart ad-graphql
```
- Check that the port is no longer exposed to the host:
```
docker ps
```
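To narrow the output down, a format filter like the following can help; the container name is assumed to match the `ad-graphql_default` used in the log command below and may differ in your stack:

```
# Show only names and published ports; after the change, no 0.0.0.0:4000
# mapping should appear for the ad-graphql container.
docker ps --format '{{.Names}}\t{{.Ports}}' | grep ad-graphql
```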
ad-graphqlcontainer:
docker logs -f ad-graphql_default- Deploy the
- To deploy the `ad-proxy` addon, run the following command, select `Proxy` from the list, and press Enter:
```
accelo deploy addons
```
OUTPUT
```
  [x] Notifications
  [x] Oozie Connector
> [x] Proxy
  [ ] QUERY ROUTER DB
  [ ] SHARD SERVER DB
  [ ] StandAlone Connector
```
- Check if there are any errors in the `ad-proxy` container:
```
docker logs -f ad-proxy_default
```
- Now you can access the Pulse UI at `https://<pulse-server-hostname>`. By default, port 443 is used.
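A quick way to confirm that the HTTPS endpoint responds is a headers-only request; the `-k` flag skips CA verification, which is useful with self-signed certificates:

```
# Expect an HTTP response header block from the Pulse UI over TLS.
curl -kI https://<pulse-server-hostname>
```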
Configuration
If you want to change the SSL port to another port, follow the steps below:
- Check if the `ad-proxy.yml` file is present or not:
```
ls -altrh $AcceloHome/config/docker/addons/ad-proxy.yml
```
- Generate the `ad-proxy.yml` file if it's not present:
```
accelo admin makeconfig ad-proxy
```
OUTPUT
```
[root@hostname:addons (ad-default)]$ accelo admin makeconfig ad-proxy
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/addons/ad-proxy.yml'.
If the addon is already up and running, use './accelo deploy addons' to remove and recreate the addon service.
```
- Modify the `ad-proxy.yml` file:
a. Open the ad-proxy.yml file:
```
vi $AcceloHome/config/docker/addons/ad-proxy.yml
```
b. Change the host port in the ports list to the desired port:
```
ports:
  - <DESIRED_HOST_PORT>:443
```
The final file will look like this if the host port is 6003:
```
version: "2"
services:
  ad-proxy:
    image: ad-proxy
    container_name: ""
    environment: []
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data01/acceldata/config/proxy/traefik.toml:/etc/traefik/traefik.toml
      - /data01/acceldata/config/proxy/config.toml:/etc/traefik/conf/config.toml
      - /data01/acceldata/config/proxy/certs:/etc/acceldata
    ulimits: {}
    ports:
      - 6003:443
    depends_on: []
    opts: {}
    restart: ""
    extra_hosts: []
    network_alias: []
label: Proxy
```
c. Save the file.
- Restart the `ad-proxy` container:
```
accelo restart ad-proxy
```
- Check that there aren't any errors:
```
docker logs -f ad-proxy_default
```
- Now you can access the Pulse UI at `https://<pulse-server-hostname>:6003`.
Set Up LDAP for Pulse UI
- Check if the `ldap.conf` file is present or not:
```
ls -al $AcceloHome/config/ldap/ldap.conf
```
- Run the `accelo configure ldap` command to generate the default `ldap.conf` if it's not already present:
```
accelo configure ldap
```
OUTPUT
```
There is no ldap config file available
Generating a new ldap config file
Please edit '$AcceloHome/config/ldap/ldap.conf' and rerun this command
```
- Edit the file at path `$AcceloHome/config/ldap/ldap.conf`:
```
vi $AcceloHome/config/ldap/ldap.conf
```
- Configure the file for the following properties:
- `host`: the FQDN where the LDAP server is running, e.g. `host = [FQDN]`.
- `insecureNoSSL`: set `insecureNoSSL = true` if port 389 (plain LDAP) is being used.
- `rootCA`: the SSL root CA certificate, e.g. `rootCA = [CERTIFICATE_FILE_PATH]`.
- `bindDN`: the DN used for `ldapsearch`; it needs to be a member of the admin group.
- `bindPW`: the `<encrypted-password-string>` for entering in the database.
- `encryptedPassword`: set this to `true` to enable the use of the encrypted password.
- baseDN used for the user search, e.g. `(cn=users, cn=accounts, dc=acceldata, dc=io)`.
- Filter used for the user search, e.g. `(objectClass=person)`.
- baseDN used for the group search, e.g. `(cn=groups, cn=accounts, dc=acceldata, dc=io)`.
- Object class used for the group search, e.g. `(objectClass=posixgroup)`.
Here is the command to check whether a user has search entry access and group access in the LDAP directory:
```
ldapsearch -x -h <hostname> -p 389 -D "uid=admins,cn=users,dc=acceldata,dc=io" -W -b "cn=accounts,dc=acceldata,dc=io" "(&(objectClass=person)(uid=admins))"
```
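If your directory requires LDAPS instead of plain LDAP on port 389, the equivalent search can be expressed with a URI; this variant is illustrative, and the DNs must be adjusted to your directory:

```
# LDAPS (TLS on port 636) form of the same search.
ldapsearch -x -H ldaps://<hostname>:636 -D "uid=admins,cn=users,dc=acceldata,dc=io" -W \
  -b "cn=accounts,dc=acceldata,dc=io" "(&(objectClass=person)(uid=admins))"
```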
- If the file is already generated, the command asks for the LDAP credentials to validate the connectivity and configuration, as described in the steps below.
- Run the `accelo configure ldap` command:
```
accelo configure ldap
```
- It will ask for the LDAP user credentials:
```
Checking LDAP connection
Enter LDAP username: gs
Enter LDAP password: *******
```
- If everything went correctly, the below confirmation message is displayed:
```
performing ldap search ou=users,dc=acceldata,dc=io sub (&(objectClass=inetOrgPerson)(uid=gs))
username "gs" mapped to entry cn=gs,ou=users,dc=acceldata,dc=io
Do you want to use this configuration: y
```
- Press 'y' and then press 'Enter'.
OUTPUT
```
Ok, Updating login properties.
✓ Done, You can now login using LDAP.
```
- Push the LDAP config:
```
accelo admin database push-config -a
```
- Run the deploy addons command:
```
accelo deploy addons
```
- Select LDAP from the list shown and press 'Enter':
```
  [ ] Job Runner
  [ ] Kafka 0.10.2 Connector
  [ ] Kafka Connector
> [x] LDAP
  [ ] Log Reduce
  [ ] LogSearch
  [ ] Memsql Connector
```
OUTPUT
```
Starting the deployment ..
Completed [==========================] 100.00% 0s
✓ Done, Addons deployment completed.
```
- Run the restart command:
```
accelo restart ad-graphql
```
- Open the Pulse Web UI and create default roles.
- Add an ops role with the required access; all incoming users with LDAP login will come under this role automatically.
Spark Jars Placements and Spark Config Changes
Perform the following steps on all the Spark cluster nodes:
- Add the following configuration in the `$SPARK_HOME/conf/metrics.properties` file for Spark time-series data:
```
[root@sac01 conf]# cat metrics.properties
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=12003
*.sink.graphite.protocol=tcp
*.sink.graphite.prefix=spark.metrics
*.sink.graphite.period=20
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```
- Add the following configuration in the `$SPARK_HOME/conf/spark-defaults.conf` file for the events data:
```
[root@sac01 conf]# cat spark-defaults.conf
spark.event_wait_period_sec 30
spark.eventLog.enabled true
spark.extraListeners io.acceldata.sparkstats.AdSparkListener
spark.ad.connector.context yarn;<CLUSTERNAME>;admin;test
spark.ad.events.url http://<PULSE SERVER HOSTNAME>:19005/eventsasync
```
- Take the `ad-spark-hook.jar` file and put it in the following directory:
```
$SPARK_HOME/jars/
```
- Restart all the Spark services.
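Before restarting, a quick per-node check that the hook JAR and both configuration edits are in place can save a debugging round. This is an illustrative sketch; paths assume the defaults used above:

```
# Each command should print a path; silence means the file or setting is missing.
ls $SPARK_HOME/jars/ad-spark-hook.jar
grep -l 'AdSparkListener' $SPARK_HOME/conf/spark-defaults.conf
grep -l 'GraphiteSink' $SPARK_HOME/conf/metrics.properties
```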
DotLog Download
We have introduced a feature that allows downloading of service logs in .log format. This file is not the original server log, but an XLSX sheet merged into .log format.
Perform the following to add a configurable parameter to enable this feature:
- Insert the dotLogFileDownload parameter into the FEATURE_FLAGS property of the `ad-graphql` section found in the file `$Acceldata_Home/config/docker/ad-core.yml`:
```
'FEATURE_FLAGS={ "ui_regex": { "regex": "ip-([^.]+)", "index": 1 }, "rename_nav_labels":{}, "timezone": "", "experimental": true, "themes": false, "hive_const":{ "HIVE_QUERY_COST_ENABLED": false, "HIVE_MEMORY_GBHOUR_COST": 0, "HIVE_VCORE_HOUR_COST": 0 }, "spark_const": { "SPARK_QUERY_COST_ENABLED": false, "SPARK_MEMORY_GBHOUR_COST": 0, "SPARK_VCORE_HOUR_COST": 0 }, "queryRecommendations": false, "hostIsTrialORLocalhost": false, "data_temp_string": "", "dotLogFileDownload": true }'
```
- Restart the ad-graphql service using the following command:
```
accelo restart ad-graphql
```
New Search Bar
Perform the following to enable new search options:
- Locate the "ad-graphql" section in the file `$Acceldata_Home/config/docker/ad-core.yml` and, under the "environment" key, add the following line:
```
- NEW_SEARCH=true
```
- Restart the ad-graphql service using the following command:
```
accelo restart ad-graphql
```
Does a user in the Spark Standalone environment still see the Spark option in the left menu even after their access has been revoked from the role?
Create a different role that does not have the Spark permission and assign that role to the user. Alternatively, you can leave it as is: even if the Spark entry is visible in the left navigation, the user cannot access it once the permission has been revoked from their role.
Are non-admin users in the Spark Standalone environment able to access Spark even though they have the appropriate role permissions for accessing Spark?
In the role edit window, click on "Select All" just below the Page permissions. Then, remove any permissions that you do not wish to grant and save the role. Any user assigned to this role should now have access to Spark in the Spark Standalone environment.
What is the reason for the absence of the Oozie workflow link between the Oozie workflow and the application ID in PULSE for a Spark job?
The Spark job's Application ID is generated by the Oozie service. It appears in the Pulse UI only if it is available in Oozie's Web Service UI; if it is not present there, it is not displayed in Pulse.