Install Pulse on a Single Node (Docker)
This document explains how to install Pulse on a single node.
For offline installations, untar the Pulse package tarball provided to you.
Pulse Installation Steps
- Create a directory for Pulse. You must set aside a directory path to host all the Pulse binaries and related data, including the various database data stores used by Pulse. Acceldata recommends creating a directory named acceldata in your preferred location, ensuring that the directory has the required user permissions to access the underlying storage, and allocating at least 200 GB for this location. This location is referred to as AcceloHome or Acceldata Home in this document.
- Create the directory by executing the mkdir command. In the following example, the directory is named acceldata and is created under the data01 mount directory. This directory is referred to as the AcceloHome directory in this document.
mkdir -p /data01/acceldata
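If Pulse will run as a non-root user, you may also want to hand the directory over to the 1000:1000 UID:GID noted later in this guide; a sketch, assuming that ownership model applies to your whole AcceloHome:
sudo chown -R 1000:1000 /data01/acceldata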
Initialise and Configure Pulse CLI
- Rename and move the accelo.linux file (downloaded as part of the prerequisites) to the AcceloHome directory by executing the following command:
mv accelo.linux /data01/acceldata/accelo
- Make the accelo file executable by executing the following command:
chmod +x /data01/acceldata/accelo
- Initialise the Pulse CLI by executing the following commands:
cd /data01/acceldata
./accelo init
- Accept Acceldata’s trial license agreement and enter your name and email address when prompted.
Note: You may encounter some errors when executing the above command. You can ignore them.
- Set up the OS environment variables AcceloHome and AcceloStack, and append the Pulse CLI binary to the OS $PATH variable. The exact steps differ depending on whether you are installing as a root user or a non-root user; a sketch follows this step.
Note: For non-root installations, 1000:1000 ownership must be maintained on the above directory at all times.
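A minimal sketch of the environment setup for a root user, assuming a profile script at /etc/profile.d/ad.sh and a placeholder AcceloStack value (non-root users would add the same exports to their own shell profile, for example ~/.bashrc):
# Sketch only: the file name and the AcceloStack value are assumptions for illustration.
cat <<'EOF' > /etc/profile.d/ad.sh
export AcceloHome=/data01/acceldata
export AcceloStack=default          # assumption: replace with your stack name
export PATH=$PATH:$AcceloHome
EOF
source /etc/profile.d/ad.sh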
- Once the source command from the above step succeeds, you can access the Pulse CLI from any directory.
- Run the accelo init command again, now that the CLI is on your $PATH:
accelo init
- When prompted, enter the ImageTag value as the version of Pulse you are installing, for example 1.8.2. This completes the Pulse CLI initialization process.
- Validate the versions of both the Pulse CLI and Pulse by running the following command.
accelo info
Use the arrow keys to move up and down, and press the space bar to select or unselect items when multiple options are shown. On older terminals, use CTRL + Backspace to delete typed text.
Avoid pressing CTRL + C or CTRL + Z while configuring Pulse using the CLI, as the CLI does not retain previously entered configuration information.
Set readOnlyRootFSEnabled Parameter to True
This step is optional; perform it only if you want to enhance the security of your containers.
To change the readOnlyRootFSEnabled setting from false to true, perform the following:
- Set readOnlyRootFSEnabled: true in the accelo.yml file.
- Run the following command:
accelo admin database push-config
- Restart accelo by running the following command:
accelo restart all -d
To ensure that all the containers have a read-only root file system, run the following command:
docker ps --quiet --all | xargs docker inspect --format '{{ .Id }}: ReadonlyRootfs={{ .HostConfig.ReadonlyRootfs }}'
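To surface only the containers that are not yet read-only, you can filter the same output (a sketch using standard docker inspect fields):
docker ps --quiet --all | xargs docker inspect --format '{{ .Name }}: ReadonlyRootfs={{ .HostConfig.ReadonlyRootfs }}' | grep 'ReadonlyRootfs=false'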
If any service requires additional paths to be mounted as tmpfs, you must add them to that service's <service>.yml file. For example, ad-proxy.yml:
version: "2"
services:
  ad-proxy:
    image: ad-proxy
    container_name: ""
    environment: []
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data01/acceldata/config/proxy/traefik.toml:/etc/traefik/traefik.toml
      - /data01/acceldata/config/proxy/config.toml:/etc/traefik/conf/config.toml
      - /data01/acceldata/config/proxy/certs:/etc/acceldata
    ulimits: {}
    ports:
      - 443:443
    depends_on: []
    opts: {}
    restart: ""
    extra_hosts: []
    network_alias: []
    tmpfs:
      /tmp: rw
    label: Proxy
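After restarting the service, you can confirm that the tmpfs mount took effect (a sketch, assuming the container is named ad-proxy):
docker inspect --format '{{ .HostConfig.Tmpfs }}' ad-proxy
The expected output includes map[/tmp:rw].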
Configure the Acceldata Core Components
- Execute the following command to configure installation of core components:
accelo config cluster
- The Acceldata CLI asks you for information about your environment. The following table lists the questions asked by the CLI and guidelines to help you answer them.
Questions asked by CLI | Guidelines for answering the questions |
---|---|
Is the 'Database Service' up and running? [y/n] | Type y if the database service is up. Else type n. |
Which distribution do you use? | Select Ambari, Cloudera, Spark-On-Kubernetes, Stand-Alone, Custom, None. |
Enter Your Cluster's Display Name: | Enter a display name for your cluster. This name is displayed on the cluster lists in the Pulse UI. |
Enter Ambari URL (with http/https): | Enter your Ambari URL along with the port. |
Enter Ambari Manager Username: | Enter the username of your Ambari/CM UI user account. |
Enter Ambari User Password: | Enter the password of your Ambari/CM UI user account. |
Enter the cluster name to use (MUST be all lowercase & unique) | Enter a name for your cluster. The name must be all lowercase and cannot contain hyphens; any hyphen you enter is converted to an underscore. Pulse uses this name internally to identify the cluster, so if you have multiple clusters, give each one a unique name. |
Which cluster to use? | Enter the cluster that you want to use. |
Select the stack version you would like to use: | Select the Hadoop Stack version that you want to use. |
Enter the Spark History HDFS path: | Enter the location path for installing the Spark History. For example: /user/spark/applicationHistory |
Enter the installed Kafka version: | The CLI auto-detects the Kafka version; if the version is greater than 0.10.2, it defaults to 0.11.0 and you can proceed. |
Enter the Spark3 History HDFS path: | Enter the location path for installing the Spark3 History. |
- If Hive Metastore is present in your cluster, the below questions are asked:
Questions asked by CLI | Guidelines for answering the questions |
---|---|
Enter the Hive Metastore MySQL DB Connection URL | Example: jdbc:mysql://<host name>/hive |
Enter the hive metastore Database Name | Enter the database name for your Hive metastore. |
Enter the hive metastore DB Username | Enter the username of your Hive metastore. |
Enter the hive metastore DB Password | Enter the password of your Hive metastore. |
- If Oozie is present in your cluster, the below questions are asked:
Questions asked by CLI | Guidelines for answering the questions |
---|---|
Oozie DB URL: | Example: jdbc:mysql://<oozieDB URL>/oozie |
Got the Oozie JDBC URL | Example: jdbc:mysql://<oozie JDBC URL>/oozie |
Enter the Oozie DB Username: | Enter the DB user name |
Enter the Oozie DB Password: | Enter the DB Password |
Based on your responses to the above questions, the discovered configuration is displayed as shown in the following block.
---------------------------Discovered configurations---------------------------------------
✓ Cluster Type: HDP
✓ HDP Version: 2.6.0
✓ Discovered Cluster Name: devcluster
✓ Discovered Services:
✓ SPARK2: 2.0.0
✓ HBASE: 1.1.2
✓ HIVE: 1.2.1000
✓ ACCUMULO: 1.7.0
✓ ZEPPELIN: 0.7.0
✓ LOGSEARCH: 0.5.0
✓ MAPREDUCE2: 2.7.3
✓ ZOOKEEPER: 3.4.6
✓ STORM: 1.1.0
✓ GANGLIA: 3.5.0
✓ DRUID: 0.10.1
✓ SPARK: 1.6.0
✓ RANGER_KMS: 0.7.0
✓ MAHOUT: 0.9.0
✓ SLIDER: 0.92.0
✓ YARN: 2.7.3
✓ SUPERSET: 0.15.0
✓ SQOOP: 1.4.6
✓ OOZIE: 4.2.0
✓ PIG: 0.16.0
✓ SMARTSENSE: 1.4.5.2.6.2.2-1
✓ KNOX: 0.12.0
✓ TEZ: 0.7.0
✓ HDFS: 2.7.3
✓ RANGER: 0.7.0
✓ KAFKA: 0.10.1
✓ AMBARI_INFRA: 0.1.0
✓ AMBARI_METRICS: 0.1.0
✓ FALCON: 0.10.0
✓ ATLAS: 0.8.0
✓ FLUME: 1.5.2
✓ KERBEROS: 1.10.3-10
✓ Yarn RM URI: http://<Yarn RM URI>:8088
✓ MapReduce Job History URI: http://<MapReduce Job History>:19888
✓ Yarn ATS URI: http://<Yarn ATS URI>:8188
✓ HDFS Namenode URI: webhdfs://<HDFS Namenode>
✓ Hive Metastore URI: thrift://<Hive Metastore>:9083
✓ Hive LLAP URI: http://<Hive LLAP URI>:10502
✓ Kafka Broker URI: http://hdp1000.<Kafka Broker URI>:6667,http://hdp1004.<Kafka Broker URI>:6667
✓ Zookeeper Server URI: http://hdp1000.<ZookeeperServer>:2181,http://hdp1001.<ZookeeperServer>:2181,http://hdp1002.<ZookeeperServer>:2181,http://hdp1004.<ZookeeperServer>:2181
- You are provided with an additional series of questions as presented in the table below.
Questions asked by CLI | Guidelines for answering the questions |
---|---|
Would you like to continue with the above configuration? | If you are satisfied with the above configurations printed on the screen, type y. Else type n to exit out and restart the configuration. |
Is Kerberos enabled in this cluster? | Type y if you have enabled Kerberos for your services. Else type n. |
Enter your Kerberos keytab username (Must have required HDFS permissions) | This question is displayed only if you typed y for the previous question. Enter the username for the Kerberos keytab. You must provide the user keytab. If you are providing any other user keytab, the user must have all the required permissions as mentioned in prerequisites. |
Enter the principal | Enter the principal that is required |
Enter full path to the Keytab file | Example: /root/hdfs.keytab |
Enter the krb5Conf file path: | Example:/etc/krb5.conf |
Is HTTPS Enabled in the Cluster? [Y/N] | Type Y if services such as HDFS, Hive, and Spark have HTTPS enabled; else type N. |
Enter the Java Keystore cacerts File Path: | Enter the Java Keystore cacerts File path |
Enter the Java Keystore jsseCaCerts File Path: | Enter the Java Keystore jsseCaCerts File Path |
- The Pulse CLI asks you the following questions. Provide the responses as per your environment settings.
Questions asked by CLI | Guidelines for answering the questions |
---|---|
SSH Key Algorithm used (RSA/DSA)? | The algorithm is either RSA or DSA. |
Which user should connect over SSH | Enter the username of the user who has the privilege to connect over SSH. |
SSH private key file path for connecting to hosts | Enter the path to the SSH key for the deployment. |
<IP address> is the IP address of the AccelData Pulse Server, Is this correct? [Y/N] | Type ‘y’ - if the IP address shown here is correct. This IP is used by the agents to communicate to the Pulse server from the cluster nodes. Type ‘n’ - to get an option to enter the correct IP address of the Pulse server. |
Select the components you would like to install: [Use arrows to move, space to select, <right> to all, <left> to none, type to filter] | [ x ] Zookeeper [ x ] Impala [ x ] Hdfs [ x ] MapReduce2 [ x ] Yarn [ x ] Metastore [ x ] HiveServer2 |
Is Kerberos Enabled for Impala? | Type 'y' - if Kerberos is enabled for Impala. Type 'n' - if Impala doesn’t have Kerberos. |
Do you want to enable Impala Agent: [Y/N]? | Type 'y' - if you want to enable Impala Agent. Type 'n' - if you do not want to enable Impala Agent. Important: Make sure to enable the Impala Connector along with the Impala Agent. For details, see Enable the Impala Connector. |
Would you like to enable NTP Stats? [y/n] | Type ‘y’ - If you have NTP enabled in your HDP cluster. |
Enter the JMX Port for hive_metastore | 8009 |
Enter the JMX Port for hive_server | 8008 |
Enter the JMX Port for zeppelin_master | 9996 |
Enter the JMX Port for zookeeper_server | 9010 |
Enter the Kafka JMX port | 9999 |
Would you like to install Kapxy? | Type Y to install Kapxy; else type N. |
Would you like to set up LogSearch? [y/n] | Type y to enable Logsearch. Else type n. |
- Validate the acceldata_<CLUSTER_NAME>.conf file: open the file located at /data01/acceldata/config/acceldata_<CLUSTER_NAME>.conf and verify that all the configurations are correct.
- Set the following property to true in the acceldata_<CLUSTER_NAME>.conf file, under the configuration section.
Important: You must perform this step only if you are using CDH 5.x and Hive on Spark.
hive.on.spark.legacy=true
- Edit the Tez events and Hive queries HDFS paths in the file /data01/acceldata/config/acceldata_<ClusterName>.conf (only if the distribution is CDP, not CDH).
- Use the following commands to replace the strings:
sed -i 's#/tmp/ad/tez#/warehouse/tablespace/managed/hive/sys.db/dag_data#g' /data01/acceldata/config/acceldata_<ClusterName>.conf
sed -i 's#/tmp/ad/hive#/warehouse/tablespace/managed/hive/sys.db/query_data#g' /data01/acceldata/config/acceldata_<ClusterName>.conf
- Open the acceldata_<CLUSTER_NAME>.conf file and verify if the strings have been changed to the following for CDP distribution:
tez.events.hdfs.path = "/warehouse/tablespace/managed/hive/sys.db/dag_data"
hive.queries.hdfs.path = "/warehouse/tablespace/managed/hive/sys.db/query_data"
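As a quick alternative to opening the file, a grep can confirm both values at once (a sketch; substitute your actual cluster name in the path):
grep -E '(tez\.events|hive\.queries)\.hdfs\.path' /data01/acceldata/config/acceldata_<ClusterName>.conf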
- If the provided Kerberos user is not hdfs, update hdfs to the given <user> in the file ACCELO_HOME/config/users/passwd.
- The Kafka connector is enabled by default with SASL_PLAINTEXT. If your cluster uses a different protocol, update the following two properties under kafka.connectors in config/acceldata_<cluster_name>.conf. Once the file is updated, run accelo admin database push-config.
batchSize = 5
zk_secure = false
consumerSecurityProtocol = "PLAINTEXT"
securityProtocol = "PLAINTEXT"
}]
}
]
Change the YARN Scheduler Type for Pulse
The Hadoop YARN service has the following scheduler types:
- Capacity (default)
- FAIR
- FIFO
If you are using the Capacity scheduler for YARN, skip this section. If you are using the FAIR scheduler, execute the following steps.
- Generate the ad-core configuration file (if not generated previously) by executing the following command:
accelo admin makeconfig ad-core
- Edit the ad-core.yml file located at <$AcceloHome>/config/docker. In the environment variables section of the ad-graphql service block, add or update the following boolean property (set it to false when the scheduler is not Capacity); a sketch follows.
CAP_SCHEDULER_ENABLED_YARN=false
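A sketch of where the property lands, assuming the ad-graphql block follows the same service-file layout as the ad-proxy.yml example above (other keys omitted):
services:
  ad-graphql:
    environment:
      - CAP_SCHEDULER_ENABLED_YARN=false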
Configure SSL/TLS for Pulse Configuration
If you have enabled TLS/SSL for any of the Hadoop components in your cluster, copy the cacerts and jsseCaCerts certificates to the Pulse node and enter their paths when the Accelo CLI asks the following questions.
- Select Y if the SSL/TLS is enabled.
Is HTTPS Enabled in the Cluster on UI Endpoint? [Y/N]:y
- Enter the certificate path.
Enter the Java Keystore cacerts File Path:/path/to/cert
Enter the Java Keystore jsseCaCerts File Path:/path/to/jsseCaCert
The following are the only services that connect to the respective Hadoop components of the cluster over the HTTPS URI:
- ad-connectors
- ad-sparkstats
- ad-streaming
- ad-fsanalyticsv2-connector
Set the permissions of the above files to 0655 by executing the following command:
chmod 0655 config/security/*
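You can optionally sanity-check a copied keystore before wiring it into the containers (a sketch; the default JKS password changeit is an assumption about your keystore):
keytool -list -keystore config/security/cacerts -storepass changeit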
Configure SSL/TLS for Ad-Connectors and Ad-Sparkstats
If you have configured ad-connectors and ad-sparkstats, execute the following steps.
- Generate the ad-core-connectors configuration file by executing the following command.
accelo admin makeconfig ad-core-connectors
- Navigate to the ad-core-connectors.yml file located in the <$AcceloHome>/config/docker/addons directory and add the following lines under the volumes section of both the ad-connectors and ad-sparkstats service blocks.
./config/security/cacerts:/usr/local/openjdk-8/lib/security/cacerts
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/jssecacerts
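In context, the additions sit under each service's volumes list, roughly as follows (a sketch; existing entries and other keys omitted):
services:
  ad-connectors:
    volumes:
      - ./config/security/cacerts:/usr/local/openjdk-8/lib/security/cacerts
      - ./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/jssecacerts
  ad-sparkstats:
    volumes:
      - ./config/security/cacerts:/usr/local/openjdk-8/lib/security/cacerts
      - ./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/jssecacerts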
Notes:
- If you have only the jssecacerts file available in your environment, mount it as the cacerts file inside the container by adding the following volume entry:
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/cacerts
- If you have only the cacerts file available, mount it as the cacerts file itself.
Configure SSL/TLS for Ad-Streaming
If you have configured ad-streaming, execute the following steps.
- Generate the ad-core configuration file by executing the following command:
accelo admin makeconfig ad-core
- Navigate to the ad-core.yml file located in the <$AcceloHome>/config/docker directory and add the following lines under the volumes section of the ad-streaming service block.
./config/security/cacerts:/usr/local/openjdk-8/lib/security/cacerts
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/jssecacerts
- If you have only the jssecacerts file available and not the cacerts file, mount the jssecacerts file as the cacerts file inside the container by adding the following volume entry:
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/cacerts
Configure SSL/TLS for Ad-fsanalyticsv2-Connector
If you have configured the ad-fsanalyticsv2-connector, execute the following steps.
- Generate the ad-fsanalyticsv2-connector configuration file by executing the following command:
accelo admin makeconfig ad-fsanalyticsv2-connector
- Navigate to the ad-fsanalyticsv2-connector.yml file located in the <$AcceloHome>/config/docker/addons directory and add the following lines under the volumes section.
./config/security/cacerts:/usr/local/openjdk-8/lib/security/cacerts
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/jssecacerts
- If you have only the jssecacerts file available and not the cacerts file, mount the jssecacerts file as the cacerts file inside the container by adding the following volume entry:
./config/security/jssecacerts:/usr/local/openjdk-8/lib/security/cacerts
Deploy the Pulse Core Components
- Create a file ACCELO_HOME/work/license and copy into it the license file contents shared by the Acceldata team.
- Log in to the Acceldata container registry and pull the Pulse service container images by executing the following command:
accelo login docker
- Execute the Deploy command.
accelo deploy core
The following questions are displayed:
- Have you verified the acceldata config file at '/data01/acceldata/config/acceldata.conf'? [y/n]: Type y and press Enter.
- Would you like to initiate DB with the config file '/data01/acceldata/config/acceldata.conf'? [y/n]: Type y and press Enter.
All the Core components of the Acceldata Pulse server are now up and running.
- Execute the following info command.
accelo info
Initialize Pulse Databases
This section is cluster specific. You must execute the steps in this section each time you add a new cluster to Pulse.
- Initialize and create indices for the DB collections by executing the following command.
accelo admin database index-db
Validate the Pulse Agent Configuration Files
- Verify the Kerberos auth end-points (applicable only if you have enabled Kerberos) as follows:
- Open the vars.yml file located in the <Accelo Home>/work directory.
- From the Cloudera Manager configuration, verify which of the component-specific JMX end-points have Kerberos enabled.
- If Kerberos is enabled for an end-point, set that end-point's Kerberos flag to "true". An example is given below.
datanode_host: localhost
datanode_interval: 60s
datanode_kerberized: "true"
datanode_port: "9865"
datanode_proto: https
datanode_replace_host: "true"
datanode_suffix: /jmx
datanode_timeout: 5s
Set datanode_kerberized in the configuration above to "true" if the end-point https://localhost:9865/jmx is Kerberos auth enabled.
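One quick way to check whether an end-point actually demands Kerberos/SPNEGO is to probe it and look for the HTTP status (a sketch; a 401 response usually indicates the end-point requires authentication):
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:9865/jmx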
- Repeat step 1.c for all the component-specific end-points and the impala-agent.
- Currently, Pulse does not support Kerberos for the hbase_scrapper/hbazer component. You can ignore the Kerberos settings for this component.
- If Kerberos is enabled in your cluster, do not modify the following property.
kerberos_enabled="true"
- Validate the Kerberos Principal in the property "kerberos_principle".
- Log in to one of the Kerberos-enabled cluster nodes and check the date format using the klist command.
The date format must be MM/DD/YYYY, DD/MM/YYYY, or MM/DD/YY.
- Update the date format for the klist_date_format property using the following reference values:
01 => Month
02 => Day
2006 => Year (four digits)
06 => Year (two digits)
Example: MM/DD/YYYY will be "01/02/2006".
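For instance, if klist on your node prints dates like 03/14/24 (MM/DD/YY), the property would be set as follows (a sketch; the exact placement of the key inside vars.yml is assumed):
klist_date_format: "01/02/06"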
- Verify that all the log file locations match the Cloudera Manager configuration: open the vars.yml file located in the <Accelo Home>/work directory and check the logs_locations section.
Example:
log_locations_yarn_application="/mnt/resource/yarn/container-logs/*/*/stderr,/mnt/resource/yarn/container-logs/*/*/syslog,/mnt/resource/yarn/container-logs/*/*/stdout,"
This example shows the YARN application log locations. Similarly, ensure that all the log locations are valid.
Hydra Configuration
Run the following command to generate the Hydra Configuration file.
accelo reconfig cluster
Deploy Pulse addon Components
- To enable FS Analytics, execute the following command. This command ensures that the directory is owned by GID and UID 1000.
sudo chown -R 1000:1000 /data01/acceldata/data/fsanalytics
- If you are installing Pulse as a non-root user, execute the following commands.
mkdir -p /data01/acceldata/data/elastic
mkdir -p /data01/acceldata/data/fsanalytics
sudo chown -R 1000:1000 /data01/acceldata/data/elastic
- Run the deploy command
accelo deploy addons
- Select the add-on components to install from the list. Use the up/down arrow keys to move and the space bar to select each add-on.
- Once you have selected all the required components, press Enter.
Common Pulse feature Addons - Acceldata SQL Analysis service, Alerts, Core Connectors, Director, Dashplots, FS Analytics V2, LogSearch, VMDB
Connector Options (if enabled) - Impala Connector, Kafka Connector (version > 0.10.2), Kafka 0.10.2 Connector, Oozie Connector.
Advanced feature Integration - Notifications, LDAP, Proxy
Select the components you would like to install: [Use arrows to move, space to select, type to filter]
[x] Alerts
[ ] Acceldata Metastore
[ ] Acceldata SQL Analysis service
[x] Director
[x] Core Connectors
[x] Dashplots
[x] FS Analytics V2
[ ] Impala Connector
[ ] Kafka 0.10.2 Connector
[x] Kafka Connector
[ ] LDAP
[ ] Notifications
[x] LogSearch
[x] FS Elastic
[ ] HA GraphQL
[x] Hydra
[ ] Job Runner
[ ] Proxy
[ ] VMDB (in Pulse Ver 2.0 this has been removed from addons)
- If you have enabled FS Analytics, execute the following command after waiting for about 10 seconds:
accelo admin fsa load
Deploy Pulse Agent
For CDP or CDH, see Install Agents: CDP - Hydra Services (Parcel)
For Non CDP, see Install Agents: Non-CDP - Hydra Services