Pulse Server Configuration Requirements
Pulse currently supports clusters running Hadoop ecosystem components on RHEL, CentOS, Ubuntu, or SUSE operating systems.
OS & Software Requirements
Pulse Server (Supported Details):
Software | Details |
---|---|
OS | RHEL, CentOS, Ubuntu, or SUSE |
JDK | OpenJDK or Oracle JDK, with JAVA_HOME set and added to the OS $PATH environment variable |
Docker Versions | Docker CE 20.10.x or later; check the installed version with docker version |
File-Creation Mode | Run umask to check that it is set to 022; if not, run umask 022 and echo "umask 0022" >> /etc/profile |
SELinux | If enabled, either disable it or set it to Permissive mode. If Docker is already installed, restart the Docker daemon after the change. |
DNS | DNS resolution is required from the Pulse server to the cluster nodes and from the cluster nodes to the Pulse server. Test forward and reverse lookup of hosts with the nslookup command. |
Additional OS Properties | |
*SLES Note: The offline installation option with Docker CE RPMs is currently available only for CentOS/RHEL. On other distributions, enable the subscription manager to download the Docker CE repositories; for SLES, the Containers Module x86_64 extension must be enabled via SUSEConnect or YaST.
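The OS checks in the table above (file-creation mask, Docker version, SELinux mode) can be scripted before installation. A minimal sketch, assuming docker and getenforce may or may not be installed on the host; the check_umask helper is hypothetical, not part of Pulse:

```shell
#!/bin/sh
# Sketch of the pre-install checks from the requirements table.
# check_umask is a hypothetical helper: it passes only for the
# required 022 file-creation mask (umask may print it as 0022).
check_umask() {
  case "$1" in
    022|0022) echo "umask ok" ;;
    *)        echo "umask is $1, expected 022" ;;
  esac
}

check_umask "$(umask)"

# Docker and SELinux checks are environment-dependent; guard for
# hosts where the tools are not installed.
if command -v docker >/dev/null 2>&1; then
  docker version --format '{{.Server.Version}}' 2>/dev/null || echo "docker daemon not reachable"
else
  echo "docker not installed"
fi

if command -v getenforce >/dev/null 2>&1; then
  getenforce   # expect Disabled or Permissive per the table above
else
  echo "SELinux tools not present"
fi
```

Run this on the intended Pulse server host before starting the installation; each line of output maps to one row of the table.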
User Requirements
Pulse Server Installation
- Root (preferred) or a non-root user with the access required to install, configure, and start/stop Docker services.
- The user must have the access required to load, restart, and terminate Docker containers and images.
Pulse Agent Installation
SSH Based Deployment
- An SSH user with a common password or password-less (RSA private key) access from the core Pulse server, used to deploy agents on all nodes with a single deploy command. The SSH user must have password-less sudo access on all cluster nodes, required for agent package installation on the OS.
- Disable the TTY requirement for the respective user to run sudo on all cluster nodes.
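Both requirements above are typically met with a drop-in sudoers file on each cluster node. A sketch, assuming the deployment user is named pulse (substitute your actual SSH user); apply it with visudo so syntax errors are caught:

```
# /etc/sudoers.d/pulse  (edit with: visudo -f /etc/sudoers.d/pulse)
# Disable the TTY requirement and grant password-less sudo for the
# hypothetical deployment user "pulse".
Defaults:pulse !requiretty
pulse ALL=(ALL) NOPASSWD: ALL
```

You can verify the result non-interactively from the Pulse server with `ssh <node> 'sudo -n true'`, which fails if either a password or a TTY is still required.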
Custom Installation Steps
- The Acceldata team will provide commands to the system administrator for installing the Hydra agent binaries on all nodes.
Parcel Based Deployment (only available for CDH 6.x / CDP 7.x)
- Pulse also provides a parcel-based approach for agent deployment, allowing the agent to be distributed, activated, and installed directly via Cloudera Manager.
Security
For Kerberos-enabled clusters:
A headless keytab file for the HDFS user (principal example: hdfs-cluster@TEST.COM) is required for connectors, file system analytics, and exploration.
Special case requirements:
- hdfs user not allowed - a different user with additional property changes and HDFS path privileges is needed (refer to the cluster configuration changes section)
- hdfs user does not have access to Kafka topic metadata - the Kafka service keytab must be present on one broker for accessing Kafka metadata details, or use a user with admin privileges at the ACL level (refer to the cluster configuration changes section)
- 403 Forbidden error on FS Analytics with the hdfs user - requires the NameNode service keytab (example: nn/nn_host@TEST.COM)
Place the /etc/krb5.conf file from the respective cluster on the Pulse server.
Network connectivity is required from the Pulse server to the respective KDCs of the realm to acquire the required Kerberos tickets.
- Run the following command to test connectivity: kinit -kt <keytab_path> <principal_name>
For SSSD + Kerberos + AD enabled clusters, the specified user (such as hdfs, or another user) must exist on the Pulse Linux servers.
Provide the list of services with Kerberos enabled on their UI endpoints, such as the HDFS NameNode UI and the Resource Manager UI.
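Before running the kinit test, it helps to confirm that the principal's realm is actually defined in the krb5.conf copied from the cluster. A minimal sketch, where realm_of is a hypothetical helper that extracts the realm from a principal string such as the examples in this section:

```shell
#!/bin/sh
# realm_of: hypothetical helper, extracts the realm from a Kerberos
# principal, e.g. hdfs-cluster@TEST.COM -> TEST.COM
realm_of() {
  echo "${1##*@}"
}

principal="hdfs-cluster@TEST.COM"   # example principal from this doc
realm=$(realm_of "$principal")
echo "realm: $realm"

# On the Pulse server you would then confirm the realm is present in
# the copied krb5.conf and that a ticket can be acquired:
#   grep -A2 "$realm" /etc/krb5.conf
#   kinit -kt <keytab_path> "$principal" && klist
```

The same helper works for service principals such as nn/nn_host@TEST.COM, which resolves to the same realm.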
For SSL/HTTPS Enabled Clusters:
- Truststore and certificate files must be available on a node used to connect to SSL-enabled services such as HDFS, Hive, and YARN.
- Provide the list of services with TLS/SSL enabled.
For LDAP Integration:
- A read-only user for the LDAP interface, able to query group and user information to validate user logins.
- Network connectivity from Pulse server to respective LDAP servers.
For SMTP Integration:
- Network connectivity from Pulse server to SMTP hosts
Network
Pulse port list for Agent to Server and User accessibility:
- Hadoop Service - For the alert and connectors modules, requires connectivity from the Acceldata Pulse server to the Hadoop hosts and their configured ports (request the master list from the Acceldata team)
- Internal – Refers to connectivity from Acceldata Pulse Agents to Acceldata Pulse Server(s). The services can be deployed in multi-node pattern for which they need to be exposed to all the Pulse hosted nodes.
- External – Refers to ports that end users within the enterprise network need access to. These ports should be accessible from your workstation when connected directly to the enterprise network or via VPN.
The following table provides details of the different ports used by Pulse services:
New port requirements for Mongo Sharding and YARN Optimizer:
- Mongo Sharding: 30000 (Router), 27017 (Config Server), 27018 (Shard Server by default), 27019
- YARN Optimizer: 19888, 19889
Component | Port | Internal or External | Protocol |
---|---|---|---|
ad-graphql | 4000/User Defined | external | http/https |
ad-proxy | 443/User Defined | external | http/https |
ad-connectors | 19003, 19025 | internal | tcp |
ad-sparkstats | 19004 | internal | tcp |
ad-streaming | 19005 | external | http |
ad-vminsert | 19043 | external | http/https |
ad-vmselect | 19042 | internal | tcp |
ad-events | 19009, 19008 | external | tcp |
ad-logstash | 19012, 19051 | external | http |
ad-elastic | 19013, 19014 | external | http |
ad-director | 19016 | internal | tcp |
ad-fsanalyticsv2-connector | 19027 | external | http |
ad-fs-elastic | 19038, 19039 | external | http |
ad-ldap | 19020 | internal | http |
ad-db | 27017,27018,27019,27020,30000 | external | tcp |
ad-sql-analyser | 19030 | internal | http |
ad-notification | 8090 | external | http |
ad-hydra | 19072 | external | http |
ad-dashplots | 18080 | internal | http |
ad-alerts | 19015 | internal | tcp |
ad-yarn-optimizer | 19888 | external | http |
pulseyarnmetrics(systemd agent) | 19889 | external | http |
ad-axnserver | 19999 | external | http/https |
Reserved Pool Ports
If Pulse is configured against multiple Hadoop clusters, each with its own KDC servers, the number of connectors grows in direct proportion to the number of distinct KDC servers configured.
To support this deployment model, we suggest allocating a block of ports from 19080 to 19099. Pulse services will then be configured to select ports from this designated range.
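The suggested block can be checked with a small helper when assigning connector ports. A sketch, where in_reserved_pool is a hypothetical function testing a port against the 19080-19099 range above:

```shell
#!/bin/sh
# in_reserved_pool: hypothetical helper, reports whether a port falls
# inside the suggested 19080-19099 connector pool.
in_reserved_pool() {
  if [ "$1" -ge 19080 ] && [ "$1" -le 19099 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

in_reserved_pool 19085   # inside the suggested pool
in_reserved_pool 19003   # a fixed service port from the table, outside the pool

# Before assigning a pool port you could also verify nothing is
# already listening on it, e.g.:
#   ss -ltn 'sport = :19085'
```

Reserving the whole 19080-19099 block in firewall rules up front avoids per-connector firewall changes as clusters (and KDCs) are added.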
Files and Additional Information
- Ambari/Cloudera Manager admin or read-only credentials, used by the Pulse CLI for service-host discovery mapping.
- Place /etc/hadoop/conf/hdfs-site.xml and /etc/hadoop/conf/core-site.xml on the Pulse server
- Share the size of the FS Image
- Hadoop Service to Port mapping discovery (Optional)
- Various Log Paths for services, applications, and OS-level Syslogs.
- Host to Service Mapping discovery