Deploy Yarn Optimizer
For the YARN Optimizer to function, the principal in the Kerberos keytab used by Pulse must be permitted to run yarn rmadmin -updateNodeResource. For more information, see Apache Hadoop 3.3.6 - Yarn Commands.
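For orientation, the sketch below prints the general shape of that admin command as documented for Apache Hadoop 3.x. It only builds and prints the command line and never runs it. The helper name build_update_cmd and the node ID, memory, and vcore values are hypothetical, and whether Pulse invokes the CLI in exactly this form is an assumption.

```shell
# Print (do not run) an rmadmin call for one NodeManager.
# Usage: build_update_cmd <node_id> <memory_mb> <vcores> [overcommit_timeout_seconds]
build_update_cmd() (
  node_id=$1
  mem_mb=$2
  vcores=$3
  timeout=${4:-}
  cmd="yarn rmadmin -updateNodeResource $node_id $mem_mb $vcores"
  # The overcommit timeout is an optional trailing argument.
  if [ -n "$timeout" ]; then
    cmd="$cmd $timeout"
  fi
  printf '%s\n' "$cmd"
)

# Example with made-up values (hypothetical host and sizes):
build_update_cmd node1.example.com:45454 65536 16 300
```

Running the printed command by hand as the Pulse principal on a test node is one way to check that the permission is in place.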
You can deploy the YARN Optimizer safely by limiting it to a subset of the cluster nodes that are available for optimization.
The behavior of the optimization algorithm can be controlled using the following settings:
- Reserved buffer memory for non-YARN processes
- Maximum percentage of memory to be overcommitted
- Maximum step size for adjusting the amount of overcommitted memory every 5 seconds
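To make the interaction of these three settings concrete, here is a purely illustrative sketch of one adjustment cycle. It is not the optimizer's actual algorithm, which this page does not describe. The function name next_overcommit_mb, its parameters, and the rule "target = free memory minus reserved buffer" are all assumptions made for the example.

```shell
# One hypothetical 5-second adjustment cycle (illustration only).
# Usage: next_overcommit_mb <current_mb> <free_mb> <configured_mb> <buffer_mb> <max_pct> <step_mb>
next_overcommit_mb() (
  current=$1 free=$2 configured=$3 buffer=$4 max_pct=$5 step=$6

  # Ceiling: at most max_pct percent of the node's configured YARN memory.
  cap=$(( configured * max_pct / 100 ))

  # Assumed target: free memory left after the reserved buffer,
  # never negative and never above the ceiling.
  target=$(( free - buffer ))
  if [ "$target" -lt 0 ]; then target=0; fi
  if [ "$target" -gt "$cap" ]; then target=$cap; fi

  # Move toward the target by at most one step per cycle.
  delta=$(( target - current ))
  if [ "$delta" -gt "$step" ]; then delta=$step; fi
  if [ "$delta" -lt "$(( 0 - step ))" ]; then delta=$(( 0 - step )); fi

  echo "$(( current + delta ))"
)
```

In this model, a larger reserved buffer lowers the target, a smaller maximum percentage lowers the ceiling, and a smaller step size makes the overcommitted amount change more gradually from one cycle to the next.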
To deploy the Yarn Optimizer server, perform the following steps:
Deploying the Add-on
- Execute the following command to start the add-on deployment process:
    accelo deploy addons
- From the list of available components, choose the Yarn Optimizer service by navigating with the arrow keys and selecting it.
    [root@sac04:~ (ad-default)]$ accelo deploy addons
    WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
    INFO: Active Cluster: zeus
    ? Select the components you would like to install: [Use arrows to move, space to select, <right> to all, <left> to none, type to filter]
      [ ] Oozie Connector
      [ ] Proxy
      [ ] QUERY ROUTER DB
      [ ] Recommendation Service
      [ ] SHARD SERVER DB
      [ ] StandAlone Connector
    > [X] Yarn optimizer
- Press Enter to deploy the Yarn Optimizer container.
Generating the Docker Configuration File:
- Generate the Docker configuration file by running the following command. This action creates a necessary configuration file for the Yarn Optimizer.
    accelo admin makeconfig ad-yarn-optimizer
- Once generated, you'll need to review and possibly edit the file located at /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml. This step ensures the settings align with your specific requirements.
Output:
    [root@sac04:~ (ad-default)]$ accelo admin makeconfig ad-yarn-optimizer
    WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
    ✓ Done, Configuration file generated
    IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml'.
    If the addon is already up and running, use './accelo deploy addons' to remove and recreate the addon service.
    [root@sac04:~ (ad-default)]$ cat /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml
    version: "1"
    services:
      ad-yarn-optimizer:
        image: ad-yarn-optimizer
        container_name: ""
        environment:
          - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXqyScAm+LrS8Y9EWT8A8/30
          - NATS_HOST=ad-events
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - /data01/acceldata/config/krb/security:/krb/security
          - /etc/hosts:/etc/hosts:ro
        ulimits: {}
        ports:
          - 19888:9888
        depends_on: []
        opts: {}
        restart: ""
        extra_hosts: []
        network_alias: []
    label: Yarn optimizer
Restarting the Docker Container:
- To restart the Yarn Optimizer container at any point, use the following command:
    accelo restart ad-yarn-optimizer
Deploy the Yarn Metrics Agent via Ansible
Deploying the Yarn Metrics Agent:
- Ensure any older versions of the agents are uninstalled by executing:
    accelo uninstall remote
- Deploy the new agents using:
    accelo deploy addons
Configuring the Yarn Metrics Agent:
- For enabling or disabling the Yarn Metrics Agent during deployment, utilize the HYDRA HOSTS feature flag.
- For agents already installed (Pulse Version 3.3.20), the enable/disable feature is managed through the VARS YAML file.
Using the HYDRA HOSTS YAML File:
This file allows for the copying of the default file to predetermined locations. Ansible will decide whether to start the agent based on the enabled or disabled state of the feature flag.
To modify the yarn_opt_enable flag:
- Open the hydra_hosts.yml file located at $AcceloHome/work/<clusterName>/hydra_hosts.yml.
- Change the yarn_opt_enable value under the vars section to true.
- Save your changes.
- Uninstall the agents that are already running using accelo uninstall remote.
- Deploy the agents again with the new config: accelo deploy hydra.
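Before redeploying, it can help to confirm which value the flag currently holds. A minimal sketch, assuming yarn_opt_enable appears in the file as a plain "key: value" line; the helper name yarn_opt_flag is hypothetical and is not part of accelo.

```shell
# Print the value of the first yarn_opt_enable entry found in a YAML file.
# Usage: yarn_opt_flag <path-to-hydra_hosts.yml>
yarn_opt_flag() (
  awk -F':' '
    /^[ \t]*yarn_opt_enable[ \t]*:/ {
      value = $2
      gsub(/[" \t\r]/, "", value)   # drop double quotes and whitespace
      print value
      exit
    }
  ' "$1"
)

# Example (path taken from the steps above; not executed here):
#   yarn_opt_flag "$AcceloHome/work/<clusterName>/hydra_hosts.yml"
```

The function prints nothing if the key is absent, which is itself a useful signal that the file does not carry the flag.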
Using the VARS YAML File:
- This file generates service and configuration files from the vars.yml data. If the 3.3.20 agents are already installed with the yarn_opt_enable flag set to false in hydra_hosts.yml, use the vars.yml file to start the pulseyarnmetrics agent.
- For the CDP environment, it is mandatory to configure the correct node ID port. For details, see Configure the OverCommit Timeout Value and Node ID Port.
To enable the Yarn Metrics Agent when the yarn_opt_enable flag is set to false:
- Create an override.yml file if it doesn't exist:
    touch $AcceloHome/work/<clustername>/override.yml
- Open the override.yml file and add the following configuration, ensuring that yarn_opt_enable is set to true.
    vi $AcceloHome/work/<clustername>/override.yml

    base:
      yarn_opt_enable: true
- Save the file and run the following command to apply the new configuration:
    accelo reconfig cluster
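The create-and-edit steps above can also be scripted. A minimal sketch, assuming override.yml either does not exist yet or has no base: section; the helper name enable_yarn_opt_override is hypothetical and is not an accelo command. It deliberately refuses to modify a file that already contains the flag or a base: section, because appending a second base: key would produce ambiguous YAML.

```shell
# Create override.yml if needed and add the yarn_opt_enable flag once.
# Usage: enable_yarn_opt_override <path-to-override.yml>
enable_yarn_opt_override() (
  file=$1

  # Same effect as touch for a missing file.
  [ -f "$file" ] || : > "$file"

  if grep -q '^[[:space:]]*yarn_opt_enable:' "$file"; then
    echo "yarn_opt_enable is already present in $file; edit it by hand" >&2
    exit 1
  fi
  if grep -q '^base:' "$file"; then
    echo "$file already has a base: section; add the flag there by hand" >&2
    exit 1
  fi

  printf 'base:\n  yarn_opt_enable: true\n' >> "$file"
)

# Example (path taken from the steps above; not executed here):
#   enable_yarn_opt_override "$AcceloHome/work/<clustername>/override.yml"
```

The configuration still has to be applied afterwards with accelo reconfig cluster.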
Verifying the Agent's Status:
To confirm if the agent is operational, perform the following steps:
- Log into one of the YARN Node Manager nodes and run:
    systemctl status pulseyarnmetrics
- For log details, use:
    journalctl -u pulseyarnmetrics
Sample Output:
    [root@hdp201 ~]# journalctl -u pulseyarnmetrics
    -- Logs begin at Mon 2024-01-15 17:22:26 IST, end at Fri 2024-03-22 15:47:45 IST. --
    Mar 21 22:10:23 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
    Mar 21 22:10:23 hdp201.acceldata.dvl yarnmetrics[37475]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
    Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
    Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
    Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
    Mar 21 22:10:25 hdp201.acceldata.dvl yarnmetrics[37594]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
    Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
    Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
    Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
    Mar 21 22:10:26 hdp201.acceldata.dvl yarnmetrics[37656]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
    Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
    Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
    Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
    Mar 21 22:10:27 hdp201.acceldata.dvl yarnmetrics[37759]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
    Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
    Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
    Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
    Mar 21 22:10:29 hdp201.acceldata.dvl yarnmetrics[37942]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
    Mar 21 22:10:30 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Deploy Agents via HYStaller
If you are installing agents through HYStaller, follow these instructions:
Environment Variable for Service Control: The YARN_OPT_ENABLE variable is used to enable or disable the service through HYStaller.
Obtain HYStaller:
- Download the HYStaller binary from the License UI.
Install Hydra Agent:
- Before installation, replace <PULSE_HOSTNAME> with the fully qualified domain name (FQDN) of your Pulse server.
- Execute the script below to stop and disable existing services before proceeding with the HYStaller installation:
    ((sudo systemctl stop hydra && sudo systemctl disable hydra || true) && (sudo systemctl stop pulsenode && sudo systemctl disable pulsenode || true) && (sudo systemctl stop pulsejmx && sudo systemctl disable pulsejmx || true) && (sudo systemctl stop pulselogs && sudo systemctl disable pulselogs || true) || true)
    sudo chown root:root /tmp/hystaller
    sudo chmod 0700 /tmp/hystaller
    ls -l /tmp/hystaller
    sudo /tmp/hystaller uninstall
- Configure and Install: Set up your environment variables and execute the HYStaller with the install command:
    PULSE_HOME="/opt/pulse"
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
    HYDRA_SERVER_URL="http://<PULSE_HOSTNAME>:19072"
    HYDRA_HEARTBEAT_DURATION="60"
    HYDRA_PARCEL_MODE="False"
    HYDRA_HOSTNAME_CASE="lower"
    HYDRA_HOSTNAME_METHOD="CMD"
    HYDRA_HEARTBEAT_JITTER="10"
    YARN_OPT_ENABLE="false"
    sudo env "PULSE_HOME=$PULSE_HOME" "PATH=$PATH" "HYDRA_SERVER_URL=$HYDRA_SERVER_URL" "HYDRA_HEARTBEAT_DURATION=$HYDRA_HEARTBEAT_DURATION" "HYDRA_PARCEL_MODE=$HYDRA_PARCEL_MODE" "HYDRA_HOSTNAME_CASE=$HYDRA_HOSTNAME_CASE" "HYDRA_HOSTNAME_METHOD=$HYDRA_HOSTNAME_METHOD" "HYDRA_HEARTBEAT_JITTER=$HYDRA_HEARTBEAT_JITTER" "YARN_OPT_ENABLE=$YARN_OPT_ENABLE" /tmp/hystaller install
Configure User Account for Running the Pulseyarnmetrics Agent
This is only required if the agent is not running properly and you want to change the user for it.
Perform the following steps:
- Update the User Configuration in vars.yml:
  - Locate the agent parameter within the vars.yml file and change the yarnmetrics_user value to the desired username (default is "yarn"):
      agent:
        yarnmetrics_user: newuser
- Create or Update the override.yml File:
  - If absent, generate a new override.yml file within your cluster's working directory:
      touch $AcceloHome/work/<clustername>/override.yml
  - Edit the override.yml file to include the updated user configuration:
      vi $AcceloHome/work/<clustername>/override.yml
  - Ensure the file contains the following lines, adjusting the yarnmetrics_user as necessary:
      agent:
        yarnmetrics_user: newuser
- Apply Configuration Changes: Save the modifications and execute the accelo reconfig cluster command to update the cluster with the new user settings.
      accelo reconfig cluster
- Verify the Change:
  - Check the systemd unit file on one of the nodes to confirm the new user is specified:
      cat /etc/systemd/system/pulseyarnmetrics.service
  - The User field in the service configuration should reflect the new user account chosen.
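The final check can be reduced to a one-line comparison instead of reading the whole unit file. A minimal sketch, assuming the unit file contains a single "User=<name>" line; the helper names unit_user and check_unit_user are hypothetical.

```shell
# Print the account named on the first User= line of a systemd unit file.
# Usage: unit_user <unit-file>
unit_user() (
  sed -n 's/^User=//p' "$1" | head -n 1
)

# Succeed (exit status 0) only if the unit file names the expected account.
# Usage: check_unit_user <unit-file> <expected-user>
check_unit_user() (
  actual=$(unit_user "$1")
  [ "$actual" = "$2" ]
)

# Example (path taken from the steps above; not executed here):
#   check_unit_user /etc/systemd/system/pulseyarnmetrics.service newuser && echo "user updated"
```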
Configure the OverCommit Timeout Value and Node ID Port
Follow these steps to configure the overcommit timeout value, which reverts the changes if NATS is unreachable. You can also configure the Node ID port. With this configuration, each node reverts to its original value if NATS is unreachable for the specified amount of time.
To fetch the node ID port, run the command yarn node -list on any cluster node.
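The port is the part after the colon in the Node-Id column of that listing. A minimal sketch for pulling it out, assuming the usual layout in which each node row starts with a host:port Node-Id; the helper name node_id_ports is hypothetical.

```shell
# Read `yarn node -list` output on stdin and print the distinct Node-Id ports.
node_id_ports() {
  # Only rows whose first column looks like host:port are considered, which
  # skips the "Total Nodes" line, the column header, and client log lines.
  awk '$1 ~ /^[^:]+:[0-9]+$/ { n = split($1, parts, ":"); print parts[n] }' | sort -u
}

# Example (not executed here):
#   yarn node -list 2>/dev/null | node_id_ports
```

If the command prints more than one port, the NodeManagers are not configured uniformly, and a single yarn_nodeid_port value would not fit every node.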
The parameters in the vars.yml file are as follows. By default, the OverCommit timeout value is set to 300 seconds.
    base:
      yarn_opt_overcommit_timeout_enabled: "true"
      yarn_opt_overcommit_timeout_seconds: "300"
      yarn_nodeid_port: "45454"
- Create the override.yml file if not present already.
    touch $AcceloHome/work/<clustername>/override.yml
- Open the override.yml file.
    vi $AcceloHome/work/<clustername>/override.yml
- Put the following configuration in the override.yml file if not present already.
    base:
      yarn_opt_overcommit_timeout_enabled: "true"
      yarn_opt_overcommit_timeout_seconds: "100"
      yarn_nodeid_port: "8041"
- Save the file.
- Run the reconfig cluster command.
    accelo reconfig cluster
- Verify the config file for the pulseyarnmetrics agent.
    cat /opt/pulse/yarnmetrics/config/yarnmetrics.conf
    {
      "clusterName": "odp_zoro",
      "resourcemanagerIP": "odp102.acceldata.dvl",
      "resourcemanagerPort": 8088,
      "isKerberosEnabled": false,
      "keytabPath": "/opt/pulse/node/config/node.keytab",
      "kerberosPrinciple": "hdfs@ACCELDATA.COM",
      "kerberosDateFormat": "01/02/2006",
      "victoriaDBEndpoint": "http://plat02.acceldata.dvl:19043/insert/1385609323/influx",
      "natsEndpoint": "http://plat02.acceldata.dvl:19009",
      "natsConsumerAckWait": 20,
      "isMetricsProcessingNode": false,
      "overcommit_timeout_enabled": true,
      "overcommit_timeout_seconds": 100,
      "nodeIdPort": 8041
    }
Enable REST API call for Overcommitment of Node Memory
To enable the REST API call, perform the following steps:
- Run the following command.
    accelo admin makeconfig ad-yarn-optimizer
- Add the following environment variable in the ad-yarn-optimizer.yml file.
    IS_REST_EVENT_SUPPORTED=true
Enable Node Limit as Configured in YARN Configurations
To make the optimizer take into account the memory limit configured in the Ambari UI > Yarn Configurations, perform the following steps:
- Set the environment variable to true in ad-yarn-optimizer.yml.
    ENABLE_LIMIT YARN_STATIC_CONFIG to true
- Restart the service using the following command.
    accelo restart ad-yarn-optimizer
Enable Static Queue Optimization
Follow these steps to enable static queue optimization.
- Set the "ENABLE_STATIC_QUEUES_OVERCOMMITMENT" environment variable to True in ad-yarn-optimizer.yml.
- In case of the CDP setup, set "QUEUE_UPDATE_API" to "/ws/v1/cluster/scheduler-conf?user.name=hdfs" in ad-yarn-optimizer.yml.
- Restart the ad-yarn-optimizer service by running the following command.
    accelo restart ad-yarn-optimizer
- Set enableStaticQueue to True in the FEATURE_FLAGS environment variable in ad-graphql.
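Because several of these features depend on entries in the environment list of ad-yarn-optimizer.yml, a quick check before restarting can catch a missed edit. A minimal sketch, assuming each setting is written as a "- NAME=value" list item as in the generated file shown earlier; the helper name missing_env_vars is hypothetical, and the value passed must match the spelling used in your file.

```shell
# Print every NAME=value setting that is NOT present as a "- NAME=value"
# list item in the given file. Prints nothing when all settings are found.
# Usage: missing_env_vars <yaml-file> NAME=value [NAME=value ...]
missing_env_vars() (
  file=$1
  shift
  for setting in "$@"; do
    # -F matches the setting literally, so "?" and "." are not patterns.
    grep -qF -- "- $setting" "$file" || printf '%s\n' "$setting"
  done
)

# Example (path from the earlier steps; not executed here):
#   missing_env_vars /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml \
#     'QUEUE_UPDATE_API=/ws/v1/cluster/scheduler-conf?user.name=hdfs'
```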