Deploy Yarn Optimizer

For the YARN Optimizer to function, the Kerberos keytab for the Pulse product must be granted the yarn rmadmin -updateNodeResource permission. For more information, see Apache Hadoop 3.3.6 - Yarn Commands.
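For orientation, the Hadoop command covered by this permission takes a node ID, a memory size, and a vcore count, with an optional overcommit timeout. The sketch below only prints the command it would run; the helper name is hypothetical, and the host name, memory, and vcore values are placeholders rather than values from a Pulse deployment.

```shell
# Hypothetical helper: build (but do not run) a yarn rmadmin -updateNodeResource call.
# Arguments: <node-id as host:port> <memory> <vcores> [overcommit-timeout]
build_update_node_resource() {
  local node_id="$1" mem="$2" vcores="$3" timeout="${4:-}"
  local cmd="yarn rmadmin -updateNodeResource ${node_id} ${mem} ${vcores}"
  # Append the optional overcommit timeout only when it was supplied.
  if [ -n "$timeout" ]; then
    cmd="${cmd} ${timeout}"
  fi
  printf '%s\n' "$cmd"
}

# Placeholder values for illustration only.
build_update_node_resource "worker01.example.com:45454" 65536 16 300
```

Printing the command instead of executing it makes it possible to review the arguments before anything changes on the cluster.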

You can deploy the YARN Optimizer safely by limiting optimization to a subset of the nodes available in the cluster.

The behavior of the optimization algorithm can be controlled using the following settings:

Reserved buffer memory for non-YARN processes
Maximum percentage of memory to be overcommitted
Maximum step size for adjusting the amount of overcommitted memory every 5 seconds

To deploy the Yarn Optimizer server, perform the following steps:

Deploying the Add-on

Execute the command accelo deploy addons to start the add-on deployment process.

Bash
    
 
accelo deploy addons

From the list of available components, choose the Yarn Optimizer service by navigating with the arrow keys and selecting it.

Bash
    
[root@sac04:~ (ad-default)]$ accelo deploy addons
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
INFO: Active Cluster:  zeus
? Select the components you would like to install:   [Use arrows to move, space to select, <right> to all, <left> to none, type to filter]
  [ ]  Oozie Connector
  [ ]  Proxy
  [ ]  QUERY ROUTER DB
  [ ]  Recommendation Service
  [ ]  SHARD SERVER DB
  [ ]  StandAlone Connector
> [X]  Yarn optimizer

Press Enter to deploy the Yarn Optimizer container.

Generating the Docker Configuration File:

Generate the Docker configuration file by running accelo admin makeconfig ad-yarn-optimizer. This action creates a necessary configuration file for the Yarn Optimizer.

Bash
    
 
accelo admin makeconfig ad-yarn-optimizer

Once generated, you'll need to review and possibly edit the file located at /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml. This step ensures the settings align with your specific requirements.

Output:

Bash
    
 
[root@sac04:~ (ad-default)]$ accelo admin makeconfig ad-yarn-optimizer
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml'.
If the addon is already up and running, use './accelo deploy addons' to remove and recreate the addon service.
[root@sac04:~ (ad-default)]$ cat /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml
version: "1"
services:
  ad-yarn-optimizer:
    image: ad-yarn-optimizer
    container_name: ""
    environment:
    - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXqyScAm+LrS8Y9EWT8A8/30
    - NATS_HOST=ad-events
    volumes:
    - /etc/localtime:/etc/localtime:ro
    - /data01/acceldata/config/krb/security:/krb/security
    - /etc/hosts:/etc/hosts:ro
    ulimits: {}
    ports:
    - 19888:9888
    depends_on: []
    opts: {}
    restart: ""
    extra_hosts: []
    network_alias: []
label: Yarn optimizer

Restarting the Docker Container:

To restart the Yarn Optimizer container at any point, use the command accelo restart ad-yarn-optimizer.

Bash
    
 
accelo restart ad-yarn-optimizer

Deploy the Yarn Metrics Agent via Ansible

Deploying the Yarn Metrics Agent:

Ensure any older versions of the agents are uninstalled by executing accelo uninstall remote.

Bash
    
 
accelo uninstall remote

Deploy the new agents using accelo deploy addons.

Bash
    
 
accelo deploy addons

Configuring the Yarn Metrics Agent:

To enable or disable the Yarn Metrics Agent during deployment, use the feature flag in the HYDRA HOSTS YAML file.
For agents that are already installed (Pulse version 3.3.20), enable or disable the agent through the VARS YAML file.

Using the HYDRA HOSTS YAML File:

This file is used to copy the default file to predetermined locations. Ansible then starts the agent only if the feature flag is enabled.

To modify the yarn_opt_enable flag:

Open the hydra_hosts.yml file located at $AcceloHome/work/<clusterName>/hydra_hosts.yml.
Change the yarn_opt_enable value under the vars section to true.
Save your changes.
Uninstall the agents that are already running using accelo uninstall remote.
Deploy the agents again with the new config: accelo deploy hydra.

Using the VARS YAML File:

This file generates the service and configuration files from the data in vars.yml. If the 3.3.20 agents are already installed with the yarn_opt_enable flag set to false in hydra_hosts.yml, use the vars.yml file to start the pulseyarnmetrics agent.
For the CDP environment, it is mandatory to configure the correct node ID port. For details, see Configure the OverCommit Timeout Value and Node ID Port.

To enable the Yarn Metrics Agent when the yarn_opt_enable flag is set to false:

Create an override.yml file if it doesn't exist:

Bash
    
 
touch $AcceloHome/work/<clustername>/override.yml

Open the override.yml file and add the following configuration, ensuring the yarn_opt_enable is set to true.

Bash
    
 
vi $AcceloHome/work/<clustername>/override.yml

YAML

base:
    yarn_opt_enable: true

Save the file and run accelo reconfig cluster to apply the new configuration.

Bash
    
 
accelo reconfig cluster
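The create-then-edit steps above can be sketched as a small guard that writes the minimal override only when no override.yml exists, so an existing file is never overwritten. This is an illustration, not part of the accelo tooling: the function name is hypothetical, and merging the base section into an existing file is still left to manual editing.

```shell
# Hypothetical helper: create override.yml with the flag enabled, but only if it is absent.
ensure_override() {
  local file="$1"
  if [ -e "$file" ]; then
    # Do not touch an existing file; it may already hold other overrides.
    echo "exists: $file (merge the base section by hand)"
    return 0
  fi
  printf 'base:\n    yarn_opt_enable: true\n' > "$file"
  echo "created: $file"
}
```

A call would pass the quoted path of the override.yml file in the cluster's working directory, followed by accelo reconfig cluster as described above.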

Verifying the Agent's Status:

To confirm if the agent is operational, perform the following steps:

Log into one of the YARN Node Manager nodes and run systemctl status pulseyarnmetrics.

Bash
    
 
systemctl status pulseyarnmetrics

For log details, use journalctl -u pulseyarnmetrics.

Bash
    
 
journalctl -u pulseyarnmetrics

Sample Output:

Bash
    
[root@hdp201 ~]# journalctl -u pulseyarnmetrics
-- Logs begin at Mon 2024-01-15 17:22:26 IST, end at Fri 2024-03-22 15:47:45 IST. --
Mar 21 22:10:23 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:23 hdp201.acceldata.dvl yarnmetrics[37475]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:25 hdp201.acceldata.dvl yarnmetrics[37594]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:26 hdp201.acceldata.dvl yarnmetrics[37656]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:27 hdp201.acceldata.dvl yarnmetrics[37759]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:29 hdp201.acceldata.dvl yarnmetrics[37942]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:30 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
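When the journal is long, a quick count of configuration load failures can show whether the agent is stuck in a restart loop like the one in the sample output. The following is a minimal sketch; the function name is hypothetical, and the matched text is taken from the sample log lines above.

```shell
# Hypothetical helper: count "Cannot load the yarnMetric config" lines read from stdin.
count_config_errors() {
  # grep -c prints the count; "|| true" keeps the exit status 0 when the count is 0.
  grep -c 'Cannot load the yarnMetric config' || true
}
```

For example, journalctl -u pulseyarnmetrics --no-pager | count_config_errors prints the number of failed loads; a non-zero count suggests reviewing /opt/pulse/yarnmetrics/config/yarnmetrics.conf.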

Deploy Agents via HYStaller

If you are installing agents through HYStaller, follow these instructions:

Environment Variable for Service Control: The YARN_OPT_ENABLE variable is used to enable or disable the service through HYStaller.

Obtain HYStaller:
1. Download the HYStaller binary from the License UI.
Install Hydra Agent:
1. Before installation, replace <PULSE_HOSTNAME> with the fully qualified domain name (FQDN) of your Pulse server.
2. Execute the script below to stop and disable existing services before proceeding with the HYStaller installation:

Bash
    
((sudo systemctl stop hydra && sudo systemctl disable hydra || true) && (sudo systemctl stop pulsenode && sudo systemctl disable pulsenode || true) && (sudo systemctl stop pulsejmx && sudo systemctl disable pulsejmx || true) && (sudo systemctl stop pulselogs && sudo systemctl disable pulselogs || true) || true)
sudo chown root:root /tmp/hystaller
sudo chmod 0700 /tmp/hystaller
ls -l /tmp/hystaller
sudo /tmp/hystaller uninstall

Configure and Install: Set up your environment variables and execute the HYStaller with the install command:

Bash

PULSE_HOME="/opt/pulse"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
HYDRA_SERVER_URL="http://<PULSE_HOSTNAME>:19072"
HYDRA_HEARTBEAT_DURATION="60"
HYDRA_PARCEL_MODE="False"
HYDRA_HOSTNAME_CASE="lower"
HYDRA_HOSTNAME_METHOD="CMD"
HYDRA_HEARTBEAT_JITTER="10"
YARN_OPT_ENABLE="false"

sudo env "PULSE_HOME=$PULSE_HOME" "PATH=$PATH" "HYDRA_SERVER_URL=$HYDRA_SERVER_URL" "HYDRA_HEARTBEAT_DURATION=$HYDRA_HEARTBEAT_DURATION" "HYDRA_PARCEL_MODE=$HYDRA_PARCEL_MODE" "HYDRA_HOSTNAME_CASE=$HYDRA_HOSTNAME_CASE" "HYDRA_HOSTNAME_METHOD=$HYDRA_HOSTNAME_METHOD" "HYDRA_HEARTBEAT_JITTER=$HYDRA_HEARTBEAT_JITTER" "YARN_OPT_ENABLE=$YARN_OPT_ENABLE" /tmp/hystaller install
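Because the install command is easy to run with the <PULSE_HOSTNAME> placeholder still in place, a small pre-check can catch that before HYStaller is invoked. This is a sketch under the assumption that the server URL is always an http or https URL; the function name is hypothetical and not part of HYStaller.

```shell
# Hypothetical pre-check: reject a server URL that still contains a <...> placeholder
# or that does not look like an http(s) URL.
check_server_url() {
  case "$1" in
    *'<'*|*'>'*)
      echo "placeholder not replaced: $1" >&2
      return 1 ;;
    http://*|https://*)
      return 0 ;;
    *)
      echo "not an http(s) URL: $1" >&2
      return 1 ;;
  esac
}
```

Running check_server_url "$HYDRA_SERVER_URL" before the install line returns a non-zero status while the placeholder is still present.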

Configure User Account for Running the Pulseyarnmetrics Agent

This step is required only if the agent is not running properly and you want to run it as a different user.

Perform the following steps:

Update the User Configuration in vars.yml:
1. Locate the agent parameter within the vars.yml file and change the yarnmetrics_user value to the desired username (default is "yarn"):

YAML

agent:
    yarnmetrics_user: newuser

Create or Update the override.yml File:

If absent, generate a new override.yml file within your cluster's working directory:

Bash
    
 
touch $AcceloHome/work/<clustername>/override.yml

Edit the override.yml file to include the updated user configuration:

Bash
    
 
vi $AcceloHome/work/<clustername>/override.yml

Ensure the file contains the following lines, adjusting the yarnmetrics_user as necessary:

YAML

agent:
    yarnmetrics_user: newuser

Apply Configuration Changes: Save the modifications and execute the accelo reconfig cluster command to update the cluster with the new user settings.

Bash
    
 
accelo reconfig cluster

Verify the Change:

Check the systemd unit file on one of the nodes to confirm the new user is specified:

Bash
    
 
cat /etc/systemd/system/pulseyarnmetrics.service

The User field in the service configuration should reflect the new user account chosen.
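Instead of reading the whole unit file, the User value can be extracted directly. A minimal sketch, assuming the unit file contains a plain User= line; the function name is hypothetical.

```shell
# Hypothetical helper: print the value of the User= line from a systemd unit file.
unit_user() {
  # If User= appears more than once, report the last occurrence.
  sed -n 's/^User=//p' "$1" | tail -n 1
}
```

For example, unit_user /etc/systemd/system/pulseyarnmetrics.service should print the account set in yarnmetrics_user once the reconfiguration has been applied.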

Configure the OverCommit Timeout Value and Node ID Port

Follow these steps to configure the overcommit timeout. If NATS is unreachable for the configured amount of time, each node reverts to its original value. You can also configure the Node ID port.

To find the Node ID port, run the command yarn node -list on any cluster node.
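The Node-Id column of the yarn node -list output has the form host:port, so the port can be pulled out with a short filter. The sketch below assumes that column layout and reads the command output from stdin; the function name is hypothetical.

```shell
# Hypothetical helper: read "yarn node -list" output on stdin and print the
# distinct ports found in the first (Node-Id) column.
node_id_ports() {
  # Only lines whose first field looks like host:port are considered, which
  # skips the "Total Nodes" line and the column header.
  awk '$1 ~ /^[^:]+:[0-9]+$/ { split($1, a, ":"); print a[2] }' | sort -u
}
```

For example, yarn node -list 2>/dev/null | node_id_ports prints one line per distinct port; more than one line means the nodes do not share a single Node ID port.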

The parameters in the vars.yml file are as follows. By default, the overcommit timeout is set to 300 seconds.

YAML

base:
    yarn_opt_overcommit_timeout_enabled: "true"
    yarn_opt_overcommit_timeout_seconds: "300"
    yarn_nodeid_port: "45454"

Create the override.yml file if not present already.

Bash
    
 
touch $AcceloHome/work/<clustername>/override.yml

Open the override.yml file.

Bash
    
 
vi $AcceloHome/work/<clustername>/override.yml

Add the following configuration to the override.yml file if it is not already present.

YAML

base:
  yarn_opt_overcommit_timeout_enabled: "true"
  yarn_opt_overcommit_timeout_seconds: "100"
  yarn_nodeid_port: "8041"

Save the file.
Run the accelo reconfig cluster command.

Bash
    
 
accelo reconfig cluster

Verify the config file for the pulseyarnmetrics agent.

Bash
    
 
cat /opt/pulse/yarnmetrics/config/yarnmetrics.conf
{
  "clusterName": "odp_zoro",
  "resourcemanagerIP": "odp102.acceldata.dvl",
  "resourcemanagerPort": 8088,
  "isKerberosEnabled": false,
  "keytabPath": "/opt/pulse/node/config/node.keytab",
  "kerberosPrinciple": "hdfs@ACCELDATA.COM",
  "kerberosDateFormat": "01/02/2006",
  "victoriaDBEndpoint": "http://plat02.acceldata.dvl:19043/insert/1385609323/influx",
  "natsEndpoint": "http://plat02.acceldata.dvl:19009",
  "natsConsumerAckWait": 20,
  "isMetricsProcessingNode": false,
  "overcommit_timeout_enabled": true,
  "overcommit_timeout_seconds": 100,
  "nodeIdPort": 8041
}
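To check only the values changed in this procedure, the relevant keys can be read from the file directly. This is a rough sketch, assuming the file keeps one "key": value pair per line; it is a text match, not a JSON parser, and the function name is hypothetical.

```shell
# Hypothetical helper: print the raw value of a "key": value line in yarnmetrics.conf.
conf_value() {
  local file="$1" key="$2"
  awk -v k="\"$key\"" '
    index($0, k) {
      val = $0
      sub(/^[^:]*:[ \t]*/, "", val)     # drop everything up to the first colon
      sub(/[ \t]*,?[ \t]*$/, "", val)   # drop a trailing comma and whitespace
      print val
      exit
    }' "$file"
}
```

For example, conf_value /opt/pulse/yarnmetrics/config/yarnmetrics.conf nodeIdPort should print the port set in override.yml, and overcommit_timeout_seconds should print the configured timeout.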

Enable REST API call for Overcommitment of Node Memory

To enable the REST API call, perform the following steps:

Run the following command.

Bash
    
 
accelo admin makeconfig ad-yarn-optimizer

Add the following environment variable to the ad-yarn-optimizer.yml file.

Bash
    
 
IS_REST_EVENT_SUPPORTED=true

Enable Node Limit as Configured in YARN Configurations

Use this feature to make the optimizer honor the memory limit configured in Ambari UI > Yarn Configurations.

Follow these steps to enable this feature:

Set the following environment variable to true in ad-yarn-optimizer.yml.

Bash
    
 
ENABLE_LIMIT YARN_STATIC_CONFIG to true

Restart the service using the following command.

Bash
    
 
accelo restart ad-yarn-optimizer

Enable Static Queue Optimization

Follow these steps to enable static queue optimization.

Set the ENABLE_STATIC_QUEUES_OVERCOMMITMENT environment variable to True in ad-yarn-optimizer.yml.
For a CDP setup, set QUEUE_UPDATE_API to "/ws/v1/cluster/scheduler-conf?user.name=hdfs" in ad-yarn-optimizer.yml.
Restart the ad-yarn-optimizer service by running the following command.

Bash
    
 
accelo restart ad-yarn-optimizer

Set enableStaticQueue to True in the FEATURE_FLAGS environment variable in ad-graphql.
