Deploy Yarn Optimizer

For the YARN Optimizer to function, the Kerberos keytab used by Pulse must be granted permission to run yarn rmadmin -updateNodeResource. For more information, see Apache Hadoop 3.3.6 - Yarn Commands.

You can deploy the YARN Optimizer safely by restricting it to a subset of the cluster nodes that are available for optimization.

The behavior of the optimization algorithm can be controlled using the following settings:

  • Reserved buffer memory for non-YARN processes
  • Maximum percentage of memory to be overcommitted
  • Maximum step size for adjusting the amount of overcommitted memory every 5 seconds
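
To show how these three settings could interact, here is a minimal sketch. It is not the optimizer's actual algorithm: the function name, the argument order, and the arithmetic are all assumptions made for illustration. It caps the overcommitted memory at a percentage of the memory left after the reserved buffer, and moves toward the desired value by at most one step per 5-second cycle.

```shell
# Hypothetical illustration only: the function name, argument order, and
# arithmetic are assumptions, not the Pulse YARN Optimizer implementation.

# next_overcommit_mb TOTAL RESERVED MAX_PCT STEP CURRENT WANTED   (all in MB)
# Prints the overcommitted memory to apply in the next 5-second cycle.
# The body runs in a subshell "( ... )" so its variables stay local.
next_overcommit_mb() (
  total=$1 reserved=$2 max_pct=$3 step=$4 current=$5 wanted=$6
  # Memory left for YARN after keeping the buffer for non-YARN processes.
  usable=$(( total - reserved ))
  [ "$usable" -lt 0 ] && usable=0
  # Ceiling: at most max_pct percent of the usable memory is overcommitted.
  cap=$(( usable * max_pct / 100 ))
  target=$wanted
  [ "$target" -gt "$cap" ] && target=$cap
  [ "$target" -lt 0 ] && target=0
  # Move toward the target by at most one step per cycle, up or down.
  delta=$(( target - current ))
  [ "$delta" -gt "$step" ] && delta=$step
  [ "$delta" -lt $(( 0 - step )) ] && delta=$(( 0 - step ))
  echo $(( current + delta ))
)

# Example: 64 GiB node, 8 GiB buffer, 20% cap, 2 GiB step, starting from 0.
next_overcommit_mb 65536 8192 20 2048 0 16384   # prints 2048
```

In this sketch a small step size makes the overcommitted amount change gradually, while the percentage cap bounds it regardless of how much is requested.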

To deploy the Yarn Optimizer server, perform the following steps:

Deploying the Add-on

  1. Run accelo deploy addons to start the add-on deployment process.
  2. From the list of available components, use the arrow keys to navigate to the Yarn Optimizer service and select it.
  3. Press Enter to deploy the Yarn Optimizer container.

Generating the Docker Configuration File:

  1. Run accelo admin makeconfig ad-yarn-optimizer to generate the Docker configuration file that the Yarn Optimizer requires.
  2. Review the generated file at /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml and edit it if needed so that the settings match your requirements.


Restarting the Docker Container:

  1. To restart the Yarn Optimizer container at any point, run accelo restart ad-yarn-optimizer.
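
The deployment commands named in this section can be collected into one script. The sketch below uses only the accelo subcommands mentioned on this page; the DRY_RUN switch and the run helper are hypothetical conveniences, so that the script prints the commands by default instead of executing them.

```shell
# Sketch of the deployment sequence. The accelo subcommands come from this
# page; DRY_RUN and run() are hypothetical helpers added for illustration.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default for safety)

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_yarn_optimizer() {
  run accelo deploy addons                       # interactive: select Yarn Optimizer
  run accelo admin makeconfig ad-yarn-optimizer  # generate ad-yarn-optimizer.yml
  run accelo restart ad-yarn-optimizer           # restart the container
}

deploy_yarn_optimizer
```

Set DRY_RUN=0 on the Pulse server to run the commands for real. The first command is interactive, so the component still has to be selected by hand.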

Deploy the Yarn Metrics Agent via Ansible

Deploying the Yarn Metrics Agent:

  1. Uninstall any older versions of the agents by running accelo uninstall remote.
  2. Deploy the new agents by running accelo deploy addons.

Configuring the Yarn Metrics Agent:

  • To enable or disable the Yarn Metrics Agent during deployment, use the feature flag in the HYDRA HOSTS file.
  • For agents that are already installed (Pulse Version 3.3.20), enable or disable the agent through the VARS YAML file.

Using the HYDRA HOSTS YAML File:

This file lets the default file be copied to predetermined locations. Ansible decides whether to start the agent based on whether the feature flag is enabled or disabled.

To modify the yarn_opt_enable flag:

  1. Open the hydra_hosts.yml file located at $AcceloHome/work/<clusterName>/hydra_hosts.yml.
  2. Change the yarn_opt_enable value under the vars section to true.
  3. Save your changes.
  4. Uninstall the agents that are already running using accelo uninstall remote.
  5. Deploy the agents again with the new config: accelo deploy hydra.
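
Steps 1 to 3 can also be done non-interactively. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, and the sample file layout in the demo is invented for illustration, so check the real hydra_hosts.yml before relying on it. The helper keeps a backup and rewrites only the value of the yarn_opt_enable key.

```shell
# Hypothetical helper: sets yarn_opt_enable to true in a hydra_hosts.yml-style
# file. The sample layout used in the demo is an assumption, not the real file.

enable_yarn_opt() {
  file="$1"
  cp "$file" "$file.bak"    # keep a backup before editing in place
  # Rewrite only the value, preserving the key's indentation.
  sed -i.tmp -E 's/^([[:space:]]*yarn_opt_enable:[[:space:]]*).*/\1true/' "$file"
  rm -f "$file.tmp"
}

# Demo on a throwaway file, not on $AcceloHome/work/<clusterName>/hydra_hosts.yml.
demo=$(mktemp)
printf 'all:\n  vars:\n    yarn_opt_enable: false\n' > "$demo"
enable_yarn_opt "$demo"
grep yarn_opt_enable "$demo"    # shows that the flag is now true
rm -f "$demo" "$demo.bak"
```

After editing the real file, continue with accelo uninstall remote and accelo deploy hydra as described in steps 4 and 5.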

Using the VARS YAML File:

  • This file generates the service and configuration files from the vars.yml data. If the 3.3.20 agents are already installed with the yarn_opt_enable flag set to false in hydra_hosts.yml, use the vars.yml file to start the pulseyarnmetrics agent.
  • For the CDP environment, it is mandatory to configure the correct node ID port. For details, see Configure the OverCommit Timeout Value and Node ID Port.

To enable the Yarn Metrics Agent when the yarn_opt_enable flag is set to false:

  1. Create an override.yml file if it doesn't exist.
  2. Open the override.yml file and add the agent configuration, ensuring that yarn_opt_enable is set to true.
  3. Save the file and run accelo reconfig cluster to apply the new configuration.
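
Before running accelo reconfig cluster, it can help to check which value would take effect. The sketch below assumes that a key set in override.yml takes precedence over the same key in vars.yml; that precedence, the helper names, and the sample file layout are assumptions for illustration and are not confirmed by this page. The parsing is a simple line match, not a full YAML parser.

```shell
# Hypothetical helpers. Assumption: a key in override.yml wins over the same
# key in vars.yml. The matching is line-based, not real YAML parsing.

# yaml_value KEY FILE : prints the first "KEY: value" found, at any indentation
yaml_value() (
  [ -f "$2" ] || exit 0
  sed -n -E "s/^[[:space:]]*$1:[[:space:]]*([^#[:space:]]+).*/\1/p" "$2" | head -n 1
)

# effective_value KEY BASE_FILE OVERRIDE_FILE
effective_value() (
  v=$(yaml_value "$1" "$3")
  [ -n "$v" ] || v=$(yaml_value "$1" "$2")
  echo "$v"
)

# Demo with throwaway files whose layout is invented for this example.
dir=$(mktemp -d)
printf 'agent:\n  yarn_opt_enable: false\n' > "$dir/vars.yml"
printf 'agent:\n  yarn_opt_enable: true\n'  > "$dir/override.yml"
effective_value yarn_opt_enable "$dir/vars.yml" "$dir/override.yml"   # prints true
rm -rf "$dir"
```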

Verifying the Agent's Status:

To confirm if the agent is operational, perform the following steps:

  1. Log in to one of the YARN NodeManager nodes and run systemctl status pulseyarnmetrics.
  2. For log details, run journalctl -u pulseyarnmetrics.
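
The two checks above can be combined into a small script for use on each NodeManager node. This is a sketch, not part of the product: the agent_state helper is hypothetical, and it falls back to "unknown" when systemctl is not available. It uses systemctl is-active for a one-word state and prints recent journal lines only when the unit is not active.

```shell
# Sketch: report the state of the pulseyarnmetrics unit on this node.
# agent_state is a hypothetical helper name.
UNIT="pulseyarnmetrics"

agent_state() (
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "unknown"
    exit 0
  fi
  # is-active prints a single word such as "active", "inactive", or "failed".
  state=$(systemctl is-active "$UNIT" 2>/dev/null) || true
  echo "${state:-unknown}"
)

state=$(agent_state)
echo "$UNIT: $state"
if [ "$state" != "active" ] && command -v journalctl >/dev/null 2>&1; then
  # Show the most recent log lines to help diagnose the problem.
  journalctl -u "$UNIT" -n 20 --no-pager || true
fi
```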


Deploy Agents via HYStaller

If you are installing agents through HYStaller, follow these instructions:

Environment Variable for Service Control: The YARN_OPT_ENABLE variable is used to enable or disable the service through HYStaller.

  1. Obtain HYStaller:

    1. Download the HYStaller binary from the License UI.
  2. Install Hydra Agent:

    1. Before installation, replace <PULSE_HOSTNAME> with the fully qualified domain name (FQDN) of your Pulse server.
    2. Run the script that stops and disables the existing services before proceeding with the HYStaller installation.
  3. Configure and Install: Set up your environment variables and run HYStaller with the install command.

Configure User Account for Running the Pulseyarnmetrics Agent

This configuration is required only if the agent is not running properly and you want to change the user account that it runs as.

Perform the following steps:

  1. Update the User Configuration in vars.yml:
    1. Locate the agent parameter in the vars.yml file and change the yarnmetrics_user value to the desired username (the default is "yarn").
  2. Create or Update the override.yml File:
    • If the file is absent, generate a new override.yml file in your cluster's working directory.
    • Edit the override.yml file to include the updated user configuration.
    • Ensure that the file contains the yarnmetrics_user setting, adjusted as necessary.
  3. Apply Configuration Changes: Save the modifications and run accelo reconfig cluster to update the cluster with the new user settings.
  4. Verify the Change:
    • Check the systemd unit file on one of the nodes to confirm that the new user is specified.
    • The User field in the service configuration should reflect the new user account.
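
The verification step can be scripted by reading the User= directive from the unit definition. The sketch below is an illustration: unit_user is a hypothetical helper that reads unit file text from standard input, and the demo unit contents are invented. On a node, the input would typically come from systemctl cat pulseyarnmetrics.

```shell
# Hypothetical helper: prints the value of the last User= directive found in
# unit file text read from standard input.
unit_user() {
  sed -n 's/^User=//p' | tail -n 1
}

# Demo with invented unit contents. On a real node, use instead:
#   systemctl cat pulseyarnmetrics | unit_user
printf '[Unit]\nDescription=example\n\n[Service]\nUser=yarn\nExecStart=/bin/true\n' | unit_user   # prints yarn
```

If the printed name does not match the yarnmetrics_user value you configured, the reconfiguration has not reached that node yet.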

Configure the OverCommit Timeout Value and Node ID Port

Follow these steps to configure the overcommit timeout value and the Node ID port. With the timeout configured, each node reverts to its original value if NATS is unreachable for the specified amount of time.

To fetch the Node ID port, run yarn node -list on any cluster node.
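
The Node-Id column of that command has the form host:port, so the port can be extracted with a short filter. This is a sketch: node_id_ports is a hypothetical helper, and the sample input in the demo uses invented host names and an example port rather than output captured from a real cluster.

```shell
# Hypothetical helper: reads `yarn node -list` output on standard input and
# prints the distinct ports found in the Node-Id column (first field, host:port).
node_id_ports() {
  awk '$1 ~ /^[^:]+:[0-9]+$/ { split($1, p, ":"); print p[2] }' | sort -u
}

# Demo with invented sample input. On a cluster node, use instead:
#   yarn node -list | node_id_ports
printf '%s\n' \
  'Total Nodes:2' \
  '         Node-Id   Node-State   Node-Http-Address   Number-of-Running-Containers' \
  'nm1.example.com:45454   RUNNING   nm1.example.com:8042   0' \
  'nm2.example.com:45454   RUNNING   nm2.example.com:8042   3' | node_id_ports   # prints 45454
```

If more than one port is printed, the NodeManagers are not using a uniform port and the configured value needs a closer look.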

The parameters are defined in the vars.yml file. By default, the overcommit timeout value is set to 300 seconds.

  1. Create the override.yml file if it is not already present.
  2. Open the override.yml file.
  3. Add the overcommit timeout and Node ID port configuration to the override.yml file if it is not already present.
  4. Save the file.
  5. Run accelo reconfig cluster.
  6. Verify the configuration file for the pulseyarnmetrics agent.
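
The effect of the timeout can be pictured as a simple elapsed-time check. The sketch below is not the agent's code: the function name, the use of epoch seconds, and the choice of "greater than or equal" as the comparison are assumptions. Only the 300-second default comes from this page.

```shell
# Hypothetical illustration of the timeout rule; not the agent's actual code.
# should_revert LAST_OK NOW [TIMEOUT]
#   LAST_OK : epoch seconds when NATS was last reachable
#   NOW     : current epoch seconds
#   TIMEOUT : seconds to wait before reverting (default 300, as on this page)
# Succeeds (exit status 0) when the node should revert to its original value.
should_revert() {
  [ $(( $2 - $1 )) -ge "${3:-300}" ]
}

if should_revert 1000 1300; then echo "revert"; else echo "keep"; fi   # prints revert
if should_revert 1000 1100; then echo "revert"; else echo "keep"; fi   # prints keep
```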

Enable REST API call for Overcommitment of Node Memory

To enable the REST API call, perform the following steps:

  1. Run the following command.
  2. Add the following environment variable to the ad-yarn-optimizer.yml file.

Enable Node Limit as Configured in YARN Configurations

Follow these steps to apply the memory limit configured under Ambari UI > Yarn Configurations:

  1. Set the environment variable to true in ad-yarn-optimizer.yml.
  2. Restart the service by running accelo restart ad-yarn-optimizer.

Enable Static Queue Optimization

Follow these steps to enable static queue optimization.

  1. Set the "ENABLE_STATIC_QUEUES_OVERCOMMITMENT" environment variable to True in ad-yarn-optimizer.yml.
  2. For a CDP setup, set "QUEUE_UPDATE_API" to "/ws/v1/cluster/scheduler-conf?user.name=hdfs" in ad-yarn-optimizer.yml.
  3. Restart the ad-yarn-optimizer service by running accelo restart ad-yarn-optimizer.
  4. Set enableStaticQueue to True in the FEATURE_FLAGS environment variable in ad-graphql.