Deploy Yarn Optimizer
For the YARN Optimizer to function, the Kerberos keytab used by Pulse must have permission to run the yarn rmadmin -updateNodeResource command. For more information, see Apache Hadoop 3.3.6 - Yarn Commands.
To deploy the YARN Optimizer safely, limit optimization to a subset of the nodes available in the cluster.
The behavior of the optimization algorithm can be controlled using the following settings:
- Reserved buffer memory for non-YARN processes
- Maximum percentage of memory to be overcommitted
- Maximum step size for adjusting the amount of overcommitted memory every 5 seconds
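As a rough illustration of how these three settings could interact, the sketch below clamps one adjustment of a node's overcommitted memory. It is not the YARN Optimizer's actual algorithm: the function name, the parameter names, and the formula (the ceiling taken as a percentage of the memory left after the reserved buffer) are assumptions made only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; NOT the YARN Optimizer's real algorithm.
# Usage: next_overcommit_mb TOTAL_MB RESERVED_MB MAX_PCT MAX_STEP_MB CURRENT_MB WANTED_MB
#   TOTAL_MB     physical memory of the node
#   RESERVED_MB  buffer kept free for non-YARN processes
#   MAX_PCT      maximum percentage of memory that may be overcommitted
#   MAX_STEP_MB  largest change allowed per adjustment (every 5 seconds in the text above)
#   CURRENT_MB   memory overcommitted right now
#   WANTED_MB    overcommitted memory the optimizer would like to reach
next_overcommit_mb() {
  local total=$1 reserved=$2 max_pct=$3 max_step=$4 current=$5 wanted=$6

  # Assumed ceiling: a percentage of what remains after the reserved buffer.
  local ceiling=$(( (total - reserved) * max_pct / 100 ))
  if (( ceiling < 0 )); then ceiling=0; fi

  # Clamp the wanted value into the range [0, ceiling].
  local target=$wanted
  if (( target > ceiling )); then target=$ceiling; fi
  if (( target < 0 )); then target=0; fi

  # Move toward the target by at most one step in either direction.
  local delta=$(( target - current ))
  if (( delta > max_step )); then delta=$max_step; fi
  if (( delta < -max_step )); then delta=$(( -max_step )); fi

  echo $(( current + delta ))
}
```

With these assumed semantics, a small step size makes the overcommitted amount ramp up or down gradually rather than jump straight to the target.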
To deploy the Yarn Optimizer server, perform the following steps:
Deploying the Add-on:
- Run the following command to start the add-on deployment process:
accelo deploy addons
- From the list of available components, use the arrow keys to move to Yarn optimizer and press Space to select it.
[root@sac04:~ (ad-default)]$ accelo deploy addons
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
INFO: Active Cluster: zeus
? Select the components you would like to install: [Use arrows to move, space to select, <right> to all, <left> to none, type to filter]
[ ] Oozie Connector
[ ] Proxy
[ ] QUERY ROUTER DB
[ ] Recommendation Service
[ ] SHARD SERVER DB
[ ] StandAlone Connector
> [X] Yarn optimizer
- Press Enter to deploy the Yarn Optimizer container.
Generating the Docker Configuration File:
- Generate the Docker configuration file for the Yarn Optimizer by running the following command:
accelo admin makeconfig ad-yarn-optimizer
- Once the file is generated, review it at /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml and edit it if necessary so that the settings match your requirements.
Output:
[root@sac04:~ (ad-default)]$ accelo admin makeconfig ad-yarn-optimizer
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
✓ Done, Configuration file generated
IMPORTANT: Please edit/verify the file '/data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml'.
If the addon is already up and running, use './accelo deploy addons' to remove and recreate the addon service.
[root@sac04:~ (ad-default)]$ cat /data01/acceldata/config/docker/addons/ad-yarn-optimizer.yml
version: "1"
services:
  ad-yarn-optimizer:
    image: ad-yarn-optimizer
    container_name: ""
    environment:
      - MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXqyScAm+LrS8Y9EWT8A8/30
      - NATS_HOST=ad-events
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data01/acceldata/config/krb/security:/krb/security
      - /etc/hosts:/etc/hosts:ro
    ulimits: {}
    ports:
      - 19888:9888
    depends_on: []
    opts: {}
    restart: ""
    extra_hosts: []
    network_alias: []
    label: Yarn optimizer
Restarting the Docker Container:
- To restart the Yarn Optimizer container at any point, run the following command:
accelo restart ad-yarn-optimizer
Deploy the Yarn Metrics Agent via Ansible
Deploying the Yarn Metrics Agent:
- Ensure that any older versions of the agents are uninstalled by running the following command:
accelo uninstall remote
- Deploy the new agents by running the following command:
accelo deploy addons
Configuring the Yarn Metrics Agent:
- To enable or disable the Yarn Metrics Agent during deployment, use the feature flag in the HYDRA HOSTS YAML file.
- For agents that are already installed (Pulse version 3.3.20), enable or disable the agent through the VARS YAML file.
Using the HYDRA HOSTS YAML File:
This file allows the default file to be copied to predetermined locations. Ansible decides whether to start the agent based on whether the feature flag is enabled or disabled.
To modify the yarn_opt_enable flag:
- Open the hydra_hosts.yml file located at $AcceloHome/work/<clusterName>/hydra_hosts.yml.
- Change the yarn_opt_enable value under the vars section to true.
- Save your changes.
- Uninstall the agents that are already running using accelo uninstall remote.
- Deploy the agents again with the new config: accelo deploy hydra.
Using the VARS YAML File:
- Service and configuration files are generated from the data in vars.yml. If the 3.3.20 agents are already installed with the yarn_opt_enable flag set to false in hydra_hosts.yml, use the vars.yml file to start the pulseyarnmetrics agent.
- For the CDP environment, it is mandatory to configure the correct node ID port. For details, see Configure the OverCommit Timeout Value and Node ID Port.
To enable the Yarn Metrics Agent when the yarn_opt_enable flag is set to false:
- Create an override.yml file if it doesn't exist:
touch $AcceloHome/work/<clustername>/override.yml
- Open the override.yml file and add the following configuration, ensuring that yarn_opt_enable is set to true:
vi $AcceloHome/work/<clustername>/override.yml
base:
  yarn_opt_enable: true
- Save the file and run the following command to apply the new configuration:
accelo reconfig cluster
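The override steps above can be scripted when the file is new or empty. The following is a minimal sketch, not part of the accelo tooling: the function name ensure_yarn_opt_override is hypothetical, and it deliberately refuses to modify an override.yml that already holds other settings, because merging YAML reliably in shell is error-prone.

```shell
#!/usr/bin/env bash
# Hypothetical helper; only the file content (base / yarn_opt_enable) comes
# from the steps above.
# Usage: ensure_yarn_opt_override /path/to/override.yml
ensure_yarn_opt_override() {
  local file=$1

  if [ -s "$file" ]; then
    # The file already has content: accept it only if the flag is already true.
    if grep -Eq '^[[:space:]]+yarn_opt_enable:[[:space:]]*true[[:space:]]*$' "$file"; then
      echo "already enabled: $file"
      return 0
    fi
    echo "file has other content, edit it manually: $file" >&2
    return 1
  fi

  # Missing or empty file: write the minimal override.
  printf 'base:\n  yarn_opt_enable: true\n' > "$file"
  echo "created: $file"
}

# Example (commented out so that sourcing this file changes nothing):
# ensure_yarn_opt_override "$AcceloHome/work/<clustername>/override.yml" \
#   && accelo reconfig cluster
```

Chaining accelo reconfig cluster with && means the reconfiguration runs only when the override file is in the expected state.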
Verifying the Agent's Status:
To confirm that the agent is operational, perform the following steps:
- Log into one of the YARN Node Manager nodes and run the following command:
systemctl status pulseyarnmetrics
- For log details, run the following command:
journalctl -u pulseyarnmetrics
Sample Output:
[root@hdp201 ~]# journalctl -u pulseyarnmetrics
-- Logs begin at Mon 2024-01-15 17:22:26 IST, end at Fri 2024-03-22 15:47:45 IST. --
Mar 21 22:10:23 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:23 hdp201.acceldata.dvl yarnmetrics[37475]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:25 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:25 hdp201.acceldata.dvl yarnmetrics[37594]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:26 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:26 hdp201.acceldata.dvl yarnmetrics[37656]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:27 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:27 hdp201.acceldata.dvl yarnmetrics[37759]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Stopped YARN metrics collector service.
Mar 21 22:10:29 hdp201.acceldata.dvl systemd[1]: Started YARN metrics collector service.
Mar 21 22:10:29 hdp201.acceldata.dvl yarnmetrics[37942]: Cannot load the yarnMetric config: '/opt/pulse/yarnmetrics/config/yarnmetrics.conf'. Because: unexpecte
Mar 21 22:10:30 hdp201.acceldata.dvl systemd[1]: pulseyarnmetrics.service holdoff time over, scheduling restart.
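The sample output above shows the service being restarted repeatedly because the configuration file cannot be loaded. A small sketch like the following can make that pattern easier to spot. The function names are hypothetical and the 10-minute window is an arbitrary choice; the matched message text is taken from the sample output.

```shell
#!/usr/bin/env bash
# Reads journalctl output on stdin and prints how many lines report that the
# yarnmetrics config could not be loaded. Prints 0 when there are none.
count_config_failures() {
  grep -c 'Cannot load the yarnMetric config' || true
}

# Hypothetical wrapper: reports service state and recent config failures.
# Intended to be run on a YARN Node Manager node.
agent_health() {
  if ! systemctl is-active --quiet pulseyarnmetrics; then
    echo "pulseyarnmetrics is not active"
  fi
  local failures
  failures=$(journalctl -u pulseyarnmetrics --since "10 minutes ago" --no-pager \
    | count_config_failures)
  echo "config load failures in the last 10 minutes: $failures"
}
```

A non-zero failure count suggests checking /opt/pulse/yarnmetrics/config/yarnmetrics.conf before restarting the service again.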
Deploy Agents via HYStaller
If you are installing agents through HYStaller, follow these instructions:
Environment Variable for Service Control: The YARN_OPT_ENABLE variable is used to enable or disable the service through HYStaller.
Obtain HYStaller:
- Download the HYStaller binary from the License UI.
Install Hydra Agent:
- Before installation, replace <PULSE_HOSTNAME> with the fully qualified domain name (FQDN) of your Pulse server.
- Execute the script below to stop and disable existing services before proceeding with the HYStaller installation:
((sudo systemctl stop hydra && sudo systemctl disable hydra || true) && (sudo systemctl stop pulsenode && sudo systemctl disable pulsenode || true) && (sudo systemctl stop pulsejmx && sudo systemctl disable pulsejmx || true) && (sudo systemctl stop pulselogs && sudo systemctl disable pulselogs || true) || true)
sudo chown root:root /tmp/hystaller
sudo chmod 0700 /tmp/hystaller
ls -l /tmp/hystaller
sudo /tmp/hystaller uninstall
- Configure and Install: Set up your environment variables and execute the HYStaller with the install command:
PULSE_HOME="/opt/pulse"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
HYDRA_SERVER_URL="http://<PULSE_HOSTNAME>:19072"
HYDRA_HEARTBEAT_DURATION="60"
HYDRA_PARCEL_MODE="False"
HYDRA_HOSTNAME_CASE="lower"
HYDRA_HOSTNAME_METHOD="CMD"
HYDRA_HEARTBEAT_JITTER="10"
YARN_OPT_ENABLE="false"
sudo env "PULSE_HOME=$PULSE_HOME" "PATH=$PATH" "HYDRA_SERVER_URL=$HYDRA_SERVER_URL" "HYDRA_HEARTBEAT_DURATION=$HYDRA_HEARTBEAT_DURATION" "HYDRA_PARCEL_MODE=$HYDRA_PARCEL_MODE" "HYDRA_HOSTNAME_CASE=$HYDRA_HOSTNAME_CASE" "HYDRA_HOSTNAME_METHOD=$HYDRA_HOSTNAME_METHOD" "HYDRA_HEARTBEAT_JITTER=$HYDRA_HEARTBEAT_JITTER" "YARN_OPT_ENABLE=$YARN_OPT_ENABLE" /tmp/hystaller install
Configure User Account for Running the Pulseyarnmetrics Agent
This configuration is required only if the agent is not running properly and you want to run it as a different user.
Perform the following steps:
- Update the User Configuration in vars.yml:
  - Locate the agent parameter within the vars.yml file and change the yarnmetrics_user value to the desired username (default is "yarn"):
agent:
  yarnmetrics_user: newuser
- Create or Update the override.yml File:
  - If absent, create a new override.yml file within your cluster's working directory:
touch $AcceloHome/work/<clustername>/override.yml
  - Edit the override.yml file to include the updated user configuration:
vi $AcceloHome/work/<clustername>/override.yml
  - Ensure the file contains the following lines, adjusting the yarnmetrics_user as necessary:
agent:
  yarnmetrics_user: newuser
- Apply Configuration Changes: Save the modifications and run the following command to update the cluster with the new user settings:
accelo reconfig cluster
- Verify the Change:
  - Check the systemd unit file on one of the nodes to confirm the new user is specified:
cat /etc/systemd/system/pulseyarnmetrics.service
  - The User field in the service configuration should reflect the new user account chosen.
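Instead of reading the unit file by eye, the User field can be extracted with a short helper. This is a sketch; the function name unit_user is hypothetical, and it assumes the unit file contains a standard systemd User= line.

```shell
#!/usr/bin/env bash
# Prints the value of the last User= line in a systemd unit file.
# Prints nothing if the file has no User= line.
# Usage: unit_user /etc/systemd/system/pulseyarnmetrics.service
unit_user() {
  sed -n 's/^[[:space:]]*User=//p' "$1" | tail -n 1
}

# Example check (commented out; "newuser" is the placeholder used above):
# [ "$(unit_user /etc/systemd/system/pulseyarnmetrics.service)" = "newuser" ] \
#   && echo "agent runs as newuser"
```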
Configure the OverCommit Timeout Value and Node ID Port
Follow these steps to configure the overcommit timeout and the node ID port. If NATS is unreachable for the configured amount of time, each node is reverted to its original value.
To fetch the node ID port, run the command yarn node -list on any cluster node.
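The port is the part after the colon in the Node-Id column of that command's output. The sketch below extracts it; the function name node_id_ports is hypothetical, and it assumes the usual layout in which each node row starts with a host:port Node-Id.

```shell
#!/usr/bin/env bash
# Reads `yarn node -list` output on stdin and prints the distinct ports found
# in the first column (Node-Id, assumed to look like host:port).
# Header and log lines are skipped because their first field is not host:port.
node_id_ports() {
  awk '$1 ~ /^[^:]+:[0-9]+$/ { sub(/^.*:/, "", $1); print $1 }' | sort -u
}

# Example (commented out; requires a configured YARN client):
# yarn node -list 2>/dev/null | node_id_ports
```

If more than one port is printed, the nodes do not share a single Node Manager port, and the yarn_nodeid_port value should be reviewed before it is applied.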
The parameters in the vars.yml file are as follows. By default, the OverCommit timeout value is set to 300 seconds.
base:
  yarn_opt_overcommit_timeout_enabled: "true"
  yarn_opt_overcommit_timeout_seconds: "300"
  yarn_nodeid_port: "45454"
- Create the override.yml file if it is not already present.
touch $AcceloHome/work/<clustername>/override.yml
- Open the override.yml file.
vi $AcceloHome/work/<clustername>/override.yml
- Add the following configuration to the override.yml file if it is not already present.
base:
  yarn_opt_overcommit_timeout_enabled: "true"
  yarn_opt_overcommit_timeout_seconds: "100"
  yarn_nodeid_port: "8041"
- Save the file.
- Reconfigure the cluster.
accelo reconfig cluster
- Verify the config file for the pulseyarnmetrics agent.
cat /opt/pulse/yarnmetrics/config/yarnmetrics.conf
{
  "clusterName": "odp_zoro",
  "resourcemanagerIP": "odp102.acceldata.dvl",
  "resourcemanagerPort": 8088,
  "isKerberosEnabled": false,
  "keytabPath": "/opt/pulse/node/config/node.keytab",
  "kerberosPrinciple": "hdfs@ACCELDATA.COM",
  "kerberosDateFormat": "01/02/2006",
  "victoriaDBEndpoint": "http://plat02.acceldata.dvl:19043/insert/1385609323/influx",
  "natsEndpoint": "http://plat02.acceldata.dvl:19009",
  "natsConsumerAckWait": 20,
  "isMetricsProcessingNode": false,
  "overcommit_timeout_enabled": true,
  "overcommit_timeout_seconds": 100,
  "nodeIdPort": 8041
}
Enable REST API call for Overcommitment of Node Memory
To enable the REST API call, perform the following steps:
- Run the following command.
accelo admin makeconfig ad-yarn-optimizer
- Add the following environment variable in the ad-yarn-optimizer.yml file.
IS_REST_EVENT_SUPPORTED=true
Enable Node Limit as Configured in YARN Configurations
Follow these steps to apply the node memory limit configured in Ambari UI > Yarn Configurations:
- Set the following environment variable to true in ad-yarn-optimizer.yml.
ENABLE_LIMIT YARN_STATIC_CONFIG to true
- Restart the service using the following command.
accelo restart ad-yarn-optimizer
Enable Static Queue Optimization
Follow these steps to enable static queue optimization.
- Set the "ENABLE_STATIC_QUEUES_OVERCOMMITMENT" environment variable to True in ad-yarn-optimizer.yml.
- In the case of a CDP setup, set "QUEUE_UPDATE_API" to "/ws/v1/cluster/scheduler-conf?user.name=hdfs" in ad-yarn-optimizer.yml.
- Restart the ad-yarn-optimizer service by running the following command.
accelo restart ad-yarn-optimizer
- Set enableStaticQueue to True in the FEATURE_FLAGS environment variable in ad-graphql.