Enabling Oozie High Availability

This document outlines the steps to configure Oozie High Availability (HA) so that clients reach the Oozie servers through a load balancer. The Oozie servers use ZooKeeper to coordinate database access and to communicate with one another.

Prerequisites

For setting up Oozie HA in the ODP cluster, consider the following prerequisites:

  • Apache ZooKeeper, an open-source distributed coordination service, facilitates communication and database access coordination among Oozie servers. For complete HA, a minimum of three ZooKeeper servers is recommended.
  • A database capable of handling multiple simultaneous connections. For optimal HA, the database must also support HA to prevent it from being a potential single point of failure.
  • Although it's not mandatory for all settings, maintaining uniform configuration across all servers is advised for consistency.
  • Configure a load balancer, virtual IP, or round-robin DNS to give users, and callbacks from the JobTracker or ResourceManager, a single access point. The load balancer must be set up for round-robin distribution among the Oozie servers. Users must connect through the load balancer, whether they use the Oozie client, a web browser, or the REST API. For comprehensive HA, the load balancer must itself be highly available; otherwise, it becomes a single point of failure.
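
The prerequisites above can be checked with a small pre-flight script. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, the host names are placeholders, and it assumes the default Oozie port 11000 and the Oozie REST endpoint /oozie/v1/admin/status.

```shell
#!/usr/bin/env bash
# Pre-flight helpers for the prerequisites above.
# Function names and host names are illustrative placeholders.

# Count the entries in a comma-separated ZooKeeper host list.
zk_host_count() {
  local IFS=','
  # Unquoted on purpose: split the list on commas.
  set -- $1
  echo "$#"
}

# Build the admin status URL of an Oozie endpoint (load balancer or server).
# The port defaults to 11000 when the second argument is omitted.
oozie_status_url() {
  echo "http://$1:${2:-11000}/oozie/v1/admin/status"
}

# Query the endpoint; a healthy server reports its system mode.
check_oozie_status() {
  curl -s --max-time 10 "$(oozie_status_url "$1" "$2")"
}
```

For example, zk_host_count 'zk1:2181,zk2:2181,zk3:2181' should print 3, and check_oozie_status <Oozie_LoadBalancer_HOSTNAME> queries Oozie through the load balancer. On a Kerberos-enabled cluster the curl call would likely need additional authentication options.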

Steps to Enable Oozie HA

To enable Oozie HA, perform the following:

  1. In Ambari Web, navigate to the host where you intend to set up an additional Oozie Server.
  2. On the Host page, click +Add.
  3. Select Oozie Server from the provided list. Ambari will then proceed with the Oozie Server installation.
  4. Configure your external load balancer.
  5. To update the Oozie configuration settings, perform the following:
    1. Navigate to Services > Oozie > Configs
    2. In Custom oozie-site, add the following properties:
       • oozie.zookeeper.connection.string: the list of ZooKeeper hosts with ports.
       • oozie.services.ext: the ZooKeeper-based Oozie service classes.
    3. In Advanced oozie-site, update the oozie.base.url property to point to the load balancer.
    4. In Advanced oozie-env, uncomment the oozie_base_url property and change its value to point to the load balancer: export oozie_base_url="http://<Oozie_LoadBalancer_HOSTNAME>:11000/oozie"
  6. Restart Oozie.
  7. Update the HDFS configuration properties for the Oozie proxy user. Browse to Services > HDFS > Configs. In core-site, update the hadoop.proxyuser.oozie.hosts property to include the newly added Oozie Server host. Use commas to separate multiple host names. Example: hadoop.proxyuser.oozie.hosts = Oozie_Server_FQDN-1,Oozie_Server_FQDN-2 or hadoop.proxyuser.oozie.hosts = *
  8. Restart required services.
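
As an illustration of the oozie-site values involved in the steps above, the sketch below prints them for a given ZooKeeper ensemble and load balancer. It is an assumption-based example: the function name and host names are hypothetical, and the service class list is the one given in the upstream Apache Oozie HA documentation, so verify it against your ODP release before entering it in Ambari.

```shell
#!/usr/bin/env bash
# Print the HA-related oozie-site properties for review before entering
# them in Ambari. Host names passed in are placeholders.
# Assumption: the class list follows the upstream Apache Oozie HA guide.
print_oozie_ha_properties() {
  local zk_hosts="$1" lb_host="$2"
  cat <<EOF
# Custom oozie-site
oozie.zookeeper.connection.string=${zk_hosts}
oozie.services.ext=org.apache.oozie.service.ZKLocksService,org.apache.oozie.service.ZKXLogStreamingService,org.apache.oozie.service.ZKJobsConcurrencyService,org.apache.oozie.service.ZKUUIDService
# Advanced oozie-site
oozie.base.url=http://${lb_host}:11000/oozie
EOF
}
```

Calling print_oozie_ha_properties 'zk1:2181,zk2:2181,zk3:2181' '<Oozie_LoadBalancer_HOSTNAME>' writes the three properties to standard output. Nothing is changed on the cluster.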

Troubleshooting Steps for Oozie HA Configuration in ODP

  1. If the Oozie load balancer is set up without adding the ZooKeeper classes to oozie.services.ext, coordinator jobs may be processed by both active Oozie Servers. This can cause a database error when the servers attempt to write duplicate entries to the Oozie coordinator tables, whose fields carry constraints.
  2. After Oozie HA is configured using the steps above, the Oozie Server might encounter authentication issues with ZooKeeper. The following symptoms identify the problem:
    1. The /var/log/oozie/oozie.log file contains this message: INFO ZKUtils:520 - SERVER[odp-master.sre-lab.acceldata.com] Connecting to ZooKeeper with SASL/Kerberos and using 'sasl' ACLs
    2. The /var/log/oozie/jetty.log file shows an error related to the Curator framework.

Root Cause Analysis for the Oozie HA Issue

The Oozie libraries bundle an outdated version of the Curator JARs. This version lacks the ListeningExecutorService and ProtectACLCreateModePathAndBytesable types. As a result, the error described above occurs when the Oozie Server starts after Oozie HA has been configured with the ZooKeeper classes.
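
To check whether a server is affected, the log can be searched for the two type names mentioned above. This is a minimal sketch; the function name is hypothetical, and it assumes the error text in the log contains at least one of those names.

```shell
#!/usr/bin/env bash
# Search an Oozie log for the two type names named in the root cause.
# Exit status: 0 if a match is found, 1 if none, 2 if the file is unreadable.
scan_curator_errors() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  # -n prints line numbers; -E enables the alternation between the names.
  grep -nE 'ListeningExecutorService|ProtectACLCreateModePathAndBytesable' "$log_file"
}
```

A typical call would be scan_curator_errors /var/log/oozie/jetty.log, run on each Oozie Server.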


Steps to Solve the Oozie HA Issue

To solve the above Oozie HA issue, perform the following:

  1. Download the latest version of curator jars from here and place them in the /tmp/OozieHA directory on both Oozie Server nodes. This version addresses the aforementioned issues.
  2. Back up Jetty's webapp/WEB-INF/lib/ directory on both Oozie Servers, because the oozie-setup.sh script incorporates every JAR found in it.
  3. Move the four old JARs out of that lib directory on both Oozie Servers.
  4. Copy all the newly downloaded Curator JARs to Jetty's webapp/WEB-INF/lib/ directory on both Oozie Servers, then run ls -lrt cura* in that directory to confirm that the new JARs are listed.
  5. Delete the original Oozie lib directory from both Oozie Server nodes, so that a new one is generated from Jetty's webapp/WEB-INF/lib/ directory during the restart.
  6. Restart the Oozie Service.
  7. Confirm that the znode for the Oozie Service has been created in ZooKeeper.
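
The JAR replacement described above (back up, move the old JARs aside, copy the new ones, remove the generated lib directory) can be sketched as a single function. This is an illustrative outline rather than the original commands: the function name is hypothetical, and it assumes that both the old and the new JARs match the pattern curator-*.jar.

```shell
#!/usr/bin/env bash
# Replace the Curator JARs in a webapp lib directory.
# Arguments: webapp lib directory, directory holding the new JARs,
#            generated Oozie lib directory to remove.
# Assumption: old and new JARs all match curator-*.jar.
swap_curator_jars() {
  local webapp_lib="${1%/}" new_jars="${2%/}" oozie_lib="$3"
  local stamp old_dir
  [ -n "$webapp_lib" ] && [ -n "$new_jars" ] && [ -n "$oozie_lib" ] || return 1
  stamp="$(date +%Y%m%d%H%M%S)"
  old_dir="${webapp_lib}.old-curator-${stamp}"

  # Keep a full copy of the webapp lib directory before touching it.
  cp -a "$webapp_lib" "${webapp_lib}.bak-${stamp}" || return 1

  # Move the old Curator JARs out of the way.
  mkdir -p "$old_dir" || return 1
  find "$webapp_lib" -maxdepth 1 -name 'curator-*.jar' -exec mv {} "$old_dir"/ \;

  # Copy the downloaded JARs into the webapp lib directory.
  cp "$new_jars"/curator-*.jar "$webapp_lib"/ || return 1

  # Remove the generated lib directory so it is rebuilt on restart.
  rm -rf "$oozie_lib"
}
```

Using the paths that appear elsewhere in this document, a call might look like swap_curator_jars /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib /tmp/OozieHA /usr/odp/3.2.3.3-2/oozie/lib, run on each Oozie Server by a user with write access to those directories.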

Steps to Test the Oozie Service

To test the configuration, run sample jobs and check the logs of both Oozie Servers to ensure that requests are being received on both servers.
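
One way to inspect the logs is to count the lines that mention a sample job's ID on each server. The sketch below is an assumption-based helper: the function name is hypothetical, and it defaults to the /var/log/oozie/oozie.log path mentioned earlier in this document.

```shell
#!/usr/bin/env bash
# Count how many lines of an Oozie log mention a given job ID.
# Run it on each Oozie Server for the IDs of the sample jobs; non-zero
# counts on both servers suggest that both are receiving requests.
count_job_lines() {
  local job_id="$1" log_file="${2:-/var/log/oozie/oozie.log}"
  # -F treats the job ID as a fixed string; -c prints the match count.
  grep -cF -- "$job_id" "$log_file"
}
```

For example, count_job_lines <job-id> prints the number of matching lines in the local oozie.log.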


Note: An older Hadoop build installed in the cluster might still cause an error in the /var/log/oozie/jetty.log file, even after the issues above have been addressed.


The issue arises from hadoop-auth-3.2.3.3.2.2.0-1.jar, which has an authentication problem stemming from the org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider class.

To address this, you must:

  1. Download the latest version of hadoop-auth-3.2.3.3.2.2.0-1.jar from the provided Gdrive link.
  2. Replace the existing jar in the /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib directory with the newly downloaded one.
  3. Back up the current Oozie lib directory located at /usr/odp/3.2.3.3-2/oozie/lib and move it out of place, so that a new directory is created from the Jetty webapp/WEB-INF/lib directory during the restart.
  4. After replacing the JAR and backing up the lib directory, restart the Oozie Server. This should resolve the issue.
  5. Execute these steps on both Oozie Server nodes.
  6. Ensure the jar file permissions are set to 755 and the ownership is designated as oozie:hadoop.
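
The permission and ownership settings from the last step can be applied as follows. This is a minimal sketch with a hypothetical function name; it changes ownership only when run as root on a host where the oozie user and hadoop group exist.

```shell
#!/usr/bin/env bash
# Apply mode 755 and oozie:hadoop ownership to a replaced JAR.
fix_jar_permissions() {
  local jar="$1"
  chmod 755 "$jar" || return 1
  # chown needs root plus an existing oozie user and hadoop group,
  # so it is skipped when those conditions are not met.
  if [ "$(id -u)" -eq 0 ] && id oozie >/dev/null 2>&1 \
     && getent group hadoop >/dev/null 2>&1; then
    chown oozie:hadoop "$jar" || return 1
  fi
}
```

A call such as fix_jar_permissions /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib/hadoop-auth-3.2.3.3.2.2.0-1.jar would be run on both Oozie Server nodes.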