Enabling Oozie High Availability
This document outlines the steps to configure Oozie High Availability (HA) behind a load balancer. Oozie servers use ZooKeeper to coordinate database access and to communicate with each other.
Prerequisites
For setting up Oozie HA in the ODP cluster, consider the following prerequisites:
- Apache ZooKeeper, an open-source distributed coordination service, handles communication and database access coordination among the Oozie servers. For complete HA, a minimum of three ZooKeeper servers is recommended.
- A database capable of handling multiple simultaneous connections. For complete HA, the database must itself support HA so that it does not become a single point of failure.
- Identical configuration is not mandatory for every setting, but keeping the configuration uniform across all Oozie servers is advised for consistency.
- Configure a load balancer, Virtual IP, or Round-Robin DNS to give users, and callbacks from the JobTracker or ResourceManager, a single access point. The load balancer must be set up for round-robin distribution among the Oozie servers. Users must connect through the load balancer, whether they use the Oozie client, a web browser, or the REST API. For complete HA, the load balancer must also be highly available; otherwise, it becomes a single point of failure.
Steps to Enable Oozie HA
To enable Oozie HA, perform the following:
- In Ambari Web, navigate to the host where you intend to set up an additional Oozie Server.
- On the Host page, click +Add.
- Select Oozie Server from the provided list. Ambari will then proceed with the Oozie Server installation.
- Configure your external load balancer.
- To update the Oozie configuration settings, perform the following:
- Navigate to Services > Oozie > Configs
- In the Custom oozie-site, add the following property values:
oozie.zookeeper.connection.string
A comma-separated list of ZooKeeper hosts with ports. For example:
oozie.zookeeper.connection.string = ODP-Zookeeper-1_FQDN:2181,ODP-Zookeeper-2_FQDN:2181,ODP-Zookeeper-3_FQDN:2181
oozie.service.ZKLocksService.lockTimeout = 30000
oozie.zookeeper.namespace = oozie
oozie.zookeeper.secure = true (required for a Kerberos-enabled cluster)
- In the Advanced oozie-site, update the following property values:
oozie.services.ext
org.apache.oozie.service.ZKLocksService,
org.apache.oozie.service.ZKXLogStreamingService,
org.apache.oozie.service.ZKJobsConcurrencyService,
org.apache.oozie.service.ZKUUIDService
For example:
oozie.services.ext = org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService,org.apache.oozie.service.ZKLocksService,org.apache.oozie.service.ZKXLogStreamingService,org.apache.oozie.service.ZKJobsConcurrencyService,org.apache.oozie.service.ZKUUIDService
oozie.base.url = http://<Oozie_LoadBalancer_HOSTNAME>:11000/oozie
- In the Advanced oozie-env, uncomment the oozie_base_url property and change its value to point to the load balancer:
export oozie_base_url="http://<Oozie_LoadBalancer_HOSTNAME>:11000/oozie"
- Restart Oozie.
- Update the HDFS configuration properties for the Oozie proxy user. Browse to Services > HDFS > Configs. In core-site, update the hadoop.proxyuser.oozie.hosts property to include the newly added Oozie Server host. Use commas to separate multiple host names. For example:
hadoop.proxyuser.oozie.hosts = Oozie_Server_FQDN-1, Oozie_Server_FQDN-2
Or:
hadoop.proxyuser.oozie.hosts = *
- Restart required services.
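The property values entered in the steps above can be assembled with a small helper before pasting them into Ambari. The following is a minimal sketch, not part of Ambari or Oozie: the function names and the example.com host names are hypothetical, and port 2181 is taken from the example above.

```shell
#!/usr/bin/env bash
# Sketch: build values for two HA-related oozie-site properties.
# Function names and host names are illustrative assumptions.

# Join ZooKeeper hosts into a host:port,host:port connection string.
build_zk_connection_string() {
  local port="$1"; shift
  local out="" host
  for host in "$@"; do
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
}

# Append the four ZooKeeper-backed services to an existing
# oozie.services.ext value, skipping any that are already listed.
add_zk_services() {
  local current="$1" svc fqcn
  for svc in ZKLocksService ZKXLogStreamingService \
             ZKJobsConcurrencyService ZKUUIDService; do
    fqcn="org.apache.oozie.service.${svc}"
    case ",${current}," in
      *",${fqcn},"*) ;;                               # already present
      *) current="${current:+$current,}${fqcn}" ;;    # append
    esac
  done
  printf '%s\n' "$current"
}

# Example calls with placeholder hosts.
build_zk_connection_string 2181 zk1.example.com zk2.example.com zk3.example.com
add_zk_services "org.apache.oozie.service.JMSAccessorService"
```

Because add_zk_services skips entries that already exist, it can be run against the current value of oozie.services.ext without producing duplicates.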
Troubleshooting Steps for Oozie HA Configuration in ODP
- If the Oozie load balancer is set up without adding the ZooKeeper service classes to oozie.services.ext, Coordinator jobs may be submitted to both active Oozie Servers. This can result in a database error when both servers attempt to update duplicate entries in the Oozie coordinator tables, because those fields carry constraints in the Oozie database.
- After completing the Oozie HA configuration using the steps above, the Oozie Server might encounter authentication issues with ZooKeeper. In that case, the following message appears in /var/log/oozie/oozie.log:
- INFO ZKUtils:520 - SERVER[odp-master.sre-lab.acceldata.com] Connecting to ZooKeeper with SASL/Kerberos and using 'sasl' ACLs
- In the /var/log/oozie/jetty.log file, the following error related to the curator framework is displayed:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.sameThreadExecutor()Lcom/google/common/util/concurrent/ListeningExecutorService;
at org.apache.curator.framework.listen.ListenerContainer.addListener(ListenerContainer.java:40)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:262)
at org.apache.oozie.util.ZKUtils.createClient(ZKUtils.java:213)
at org.apache.oozie.util.ZKUtils.<init>(ZKUtils.java:150)
at org.apache.oozie.util.ZKUtils.register(ZKUtils.java:164)
at org.apache.oozie.service.ZKLocksService.init(ZKLocksService.java:72)
Root Cause Analysis for the Oozie HA Issue
The Oozie libraries include an outdated version of the Curator jars (2.5.0). This version does not match the method signatures named in the NoSuchMethodError stack traces (those returning ListeningExecutorService and ProtectACLCreateModePathAndBytesable). As a result, the error above occurs when the Oozie Server starts after Oozie HA has been configured with the ZooKeeper service classes.
ls -lrt /usr/odp/3.2.3.3-2/oozie/lib/cur*
curator-client-2.5.0.jar
curator-framework-2.5.0.jar
curator-x-discovery-2.5.0.jar
curator-recipes-2.5.0.jar
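To compare the Curator versions on both Oozie Server nodes, the file names can be split into artifact and version. This is a minimal sketch; the function name curator_versions is hypothetical, and it assumes the jars follow the name-version.jar pattern shown in the listing above.

```shell
#!/usr/bin/env bash
# Sketch: print "<artifact> <version>" for each curator jar in a directory.
curator_versions() {
  local dir="$1" jar name
  for jar in "$dir"/curator-*.jar; do
    [ -e "$jar" ] || continue      # no match: the glob stays literal, skip it
    name="${jar##*/}"              # strip the directory part
    name="${name%.jar}"            # strip the extension
    # Everything before the last hyphen is the artifact, the rest the version.
    printf '%s %s\n' "${name%-*}" "${name##*-}"
  done
}

# Prints nothing if the directory does not exist on this host.
curator_versions /usr/odp/3.2.3.3-2/oozie/lib
```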
Steps to Solve the Oozie HA Issue
To solve the above Oozie HA issue, perform the following:
- Download the latest version of curator jars from here and place them in the /tmp/OozieHA directory on both Oozie Server nodes. This version addresses the aforementioned issues.
- Back up the lib directory shown below on both Oozie Servers, because the oozie-setup.sh script incorporates all JARs within Jetty's webapp/WEB-INF/lib/ directory.
cd /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/
cp -r webapp/WEB-INF/lib webapp/WEB-INF/lib_backup_TIMESTAMP
- Move the four old jars listed below out of the above lib directory on both Oozie Servers:
curator-client-2.5.0.jar
curator-framework-2.5.0.jar
curator-recipes-2.5.0.jar
curator-x-discovery-2.5.0.jar
mv /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib/curator-* /tmp
- Copy the newly downloaded curator jars to Jetty's webapp/WEB-INF/lib/ directory on both Oozie Servers using the commands below:
cp /tmp/OozieHA/curator-*-4.3.0.jar /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib/
chmod -R 755 /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib
chown -R oozie:hadoop /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib
cd /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib
pwd
/usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib
ls -lrt cura*
The output of ls -lrt cura* should list the newly copied curator 4.3.0 jars.
- Move the original Oozie lib directory aside on both Oozie Server nodes. This ensures that during a restart, a new one is generated from Jetty's webapp/WEB-INF/lib/ directory.
mv /usr/odp/3.2.3.3-2/oozie/lib /usr/odp/3.2.3.3-2/oozie/lib_backup_TIMESTAMP
- Restart the Oozie Service.
- Confirm that the znode for the Oozie Service has been created by running the following in the ZooKeeper CLI:
[zk:(CONNECTED) 0] ls /oozie
[locks, services]
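The jar replacement steps above can be collected into one script that is run on each Oozie Server node. The following is a sketch under assumptions, not a supported tool: the names ODP_HOME, NEW_JARS, DRY_RUN, run, and swap_curator_jars are hypothetical, the timestamp format is an arbitrary choice, and the paths are the ones used in the steps above. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: curator jar swap on one Oozie Server node.
# Prints commands by default; set DRY_RUN=0 to execute them.
ODP_HOME="${ODP_HOME:-/usr/odp/3.2.3.3-2/oozie}"   # assumed install path
NEW_JARS="${NEW_JARS:-/tmp/OozieHA}"               # where the new jars were placed
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

swap_curator_jars() {
  local web_lib="$ODP_HOME/embedded-oozie-server/webapp/WEB-INF/lib"
  local stamp
  stamp="$(date +%Y%m%d%H%M%S)"

  run cp -r "$web_lib" "${web_lib}_backup_${stamp}"        # back up Jetty lib
  run mv "$web_lib"/curator-*.jar /tmp/                    # move old jars out
  run cp "$NEW_JARS"/curator-*-4.3.0.jar "$web_lib"/       # copy new jars in
  run chmod -R 755 "$web_lib"
  run chown -R oozie:hadoop "$web_lib"
  run mv "$ODP_HOME/lib" "$ODP_HOME/lib_backup_${stamp}"   # regenerated on restart
}

swap_curator_jars
```

The script stops short of restarting Oozie; the restart is still done afterwards as described in the steps.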
Steps to Test the Oozie Service
To test the configuration, run sample jobs and check the logs of both Oozie Servers to ensure that requests are being received on both servers.
oozie admin -status
# System mode: NORMAL
If the status check fails instead, errors similar to the following may appear:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.curator.framework.api.CreateBuilder.creatingParentsIfNeeded()Lorg/apache/curator/framework/api/ProtectACLCreateModePathAndBytesable;
at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.init(ZKSignerSecretProvider.java:193)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.constructSecretProvider(AuthenticationFilter.java:253)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeSecretProvider(AuthenticationFilter.java:209)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:178)
at org.apache.oozie.servlet.AuthFilter.init(AuthFilter.java:69)
oozie admin -status
java.io.FileNotFoundException: Error while authenticating with endpoint: http://hdpfsappr1.true.care:11000/oozie/versions
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at org.apache.oozie.cli.OozieCLI.adminCommand(OozieCLI.java:2027)
at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:733)
at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:682)
at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:245)
Caused by: java.io.FileNotFoundException: http://oozie-server-1:11000/oozie/versions?user.name=oozie
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:393)
The issue arises due to the hadoop-auth-3.2.3.3.2.2.0-1.jar jar, which has an authentication problem stemming from the org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider class.
To address this, you must:
- Download the latest version of hadoop-auth-3.2.3.3.2.2.0-1.jar from the provided Gdrive link.
- Replace the existing jar in the /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib directory with the newly downloaded one.
- Back up the current Oozie lib directory located at /usr/odp/3.2.3.3-2/oozie/lib. This ensures that during a restart, a new directory is created from the aforementioned Jetty webapp/WEB-INF/lib directory.
- After replacing the jar and backing up the lib directory, restart the Oozie Server. This should resolve the issue.
- Execute these steps on both Oozie Server nodes.
- Ensure the jar file permissions are set to 755 and the ownership is set to oozie:hadoop.
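The permission and ownership requirement can be checked after the jars are copied. This is a minimal sketch: the function name check_jar_perms is hypothetical, and it relies on GNU stat (the -c option), which is an assumption about the hosts.

```shell
#!/usr/bin/env bash
# Sketch: report jars whose mode or owner differs from the expected values.
# Defaults (755, oozie:hadoop) are the values required by the steps above.
check_jar_perms() {
  local dir="$1" want_mode="${2:-755}" want_owner="${3:-oozie:hadoop}"
  local jar actual status=0
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue                      # directory has no jars
    actual="$(stat -c '%a %U:%G' "$jar")"          # e.g. "755 oozie:hadoop"
    if [ "$actual" != "$want_mode $want_owner" ]; then
      printf 'MISMATCH %s (%s)\n' "$jar" "$actual"
      status=1
    fi
  done
  return "$status"
}

# Example (run on an Oozie Server node):
# check_jar_perms /usr/odp/3.2.3.3-2/oozie/embedded-oozie-server/webapp/WEB-INF/lib
```

The function returns a non-zero status when any jar deviates, so it can gate the restart in a script.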