Scheduler and Webserver High Availability for Airflow

This document describes how to configure High Availability (HA) for Apache Airflow Schedulers and Webservers in an ODP deployment. Scheduler HA improves reliability and fault tolerance, while Webserver HA ensures consistent UI access through a load balancer.

High Availability Support in ODP Airflow

ODP Airflow supports running multiple schedulers and webservers in an HA configuration.

  • During a fresh installation, users can choose to add multiple schedulers and webservers.
  • For an existing installation, additional scheduler or webserver components can be added via Ambari → Hosts → Add, and assigning the required roles.

Additional Configurations for Scheduler HA

The following configurations are recommended to ensure consistent and predictable behavior when running multiple schedulers.

Configure these settings in Advanced airflow-scheduler-site.

In Advanced airflow-scheduler-site

Bash
Copy
  • Mandatory for Scheduler HA.
  • Enables consistent job scheduling across multiple schedulers.
Bash
Copy
  • Prevents schedulers from disabling DAGs managed by other schedulers.
Bash
Copy
  • Improves DAG parsing throughput through parallelism
Bash
Copy
  • Defines an HA-friendly scheduler liveness threshold (in seconds).

Additional Steps and Configurations for Webserver HA

For Webserver HA, deploying a Load Balancer in front of Airflow webservers is strongly recommended.

Ambari may not enforce all required configurations automatically in HA setups; therefore, the following settings should be verified and applied manually if necessary.

Webserver Configuration Updates

Update the following settings in Advanced airflow-webserver-site.

Bash
Copy
  • Required when Airflow is deployed behind a load balancer.
  • Ensures proper handling of X-Forwarded-* HTTP headers.

Load Balancer URL Configuration

Update the Load Balancer URL in the following locations.

In Advanced airflow-webserver-site

Bash
Copy

In Custom airflow-webserver-site

Bash
Copy

These values must point to the Load Balancer endpoint, not individual web server hosts.

HAProxy Setup for Airflow Webservers (Load Balancer)

The following example demonstrates configuring HAProxy as a load balancer for Airflow web servers on RHEL 8. Similar steps can be applied to Ubuntu or other Linux distributions.

Install HAProxy

Bash
Copy

HAProxy Configuration Example

Update /etc/haproxy/haproxy.cfg as shown below. Replace hostnames and ports with values from your Airflow deployment.

Bash
Copy

Replace poc1.acceldata.ce and poc2.acceldata.ce with the actual hostnames and ports of your Airflow webservers.

Start HAProxy

Bash
Copy
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard
  Last updated