Scheduler and Webserver High Availability for Airflow
This document describes how to configure High Availability (HA) for Apache Airflow Schedulers and Webservers in an ODP deployment. Scheduler HA improves reliability and fault tolerance, while Webserver HA ensures consistent UI access through a load balancer.
High Availability Support in ODP Airflow
ODP Airflow supports running multiple schedulers and webservers in an HA configuration.
- During a fresh installation, users can choose to add multiple schedulers and webservers.
- For an existing installation, additional scheduler or webserver components can be added through Ambari → Hosts → Add by assigning the required roles to the host.
Additional Configurations for Scheduler HA
The following configurations are recommended to ensure consistent and predictable behavior when running multiple schedulers.
Configure these settings in Advanced airflow-scheduler-site.
Required and Recommended Properties
In Advanced airflow-scheduler-site
use_job_schedule: True
- Mandatory for Scheduler HA.
- Enables consistent job scheduling across multiple schedulers.
deactivate_unknown_dags: False
- Prevents schedulers from disabling DAGs managed by other schedulers.
parsing_processes: 4
- Improves DAG parsing throughput through parallelism.
scheduler_health_check_threshold: 90
- Defines an HA-friendly scheduler liveness threshold (in seconds).
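To make the liveness threshold concrete, the following minimal Python sketch shows how a heartbeat timestamp might be compared against a 90-second threshold. It is an illustration only, not Airflow's implementation: the function name scheduler_is_alive and its parameters are hypothetical, and the strict comparison is an assumption about how the threshold is applied.

```python
from datetime import datetime, timedelta, timezone


def scheduler_is_alive(
    last_heartbeat: datetime,
    now: datetime,
    threshold_seconds: int = 90,
) -> bool:
    """Return True if the last heartbeat is more recent than the threshold.

    Hypothetical helper: mirrors the idea behind
    scheduler_health_check_threshold, not Airflow's actual code.
    """
    age = now - last_heartbeat
    # Assumption: a heartbeat exactly at the threshold counts as stale.
    return age < timedelta(seconds=threshold_seconds)


# Example: a scheduler that last reported 30 seconds ago is considered alive,
# while one that has been silent for two minutes is not.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = scheduler_is_alive(now - timedelta(seconds=30), now)
stale = scheduler_is_alive(now - timedelta(seconds=120), now)
```

A larger threshold tolerates short pauses (for example, a slow database round trip) before a scheduler is reported as unhealthy, at the cost of detecting a genuinely failed scheduler later.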
Additional Steps and Configurations for Webserver HA
For Webserver HA, deploying a Load Balancer in front of Airflow webservers is strongly recommended.
Ambari may not enforce all required configurations automatically in HA setups; therefore, the following settings should be verified and applied manually if necessary.
Webserver Configuration Updates
Update the following settings in Advanced airflow-webserver-site.
enable_proxy_fix: True
- Required when Airflow is deployed behind a load balancer.
- Ensures proper handling of X-Forwarded-* HTTP headers.
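The reason these headers matter can be sketched in a few lines of Python. Behind a load balancer, the webserver sees the proxy's connection rather than the client's, so the original scheme and host have to be recovered from the forwarded headers. Airflow delegates this to Werkzeug's ProxyFix middleware; the function below is a simplified stand-in that trusts a single proxy hop, and the name effective_origin and the host airflow.example.com are hypothetical.

```python
from typing import Mapping, Tuple


def effective_origin(
    headers: Mapping[str, str],
    scheme: str,
    host: str,
) -> Tuple[str, str]:
    """Return the (scheme, host) the client used, trusting one proxy hop.

    Simplified illustration only; it is not Werkzeug's ProxyFix.
    """
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def last_hop(name: str, default: str) -> str:
        # Each proxy appends its value after a comma; the last entry was
        # written by the proxy closest to the webserver.
        raw = lowered.get(name, "")
        values = [part.strip() for part in raw.split(",") if part.strip()]
        return values[-1] if values else default

    return (
        last_hop("x-forwarded-proto", scheme),
        last_hop("x-forwarded-host", host),
    )


# The webserver received plain HTTP on its own host and port, but the client
# actually reached the load balancer over HTTPS.
origin = effective_origin(
    {"X-Forwarded-Proto": "https", "X-Forwarded-Host": "airflow.example.com"},
    scheme="http",
    host="poc1.acceldata.ce:8889",
)
```

Without this correction, redirects and generated links would point at the individual webserver's address instead of the load balancer. Note that the headers are only present if the load balancer is configured to send them.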
Load Balancer URL Configuration
Update the Load Balancer URL in the following locations.
In Advanced airflow-webserver-site
base_url: <Load Balancer URL for Airflow UI>
In Custom airflow-webserver-site
web_server_url: <Load Balancer URL for Airflow UI>
These values must point to the Load Balancer endpoint, not individual webserver hosts.
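A quick sanity check for these two values could look like the sketch below, which flags a URL whose host is one of the individual webservers. The function name points_at_webserver and the load balancer host airflow-lb.example.com are hypothetical; the URL is assumed to include a scheme such as http://.

```python
from typing import Iterable
from urllib.parse import urlsplit


def points_at_webserver(url: str, webserver_hosts: Iterable[str]) -> bool:
    """Return True if the URL's host is one of the individual webservers."""
    # urlsplit lowercases the hostname and strips the port for us.
    host = urlsplit(url).hostname
    known = {name.lower() for name in webserver_hosts}
    return host is not None and host in known


webservers = ["poc1.acceldata.ce", "poc2.acceldata.ce"]

# A value that bypasses the load balancer is flagged...
bad = points_at_webserver("http://poc1.acceldata.ce:8889", webservers)
# ...while the load balancer endpoint passes the check.
good = points_at_webserver("http://airflow-lb.example.com:8080", webservers)
```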
HAProxy Setup for Airflow Webservers (Load Balancer)
The following example demonstrates configuring HAProxy as a load balancer for Airflow web servers on RHEL 8. Similar steps can be applied to Ubuntu or other Linux distributions.
Install HAProxy
yum install haproxy
HAProxy Configuration Example
Update /etc/haproxy/haproxy.cfg as shown below. Replace hostnames and ports with values from your Airflow deployment.
global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 50000
    daemon
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend http_front
    bind *:8080
    mode http
    default_backend airflow_web_backends

backend airflow_web_backends
    mode http
    balance roundrobin
    option httpchk GET /health
    server web1 poc1.acceldata.ce:8889 check
    server web2 poc2.acceldata.ce:8889 check

Replace poc1.acceldata.ce and poc2.acceldata.ce with the actual hostnames and ports of your Airflow webservers.
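Before relying on the load balancer, it can help to run the same probe by hand against each webserver. The Python sketch below sends a GET request to the Airflow /health endpoint and treats any 2xx or 3xx response as "up", which is how HAProxy's HTTP check decides by default. The function names are hypothetical, and the hosts and port are the example values from the configuration above.

```python
import http.client


def status_counts_as_up(status: int) -> bool:
    """Mirror HAProxy's default rule: 2xx and 3xx responses pass the check."""
    return 200 <= status < 400


def backend_is_up(
    host: str,
    port: int,
    path: str = "/health",
    timeout: float = 5.0,
) -> bool:
    """Probe one webserver and report whether the health check would pass."""
    # http.client does not follow redirects, so a 3xx is seen as-is.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return status_counts_as_up(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Refused connections, DNS failures, and timeouts all mean "down".
        return False
    finally:
        conn.close()
```

For example, calling backend_is_up("poc1.acceldata.ce", 8889) and backend_is_up("poc2.acceldata.ce", 8889) from the HAProxy host shows whether each backend would be kept in rotation. A backend that reports as down here will also be marked down by HAProxy.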
Start HAProxy
sudo systemctl start haproxy