Single Node Installation

RHEL 8/9 Setup

Prerequisites

Install the necessary development tools and ensure Python 3.11 or later is available. This documentation uses Python 3.11, but Apache Airflow also works with later Python versions.

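For example, the development tools and Python 3.11 can be installed as follows; the package names are assumptions based on the standard RHEL 8/9 AppStream repositories:

```bash
# Install compilers and build tools, then Python 3.11 with headers and pip
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y python3.11 python3.11-devel python3.11-pip
python3.11 --version   # confirm the interpreter is available
```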

Database Setup

For anything beyond a quick test drive of Airflow, choose a robust database backend such as PostgreSQL or MySQL. By default, Airflow uses SQLite, which is intended only for development.

Airflow supports only specific versions of database engines. Verify that your database version is compatible as older versions might not support certain SQL features:

  • PostgreSQL: Versions 12, 13, 14, 15, 16
  • MySQL: Version 8.0, Innovation
  • MSSQL (experimental, support ending in version 2.9.0): 2017, 2019
  • SQLite: Version 3.15.0 or later

Before installing Apache Airflow, set up a compatible database, choosing PostgreSQL or MySQL based on your needs, and verify that your database version meets the minimums listed above (MySQL 8.0 or higher, PostgreSQL 12 or higher).

NOTE Apache Airflow does not support Oracle databases.

Refer to Set up a Database Backend in the Apache Airflow documentation.

Refer to the guidelines below for your selected database to prepare and configure it for Apache Airflow.

PostgreSQL Database Setup

To integrate PostgreSQL with Apache Airflow, complete the following steps to install and configure it:

Install the psycopg2-binary Python package:

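For example, assuming pip for Python 3.11 is the one your Airflow environment uses:

```bash
python3.11 -m pip install psycopg2-binary
```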

Install PostgreSQL:

  1. Install the repository RPM.
  2. Disable the built-in PostgreSQL module.
  3. Install PostgreSQL.
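A sketch of these three steps, assuming the PGDG repository and PostgreSQL 16 (substitute the EL-9 URL on RHEL 9 and your preferred major version as needed):

```bash
# 1. Install the repository RPM (EL-8 URL shown; use EL-9 on RHEL 9)
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# 2. Disable the built-in PostgreSQL module so the PGDG packages take precedence
sudo dnf -qy module disable postgresql
# 3. Install the PostgreSQL server packages
sudo dnf install -y postgresql16-server
```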

Initialize and start PostgreSQL, then create the Airflow database and user:

  1. Access the PostgreSQL shell.
  2. Inside the PostgreSQL shell, execute the following commands:
  • Create the Airflow database.
  • Create the Airflow user with a password.
  • Set the client encoding, default transaction isolation level, and timezone for the Airflow user.
  • Grant all privileges on the Airflow database to the Airflow user.
  • Exit the PostgreSQL shell.
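Assuming the PGDG postgresql16 packages, the sequence might look like this; the 'airflow' password is a placeholder you should change:

```bash
# Initialize the data directory and start the service
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16

# Run the SQL statements as the postgres superuser
sudo -u postgres psql <<'SQL'
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
ALTER ROLE airflow SET client_encoding TO 'utf8';
ALTER ROLE airflow SET default_transaction_isolation TO 'read committed';
ALTER ROLE airflow SET timezone TO 'UTC';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
SQL
```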

Configure PostgreSQL settings for Airflow:

  1. Open the PostgreSQL configuration file (postgresql.conf).
  2. Inside the file, modify the following settings:
  • Change and uncomment listen_addresses to '*'.
  • Uncomment the port line (remove the '#' at the beginning).
  • Save and close the file.
  3. Open the pg_hba.conf file:
  • Add a host entry for the Airflow server at the end of the file.
  • Replace {host_IP} with the actual IP address of the machine running Apache Airflow.
  • Save and close the file.
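The edits amount to the following lines; the paths assume PGDG PostgreSQL 16, and {host_IP} is the address of the machine running Apache Airflow:

```text
# /var/lib/pgsql/16/data/postgresql.conf
listen_addresses = '*'
port = 5432

# /var/lib/pgsql/16/data/pg_hba.conf (append at the end)
host    all    all    {host_IP}/32    md5
```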

Restart PostgreSQL to apply changes:

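With the PGDG 16 packages, the service unit is postgresql-16:

```bash
sudo systemctl restart postgresql-16
```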

MySQL Database Setup for Airflow

To configure MySQL as the database backend for Apache Airflow, follow these steps:

Install MySQL Server:


Install the mysqlclient Python package:


Start the MySQL service:


Install MySQL Connector for Python:


Secure MySQL Installation (Optional but Recommended):


Follow the on-screen prompts to complete the security setup.
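The steps above, collected into one hypothetical sequence; mysql-devel and gcc are needed so that pip can build mysqlclient:

```bash
sudo dnf install -y mysql-server mysql-devel gcc    # server plus build dependencies
python3.11 -m pip install mysqlclient               # MySQL driver used by Airflow
sudo systemctl enable --now mysqld                  # start the MySQL service
python3.11 -m pip install mysql-connector-python    # MySQL Connector for Python
sudo mysql_secure_installation                      # interactive hardening (recommended)
```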

Create a database and user for Airflow:

  1. Access the MySQL shell.

Enter the root password when prompted.

  2. Inside the MySQL shell, execute the following commands:
  • Create the Airflow database.
  • Create a user for Airflow and set permissions.
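For example; the 'airflow' password and the '%' host wildcard are placeholders, so restrict the host as appropriate for your environment:

```bash
mysql -u root -p <<'SQL'
CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'airflow';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
FLUSH PRIVILEGES;
SQL
```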

Restart MySQL to apply changes:

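On RHEL the MySQL service unit is typically mysqld:

```bash
sudo systemctl restart mysqld
```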

With these steps, the MySQL database named 'airflow' and a user named 'airflow' are now set up with the necessary privileges. You can now proceed to configure Apache Airflow to use this MySQL database as its backend.

Ubuntu 20/22 Setup

Prerequisites

Update the package list:


Add the Deadsnakes PPA repository to install newer Python versions:


Install Python 3.11 and the Python virtual environment package:


Verify the installation of Python and pip:

  1. Check the Python version.
  2. Check the pip version.
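The steps above can be sketched as follows; if pip is missing for the Deadsnakes build, bootstrap it with ensurepip:

```bash
sudo apt update
sudo apt install -y software-properties-common    # provides add-apt-repository
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 --version                              # check the Python version
python3.11 -m ensurepip --upgrade                 # bootstrap pip if it is absent
python3.11 -m pip --version                       # check the pip version
```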

Database Setup

Follow the respective instructions below to configure your chosen database system for use with Apache Airflow.

PostgreSQL Database Setup

Install the psycopg2-binary Python package:

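For example, using the pip that belongs to your Airflow Python environment:

```bash
python3.11 -m pip install psycopg2-binary
```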

Install PostgreSQL:

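On Ubuntu the distribution packages are usually sufficient:

```bash
sudo apt install -y postgresql postgresql-contrib
```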

Create a PostgreSQL database and user for Airflow:

  1. Access the PostgreSQL shell.
  2. Inside the PostgreSQL shell, execute the following commands:
  • Create the Airflow database.
  • Create the Airflow user with a password.
  • Configure settings for the Airflow user.
  • Grant all privileges on the Airflow database to the Airflow user.
  • Exit the PostgreSQL shell.
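As a sketch, with 'airflow' as a placeholder password you should change:

```bash
sudo -u postgres psql <<'SQL'
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
ALTER ROLE airflow SET client_encoding TO 'utf8';
ALTER ROLE airflow SET default_transaction_isolation TO 'read committed';
ALTER ROLE airflow SET timezone TO 'UTC';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
SQL
```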

Configure PostgreSQL settings for Airflow:

  1. Open and edit the PostgreSQL configuration file:
  • Change and uncomment listen_addresses to allow all connections.
  • Uncomment the port line to use the default port 5432.
  2. Modify the pg_hba.conf file to allow specific connections:
  • Add a line to permit connections from the Airflow server.
  • Replace {host_IP} with the actual IP address of the machine running Apache Airflow.
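The edits amount to the following lines; on Ubuntu the files live under a version-specific directory such as /etc/postgresql/14/main:

```text
# /etc/postgresql/<version>/main/postgresql.conf
listen_addresses = '*'
port = 5432

# /etc/postgresql/<version>/main/pg_hba.conf (append at the end)
host    all    all    {host_IP}/32    md5
```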

Restart PostgreSQL to apply changes:

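On Ubuntu the service unit is postgresql:

```bash
sudo systemctl restart postgresql
```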

These steps have prepared your PostgreSQL database named 'airflow' and a user named 'airflow' with the necessary settings and privileges. You can now proceed to integrate this setup into Apache Airflow's configuration.

MySQL Database Setup for Airflow

Install MySQL Server:

  1. Download and install the MySQL APT repository.
  2. Update the package list and import the repository key.
  3. Check the available MySQL server version and install it.
  4. Install the MySQL Connector for Java.
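A hypothetical sequence for these steps; the mysql-apt-config version below changes over time, so check dev.mysql.com for the current release:

```bash
wget https://dev.mysql.com/get/mysql-apt-config_0.8.29-1_all.deb   # version is an assumption
sudo dpkg -i mysql-apt-config_0.8.29-1_all.deb
sudo apt update                       # also imports the repository key
apt-cache policy mysql-server         # check the available server version
sudo apt install -y mysql-server
sudo apt install -y libmysql-java     # MySQL Connector for Java (package name may vary)
```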

Install the mysqlclient Python package:


Start the MySQL service:


Install MySQL Connector for Python:


Secure MySQL Installation (Optional but Recommended):

Run the command to secure your MySQL installation, including setting a root password:


Follow the on-screen prompts to complete the security setup.
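The steps above, gathered into one hypothetical sequence; build-essential and the libmysqlclient headers are needed so that pip can build mysqlclient, and Ubuntu's service unit is mysql:

```bash
sudo apt install -y default-libmysqlclient-dev build-essential   # build dependencies
python3.11 -m pip install mysqlclient                            # MySQL driver used by Airflow
sudo systemctl start mysql                                       # start the MySQL service
python3.11 -m pip install mysql-connector-python                 # MySQL Connector for Python
sudo mysql_secure_installation                                   # interactive; follow the prompts
```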

Create a database and user for Airflow:

  1. Access the MySQL shell.

Enter the root password when prompted.

  2. Inside the MySQL shell, execute the following commands:
  • Create the Airflow database with UTF-8 encoding.
  • Create a user for Airflow and grant privileges.
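For example; the 'airflow' password and the '%' host wildcard are placeholders, so restrict the host as appropriate for your environment:

```bash
mysql -u root -p <<'SQL'
CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'airflow';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
FLUSH PRIVILEGES;
SQL
```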

Restart MySQL to apply changes:

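On Ubuntu the MySQL service unit is mysql:

```bash
sudo systemctl restart mysql
```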

With these steps completed, the MySQL database named 'airflow' and a user named 'airflow' are set up with the necessary privileges. You can now proceed to configure Apache Airflow to use this MySQL database as its backend.

Before proceeding with the Apache Airflow installation from Ambari, ensure the Apache Airflow repository is set up correctly.

Apache Airflow Installation using Mpack on Ambari

Create symbolic links for Python to use Python 3.11:

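One hypothetical way to do this; note that repointing /usr/bin/python3 can break OS tools that expect the distribution default, so prefer an alternatives entry or a virtual environment where possible:

```bash
sudo ln -sf /usr/bin/python3.11 /usr/bin/python3
python3 --version   # should now report 3.11.x
```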

The following provides the steps for installing and setting up Apache Airflow using a Management Pack (Mpack) on an Ambari-managed cluster.

Install and Configure Mpack:

  1. Install the Mpack.
  2. Uninstall the previous Mpack (if needed).
  3. Change symlinks: navigate to the services directory and update the Airflow symlink for each service version.
  4. Restart the Ambari server.
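The Ambari CLI commands for these steps, with placeholder paths and names:

```bash
# 1. Install the Mpack (archive path is a placeholder)
sudo ambari-server install-mpack --mpack=/path/to/airflow-mpack.tar.gz
# 2. Uninstall a previous Mpack if needed (mpack name is a placeholder)
sudo ambari-server uninstall-mpack --mpack-name=airflow-ambari-mpack
# 3. Update the Airflow symlink for each service version, e.g.:
#    cd /var/lib/ambari-server/resources/stacks/<stack>/<version>/services
#    sudo ln -sfn <mpack_airflow_service_dir> AIRFLOW
# 4. Restart the Ambari server to pick up the Mpack
sudo ambari-server restart
```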

Your Apache Airflow installation is now configured and ready for use on your Ambari-managed cluster.

Steps to install Apache Airflow from the Ambari UI

  1. Add the Airflow service from the Ambari UI.
  2. Specify the host details for the Airflow Scheduler and Airflow Webserver.
  3. Choose the slave client configuration.
  4. Modify or customize the fields as needed.

Database Options:

Choose between MySQL or PostgreSQL as the backend database:

Next, configure the Airflow backend database connection string and Celery settings. You are prompted for the database name, username, password, database type (MySQL or PostgreSQL), and host IP; the provided script then automatically generates the configuration for the database connection string and the Celery settings.

Enter the database information in the Ambari UI:

  • Database Name
  • Password
  • Username
  • Database Type: Choose between mysql or postgresql.
  • Host IP
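From these values the script assembles a SQLAlchemy connection string for Airflow's sql_alchemy_conn setting; the resulting formats look like:

```text
# PostgreSQL
postgresql+psycopg2://<username>:<password>@<host_IP>:5432/<database_name>

# MySQL
mysql+mysqldb://<username>:<password>@<host_IP>:3306/<database_name>
```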

If you are using RabbitMQ, you must also set up and add the RabbitMQ configurations:

  • RabbitMQ Username
  • RabbitMQ Password
  • RabbitMQ virtual host
  • Celery Broker
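For reference, the Celery broker URL that these values produce follows the standard AMQP format:

```text
amqp://<rabbitmq_username>:<rabbitmq_password>@<rabbitmq_host>:5672/<virtual_host>
```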

Once you have provided all the necessary details, click on the Next button.

  1. Deploy the Airflow service. This step installs all the necessary components and starts the service.
  2. Once the Airflow webserver is up and running, create a username and password to access the UI by running the initdb command from the Ambari UI.

The initdb command generates an admin user named "airflow" with the password "airflow". Once the database initialization completes, use these credentials to log in to the Airflow webserver UI.
