Single Node Installation

RHEL 8 Setup

Prerequisites

Install the necessary development tools and Python packages, using Python version 3.8 or above. This documentation uses Python 3.8, but Apache Airflow is compatible with any Python version from 3.8 upwards.

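For example, assuming the Development Tools group and the python38 packages from the RHEL 8 AppStream repositories:

Bash
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y python38 python38-devel python38-pip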

Check that the Python 3.8 executables are present in both expected locations, as shown below, before proceeding with the Airflow installation.

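For example (the two paths below are illustrative: a package-based install places the interpreter under /usr/bin, while a source build with make altinstall places it under /usr/local/bin):

Bash
which python3.8
python3.8 --version
ls -l /usr/bin/python3.8 /usr/local/bin/python3.8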

Database Setup

By default, Airflow uses SQLite, which is intended only for development and testing. For production use, choose a robust database backend such as PostgreSQL or MySQL.

Airflow supports only specific versions of database engines. Verify that your database version is compatible as older versions might not support certain SQL features:

  • PostgreSQL: Versions 12, 13, 14, 15, 16
  • MySQL: Version 8.0, Innovation
  • MSSQL (experimental; support removed in version 2.9.0): 2017, 2019
  • SQLite: Version 3.15.0 or later

Before installing Apache Airflow, set up a compatible database. Choose PostgreSQL or MySQL based on your needs and ensure that your database version meets the following minimum requirements:

  • MySQL: Version 8.0 or higher
  • PostgreSQL: Version 12 or higher

NOTE: Apache Airflow does not support Oracle databases.

Refer to: Set up a Database Backend (Apache Airflow documentation): https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html

Follow the respective instructions below for your chosen database system to initialize and configure it for use with Apache Airflow.

PostgreSQL Database Setup

To integrate PostgreSQL with Apache Airflow, complete the following steps to install and configure it:

  1. Install PostgreSQL:
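For example, enabling a supported PostgreSQL module stream (12 or newer) from AppStream:

Bash
sudo dnf module enable -y postgresql:13
sudo dnf install -y postgresql-server postgresql-contrib
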
  2. Initialize and Start PostgreSQL:
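For example, using the standard RHEL 8 setup helper:

Bash
sudo postgresql-setup --initdb
sudo systemctl enable --now postgresql
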
  3. Create PostgreSQL Database and User for Airflow:

To set up the database and user for Apache Airflow in PostgreSQL, follow these steps:

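For example (the password below is a placeholder; choose your own):

Bash
sudo -u postgres psql

Run the following inside the psql shell:

SQL
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
-- On PostgreSQL 15 and newer, also run: GRANT ALL ON SCHEMA public TO airflow;
\q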

Now, the PostgreSQL database named airflow and the user airflow with the specified settings and privileges have been created. Proceed with the next steps to configure Apache Airflow with this PostgreSQL database.

  4. Configure PostgreSQL Settings for Airflow: After creating the Airflow database and user in PostgreSQL, modify the PostgreSQL configuration to allow connections from the Apache Airflow server. Follow these steps:
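For example, edit postgresql.conf (the default RHEL 8 data directory /var/lib/pgsql/data is assumed here):

Bash
sudo vi /var/lib/pgsql/data/postgresql.conf
# Set the listen address so the Airflow host can connect, for example:
# listen_addresses = '*'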

Save and close the file.

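Then edit pg_hba.conf in the same directory to allow the Airflow host to authenticate (the network below is a placeholder):

Bash
sudo vi /var/lib/pgsql/data/pg_hba.conf
# Add a line such as:
# host    airflow    airflow    192.168.1.0/24    md5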

Save and close the file.

  5. Restart PostgreSQL to Apply Changes:
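For example:

Bash
sudo systemctl restart postgresql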

MySQL Database Setup for Airflow

To configure MySQL as the database backend for Apache Airflow, follow these steps:

  1. Install MySQL Server:
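For example, using the MySQL 8.0 packages from the RHEL 8 AppStream repository:

Bash
sudo dnf install -y mysql-server
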
  2. Install the mysqlclient Python package:
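For example (the MySQL development headers and a C compiler are needed to build mysqlclient):

Bash
sudo dnf install -y mysql-devel gcc
python3.8 -m pip install mysqlclient
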
  3. Start the MySQL service:
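For example:

Bash
sudo systemctl enable --now mysqld
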
  4. Secure MySQL Installation (Optional but Recommended):
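For example:

Bash
sudo mysql_secure_installation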

Follow the prompts to secure the MySQL installation, including setting a root password.

  5. Create Database and User for Airflow:
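For example, open a MySQL shell as root:

Bash
mysql -u root -p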

Enter the root password when prompted. Inside the MySQL shell:

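A minimal sketch (the password is a placeholder; the utf8mb4 character set follows the Airflow documentation's recommendation):

SQL
CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'airflow_password';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
FLUSH PRIVILEGES;
EXIT;
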
  6. Restart MySQL to Apply Changes:
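For example:

Bash
sudo systemctl restart mysqld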

Now, the MySQL database is set up with a database named airflow and a user named airflow with the necessary privileges. Proceed to configure Apache Airflow to use this MySQL database as its backend.

CentOS 7 Setup

Prerequisites

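CentOS 7 does not ship Python 3.8, so one common approach is to install the build dependencies and build Python 3.8 from source (the exact 3.8.x release below is illustrative):

Bash
sudo yum groupinstall -y "Development Tools"
sudo yum install -y gcc openssl-devel bzip2-devel libffi-devel zlib-devel wget

cd /usr/src
sudo wget https://www.python.org/ftp/python/3.8.16/Python-3.8.16.tgz
sudo tar xzf Python-3.8.16.tgz
cd Python-3.8.16
sudo ./configure --enable-optimizations
sudo make altinstall    # installs /usr/local/bin/python3.8 without replacing the system python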

Check that the Python 3.8 executables are present in both expected locations, as shown below, before proceeding with the Airflow installation.

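For example (a source build with make altinstall places the interpreter under /usr/local/bin):

Bash
which python3.8
ls -l /usr/local/bin/python3.8
python3.8 --version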

Database Setup

Follow the respective instructions below for your chosen database system to initialize and configure it for use with Apache Airflow.

PostgreSQL Database Setup

To use PostgreSQL with Apache Airflow, follow these steps to install and configure it:

  1. Install psycopg2-binary Python Package:
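For example:

Bash
python3.8 -m pip install psycopg2-binary
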
  2. Install PostgreSQL:
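The PostgreSQL packages in the CentOS 7 base repository are too old for Airflow, so a common approach is to use the PostgreSQL Global Development Group (PGDG) repository; the example below installs PostgreSQL 13:

Bash
sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install -y postgresql13-server postgresql13-contrib
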
  3. Initialize and Start PostgreSQL:
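For example, with the PGDG PostgreSQL 13 packages:

Bash
sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
sudo systemctl enable --now postgresql-13
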
  4. Create PostgreSQL Database and User for Airflow:

To set up the database and user for Apache Airflow in PostgreSQL, follow these steps:

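For example (the password below is a placeholder; choose your own):

Bash
sudo -u postgres psql

Run the following inside the psql shell:

SQL
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
\q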

Now, the PostgreSQL database named airflow and the user airflow with the specified settings and privileges have been created. Proceed with the next steps to configure Apache Airflow with this PostgreSQL database.

  5. Configure PostgreSQL Settings for Airflow:

After creating the Airflow database and user in PostgreSQL, modify the PostgreSQL configuration to allow connections from the Apache Airflow server. Follow these steps:

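For example, edit postgresql.conf (with the PGDG 13 packages the data directory is typically /var/lib/pgsql/13/data):

Bash
sudo vi /var/lib/pgsql/13/data/postgresql.conf
# Set the listen address so the Airflow host can connect, for example:
# listen_addresses = '*'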

Save and close the file.

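Then edit pg_hba.conf in the same directory (the network below is a placeholder):

Bash
sudo vi /var/lib/pgsql/13/data/pg_hba.conf
# Add a line such as:
# host    airflow    airflow    192.168.1.0/24    md5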

Save and close the file.

  6. Restart PostgreSQL to Apply Changes:
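For example, assuming the PGDG PostgreSQL 13 service name:

Bash
sudo systemctl restart postgresql-13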

MySQL Database Setup for Airflow

To set up MySQL as the database backend for Apache Airflow, follow these steps:

  1. Install MySQL Server:
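CentOS 7 ships MariaDB rather than MySQL, so a common approach is to add the MySQL 8.0 community Yum repository first (the release RPM below is illustrative; use the current one from dev.mysql.com):

Bash
sudo yum install -y https://dev.mysql.com/get/mysql80-community-release-el7-7.noarch.rpm
sudo yum install -y mysql-community-server
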
  2. Install the mysqlclient Python package:
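For example (development headers and a compiler are needed to build mysqlclient):

Bash
sudo yum install -y mysql-community-devel gcc
python3.8 -m pip install mysqlclient
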
  3. Start the MySQL service:
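For example:

Bash
sudo systemctl enable --now mysqld
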
  4. Install MySQL Connector for Python:
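For example:

Bash
python3.8 -m pip install mysql-connector-python
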
  5. Secure MySQL Installation (Optional but Recommended):
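For example (with the MySQL community packages, the temporary root password is written to /var/log/mysqld.log):

Bash
sudo grep 'temporary password' /var/log/mysqld.log
sudo mysql_secure_installation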

Follow the prompts to secure the MySQL installation, including setting a root password.

  6. Create Database and User for Airflow:
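For example, open a MySQL shell as root:

Bash
mysql -u root -p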

Enter the root password when prompted. Inside the MySQL shell:

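A minimal sketch (the password is a placeholder; the utf8mb4 character set follows the Airflow documentation's recommendation):

SQL
CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'airflow_password';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
FLUSH PRIVILEGES;
EXIT;
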
  7. Restart MySQL to Apply Changes:
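For example:

Bash
sudo systemctl restart mysqld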

Now, the MySQL database is set up with a database named airflow and a user named airflow with the necessary privileges. Proceed to configure Apache Airflow to use this MySQL database as its backend.

Ubuntu 20.04 Setup

Prerequisites

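For example (Ubuntu 20.04 provides Python 3.8 in its default repositories):

Bash
sudo apt update
sudo apt install -y python3.8 python3.8-dev python3.8-venv python3-pip build-essential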

Check that the Python 3.8 executables are present in both expected locations, as shown below, before proceeding with the Airflow installation.

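For example:

Bash
which python3.8
python3.8 --version
ls -l /usr/bin/python3.8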

Database Setup

Follow the respective instructions below for your chosen database system to initialize and configure it for use with Apache Airflow.

PostgreSQL Database Setup

To use PostgreSQL with Apache Airflow, follow these steps to install and configure it:

  1. Install psycopg2-binary Python Package:
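For example:

Bash
python3.8 -m pip install psycopg2-binary
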
  2. Install PostgreSQL:
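For example (Ubuntu 20.04 provides PostgreSQL 12, which Airflow supports; the service is started automatically after installation):

Bash
sudo apt install -y postgresql postgresql-contrib
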
  3. Create PostgreSQL Database and User for Airflow:

To set up the database and user for Apache Airflow in PostgreSQL, follow these steps:

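For example (the password below is a placeholder; choose your own):

Bash
sudo -u postgres psql

Run the following inside the psql shell:

SQL
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
\q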

Now, the PostgreSQL database named airflow and the user airflow with the specified settings and privileges have been created. Proceed with the next steps to configure Apache Airflow with this PostgreSQL database.

  4. Configure PostgreSQL Settings for Airflow:

After creating the Airflow database and user in PostgreSQL, modify the PostgreSQL configuration to allow connections from the Apache Airflow server. Follow these steps:

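For example, edit postgresql.conf (on Ubuntu 20.04 the PostgreSQL 12 configuration typically lives under /etc/postgresql/12/main):

Bash
sudo vi /etc/postgresql/12/main/postgresql.conf
# Set the listen address so the Airflow host can connect, for example:
# listen_addresses = '*'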

Save and close the file.

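Then edit pg_hba.conf in the same directory (the network below is a placeholder):

Bash
sudo vi /etc/postgresql/12/main/pg_hba.conf
# Add a line such as:
# host    airflow    airflow    192.168.1.0/24    md5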

Save and close the file.

  5. Restart PostgreSQL to Apply Changes:
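For example:

Bash
sudo systemctl restart postgresql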

MySQL Database Setup for Airflow (Optional)

To set up MySQL as the database backend for Apache Airflow, follow these steps:

  1. Install MySQL Server:
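For example (Ubuntu 20.04 provides MySQL 8.0):

Bash
sudo apt install -y mysql-server
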
  2. Install mysqlclient Python Package:
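For example (the client development headers are needed to build mysqlclient):

Bash
sudo apt install -y default-libmysqlclient-dev build-essential
python3.8 -m pip install mysqlclient
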
  3. Start MySQL Service:
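For example:

Bash
sudo systemctl enable --now mysql
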
  4. Install MySQL Connector for Python:
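For example:

Bash
python3.8 -m pip install mysql-connector-python
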
  5. Secure MySQL Installation (Optional but Recommended):
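For example:

Bash
sudo mysql_secure_installation
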
  6. Create Database and User for Airflow:
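For example, open a MySQL shell as root:

Bash
sudo mysql -u root -p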

Enter the root password when prompted. Inside the MySQL shell:

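A minimal sketch (the password is a placeholder; the utf8mb4 character set follows the Airflow documentation's recommendation):

SQL
CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'airflow_password';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
FLUSH PRIVILEGES;
EXIT;
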
  7. Restart MySQL to Apply Changes:
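For example:

Bash
sudo systemctl restart mysql
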
Now, the MySQL database is set up with a database named airflow and a user named airflow with the necessary privileges. Proceed to configure Apache Airflow to use this MySQL database as its backend.

Apache Airflow Installation using Mpack on Ambari

Create symbolic links for Python to use Python 3.8:

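For example (the source path depends on how Python 3.8 was installed; /usr/bin/python3.8 is assumed here):

Bash
sudo ln -sfn /usr/bin/python3.8 /usr/bin/python3
python3 --version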

The following provides the steps for installing and setting up Apache Airflow using a Management Pack (Mpack) on an Ambari-managed cluster.

Install and Configure Mpack:

  1. Install Mpack:
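For example (the archive path below is a placeholder for wherever the Airflow Mpack tarball was downloaded):

Bash
sudo ambari-server install-mpack --mpack=/path/to/airflow-mpack.tar.gz --verbose
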
  2. Uninstall Previous Mpack (if needed):
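For example (the Mpack name below is a placeholder; use the name reported by the previously installed pack):

Bash
sudo ambari-server uninstall-mpack --mpack-name=airflow-ambari-mpack --verbose
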
  3. Change Symlinks:
  • Navigate to the services directory and update the Airflow symlink for each service version:
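A sketch only; the stack name, stack version, Mpack directory, and Airflow service version below are placeholders and must be adjusted to your cluster:

Bash
cd /var/lib/ambari-server/resources/stacks/<STACK_NAME>/<STACK_VERSION>/services
sudo ln -sfn /var/lib/ambari-server/resources/mpacks/<AIRFLOW_MPACK_DIR>/common-services/AIRFLOW/<AIRFLOW_VERSION> AIRFLOW
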
  4. Restart Ambari Server:
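For example:

Bash
sudo ambari-server restart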

Your Apache Airflow installation is now configured and ready for use on your Ambari-managed cluster.

Steps to install Apache Airflow from the Ambari UI

  1. Add the Airflow service from the Ambari UI.
  2. Specify the host details for the Airflow Scheduler and Airflow Webserver.
  3. Choose the slave client configuration.
  4. Modify or customize the fields as needed.

Database Options:

Choose between MySQL or PostgreSQL as the backend database:

Configure the Airflow backend database connection string and Celery settings. You will be prompted to enter specific information, including the database name, password, username, database type (MySQL or PostgreSQL), and host IP. The provided script then automatically generates the necessary configuration for the database connection string and Celery settings.

Enter the database information in the Ambari UI:

  • Database Name
  • Password
  • Username
  • Database Type: Choose between mysql or postgresql.
  • Host IP
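Based on these values, the generated database settings in airflow.cfg typically take the following form (all values are placeholders; in older Airflow releases the connection string lives under [core] instead of [database]):

INI
[database]
sql_alchemy_conn = postgresql+psycopg2://<username>:<password>@<host_ip>:5432/<database_name>
# For MySQL instead:
# sql_alchemy_conn = mysql+mysqldb://<username>:<password>@<host_ip>:3306/<database_name>

[celery]
result_backend = db+postgresql://<username>:<password>@<host_ip>:5432/<database_name>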

If you are using RabbitMQ, you must also set up and add the RabbitMQ configuration:

  • RabbitMQ Username
  • RabbitMQ Password
  • RabbitMQ virtual host
  • Celery Broker
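With RabbitMQ as the Celery broker, the broker URL typically takes the following form (all values are placeholders):

INI
[celery]
broker_url = amqp://<rabbitmq_username>:<rabbitmq_password>@<host_ip>:5672/<rabbitmq_virtual_host>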

Once you have provided all the necessary details, click on the Next button.

  5. Deploy the Airflow service. This step installs all the necessary components and starts the service.
  6. Once the Airflow webserver is up and running, you need a username and password to access the UI. To create the admin user, run the initdb command from the Ambari UI.

This command will generate an admin user named "airflow" with the password "airflow."

On completion of the database initialization, you can access the Airflow Webserver UI. Log in with these credentials (username: "airflow", password: "airflow").
