Advanced Configurations

Apache Airflow LDAP Integration

To install the required packages and configure Docker for the LDAP server and PHPLDAPADMIN, perform the following steps:

  1. Install and Configure Packages
  • Install the OpenLDAP development package.
  • Install Apache Airflow with LDAP support.
  • Register the system and manage subscriptions.
  2. Set Up Docker for LDAP
  • Add the Docker repository and install Docker CE with the required configurations.
  • Start the Docker service and verify that it is running.
  3. Download and Configure Docker Compose
  • Obtain Docker Compose and set its permissions.
  4. Prepare LDAP Directory Structure
  • Create the directories and prepare the LDAP configuration in the Airflow home directory.
  • Add the initial LDAP entries to the bootstrap.ldif file.
  5. Configure Docker Services for LDAP
  • Create and configure docker-compose.yml in the airflow_ldap directory for LDAP and PHPLDAPADMIN.
  • Deploy the services.
  6. Access PHPLDAPADMIN UI
  • Open the PHPLDAPADMIN UI at http://{hostname/IP}:8888/ to configure LDAP authentication for Apache Airflow.
  7. Modify Airflow Configuration
  • Once the PHPLDAPADMIN UI is accessible, adjust webserver_config.py for LDAP integration.
  8. Update Airflow through Ambari UI
  • Navigate to the Ambari UI and update the LDAP settings in the following sections:
    • Advanced airflow-ldap-site
    • Advanced airflow-api-site
    • Advanced airflow-webserver-site
  9. Restart Airflow Components
  • Restart the Airflow components to apply the new configuration, then access the Airflow webserver UI using LDAP credentials.

It is advised to set up your own LDAP server with custom user configurations for security and management purposes.
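As a rough illustration of the webserver_config.py adjustment mentioned in the steps above, an LDAP setup built on Flask-AppBuilder's configuration keys might look like the following. The server URL, base DN, bind account, password, and default role are placeholders invented for this sketch and are not taken from this guide; replace them with the values from your own LDAP server.

```python
# webserver_config.py (sketch): every value below is a placeholder assumption.
from flask_appbuilder.const import AUTH_LDAP

# Switch the Airflow web UI from database logins to LDAP logins.
AUTH_TYPE = AUTH_LDAP

# Hypothetical LDAP server address; 389 is the standard unencrypted LDAP port.
AUTH_LDAP_SERVER = "ldap://ldap.example.com:389"
AUTH_LDAP_USE_TLS = False

# Hypothetical subtree that holds the user entries, and the attribute
# that is matched against the login name typed into the Airflow UI.
AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com"
AUTH_LDAP_UID_FIELD = "uid"

# Hypothetical service account used to search the directory.
AUTH_LDAP_BIND_USER = "cn=admin,dc=example,dc=com"
AUTH_LDAP_BIND_PASSWORD = "change-me"

# Create an Airflow user on first successful LDAP login, with a default role.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Viewer"
```

The file is read when the Airflow webserver starts, so a change here only takes effect after the restart described in the last step.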


Apache Airflow SSL and Kerberos Integration

This section provides step-by-step instructions for setting up SSL in the Airflow environment: generating an SSL certificate and key with OpenSSL, then configuring the Airflow web server settings through the Ambari UI. With SSL enabled, users can access the Airflow web server over HTTPS. The section also covers Kerberos integration, which is set up automatically on a Kerberos-enabled cluster, and explains how to disable Kerberos if needed.

SSL Setup

To set up SSL, follow these steps:

  1. Navigate to the Airflow home directory.
  2. Generate a new SSL certificate and key with OpenSSL.
  3. Update the Airflow webserver configuration in the Ambari UI with the following settings:

    1. Set SSL Enable to True.
    2. Specify the web server host IP address.
  4. In the same web server configuration in the Ambari UI, update the following parameters:

    1. Web server SSL certificate: /usr/odp/3.2.3.3-3/airflow/airflow.crt
    2. Web server SSL key: /usr/odp/3.2.3.3-3/airflow/domain.key

Specify the file paths of the certificate and key files in the Ambari UI.

  5. Adjust the ownership and permissions of the SSL certificate and key files.

Ensure that the airflow user and group have the correct permissions on the certificate and key files.

  6. Restart the Airflow components, then access the Airflow webserver UI over HTTPS using the specified host IP and port (8889).
  7. Verify that the HTTPS URL works.
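The certificate generation and permission steps above can be sketched in Python as follows. This is only one possible approach, not the command this guide originally used: the self-signed `openssl req -x509` invocation, the 4096-bit RSA key, the 365-day validity, and the file modes (0644 for the certificate, 0640 for the key) are all assumptions made for illustration.

```python
import os
import shutil


def self_signed_cert_command(cert_path, key_path, common_name, days=365):
    """Build the argv for a self-signed certificate with an unencrypted RSA key.

    The key size and validity period are assumptions, not values from the guide.
    """
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:4096",
        "-nodes",  # leave the private key unencrypted so the webserver can read it
        "-keyout", key_path,
        "-out", cert_path,
        "-days", str(days),
        "-subj", f"/CN={common_name}",
    ]


def secure_tls_files(cert_path, key_path, user=None, group=None):
    """Set file modes and, optionally, ownership on the certificate and key.

    Assumed modes: certificate readable by everyone, key readable only by
    its owner and group.
    """
    os.chmod(cert_path, 0o644)
    os.chmod(key_path, 0o640)
    if user is not None or group is not None:
        # Changing ownership normally requires root privileges.
        for path in (cert_path, key_path):
            shutil.chown(path, user=user, group=group)
```

A possible usage, run as root from the Airflow home directory: pass the result of `self_signed_cert_command("airflow.crt", "domain.key", "<webserver host>")` to `subprocess.run(..., check=True)`, then call `secure_tls_files("airflow.crt", "domain.key", user="airflow", group="airflow")`.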

Kerberos Setup

Kerberos integration in Apache Airflow is automatic when Kerberos is enabled on the cluster; you do not need to configure Kerberos for Airflow manually. To disable Kerberos after it has been enabled, select the Disable Kerberos checkbox. Kerberos is then deactivated for the Apache Airflow service.

The Disable Kerberos checkbox is located in the Advanced Kerberos Site section.

Apache Airflow RabbitMQ Setup

RabbitMQ is an open-source message server that implements the Advanced Message Queuing Protocol (AMQP) and serves as a queueing service. It is fast and reliable, and supports scenarios such as reliable integration, content-based routing, global data dissemination, monitoring, and high-capacity data ingestion.

To configure RabbitMQ, perform the following:

  1. Install the EPEL repository and the curl utility.
  2. Add the RabbitMQ repositories and install Erlang.
  3. Install the RabbitMQ server.
  4. Start and enable the RabbitMQ server.
  5. Create an administrator user.
  6. Configure tags for the administrator user.
  7. View the list of users.
  8. Create a virtual host for Airflow.
  9. Review the list of virtual hosts.
  10. Define permissions for the administrator user on the Airflow virtual host.
  11. Enable the RabbitMQ management plugin.
  12. Restart the RabbitMQ server.
  13. Install the pyamqp package for Python 3.
  14. Verify the current status of the RabbitMQ server.
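The user, virtual host, permission, and plugin steps above map onto standard `rabbitmqctl` and `rabbitmq-plugins` subcommands. The sketch below builds those commands in Python so they can be reviewed before anything is executed. The user name, password, and virtual host name are supplied by the caller, and the `rabbitmq-server` systemd unit name and the full `.*` permission patterns are assumptions made for this example.

```python
import subprocess


def rabbitmq_setup_commands(user, password, vhost):
    """Return the argv lists for the user, vhost, permission and plugin steps."""
    return [
        ["rabbitmqctl", "add_user", user, password],
        ["rabbitmqctl", "set_user_tags", user, "administrator"],
        ["rabbitmqctl", "list_users"],
        ["rabbitmqctl", "add_vhost", vhost],
        ["rabbitmqctl", "list_vhosts"],
        # The three patterns are configure, write and read permissions;
        # ".*" grants access to every resource in the virtual host.
        ["rabbitmqctl", "set_permissions", "-p", vhost, user, ".*", ".*", ".*"],
        ["rabbitmq-plugins", "enable", "rabbitmq_management"],
        # Assumes the service is managed by systemd under this unit name.
        ["systemctl", "restart", "rabbitmq-server"],
        ["rabbitmqctl", "status"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

On the RabbitMQ host, something like `run_all(rabbitmq_setup_commands("admin", "<password>", "airflow"))`, executed as root, would apply the steps in sequence.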

Update RabbitMQ Configuration via Ambari UI

To update the RabbitMQ configuration through the Ambari UI, perform the following:

  1. Log in to the Ambari UI.
  2. Navigate to the RabbitMQ service within the database configuration section.
  3. Locate the configuration section for the RabbitMQ service.
  4. Adjust relevant parameters in the RabbitMQ configuration, including host, port, credentials, and virtual host.
  5. Save the changes made.
  6. Restart the Airflow service, if necessary, to implement the modifications.

Updating the RabbitMQ configuration through the Ambari UI lets you align the RabbitMQ connection settings with your environment.
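The host, port, credentials, and virtual host entered in Ambari are typically combined into a single Celery broker URL. The helper below is a minimal sketch of how such a URL is assembled; the function name is hypothetical, and 5672 is RabbitMQ's default AMQP port rather than a value taken from this guide. Credentials and the virtual host are percent-encoded so that characters such as `@` or `/` do not break the URL.

```python
from urllib.parse import quote


def build_broker_url(user, password, host, vhost, port=5672, scheme="amqp"):
    """Assemble an AMQP broker URL from its parts, escaping special characters."""
    return "{scheme}://{user}:{password}@{host}:{port}/{vhost}".format(
        scheme=scheme,
        user=quote(user, safe=""),
        password=quote(password, safe=""),
        host=host,
        port=port,
        vhost=quote(vhost, safe=""),
    )
```

For example, `build_broker_url("admin", "p@ss/word", "rabbit.example.com", "airflow")` yields `amqp://admin:p%40ss%2Fword@rabbit.example.com:5672/airflow`.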

Initialize the Airflow database through the Ambari UI.

Restart the Airflow services.

Check the RabbitMQ UI to confirm that the connection has been established.

Monitor Celery Workers Using Flower UI

Use the Flower UI to monitor Celery workers.

Flower is a web-based tool for monitoring and managing Celery clusters. Its user interface gives an overview of all Celery workers, with statistics on active and processed tasks, their success or failure status, and the load average. Flower also records task details such as task names, arguments, results, and completion times.

The Flower UI for Celery workers runs on port 5555. Flower is incompatible with SQL brokers, so RabbitMQ must be configured as the message broker before Flower can monitor Celery workers.

Once RabbitMQ is configured and the connection is visible in the RabbitMQ UI, open the Flower UI to see the running Celery workers.

Note: If you use MySQL or PostgreSQL as the Celery message broker, Flower cannot monitor the workers, and accessing the Flower UI results in an error message.
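Before opening the Flower UI, it can help to check whether the configured broker URL points at a SQL database. The function below is a small sketch of such a check, not part of Flower or Airflow. The list of scheme fragments it treats as SQL-backed is an assumption for illustration and may need adjusting for your setup.

```python
from urllib.parse import urlsplit

# Scheme fragments assumed to indicate a SQL-backed broker.
SQL_SCHEME_PARTS = {"sqla", "db", "mysql", "postgresql", "postgres", "sqlite"}


def is_sql_broker(broker_url):
    """Return True when the broker URL scheme looks like a SQL database."""
    scheme = urlsplit(broker_url).scheme
    # Schemes such as "sqla+mysql" combine a transport and a dialect.
    return any(part in SQL_SCHEME_PARTS for part in scheme.split("+"))
```

With this check, `is_sql_broker("sqla+mysql://airflow:secret@db.example.com/airflow")` returns `True`, while an `amqp://` URL for RabbitMQ returns `False`.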
