Configure Airflow and Pulse to Monitor Standalone Airflow on Kubernetes

This page describes how to configure a standalone Airflow deployment running on Kubernetes to emit metrics and events, and how to configure Pulse to collect, process, and visualize them.

This integration is supported in Pulse 4.1.0 and later.

For details, see the following sections.

Configure Airflow on Kubernetes to Emit Metrics

This configuration enables Airflow to emit events and metrics that Pulse can collect and visualize.

Step 1: Install the Acceldata Event Listener Plugin

  1. Copy the Acceldata event listener JAR into the Airflow event listener plugin directory.
  2. Update config.ini with the correct cluster name and Events endpoint.

Example config.ini:

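As a rough sketch of the two settings named above, the snippet below writes a config.ini with a cluster name and an Events endpoint. The section name, the key names, and the endpoint format are assumptions made for illustration and are not taken from the plugin's schema; the cluster name airflowonk8s simply mirrors the stream name shown in the validation step. Keep the key names from the config.ini shipped with the plugin and change only the values.

```bash
# Sketch only: the section and key names are hypothetical placeholders.
cat > config.ini <<'EOF'
[pulse]
# Name under which this Airflow cluster should appear in Pulse (assumed key)
cluster_name = airflowonk8s
# Pulse Events endpoint that receives Airflow events (assumed key and URL format)
events_endpoint = nats://<pulse-host>:<events-port>
EOF

# Show what was written so it can be compared with the shipped file.
cat config.ini
```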

Verify the plugin is loaded:

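One way to check this, assuming the scheduler runs as a Deployment, is to run Airflow's own `airflow plugins` command inside the scheduler container. The namespace and deployment name below are hypothetical; substitute the ones used in your cluster.

```bash
# Assumed names; adjust to your deployment.
NAMESPACE="airflow"                    # hypothetical namespace
SCHEDULER="deploy/airflow-scheduler"   # hypothetical scheduler Deployment

# `airflow plugins` lists the plugins that Airflow has loaded.
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec -n "$NAMESPACE" "$SCHEDULER" -- airflow plugins
else
  echo "kubectl not found; run 'airflow plugins' inside the scheduler container" >&2
fi
```

If the event listener plugin does not appear in the output, check that its files are in the plugin directory the scheduler actually reads.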

Step 2: Create a ConfigMap for the Plugin and Mount It to the Scheduler

  1. Create a ConfigMap from the event listener plugin files.
  2. Mount the ConfigMap in the Airflow scheduler deployment.
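
The two steps above can be sketched as follows. The ConfigMap name, namespace, local plugin directory, container name, and mount path are all assumptions chosen for illustration; the script creates the ConfigMap with `kubectl create configmap --from-file` and writes a volume fragment that would be merged into the scheduler Deployment's pod spec.

```bash
NAMESPACE="airflow"                     # hypothetical namespace
CONFIGMAP="acceldata-event-listener"    # hypothetical ConfigMap name
PLUGIN_DIR="./event_listener_plugin"    # hypothetical local directory with the plugin files and config.ini

# Fragment to merge under spec.template.spec of the scheduler Deployment.
cat > scheduler-plugin-mount.yaml <<EOF
volumes:
  - name: ${CONFIGMAP}
    configMap:
      name: ${CONFIGMAP}
containers:
  - name: scheduler                     # assumed container name
    volumeMounts:
      - name: ${CONFIGMAP}
        mountPath: /opt/airflow/plugins/event_listener   # assumed plugin directory
EOF

# Create the ConfigMap from every regular file in PLUGIN_DIR.
if command -v kubectl >/dev/null 2>&1; then
  kubectl create configmap "$CONFIGMAP" --from-file="$PLUGIN_DIR" -n "$NAMESPACE"
else
  echo "kubectl not found; skipping ConfigMap creation" >&2
fi
```

Note that `--from-file` on a directory picks up only the regular files directly inside it, not subdirectories, and that a ConfigMap can hold at most 1 MiB of data.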

Step 3: Install nats-py in the Scheduler Container

Update the scheduler container args to install the nats-py dependency:

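A minimal sketch of such an args override is shown below, assuming the container is named scheduler and that the image's entrypoint passes a `bash -c` command through (as the official Airflow image does). It installs the nats-py package from PyPI and then starts the scheduler; the file name is hypothetical.

```bash
# Fragment to merge into the scheduler container of the Deployment.
cat > scheduler-args.yaml <<'EOF'
containers:
  - name: scheduler            # assumed container name
    args:
      - bash
      - -c
      # Install the NATS client, then replace the shell with the scheduler process.
      - pip install --no-cache-dir nats-py && exec airflow scheduler
EOF

cat scheduler-args.yaml
```

Installing at container start needs network access to a package index on every restart; baking nats-py into a custom image avoids that, at the cost of maintaining the image.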

Step 4: Distribute the Plugin to All Task Pods

Ensure all task pods receive the event listener plugin by using a pod-template.yaml.

Example pod-template.yaml:

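The template below is an illustrative sketch rather than a reference file. It assumes task pods are launched by the KubernetesExecutor, which expects the main container to be named base, and it reuses a hypothetical ConfigMap name and mount path for the plugin files. The image is a placeholder.

```bash
cat > pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: placeholder-name        # overwritten by Airflow for each task pod
spec:
  restartPolicy: Never
  containers:
    - name: base                # name expected by the KubernetesExecutor
      image: <your-airflow-image>
      volumeMounts:
        - name: acceldata-event-listener                   # hypothetical volume name
          mountPath: /opt/airflow/plugins/event_listener   # assumed plugin directory
  volumes:
    - name: acceldata-event-listener
      configMap:
        name: acceldata-event-listener                     # hypothetical ConfigMap name
EOF

cat pod-template.yaml
```

Airflow reads the template from the path set in its `pod_template_file` option, for example through the `AIRFLOW__KUBERNETES_EXECUTOR__POD_TEMPLATE_FILE` environment variable in recent Airflow 2 releases (older releases use the `kubernetes` section name instead).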

Step 5: Validate Event Streaming to NATS

Trigger a DAG run from the Airflow UI, then verify Airflow streams are created in NATS:

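One way to list the streams, assuming the NATS CLI (`nats`) is installed and can reach the NATS server used by Pulse, is `nats stream ls`. The server URL below is a placeholder, not a documented Pulse address.

```bash
NATS_URL="nats://<pulse-host>:<port>"   # placeholder; use your Pulse NATS address

# List the JetStream streams known to the server.
if command -v nats >/dev/null 2>&1; then
  nats --server "$NATS_URL" stream ls
else
  echo "nats CLI not found; install it or run the command where it is available" >&2
fi
```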

Example output:

  • airflow_events_airflowonk8s
  • processed_airflow_events_0

Configure Pulse to Monitor Airflow on Kubernetes

This configuration enables Pulse to collect Airflow events and StatsD metrics from Kubernetes for monitoring and visualization.

Prerequisites

  • PulseNode DaemonSet deployed on the Kubernetes cluster

Configure PulseNode to Collect Airflow StatsD Metrics

PulseNode (Telegraf) listens for Airflow StatsD metrics on UDP port 8125 and writes them to the VM DB (VictoriaMetrics).

  1. On the Pulse node, open the PulseNode configuration file: /opt/pulse/node/config/node.conf
  2. Add the following Telegraf StatsD input configuration:
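
A minimal sketch of a Telegraf StatsD input listening on UDP port 8125 is shown below. The option names are standard Telegraf `inputs.statsd` settings, but the specific values (percentiles, separator) are assumptions and not the values shipped with Pulse. The snippet writes to a separate, hypothetically named file so it can be reviewed before being merged into node.conf.

```bash
# Write the input section to a scratch file first; merge it into
# /opt/pulse/node/config/node.conf after review.
cat > statsd-input.conf <<'EOF'
[[inputs.statsd]]
  ## Listen for StatsD packets over UDP on port 8125
  protocol = "udp"
  service_address = ":8125"
  ## Percentiles to compute for timing metrics (assumed values)
  percentiles = [50.0, 90.0, 99.0]
  ## Separator used when joining metric name parts (assumed value)
  metric_separator = "_"
EOF

cat statsd-input.conf
```

Airflow itself also has to emit StatsD metrics to this port, which is controlled by the `statsd_on`, `statsd_host`, and `statsd_port` options in the `[metrics]` section of its configuration.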

After completing these steps, Pulse collects Airflow metrics through StatsD and receives Airflow events through NATS for monitoring and analysis.
