Convert Oozie Workflows to Airflow DAGs
System Requirements
To run the converter, you must have Python 3.8 or later installed.
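As a quick pre-flight check, the version requirement above can be verified from the interpreter that will run the converter. This is a minimal sketch; the helper name meets_minimum is illustrative and is not part of the converter.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in the requirement above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not meets_minimum():
        found = sys.version.split()[0]
        raise SystemExit(f"Python 3.8 or later is required, found {found}")
    print("Python version OK")
```

Run it inside the same virtual environment you plan to install the converter into, since that environment may use a different interpreter than the system default.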
Install the Oozie-to-Airflow (O2A) Converter
Install the required pip packages on all Airflow workers and on the node where you convert workflows to DAGs. Activate the Airflow virtual environment first, then install the packages:
source /usr/odp/3.3.6.x-x/airflow/bin/activate
pip install acceldata-o2a
pip install acceldata-o2a-lib
pip install --upgrade wrapt
If you're installing the Oozie-to-Airflow converter in an air-gapped environment, use the provided tarballs to install the required pip packages.
pip install https://mirror.odp.acceldata.dev/v2/standalone_tarballs/o2a/1.0.0/acceldata_o2a-1.0.0.tar.gz
pip install https://mirror.odp.acceldata.dev/v2/standalone_tarballs/o2a/1.0.0/acceldata_o2a_lib-1.0.0.tar.gz
pip install --upgrade wrapt
The built-in help of the Oozie-to-Airflow (O2A) converter lists all available options:
(airflow) [root@airflowdemonode01 o2a]# ./bin/o2a -help
usage: o2a [-h] -i INPUT_DIRECTORY_PATH -o OUTPUT_DIRECTORY_PATH [-n DAG_NAME] [-u USER] [-s START_DAYS_AGO] [-x SCHEMA_VERSION] [-skv SKIP_VALIDATION]
[-v SCHEDULE_INTERVAL] [-d]
Convert Apache Oozie workflows to Apache Airflow workflows.
options:
-h, --help show this help message and exit
-i INPUT_DIRECTORY_PATH, --input-directory-path INPUT_DIRECTORY_PATH
Path to input directory
-o OUTPUT_DIRECTORY_PATH, --output-directory-path OUTPUT_DIRECTORY_PATH
Desired output directory
-n DAG_NAME, --dag-name DAG_NAME
Desired DAG name [defaults to input directory name]
-u USER, --user USER The user to be used in place of all ${user.name} [defaults to user who ran the conversion]
-s START_DAYS_AGO, --start-days-ago START_DAYS_AGO
Desired DAG start as number of days ago
-x SCHEMA_VERSION, --schema-version SCHEMA_VERSION
Desired Oozie all schema version.[1.0,0.4]
-skv SKIP_VALIDATION, --skip-validation SKIP_VALIDATION
skip validation
-v SCHEDULE_INTERVAL, --schedule-interval SCHEDULE_INTERVAL
Desired DAG schedule interval as number of days
-d, --dot Renders workflow files in DOT format
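When conversions are scripted, the options above can be assembled into an argument list. The following is a sketch under stated assumptions: build_o2a_command is a hypothetical helper, the ./bin/o2a path is taken from the help invocation above, and the value "true" for -skv is an assumption because the help text does not list accepted values.

```python
import shlex


def build_o2a_command(input_dir, output_dir, dag_name=None, user=None,
                      schema_version=None, skip_validation=False,
                      executable="./bin/o2a"):
    """Assemble an o2a argument list using only flags shown in the help text."""
    cmd = [executable, "-i", str(input_dir), "-o", str(output_dir)]
    if dag_name:
        cmd += ["-n", dag_name]
    if user:
        cmd += ["-u", user]
    if schema_version:
        cmd += ["-x", str(schema_version)]
    if skip_validation:
        # Assumption: "true" is an accepted value for -skv.
        cmd += ["-skv", "true"]
    return cmd


cmd = build_o2a_command("../oozie_sample/", "airflow_sample/",
                        schema_version="0.4", skip_validation=True)
# Print the command as a shell-safe string for review before running it.
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems when directory names contain spaces.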
Application Folder Structure for Workflows
The input application directory must follow this structure:
<APPLICATION>/
|- job.properties - job properties that are used to run the job
|- hdfs - folder with application - should be copied to HDFS
| |- workflow.xml - Oozie workflow xml (1.0 schema)
| |- ... - additional folders required to be copied to HDFS
|- configuration.template.properties - template of configuration values used during conversion
|- configuration.properties - generated properties for configuration values
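Before running a conversion, it can help to confirm that an application directory matches the layout above. The sketch below checks only for job.properties and hdfs/workflow.xml. Treating these two entries as mandatory is an assumption, and missing_entries is an illustrative helper, not part of the converter.

```python
from pathlib import Path

# Relative paths taken from the layout above.
# Assumption: these two entries are the ones the converter cannot do without.
REQUIRED_ENTRIES = ("job.properties", "hdfs/workflow.xml")


def missing_entries(app_dir):
    """Return the required relative paths that are absent under app_dir."""
    root = Path(app_dir)
    return [rel for rel in REQUIRED_ENTRIES if not (root / rel).is_file()]
```

An empty list means both files are present. For example, missing_entries("../oozie_sample") returns the names of any files that still need to be added.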
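Once the Oozie-to-Airflow (O2A) converter is installed, you can convert Oozie workflows to Airflow DAGs. The following example converts the application in ../oozie_sample/ using schema version 0.4 with validation skipped, and writes the result to airflow_sample/: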
./bin/o2a -i ../oozie_sample/ -o airflow_sample/ -x 0.4 -skv true
[2025-04-09T15:45:42.663+0530] {workflow_xml_parser.py:242} INFO - Parsed EmailError as Action Node of type email.
[2025-04-09T15:45:42.663+0530] {workflow_xml_parser.py:81} INFO - Parsed fail as Kill Node.
[2025-04-09T15:45:42.664+0530] {workflow_xml_parser.py:81} INFO - Parsed Kill as Kill Node.
[2025-04-09T15:45:42.664+0530] {workflow_xml_parser.py:94} INFO - Parsed End as End Node.
[2025-04-09T15:45:42.664+0530] {oozie_converter.py:189} INFO - Applying pre-convert transformers
[2025-04-09T15:45:42.664+0530] {oozie_converter.py:125} INFO - Converting nodes to tasks and inner relations
[2025-04-09T15:45:42.693+0530] {oozie_converter.py:194} INFO - Applying post-convert transformers
[2025-04-09T15:45:42.693+0530] {oozie_converter.py:173} INFO - Adding error handlers
[2025-04-09T15:45:42.693+0530] {oozie_converter.py:155} INFO - Converting relations between tasks groups.
[2025-04-09T15:45:42.693+0530] {oozie_converter.py:150} INFO - Converting dependencies.
[2025-04-09T15:45:42.694+0530] {renderers.py:104} INFO - Saving to file: dishtioriginaljob2/.py
Fixing /root/o2a/dishtioriginaljob2/.py
Upon successful conversion, the generated Airflow DAG will be located in the specified output directory.
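A simple way to sanity-check the result is to confirm that every generated file in the output directory parses as Python. This is a minimal sketch: check_generated_dags is a hypothetical helper, and a successful parse shows only that the file is syntactically valid, not that Airflow can load or run the DAG.

```python
import ast
from pathlib import Path


def check_generated_dags(output_dir):
    """Map each .py file under output_dir to True if it parses as Python."""
    results = {}
    for path in sorted(Path(output_dir).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            results[str(path)] = True
        except SyntaxError:
            # The file exists but is not valid Python source.
            results[str(path)] = False
    return results
```

Calling check_generated_dags("airflow_sample/") returns a dictionary keyed by file path. Any entry with a False value is worth inspecting before the DAG is copied to the Airflow DAGs folder.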