Errors You May Encounter

You might encounter the following errors during the migration.

HDFS File Path Exception


Add variables that resolve to HDFS locations, such as ${tmpLocation}, into the resolve_name_node method located in o2a/utils/el_utils.py.

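A minimal sketch of what such a change might look like, assuming a simple substitution-based resolver (the variable set and the actual resolve_name_node signature in o2a/utils/el_utils.py may differ):

```python
# Hypothetical sketch of extending o2a/utils/el_utils.py so that additional
# variables resolve to HDFS locations. Names below are illustrative; check
# the real resolve_name_node implementation before copying this.
HDFS_VARIABLES = {"nameNode", "tmpLocation"}  # add ${tmpLocation} here


def resolve_name_node(value: str, properties: dict) -> str:
    # Substitute ${var} for every known HDFS-location variable that has a
    # value in the job properties; leave anything else untouched.
    for var in HDFS_VARIABLES:
        placeholder = "${" + var + "}"
        if placeholder in value and var in properties:
            value = value.replace(placeholder, properties[var])
    return value
```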

HDFS File Paths in HiveOperator hiveconfs

When using the HiveOperator in Apache Airflow, it's important to correctly format the HDFS file paths passed as Hive configuration variables (hiveconfs). Incorrect path references can lead to runtime errors during Hive query execution.

Problem:

Users often pass HDFS paths dynamically via variables like JOB_PROPS['user.name'] or JOB_PROPS['examplesRoot'], which may not resolve correctly at runtime, especially if the paths assume a different user context.

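For example, hiveconfs built from the job properties mentioned above often look like this (the property values, the INPUT/OUTPUT keys, and the directory names are illustrative assumptions):

```python
# Hypothetical sketch: building HiveOperator hiveconfs from job properties.
JOB_PROPS = {
    "user.name": "hue_user",     # assumed value; differs per cluster
    "examplesRoot": "examples",
}

# HDFS paths derived from the submitting user's home directory
hiveconfs = {
    "INPUT": "/user/{}/{}/input-data".format(
        JOB_PROPS["user.name"], JOB_PROPS["examplesRoot"]
    ),
    "OUTPUT": "/user/{}/{}/output-data".format(
        JOB_PROPS["user.name"], JOB_PROPS["examplesRoot"]
    ),
}
# These would then be passed as HiveOperator(..., hiveconfs=hiveconfs).
```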

If JOB_PROPS['user.name'] resolves to a user other than airflow, the query might fail at runtime with path-not-found or permission errors.


Recommended Fix:

Explicitly construct your HDFS paths to align with the runtime context of the Airflow user, as Airflow's Hive tasks often execute under that user's permissions.

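For instance, paths rooted under the airflow user's HDFS home (directory names are illustrative):

```python
# Hypothetical sketch: hiveconfs pinned to the airflow user's HDFS home,
# matching the user the Hive task actually runs as.
hiveconfs = {
    "INPUT": "/user/airflow/examples/input-data",
    "OUTPUT": "/user/airflow/examples/output-data",
}
```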

Ensure that the input and output directories exist in HDFS under the correct user path (/user/airflow/...).

Hive DAG: Cannot modify ***.ctx.try_number at runtime

[Image: Hive DAG task logs showing the "Cannot modify ***.ctx.try_number at runtime" error]

By default, the HiveOperator passes all Airflow context variables to Hive as hiveconf values. Hive rejects any configuration variable that is not on its whitelist of parameters allowed to be modified at runtime, which produces this error. To allow Airflow to pass these variables, extend the whitelist in your hive-site.xml.

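One setting commonly used for this appends the Airflow context namespace to Hive's configuration whitelist; this is an assumption, so verify the property name and regex against your Hive version:

```xml
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>airflow\.ctx\..*</value>
</property>
```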

Skeleton Transformation

The O2A conversion supports several action and control nodes. The control nodes include fork, join, start, end, and kill. Among the action nodes, fs, map-reduce, and pig are supported.

Most node types are handled directly, but when the converter encounters a node it does not know how to parse, it performs a kind of "skeleton transformation": every unknown node is converted into a dummy node. Because the control flow remains intact, users can fill in those nodes manually later if they wish.
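The fallback idea can be sketched as follows (the mapper names are illustrative, not o2a's actual classes):

```python
# Hypothetical sketch: known Oozie node tags map to real converters;
# anything unrecognized falls back to a dummy mapper so the DAG's
# control flow is preserved.
KNOWN_MAPPERS = {
    "fs": "FsMapper",
    "map-reduce": "MapReduceMapper",
    "pig": "PigMapper",
}


def choose_mapper(node_tag: str) -> str:
    # Unknown node types become placeholders the user can replace later.
    return KNOWN_MAPPERS.get(node_tag, "DummyMapper")
```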

Decision Node

The decision node is not fully functional, as not all EL functions are currently supported. To run the converted workflow in Airflow, you may need to edit the generated Python output file and change the decision node's expression by hand.
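As an illustrative sketch, the hand-edited branch callable might look like this (the task ids and the condition are hypothetical, standing in for the original EL expression):

```python
# Hypothetical sketch: a decision node typically becomes a branching
# callable. The unsupported EL expression, e.g. ${someVar eq "true"},
# is rewritten as plain Python by hand.
def decision_branch(**context):
    if context.get("some_flag"):  # assumed condition replacing the EL
        return "task_a"           # task_id to follow when true
    return "task_b"               # default branch
```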

Reference: Implement decision node.
