Workflow Tasks

A task is the fundamental building block of a workflow. Each node on the workflow canvas represents one task; dependency edges define execution order. xDP supports four task types.

Task Type | Description
Job | Run an existing Spark or notebook job, or create a new one inline.
Sub-Workflow | Nest an existing workflow as a task (max one level deep).
Shell | Run a shell command or script on the Airflow worker.
Branch (If/Else) | Conditional routing: evaluate a Python expression and route to different downstream paths.

Coming soon: Switch/Case, Loop, and Sensor task types are visible in the Add Task menu but are not yet available.

Adding a Task

  1. Open the Create Workflow or Edit Workflow page.
  2. Click + Add Task in the toolbar and select a task type.
Add Task menu — task types

  3. A new task node appears on the canvas with a configuration panel on the right.
  4. Fill in the task-specific settings (see each type below).
  5. Draw dependencies by dragging from the output handle (right side) of one task to the input handle (left side) of another.
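
The edges drawn on the canvas determine the order in which tasks run. As a minimal sketch of that idea, the example below computes one valid run order from a dependency map using Python's standard graphlib module. The task names and the dictionary layout are hypothetical and are not xDP's internal representation.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# upstream tasks it depends on (one canvas edge per upstream entry).
dependencies = {
    "extract_customer_data": set(),
    "validate_data": {"extract_customer_data"},
    "transform_data": {"validate_data"},
    "notify_team": {"validate_data"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Tasks with no path between them, such as transform_data and notify_team here, can appear in either order, so the printed list is only one of several valid orderings.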

Common Task Settings

Every task type shares these fields:

Field | Required | Description
Task Name | Yes | Unique within the workflow. 2–100 characters, alphanumeric with spaces, hyphens, and underscores.
Description | No | Brief explanation of what this task does.
Tags | No | Key-value metadata for filtering and organization.
Dependencies | Auto | Read-only; upstream tasks derived from canvas edges.
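
The Task Name rule can be checked before a task is saved. The sketch below is one possible reading of the rule, assuming "alphanumeric" means ASCII letters and digits and that uniqueness is case-sensitive. The function name and pattern are hypothetical and are not part of xDP.

```python
import re

# Assumption: ASCII letters, digits, spaces, hyphens, and underscores only,
# with a total length of 2 to 100 characters.
TASK_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _-]{2,100}")


def is_valid_task_name(name, existing_names=()):
    """Return True if the name fits the documented rule and is not already in use."""
    if not TASK_NAME_PATTERN.fullmatch(name):
        return False
    # Task names must be unique within the workflow.
    return name not in existing_names


print(is_valid_task_name("extract_customer_data"))
print(is_valid_task_name("x"))
```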

Advanced Settings (expand at the bottom of any task panel):

Field | Description | Default
Start Delay | Minutes to wait after dependencies complete before starting. | 0
Retries | Retry attempts if the task fails. | 0
Retry Delay | Minutes between retry attempts. | 5
Execution Timeout | Maximum run time in minutes before the task is marked failed. | None
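
Because tasks run on Airflow, these settings plausibly correspond to the retries, retry_delay, and execution_timeout arguments that Airflow operators accept. The sketch below converts the minute values from the panel into that form. How xDP actually translates the fields is an assumption, the function name is hypothetical, and Start Delay is left out because no direct equivalent is assumed here.

```python
from datetime import timedelta


def to_operator_kwargs(retries=0, retry_delay_minutes=5, execution_timeout_minutes=None):
    """Convert panel values (in minutes) into Airflow-style keyword arguments.

    The defaults mirror the documented defaults; the mapping itself is an assumption.
    """
    timeout = (
        None
        if execution_timeout_minutes is None
        else timedelta(minutes=execution_timeout_minutes)
    )
    return {
        "retries": retries,
        "retry_delay": timedelta(minutes=retry_delay_minutes),
        "execution_timeout": timeout,  # None means no time limit
    }


# Example: retry twice, 10 minutes apart, and fail the task after 1 hour.
print(to_operator_kwargs(retries=2, retry_delay_minutes=10, execution_timeout_minutes=60))
```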

Job Task

Runs a Spark or notebook job on the compute cluster.

Job task configuration panel

  • Select Existing Job — Opens a picker showing all jobs on the selected cluster.
  • Create New Job — Opens a job creation modal to configure the Spark application inline.

Supported job types:

Job Type | Description
Spark Java | Compiled JAR with a main class
Spark Python | PySpark script (.py file)
Notebook Java | Jupyter notebook with Java/Scala kernel
Notebook Python | Jupyter notebook with Python kernel

Override Configuration: When using an existing job, click Override Configuration to customize driver/executor resources, Spark properties, and data store selections for this specific workflow run.

Sub-Workflow Task

Nests an existing workflow inside the current one. The sub-workflow's tasks run inline as part of the same DAG execution.

Click Select Existing Workflow to open the picker.

Constraint: Only one level of nesting is supported. Workflows that already contain sub-workflows are excluded from the picker.
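
The nesting constraint amounts to a simple filter on the list of candidate workflows. The sketch below illustrates it with a hypothetical data model (dictionaries with "name" and "tasks" keys and a "sub_workflow" type value); it is not xDP's actual schema or picker logic.

```python
def contains_sub_workflow(workflow):
    """Return True if any task in the workflow is itself a sub-workflow task."""
    return any(task["type"] == "sub_workflow" for task in workflow["tasks"])


def selectable_workflows(workflows):
    """Return the names of workflows that can be nested without exceeding one level."""
    return [wf["name"] for wf in workflows if not contains_sub_workflow(wf)]


# Hypothetical catalog of existing workflows.
catalog = [
    {"name": "daily_ingest", "tasks": [{"type": "job"}, {"type": "shell"}]},
    {"name": "weekly_rollup", "tasks": [{"type": "sub_workflow"}, {"type": "job"}]},
]

print(selectable_workflows(catalog))
```

In this example only daily_ingest would be offered, because weekly_rollup already nests another workflow.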

Shell Task

Runs a shell command or script on the Airflow worker. Use this for lightweight operations — file management, API calls, notifications, or data validation — that don't require Spark.

Shell task configuration panel

Field | Required | Description
Script | Yes | Shell script to execute. Supports multi-line scripts with #!/bin/bash. Use Upload to load a file or Expand for a full-screen editor.
Working Directory | No | Directory to run the script in on the Airflow worker.
Environment Variables | No | Key-value pairs injected as env vars into the script.
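
To make the three fields concrete, the sketch below shows how a script, a working directory, and environment variables could be combined when a command is run. It is an illustration only: the function name is hypothetical, it uses sh rather than bash for portability, and it does not describe how the Airflow worker actually executes Shell tasks.

```python
import os
import subprocess


def run_shell_task(script, working_directory=None, env_vars=None):
    """Run a script with optional working directory and extra environment variables.

    Returns the exit code and the captured standard output.
    """
    # Task-level variables are layered on top of the existing environment.
    env = {**os.environ, **(env_vars or {})}
    result = subprocess.run(
        ["sh", "-c", script],
        cwd=working_directory,  # None keeps the current directory
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


exit_code, output = run_shell_task('echo "$GREETING"', env_vars={"GREETING": "hello"})
print(exit_code, output.strip())
```

A non-zero exit code is the usual signal that a script failed, which is when the Retries setting becomes relevant.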

Note: Scripts are validated for safety before saving. Patterns such as recursive deletions or modifications to system files are rejected.

Branch Task (If/Else)

Adds conditional logic to your workflow. Evaluates a Python expression and routes execution to different downstream paths.

Branch task configuration panel

Field | Required | Description
Operator Type | Yes | Currently only Python (BranchPythonOperator) is supported.
Condition | Yes | A Python function that returns the task ID of the branch to execute. Receives **kwargs for Airflow context.

In the Branch Routing section, assign each downstream task to the True or False path. At least one downstream task must be assigned to True. A task cannot appear in both paths.
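
As a minimal sketch, a condition function might look like the one below, followed by a check of the two routing rules. The function names, the task IDs, and the row_count parameter are all hypothetical; the example assumes the value is available under the params key of the Airflow context and that the restricted environment permits this kind of expression.

```python
def choose_branch(**kwargs):
    """Return the task ID of the downstream task that should run."""
    # "params" is a key in the Airflow context; row_count is a hypothetical parameter.
    params = kwargs.get("params") or {}
    if params.get("row_count", 0) > 0:
        return "load_customer_data"  # hypothetical task on the True path
    return "send_empty_notification"  # hypothetical task on the False path


def validate_routing(true_path, false_path):
    """Return a list of problems with a True/False routing assignment."""
    errors = []
    if not true_path:
        errors.append("At least one downstream task must be assigned to True.")
    overlap = set(true_path) & set(false_path)
    if overlap:
        errors.append(f"Tasks assigned to both paths: {sorted(overlap)}")
    return errors


print(choose_branch(params={"row_count": 12}))
print(validate_routing(["load_customer_data"], ["send_empty_notification"]))
```

Keeping the condition to a single comparison like this makes a failed run easier to trace back to the value that drove the decision.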

Constraints:

  • A branch task cannot be a leaf node — it must have at least one downstream task.
  • The condition is evaluated in a restricted environment for safety.

Best Practices

  • Name tasks descriptively — use names like extract_customer_data rather than Task 1. Task names become Airflow task IDs.
  • Use Shell tasks for lightweight operations — reserve Job tasks for Spark workloads.
  • Set execution timeouts — prevents runaway tasks from blocking the entire workflow.
  • Keep branch conditions simple — complex logic is harder to debug; move multi-step logic into a separate Shell or Job task.
  • Limit sub-workflow nesting — only one level is supported. For deeply layered pipelines, consider separate workflows triggered independently.