Pipeline Observability
Pipelines in ADOC give you a unified view of how data flows across your ecosystem. They connect jobs, assets, runs, spans, and events into an end-to-end model of your workflows, so you can see not only where data is moving, but also where reliability risks are building up.
Why Pipeline Observability Matters
Without pipeline monitoring, you may notice issues only when:
- Data does not land where it is supposed to
- Dashboards show stale or missing data
- Business teams raise reliability concerns
With Pipeline Observability, you can:
- Monitor workflows end to end, from ingestion to transformation to storage
- Detect failures early by catching job errors and bottlenecks before they cascade downstream
- Understand reliability trends by tracking execution success, delays, and policy violations over time
- Pinpoint root causes by identifying which jobs, assets, or dependencies are driving failures
- Communicate health clearly by using pipeline views to share delivery status with engineering and business stakeholders
Example: A nightly ETL pipeline that feeds a customer dashboard fails silently. Without pipeline observability, the issue is discovered only when business users report stale data the next morning. With ADOC, a monitoring policy detects the failure immediately and sends an alert to the appropriate notification channel, giving your team time to fix the issue before it affects downstream consumers.
How Pipelines Get Into ADOC
Pipelines can enter ADOC in three ways, depending on the orchestration tool you use.
1. OpenLineage Pipelines
ADOC supports OpenLineage as a standard for pipeline event ingestion. OpenLineage is an open standard for collecting lineage and run metadata from data pipelines. ADOC processes OpenLineage-compatible events to build pipeline views, lineage, and run details.
OpenLineage is the preferred integration approach because it provides a seamless, low-effort, near real-time way to bring pipeline metadata into ADOC.
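Under the hood, an OpenLineage run event is a small JSON document describing a job, a run, and an event type. As an illustrative sketch only (the producer URL below is a placeholder, and in practice the OpenLineage client or the Airflow OpenLineage provider emits these events for you), a minimal event payload might look like this:

```python
import json
import uuid
from datetime import datetime, timezone

def build_run_event(event_type: str, job_namespace: str, job_name: str,
                    run_id: str) -> dict:
    """Build a minimal OpenLineage-style run event payload."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "producer": "https://example.com/my-producer",  # placeholder
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
    }

event = build_run_event("COMPLETE", "airflow-prod", "nightly_etl",
                        str(uuid.uuid4()))
print(json.dumps(event, indent=2))
```

ADOC consumes events of this shape to build pipeline views, lineage, and run details; you would rarely need to construct them by hand.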
2. Datasource-Based Pipelines
Some orchestration tools require a datasource to be configured in Control Center > Integrations. Once connected, ADOC automatically fetches pipeline metadata and run information through the tool’s API.
Supported tools include:
3. SDK-Based Pipelines
Some orchestration tools, such as Airflow and Spark, can also be integrated by instrumenting pipeline code with the ADOC SDK. The SDK reports pipeline, run, span, and event data directly to ADOC. Pipelines can also be created manually through the ADOC UI before any runs are reported.
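Conceptually, SDK instrumentation wraps each step of your pipeline code so that spans and events are reported as the run progresses. The plain-Python sketch below mimics that pattern with a hypothetical reporter class; it is not the real SDK API, so consult the SDK documentation for actual class and method names:

```python
import time

class RunReporter:
    """Hypothetical stand-in for an SDK client that reports spans and events."""
    def __init__(self, pipeline_uid: str):
        self.pipeline_uid = pipeline_uid
        self.spans = []    # (span_name, duration_seconds)
        self.events = []   # (event_type, span_name)

    def span(self, name: str):
        reporter = self
        class _Span:
            def __enter__(self):
                self.start = time.monotonic()
                reporter.events.append(("task start", name))
                return self
            def __exit__(self, exc_type, exc, tb):
                reporter.spans.append((name, time.monotonic() - self.start))
                status = "task failed" if exc_type else "task complete"
                reporter.events.append((status, name))
                return False  # propagate any exception
        return _Span()

run = RunReporter(pipeline_uid="nightly_etl")
with run.span("extract"):
    pass  # real extraction logic would go here
with run.span("load"):
    pass  # real load logic would go here
print([name for name, _ in run.spans])  # ['extract', 'load']
```

The real SDK follows the same shape: you open a span around each unit of work, and timing, status, and events are reported to ADOC automatically when the span closes.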
When to Use Which
Use the following guidance when deciding how to bring pipelines into ADOC:
- Use OpenLineage for Airflow, Trino, dbt Core, and AWS Glue Jobs
- Use datasource-based integration when your orchestration tool is supported through a datasource configuration
- Use SDK-based integration when OpenLineage or datasource-based integration does not work for your specific setup
Key Concepts
Pipeline
A pipeline is a logical representation of a data workflow. A pipeline groups together jobs, runs, and associated metadata. Each pipeline has a unique identifier (UID), a name, and optional attributes such as owner, team, tags, and description.
Pipeline Run
A pipeline run is a single execution of a pipeline. Each run captures status (Success, Failed, Running, or Cancelled), start time, end time, duration, and the spans and events that occurred during execution. Think of a pipeline as the blueprint and a run as one execution of that blueprint.
Span
A span represents a unit of work within a pipeline run. Every run has a root span that covers the entire execution. Child spans represent individual tasks or steps within the run.
- In Airflow, the root span corresponds to the DAG run and each child span corresponds to a task instance.
- In Spark, the root span corresponds to the application and each child span corresponds to a Spark job.
Spans capture timing, status, and can carry custom metadata. They are what you see on the timeline chart when investigating a run.
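The root/child relationship can be pictured with a small data sketch. The names and timings below are made up for illustration; they mirror the Airflow case, where the root span is the DAG run and each child span is a task:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    name: str
    start: float   # seconds since run start
    end: float
    parent: Optional[str] = None  # None marks the root span

# One run: the root span covers the whole execution; children are tasks.
spans = [
    Span("dag_run", 0.0, 120.0),
    Span("extract", 0.0, 40.0, parent="dag_run"),
    Span("transform", 40.0, 90.0, parent="dag_run"),
    Span("load", 90.0, 120.0, parent="dag_run"),
]

root = next(s for s in spans if s.parent is None)
print(f"run duration: {root.end - root.start:.0f}s")  # run duration: 120s
for s in spans:
    if s.parent == root.name:
        print(f"  {s.name}: {s.end - s.start:.0f}s")
```

The timeline chart in the run details view renders exactly this structure: one bar for the root span and one bar per child span, positioned by start time and duration.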
Job
A job is a logical representation of a task within a pipeline. In Airflow, a job corresponds to a task operator. Jobs are visible as nodes in the lineage graph and carry input and output asset information. This is how ADOC identifies which data assets your pipeline reads from and writes to.
Event
An event is a discrete record of something that happened during a run. Common event types include pipeline start, pipeline complete, task start, task complete, task failed, and input or output asset events. Events can also carry custom business or process metadata sent through the SDK. Events form the chronological log you review when investigating what happened during a specific run.
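Because each event carries a timestamp, sorting events chronologically reconstructs the run's log, and filtering by type surfaces the records you care about. A minimal sketch with made-up records:

```python
# Each event is a (timestamp, event_type, metadata) record; sorting by
# ISO-8601 timestamp reconstructs the chronological log of a run.
events = [
    ("2025-01-15T02:00:41Z", "task failed", {"task": "load", "error": "timeout"}),
    ("2025-01-15T02:00:00Z", "pipeline start", {}),
    ("2025-01-15T02:00:05Z", "task start", {"task": "extract"}),
]

timeline = sorted(events)  # ISO-8601 strings sort chronologically
failures = [e for e in timeline if e[1] == "task failed"]
print(failures[0][2]["task"])  # load
```

Custom metadata attached through the SDK appears in the same record, so business context (batch IDs, record counts, and so on) travels with the technical event.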
The Pipelines Listing Page
When you navigate to Data Observability Cloud > Discover > Pipelines in the left sidebar, you land on the Pipelines Listing page. This is your central view of all registered pipelines.
Summary Cards
At the top of the page, summary cards provide an at-a-glance health snapshot:
- Total Pipelines: Count of all registered pipelines in the current namespace
- Total Runs (Last 7 Days): Number of pipeline executions in the last week
- Pipeline Run Status: Breakdown by Success, Failure, Running, and Cancelled
- Alert Severity: Breakdown of alerts by Critical, High, Medium, and Low
- Alert Type: Breakdown by Time, Status, Event Metadata, and Reliability
- All Policies: Status of policies (Successful, Aborted, Errored, Running, Warning, Waiting, Skipped)
Pipeline Table
Below the summary cards, a table lists every pipeline registered in the current namespace. Each row represents one pipeline and shows the following details:
- Pipeline Name: The name of the pipeline, along with a source icon if the pipeline was created through a specific integration. Click the name to open the pipeline run details page.
- Status: The outcome of the most recent run, such as Success, Failure, Running, or Cancelled
- Open Alerts: The number of unresolved alerts currently active for that pipeline
- Execution Time: How long the most recent run took to complete
- Reliability: The overall reliability score based on policy evaluations
- Total Runs: The total number of executions recorded for the pipeline since it was registered
- Last Run: The date and time of the most recent execution
- Recent Runs: A visual indicator showing the outcomes of the last several runs
- Composition: The breakdown of jobs and assets within the pipeline
- User: The owner or creator of the pipeline
- Team: The team the pipeline is assigned to
You can use the checkbox on each row to select one or more pipelines for bulk actions.
Filtering and Searching
When you have many pipelines, filters and search help you quickly find what you need. The filter bar runs across the top of the table and provides filter categories. Each filter is a dropdown where you can select one or more values, then click Apply Filter to update the table.
- Pipeline Source: Filter by the orchestration tool that created the pipeline. Options include Airflow - OpenLineage, Airflow - SDK, Apache Spark, Autosys, Azure Data Factory, dbt, Fivetran, SnapLogic, Trino, and Others. This filter also includes a search box so you can quickly find a specific source.
- Data Source: Filter by the configured datasource in the Register section
- Data Factory: Filter by the data factory associated with the pipeline
- Namespace: Filter by the ADOC namespace the pipeline belongs to. This filter is shown only when Pipeline Source is set to Airflow - OpenLineage. The namespace value corresponds to the namespace of the related Airflow cluster, as configured in the transport YAML used for the ADOC connection.
- Tags: Filter by custom tags assigned to pipelines during creation or editing
- Pipeline Run Status: Filter by the outcome of the most recent run. Options are Failure, Success, Running, and Cancelled
- Alert Severity: Filter by alert priority level. Options are Critical, High, Medium, and Low
- Alert Type: Filter by the kind of alert triggered. Options are Time, Status, and Event Metadata
- Policy Type: Filter by the type of data reliability policy attached to pipelines. Options are Data Quality, Reconciliation, Data Drift, and Schema Drift
- Policy Status: Filter by the execution result of attached policies. Options include Successful, Aborted, Errored, Running, Warning, Waiting, and Skipped
To search for a specific pipeline by name, use the search bar on the right side of the filter bar. Enter the pipeline name, or part of it, and run the search. The table updates to show only pipelines whose names match your query.
Example: Your team manages multiple Airflow pipelines and needs to identify which ones failed overnight. Set Pipeline Source to Airflow - OpenLineage or Airflow - SDK, set Pipeline Run Status to Failure, and click Apply Filter. The table narrows to the pipelines that require attention.
The table supports pagination at the bottom of the page. You can choose to display 10, 20, 50, or 100 rows per page and move between pages by using the Previous and Next buttons.
Two tabs are available at the top of the Pipelines page:
- Listing: The default view showing all pipelines
- Monitoring Policy: Used for creating and managing Bulk Monitoring Policies that apply across multiple pipelines
What's Next
After exploring the Pipelines listing page:
- To create or register a new pipeline, see Working with Pipelines
- To configure monitoring policies and automated data reliability checks, see Manage Pipelines
- To set up alert rules across multiple pipelines, see Bulk Monitoring Policy
- To investigate a specific pipeline run, see Pipeline Run Details
For additional help, visit www.acceldata.force.com or call our service desk at +1 844 9433282.
Copyright © 2025