Create and Execute Your First Pipeline

This is your hands-on guide to creating a complete data pipeline from scratch in ADOC. You'll learn the full lifecycle: defining the pipeline structure, setting up jobs, creating execution tracking, and monitoring the results.

Why This Matters

Creating a pipeline in ADOC isn't just about moving data - it's about establishing observability. Every step you define here becomes visible, traceable, and debuggable. When something goes wrong (and it will), you'll know exactly where, when, and why.

Real-World Scenario

The Challenge

Your company needs to sync customer data daily from your Athena data lake to Redshift for analytics. The business team needs:

  • Fresh data every morning by 8 AM
  • Quality checks to catch issues early
  • Clear visibility when something breaks
  • Ability to trace data lineage

The Solution

Build a fully observable pipeline with three jobs:

  1. Extract: Pull customer data from Athena
  2. Transform: Clean and validate the data
  3. Load: Write to Redshift for analytics

Each step will be tracked with spans and events, giving you complete visibility into execution.

What You'll Build

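At a high level, you'll wire up the flow below, then layer a run, spans, and events on top of it for tracking:

  Athena table: AwsDataCatalog.production.customers
      → Job 1: Extract customer data
      → Job 2: Transform (dedupe, validate emails, compute lifetime value)
      → Job 3: Load
      → Redshift table: warehouse.public.customers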

Prerequisites

  • API Credentials: accessKey and secretKey
  • Data Source UID: Your Athena table identifier (e.g., AwsDataCatalog.production.customers)
  • Data Destination UID: Your Redshift table identifier (e.g., warehouse.public.customers)
  • Understanding: Basic ETL concepts


The Complete Workflow

We'll execute 8 steps using 6 APIs:

  1. Design your pipeline (planning)
  2. Create the pipeline structure
  3. Create a pipeline run
  4. Define job nodes (3 jobs)
  5. Create spans for tracking (4 spans)
  6. Start execution
  7. Record events as work progresses
  8. Mark completion

Step 1: Design Your Pipeline

Before touching any APIs, map out your pipeline on paper.

Questions to Answer

  1. What data are you moving?

    • Source: Athena table AwsDataCatalog.production.customers
    • Destination: Redshift table warehouse.public.customers
  2. What transformations are needed?

    • Remove duplicate customer IDs
    • Validate email formats
    • Calculate customer lifetime value
  3. What are the dependencies?

    • Extract must complete before Transform
    • Transform must complete before Load
  4. Who owns this?

    • Owner: data-team@company.com
    • SLA: data available by 8 AM (the run itself should take about 30 minutes)

Pipeline Design

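A design sketch for this scenario might look like the following; the job UIDs are illustrative names that the later steps will reuse.

  Pipeline: Customer ETL Pipeline (uid: customer-etl-daily)
  Owner:    data-team@company.com    SLA: done by 8 AM (~30 minutes)

  Job 1: Extract Customer Data   (uid: extract-customers)
         source: AwsDataCatalog.production.customers (Athena)
  Job 2: Transform Customer Data (uid: transform-customers)
         depends on: extract-customers
  Job 3: Load to Redshift        (uid: load-customers)
         depends on: transform-customers
         destination: warehouse.public.customers (Redshift)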

Checkpoint: You should have job names, UIDs, and data sources documented.

Step 2: Create the Pipeline

Register your pipeline in ADOC.

API Call

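A minimal sketch of the call with curl. The base URL and the accessKey/secretKey header names are placeholders and assumptions, so check your environment's API reference for the exact authentication scheme. The request body (shown in the next section) is read from a local file here.

  # Placeholder base URL and assumed auth header names -- adjust for your environment
  curl -X PUT "$ADOC_URL/pipelines" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @create-pipeline.json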

Parameters: None

Request

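A request body sketch built from the field table below. The nesting of meta (and whether sla is a string or a number of minutes) is an assumption and may differ in your version of the API.

  {
    "name": "Customer ETL Pipeline",
    "uid": "customer-etl-daily",
    "enabled": true,
    "scheduled": false,
    "schedulerType": "INTERNAL",
    "tags": ["production", "daily", "customer-data"],
    "meta": {
      "owner": "data-team@company.com",
      "sla": "30 minutes"
    }
  }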

Field Explanations

Field         | Purpose                             | Your Value
name          | Display name                        | Customer ETL Pipeline
uid           | Unique identifier (use for lookups) | customer-etl-daily
enabled       | Can this pipeline run?              | true
scheduled     | Runs automatically?                 | false (manual for now)
schedulerType | Who manages scheduling?             | INTERNAL (ADOC manages)
tags          | For filtering/organizing            | production, daily, customer-data
meta.owner    | Who to contact                      | data-team@company.com
meta.sla      | Expected completion time            | 30 minutes

Success Response

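An abbreviated response sketch, assuming the created pipeline is echoed back under a pipeline key; the real response carries more fields, but the part you need is the generated numeric id.

  {
    "pipeline": {
      "id": 15,
      "uid": "customer-etl-daily",
      "name": "Customer ETL Pipeline",
      "enabled": true
    }
  }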

Save This: pipeline.id = 15 - You'll need this for the next steps!

Step 3: Create a Pipeline Run

A "run" is a single execution instance of your pipeline. Think of it like pressing "play" - you're about to execute all the jobs.

API Call

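Same placeholder base URL and assumed headers as in Step 2; the pipeline ID from Step 2 goes into the path.

  curl -X POST "$ADOC_URL/pipelines/15/runs" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @create-run.json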

Path Parameters:

Parameter  | Type    | Required | Description
pipelineId | integer | Yes      | The numeric ID from Step 2 (e.g., 15)

Request

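A minimal body sketch; the date portion of the continuationId is just an example, and your API version may accept additional optional fields.

  {
    "continuationId": "run-2025-01-15-001"
  }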

What is continuationId? A unique identifier for this specific run. Use format: run-YYYY-MM-DD-NNN where NNN is a sequence number.

Success Response

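An abbreviated response sketch, assuming the run is returned under a run key; grab the numeric id.

  {
    "run": {
      "id": 109133,
      "continuationId": "run-2025-01-15-001",
      "status": "CREATED"
    }
  }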

Save This: run.id = 109133 - You'll use this for jobs and spans!

Step 4: Define Job Nodes

Jobs are the actual work units in your pipeline. You'll create three jobs that form the ETL chain.

API Call

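Same placeholder headers as before; send one request per job, changing only the body file.

  curl -X PUT "$ADOC_URL/pipelines/15/jobs" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @job-extract.json   # then job-transform.json and job-load.json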

Path Parameters:

Parameter  | Type    | Required | Description
pipelineId | integer | Yes      | The numeric ID of the pipeline (e.g., 15)

Important: Call this endpoint three times (once per job).

Job 1: Extract Customer Data

This job reads from your Athena table.

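A sketch of the Extract job payload. The uid value is an illustrative name (the Transform job's inputs will reference it); fields beyond uid, name, inputs, and asset_uid are assumptions.

  {
    "uid": "extract-customers",
    "name": "Extract Customer Data",
    "description": "Pull customer data from Athena",
    "asset_uid": "AwsDataCatalog.production.customers"
  }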

Why no inputs? This is the first job - it starts from a data source, not from another job.

What's asset_uid? The fully qualified name of your Athena table in ADOC's asset catalog.

Job 2: Transform Customer Data

This job processes the data from the Extract job.

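A sketch of the Transform job payload, assuming inputs is a list of references to upstream jobs by jobUid (the exact shape may differ in your API version).

  {
    "uid": "transform-customers",
    "name": "Transform Customer Data",
    "description": "Dedupe IDs, validate emails, compute lifetime value",
    "inputs": [
      { "jobUid": "extract-customers" }
    ]
  }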

Key Point: inputs references the Extract job by its jobUid. This creates the dependency chain.

Job 3: Load to Redshift

This job writes the transformed data to Redshift.

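A sketch of the Load job payload, again with the assumed inputs shape; asset_uid now points at the Redshift destination.

  {
    "uid": "load-customers",
    "name": "Load to Redshift",
    "description": "Write transformed customers to the warehouse",
    "inputs": [
      { "jobUid": "transform-customers" }
    ],
    "asset_uid": "warehouse.public.customers"
  }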

The Flow: Extract → Transform → Load

Checkpoint: You should have created 3 jobs. They define WHAT to do, but haven't executed yet.

Step 5: Create Spans to Track Execution

Spans are how ADOC tracks execution. Each span represents a unit of work being performed.

API Call

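Same placeholder headers; the run ID from Step 3 goes into the path, and you send one request per span.

  curl -X POST "$ADOC_URL/pipelines/runs/109133/spans" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @span-root.json   # then the three job spans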

Path Parameters:

Parameter | Type    | Required | Description
runId     | integer | Yes      | The numeric ID of the run (e.g., 109133)

Important: Call this endpoint four times (1 root + 3 job spans).

Span 1: Root Span (Pipeline Level)

This represents the entire pipeline execution.

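A minimal sketch; the span uid is an illustrative name, and your API version may accept extra fields (for example, to associate a span with a job).

  {
    "uid": "customer-etl-daily.root"
  }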

Response:

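An abbreviated response sketch, assuming the created span comes back under a span key.

  {
    "span": {
      "id": 5000,
      "uid": "customer-etl-daily.root"
    }
  }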

Save This: span.id = 5000 - You'll use this as parentSpanId for job spans!

Span 2: Extract Span

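A sketch of the Extract span, parented to the root span created above (the uid is illustrative).

  {
    "uid": "customer-etl-daily.extract",
    "parentSpanId": 5000
  }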

Span 3: Transform Span

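The same pattern, with an illustrative uid for the Transform job:

  {
    "uid": "customer-etl-daily.transform",
    "parentSpanId": 5000
  }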

Span 4: Load Span

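And the same again for the Load job (uid illustrative):

  {
    "uid": "customer-etl-daily.load",
    "parentSpanId": 5000
  }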

Understanding Span Hierarchy:

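With the IDs returned in this step, the tree looks like this (the span uids are the illustrative names used above):

  customer-etl-daily.root          (span 5000)  - whole pipeline run
  ├── customer-etl-daily.extract   (span 5001)  - Extract job
  ├── customer-etl-daily.transform (span 5002)  - Transform job
  └── customer-etl-daily.load      (span 5003)  - Load job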

Checkpoint: You've created the execution tracking structure. Now it's time to actually run!

Step 6: Start Execution

Mark the run as started - this signals that work is beginning.

API Call

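Same placeholder headers; this is the run-update endpoint you'll call again in Step 8.

  curl -X PUT "$ADOC_URL/pipelines/runs/109133" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @start-run.json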

Path Parameters:

Parameter | Type    | Required | Description
runId     | integer | Yes      | The numeric ID of the run (e.g., 109133)

Request

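A minimal body sketch, assuming the new state travels in a status field; the exact field name and accepted values may differ in your API version.

  {
    "status": "STARTED"
  }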

Status Change: CREATED → STARTED

Step 7: Record Span Events

As each job executes, record events to track progress. This is what makes your pipeline observable!

API Call

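Same placeholder headers; the span ID goes into the path (5001, the Extract span, is shown here).

  curl -X POST "$ADOC_URL/pipelines/spans/5001/events" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @event.json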

Path Parameters:

Parameter | Type    | Required | Description
spanId    | integer | Yes      | The numeric ID of the span (e.g., 5001, 5002, 5003)

Event Flow for Each Span

For each of your 3 job spans (extract, transform, load), record these events:

1. Start Event (When job begins)

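One possible shape for a start event; the eventType and payload field names are assumptions, so check the span-events reference for the exact schema.

  {
    "eventType": "START",
    "payload": {
      "message": "Job started"
    }
  }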

2. Progress Events (Optional, during execution)

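A progress event sketch with the same assumed shape; put whatever metrics will help you debug later into the payload.

  {
    "eventType": "PROGRESS",
    "payload": {
      "message": "50,000 rows processed so far",
      "rowsProcessed": 50000
    }
  }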

3. End Event (When job completes successfully)

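An end event sketch (same assumed shape), recording summary details about the finished work.

  {
    "eventType": "END",
    "payload": {
      "message": "Job completed",
      "rowsProcessed": 100000
    }
  }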

4. Error Event (If something fails)

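An error event sketch (same assumed shape); capture enough detail to diagnose the failure afterwards.

  {
    "eventType": "ERROR",
    "payload": {
      "message": "Load to Redshift failed: connection timed out"
    }
  }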

Complete Event Sequence Example

The sequence below walks through the Extract job (span 5001). Repeat the same pattern for the Transform job (span 5002) and the Load job (span 5003), adjusting the payloads to describe each job's work.
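A possible sequence for one span, using the same placeholder URL, assumed headers, and assumed event shape as above:

  SPAN=5001   # Extract span; use 5002 for Transform, 5003 for Load

  # Job begins
  curl -X POST "$ADOC_URL/pipelines/spans/$SPAN/events" \
    -H "accessKey: $ACCESS_KEY" -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d '{"eventType": "START", "payload": {"message": "Extract job started"}}'

  # ... the job does its work, optionally emitting PROGRESS events ...

  # Job finishes
  curl -X POST "$ADOC_URL/pipelines/spans/$SPAN/events" \
    -H "accessKey: $ACCESS_KEY" -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d '{"eventType": "END", "payload": {"message": "Extract completed"}}'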

Pro Tip: The more events you log, the better visibility you'll have when debugging issues later!

Step 8: Mark Run Completion

When all jobs finish, update the run status.

API Call

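The same run-update endpoint and placeholder headers as in Step 6:

  curl -X PUT "$ADOC_URL/pipelines/runs/109133" \
    -H "accessKey: $ACCESS_KEY" \
    -H "secretKey: $SECRET_KEY" \
    -H "Content-Type: application/json" \
    -d @finish-run.json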

Path Parameters:

Parameter | Type    | Required | Description
runId     | integer | Yes      | The numeric ID of the run (e.g., 109133)

Request (Success)

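A success body sketch, with the same assumption as in Step 6 that the state travels in a status field.

  {
    "status": "COMPLETED"
  }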

Request (Failure)

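A failure body sketch; including a reason (field name assumed) makes later debugging much easier.

  {
    "status": "FAILED",
    "reason": "Load job error: Redshift connection timed out"
  }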

Status Changes:

  • Success: STARTED → COMPLETED
  • Failure: STARTED → FAILED

API Call Summary

You used 6 APIs:

  1. PUT /pipelines - Created pipeline → Got pipeline.id = 15
  2. POST /pipelines/15/runs - Created run → Got run.id = 109133
  3. PUT /pipelines/15/jobs - Created 3 jobs
  4. POST /pipelines/runs/109133/spans - Created 4 spans
  5. PUT /pipelines/runs/109133 - Updated run status (2x: START, COMPLETE)
  6. POST /pipelines/spans/:spanId/events - Recorded events (multiple times)

Troubleshooting

Issue                   | Cause                        | Solution
Pipeline creation fails | UID already exists           | Choose a unique uid or delete old pipeline
Job creation fails      | Invalid asset_uid            | Verify asset exists in ADOC catalog
Span creation fails     | Invalid parentSpanId         | Ensure root span created first
Wrong field error       | Using "ID" instead of "uid"  | Always use lowercase "uid" in requests
Events not showing      | Wrong span ID                | Verify span.id from creation response
Run stuck in STARTED    | Never marked complete        | Always send final status update