Pipelines evolve: requirements change, data sources move, and optimizations are needed. This guide shows you how to safely update pipeline configurations, modify job structures, and verify your changes.
Why This Matters
Updating a production pipeline is risky. Done wrong, you can:
- Break downstream systems
- Lose data
- Create silent failures
- Disrupt on-call schedules
This workflow minimizes risk by showing you how to update pipelines safely and verify changes before they impact production.
Real-World Scenarios
Scenario 1: New Team Ownership
"Data engineering is splitting into two teams. We need to update pipeline ownership."
Change: Update metadata (owner, team, Slack channel)
Risk: Low
Solution: Simple PUT /pipelines with new metadata
Scenario 2: Add Data Quality Check
"We're getting bad data. Need to add a validation step between Extract and Load."
Change: Insert new job into pipeline
Risk: Medium (changes data flow)
Solution: Create new job, update dependencies, verify graph
Scenario 3: Migrate Data Source
"We're moving from Athena to Snowflake. Update all pipelines."
Change: Update asset_uid in job configurations
Risk: High (wrong UID = data loss)
Solution: Test in staging, verify connections, gradual rollout
Scenario 4: Enable Scheduling
"Manual pipeline is stable. Time to automate with daily schedule."
Change: Set `scheduled: true` and add a cron expression
Risk: Low
Solution: Update pipeline config, monitor first few automated runs
Prerequisites
- Pipeline ID or UID to update
- New configuration values
- Understanding of current pipeline structure
- (Recommended) Testing environment
Update Strategies
Use these 3 APIs to modify pipelines:
- `GET /pipelines/:identity` - Get current config
- `PUT /pipelines` - Update pipeline
- `GET /pipelines/:pipelineId/graph` - Verify changes
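If you prefer to script these calls, here is a minimal sketch using Python's `requests` library. The base URL and Authorization header are placeholders, not the product's documented values; adjust them to your deployment's host and auth scheme.

```python
# Minimal sketch of the three calls used in this workflow.
# BASE_URL and the Authorization header are placeholders (assumptions).
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption


def get_pipeline(identity):
    """Fetch the current configuration by pipeline ID or UID."""
    resp = requests.get(f"{BASE_URL}/pipelines/{identity}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()


def update_pipeline(payload):
    """Create or update a pipeline (PUT upserts, keyed on uid)."""
    resp = requests.put(f"{BASE_URL}/pipelines", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()


def get_graph(pipeline_id):
    """Retrieve the job graph to verify structural changes."""
    resp = requests.get(f"{BASE_URL}/pipelines/{pipeline_id}/graph", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
```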
Overview
This workflow covers:
- Retrieving current pipeline configuration
- Updating pipeline metadata and settings
- Modifying the pipeline graph structure
- Verifying changes were applied
APIs Used: 3 endpoints
Prerequisites
- Pipeline ID or UID to update
- API credentials
- Understanding of desired changes
Step 1: Get Current Pipeline Configuration
Before making changes, retrieve the current configuration.
API Call
```
GET /torch-pipeline/api/pipelines/customer-etl-daily
```
Response
```json
{
  "pipeline": {
    "id": 15,
    "uid": "customer-etl-daily",
    "name": "Customer ETL Pipeline",
    "description": "Daily customer data sync from Athena to Redshift",
    "enabled": true,
    "scheduled": false,
    "schedulerType": "INTERNAL",
    "tags": ["production", "daily"],
    "createdAt": "2024-08-20T05:15:46.569Z",
    "updatedAt": "2024-12-05T10:00:00Z",
    "meta": {
      "owner": "data-team@company.com",
      "team": "data-engineering",
      "codeLocation": "https://github.com/company/pipelines/customer-etl"
    }
  }
}
```
Save this configuration - you'll modify and send it back.
Step 2: Modify Pipeline Configuration
Update the pipeline using the same endpoint as creation.
API Call
```
PUT /torch-pipeline/api/pipelines
```
Update Scenarios
Scenario 1: Enable Scheduling
{ "pipeline": { "uid": "customer-etl-daily", "name": "Customer ETL Pipeline", "description": "Daily customer data sync from Athena to Redshift", "enabled": true, "scheduled": true, "schedulerType": "INTERNAL", "schedule": "0 2 * * *", "tags": ["production", "daily", "scheduled"], "meta": { "owner": "data-team@company.com", "team": "data-engineering", "codeLocation": "https://github.com/company/pipelines/customer-etl" } }}Changes:
scheduled: false → trueschedule: Added cron expression (2 AM daily)tags: Added "scheduled" tag
Scenario 2: Change Ownership
{ "pipeline": { "uid": "customer-etl-daily", "name": "Customer ETL Pipeline", "description": "Daily customer data sync from Athena to Redshift", "enabled": true, "scheduled": true, "schedulerType": "INTERNAL", "schedule": "0 2 * * *", "tags": ["production", "daily", "scheduled"], "meta": { "owner": "analytics-team@company.com", "team": "analytics", "codeLocation": "https://github.com/company/pipelines/customer-etl" } }}Changes:
meta.owner: data-team → analytics-teammeta.team: data-engineering → analytics
Scenario 3: Disable Pipeline
{ "pipeline": { "uid": "customer-etl-daily", "name": "Customer ETL Pipeline", "description": "Daily customer data sync from Athena to Redshift - TEMPORARILY DISABLED", "enabled": false, "scheduled": true, "schedulerType": "INTERNAL", "schedule": "0 2 * * *", "tags": ["production", "daily", "scheduled", "disabled"], "meta": { "owner": "analytics-team@company.com", "team": "analytics", "codeLocation": "https://github.com/company/pipelines/customer-etl" } }}Changes:
enabled: true → falsedescription: Added "TEMPORARILY DISABLED" notetags: Added "disabled" tag
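A minimal sketch of sending a modified configuration back, using Scenario 1 (enable scheduling) as the example; the base URL and token are placeholders:

```python
# Sketch: modify and resend the full pipeline object. The example edit matches
# Scenario 1 (enable scheduling). BASE_URL and the token are placeholders.
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption

# Start from the current configuration so unrelated fields are preserved.
pipeline = requests.get(
    f"{BASE_URL}/pipelines/customer-etl-daily", headers=HEADERS
).json()["pipeline"]

pipeline["scheduled"] = True
pipeline["schedule"] = "0 2 * * *"
pipeline["tags"] = sorted(set(pipeline.get("tags", []) + ["scheduled"]))

# id, createdAt, updatedAt are system-assigned (see Immutable Fields below);
# dropping them keeps the payload limited to fields you actually control.
for field in ("id", "createdAt", "updatedAt"):
    pipeline.pop(field, None)

resp = requests.put(f"{BASE_URL}/pipelines", json={"pipeline": pipeline}, headers=HEADERS)
resp.raise_for_status()
```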
Step 3: Update Pipeline Jobs
Modify job structure by creating/updating job nodes.
API Call
```
PUT /torch-pipeline/api/pipelines/15/jobs
```
Scenario: Add a New Job
Add a data quality validation job between extract and transform.
{ "name": "Validate Customer Data", "uid": "job-validate-customers", "pipeLineRunId": 109135, "inputs": [ { "jobUid": "job-extract-customers" } ], "outputs": [], "meta": { "owner": "analytics-team@company.com", "team": "analytics", "validationType": "schema_and_quality" }}Scenario: Update Existing Job
Update the transform job to take input from validation instead of extract.
{ "name": "Transform Customer Data", "uid": "job-transform-customers", "pipeLineRunId": 109135, "inputs": [ { "jobUid": "job-validate-customers" } ], "outputs": [], "meta": { "owner": "analytics-team@company.com", "team": "analytics" }}Result: Pipeline flow is now: Extract → Validate → Transform → Load
Step 4: Verify Pipeline Graph
Check that your changes are reflected in the pipeline graph.
API Call
```
GET /torch-pipeline/api/pipelines/15/graph
```
Response
```json
{
  "graph": {
    "nodes": [
      { "id": 101, "uid": "job-extract-customers", "name": "Extract Customer Data", "type": "JOB" },
      { "id": 104, "uid": "job-validate-customers", "name": "Validate Customer Data", "type": "JOB" },
      { "id": 102, "uid": "job-transform-customers", "name": "Transform Customer Data", "type": "JOB" },
      { "id": 103, "uid": "job-load-redshift", "name": "Load to Redshift", "type": "JOB" }
    ],
    "edges": [
      { "source": "job-extract-customers", "target": "job-validate-customers", "type": "FLOW" },
      { "source": "job-validate-customers", "target": "job-transform-customers", "type": "FLOW" },
      { "source": "job-transform-customers", "target": "job-load-redshift", "type": "FLOW" }
    ]
  }
}
```
Verification:
- New validation job (104) is present
- Flow goes: extract → validate → transform → load
- All connections are correct
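A sketch that automates this check by asserting the expected nodes and edges are present (placeholder base URL and token):

```python
# Sketch: pull the graph and assert the expected flow is present.
# BASE_URL and the token are placeholders; adjust for your deployment.
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption

graph = requests.get(f"{BASE_URL}/pipelines/15/graph", headers=HEADERS).json()["graph"]

node_uids = {n["uid"] for n in graph["nodes"]}
edges = {(e["source"], e["target"]) for e in graph["edges"]}

expected_flow = [
    ("job-extract-customers", "job-validate-customers"),
    ("job-validate-customers", "job-transform-customers"),
    ("job-transform-customers", "job-load-redshift"),
]

assert "job-validate-customers" in node_uids, "validation job missing"
for source, target in expected_flow:
    assert (source, target) in edges, f"missing edge: {source} -> {target}"
print("Graph matches expected flow: extract -> validate -> transform -> load")
```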
Step 5: Verify Configuration Changes
Retrieve the pipeline again to confirm your updates.
API Call
```
GET /torch-pipeline/api/pipelines/customer-etl-daily
```
Response
```json
{
  "pipeline": {
    "id": 15,
    "uid": "customer-etl-daily",
    "name": "Customer ETL Pipeline",
    "description": "Daily customer data sync from Athena to Redshift - TEMPORARILY DISABLED",
    "enabled": false,
    "scheduled": true,
    "schedulerType": "INTERNAL",
    "schedule": "0 2 * * *",
    "tags": ["production", "daily", "scheduled", "disabled"],
    "createdAt": "2024-08-20T05:15:46.569Z",
    "updatedAt": "2024-12-05T15:30:00Z",
    "meta": {
      "owner": "analytics-team@company.com",
      "team": "analytics",
      "codeLocation": "https://github.com/company/pipelines/customer-etl"
    }
  }
}
```
Verification:
- `enabled`: false (as requested)
- `scheduled`: true with cron schedule
- `meta.owner` and `meta.team`: Updated
- `updatedAt`: Timestamp reflects recent change
- `tags`: Includes all new tags
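A sketch that re-fetches the pipeline and compares the changed fields against what you intended (placeholder base URL and token):

```python
# Sketch: re-fetch the pipeline and check the fields you changed.
# BASE_URL and the token are placeholders for your deployment.
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption

pipeline = requests.get(
    f"{BASE_URL}/pipelines/customer-etl-daily", headers=HEADERS
).json()["pipeline"]

expected = {
    "enabled": False,
    "scheduled": True,
    "schedule": "0 2 * * *",
}
for key, value in expected.items():
    actual = pipeline.get(key)
    status = "OK" if actual == value else "MISMATCH"
    print(f"{status}: {key} = {actual!r} (expected {value!r})")

assert pipeline["meta"]["team"] == "analytics"
assert "disabled" in pipeline["tags"]
```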
Common Update Patterns
Pattern 1: Gradual Rollout
- Disable production pipeline
```json
{ "enabled": false }
```
- Test changes in dev/staging
- Re-enable with new configuration
```json
{ "enabled": true }
```
Pattern 2: Add Monitoring
- Update pipeline with notification channels
```json
{
  "pipeline": {
    "notificationChannels": "slack-data-team"
  }
}
```
- Set baseline metrics
```json
{
  "pipeline": {
    "pipelineBaselineMetric": {
      "includeSuccessfulRunsOnly": true,
      "metrics": 10,
      "unit": "RUNS"
    }
  }
}
```
Pattern 3: Modify Data Flow
- Get current graph structure
```
GET /pipelines/15/graph
```
- Add/modify jobs
```
PUT /pipelines/15/jobs
```
- Verify new graph
```
GET /pipelines/15/graph
```
Update Workflow Summary
Simple Configuration Update (2 API calls)
1. `GET /pipelines/:identity` → Get current config
2. `PUT /pipelines` → Send modified config
Complex Structure Update (4 API calls)
1. `GET /pipelines/:identity` → Get current config
2. `GET /pipelines/:pipelineId/graph` → View current structure
3. `PUT /pipelines/:pipelineId/jobs` → Add/modify jobs
4. `GET /pipelines/:pipelineId/graph` → Verify changes
Complete API Call Sequence
- `GET /torch-pipeline/api/pipelines/:identity` - Get current configuration
- `PUT /torch-pipeline/api/pipelines` - Update pipeline
- `PUT /torch-pipeline/api/pipelines/:pipelineId/jobs` - Modify jobs (optional)
- `GET /torch-pipeline/api/pipelines/:pipelineId/graph` - Verify structure (optional)
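Putting the four-call structure update together, a compact sketch; all names, IDs, and credentials are placeholders:

```python
# Sketch of the 4-call structure update: get config, view graph, modify jobs,
# verify graph. Placeholder BASE_URL, token, IDs, and payload values.
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption
PIPELINE_UID, PIPELINE_ID = "customer-etl-daily", 15

# 1. Current configuration
config = requests.get(f"{BASE_URL}/pipelines/{PIPELINE_UID}", headers=HEADERS).json()

# 2. Current structure
before = requests.get(f"{BASE_URL}/pipelines/{PIPELINE_ID}/graph", headers=HEADERS).json()

# 3. Add or modify jobs (payload shape as in Step 3)
new_job = {
    "name": "Validate Customer Data",
    "uid": "job-validate-customers",
    "pipeLineRunId": 109135,
    "inputs": [{"jobUid": "job-extract-customers"}],
    "outputs": [],
    "meta": {"team": "analytics"},
}
requests.put(
    f"{BASE_URL}/pipelines/{PIPELINE_ID}/jobs", json=new_job, headers=HEADERS
).raise_for_status()

# 4. Verify the new structure
after = requests.get(f"{BASE_URL}/pipelines/{PIPELINE_ID}/graph", headers=HEADERS).json()
print(len(after["graph"]["nodes"]) - len(before["graph"]["nodes"]), "node(s) added")
```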
Important Notes
Update vs Create
The `PUT /pipelines` endpoint does both create and update:
- If `uid` exists → Update
- If `uid` doesn't exist → Create
Immutable Fields
These fields cannot be changed after creation:
- `id` (system-assigned)
- `uid` (unique identifier)
- `createdAt` (creation timestamp)
Versioning
Pipeline updates don't create versions automatically. If you need versioning:
- Use different `uid` values (e.g., `customer-etl-v2`)
- Store version info in the `meta` object
- Track changes in your version control system
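For example, a sketch of a versioned payload; the `version` and `previousUid` keys in `meta` are illustrative, not a defined schema, and the base URL and token are placeholders:

```python
# Sketch: carry version info via a versioned uid plus illustrative meta keys.
import requests

BASE_URL = "https://torch.example.com/torch-pipeline/api"   # assumption
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}            # assumption

payload = {
    "pipeline": {
        "uid": "customer-etl-v2",                 # new uid = new pipeline
        "name": "Customer ETL Pipeline v2",
        "enabled": True,
        "meta": {
            "owner": "analytics-team@company.com",
            "team": "analytics",
            "version": "2.0.0",                   # illustrative key
            "previousUid": "customer-etl-daily",  # illustrative key
        },
    }
}

resp = requests.put(f"{BASE_URL}/pipelines", json=payload, headers=HEADERS)
resp.raise_for_status()
```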
Troubleshooting
| Issue | Solution |
|---|---|
| Update not applied | Verify the `uid` matches exactly |
| Graph not updating | Job changes require a new run to take effect |
| Schedule not working | Check that `scheduled` is `true` and the cron expression is valid |
| Changes lost | Send the complete pipeline object, not just the changed fields |