Monitor an Existing Pipeline

This guide shows how to monitor pipelines in production by tracking their health, performance, and execution history. You'll learn to spot problems before they become incidents and to see exactly what is happening inside your data pipelines.

Why This Matters

A pipeline that runs silently is dangerous. Without monitoring, you won't know:

  • If it failed overnight
  • If it's running slower than usual
  • If data quality issues are creeping in
  • Which step is the bottleneck

Good monitoring means catching issues in minutes, not hours or days.

Real-World Scenarios

Scenario 1: Daily Health Check

"It's 8 AM. Did our customer pipeline run successfully last night?"

Solution: Call GET /pipelines/15/latestRun to see the run's status, duration, and any errors. The check takes a few seconds.

Scenario 2: Performance Degradation

"Our pipeline used to finish in 20 minutes, now it takes 45. What changed?"

Solution: Use GET /pipelines/15/runs?limit=30 to see the last 30 runs and spot when the slowdown began. Then drill into the spans of a slow run to find the bottleneck.

Scenario 3: Debugging for On-Call

"I got paged at 2 AM. The pipeline failed but I need details fast."

Solution: Get latest run → list spans → find failed span → get events → get error logs. Full investigation in under 2 minutes.

Scenario 4: Capacity Planning

"Should we add more resources? Is our pipeline hitting limits?"

Solution: Analyze historical runs to see execution-time trends and event counts, and to identify recurring patterns.

Prerequisites

  • Pipeline ID or UID you want to monitor
  • API credentials
  • Understanding of what the pipeline does

Monitoring Dashboard - API Workflow

Build a complete monitoring view using these 6 APIs (paths are shown relative to the /torch-pipeline/api base path):

  1. GET /pipelines/:pipelineId/latestRun - Current status
  2. GET /pipelines/:pipelineId/runs - Historical runs
  3. GET /pipelines/runs/:runId/spans - Execution breakdown
  4. GET /pipelines/spans/:spanId/events - Event details
  5. GET /pipelines/spans/events/:eventId/log - Deep logs
  6. GET /pipelines/runs/:runId/span-job-associations - Job mappings

Overview

This workflow covers:

  • Listing all runs for a pipeline
  • Getting the latest run status
  • Viewing span execution details
  • Querying span events and logs
  • Understanding job-span associations

APIs Used: 6 endpoints

Prerequisites

  • Pipeline ID or UID
  • API credentials
  • Understanding of pipeline execution concepts

Step 1: Get Latest Run Status

Check the most recent execution of your pipeline.

API Call
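
A minimal sketch of this call with curl. The `TORCH_URL` and `AUTH_HEADER` variables, the example host, and the bearer-token header are hypothetical placeholders, since this guide does not state how credentials are passed; the `/torch-pipeline/api` prefix is taken from the Complete API Call Sequence in this guide.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Build the latestRun URL for a pipeline ID or UID ($1).
latest_run_url() {
  printf '%s/pipelines/%s/latestRun\n' "$API_BASE" "$1"
}

# Fetch the latest run as JSON; -f makes curl exit non-zero on HTTP errors.
get_latest_run() {
  curl -sS -f -H "$AUTH_HEADER" "$(latest_run_url "$1")"
}

# Usage: get_latest_run 15
```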

Response

Key Metrics

Field             Description
status            Current execution status (CREATED, RUNNING, COMPLETED, FAILED)
startedAt         When execution began
avgExecutionTime  Average execution time in milliseconds
successEvents     Count of successful span events
errorEvents       Count of error events
warningEvents     Count of warning events

Use Cases

  • Dashboard displays showing current pipeline status
  • Quick health checks
  • Alerting based on execution metrics
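
As one way to alert on these metrics, the sketch below maps `status`, `errorEvents`, and `warningEvents` to an exit code. The function name `run_health` and the 0/1/2 exit-code convention (borrowed from Nagios-style checks) are choices made for this illustration, not part of the API. The values are passed as arguments, so extracting them from the JSON response is left to the caller.

```shell
# Map latestRun metrics to a monitoring exit code:
#   0 = OK, 1 = WARNING, 2 = CRITICAL (convention assumed for this sketch).
# $1 = status, $2 = errorEvents, $3 = warningEvents
run_health() {
  if [ "$1" = "FAILED" ] || [ "$2" -gt 0 ]; then
    echo "CRITICAL: status=$1 errorEvents=$2"
    return 2
  elif [ "$3" -gt 0 ]; then
    echo "WARNING: status=$1 warningEvents=$3"
    return 1
  fi
  echo "OK: status=$1"
  return 0
}

# Usage: run_health COMPLETED 0 2   (prints a WARNING line, returns 1)
```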

Step 2: List All Pipeline Runs

View historical execution data for analysis.

API Call
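
A hedged sketch of the runs call. `TORCH_URL`, `AUTH_HEADER`, and the helper names are hypothetical; the `limit` and `offset` parameters and their defaults (50 and 0) come from the Query Parameters table in this step.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# $1 = pipelineId, $2 = limit (default 50), $3 = offset (default 0)
runs_url() {
  printf '%s/pipelines/%s/runs?limit=%s&offset=%s\n' \
    "$API_BASE" "$1" "${2:-50}" "${3:-0}"
}

# Fetch one page of runs as JSON.
list_runs() {
  curl -sS -f -H "$AUTH_HEADER" "$(runs_url "$@")"
}

# Usage: list_runs 15 30
```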

Query Parameters

Parameter  Type     Description               Default
limit      integer  Number of runs to return  50
offset     integer  Pagination offset         0

Example with Pagination
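
The sketch below walks through several pages by stepping `offset` in multiples of `limit`. It assumes that `offset` counts runs rather than pages, which this guide does not confirm; `TORCH_URL`, `AUTH_HEADER`, and `page_urls` are hypothetical names.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Print one runs URL per page.
# $1 = pipelineId, $2 = page size (limit), $3 = number of pages
page_urls() {
  page=0
  while [ "$page" -lt "$3" ]; do
    printf '%s/pipelines/%s/runs?limit=%s&offset=%s\n' \
      "$API_BASE" "$1" "$2" "$((page * $2))"
    page=$((page + 1))
  done
}

# Usage (fetches 3 pages of 50 runs each):
#   page_urls 15 50 3 | while read -r url; do
#     curl -sS -f -H "$AUTH_HEADER" "$url"
#   done
```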

Response

Use Cases

  • Analyzing execution trends over time
  • Identifying performance degradation
  • Generating historical reports
  • Debugging recurring failures

Step 3: List All Spans for a Run

View the execution tree of a specific run.

API Call
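
A minimal sketch of the spans call for one run. The settings and helper names are hypothetical, and the run ID in the usage line is a made-up example value.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Build the spans URL for a run ID ($1).
spans_url() {
  printf '%s/pipelines/runs/%s/spans\n' "$API_BASE" "$1"
}

# Fetch all spans of the run as JSON.
list_spans() {
  curl -sS -f -H "$AUTH_HEADER" "$(spans_url "$1")"
}

# Usage: list_spans 1234
```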

Response

Use Cases

  • Understanding execution flow
  • Identifying bottlenecks
  • Debugging span-level issues
  • Visualizing execution timeline

Step 4: Get Events for a Specific Span

View detailed events that occurred during span execution.

API Call
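
A minimal sketch of the events call for one span, under the same assumptions as elsewhere in this guide: the settings and helper names are hypothetical, and the span ID in the usage line is a made-up example value.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Build the events URL for a span ID ($1).
span_events_url() {
  printf '%s/pipelines/spans/%s/events\n' "$API_BASE" "$1"
}

# Fetch the span's events as JSON.
list_span_events() {
  curl -sS -f -H "$AUTH_HEADER" "$(span_events_url "$1")"
}

# Usage: list_span_events 5678
```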

Response

Event Types

Type    Description
START   Span execution began
END     Span execution completed successfully
FAILED  Span execution failed
LOG     Informational log message
ABORT   Span execution was aborted
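
When scanning a span's events for problems, it can help to reduce each event type to a coarse outcome. The grouping below is an illustration only: treating ABORT as a failure alongside FAILED is an assumption of this sketch, and `event_outcome` is a hypothetical helper name.

```shell
# Classify an event type ($1) into a coarse outcome.
event_outcome() {
  case "$1" in
    END)          echo "success" ;;
    FAILED|ABORT) echo "failure" ;;        # ABORT counted as failure (assumption)
    START|LOG)    echo "informational" ;;
    *)            echo "unknown" ;;
  esac
}

# Usage: event_outcome FAILED
```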

Use Cases

  • Debugging span failures
  • Understanding execution steps
  • Tracking data quality issues
  • Performance analysis

Step 5: Get Detailed Event Logs

Retrieve detailed logs for a specific event.

API Call
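
A minimal sketch of the log call for a single event. The settings and helper names are hypothetical, and the event ID in the usage line is a made-up example value.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Build the log URL for a span event ID ($1).
event_log_url() {
  printf '%s/pipelines/spans/events/%s/log\n' "$API_BASE" "$1"
}

# Fetch the detailed log for the event.
get_event_log() {
  curl -sS -f -H "$AUTH_HEADER" "$(event_log_url "$1")"
}

# Usage: get_event_log 9012
```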

Response

Use Cases

  • Investigating specific warnings or errors
  • Root cause analysis
  • Compliance and audit trails

Step 6: Get Job-Span Associations

Understand which jobs are associated with which spans.

API Call
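
A minimal sketch of the associations call for one run. The settings and helper names are hypothetical, and the run ID in the usage line is a made-up example value.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Build the span-job-associations URL for a run ID ($1).
span_job_associations_url() {
  printf '%s/pipelines/runs/%s/span-job-associations\n' "$API_BASE" "$1"
}

# Fetch the job-to-span mappings for the run.
get_span_job_associations() {
  curl -sS -f -H "$AUTH_HEADER" "$(span_job_associations_url "$1")"
}

# Usage: get_span_job_associations 1234
```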

Response

Use Cases

  • Mapping execution to pipeline structure
  • Debugging job-specific issues
  • Understanding execution flow

Monitoring Dashboard Workflow

Build a complete monitoring view:

Real-time Status
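
One possible way to watch a run in real time is to poll latestRun until the status leaves CREATED or RUNNING. This sketch assumes the response carries a single top-level `status` string and extracts it with sed, which is deliberately crude; a real JSON parser is preferable where available. `TORCH_URL`, `AUTH_HEADER`, and the function names are hypothetical.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Read JSON on stdin and print the first "status" value found.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll until the run is no longer CREATED or RUNNING.
# $1 = pipelineId, $2 = polling interval in seconds (default 30)
watch_latest_run() {
  while :; do
    status=$(curl -sS -f -H "$AUTH_HEADER" \
      "$API_BASE/pipelines/$1/latestRun" | extract_status)
    echo "$(date '+%H:%M:%S') status=${status:-unknown}"
    case "$status" in
      CREATED|RUNNING) sleep "${2:-30}" ;;
      *) break ;;   # terminal status, or the request failed
    esac
  done
}

# Usage: watch_latest_run 15 10
```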

Execution Timeline
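
A sketch of the two requests behind a timeline view: recent runs for the pipeline, then the spans of one chosen run. The run ID is supplied by hand because the response field names are not shown in this guide; `TORCH_URL`, `AUTH_HEADER`, `timeline_urls`, and `fetch_all` are hypothetical names.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# $1 = pipelineId, $2 = number of recent runs, $3 = runId to expand
timeline_urls() {
  printf '%s/pipelines/%s/runs?limit=%s\n' "$API_BASE" "$1" "$2"
  printf '%s/pipelines/runs/%s/spans\n' "$API_BASE" "$3"
}

# Fetch each URL read from stdin, printing a header before each response.
fetch_all() {
  while read -r url; do
    echo "== $url"
    curl -sS -f -H "$AUTH_HEADER" "$url"
    echo
  done
}

# Usage: timeline_urls 15 30 1234 | fetch_all
```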

Drill-down Investigation
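
A sketch of a drill-down from a run to the events of one suspect span. Both IDs are passed in by hand, since picking the failed span depends on response fields this guide does not show; the settings and function names are hypothetical.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# $1 = runId, $2 = spanId of the span to inspect
drilldown_urls() {
  printf '%s/pipelines/runs/%s/spans\n' "$API_BASE" "$1"
  printf '%s/pipelines/spans/%s/events\n' "$API_BASE" "$2"
}

# Fetch each URL read from stdin, printing a header before each response.
fetch_all() {
  while read -r url; do
    echo "== $url"
    curl -sS -f -H "$AUTH_HEADER" "$url"
    echo
  done
}

# Usage: drilldown_urls 1234 5678 | fetch_all
```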

Deep Analysis
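
A sketch that saves the detailed log of one event and the job-span associations of its run to local files for offline analysis. The output file names, settings, and function names are choices made for this illustration.

```shell
# Hypothetical settings; replace with your deployment's values.
TORCH_URL="${TORCH_URL:-https://torch.example.com}"
AUTH_HEADER="${AUTH_HEADER:-Authorization: Bearer REPLACE_ME}"
API_BASE="$TORCH_URL/torch-pipeline/api"

# Print "output-file URL" pairs. $1 = runId, $2 = span event ID
deep_analysis_targets() {
  printf 'event-%s-log.json %s/pipelines/spans/events/%s/log\n' \
    "$2" "$API_BASE" "$2"
  printf 'run-%s-associations.json %s/pipelines/runs/%s/span-job-associations\n' \
    "$1" "$API_BASE" "$1"
}

# Download each target into its file in the current directory.
save_targets() {
  while read -r file url; do
    curl -sS -f -H "$AUTH_HEADER" -o "$file" "$url" && echo "saved $file"
  done
}

# Usage: deep_analysis_targets 1234 9012 | save_targets
```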

Performance Monitoring Pattern

Track execution trends:
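
As an illustration of trend tracking, the sketch below flags a slowdown when the newest run takes longer than a chosen factor times the mean of the earlier runs. Durations are passed as arguments in milliseconds, oldest first; how they are read from the runs response is left open because the response fields are not shown in this guide. The function name, the threshold rule, and the sample numbers (20-minute runs followed by a 45-minute run, as in Scenario 2) are assumptions of this example.

```shell
# $1 = slowdown factor (e.g. 1.5); remaining args = run durations in ms,
# oldest first. Prints SLOW if the newest run exceeds factor x the mean
# of the earlier runs, otherwise OK.
check_slowdown() {
  factor=$1
  shift
  printf '%s\n' "$@" | awk -v f="$factor" '
    { d[NR] = $1 }
    END {
      if (NR < 2) { print "INSUFFICIENT_DATA"; exit }
      for (i = 1; i < NR; i++) sum += d[i]
      base = sum / (NR - 1)
      verdict = (d[NR] > f * base) ? "SLOW" : "OK"
      printf "%s baseline_ms=%.0f latest_ms=%d\n", verdict, base, d[NR]
    }'
}

# Usage: check_slowdown 1.5 1200000 1200000 1200000 2700000
```

Comparing against a baseline rather than a fixed limit keeps the check meaningful for pipelines with very different normal durations.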

Complete API Call Sequence

  1. GET /torch-pipeline/api/pipelines/:pipelineId/latestRun - Current status
  2. GET /torch-pipeline/api/pipelines/:pipelineId/runs - Historical data
  3. GET /torch-pipeline/api/pipelines/runs/:runId/spans - Execution tree
  4. GET /torch-pipeline/api/pipelines/spans/:spanId/events - Event details
  5. GET /torch-pipeline/api/pipelines/spans/events/:spanEventId/log - Deep logs
  6. GET /torch-pipeline/api/pipelines/runs/:runId/span-job-associations - Job mappings

Troubleshooting

Issue             Solution
No runs returned  Verify pipeline has been executed at least once
Missing spans     Check that run ID is correct
No events         Verify spans have recorded events during execution
Empty logs        Check that event ID is valid