ODP 3.3.6.4-1
Monitoring and Metrics
Prometheus Endpoints
```bash
# Master metrics
http://<master-host>:9098/metrics/prometheus

# Worker metrics
http://<worker-host>:9096/metrics/prometheus
```
Key Metrics
Master Metrics
| Metric | Description |
|---|---|
| RegisteredShuffleCount | Total registered shuffles across all applications |
| RunningApplicationCount | Number of currently active applications |
| WorkerCount | Total registered workers |
| AvailableWorkerCount | Workers in a healthy state |
| ActiveShuffleSize | Total bytes of active shuffle data |
| IsActiveMaster | 1 if this node is the Raft leader, 0 otherwise |
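As a rough illustration of how the master metrics above could be consumed outside of Prometheus, the sketch below parses a scraped `/metrics/prometheus` response body and checks whether the node reports itself as the Raft leader. It assumes the exported series name contains the metric name from the table (for example `IsActiveMaster`); the exact prefix, suffix, and labels are not given here, so the lookup matches on a substring. The helper names `parse_sample`, `values_for`, and `is_active_master` are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, float]


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one text-exposition line into (series, value); None if not a sample."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or HELP/TYPE comment
    if "{" in line:
        # Keep the label set attached to the series name.
        end = line.rfind("}")
        series, rest = line[: end + 1], line[end + 1:].split()
    else:
        parts = line.split()
        series, rest = parts[0], parts[1:]
    if not rest:
        return None
    try:
        return series, float(rest[0])  # rest[1], if present, is a timestamp
    except ValueError:
        return None


def values_for(body: str, metric: str) -> List[float]:
    """Return values of all series whose name contains `metric` (assumed naming)."""
    values = []
    for line in body.splitlines():
        sample = parse_sample(line)
        if sample is None:
            continue
        series, value = sample
        name = series.split("{", 1)[0]
        if metric in name:
            values.append(value)
    return values


def is_active_master(body: str) -> bool:
    """True if any IsActiveMaster series in the scrape reports 1."""
    return any(v == 1.0 for v in values_for(body, "IsActiveMaster"))
```

Running `is_active_master` against the response of each master would be one way to find the current leader; exactly one node is expected to return `True` in a healthy cluster.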
Worker Metrics
| Metric | Description |
|---|---|
| ActiveShuffleSize | Bytes of shuffle data currently stored on this worker |
| ActiveShuffleFileCount | Number of shuffle files on this worker |
| PausePushDataStatus | Backpressure status; a non-zero value means the writer is being throttled |
| DiskBuffer | In-memory buffer used for pending disk writes |
| NettyMemory | Off-heap memory consumed by the Netty networking layer |
| DeviceCelebornFreeBytes | Remaining free disk space per device |
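The worker metrics above lend themselves to simple health rules. The following is a minimal sketch, assuming the values of `PausePushDataStatus` and the per-device `DeviceCelebornFreeBytes` have already been extracted from a scrape. The function name `worker_warnings`, the device keys, and the free-space threshold are illustrative assumptions, not values defined by Celeborn.

```python
from typing import Dict, List


def worker_warnings(
    pause_push_data_status: float,
    free_bytes_by_device: Dict[str, float],
    min_free_bytes: float,
) -> List[str]:
    """Turn two worker metrics into human-readable warnings.

    `min_free_bytes` is a caller-chosen threshold (an assumption of this sketch).
    """
    warnings = []
    # Non-zero PausePushDataStatus means the writer is being throttled.
    if pause_push_data_status != 0:
        warnings.append("push data paused: writers are being throttled")
    # DeviceCelebornFreeBytes is reported per device, so check each one.
    for device, free in sorted(free_bytes_by_device.items()):
        if free < min_free_bytes:
            warnings.append(f"low disk space on {device}: {free:.0f} bytes free")
    return warnings
```

In practice the same conditions would more likely be expressed as Prometheus alerting rules; the function only shows which comparisons are involved.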
Prometheus Scrape Configuration
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'celeborn'
    metrics_path: /metrics/prometheus
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'master1:9098'
          - 'master2:9098'
          - 'master3:9098'
          - 'worker1:9096'
          - 'worker2:9096'
          - 'worker3:9096'
```
Grafana Dashboards
Import pre-built Grafana dashboards from the Celeborn installation:
- $CELEBORN_HOME/assets/grafana/celeborn-dashboard.json — Internal Celeborn metrics
- $CELEBORN_HOME/assets/grafana/celeborn-jvm-dashboard.json — JVM / GC metrics
REST API Reference
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/masters | GET | List all master nodes and their roles (leader/follower) |
| /api/v1/workers | GET | List all registered workers and their status |
| /api/v1/shuffles | GET | List active shuffles with size and partition info |
| /api/v1/applications | GET | List running applications using Celeborn |
| /ping | GET | Health check — returns HTTP 200 when service is healthy |
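The endpoints in the table can be called with any HTTP client. The sketch below uses only the Python standard library to build an endpoint URL and to run the `/ping` health check. The table does not state which port serves the REST API, so `port` is left as a parameter; the helper names `endpoint_url` and `is_healthy` are hypothetical.

```python
import urllib.request


def endpoint_url(host: str, port: int, path: str) -> str:
    """Build a URL for one of the REST paths listed above."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def is_healthy(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True when GET /ping answers with HTTP 200.

    Connection errors, timeouts, and non-2xx responses all count as unhealthy.
    """
    url = endpoint_url(host, port, "/ping")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

For example, `endpoint_url("master1", 9098, "/api/v1/workers")` yields a URL for listing registered workers, assuming the REST API is reachable on the same port as the master metrics endpoint.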
Last updated on Apr 23, 2026