ODP 3.3.6.4-1
Engine Integration
Apache Spark
- Deploy Client JAR
Bash
cp $CELEBORN_HOME/spark/celeborn-client-spark-3-shaded_2.12-0.6.2.jar $SPARK_HOME/jars/
- Core Spark Configuration
Bash
# Core settings
spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager
spark.celeborn.master.endpoints=master1:9097,master2:9097,master3:9097
spark.shuffle.service.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Performance tuning
spark.celeborn.client.push.replicate.enabled=true
spark.celeborn.client.spark.shuffle.writer=hash
spark.sql.adaptive.localShuffleReader.enabled=false
# Dynamic allocation (Spark 3.5+)
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
- Quick Test with spark-shell
Bash
/usr/odp/3.3.6.4-1/spark3/bin/spark-shell \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.sql.adaptive.localShuffleReader.enabled=false \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=false
Apache Flink
- Deploy Client JAR.
Bash
cp $CELEBORN_HOME/flink/celeborn-client-flink-1.19-shaded_2.12-0.6.2.jar $FLINK_HOME/lib/
- Configure Flink.
Bash
shuffle-service-factory.class=org.apache.celeborn.plugin.flink.RemoteShuffleServiceFactory
execution.batch-shuffle-mode=ALL_EXCHANGES_BLOCKING
celeborn.master.endpoints=master1:9097,master2:9097,master3:9097
# Performance
celeborn.client.shuffle.batchHandleReleasePartition.enabled=true
celeborn.client.push.maxReqsInFlight=128
celeborn.data.io.numConnectionsPerPeer=16
celeborn.data.io.threads=32
Apache Tez
Append the following JAR paths to yarn.application.classpath in yarn-site.xml and to mapreduce.application.classpath in the MapReduce configuration (mapred-site.xml).
Bash
/usr/odp/3.3.6.4-1/celeborn/tez/celeborn-client-tez-shaded_2.12-0.6.2.3.3.6.4-1.jar:/usr/odp/3.3.6.4-1/celeborn/mr/celeborn-client-mr-shaded_2.12-0.6.2.3.3.6.4-1.jar
Bash
# In tez-site.xml
tez.am.launch.cluster-default.cmd-opts=org.apache.tez.dag.app.CelebornDagAppMaster
tez.celeborn.master.endpoints=master1:9097,master2:9097,master3:9097
# Append the following path to tez.cluster.additional.classpath.prefix
/usr/odp/3.3.6.4-1/celeborn/tez/celeborn-client-tez-shaded_2.12-0.6.2.3.3.6.4-1.jar
MapReduce
Apply the classpath additions described in the Tez section above, then add the following properties to the MapReduce configuration.
Bash
yarn.app.mapreduce.am.command-opts=org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn
mapreduce.job.reduce.slowstart.completedmaps=1
mapreduce.celeborn.master.endpoints=master1:9097,master2:9097,master3:9097
mapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.CelebornMapOutputCollector
mapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.CelebornShuffleConsumer
yarn.app.mapreduce.am.job.recovery.enable=false
High Availability
Master HA Overview
Celeborn Master achieves HA through Apache Ratis, an implementation of the Raft consensus protocol. At least 3 Master nodes are required. Use an odd number of nodes (3, 5, or 7): leader election needs a majority vote, and an even node count adds no extra failure tolerance.
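To make the majority rule concrete, the shell sketch below computes the quorum size and the number of Master failures that a Raft group of a given size can tolerate. The function names `quorum` and `tolerated_failures` are illustrative helpers introduced here; they are not part of Celeborn or Ratis.

```shell
#!/usr/bin/env bash
# Illustrative helpers only (not part of Celeborn or Ratis).

# Votes needed for a majority among n Raft peers: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Peers that may fail while a majority is still reachable: n - quorum(n).
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few cluster sizes, including one even size.
for n in 3 4 5 7; do
  echo "masters=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 4 Masters the sketch reports the same failure tolerance as for 3, which is the practical reason to prefer odd sizes.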
Bash
HA Requirement
Minimum: 3 Master nodes. Always use odd numbers (3, 5, 7).
Ratis storage: /var/lib/celeborn/ratis (ensure this path has adequate I/O performance).
Worker fault tolerance: Enable replication (spark.celeborn.client.push.replicate.enabled=true).
Master HA Configuration
Bash
# Enable HA mode
celeborn.master.ha.enabled=true
# Node 1
celeborn.master.ha.node.1.host=master1.example.com
celeborn.master.ha.node.1.port=9097
celeborn.master.ha.node.1.ratis.port=9872
# Node 2
celeborn.master.ha.node.2.host=master2.example.com
celeborn.master.ha.node.2.port=9097
celeborn.master.ha.node.2.ratis.port=9872
# Node 3
celeborn.master.ha.node.3.host=master3.example.com
celeborn.master.ha.node.3.port=9097
celeborn.master.ha.node.3.ratis.port=9872
# Ratis metadata storage
celeborn.master.ha.ratis.raft.server.storage.dir=/var/lib/celeborn/ratis
Ambari Mpack HA Settings
| Property (in celeborn-ha) | Value |
|---|---|
| celeborn.master.ha.enabled | true |
| celeborn.master.ha.node.mapping | master1:1,master2:2,master3:3 |
| celeborn.master.ha.rpc.port | 9097 |
| celeborn.master.ha.ratis.port | 9872 |
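The Mpack settings appear to describe the same topology as the per-node properties in the manual HA configuration. As a rough sketch of that relationship, the hypothetical helper `expand_ha_mapping` below turns a node mapping plus the two ports into `celeborn.master.ha.node.<id>.*` lines. It assumes that mapping entries are comma-separated `host:id` pairs and that the RPC and Ratis ports are shared by every node; how the Mpack actually renders its configuration is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Ambari Mpack.
# Assumptions: entries are "host:id" pairs separated by commas, and the
# RPC and Ratis ports apply to all nodes.
expand_ha_mapping() {
  local mapping=$1 rpc_port=$2 ratis_port=$3
  local entry host id
  local IFS=','                 # split the mapping on commas
  for entry in $mapping; do
    host=${entry%%:*}           # text before the first colon
    id=${entry##*:}             # text after the last colon
    echo "celeborn.master.ha.node.${id}.host=${host}"
    echo "celeborn.master.ha.node.${id}.port=${rpc_port}"
    echo "celeborn.master.ha.node.${id}.ratis.port=${ratis_port}"
  done
}

# Values taken from the table above.
expand_ha_mapping "master1:1,master2:2,master3:3" 9097 9872
```

The output can be compared line by line with a hand-written HA configuration to spot a mismatched node ID or port.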
Worker Fault Tolerance
Enable data replication on the Spark client to ensure shuffle data survives a worker node failure:
Bash
spark.celeborn.client.push.replicate.enabled=true
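One way to confirm the setting is in place is to read it back from the Spark properties file. The sketch below defines a hypothetical helper, `spark_conf_value`, that prints the last value assigned to a key and accepts both `key=value` and `key value` layouts. The file location (`conf/spark-defaults.conf` under `SPARK_HOME`, with the ODP Spark directory as a fallback) is an assumption; adjust it to your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value assigned to a key in a Spark
# properties file. Comment lines are ignored; "key=value" and "key value"
# are both accepted. Prints an empty line if the key is absent.
spark_conf_value() {
  local key=$1 file=$2
  awk -v key="$key" '
    /^[[:space:]]*#/ { next }
    {
      line = $0
      sub(/^[[:space:]]+/, "", line)
      if (index(line, key) != 1) next          # line does not start with key
      rest = substr(line, length(key) + 1)
      if (rest !~ /^[[:space:]=]/) next        # key is only a prefix of a longer name
      sub(/^[[:space:]]*=?[[:space:]]*/, "", rest)
      sub(/[[:space:]]+$/, "", rest)
      value = rest
    }
    END { print value }
  ' "$file"
}

# Assumed location of the Spark defaults file.
conf="${SPARK_HOME:-/usr/odp/3.3.6.4-1/spark3}/conf/spark-defaults.conf"
if [ -f "$conf" ]; then
  if [ "$(spark_conf_value spark.celeborn.client.push.replicate.enabled "$conf")" = "true" ]; then
    echo "replication enabled in $conf"
  else
    echo "replication NOT enabled in $conf"
  fi
fi
```

This only inspects the file; values passed with `--conf` on the command line would not be detected by it.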
Last updated on Apr 23, 2026