Architecture Overview

Component Diagram

Celeborn consists of three primary components: Masters (cluster coordinator), Workers (shuffle data storage), and Clients (embedded in compute engines).

Component Details

Master

  • Role: Cluster coordinator and metadata manager
  • Manages worker registration and health monitoring
  • Allocates shuffle slots to applications
  • Coordinates shuffle lifecycle and resource cleanup
  • Achieves HA via Raft consensus — 3, 5, or 7 nodes recommended
PortProtocolPurpose
9097TCPMaster RPC — Client and Worker communication
9098TCPMaster HTTP — Web UI and REST API
9872TCPRatis — Master-to-Master Raft consensus

Worker

  • Role: Shuffle data storage and serving
  • Receives and buffers shuffle data from mapper executors
  • Replicates data to peer workers for fault tolerance
  • Serves shuffle data chunks to reducer executors
  • Supports local disk, HDFS, S3, and tiered storage backends
PortProtocolPurpose
9094TCPWorker RPC — Client to Worker communication
9096TCPWorker HTTP — Worker Web UI and metrics

Client

The client is embedded in compute engines and consists of two sub-components.

ComponentLocationResponsibility
LifecycleManagerDriver / JobMasteControl plane — manages shuffle metadata and slot allocation
ShuffleClientExecutor / TaskManagerData plane — pushes and fetches shu

Shuffle Lifecycle

Every shuffle operation follows a structured seven-step lifecycle managed between the Client, Master, and Workers:

StepDirectionDescription
  1. RegisterShuffle
Client → MasterRequest slot allocation for the shuffle
  1. ReserveSlots
Client → WorkersReserve resources on selected workers
  1. PushData
Executors → WorkersMapper executors stream shuffle data to workers
  1. CommitFiles
Client → WorkersFlush buffers, finalize and commit partitions
  1. OpenStream
Client → WorkersPrepare workers to serve data for reducers
  1. ChunkFetchRequest
Executors → WorkersReducer executors fetch data chunks by partition
  1. UnregisterShuffle
Client → MasterRelease all resources after job completion
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard
  Last updated