Troubleshooting

Master Not Starting

Common causes:

  • Port conflict — ensure ports 9097, 9098, and 9872 are available
  • Java heap too small — increase CELEBORN_MASTER_MEMORY
  • Ratis storage permission issue — check /var/lib/celeborn/ratis ownership
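
The port and Ratis storage causes above can be checked from a shell on the master host while the master is stopped. The following is a minimal sketch rather than a supported tool: the helper names `port_in_use` and `dir_writable` and the `RATIS_DIR` variable are hypothetical, and the ports and path are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; ports and path come from the list above.

# Succeeds if some process is already listening on the given TCP port.
port_in_use() {
  ss -Hltn "sport = :$1" 2>/dev/null | grep -q .
}

# Succeeds if the directory exists and the current user can write to it.
dir_writable() {
  [ -d "$1" ] && [ -w "$1" ]
}

for port in 9097 9098 9872; do
  if port_in_use "$port"; then
    echo "port $port: already in use"
  else
    echo "port $port: free"
  fi
done

# Run this part as the account that starts the master.
RATIS_DIR="${RATIS_DIR:-/var/lib/celeborn/ratis}"
if dir_writable "$RATIS_DIR"; then
  echo "$RATIS_DIR: writable"
else
  echo "$RATIS_DIR: missing or not writable"
  ls -ld "$RATIS_DIR" 2>/dev/null || true
fi
```

If a port is reported as in use, `ss -ltnp` (run as root) shows which process holds it.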

Worker Not Registering

Common causes:

  • Cannot connect to master — verify network connectivity to port 9097
  • Storage directory permission issues — ensure celeborn user has write access
  • Disk full or unhealthy — check disk health and free space
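
The three causes above map onto three quick checks on the worker host. This is a hedged sketch: `MASTER_HOST` and `STORAGE_DIRS` are placeholders to set for your cluster, the helper names `can_connect` and `pct_from_df` are hypothetical, and the `celeborn` account name is taken from the list above.

```shell
#!/usr/bin/env bash
# Sketch only: MASTER_HOST and STORAGE_DIRS are placeholders to set for your cluster.
MASTER_HOST="${MASTER_HOST:-localhost}"
STORAGE_DIRS="${STORAGE_DIRS:-}"   # space-separated worker storage directories

# Succeeds if a TCP connection to host:port opens within 3 seconds.
can_connect() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Reads `df -P` output on stdin and prints the Capacity column without '%'.
pct_from_df() {
  awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

if can_connect "$MASTER_HOST" 9097; then
  echo "master $MASTER_HOST:9097 reachable"
else
  echo "master $MASTER_HOST:9097 NOT reachable"
fi

for dir in $STORAGE_DIRS; do
  # -n makes sudo fail instead of prompting for a password.
  if sudo -n -u celeborn test -w "$dir"; then
    echo "$dir: writable by celeborn"
  else
    echo "$dir: not writable by celeborn (or sudo unavailable)"
  fi
  echo "$dir: $(df -P "$dir" | pct_from_df)% used"
done
```

For example, `MASTER_HOST=master1 STORAGE_DIRS="/data1 /data2" bash check-worker.sh`, where the host name, directories, and script name are illustrative only.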

Shuffle Failures
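
A reasonable first step is to look at the most recent errors in the worker logs. The sketch below assumes that failures are logged at ERROR level or as Java exceptions, and that worker logs live at the path listed under Log Locations; `LOG_GLOB` and `recent_errors` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: LOG_GLOB defaults to the worker log path listed under Log Locations.
LOG_GLOB="${LOG_GLOB:-/var/log/celeborn/celeborn-worker-*.log}"

# Prints the last N matching lines (ERROR or Exception) from the given files.
recent_errors() {
  local n="$1"; shift
  grep -hE 'ERROR|Exception' "$@" 2>/dev/null | tail -n "$n"
}

# LOG_GLOB is left unquoted on purpose so the shell expands the glob.
recent_errors 20 $LOG_GLOB || true
```

Pointing `LOG_GLOB` at the master log pattern instead gives the same view for the master.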

Diagnostic REST API Commands
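
The master exposes diagnostic information over HTTP. The sketch below rests on several assumptions: that 9098 is the master HTTP port, that `MASTER_HOST` is set to your master host, and that the endpoint paths shown (`/workerInfo`, `/lostWorkers`, `/excludedWorkers`, `/applications`) exist in your Celeborn version. Newer releases may expose them under a different prefix, so check the REST API documentation for your release. `master_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: MASTER_HOST is a placeholder; 9098 is assumed to be the master HTTP port.
MASTER_HOST="${MASTER_HOST:-localhost}"
MASTER_HTTP_PORT="${MASTER_HTTP_PORT:-9098}"

# Builds the URL for a master REST path such as /workerInfo.
master_url() {
  printf 'http://%s:%s%s\n' "$MASTER_HOST" "$MASTER_HTTP_PORT" "$1"
}

# Endpoint paths are assumptions; verify them against your Celeborn version.
for path in /workerInfo /lostWorkers /excludedWorkers /applications; do
  echo "== $path"
  curl -sS --max-time 5 "$(master_url "$path")" || echo "request to $path failed"
done
```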

Log Locations

Component          Log Path
Celeborn Master    /var/log/celeborn/celeborn-master-*.log
Celeborn Worker    /var/log/celeborn/celeborn-worker-*.log
Ambari Server      /var/log/ambari-server/ambari-server.log