Back Up and Restore Kudu Tables

Kudu tables can be backed up and restored using Spark 3. This process requires a Kudu backup JAR file distributed with Kudu.

Prerequisites

Before performing a backup, ensure the following:

  1. ODP version
  2. Backup location – HDFS path where backups will be stored.
  3. Permissions – The Kudu user must have permission to submit Spark jobs.
  4. Tables to back up – You can specify one or more tables in a single job.
  5. Kudu backup JAR – The path to the backup JAR distributed with Kudu.
  6. List of Kudu masters – Required to connect to the cluster.
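
The checklist above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the product: the variable names, the check_inputs function, and every value shown (host names, HDFS path, JAR path, table names) are hypothetical placeholders to replace with your own.

```shell
#!/usr/bin/env bash
# Pre-flight sketch. All values are hypothetical placeholders.
KUDU_MASTERS="master1-host,master2-host,master3-host"
ROOT_PATH="hdfs:///kudu-backups"
KUDU_BACKUP_JAR="/path/to/kudu-backup.jar"
TABLES=(foo bar)

# Return 0 when every required input is set, non-zero otherwise.
check_inputs() {
  local missing=0
  [ -n "$KUDU_MASTERS" ]    || { echo "Kudu master list is empty" >&2; missing=1; }
  [ -n "$ROOT_PATH" ]       || { echo "backup root path is empty" >&2; missing=1; }
  [ -n "$KUDU_BACKUP_JAR" ] || { echo "backup JAR path is empty" >&2; missing=1; }
  [ "${#TABLES[@]}" -gt 0 ] || { echo "no tables listed" >&2; missing=1; }
  return "$missing"
}

if check_inputs; then
  echo "inputs look complete"
fi

# Warn (without exiting) when spark-submit is not on the PATH.
command -v spark-submit >/dev/null 2>&1 \
  || echo "warning: spark-submit not found on PATH" >&2
```

The check only confirms that the inputs are filled in. It does not verify permissions or that the HDFS path exists.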

Back Up Tables

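To back up one or more tables, submit a Spark job that runs the org.apache.kudu.backup.KuduBackup class from the Kudu backup JAR, passing the parameters below followed by the table names.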

Parameters:

  • --kuduMasterAddresses – Comma-separated list of Kudu master addresses.
  • --rootPath – HDFS path where backups will be stored.
  • --failOnFirstError – Stops the backup if any table fails.
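
As an illustration of how these parameters fit together, here is a minimal sketch of a backup invocation. The JAR path, master host names, HDFS path, and table names (foo, bar) are hypothetical placeholders. Passing --failOnFirstError as a bare flag is an assumption, so check the job's usage output on your version. The script only prints the command, so it can be reviewed before anything is submitted.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; substitute the values for your cluster.
KUDU_BACKUP_JAR="/path/to/kudu-backup.jar"
KUDU_MASTERS="master1-host,master2-host,master3-host"
ROOT_PATH="hdfs:///kudu-backups"
TABLES=(foo bar)   # tables to back up

# Build the command as an array so each argument stays a single word.
# --failOnFirstError is assumed to be a bare flag (no value).
backup_cmd=(
  spark-submit
  --class org.apache.kudu.backup.KuduBackup
  "$KUDU_BACKUP_JAR"
  --kuduMasterAddresses "$KUDU_MASTERS"
  --rootPath "$ROOT_PATH"
  --failOnFirstError
  "${TABLES[@]}"
)

# Dry run: print the assembled command for review.
echo "${backup_cmd[*]}"

# To submit the job, run the array itself:
# "${backup_cmd[@]}"
```

Table names go last, after all options, and adding entries to TABLES backs up more tables in the same job.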

Restore Tables

Restoring from a backup takes the same inputs as backing up. The main difference is the job class.

The --class org.apache.kudu.backup.KuduRestore option specifies that the operation restores tables instead of backing them up.

Parameters:

  • --class org.apache.kudu.backup.KuduRestore – Runs the restore job.
  • --kuduMasterAddresses – Comma-separated list of Kudu master addresses.
  • --rootPath – HDFS path where the backups are stored.
  • --failOnFirstError – Stops the restore at the first error encountered.
  • foo bar – Placeholder table names; replace them with the tables you want to restore.
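
A restore invocation built from the parameters above might look like the following sketch. As before, the JAR path, master host names, and HDFS path are hypothetical placeholders, and the bare --failOnFirstError flag is an assumption. The script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; substitute the values for your cluster.
KUDU_BACKUP_JAR="/path/to/kudu-backup.jar"
KUDU_MASTERS="master1-host,master2-host,master3-host"
ROOT_PATH="hdfs:///kudu-backups"   # same root path the backup wrote to
TABLES=(foo bar)                   # tables to restore

# Same shape as a backup command; only the class name changes.
restore_cmd=(
  spark-submit
  --class org.apache.kudu.backup.KuduRestore
  "$KUDU_BACKUP_JAR"
  --kuduMasterAddresses "$KUDU_MASTERS"
  --rootPath "$ROOT_PATH"
  --failOnFirstError
  "${TABLES[@]}"
)

# Dry run: print the assembled command for review.
echo "${restore_cmd[*]}"

# To submit the job, run the array itself:
# "${restore_cmd[@]}"
```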

For more details, see the Apache Kudu documentation.
