Back Up and Restore Kudu Tables

Kudu tables can be backed up and restored using Spark 3. This process requires a Kudu backup JAR file distributed with Kudu.

Prerequisites

Before performing a backup, ensure the following:

  1. ODP version
  2. Backup location – HDFS path where backups will be stored.
  3. Permissions – The Kudu user must have permission to submit Spark jobs.
  4. Tables to back up – You can specify one or more tables in a single run.
  5. Kudu backup JAR – The path to the backup JAR file distributed with Kudu.
  6. List of Kudu masters – Required to connect to the cluster.

Back Up Tables

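To back up one or more tables, submit a Spark job that runs the org.apache.kudu.backup.KuduBackup class from the Kudu backup JAR, passing the parameters below followed by the names of the tables to back up.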

Parameters:

  • --kuduMasterAddresses – Comma-separated list of Kudu master addresses.
  • --rootPath – HDFS path where backups will be stored.
  • --failOnFirstError – Stops the backup if any table fails.
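
The parameters above can be combined into a command along the following lines. This is a minimal sketch, not the exact command for your cluster: the variable names (SPARK_SUBMIT, KUDU_BACKUP_JAR, KUDU_MASTERS, BACKUP_ROOT, DRY_RUN), the host names, the JAR path, and the HDFS path are all hypothetical placeholders, and foo and bar stand in for table names. It also assumes that --failOnFirstError takes an explicit true/false value and that the Spark 3 launcher is called spark-submit; confirm both against your installation.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust every value to match your cluster.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"                    # launcher name may differ for Spark 3
KUDU_BACKUP_JAR="${KUDU_BACKUP_JAR:-/path/to/kudu-backup.jar}"  # placeholder, not the real JAR path
KUDU_MASTERS="${KUDU_MASTERS:-master-1:7051,master-2:7051,master-3:7051}"
BACKUP_ROOT="${BACKUP_ROOT:-hdfs:///kudu-backups}"

run_kudu_backup() {
  # Arguments: the names of the tables to back up.
  set -- "$SPARK_SUBMIT" \
    --class org.apache.kudu.backup.KuduBackup \
    "$KUDU_BACKUP_JAR" \
    --kuduMasterAddresses "$KUDU_MASTERS" \
    --rootPath "$BACKUP_ROOT" \
    --failOnFirstError true \
    "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (default): print the command instead of running it
  else
    "$@"        # submit the Spark job
  fi
}

run_kudu_backup foo bar
```

By default the function only prints the assembled command so it can be reviewed first; setting DRY_RUN=0 submits the job.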

Restore Tables

Restoring from a backup takes largely the same inputs as backing up. The main difference is the class that the Spark job runs.

The --class org.apache.kudu.backup.KuduRestore option specifies that the operation restores tables instead of backing them up.

Parameters:

  • --class org.apache.kudu.backup.KuduRestore – Runs the restore job.
  • --kuduMasterAddresses – Comma-separated list of Kudu master addresses.
  • --rootPath – HDFS path where the backups are stored.
  • --failOnFirstError – Stops the restore on the first error encountered.
  • foo bar – Placeholder table names; replace them with the tables you want to restore.
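
A restore invocation built from the parameters above might look like the following. This is a sketch under the same kind of assumptions as any example here: the variable names, host names, JAR path, and HDFS path are hypothetical placeholders, the launcher is assumed to be spark-submit, and --failOnFirstError is assumed to take an explicit true/false value.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust every value to match your cluster.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"                    # launcher name may differ for Spark 3
KUDU_BACKUP_JAR="${KUDU_BACKUP_JAR:-/path/to/kudu-backup.jar}"  # placeholder, not the real JAR path
KUDU_MASTERS="${KUDU_MASTERS:-master-1:7051,master-2:7051,master-3:7051}"
BACKUP_ROOT="${BACKUP_ROOT:-hdfs:///kudu-backups}"              # must point at existing backups

run_kudu_restore() {
  # Arguments: the names of the tables to restore.
  set -- "$SPARK_SUBMIT" \
    --class org.apache.kudu.backup.KuduRestore \
    "$KUDU_BACKUP_JAR" \
    --kuduMasterAddresses "$KUDU_MASTERS" \
    --rootPath "$BACKUP_ROOT" \
    --failOnFirstError true \
    "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (default): print the command instead of running it
  else
    "$@"        # submit the Spark job
  fi
}

run_kudu_restore foo bar
```

The --rootPath value should be the same location that was used when the backups were taken, since the restore job reads from it.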

For more details, see the Apache Kudu documentation.
