Back Up and Restore Kudu Tables
Kudu tables can be backed up and restored using Spark 3. This process requires a Kudu backup JAR file distributed with Kudu.
Prerequisites
Before performing a backup, make sure you have the following:
- ODP version:
    odp-select --version
- Backup location – HDFS path where backups will be stored.
- Permissions – The Kudu user must have permission to submit Spark jobs.
- Tables to back up – You can specify one or multiple tables at the same time.
- Kudu backup JAR – Located at:
    kudu-backup3_2.12-1.17.0.$(odp-select --version)-all.jar
- List of Kudu masters – Required to connect to the cluster.
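The JAR location can be derived from the ODP version before any job is submitted. The following is a minimal sketch, not an official tool: the helper name kudu_backup_jar is hypothetical, the directory layout is taken from the spark-submit examples on this page, and it assumes that odp-select --version prints a bare version string such as 3.3.6.2-1.

```shell
# Hypothetical helper: print the expected backup JAR path for an ODP version.
# Assumes the /usr/odp/<version>/kudu/jars layout used in the commands on this page.
kudu_backup_jar() {
  printf '/usr/odp/%s/kudu/jars/kudu-backup3_2.12-1.17.0.%s-all.jar\n' "$1" "$1"
}

# Example use on a cluster node (requires odp-select on the PATH):
#   jar="$(kudu_backup_jar "$(odp-select --version)")"
#   [ -f "$jar" ] || echo "Backup JAR not found: $jar" >&2
```

Checking that the file exists up front avoids submitting a Spark job that fails immediately because the JAR path is wrong.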
Back Up Tables
Run the following Spark command to back up one or more tables:
    spark-submit --class org.apache.kudu.backup.KuduBackup \
      /usr/odp/$(odp-select --version)/kudu/jars/kudu-backup3_2.12-1.17.0.3.3.6.2-1-all.jar \
      --kuduMasterAddresses $YOUR_KUDU_MASTERS \
      --rootPath hdfs:///kudu_backups \
      --failOnFirstError true \
      foo bar
Parameters:
- --kuduMasterAddresses – Comma-separated list of Kudu master addresses.
- --rootPath – HDFS path where backups will be stored.
- --failOnFirstError – Stops the backup if any table fails.
- foo bar – Placeholder table names; replace with the tables you want to back up.
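When the table list or master addresses change between runs, it can help to assemble the command from variables and review it before submitting. The sketch below only prints the command; the function name print_backup_cmd and the host names in the usage example are hypothetical, and it uses only the options shown above.

```shell
# Hypothetical helper: print the backup command for review instead of running it.
# Usage: print_backup_cmd JAR MASTERS ROOT_PATH TABLE [TABLE...]
print_backup_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: print_backup_cmd JAR MASTERS ROOT_PATH TABLE [TABLE...]" >&2
    return 1
  fi
  printf 'spark-submit --class org.apache.kudu.backup.KuduBackup %s' "$1"
  printf ' --kuduMasterAddresses %s --rootPath %s' "$2" "$3"
  printf ' --failOnFirstError true'
  shift 3
  # The remaining arguments are the table names, appended in order.
  printf ' %s' "$@"
  printf '\n'
}

# Example (host names are placeholders):
#   print_backup_cmd "$jar" master1:7051,master2:7051 hdfs:///kudu_backups foo bar
```

The printed line is not quoted for re-execution, so table names containing spaces or shell metacharacters would need quoting before you run it.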
Restore Tables
Restoring from a backup takes the same inputs as backing up. The main difference is the --class option: org.apache.kudu.backup.KuduRestore runs a restore job instead of a backup job.
    spark-submit --class org.apache.kudu.backup.KuduRestore \
      /usr/odp/$(odp-select --version)/kudu/jars/kudu-backup3_2.12-1.17.0.3.3.6.2-1-all.jar \
      --kuduMasterAddresses $YOUR_KUDU_MASTERS \
      --rootPath hdfs:///kudu_backups \
      --failOnFirstError true \
      foo bar
Parameters:
- --class org.apache.kudu.backup.KuduRestore → Runs the restore job.
- --kuduMasterAddresses → Comma-separated list of Kudu master addresses.
- --rootPath → HDFS path where the backups are stored.
- --failOnFirstError → Ensures the restore stops on the first encountered error.
- foo bar → Placeholder arguments; replace with your specific restore targets.
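After a restore finishes, a quick sanity check is to confirm that the expected tables exist. The sketch below is one possible check, not part of the backup tooling: missing_tables is a hypothetical helper that reads a table listing on standard input, and the usage line assumes that the kudu CLI is installed and that the restored tables keep their original names.

```shell
# Hypothetical helper: read one table name per line on stdin and print every
# expected table (given as arguments) that is absent from that listing.
# Returns non-zero if at least one expected table is missing.
missing_tables() {
  listing="$(cat)"
  status=0
  for table in "$@"; do
    # -x matches whole lines only; -F treats the name as a literal string.
    if ! printf '%s\n' "$listing" | grep -qxF -- "$table"; then
      echo "$table"
      status=1
    fi
  done
  return "$status"
}

# Example on a cluster node:
#   kudu table list "$YOUR_KUDU_MASTERS" | missing_tables foo bar
```

Matching whole lines keeps a table named foo from being reported as present just because a table named foobar exists.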
For more details, see the Apache Kudu documentation.