Managing Spark Jobs

Use this guide to create, edit, clone, and delete Spark jobs in xDP.

Create a Spark Job

From the side navigation, go to Spark Jobs and click + New Job.
Basic Information — Enter a unique Job Name and optional Description. Click Continue.
Job Type — Select the application type: Spark (Python), Spark (Java), or Notebook (Python). Click Continue.
Configuration — Fill in the runtime parameters for your job type:

Field	Description
Image	Full path to the container image (must include the Spark app and all dependencies).
Python Script	(Spark Python) Path to the `.py` file inside the container, using the `local:///` scheme.
Main Application File	(Spark Java) Path to the JAR file inside the container, using the `local:///` scheme.
Main Class	(Spark Java) Fully qualified class name of the Spark application entry point.
Notebook Path	(Notebook Python) Path to the `.ipynb` file inside the container, using the `local:/` scheme.
Notebook Kernel Name	(Notebook Python) Jupyter kernel to use (e.g., `pyspark`).
Data Store Dependencies	Registered Data Stores the job needs to access (S3, HDFS, Hive Metastore, etc.).
Arguments	Command-line arguments to pass to the application.
Python / Spark Version	Python and Spark versions that match your application.
Image Pull Policy	Kubernetes pull policy: `IfNotPresent`, `Always`, or `Never`.
Batch Scheduler	`Kubernetes Native` or `YuniKorn`.
Image Pull Secrets	Kubernetes secrets for private registries.

Click Show Advanced Settings to configure driver/executor resources, dynamic allocation, environment variables, and plugins (History Server, Gluten).

Scheduling — Choose Run Immediately or Schedule with a cron expression. Click Continue.
Review & Create — Verify all settings and click Create Job.
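
The checks implied by the wizard steps above can be sketched in Python as a pre-flight validation of a job definition. This is a minimal sketch, not xDP's API: the dictionary keys (`job_name`, `python_script`, `schedule`, and so on) are hypothetical names chosen for illustration, and the five-field cron check is an assumption, since this guide does not say which cron dialect xDP accepts.

```python
from urllib.parse import urlparse

# Hypothetical field names for illustration; xDP's actual schema is not shown in this guide.
PATH_FIELD_BY_TYPE = {
    "Spark (Python)": "python_script",
    "Spark (Java)": "main_application_file",
    "Notebook (Python)": "notebook_path",
}

# Extra fields that the Configuration step requires for a given job type.
EXTRA_REQUIRED_BY_TYPE = {
    "Spark (Java)": ["main_class"],
    "Notebook (Python)": ["notebook_kernel_name"],
}


def validate_job(job):
    """Return a list of problems found in a job definition (empty if none)."""
    problems = []
    for field in ("job_name", "image"):
        if not job.get(field):
            problems.append(f"{field} is required")

    job_type = job.get("job_type")
    path_field = PATH_FIELD_BY_TYPE.get(job_type)
    if path_field is None:
        problems.append(f"unknown job type: {job_type!r}")
        return problems

    # Paths point inside the container image, so they use the local: scheme.
    if urlparse(job.get(path_field, "")).scheme != "local":
        problems.append(f"{path_field} must use the local: scheme")

    for field in EXTRA_REQUIRED_BY_TYPE.get(job_type, []):
        if not job.get(field):
            problems.append(f"{field} is required for {job_type}")

    # Assumption: schedule is None for Run Immediately, else a five-field cron string.
    schedule = job.get("schedule")
    if schedule is not None and len(schedule.split()) != 5:
        problems.append("schedule must be a five-field cron expression")
    return problems


example_job = {
    "job_name": "daily-sales-rollup",
    "job_type": "Spark (Python)",
    "image": "registry.example.com/analytics/sales:1.2.3",
    "python_script": "local:///opt/app/rollup.py",
    "schedule": "0 2 * * *",  # every day at 02:00
}
print(validate_job(example_job))  # prints []
```

Running a check like this before opening the wizard can catch a missing `local:` prefix or an incomplete cron expression early. The wizard remains the authority on what is accepted.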

Edit a Spark Job

Note: Job Name and Job Type cannot be changed when editing. To change either, clone the job instead.

Navigate to Spark Jobs, click the job name to open its details, then click Edit.
The wizard opens pre-filled. Update any editable field — description, image, script/JAR path, Data Store Dependencies, schedule, or advanced settings.
Step through to Review & Update and click Update Job.
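
The edit rule (Job Name and Job Type are fixed, everything else is editable) can be sketched as follows. This is an illustrative model only: the dictionary keys and the `apply_edit` helper are hypothetical and do not correspond to a documented xDP interface.

```python
# Per the note in this section, these two fields are fixed once a job exists.
# The dictionary keys are hypothetical names chosen for this sketch.
IMMUTABLE_FIELDS = ("job_name", "job_type")


def apply_edit(job, changes):
    """Return an updated copy of job, refusing changes to immutable fields."""
    blocked = [
        field for field in IMMUTABLE_FIELDS
        if field in changes and changes[field] != job.get(field)
    ]
    if blocked:
        raise ValueError(
            "cannot change " + ", ".join(blocked) + " when editing; clone the job instead"
        )
    updated = dict(job)
    updated.update(changes)
    return updated


job = {
    "job_name": "daily-sales-rollup",
    "job_type": "Spark (Python)",
    "image": "registry.example.com/analytics/sales:1.2.3",
}
edited = apply_edit(job, {"image": "registry.example.com/analytics/sales:1.2.4"})
print(edited["image"])  # prints registry.example.com/analytics/sales:1.2.4
```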

Clone a Spark Job

Cloning creates a full copy of an existing job with all fields pre-filled and editable — including Job Name and Job Type. Use it to:

Create staging and production variants of a job.
Preserve a job's configuration before making significant changes.
Quickly bootstrap a new job that shares most settings with an existing one.

In the Spark Jobs list, click the ⋯ (Actions) menu for the job and select Clone.
The creation wizard opens with all fields pre-filled. The Job Name is set to <original-name> (Clone) — update it to a unique name.
Modify any settings, then click Create Job on the Review page.

Capability	Clone	Edit
Change Job Name	Yes	No
Change Job Type	Yes	No
Creates a new job	Yes	No
Affects original job	No	Yes
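
To make the comparison concrete, here is a minimal Python sketch of clone semantics: the clone starts as a full copy with the name `<original-name> (Clone)`, and later changes to it leave the original untouched. The dictionary keys and the `clone_job` helper are hypothetical, used only to model the behavior described above.

```python
import copy


def clone_job(job):
    """Return an independent copy of job, named '<original-name> (Clone)'."""
    clone = copy.deepcopy(job)  # deep copy so nested lists are not shared
    clone["job_name"] = f"{job['job_name']} (Clone)"
    return clone


original = {
    "job_name": "daily-sales-rollup",
    "job_type": "Spark (Python)",
    "arguments": ["--env", "prod"],
}
staging = clone_job(original)
print(staging["job_name"])  # prints daily-sales-rollup (Clone)

# Rename the clone and change its arguments; the original is unaffected.
staging["job_name"] = "daily-sales-rollup-staging"
staging["arguments"][1] = "staging"
print(original["arguments"])  # prints ['--env', 'prod']
```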

Delete a Spark Job

Warning: Deletion is permanent. Any workflows referencing the deleted job will lose that dependency. Clone the job first if you may need its configuration again.

In the Spark Jobs list, click the ⋯ (Actions) menu for the job and select Delete.
Confirm in the dialog. The job definition, schedule, and Data Store dependency links are permanently removed. Run history retention depends on your platform's data retention policy.

Configuration Reference

Parameter	Description	Default	Required
Job Name	Unique name for the job.	—	Yes
Description	Brief summary of the job's purpose.	—	No
Image	Full path to the container image.	—	Yes
Python Script	Path to `.py` script inside container (`local:///` scheme).	—	Spark Python only
Main Application File	Path to JAR inside container (`local:///` scheme).	—	Spark Java only
Main Class	Fully qualified class name.	—	Spark Java only
Notebook Path	Path to `.ipynb` inside container (`local:/` scheme).	—	Notebook only
Notebook Kernel Name	Jupyter kernel (e.g., `pyspark`).	—	Notebook only
Data Store Dependencies	Registered Data Stores for data access.	—	No
Arguments	Command-line arguments, one per line.	—	No
Python Version	Python version for the application.	`Python 3.9`	Yes
Spark Version	Spark version for the job.	`3.3.3`	Yes
Image Pull Policy	Kubernetes image pull policy.	`IfNotPresent`	Yes
Batch Scheduler	Scheduler for batch workloads.	`YuniKorn`	Yes
Image Pull Secrets	Kubernetes secrets for private registries.	—	No
Driver Cores	CPU cores for the Spark driver.	`1`	Yes
Driver Memory	Memory for the Spark driver.	`1g`	Yes
Memory Overhead	Additional off-heap memory for the driver.	`204m`	Yes
Environment Variables	Key-value pairs injected as env vars.	—	No
Enable History Server	Send Spark event logs to History Server.	Checked	No
Enable Gluten	Enable Gluten plugin for accelerated Spark SQL.	Unchecked	No
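
The defaults in the table above can be modeled as a base dictionary that a job definition overrides. This is a sketch under assumptions: the key names and the `with_defaults` helper are hypothetical, and the values are copied from the Default column rather than from any xDP API.

```python
# Defaults copied from the Configuration Reference table.
# Key names are hypothetical; only the values come from this guide.
DEFAULTS = {
    "python_version": "Python 3.9",
    "spark_version": "3.3.3",
    "image_pull_policy": "IfNotPresent",  # Kubernetes spelling of the pull policy
    "batch_scheduler": "YuniKorn",
    "driver_cores": 1,
    "driver_memory": "1g",
    "memory_overhead": "204m",
    "enable_history_server": True,   # "Checked" in the table
    "enable_gluten": False,          # "Unchecked" in the table
}


def with_defaults(job):
    """Return a new dict in which values from job override the defaults."""
    return {**DEFAULTS, **job}


job = with_defaults({
    "job_name": "daily-sales-rollup",
    "image": "registry.example.com/analytics/sales:1.2.3",
    "driver_memory": "4g",  # override one default
})
print(job["driver_memory"], job["spark_version"])  # prints 4g 3.3.3
```

Only the fields you set explicitly differ from the table; everything else keeps its default.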

Best Practices

Use specific image tags (e.g., `my-app:1.2.3`) instead of `latest` for repeatable deployments.
Parameterize jobs via Arguments or Environment Variables — avoid hardcoding paths, table names, or connection details in Spark code.
Declare all Data Store Dependencies explicitly — xDP uses these for credential management and lineage tracking.
Clone production jobs before testing configuration changes, so the original stays untouched.
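
Two of these practices can be sketched in Python. The first function reads job parameters from Arguments and Environment Variables instead of hardcoding them; the second checks whether an image reference avoids the `latest` tag. The argument names (`--input-path`, `--output-table`), the `LOG_LEVEL` variable, and both helper functions are hypothetical examples, not names that xDP defines.

```python
import argparse


def parse_job_params(argv, environ):
    """Read job parameters from command-line arguments and environment variables."""
    parser = argparse.ArgumentParser(description="Parameterized Spark job (sketch)")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-table", required=True)
    args = parser.parse_args(argv)
    return {
        "input_path": args.input_path,
        "output_table": args.output_table,
        "log_level": environ.get("LOG_LEVEL", "INFO"),
    }


def uses_pinned_tag(image):
    """Return True if the image reference names a tag other than latest, or a digest."""
    if "@" in image:  # digest reference such as name@sha256:...
        return True
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port prefix
    if ":" not in last_segment:
        return False  # no tag given, which resolves to latest
    return last_segment.rsplit(":", 1)[1] != "latest"


params = parse_job_params(
    ["--input-path", "s3a://example-bucket/sales/", "--output-table", "analytics.daily_sales"],
    {"LOG_LEVEL": "DEBUG"},
)
print(params["output_table"])  # prints analytics.daily_sales
print(uses_pinned_tag("my-app:1.2.3"), uses_pinned_tag("my-app:latest"))  # prints True False
```

In a real job script you would pass `sys.argv[1:]` and `os.environ` to `parse_job_params`; explicit inputs are used here so the sketch runs on its own.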
