Managing Spark Jobs

Use this guide to create, edit, clone, and delete Spark jobs in xDP.

Create a Spark Job

  1. From the side navigation, go to Spark Jobs and click + New Job.
  2. Basic Information — Enter a unique Job Name and optional Description. Click Continue.
  3. Job Type — Select the application type: Spark (Python), Spark (Java), or Notebook (Python). Click Continue.
  4. Configuration — Fill in the runtime parameters for your job type:

| Field | Description |
|---|---|
| Image | Full path to the container image (must include the Spark app and all dependencies). |
| Python Script | (Spark Python) Path to .py file inside the container using local:/// scheme. |
| Main Application File | (Spark Java) Path to the JAR file inside the container using local:/// scheme. |
| Main Class | (Spark Java) Fully qualified class name of the Spark application entry point. |
| Notebook Path | (Notebook Python) Path to .ipynb file inside the container using local:/ scheme. |
| Notebook Kernel Name | (Notebook Python) Jupyter kernel to use (e.g., pyspark). |
| Data Store Dependencies | Registered Data Stores the job needs to access (S3, HDFS, Hive Metastore, etc.). |
| Arguments | Command-line arguments to pass to the application. |
| Python / Spark Version | Versions matching your application. |
| Image Pull Policy | Kubernetes pull policy: IfNotPresent, Always, or Never. |
| Batch Scheduler | Kubernetes Native or YuniKorn. |
| Image Pull Secrets | Kubernetes secrets for private registries. |

Click Show Advanced Settings to configure driver/executor resources, dynamic allocation, environment variables, and plugins (History Server, Gluten).

  5. Scheduling: Choose Run Immediately or Schedule with a cron expression. Click Continue.
  6. Review & Create: Verify all settings and click Create Job.
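
The Scheduling step accepts a cron expression, but this guide does not say which cron dialect xDP expects. The sketch below assumes the common five-field format (minute, hour, day of month, month, day of week) and shows a basic sanity check you could run before pasting an expression into the wizard. The helper name check_cron is hypothetical and is not part of xDP.

```python
# Hypothetical helper: basic sanity check for a five-field cron expression.
# Assumption: the scheduler uses the standard fields (minute, hour,
# day of month, month, day of week); confirm this for your xDP deployment.

FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 0 and 7 both commonly mean Sunday
]


def _check_part(part, low, high):
    """Validate one comma-separated piece such as '5', '1-5', '*', or '*/15'."""
    base, sep, step = part.partition("/")
    if sep and (not step.isdigit() or int(step) == 0):
        return False
    if base == "*":
        return True
    bounds = base.split("-")
    if len(bounds) > 2 or not all(b.isdigit() for b in bounds):
        return False
    values = [int(b) for b in bounds]
    return all(low <= v <= high for v in values) and values == sorted(values)


def check_cron(expression):
    """Return a list of problems; an empty list means the expression looks sane."""
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        return [f"expected 5 fields, got {len(fields)}"]
    problems = []
    for field, (name, low, high) in zip(fields, FIELD_RANGES):
        if not all(_check_part(p, low, high) for p in field.split(",")):
            problems.append(f"invalid {name} field: {field!r}")
    return problems


if __name__ == "__main__":
    print(check_cron("0 2 * * *"))        # daily at 02:00
    print(check_cron("*/15 * * * 1-5"))   # every 15 minutes, Monday to Friday
    print(check_cron("61 * * *"))         # too few fields
```

This only checks syntax and value ranges. It does not cover named months or weekdays, and it says nothing about time zones, which depend on how the platform evaluates schedules.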

Edit a Spark Job

Note: Job Name and Job Type cannot be changed when editing. To change either, clone the job instead.

  1. Navigate to Spark Jobs, click the job name to open its details, then click Edit.
  2. The wizard opens pre-filled. Update any editable field — description, image, script/JAR path, Data Store Dependencies, schedule, or advanced settings.
  3. Step through to Review & Update and click Update Job.
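
The rule in the note above (Job Name and Job Type are fixed once a job exists) can be pictured with a small model. This is only an illustration using a plain dictionary: the field names job_name and job_type and the function apply_edit are hypothetical and do not describe xDP's actual API or schema.

```python
# Hypothetical model of the edit rule: Job Name and Job Type are immutable.
# Field names are illustrative only, not xDP's real schema.

IMMUTABLE_FIELDS = ("job_name", "job_type")


def apply_edit(job, changes):
    """Return an updated copy of `job`, rejecting changes to immutable fields."""
    blocked = [
        field
        for field in IMMUTABLE_FIELDS
        if field in changes and changes[field] != job.get(field)
    ]
    if blocked:
        raise ValueError(
            f"cannot change {', '.join(blocked)} when editing; clone the job instead"
        )
    # Editable fields (description, image, schedule, ...) are simply overwritten.
    return {**job, **changes}


if __name__ == "__main__":
    job = {"job_name": "daily-load", "job_type": "spark-python", "image": "etl:1.2.3"}
    print(apply_edit(job, {"image": "etl:1.2.4"}))
    try:
        apply_edit(job, {"job_type": "spark-java"})
    except ValueError as err:
        print(err)
```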

Clone a Spark Job

Cloning creates a full copy of an existing job with all fields pre-filled and editable — including Job Name and Job Type. Use it to:

  • Create a staging vs. production variant of a job.
  • Preserve a job's configuration before making significant changes.
  • Quickly bootstrap a new job that shares most settings with an existing one.

  1. In the Spark Jobs list, click the (Actions) menu for the job and select Clone.
  2. The creation wizard opens with all fields pre-filled. The Job Name is set to <original-name> (Clone); update it to a unique name.
  3. Modify any settings, then click Create Job on the Review page.

| Capability | Clone | Edit |
|---|---|---|
| Change Job Name | Yes | No |
| Change Job Type | Yes | No |
| Creates a new job | Yes | No |
| Affects original job | No | Yes |
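
To make the Clone column concrete, here is a minimal sketch that models a job as a dictionary and a clone as an independent deep copy. The field names and the clone_job function are hypothetical; the only behavior taken from this guide is the default name pattern "<original-name> (Clone)" and the fact that the original job is not affected.

```python
import copy

# Hypothetical model: a clone is a separate copy, so later changes to the
# clone never touch the original job. Field names are illustrative only.


def clone_job(job, new_name=None):
    """Return a deep copy of `job` with a new name."""
    clone = copy.deepcopy(job)
    # Mirrors the wizard's default of "<original-name> (Clone)".
    clone["job_name"] = new_name or f"{job['job_name']} (Clone)"
    return clone


if __name__ == "__main__":
    prod = {
        "job_name": "daily-load",
        "job_type": "spark-python",
        "arguments": ["--env", "prod"],
    }
    staging = clone_job(prod, "daily-load-staging")
    staging["arguments"][1] = "staging"
    print(prod["arguments"])     # the original list is untouched
    print(staging["arguments"])
```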

Delete a Spark Job

Warning: Deletion is permanent. Any workflows referencing the deleted job will lose that dependency. Clone the job first if you may need its configuration again.

  1. In the Spark Jobs list, click the (Actions) menu for the job and select Delete.
  2. Confirm in the dialog. The job definition, schedule, and Data Store dependency links are permanently removed. Run history retention depends on your platform's data retention policy.

Configuration Reference

| Parameter | Description | Default | Required |
|---|---|---|---|
| Job Name | Unique name for the job. | | Yes |
| Description | Brief summary of the job's purpose. | | No |
| Image | Full path to the container image. | | Yes |
| Python Script | Path to .py script inside container (local:/// scheme). | | Spark Python only |
| Main Application File | Path to JAR inside container (local:/// scheme). | | Spark Java only |
| Main Class | Fully qualified class name. | | Spark Java only |
| Notebook Path | Path to .ipynb inside container (local:/ scheme). | | Notebook only |
| Notebook Kernel Name | Jupyter kernel (e.g., pyspark). | | Notebook only |
| Data Store Dependencies | Registered Data Stores for data access. | | No |
| Arguments | Command-line arguments, one per line. | | No |
| Python Version | Python version for the application. | Python 3.9 | Yes |
| Spark Version | Spark version for the job. | 3.3.3 | Yes |
| Image Pull Policy | Kubernetes image pull policy. | If Not Present | Yes |
| Batch Scheduler | Scheduler for batch workloads. | YuniKorn | Yes |
| Image Pull Secrets | Kubernetes secrets for private registries. | | No |
| Driver Cores | CPU cores for the Spark driver. | 1 | Yes |
| Driver Memory | Memory for the Spark driver. | 1g | Yes |
| Memory Overhead | Additional off-heap memory for the driver. | 204m | Yes |
| Environment Variables | Key-value pairs injected as env vars. | | No |
| Enable History Server | Send Spark event logs to History Server. | Checked | No |
| Enable Gluten | Enable Gluten plugin for accelerated Spark SQL. | Unchecked | No |
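
The reference table can also be read as data: a set of defaults, a set of fields that are always required, and a set that depends on the job type. The sketch below encodes that reading so a configuration can be checked before it is entered. The key names (python_script, spark-python, and so on) and the function resolve_settings are hypothetical and chosen for illustration; only the default values, the required/optional split, and the path schemes come from the table.

```python
# Hypothetical encoding of the reference table; key names are illustrative.
DEFAULTS = {
    "python_version": "3.9",
    "spark_version": "3.3.3",
    "image_pull_policy": "IfNotPresent",  # shown as "If Not Present" in the UI
    "batch_scheduler": "YuniKorn",
    "driver_cores": 1,
    "driver_memory": "1g",
    "memory_overhead": "204m",
    "enable_history_server": True,
    "enable_gluten": False,
}

REQUIRED_ALWAYS = ("job_name", "image")
REQUIRED_BY_TYPE = {
    "spark-python": ("python_script",),
    "spark-java": ("main_application_file", "main_class"),
    "notebook-python": ("notebook_path", "notebook_kernel_name"),
}

# Path schemes as written in the table above.
PATH_PREFIX = {
    "python_script": "local:///",
    "main_application_file": "local:///",
    "notebook_path": "local:/",
}


def resolve_settings(job_type, settings):
    """Apply defaults, then report missing or malformed required fields."""
    if job_type not in REQUIRED_BY_TYPE:
        raise ValueError(f"unknown job type: {job_type!r}")
    merged = {**DEFAULTS, **settings}
    problems = []
    for field in REQUIRED_ALWAYS + REQUIRED_BY_TYPE[job_type]:
        if not merged.get(field):
            problems.append(f"missing required field: {field}")
    for field, prefix in PATH_PREFIX.items():
        value = merged.get(field)
        if value and not value.startswith(prefix):
            problems.append(f"{field} should start with {prefix}")
    return merged, problems


if __name__ == "__main__":
    merged, problems = resolve_settings(
        "spark-java",
        {
            "job_name": "nightly-agg",
            "image": "registry.example.com/agg:2.0.1",
            "main_application_file": "/opt/app/agg.jar",
        },
    )
    print(problems)
```

In this example the JAR path lacks the local:/// scheme and Main Class is missing, so both are reported, while unspecified settings such as Spark Version fall back to the table's defaults.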

Best Practices

  • Use specific image tags (e.g., my-app:1.2.3) instead of latest for repeatable deployments.
  • Parameterize jobs via Arguments or Environment Variables — avoid hardcoding paths, table names, or connection details in Spark code.
  • Declare all Data Store Dependencies explicitly — xDP uses these for credential management and lineage tracking.
  • Clone before experimenting — create a clone of production jobs before testing configuration changes.
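
As an illustration of the parameterization advice above, here is a minimal sketch of a Spark Python job that takes its input path and output table from the Arguments field and its write mode from an environment variable. The flag names (--input-path, --output-table), the WRITE_MODE variable, and the function names are assumptions made for this example; nothing here is prescribed by xDP.

```python
import argparse

# Hypothetical job script: nothing environment-specific is hardcoded.
# Values arrive via the job's Arguments and Environment Variables settings.


def load_job_params(argv, env):
    """Read job parameters from command-line arguments and environment."""
    parser = argparse.ArgumentParser(description="Example parameterized Spark job")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-table", required=True)
    args = parser.parse_args(argv)
    return {
        "input_path": args.input_path,
        "output_table": args.output_table,
        # Assumed variable name; defaults to "append" when unset.
        "write_mode": env.get("WRITE_MODE", "append"),
    }


def run(spark, params):
    """Copy a Parquet dataset into a table using the supplied parameters."""
    df = spark.read.parquet(params["input_path"])
    df.write.mode(params["write_mode"]).saveAsTable(params["output_table"])


# In the script's entry point (PySpark comes from the container image):
#   import os, sys
#   from pyspark.sql import SparkSession
#   params = load_job_params(sys.argv[1:], os.environ)
#   spark = SparkSession.builder.appName("parameterized-job").getOrCreate()
#   run(spark, params)
#   spark.stop()
```

With this layout, a staging clone and a production job can share one image and differ only in their Arguments and Environment Variables. Because Arguments are entered one per line, each flag and each value would go on its own line.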