xDP Documentation
Managing Spark Jobs
Use this guide to create, edit, clone, and delete Spark jobs in xDP.
Create a Spark Job
- From the side navigation, go to Spark Jobs and click + New Job.
- Basic Information — Enter a unique Job Name and optional Description. Click Continue.
- Job Type — Select the application type: Spark (Python), Spark (Java), or Notebook (Python). Click Continue.
- Configuration — Fill in the runtime parameters for your job type:
| Field | Description |
|---|---|
| Image | Full path to the container image (must include the Spark app and all dependencies). |
| Python Script | (Spark Python) Path to .py file inside the container using local:/// scheme. |
| Main Application File | (Spark Java) Path to the JAR file inside the container using local:/// scheme. |
| Main Class | (Spark Java) Fully qualified class name of the Spark application entry point. |
| Notebook Path | (Notebook Python) Path to .ipynb file inside the container using local:/// scheme. |
| Notebook Kernel Name | (Notebook Python) Jupyter kernel to use (e.g., pyspark). |
| Data Store Dependencies | Registered Data Stores the job needs to access (S3, HDFS, Hive Metastore, etc.). |
| Arguments | Command-line arguments to pass to the application. |
| Python / Spark Version | Versions matching your application. |
| Image Pull Policy | Kubernetes pull policy: IfNotPresent, Always, or Never. |
| Batch Scheduler | Kubernetes Native or YuniKorn. |
| Image Pull Secrets | Kubernetes secrets for private registries. |
Click Show Advanced Settings to configure driver/executor resources, dynamic allocation, environment variables, and plugins (History Server, Gluten).
- Scheduling — Choose Run Immediately or Schedule with a cron expression. Click Continue.
- Review & Create — Verify all settings and click Create Job.
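The Schedule option in the steps above takes a cron expression. As a rough illustration, the sketch below labels the fields of a standard five-field expression. It assumes xDP accepts the common five-field syntax; the source does not say which cron dialect is used, and some schedulers expect an extra seconds field. The helper name `describe_cron` is hypothetical.

```python
# Hypothetical helper: labels the parts of a standard five-field cron expression.
# Assumption: xDP uses five-field cron syntax; check the platform before relying on this.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a cron expression and pair each part with its field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" reads as: minute 0, hour 2, every day, every month, any weekday.
nightly = describe_cron("0 2 * * *")
print(nightly)
```

Under that assumption, `0 2 * * *` would run the job once a day at 02:00.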
Edit a Spark Job
Note: Job Name and Job Type cannot be changed when editing. To change either, clone the job instead.
- Navigate to Spark Jobs, click the job name to open its details, then click Edit.
- The wizard opens pre-filled. Update any editable field — description, image, script/JAR path, Data Store Dependencies, schedule, or advanced settings.
- Step through to Review & Update and click Update Job.
Clone a Spark Job
Cloning creates a full copy of an existing job with all fields pre-filled and editable — including Job Name and Job Type. Use it to:
- Create a staging vs. production variant of a job.
- Preserve a job's configuration before making significant changes.
- Quickly bootstrap a new job that shares most settings with an existing one.
- In the Spark Jobs list, click the ⋯ (Actions) menu for the job and select Clone.
- The creation wizard opens with all fields pre-filled. The Job Name is set to <original-name> (Clone); update it to a unique name.
- Modify any settings, then click Create Job on the Review page.
| Capability | Clone | Edit |
|---|---|---|
| Change Job Name | Yes | No |
| Change Job Type | Yes | No |
| Creates a new job | Yes | No |
| Affects original job | No | Yes |
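The difference between the two columns can be pictured as copy versus in-place update. The sketch below models a job as a plain dictionary purely for illustration; the dictionary keys and the `clone_job` helper are hypothetical and are not an xDP API.

```python
import copy


def clone_job(job: dict) -> dict:
    """Return an independent copy whose name follows the '<original-name> (Clone)' pattern."""
    cloned = copy.deepcopy(job)  # deep copy so nested values are not shared
    cloned["job_name"] = f"{job['job_name']} (Clone)"
    return cloned


# Hypothetical job definition used only for this illustration.
original = {
    "job_name": "daily-etl",
    "job_type": "Spark (Python)",
    "arguments": ["--date", "2024-01-01"],
}

staging = clone_job(original)
staging["job_name"] = "daily-etl-staging"   # a clone may take any unique name
staging["arguments"].append("--dry-run")    # changes stay on the clone

print(original["job_name"], original["arguments"])
```

Editing, by contrast, corresponds to changing `original` itself, which is why Job Name and Job Type stay fixed in that path.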
Delete a Spark Job
Warning: Deletion is permanent. Any workflows referencing the deleted job will lose that dependency. Clone the job first if you may need its configuration again.
- In the Spark Jobs list, click the ⋯ (Actions) menu for the job and select Delete.
- Confirm in the dialog. The job definition, schedule, and Data Store dependency links are permanently removed. Run history retention depends on your platform's data retention policy.
Configuration Reference
| Parameter | Description | Default | Required |
|---|---|---|---|
| Job Name | Unique name for the job. | — | Yes |
| Description | Brief summary of the job's purpose. | — | No |
| Image | Full path to the container image. | — | Yes |
| Python Script | Path to .py script inside container (local:/// scheme). | — | Spark Python only |
| Main Application File | Path to JAR inside container (local:/// scheme). | — | Spark Java only |
| Main Class | Fully qualified class name. | — | Spark Java only |
| Notebook Path | Path to .ipynb inside container (local:/// scheme). | — | Notebook only |
| Notebook Kernel Name | Jupyter kernel (e.g., pyspark). | — | Notebook only |
| Data Store Dependencies | Registered Data Stores for data access. | — | No |
| Arguments | Command-line arguments, one per line. | — | No |
| Python Version | Python version for the application. | Python 3.9 | Yes |
| Spark Version | Spark version for the job. | 3.3.3 | Yes |
| Image Pull Policy | Kubernetes image pull policy. | If Not Present | Yes |
| Batch Scheduler | Scheduler for batch workloads. | YuniKorn | Yes |
| Image Pull Secrets | Kubernetes secrets for private registries. | — | No |
| Driver Cores | CPU cores for the Spark driver. | 1 | Yes |
| Driver Memory | Memory for the Spark driver. | 1g | Yes |
| Memory Overhead | Additional off-heap memory for the driver. | 204m | Yes |
| Environment Variables | Key-value pairs injected as env vars. | — | No |
| Enable History Server | Send Spark event logs to History Server. | Checked | No |
| Enable Gluten | Enable Gluten plugin for accelerated Spark SQL. | Unchecked | No |
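To make the Required column easier to reason about, here is a minimal sketch that models a subset of the table as a Python dataclass and checks the per-job-type requirements before a job is entered in the wizard. The class, the field names, and the job-type labels are hypothetical and only mirror the table; xDP itself is configured through the UI described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SparkJobConfig:
    """Hypothetical model of the reference table; defaults copied from its Default column."""
    job_name: str
    job_type: str  # assumed labels: "spark-python", "spark-java", "notebook-python"
    image: str
    python_script: Optional[str] = None
    main_application_file: Optional[str] = None
    main_class: Optional[str] = None
    notebook_path: Optional[str] = None
    notebook_kernel_name: Optional[str] = None
    python_version: str = "3.9"
    spark_version: str = "3.3.3"
    batch_scheduler: str = "YuniKorn"
    driver_cores: int = 1
    driver_memory: str = "1g"
    memory_overhead: str = "204m"


# Fields the table marks as required for one job type only.
REQUIRED_BY_TYPE = {
    "spark-python": ["python_script"],
    "spark-java": ["main_application_file", "main_class"],
    "notebook-python": ["notebook_path", "notebook_kernel_name"],
}
# Fields that hold an in-container path, e.g. local:///opt/app/job.py
PATH_FIELDS = ("python_script", "main_application_file", "notebook_path")


def validate(cfg: SparkJobConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    if not cfg.job_name:
        errors.append("job_name is required")
    if not cfg.image:
        errors.append("image is required")
    if cfg.job_type not in REQUIRED_BY_TYPE:
        errors.append(f"unknown job_type: {cfg.job_type}")
        return errors
    for name in REQUIRED_BY_TYPE[cfg.job_type]:
        if not getattr(cfg, name):
            errors.append(f"{name} is required for {cfg.job_type}")
    for name in PATH_FIELDS:
        value = getattr(cfg, name)
        if value and not value.startswith("local:/"):
            errors.append(f"{name} should use the local scheme")
    return errors


config = SparkJobConfig(
    job_name="daily-etl",
    job_type="spark-python",
    image="registry.example.com/daily-etl:1.2.3",  # placeholder image path
    python_script="local:///opt/app/daily_etl.py",  # placeholder script path
)
print(validate(config))
```

A check like this could run in CI next to the image build so that a missing Main Class or a path without the local scheme is caught before the job is created.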
Best Practices
- Use specific image tags (e.g., my-app:1.2.3) instead of latest for repeatable deployments.
- Parameterize jobs via Arguments or Environment Variables; avoid hardcoding paths, table names, or connection details in Spark code.
- Declare all Data Store Dependencies explicitly — xDP uses these for credential management and lineage tracking.
- Clone before experimenting — create a clone of production jobs before testing configuration changes.
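The first two practices can be sketched in a few lines of Python. The example below reads job inputs from command-line arguments and an environment variable rather than hardcoding them, and includes a small check for pinned image tags. The flag names `--input-path` and `--output-table`, the variable `RUN_ENV`, and both helper functions are hypothetical; the sketch also assumes each line of the Arguments field reaches the script as one command-line entry.

```python
import argparse


def resolve_settings(argv, env):
    """Build job settings from arguments and environment instead of hardcoded values."""
    parser = argparse.ArgumentParser(description="Example parameterized Spark job")
    parser.add_argument("--input-path", required=True, help="where to read source data")
    parser.add_argument("--output-table", required=True, help="table to write results to")
    args = parser.parse_args(argv)
    return {
        "input_path": args.input_path,
        "output_table": args.output_table,
        # Falls back to "dev" when the (hypothetical) RUN_ENV variable is not set.
        "run_env": env.get("RUN_ENV", "dev"),
    }


def is_pinned(image: str) -> bool:
    """Return True when the image reference carries a tag other than latest, or a digest."""
    # Inspect only the last path segment so a registry port (host:5000/app) is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if "@" in last_segment:  # digest reference such as app@sha256:...
        return True
    _, separator, tag = last_segment.partition(":")
    return bool(separator) and tag not in ("", "latest")


settings = resolve_settings(
    ["--input-path", "s3a://example-bucket/raw", "--output-table", "sales.daily"],
    {"RUN_ENV": "staging"},
)
print(settings)
print(is_pinned("my-app:1.2.3"), is_pinned("my-app:latest"))
```

In a real job script, `resolve_settings(sys.argv[1:], os.environ)` would be called at start-up and the resulting values passed to the Spark read and write calls, so the same image can serve staging and production clones with different Arguments.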
Next to read: Spark Job Details
©2026, Acceldata Inc — All Rights Reserved.