Spark Jobs

Spark Jobs in xDP provides a managed environment for creating, running, and monitoring Apache Spark applications. It handles cluster submission and resource management so your team can focus on building data pipelines.

Key Concepts

Concept	Description
Job Definition	A reusable template for a Spark application that specifies the container image, the executable script or JAR, resource requirements, and Spark configuration.
Job Run	A single execution of a Job Definition. Each run has its own lifecycle, logs, and metrics.
Execution Configuration	The full set of parameters applied to a run: runtime arguments, driver/executor resources, and Spark property overrides.
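
One way to picture how the three concepts relate is the minimal Python sketch below. It is illustrative only: the class names, field names, and the `resolve` helper are hypothetical and are not xDP's schema or API. The sketch also assumes that per-run overrides are layered on top of defaults stored in the Job Definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JobDefinition:
    """Reusable template (hypothetical fields, not xDP's actual schema)."""
    name: str
    image: str
    entry_point: str  # script or JAR path inside the image
    default_resources: Dict[str, str] = field(default_factory=dict)
    default_spark_conf: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class ExecutionConfiguration:
    """Full set of parameters applied to one run."""
    arguments: List[str]
    resources: Dict[str, str]
    spark_conf: Dict[str, str]


def resolve(
    definition: JobDefinition,
    arguments: Optional[List[str]] = None,
    resources: Optional[Dict[str, str]] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> ExecutionConfiguration:
    """Merge per-run overrides over the definition's defaults (assumed behavior)."""
    return ExecutionConfiguration(
        arguments=list(arguments or []),
        resources={**definition.default_resources, **(resources or {})},
        spark_conf={**definition.default_spark_conf, **(spark_conf or {})},
    )


# Placeholder values for illustration.
definition = JobDefinition(
    name="orders_warehouse_daily_load_job",
    image="my-app:1.2.3",
    entry_point="local:///opt/app/main.py",
    default_resources={"driver_memory": "2g", "executor_memory": "4g"},
    default_spark_conf={"spark.sql.shuffle.partitions": "200"},
)

# One run: same definition, different arguments and a larger executor.
run_config = resolve(
    definition,
    arguments=["--date", "2024-01-01"],
    resources={"executor_memory": "8g"},
)
```

In this model, a Job Run would pair one `JobDefinition` with one resolved `ExecutionConfiguration`, which is why two runs of the same definition can differ in arguments and resources.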

Jobs Listing

From the side navigation, go to Spark > Spark Jobs to see all jobs across clusters.

Filter by Job Type, Plugin Type, Date Range, and Cluster. Use the ⋯ Actions menu on any row to Edit, Clone, Run Now, or Delete a job.

Capabilities

Create and manage Spark Python, Java, and Notebook jobs from a single interface.
Submit jobs to Kubernetes without writing spark-submit commands.
Monitor every run with execution timelines, live log streaming, and resource metrics.
Launch directly into the Spark History Server or Live Spark UI for deep diagnostics.
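
For context, the sketch below assembles the kind of `spark-submit` argument list that a hand-written Spark-on-Kubernetes submission typically needs. It is not a description of how xDP submits jobs internally. The `build_spark_submit` helper is hypothetical, and the master URL, image, application path, and job name are placeholders. The flags and `spark.*` properties are standard Apache Spark options.

```python
from typing import Dict, List, Optional


def build_spark_submit(
    master_url: str,
    name: str,
    image: str,
    app_path: str,
    spark_conf: Optional[Dict[str, str]] = None,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Return a spark-submit argv list for cluster mode on Kubernetes."""
    cmd = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        "--name", name,
        "--conf", f"spark.kubernetes.container.image={image}",
    ]
    # Sort the properties so the generated command is deterministic.
    for key, value in sorted((spark_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)          # application script or JAR
    cmd += list(app_args or [])   # runtime arguments passed to the application
    return cmd


cmd = build_spark_submit(
    master_url="k8s://https://kubernetes.example.com:6443",  # placeholder
    name="orders-daily-load",
    image="my-app:1.2.3",
    app_path="local:///opt/app/main.py",
    spark_conf={
        "spark.executor.instances": "2",
        "spark.driver.memory": "2g",
        "spark.executor.memory": "4g",
    },
    app_args=["--date", "2024-01-01"],
)
print(" ".join(cmd))
```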

Prerequisites

A running xDP Compute Cluster.
Your Spark application packaged as a Docker image in a registry accessible from the cluster.
A user role with permissions to create and run jobs.

Best Practices

Use naming conventions such as {source}_{target}_{purpose}_job for discoverability.
Use versioned image tags (e.g., my-app:1.2.3) instead of latest for repeatable deployments.
Tune resources iteratively: start with conservative settings, analyze runs in the Spark History Server, then adjust.
Declare all Data Store Dependencies so xDP can manage credentials and track lineage automatically.
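
The first two practices are easy to check automatically, for example in a CI step that runs before a job is created. The sketch below is one possible approach; the helper names `make_job_name` and `has_pinned_tag` are hypothetical and are not part of xDP. It treats an image with no tag as unpinned, because container runtimes default a missing tag to `latest`.

```python
import re


def make_job_name(source: str, target: str, purpose: str) -> str:
    """Build a name in the {source}_{target}_{purpose}_job convention."""
    parts = []
    for raw in (source, target, purpose):
        # Lowercase, and collapse anything other than a-z or 0-9 into "_".
        cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
        if not cleaned:
            raise ValueError(f"empty name component: {raw!r}")
        parts.append(cleaned)
    return "_".join(parts) + "_job"


def has_pinned_tag(image: str) -> bool:
    """Return True if the image reference uses a digest or a tag other than latest."""
    if "@" in image:
        return True  # digest reference, e.g. my-app@sha256:...
    # Look only at the final path segment so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means an implicit "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


print(make_job_name("Orders DB", "warehouse", "daily-load"))
# → orders_db_warehouse_daily_load_job
print(has_pinned_tag("my-app:1.2.3"))   # → True
print(has_pinned_tag("my-app:latest"))  # → False
```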
