Spark Jobs

Spark Jobs in xDP provides a managed environment for creating, running, and monitoring Apache Spark applications. It handles cluster submission and resource management so your team can focus on building data pipelines.

Key Concepts

Concept | Description
Job Definition | A reusable template for a Spark application: container image, executable script or JAR, resource requirements, and Spark configuration.
Job Run | A single execution of a Job Definition. Each run has its own lifecycle, logs, and metrics.
Execution Configuration | The full set of parameters applied to a run: runtime arguments, driver and executor resources, and Spark property overrides.
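
To make the relationship between these concepts concrete, the sketch below models them as Python dataclasses. A run's Execution Configuration is resolved by overlaying per-run overrides on the Job Definition's defaults. This is a hypothetical illustration only: the class names, fields, the resolve_config helper, and the override precedence (run values win) are assumptions, not xDP's schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    # Hypothetical per-process resource request (Spark-style memory string).
    cores: int
    memory: str  # e.g. "4g"


@dataclass(frozen=True)
class JobDefinition:
    # Reusable template: image, entry point, default resources, Spark conf.
    name: str
    image: str
    main_file: str
    driver: Resources
    executor: Resources
    executor_instances: int
    spark_conf: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class ExecutionConfiguration:
    # The fully resolved parameters applied to one run.
    arguments: List[str]
    driver: Resources
    executor: Resources
    executor_instances: int
    spark_conf: Dict[str, str]


@dataclass
class JobRun:
    # One execution of a definition; status is a hypothetical lifecycle field.
    run_id: str
    definition_name: str
    config: ExecutionConfiguration
    status: str = "PENDING"


def resolve_config(
    definition: JobDefinition,
    arguments: Optional[List[str]] = None,
    driver: Optional[Resources] = None,
    executor: Optional[Resources] = None,
    executor_instances: Optional[int] = None,
    conf_overrides: Optional[Dict[str, str]] = None,
) -> ExecutionConfiguration:
    """Overlay per-run overrides on the definition's defaults."""
    # Later keys win, so run-level Spark properties override the template.
    merged_conf = {**definition.spark_conf, **(conf_overrides or {})}
    return ExecutionConfiguration(
        arguments=list(arguments or []),
        driver=driver if driver is not None else definition.driver,
        executor=executor if executor is not None else definition.executor,
        executor_instances=(
            executor_instances
            if executor_instances is not None
            else definition.executor_instances
        ),
        spark_conf=merged_conf,
    )


# Usage: one definition, one run with a couple of overrides.
orders_job = JobDefinition(
    name="s3_warehouse_orders_job",
    image="registry.example.com/orders-etl:1.2.3",
    main_file="local:///opt/app/main.py",
    driver=Resources(cores=1, memory="2g"),
    executor=Resources(cores=2, memory="4g"),
    executor_instances=2,
    spark_conf={"spark.sql.shuffle.partitions": "200"},
)
config = resolve_config(
    orders_job,
    arguments=["--date", "2024-01-31"],
    executor_instances=4,
    conf_overrides={"spark.sql.shuffle.partitions": "400"},
)
run = JobRun(run_id="run-001", definition_name=orders_job.name, config=config)
```

Keeping the definition immutable and producing a fresh configuration for each run reflects the split described in the table: the Job Definition is a reusable template, while each Job Run carries its own resolved parameters.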

Jobs Listing

From the side navigation, go to Spark > Spark Jobs to see all jobs across clusters.

Filter by Job Type, Plugin Type, Date Range, and Cluster. Use the Actions menu on any row to Edit, Clone, Run Now, or Delete a job.

Capabilities

  • Create and manage Spark Python, Java, and Notebook jobs from a single interface.
  • Submit jobs to Kubernetes without writing spark-submit commands.
  • Monitor every run with execution timelines, live log streaming, and resource metrics.
  • Launch directly into the Spark History Server or Live Spark UI for deep diagnostics.
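
For comparison, submitting an application to Kubernetes by hand usually means assembling a spark-submit command like the one built below. This is a sketch of the kind of command that managed submission spares you from writing, not a description of how xDP submits jobs internally. The build_spark_submit helper is hypothetical, and the API server URL, image, namespace, and file path are placeholders.

```python
import shlex
from typing import Dict, List


def build_spark_submit(
    k8s_api_server: str,
    app_name: str,
    image: str,
    app_file: str,
    app_args: List[str],
    driver_memory: str,
    executor_memory: str,
    executor_cores: int,
    executor_instances: int,
    extra_conf: Dict[str, str],
) -> List[str]:
    """Assemble an argv list for spark-submit in Kubernetes cluster mode."""
    conf = {
        "spark.kubernetes.container.image": image,
        "spark.executor.instances": str(executor_instances),
        **extra_conf,
    }
    cmd = [
        "spark-submit",
        "--master", f"k8s://{k8s_api_server}",
        "--deploy-mode", "cluster",
        "--name", app_name,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
    ]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # spark-submit treats everything after the application file as app arguments.
    return cmd + [app_file, *app_args]


argv = build_spark_submit(
    k8s_api_server="https://k8s.example.com:6443",
    app_name="orders-etl",
    image="registry.example.com/orders-etl:1.2.3",
    app_file="local:///opt/app/main.py",
    app_args=["--date", "2024-01-31"],
    driver_memory="2g",
    executor_memory="4g",
    executor_cores=2,
    executor_instances=3,
    extra_conf={"spark.kubernetes.namespace": "spark-jobs"},
)
print(shlex.join(argv))
```

The local:// scheme tells Spark that the application file is already inside the container image, which matches the prerequisite that your application is packaged as a Docker image.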

Prerequisites

  • A running xDP Compute Cluster.
  • Your Spark application packaged as a Docker image in a registry accessible from the cluster.
  • A user role with permissions to create and run jobs.

Best Practices

  • Use naming conventions such as {source}_{target}_{purpose}_job for discoverability.
  • Use versioned image tags (e.g., my-app:1.2.3) instead of latest for repeatable deployments.
  • Tune resources iteratively: start with conservative driver and executor settings, analyze completed runs in the Spark History Server, then adjust.
  • Declare all Data Store Dependencies so xDP can manage credentials and track lineage automatically.
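
The naming and image-tag practices are straightforward to check automatically, for example in a CI step that runs before a job definition is created or updated. The sketch below is a hypothetical pre-submit check: the allowed characters for name segments, the MAJOR.MINOR.PATCH tag rule, and the helper names make_job_name and image_tag_problem are assumptions for illustration, not xDP requirements.

```python
import re
from typing import Optional

# Hypothetical rule: each segment is lowercase alphanumerics, optionally
# hyphenated, so "_" stays reserved as the segment separator.
SEGMENT_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Hypothetical rule: accept MAJOR.MINOR.PATCH tags, optionally prefixed with "v".
VERSION_TAG_RE = re.compile(r"v?\d+\.\d+\.\d+")


def make_job_name(source: str, target: str, purpose: str) -> str:
    """Build a {source}_{target}_{purpose}_job name, rejecting bad segments."""
    for segment in (source, target, purpose):
        if not SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"invalid name segment: {segment!r}")
    return f"{source}_{target}_{purpose}_job"


def image_tag_problem(image: str) -> Optional[str]:
    """Return a reason the image reference is not pinned, or None if it is."""
    if "@sha256:" in image:
        return None  # digest references are immutable
    # Only the last path segment can carry a tag; earlier colons may be ports.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "no tag, so it defaults to 'latest'"
    tag = last_segment.rsplit(":", 1)[1]
    if tag == "latest":
        return "uses the mutable 'latest' tag"
    if not VERSION_TAG_RE.fullmatch(tag):
        return f"tag {tag!r} is not a MAJOR.MINOR.PATCH version"
    return None


print(make_job_name("s3", "warehouse", "orders-ingest"))
for ref in ("my-app:1.2.3", "my-app:latest", "registry.example.com:5000/my-app"):
    print(ref, "->", image_tag_problem(ref) or "ok")
```

Splitting on the last "/" before looking for a tag avoids mistaking a registry port (such as :5000) for an image tag.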