Run Spark RAPIDS

This page describes how to run Spark jobs on GPUs with the NVIDIA RAPIDS Accelerator, covering setup, configuration, and verification.

Prerequisites

  • Ensure you have access to a cluster with GPU nodes and required permissions.
  • Java, Hadoop, Spark, and Hive are already installed and accessible in your environment.
  • CUDA libraries compatible with your RAPIDS version are installed.

Steps to Run Spark RAPIDS

  1. Download the required JAR files.
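The plugin is published to Maven Central as rapids-4-spark; recent releases bundle the cuDF code in a single jar, while older releases also required a separate cudf jar. The version and Scala suffix below are placeholders, so substitute the release that matches your Spark and CUDA versions.

Bash
  # Placeholder version and Scala suffix; pick the release matching your setup
  RAPIDS_VERSION=24.04.0
  wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${RAPIDS_VERSION}/rapids-4-spark_2.12-${RAPIDS_VERSION}.jar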
  2. Set environment variables.
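The paths below are only illustrative of a typical layout; point them at the actual install locations on your nodes.

Bash
  # Example locations only; adjust to your environment
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
  export HADOOP_HOME=/opt/hadoop
  export SPARK_HOME=/opt/spark
  export PATH=$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH
  # Wherever you placed the jar downloaded in step 1
  export SPARK_RAPIDS_PLUGIN_JAR=/opt/sparkRapidsPlugin/rapids-4-spark_2.12-${RAPIDS_VERSION}.jar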

Make sure the variables above reflect your cluster's directory structure.

  3. Verify the CUDA installation.

Before running your Spark job, check CUDA availability:

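A minimal check uses the NVIDIA system management tool:

Bash
  # Lists the visible GPUs along with the driver and supported CUDA version
  nvidia-smi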

This command shows the available GPUs and the current CUDA version.

  4. Launch spark-shell with RAPIDS.
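The plugin class and the spark.rapids.* settings below are the standard ones for the RAPIDS Accelerator; the YARN master, jar path, GPU discovery script location, and resource amounts are assumptions for a small test launch and should be adapted to your deployment.

Bash
  $SPARK_HOME/bin/spark-shell \
    --master yarn \
    --jars ${SPARK_RAPIDS_PLUGIN_JAR} \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.rapids.sql.enabled=true \
    --conf spark.executor.resource.gpu.amount=1 \
    --conf spark.task.resource.gpu.amount=0.25 \
    --conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh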

Adjust the script paths and versions based on your actual deployment.

  5. Run a sample job.

In the Spark shell, try running a basic DataFrame operation to test GPU acceleration:

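For example, a simple aggregation typed at the spark-shell prompt (Scala); the column names and row count are arbitrary:

Scala
  // A basic aggregation; with the plugin active this should be planned on the GPU
  val df = spark.range(0, 10000000L).selectExpr("id % 100 as key", "id as value")
  df.groupBy("key").sum("value").show(5)
  // df.groupBy("key").sum("value").explain() should show Gpu* operators when accelerated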

or

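the same aggregation expressed through Spark SQL from the same shell, again purely as an illustration:

Scala
  // Register a temporary view and run the equivalent SQL aggregation
  spark.range(0, 10000000L).selectExpr("id % 100 as key", "id as value").createOrReplaceTempView("t")
  spark.sql("SELECT key, SUM(value) AS total FROM t GROUP BY key").show(5)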

Monitor the Spark UI (typically at port 4040) to verify that GPU resources are being allocated and used for the tasks.

  6. Validate the job execution.
  • Check the Spark logs in the Resource Manager for errors related to RAPIDS library loading or GPU assignment.
  • Confirm that RAPIDS acceleration is in use by looking for log entries referencing com.nvidia.spark.rapids.
  • You can also enable additional debug logging for more visibility:
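For instance, assuming the shell from step 4 is still running, the plugin's explain setting (runtime-settable in recent plugin versions) reports which operators ran on the GPU and why any fell back to the CPU:

Scala
  // Log, per query, which operators were placed on the GPU and the reason for any CPU fallback
  spark.conf.set("spark.rapids.sql.explain", "ALL")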

Optional Steps

  • Tuning: Adjust spark.executor.memory, spark.executor.cores, and spark.executor.instances for your workload (a rough example follows this list).
  • Library Version Check: Make sure your Spark, CUDA, and cuDF/RAPIDS plugin versions are compatible.
  • Python Jobs: If running with PySpark, update the above procedure accordingly (e.g., use pyspark instead of spark-shell).
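
As a rough illustration of the tuning bullet above, resource settings are passed as --conf options at submit time; the values are arbitrary starting points and your_gpu_job.jar is a placeholder for your application jar.

Bash
  # Arbitrary example values; size these to your nodes and workload
  $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --jars ${SPARK_RAPIDS_PLUGIN_JAR} \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.executor.instances=4 \
    --conf spark.executor.cores=8 \
    --conf spark.executor.memory=16g \
    --conf spark.executor.resource.gpu.amount=1 \
    your_gpu_job.jar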