Run Spark Rapids
This page describes how to run Spark jobs on GPUs using the NVIDIA RAPIDS Accelerator, covering setup, configuration, and verification.
Prerequisites
- Ensure you have access to a cluster with GPU nodes and required permissions.
- Java, Hadoop, Spark, and Hive are already installed and accessible in your environment.
- CUDA libraries compatible with your RAPIDS version are installed.
Steps to Run Spark Rapids
- Download the required JAR files.
wget http://repo1.acceldata.dev/repository/odp-central/com/nvidia/rapids-4-spark_2.12/25.06.0.3.3.6.2-1/rapids-4-spark_2.12-25.06.0.3.3.6.2-1-cuda11.jar
wget https://repo1.maven.org/maven2/ai/rapids/cudf/25.06.0/cudf-25.06.0-cuda11.jar
- Set environment variables.
export HIVE_HOME=/usr/odp/3.3.6.2-1/hive
export SPARK_HOME=/usr/odp/3.3.6.2-1/spark3
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure the variables above reflect your cluster's directory structure.
- Validate CUDA availability.
Before running your Spark job, check CUDA availability:
nvidia-smi
This command lists the available GPUs, the driver version, and the highest CUDA version the driver supports.
- Launch spark-shell with the RAPIDS plugin.
$SPARK_HOME/bin/spark-shell \
--master yarn \
--conf spark.yarn.queue=GPU \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.enabled=true \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=0.1 \
--conf spark.resources.discoveryScript=$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh \
--conf spark.executor.resource.gpu.discoveryScript=$SPARK_HOME/examples/src/main/scripts/getGpusResources.sh \
--conf spark.metrics.enabled=false \
--jars rapids-4-spark_2.12-25.06.0.3.3.6.2-1-cuda11.jar,cudf-25.06.0-cuda11.jar
Adjust the script paths and versions based on your actual deployment.
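With spark.executor.resource.gpu.amount=1 and spark.task.resource.gpu.amount=0.1, each executor is allocated one GPU and up to 10 tasks can share it concurrently; tune the task amount to control how many tasks run on a GPU at the same time.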
- Run a sample job.
In the Spark shell, try running a basic DataFrame operation to test GPU acceleration:
val df = spark.range(1, 1000000)
df.selectExpr("id", "id * 2 as double_id").show()
or
val df = spark.range(1, 100000000).toDF("id")
val result = df.groupBy($"id" % 100).count()
result.show()
Monitor the Spark UI (typically at port 4040) to verify that GPU resources are being allocated and used for the tasks.
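To confirm that a query is actually planned on the GPU, you can also inspect the physical plan from the shell. When the RAPIDS plugin handles an operation, the corresponding operators appear with a Gpu prefix (for example GpuRange or GpuProject), while operations that fall back to the CPU keep their usual names. A minimal check using the first sample DataFrame above:
// Print the physical plan; GPU-enabled operators carry a "Gpu" prefix
df.selectExpr("id", "id * 2 as double_id").explain()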
- Validate the job execution.
- Check the Spark application logs (via the YARN ResourceManager) for any RAPIDS library loading or GPU assignment errors.
- Confirm RAPIDS acceleration is being used by looking for log entries that reference com.nvidia.spark.rapids; you can also verify the plugin settings from the running shell (see the snippet after this list).
- You can also enable additional debug logging for more visibility:
--conf spark.rapids.sql.logging.enabled=true
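As a quick sanity check that the plugin configuration took effect, you can read the settings back inside the running spark-shell session; the expected values below assume the launch command shown earlier:
// Values should match the --conf options passed on the spark-shell command line
spark.sparkContext.getConf.get("spark.plugins")            // expect com.nvidia.spark.SQLPlugin
spark.sparkContext.getConf.get("spark.rapids.sql.enabled") // expect true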
Optional Steps
- Tuning: Adjust spark.executor.memory, spark.executor.cores, and spark.executor.instances for optimal performance.
- Library Version Check: Make sure the Spark, CUDA, and cuDF versions are compatible.
- Python Jobs: If running with PySpark, update the procedure above accordingly (e.g., launch pyspark instead of spark-shell, with the same --conf and --jars options).