Spark RAPIDS

This guide explains how to install, configure, and run Spark with the NVIDIA RAPIDS Accelerator on your cluster environment.

Prerequisites

  • Ensure you have access to a cluster with GPU nodes and required permissions.
  • Java, Hadoop, Spark, and Hive are already installed and accessible in your environment.
  • CUDA libraries compatible with your RAPIDS version are installed.

Method A: Install RAPIDS Using Ambari Mpack

Spark RAPIDS is bundled with the Spark3 Ambari Mpack. Refer to the https://docs.acceldata.io/odp/odp-3.3.6.2-1/documentation/odp-working-with-ambari-management-packs#spark-3 documentation for installing the Spark3 Mpack. Once it is installed, follow these steps:

  1. Open the Ambari UI, navigate to Menu -> Services.
  2. Click the ellipsis menu (⋯) in the top-right corner.
  3. Select Add Service. The list of services appears on the screen.
  4. Select Spark Rapids and click Next.
  5. On the Assign Slaves and Clients page, select the nodes where you want to install the Spark Rapids Client and click Next.
  6. Review the configuration and click Deploy.
  7. After installation, the Spark Rapids service gets added under Services.

Method B: Spark Rapids Standalone Deployment

  1. Download the standalone tarball.
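A hedged sketch of this step follows; the URL, version, and install directory are placeholders, not the real artifact locations for your ODP release — substitute the tarball location your distribution provides.

```shell
# Placeholder URL -- replace with the Spark RAPIDS tarball for your ODP release.
RAPIDS_TARBALL_URL="https://example.com/spark-rapids/spark-rapids-standalone.tar.gz"

# Download and unpack under /opt (adjust the target directory as needed).
wget -O /tmp/spark-rapids.tar.gz "${RAPIDS_TARBALL_URL}"
sudo mkdir -p /opt/spark-rapids
sudo tar -xzf /tmp/spark-rapids.tar.gz -C /opt/spark-rapids --strip-components=1
```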
  2. Set environment variables:
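A minimal sketch, assuming the tarball was unpacked under /opt/spark-rapids and a typical ODP Spark3 client layout — every path and the plugin jar version below are assumptions to adapt:

```shell
# Hypothetical paths -- align these with where the tarball was unpacked
# and with your existing Spark installation.
export SPARK_RAPIDS_DIR=/opt/spark-rapids
# Placeholder jar version -- use the jar actually shipped in your tarball.
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-24.08.1.jar
export SPARK_HOME=/usr/odp/current/spark3-client
export PATH=${SPARK_HOME}/bin:${PATH}
```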

Note: Ensure these paths match your cluster’s directory structure.

  3. Validate CUDA installation:
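For example, run the standard NVIDIA diagnostics on a GPU node:

```shell
# Confirm the driver sees the GPUs and report the driver/CUDA version.
nvidia-smi

# Optionally, also check the CUDA toolkit compiler version.
nvcc --version
```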

This confirms GPU availability and CUDA version.

  4. Launch Spark Shell with RAPIDS:
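One hedged sketch of such a launch, assuming the plugin jar path from step 2 and the GPU discovery script bundled with Spark's examples — amounts and paths are illustrative:

```shell
# Paths and resource amounts are illustrative -- adjust for your environment.
${SPARK_HOME}/bin/spark-shell \
  --jars ${SPARK_RAPIDS_PLUGIN_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  --conf spark.executor.resource.gpu.discoveryScript=${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh
```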

Note: Adjust script paths and version numbers based on your environment.

  5. Run a sample job:
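As one hedged example of a sample job, a short Scala snippet can be piped into spark-shell launched with the plugin enabled; the jar path is the assumed variable from step 2:

```shell
# Run a small aggregation and print its physical plan; with the RAPIDS
# plugin active, GPU-enabled operators show up as Gpu* nodes in the plan.
${SPARK_HOME}/bin/spark-shell \
  --jars ${SPARK_RAPIDS_PLUGIN_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin <<'EOF'
val df = spark.range(0, 1000000L).selectExpr("id % 100 as k", "id as v")
df.groupBy("k").count().explain()
EOF
```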

Monitor the Spark UI (default: port 4040) to verify GPU usage.

  6. Validate job execution:
    1. Check ResourceManager logs for GPU assignment or RAPIDS loading issues.
    2. Look for log messages containing com.nvidia.spark.rapids.
    3. Optional logging:
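One option, shown here as an illustrative sketch, is the plugin's explain setting, which logs why each operator did or did not run on the GPU:

```shell
# Illustrative: spark.rapids.sql.explain=ALL logs GPU placement decisions
# for every operator (use NOT_ON_GPU to log only CPU fallbacks).
${SPARK_HOME}/bin/spark-shell \
  --jars ${SPARK_RAPIDS_PLUGIN_JAR} \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.explain=ALL
```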

Optional Steps

  • Tuning: Adjust spark.executor.memory, spark.executor.cores, and spark.executor.instances for optimal performance.
  • Library Version Check: Ensure Spark, CUDA, and CUDF versions are compatible.
  • Python Jobs: If running with PySpark, update the above procedure accordingly (e.g., use pyspark instead of spark-shell).
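To make the tuning bullet concrete, here is a hedged submission sketch; the resource values and the application jar name are placeholders to size against your nodes:

```shell
# Illustrative tuning values -- size these to your nodes' cores, RAM, and GPUs.
# "your-app.jar" is a placeholder for your application artifact.
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.instances=4 \
  --conf spark.executor.cores=8 \
  --conf spark.executor.memory=16g \
  your-app.jar
```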