Apps

1. What is Apps?

The Apps capability provides a curated catalog of essential data processing and analytics tools that can be installed and managed on your compute clusters with a single click. This feature abstracts away the complexity of deploying and configuring the data stack, enabling your data teams to provision tools like Apache Spark, JupyterHub, and Airflow in minutes. By standardizing the available toolset, Apps accelerates development cycles and reduces the operational overhead of managing a modern data platform.

2. Key Concepts

  • Application: A pre-packaged, integrated software component (e.g., Apache Spark, Trino) that can be deployed on a compute cluster to provide a specific data capability. Each application is versioned and tested for compatibility with the xDP platform.
  • Compute Cluster: The underlying infrastructure, typically a Kubernetes cluster, where applications are installed and executed. You can manage multiple clusters and deploy different sets of applications to each one to support various workloads and teams.

3. Capabilities

  • One-Click Application Deployment: Install complex, distributed data systems directly onto your compute clusters from a centralized catalog. This significantly reduces the time and expertise required to stand up new data services.
  • Centralized Lifecycle Management: View the installation status of all applications across your clusters, and easily edit configurations or uninstall them as your needs change. This provides a single pane of glass for managing the platform's software stack.
  • Environment Standardization: Ensure that all teams are using consistent, approved versions of data tools. This simplifies dependency management, improves reliability, and makes it easier to enforce security and governance policies.
  • Extensible Data Platform: Quickly add new capabilities to your data platform, such as interactive querying with Trino or workflow orchestration with Airflow, without a lengthy procurement or integration process. This allows the platform to evolve alongside your business requirements.

4. Supported Versions

The following table lists the supported versions for each application available in the xDP App Catalog. Vulnerabilities for these applications are managed within ODP, ensuring easier onboarding, security compliance, and simplified vulnerability scanning.

Note (Spark): Spark jobs run against a bundled base image, which is a pre-built container image that packages a specific Spark version and Python runtime together. When submitting a job, you set the container image (e.g., spark:3.5.5) and the sparkVersion field as two separate values. xDP provides supported base images from the Acceldata image registry.

| Application | Supported Version(s) | Usage Notes |
| --- | --- | --- |
| Apache Spark | 3.3.3, 3.5.5 | Jobs run against a bundled base image. Acceldata ships certified images for 3.3.3, 3.3.4, 3.5.5, and 4.0.0. Any Spark >= 3.0.0 is accepted. Used as the primary compute engine for batch and structured processing workloads. |
| JupyterHub | 5.2.1 | Primarily used for interactive data exploration, notebook-based development, and ad hoc analysis by data scientists. |
| Trino | 472, 468 | Used for distributed SQL query execution across heterogeneous data sources. |
| Apache Airflow | 2.8.3 | Manages DAG-based workflows for scheduling and coordinating data pipelines and jobs. |
| Apache NiFi / Registry | 1.28.1 | Used for data ingestion, flow management, and integration with external systems. |
| Apache Ranger | 2.5.0 | Provides centralized policy management and access control. Supported when xDP is integrated with an ODP cluster. |
| Apache Kafka | 3.7.2 | Upcoming: Enables streaming and event-driven data pipelines. Planned for integration with real-time ingestion and processing use cases. |
| Apache Flink | 1.19.1 | Upcoming: Provides stateful stream processing capabilities on compute clusters. Intended for low-latency, real-time analytics workloads. |
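
Because the container image and the sparkVersion field are set independently, it can help to check them against each other before submitting a job. The following is a minimal sketch, not an xDP API: the SparkJobSpec class and its check method are hypothetical names introduced for illustration, and only the image and sparkVersion fields, the certified image versions, and the >= 3.0.0 rule come from the table above.

```python
from dataclasses import dataclass

# Versions with certified base images, as listed in the Supported Versions table.
CERTIFIED_SPARK_IMAGES = {"3.3.3", "3.3.4", "3.5.5", "4.0.0"}
MIN_SPARK_VERSION = (3, 0, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.5.5' into (3, 5, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class SparkJobSpec:
    # Hypothetical structure: only `image` and `sparkVersion` come from the docs.
    image: str
    sparkVersion: str

    def check(self) -> list[str]:
        """Return human-readable warnings; an empty list means no issues found."""
        warnings = []
        if parse_version(self.sparkVersion) < MIN_SPARK_VERSION:
            warnings.append(f"Spark {self.sparkVersion} is below 3.0.0")
        elif self.sparkVersion not in CERTIFIED_SPARK_IMAGES:
            warnings.append(f"no certified image for Spark {self.sparkVersion}")
        # The image tag and sparkVersion are set separately, so they can drift apart.
        tag = self.image.rsplit(":", 1)[-1]
        if tag != self.sparkVersion:
            warnings.append(
                f"image tag {tag!r} differs from sparkVersion {self.sparkVersion!r}"
            )
        return warnings
```

For example, SparkJobSpec("spark:3.5.5", "3.5.5").check() returns an empty list, while pairing the spark:3.5.5 image with sparkVersion "3.3.3" yields one warning about the mismatch.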

5. Getting Started

Before you can install applications, ensure you have the necessary prerequisites in place.

Prerequisites:

  • You must have at least one active Compute Cluster registered with xDP.
  • Your user role must have permissions to install, configure, and uninstall applications.

Workflow to Deploy Your First Application:

  1. Select a Target Cluster: From the dropdown menu in the page header, choose the Compute Cluster where you want to install the application. The catalog will refresh to show the status of apps on that specific cluster.
  2. Browse the App Catalog: Review the list of available applications. Each card provides a brief description of the tool's purpose and its version.
  3. Install an Application: Locate the application you need and click the + Install button. You may be prompted to provide specific configuration details required for the initial setup.
  4. Verify the Installation: Once the deployment process is complete, the application's card will update to show an Installed status.
  5. Begin Using the Application: The newly installed capability is now available on the cluster for your data teams to use, for example, by submitting Spark jobs or building data pipelines.

Example: Your data science team needs an interactive environment to build machine learning models. You can navigate to the Apps page, select the primary analytics cluster, and install JupyterHub to provide them with a fully managed notebook environment in a matter of minutes.

6. Common Workflows

Install a Query Engine like Trino

  1. Navigate to the Apps page.
  2. Select the desired Compute Cluster from the header.
  3. Find the Trino card in the application catalog.
  4. Click + Install.
  5. After a few moments, the installation will complete, and the status will change to Installed. Your analytics team can now connect their BI tools to the Trino service endpoint to run interactive queries.
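
Once Trino is installed, a quick way to confirm the service endpoint responds is to submit a trivial query over Trino's HTTP client protocol. The sketch below uses only the Python standard library. The endpoint URL and user name in the usage note are placeholders (how xDP exposes the endpoint, and whether it requires authentication or TLS, is not covered here), and build_statement_request and run_query are illustrative helper names, not part of xDP or Trino.

```python
import json
import urllib.request


def build_statement_request(endpoint: str, sql: str, user: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Trino's /v1/statement endpoint."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Trino-User": user},
        method="POST",
    )


def run_query(endpoint: str, sql: str, user: str) -> list:
    """Submit a query and follow nextUri links until all rows are collected."""
    rows = []
    request = build_statement_request(endpoint, sql, user)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        # Trino returns results in pages; nextUri is absent on the last page.
        next_uri = page.get("nextUri")
        request = (
            urllib.request.Request(next_uri, headers={"X-Trino-User": user})
            if next_uri
            else None
        )
    return rows
```

Calling run_query("http://trino.example.internal:8080", "SELECT 1", "analyst") against a reachable, unauthenticated endpoint should return a single row. BI tools typically connect through a JDBC or ODBC driver instead, so treat this only as a connectivity check.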

Modify an Existing Application's Configuration

  1. Navigate to the Apps page.
  2. Select the Compute Cluster where the application is installed.
  3. Locate the card for the installed application (e.g., Apache Spark).
  4. Click the Edit button.
  5. In the configuration view that appears, modify the necessary parameters (e.g., update resource limits, change default properties).
  6. Save your changes to apply the new configuration to the application.
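
To keep a record of what changed in an edit like this, one option is to snapshot the configuration before and after saving and store both in version control. The following is a minimal sketch under stated assumptions: it treats a configuration as a flat dictionary, and the setting names shown are hypothetical stand-ins for whatever parameters the Edit view actually exposes.

```python
import json


def snapshot(config: dict) -> str:
    """Serialize a config with sorted keys so version-control diffs stay stable."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def changed_keys(before: dict, after: dict) -> dict:
    """Map each differing top-level key to (old, new); a missing key shows as None."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical Spark settings; the real parameter names come from the Edit view.
before = {"executor.memory": "4g", "executor.cores": 2}
after = {"executor.memory": "8g", "executor.cores": 2, "dynamicAllocation": True}

for key, (old, new) in changed_keys(before, after).items():
    print(f"{key}: {old!r} -> {new!r}")
```

Committing the output of snapshot(after) alongside a short note on why the change was made gives you a history to roll back to if the new settings cause problems.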

Uninstall an Application

  1. Navigate to the Apps page.
  2. Select the Compute Cluster from which you need to remove the application.
  3. Find the card for the application you wish to remove.
  4. Click the Uninstall button (trash can icon).
  5. In the confirmation dialog, confirm the action. The application and its associated resources will be cleanly removed from the cluster, freeing up resources.

7. Best Practices

  • Isolate Workloads: For production environments, consider installing resource-intensive applications on dedicated compute clusters. For example, run ETL workloads on one cluster and interactive analytics or data science workloads on another to prevent resource contention.
  • Manage Configurations: When editing an application's configuration, document the changes and their purpose. For critical production systems, store your configurations in a version control system to track history and facilitate rollbacks.
  • Principle of Least Privilege: Restrict permissions to install and manage applications to platform administrators or team leads. This ensures stability and prevents accidental changes in a production environment.
  • Monitor Post-Installation: After installing a new application, navigate to the Compute Clusters view to monitor its impact on cluster-wide resource utilization (CPU, memory). Set up alerts to be notified of any performance degradation.