xDP Documentation
Capacity Planning
Control Plane
A minimum of 3 nodes is required for Kubernetes control-plane functionality.
Installed Software Scope
The following applications drive capacity requirements:
- Apache Spark - batch and streaming compute
- Trino - distributed SQL query engine
- JupyterHub - interactive notebook sessions
Concurrency and Data Volume Targets
Collect the following metrics for your environment before sizing nodes:
| Metric | What to Measure |
|---|---|
| Concurrent Spark jobs | Peak and average running jobs |
| Concurrent Trino queries | Maximum sustained query load |
| Active JupyterHub sessions | Simultaneous notebook users |
| Data scan volume | Terabytes scanned per hour/day |
| Processing throughput | Required gigabytes per second |
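The scan-volume and throughput rows are two views of the same quantity, so it can help to convert one into the other when collecting targets. The sketch below is only illustrative arithmetic: the function name `scan_rate_gb_per_s` and the sample workloads are assumptions, not values from this guide, and decimal units (1 TB = 1000 GB) are assumed.

```python
def scan_rate_gb_per_s(terabytes: float, window_hours: float) -> float:
    """Average throughput in GB/s needed to scan `terabytes` within `window_hours`.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return terabytes * 1000 / (window_hours * 3600)


# Hypothetical workloads, chosen only to illustrate the conversion.
hourly_peak = scan_rate_gb_per_s(36, 1)      # 36 TB scanned in one hour
daily_average = scan_rate_gb_per_s(216, 24)  # 216 TB scanned over a day

print(f"Peak hour: {hourly_peak:.1f} GB/s")
print(f"Daily average: {daily_average:.1f} GB/s")
```

Comparing the peak-hour figure with the daily average shows how bursty the workload is, which feeds into the choice between dynamic and static scaling later on this page.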
Memory vs Disk Spill
In-Memory Processing
| Application | Recommendation |
|---|---|
| Spark | Allocate 60-70% of executor memory for direct processing |
| Trino | Size memory pools to match anticipated query complexity |
| JupyterHub | Set per-user memory limits (typically 2 GB to 16 GB) |
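As a rough illustration of the Spark and JupyterHub rows, the following sketch turns the percentages and per-user limits into absolute figures. The function names and the sample inputs (a 20 GB executor, 50 sessions at 4 GB each) are hypothetical, and the JupyterHub total is a worst-case figure that assumes every session reaches its limit at the same time.

```python
def spark_processing_memory_gb(executor_memory_gb: float) -> tuple:
    """Return the (low, high) range in GB for 60-70% of executor memory."""
    return (executor_memory_gb * 0.60, executor_memory_gb * 0.70)


def jupyterhub_memory_gb(active_sessions: int, per_user_limit_gb: float) -> float:
    """Worst-case aggregate memory if every active session hits its limit."""
    if active_sessions < 0 or per_user_limit_gb <= 0:
        raise ValueError("sessions must be >= 0 and the limit must be positive")
    return active_sessions * per_user_limit_gb


# Hypothetical inputs for illustration only.
low, high = spark_processing_memory_gb(20)
print(f"Processing memory per 20 GB executor: {low:.0f}-{high:.0f} GB")
print(f"JupyterHub worst case: {jupyterhub_memory_gb(50, 4):.0f} GB")
```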
Disk Spill Configuration
- Use local SSD storage for shuffle spill operations.
- NVMe SSDs rated for at least 50,000 IOPS are recommended.
- Provision 2-3x the expected maximum spill volume per node.
- Use multiple mount points in a JBOD configuration.
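The spill guidelines above can be sketched as a small sizing helper. Everything named here (`spill_capacity_range_gb`, `per_mount_capacity_gb`, the 500 GB spill estimate, the four-disk layout) is an assumption made for illustration; only the 2-3x multiplier and the 50,000 IOPS threshold come from the list above.

```python
MIN_RECOMMENDED_IOPS = 50_000  # threshold taken from the recommendation above


def spill_capacity_range_gb(expected_max_spill_gb: float) -> tuple:
    """Return (low, high) per-node capacity: 2x and 3x the expected maximum spill."""
    if expected_max_spill_gb < 0:
        raise ValueError("expected_max_spill_gb must be non-negative")
    return (2 * expected_max_spill_gb, 3 * expected_max_spill_gb)


def per_mount_capacity_gb(total_gb: float, mount_count: int) -> float:
    """Split the total evenly across JBOD mount points."""
    if mount_count < 1:
        raise ValueError("at least one mount point is required")
    return total_gb / mount_count


def meets_iops_recommendation(device_iops: int) -> bool:
    """True when a device meets the recommended IOPS floor."""
    return device_iops >= MIN_RECOMMENDED_IOPS


# Hypothetical node: 500 GB expected peak spill, four NVMe disks.
low, high = spill_capacity_range_gb(500)
print(f"Provision {low:.0f}-{high:.0f} GB per node")
print(f"Per disk at the upper bound: {per_mount_capacity_gb(high, 4):.0f} GB")
```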
Storage Infrastructure
Local Disk
| Requirement | Specification |
|---|---|
| Type | NVMe SSD (mandatory for shuffle operations) |
| Mount layout | JBOD, with each disk mounted separately: /mnt/disk1, /mnt/disk2, etc. |
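One common way to spread Spark scratch data across JBOD disks is the `spark.local.dir` property, which accepts a comma-separated list of directories. This guide does not state how xDP wires the mount points into Spark, and cluster managers can override the property, so treat the helper below as a sketch under that assumption; the function name is hypothetical.

```python
def spark_local_dir_value(mount_points: list) -> str:
    """Join JBOD mount points into a comma-separated spark.local.dir value."""
    cleaned = [m.strip().rstrip("/") for m in mount_points if m.strip()]
    if not cleaned:
        raise ValueError("at least one mount point is required")
    return ",".join(cleaned)


# Mount points follow the naming shown in the table above.
value = spark_local_dir_value(["/mnt/disk1", "/mnt/disk2"])
print(f"spark.local.dir={value}")
```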
Persistent Storage
Use existing object storage (S3, GCS, or Azure Blob) as the data lake repository. Object storage is also required for hosting Spark History Server event logs.
Scaling Strategies
Dynamic Scaling
- Kubernetes Cluster Autoscaler for automated node-level adjustments.
- Spark Dynamic Allocation for adaptive executor provisioning.
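For Spark Dynamic Allocation, the relevant settings are standard Spark properties under `spark.dynamicAllocation.*`. The sketch below assembles them into `spark-submit` arguments. The executor bounds are placeholder values rather than recommendations from this guide, and enabling shuffle tracking is an assumption that suits clusters without an external shuffle service; the helper names are hypothetical.

```python
def dynamic_allocation_conf(min_executors: int, max_executors: int) -> dict:
    """Build Spark properties that enable Dynamic Allocation within the given bounds."""
    if min_executors < 0 or max_executors < min_executors:
        raise ValueError("require 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Assumption: no external shuffle service, so shuffle tracking is used.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    }


def to_spark_submit_args(conf: dict) -> list:
    """Flatten a properties dict into repeated --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# Placeholder bounds for illustration only.
conf = dynamic_allocation_conf(min_executors=2, max_executors=20)
print(" ".join(to_spark_submit_args(conf)))
```

The maximum executor count is worth aligning with the Cluster Autoscaler's node limits, so that Spark does not request more executors than the cluster can ever schedule.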
Static Scaling
- Fixed node count for environments with predictable workloads.
- Lower operational complexity and more accurate cost forecasting.
Network Bandwidth
| Path | Minimum | Recommended |
|---|---|---|
| Inter-node communication | 25 Gbps | 100 Gbps |
| Storage network | Dedicated high-bandwidth link to object storage | -- |
Consider segregating data-plane and control-plane traffic onto separate networks.
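To put the minimum and recommended inter-node figures in perspective, the sketch below estimates the ideal time to move a given data volume over a single link. It ignores protocol overhead, contention, and disk limits, so real transfers will be slower; the function name and the 1000 GB shuffle size are assumptions for illustration.

```python
def transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Ideal transfer time: gigabytes converted to gigabits, divided by link rate."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return gigabytes * 8 / link_gbps


# Hypothetical 1000 GB shuffle moved between nodes.
for gbps in (25, 100):
    print(f"{gbps} Gbps: {transfer_seconds(1000, gbps):.0f} s")
```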
©2026, Acceldata Inc — All Rights Reserved.