Capacity Planning

Control Plane

A minimum of 3 nodes is required for Kubernetes control-plane functionality.

Installed Software Scope

The following applications drive capacity requirements:

  • Apache Spark - batch and streaming compute
  • Trino - distributed SQL query engine
  • JupyterHub - interactive notebook sessions

Concurrency and Data Volume Targets

Collect the following metrics for your environment before sizing nodes:

Metric                      What to Measure
Concurrent Spark jobs       Peak and average running jobs
Concurrent Trino queries    Maximum sustained query load
Active JupyterHub sessions  Simultaneous notebook users
Data scan volume            Terabytes scanned per hour/day
Processing throughput       Required gigabytes per second
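
The scan-volume and throughput rows are related by simple unit conversion. The following is a minimal sketch of how the collected targets might be recorded and converted; the class name, field names, and sample values are illustrative assumptions, not part of this guide.

```python
from dataclasses import dataclass


# All field names and sample values are hypothetical placeholders;
# substitute the figures measured in your own environment.
@dataclass
class WorkloadTargets:
    peak_spark_jobs: int
    peak_trino_queries: int
    active_jupyter_sessions: int
    scan_tb_per_hour: float

    def required_gb_per_second(self) -> float:
        """Average throughput needed to sustain the hourly scan volume.

        Uses decimal units: 1 TB = 1000 GB and 1 hour = 3600 s.
        """
        return self.scan_tb_per_hour * 1000 / 3600


targets = WorkloadTargets(
    peak_spark_jobs=20,
    peak_trino_queries=50,
    active_jupyter_sessions=30,
    scan_tb_per_hour=18.0,
)
print(f"{targets.required_gb_per_second():.1f} GB/s")  # prints "5.0 GB/s"
```

This gives an average only; peak demand is usually higher, so treat the result as a lower bound when sizing nodes.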

Memory vs Disk Spill

In-Memory Processing

Application  Recommendation
Spark        Allocate 60-70% of executor memory for direct processing
Trino        Size memory pools to match anticipated query complexity
JupyterHub   Set per-user memory limits (typically 2 GB to 16 GB)
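
As a rough illustration of the arithmetic behind the Spark and JupyterHub rows, the sketch below computes the 60-70% processing range for one executor and the number of notebook users that fit on a node at a given per-user limit. Both function names, the reserved-memory parameter, and the sample sizes are assumptions made for this example.

```python
def processing_memory_gb(executor_memory_gb: float,
                         low: float = 0.60,
                         high: float = 0.70) -> tuple[float, float]:
    """Return the (low, high) range of executor memory for direct processing."""
    return executor_memory_gb * low, executor_memory_gb * high


def max_notebook_users(node_memory_gb: float,
                       per_user_limit_gb: float,
                       reserved_gb: float = 0.0) -> int:
    """Whole number of notebook users that fit after reserving system memory."""
    if per_user_limit_gb <= 0:
        raise ValueError("per_user_limit_gb must be positive")
    usable = max(node_memory_gb - reserved_gb, 0.0)
    return int(usable // per_user_limit_gb)


# Hypothetical sizes: a 32 GB executor and a 256 GB node with 32 GB reserved.
low_gb, high_gb = processing_memory_gb(32)
print(f"Spark processing memory: {low_gb:.1f} to {high_gb:.1f} GB")
print(f"Notebook users at 16 GB each: {max_notebook_users(256, 16, reserved_gb=32)}")
```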

Disk Spill Configuration

  • Use local SSD storage for shuffle spill operations.
  • NVMe SSDs rated for at least 50,000 IOPS are recommended.
  • Provision 2-3x the expected maximum spill volume per node.
  • Use multiple mount points in a JBOD configuration.
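
The provisioning rule above can be sketched as a small calculation. The example below assumes spill is spread evenly across the JBOD mounts and builds a comma-separated directory list of the kind accepted by Spark's `spark.local.dir` property. The function names and sample figures are hypothetical, and how the directories are actually exposed to executors depends on your deployment.

```python
def spill_capacity_per_disk_gb(expected_max_spill_gb: float,
                               disk_count: int,
                               headroom: float = 2.0) -> float:
    """Capacity each disk needs when spill is spread evenly across mounts.

    headroom is the 2-3x multiplier over the expected maximum spill volume.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    return expected_max_spill_gb * headroom / disk_count


def local_dir_list(disk_count: int, prefix: str = "/mnt/disk") -> str:
    """Comma-separated mount list, e.g. for spark.local.dir."""
    return ",".join(f"{prefix}{i}" for i in range(1, disk_count + 1))


# Hypothetical node: 800 GB expected peak spill, 4 NVMe disks, 3x headroom.
per_disk = spill_capacity_per_disk_gb(800, disk_count=4, headroom=3.0)
print(f"{per_disk:.0f} GB per disk")  # prints "600 GB per disk"
print(local_dir_list(4))  # prints "/mnt/disk1,/mnt/disk2,/mnt/disk3,/mnt/disk4"
```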

Storage Infrastructure

Local Disk

Requirement   Specification
Type          NVMe SSD (mandatory for shuffle operations)
Mount layout  JBOD: /mnt/disk1, /mnt/disk2, etc.

Persistent Storage

Use existing object storage (S3, GCS, Azure Blob) as the data lake repository. Object storage is also required for hosting Spark History Server event logs.

Scaling Strategies

Dynamic Scaling

  • Kubernetes Cluster Autoscaler for automated node-level adjustments.
  • Spark Dynamic Allocation for adaptive executor provisioning.
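
As one possible illustration of Spark Dynamic Allocation, the sketch below assembles the standard `spark.dynamicAllocation.*` properties into `--conf` arguments for `spark-submit`. The helper names and the executor bounds are assumptions chosen for the example; shuffle tracking is enabled here on the assumption that no external shuffle service is available, which may not match your cluster.

```python
def dynamic_allocation_conf(min_executors: int, max_executors: int) -> dict[str, str]:
    """Build Spark properties that enable dynamic executor allocation."""
    if min_executors < 0 or max_executors < min_executors:
        raise ValueError("require 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption: no external shuffle service, so track shuffle files instead.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a property map into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical bounds: scale between 2 and 40 executors.
print(" ".join(to_submit_args(dynamic_allocation_conf(2, 40))))
```

The upper bound should stay within what the Cluster Autoscaler is allowed to provision; otherwise executors may remain pending.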

Static Scaling

  • Fixed node count for environments with predictable workloads.
  • Lower operational complexity and more accurate cost forecasting.

Network Bandwidth

Path                      Minimum                                          Recommended
Inter-node communication  25 Gbps                                          100 Gbps
Storage network           Dedicated high-bandwidth link to object storage  -
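
To give a sense of what the two inter-node figures mean in practice, the sketch below estimates how long a given volume of data takes to cross a link at 25 Gbps and at 100 Gbps. The function name, the efficiency parameter, and the 1,000 GB sample volume are assumptions; real transfers see protocol overhead and contention, so these are best-case numbers.

```python
def transfer_seconds(data_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case time to move data_gb gigabytes over a link of link_gbps gigabits/s.

    efficiency (0 < efficiency <= 1) scales the link rate for overhead.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    gigabits = data_gb * 8  # 1 byte = 8 bits
    return gigabits / (link_gbps * efficiency)


# Hypothetical 1,000 GB shuffle exchanged between nodes.
for gbps in (25, 100):
    print(f"{gbps} Gbps: {transfer_seconds(1000, gbps):.0f} s")
# prints "25 Gbps: 320 s" then "100 Gbps: 80 s"
```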

Consider segregating data-plane and control-plane traffic onto separate networks.
