Trino Capacity Planning
This page provides comprehensive recommendations for planning and optimizing the capacity of your Trino cluster deployment, ensuring optimal performance, resource utilization, and cost efficiency.
Cluster Sizing Fundamentals
Node Configuration Strategy
The foundation of effective capacity planning is determining the right number and configuration of nodes:
- Start big, then optimize: Begin with larger cluster sizes and optimize downward as you understand your workload patterns.
- Worker-to-coordinator ratio: A cluster requires exactly one coordinator; for larger clusters, dedicating a machine exclusively to coordination provides the best performance (see the configuration sketch after this list).
- Recommended worker count: For clusters where the coordinator and worker nodes share identical specifications, 5-20 worker nodes is typically optimal.
- Coordinator specifications: With a small number of workers, the coordinator can be provisioned with roughly half the specifications of a worker node, but making it significantly less powerful than that risks performance and stability issues.
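As a minimal sketch, the split between a dedicated coordinator and the workers is expressed in each node's config.properties (the hostname below is a placeholder):
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery.uri=http://coordinator.example.com:8080
Each worker then sets coordinator=false and points its discovery.uri at the same coordinator address.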
Understanding Capacity Requirements
Properly sizing your Trino cluster requires understanding three critical capacity dimensions:
- Usual capacity needs: Based on historical workload patterns.
- Current capacity needs: Based on present workload demands.
- Potential capacity needs: Accounting for peak usage scenarios and growth.
For large-scale clusters, it's generally more effective to increase the number of worker nodes rather than scaling up individual node specifications.
Memory Configuration
Memory configuration is one of the most critical aspects of Trino capacity planning, directly impacting query performance, concurrency, and cluster stability.
Coordinator-Worker Ratio and Specifications
Trino’s coordinator-worker architecture demands careful resource allocation:
- Coordinator requirements: Dedicate a node with ≥32 GB RAM and 8–16 vCPUs for clusters with ≤20 workers. For larger clusters (>50 workers), scale to 64–128 GB RAM and 32 vCPUs to handle query planning and scheduling overhead.
- Worker specifications: Use homogeneous nodes with ≥64 GB RAM and 16–32 vCPUs per worker. Trino 472's improved dynamic filtering reduces unnecessary data scans, but memory-intensive operations like hash joins still require a user memory pool of ≥48 GB per worker.
Critical constraint: The coordinator’s JVM heap must be ≥70% of physical RAM but ≤80% to avoid garbage collection stalls. For a 128 GB RAM coordinator:
-Xmx96G
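For context, a full coordinator jvm.config built around that heap setting might look like the sketch below; the flags beyond -Xmx are commonly recommended JVM options for Trino rather than values from this guide, so validate them for your environment:
-server
-Xmx96G
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true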
Memory Configuration Guidelines
Trino Memory Parameters
Balance these key memory settings to optimize performance:
- query.max-memory-per-node: Maximum user memory a query can use on a worker (default: JVM max memory × 0.3).
- query.max-memory: Maximum user memory a query can use across the entire cluster (default: 20GB).
- query.max-total-memory: Maximum total memory including system overhead (default: query.max-memory × 2).
| Parameter | Default | Recommended Adjustment |
|---|---|---|
| query.max-memory | 20 GB | 40–60% of cluster RAM |
| query.max-memory-per-node | 30% of JVM heap | 50–60% of JVM heap |
| memory.heap-headroom-per-node | 30% of JVM heap | 20% of JVM heap |
For a worker with 128 GB RAM (JVM -Xmx96G):
query.max-memory=1.5TB
query.max-memory-per-node=48G
memory.heap-headroom-per-node=19G
This configuration prevents worker OOM kills while maximizing memory utilization.
Worker Node Count vs. Vertical Scaling
- Small clusters (5–20 workers): Prefer fewer larger nodes (e.g., AWS m5.8xlarge) to minimize per-node overhead.
- Large clusters (>50 workers): Scale horizontally with ≥32-core nodes to leverage Trino 472's enhanced task distribution.
Performance testing data:
| Node Count | Avg. Query Time (s) | Cost/Hour ($) |
|---|---|---|
| 10 | 45 | 12.80 |
| 20 | 28 | 25.60 |
| 50 | 22 | 64.00 |
Trino v472 shows 15–20% better linear scaling compared to v471 due to reduced coordinator bottlenecks.
Java 23 Adjustments
Trino 472 requires Java 23; the Eclipse Temurin JDK is recommended. Key impacts:
- Garbage collection: ZGC reduces pause times by 70% for ≥64 GB heaps.
- Memory alignment: 64-byte cache line optimization improves scan speeds by 8–12%.
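Opting into ZGC happens in jvm.config; a minimal sketch, assuming you benchmark it against the default collector for your workload:
-Xmx96G
-XX:+UseZGC
On Java 23 this enables generational ZGC by default, since the non-generational mode is deprecated.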
Network and Storage Considerations
Inter-Node Communication
- 10 Gbps networking: Mandatory for clusters >20 nodes to prevent shuffle bottlenecks.
- Topology awareness: Configure topology-aware scheduling through the node-scheduler.* properties in config.properties to minimize cross-AZ traffic, as sketched below.
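A sketch of topology-aware scheduling, assuming a hypothetical topology file that maps workers to racks; check the node-scheduler property names against your release:
node-scheduler.policy=topology
node-scheduler.network-topology.type=file
node-scheduler.network-topology.file=/etc/trino/network-topology.txt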
Spill-to-Disk Configuration
For memory-constrained environments, allow queries to spill intermediate data to local disk:
spill-enabled=true
# Compressing spilled pages can reduce disk I/O at the cost of extra CPU load to compress and decompress them (LZ4 is one supported codec).
spill-compression-codec=LZ4
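Spilling also needs at least one local spill directory, ideally on fast disk separate from the OS volume. The path and cap below are placeholders rather than recommendations, and the exact property names should be verified against your release:
spiller-spill-path=/mnt/fast-ssd/trino-spill
max-spill-per-node=200GB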
System-Level Configuration
- Disable Linux swap: Trino assumes swap is disabled.
sudo sysctl vm.swappiness=0
Make this change permanent by adding vm.swappiness=0 to /etc/sysctl.conf.
- Set appropriate ulimits: Configure adequate limits for the Trino user.
trino soft nofile 131072
trino hard nofile 131072
trino soft nproc 128000
trino hard nproc 128000
These settings are typically placed in /etc/security/limits.conf.
Resource Groups Configuration
Resource groups are essential for workload management, allowing you to control resource allocation and enforce queuing policies.
Configuration Approaches
You can implement resource groups using either:
- File-based manager: Configure via a JSON file.
resource-groups.configuration-manager=file
resource-groups.config-file=/etc/trino/conf/resource-groups.json
- Database-based manager: Store configuration in a relational database (MySQL, PostgreSQL, or Oracle).
resource-groups.configuration-manager=db
resource-groups.config-db-url=jdbc:mysql://localhost:3306/resource_groups
resource-groups.config-db-user=username
resource-groups.config-db-password=password
Resource groups provide graceful handling of resource limitations—when a group exhausts its allocated resources, new queries are queued rather than causing running queries to fail (with the exception of queued query limits).
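As a minimal sketch of the file-based approach, a resource-groups.json with one group per workload tier and a catch-all selector might look like the following; the group names, limits, and selector patterns are illustrative only:
{
  "rootGroups": [
    {
      "name": "etl",
      "softMemoryLimit": "60%",
      "hardConcurrencyLimit": 10,
      "maxQueued": 100
    },
    {
      "name": "adhoc",
      "softMemoryLimit": "40%",
      "hardConcurrencyLimit": 20,
      "maxQueued": 200
    }
  ],
  "selectors": [
    { "source": ".*etl.*", "group": "etl" },
    { "user": ".*", "group": "adhoc" }
  ]
}
Selectors are evaluated in order and the first match wins, so the broad catch-all rule goes last.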
Performance Optimization Strategies
Query Optimization
- Monitor CPU and memory utilization: Regular monitoring helps identify bottlenecks and optimize resource allocation.
- Optimize join strategies: The order of tables in joins significantly impacts performance; use the Cost-Based Optimizer (CBO) to make intelligent join decisions.
- Leverage dynamic filtering: Version 452 introduced improved performance for selective joins through fine-grained filtering of rows using dynamic filters (enabled by default via enable-dynamic-row-filtering).
- Apply partitioning and bucketing: Proper data organization significantly improves query performance, especially with large datasets.
- Use EXPLAIN ANALYZE: Regularly analyze and optimize query plans to identify performance bottlenecks (see the example after this list).
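For example, to see per-operator CPU time, input rows, and memory for a join (table and column names here are purely illustrative):
EXPLAIN ANALYZE
SELECT o.orderstatus, count(*)
FROM orders o
JOIN lineitem l ON o.orderkey = l.orderkey
GROUP BY o.orderstatus;
The annotated plan makes skewed joins, oversized scans, and misordered join inputs easy to spot.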
Writer Scaling
Trino 472 supports dynamic writer scaling, which adjusts the number of writer tasks based on workload demands:
- scale-writers: Enable writer scaling across the cluster (default: true)
- task.scale-writers.enabled: Enable scaling writers within a task (default: true)
- writer-scaling-min-data-processed: Minimum amount of uncompressed data that must be processed before adding another writer (default: 100MB).
Enable in config.properties:
scale-writers=true
task.scale-writers.enabled=true
writer-scaling-min-data-processed=200MB
This feature is particularly beneficial when using connectors like Hive that produce files per writer, as fewer writers result in larger average file sizes while maintaining performance and lowering HDFS namenode pressure.
Performance Improvements
- Enhanced performance for queries with ORDER BY ... LIMIT combined with subqueries.
- Fixed failures for queries with large numbers of expressions in the SELECT clause.
- Improved performance for connector-specific operations, particularly with the Iceberg connector.
Monitoring and Tuning
Critical Metrics Dashboard
| Metric | Alert Threshold | Optimization Action |
|---|---|---|
| Query queued time | >30s | Add workers or reduce concurrency |
| Heap used % | >85% | Increase query.max-memory-per-node |
| CPU steal time | >20% | Migrate to dedicated hosts |
| Network retransmits | >5% | Check switch congestion |
Use Pulse with these thresholds to maintain SLA compliance.
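Beyond dashboards, ad hoc checks are possible with the JMX connector, which exposes JVM and Trino MBeans as queryable tables. Assuming a jmx catalog is configured, this example checks file descriptor headroom on every node:
SELECT node, openfiledescriptorcount, maxfiledescriptorcount
FROM jmx.current."java.lang:type=operatingsystem";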
Conclusion
Trino clusters require a balanced approach combining:
- Memory-centric worker sizing with ≥64 GB/node
- Coordinator isolation for clusters >20 nodes
- Horizontal scaling over vertical scaling
- Version-aware tuning for writer scaling and Java 23 optimizations
Effective capacity planning for Trino requires balancing cluster size, memory configuration, resource allocation, and performance optimization. By understanding your workload patterns and applying the recommendations in this guide, you can achieve a cost-effective deployment that delivers optimal performance for your specific use case.
Start with a slightly over-provisioned cluster, monitor performance metrics closely, and iteratively optimize your configuration based on real-world usage patterns. With proper capacity planning, your Trino deployment can efficiently handle everything from routine analytics to peak workloads while maintaining performance and controlling costs.