Apache Pinot Sizing Guide
Apache Pinot cluster hardware sizing depends on several factors, including:
- Data ingestion rate (real-time or batch)
- Query concurrency and latency requirements (Read and Write QPS)
- Data volume (raw data size and segment size after ingestion)
- Query complexity (filters, aggregations, joins)
- Use of features like star-tree indexing, inverted indexes, etc.
Here’s a general guideline for hardware sizing across key Pinot components:
Real-time Ingestion Nodes (Server Nodes)
These handle consuming from Kafka (or another stream), indexing, and serving queries.
Parameter | Guideline |
---|---|
CPU | 8–32 vCPUs per node (more cores for high ingest/query workloads) |
Memory (RAM) | 32–128 GB (based on segment size in memory, indexes) |
Disk (SSD recommended) | 1–2 TB NVMe SSD (ensure high IOPS; Pinot is I/O intensive) |
Network | ≥10 Gbps (especially for high ingest rate and query throughput) |
# of Nodes | Start with 3–5, scale based on data size and QPS |
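To see how the memory row above relates to ingest parameters, here is a minimal back-of-envelope sketch. It assumes consuming (in-flight) segments are held in memory until they reach a flush threshold; the partition count, flush size, and index-overhead multiplier are illustrative assumptions rather than Pinot defaults.

```python
# Back-of-envelope estimate of per-server memory needed for consuming
# (in-flight) real-time segments. All inputs are illustrative assumptions,
# not Pinot defaults; adjust them to your stream and table config.
import math

def realtime_consuming_memory_gb(
    stream_partitions: int,      # Kafka partitions feeding the table
    replication: int,            # table replication factor
    num_servers: int,            # real-time server nodes
    segment_flush_mb: float,     # target in-memory size before a segment is sealed
    index_overhead: float = 1.3, # rough multiplier for dictionaries/indexes (assumption)
) -> float:
    """Approximate GB each server needs just for in-flight segment data."""
    consuming_segments = stream_partitions * replication          # one per partition replica
    per_server = math.ceil(consuming_segments / num_servers)      # spread across servers
    return per_server * segment_flush_mb * index_overhead / 1024  # MB -> GB

# Example: 64 partitions, replication 2, 5 servers, ~200 MB flush size
# -> roughly 6-7 GB per server for in-flight data alone.
print(round(realtime_consuming_memory_gb(64, 2, 5, 200), 1))
```

Completed segments (memory-mapped) and query execution need headroom on top of this, which is why the table above suggests 32–128 GB rather than just the in-flight footprint.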
Offline Ingestion/Storage Nodes
These serve queries over large volumes of historical data; segments are built offline and loaded onto these nodes from deep store (e.g., HDFS/S3).
Parameter | Guideline |
---|---|
CPU | 8–16 vCPUs |
Memory | 32–64 GB |
Disk (SSD preferred) | 1–4 TB (based on segment storage needs) |
# of Nodes | 3–10+ (depends on data volume and retention) |
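The node count here is usually storage-bound. A rough sketch of the arithmetic, where the raw-to-segment compression ratio and free-disk headroom are assumptions you should measure against a sample of your own data:

```python
# Storage-bound estimate of the offline server count. The compression ratio
# and headroom are assumptions; measure them on a sample of your own data.
import math

def offline_server_count(
    raw_data_tb: float,        # raw historical data within the retention window
    compression_ratio: float,  # raw-to-segment size ratio (assumption)
    replication: int,          # segment replication factor
    disk_per_node_tb: float,   # usable SSD per server
    headroom: float = 0.7,     # keep ~30% disk free for downloads/rebalances (assumption)
) -> int:
    segment_tb = raw_data_tb / compression_ratio
    total_tb = segment_tb * replication
    return math.ceil(total_tb / (disk_per_node_tb * headroom))

# Example: 10 TB raw, ~3x compression, replication 2, 2 TB disks -> 5 nodes
print(offline_server_count(10, 3, 2, 2))
```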
Broker Nodes
These route queries to the appropriate servers and merge the partial results before returning them to the client.
Parameter | Guideline |
---|---|
CPU | 4–8 vCPUs |
Memory | 16–32 GB |
# of Nodes | 2–4 (scale based on concurrency and latency targets) |
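Broker count is driven mostly by query concurrency. A simple throughput-bound sketch, where the sustainable QPS per broker is an assumption to be replaced with a load-test measurement at your latency target:

```python
# Throughput-bound estimate of broker count. The per-broker QPS capacity is
# an assumption; derive it from a load test at your target latency.
import math

def broker_count(
    peak_qps: float,          # peak query rate against the cluster
    qps_per_broker: float,    # sustainable QPS per broker at target latency (assumption)
    min_brokers: int = 2,     # keep at least two for availability
) -> int:
    return max(min_brokers, math.ceil(peak_qps / qps_per_broker))

# Example: ~100 QPS with an assumed 40 QPS per broker -> 3 brokers
print(broker_count(peak_qps=100, qps_per_broker=40))
```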
Controller Nodes
Coordinate cluster metadata, segment assignment, and retention policies.
Parameter | Guideline |
---|---|
CPU | 2–4 vCPUs |
Memory | 8–16 GB |
Disk | 100–200 GB |
# of Nodes | 2 (HA via active-standby setup) |
Example: Sizing for 10 TB of data with moderate real-time ingestion and ~100 QPS
Role | #Nodes | CPU | RAM | Disk (SSD) |
---|---|---|---|---|
Server | 5 | 16 vCPUs | 64 GB | 2 TB |
Broker | 3 | 8 vCPUs | 32 GB | 500 GB |
Controller | 2 | 4 vCPUs | 16 GB | 100 GB |
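As a rough cross-check of this example, the same back-of-envelope arithmetic can verify that the server tier covers both the storage footprint and the QPS target; the compression ratio, replication factor, and per-server QPS used below are illustrative assumptions, not measured values.

```python
# Sanity-check a proposed cluster spec against a data volume and QPS target.
# Compression ratio, replication, and per-server QPS are illustrative assumptions.
def cluster_fits(raw_tb, compression_ratio, replication,
                 servers, disk_per_server_tb, peak_qps, qps_per_server):
    segment_tb = raw_tb / compression_ratio * replication           # storage footprint
    storage_ok = segment_tb <= servers * disk_per_server_tb * 0.7   # ~30% headroom
    qps_ok = peak_qps <= servers * qps_per_server                   # throughput check
    return storage_ok and qps_ok

# The 10 TB / ~100 QPS example above, assuming ~3x compression,
# replication factor 2, and ~50 QPS per server:
print(cluster_fits(raw_tb=10, compression_ratio=3, replication=2,
                   servers=5, disk_per_server_tb=2,
                   peak_qps=100, qps_per_server=50))   # -> True
```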
Additional Notes
- Pinot memory usage is driven by how segments are loaded (on-heap vs. memory-mapped/off-heap) and by query execution.
- Disk: Prefer SSDs for segment scan speed, especially for star-tree or sorted indexes.
- Co-locating real-time and offline workloads on the same servers is possible, but avoid it in production if latency is critical.
- In Kubernetes or YARN environments, use auto-scaling based on CPU and memory utilization.
For more details on Pinot sizing, see Capacity Planning in Apache Pinot - Part 1 | StarTree.