Apache Pinot Sizing Guide

Apache Pinot cluster hardware sizing depends on several factors, including:

  • Data ingestion rate (real-time or batch); the sketch after this list shows how this translates into stored data volume
  • Query concurrency and latency requirements (read and write QPS)
  • Data volume (raw data and post-ingestion segment size)
  • Query complexity (filters, aggregations, joins)
  • Use of features such as star-tree indexes, inverted indexes, etc.
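
To see how ingestion rate and retention translate into stored data volume (which drives most of the sizing below), here is a small back-of-envelope sketch. The event rate, event size, compression ratio, and retention values are illustrative assumptions, not measurements or Pinot defaults; substitute figures from your own stream and segments.

```python
# Back-of-envelope data volume estimate from ingestion rate and retention.
# All inputs are illustrative assumptions -- replace with your own numbers.

events_per_sec = 10_000       # assumed average ingest rate
avg_event_bytes = 200         # assumed raw event size
compression_ratio = 0.5       # assumed on-disk segment size vs. raw data
retention_days = 90           # assumed retention window
replication_factor = 2        # assumed segment replication

raw_gb_per_day = events_per_sec * avg_event_bytes * 86_400 / 1e9
on_disk_gb_per_day = raw_gb_per_day * compression_ratio * replication_factor
total_tb = on_disk_gb_per_day * retention_days / 1_000

print(f"raw ingest:       {raw_gb_per_day:,.0f} GB/day")
print(f"stored (on disk): {on_disk_gb_per_day:,.0f} GB/day")
print(f"total at {retention_days} days retention: {total_tb:,.1f} TB")
```

With these assumptions the cluster stores roughly 15–16 TB of segments over the retention window; that total is the main input to the server sizing below.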

Here’s a general guideline for hardware sizing across key Pinot components:

Real-time Ingestion Nodes (Server Nodes)

These handle consuming from Kafka (or another stream), indexing, and serving queries; a rough memory sketch follows the table below.

Parameter              | Guideline
-----------------------|------------------------------------------------------------------
CPU                    | 8–32 vCPUs per node (more cores for high ingest/query workloads)
Memory (RAM)           | 32–128 GB (based on segment size in memory, indexes)
Disk (SSD recommended) | 1–2 TB NVMe SSD (ensure high IOPS; Pinot is I/O intensive)
Network                | ≥10 Gbps (especially for high ingest rate and query throughput)
# of Nodes             | Start with 3–5, scale based on data size and QPS
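
Memory on real-time servers is dominated by in-flight consuming segments, on top of completed segments and query execution. The sketch below estimates that consuming-segment footprint; the partition count, flush threshold, and per-row footprint are illustrative assumptions, not Pinot defaults.

```python
# Rough memory estimate for a real-time Pinot server. The constants are
# illustrative assumptions -- measure your own row width and use the flush
# threshold you actually configure.
import math

kafka_partitions = 32                    # assumed partitions of the ingested topic
realtime_servers = 4                     # assumed real-time server count
rows_per_consuming_segment = 5_000_000   # assumed flush threshold (rows)
bytes_per_row_in_memory = 300            # assumed indexed row footprint

partitions_per_server = math.ceil(kafka_partitions / realtime_servers)
consuming_gb = (partitions_per_server * rows_per_consuming_segment
                * bytes_per_row_in_memory) / 1e9

print(f"partitions per server:       {partitions_per_server}")
print(f"consuming-segment footprint: {consuming_gb:.1f} GB per server")
```

Leave additional headroom for completed segments (often memory-mapped), query execution, and JVM overhead when choosing the RAM figure from the table above.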

Offline Ingestion/Storage Nodes

These serve large volumes of historical (batch-ingested) data; segments built offline are downloaded from deep store (HDFS/S3) onto these nodes. A storage-driven node-count sketch follows the table below.

Parameter            | Guideline
---------------------|----------------------------------------------
CPU                  | 8–16 vCPUs
Memory               | 32–64 GB
Disk (SSD preferred) | 1–4 TB (based on segment storage needs)
# of Nodes           | 3–10+ (depends on data volume and retention)
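
Offline server count mostly follows from how much segment data must sit on local disk. A minimal sketch, assuming a compression ratio, replication factor, and usable disk per node that you would replace with your own measurements:

```python
# Offline server count from storage needs. Compression ratio, replication,
# and usable disk per node are illustrative assumptions.
import math

raw_data_tb = 10          # historical data to keep online
compression_ratio = 0.5   # assumed segment size vs. raw data
replication_factor = 2    # assumed replicas per segment
usable_disk_tb = 3.0      # assumed usable SSD per node (leave headroom)

on_disk_tb = raw_data_tb * compression_ratio * replication_factor
nodes = max(3, math.ceil(on_disk_tb / usable_disk_tb))
print(f"segment data on disk: {on_disk_tb:.1f} TB -> {nodes} offline servers")
```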

Broker Nodes

These handle query routing and merging of results across servers; a QPS-driven sizing sketch follows the table below.

Parameter  | Guideline
-----------|------------------------------------------------------
CPU        | 4–8 vCPUs
Memory     | 16–32 GB
# of Nodes | 2–4 (scale based on concurrency and latency targets)
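
Broker count is usually driven by query concurrency rather than data size. A minimal sketch is below; the per-broker QPS capacity and headroom factor are assumptions to replace with load-test results for your actual queries.

```python
# Broker count from peak query load. The per-broker QPS capacity is an
# illustrative assumption -- derive it from a load test on your queries.
import math

peak_qps = 100        # expected peak query rate
qps_per_broker = 60   # assumed sustainable QPS per broker for this workload
headroom = 1.5        # allow for spikes and a broker being down

brokers = max(2, math.ceil(peak_qps * headroom / qps_per_broker))
print(f"brokers needed: {brokers}")
```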

Controller Nodes

Controllers coordinate cluster metadata, segment assignment, and retention policies.

Parameter  | Guideline
-----------|----------------------------------
CPU        | 2–4 vCPUs
Memory     | 8–16 GB
Disk       | 100–200 GB
# of Nodes | 2 (HA via active-standby setup)

Example: Sizing for 10 TB of data with moderate real-time ingestion and ~100 QPS

Role       | # Nodes | CPU      | RAM   | Disk (SSD)
-----------|---------|----------|-------|-----------
Server     | 5       | 16 vCPUs | 64 GB | 2 TB
Broker     | 3       | 8 vCPUs  | 32 GB | 500 GB
Controller | 2       | 4 vCPUs  | 16 GB | 100 GB
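
As a rough consistency check, the sketch below reproduces the server and broker counts above using the same style of assumptions as the earlier sketches (compression roughly offsetting a replication factor of 2, and an assumed per-broker QPS capacity). It is an illustration, not the original derivation of the table.

```python
# Rough consistency check for the 10 TB / ~100 QPS example. All constants
# are illustrative assumptions, not the original derivation of the table.
import math

raw_data_tb, peak_qps = 10, 100

# Servers: storage-driven; assume ~0.5 compression offsets replication factor 2.
on_disk_tb = raw_data_tb * 0.5 * 2                 # ~10 TB of segments
servers = math.ceil(on_disk_tb / 2.0)              # 2 TB SSD per server -> 5

# Brokers: QPS-driven; assume ~60 QPS per broker with 1.5x headroom.
brokers = max(2, math.ceil(peak_qps * 1.5 / 60))   # -> 3

print(f"servers: {servers}, brokers: {brokers}, controllers: 2")
```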

Additional Notes

  • Pinot memory usage is driven by segment loading (heap vs. memory-mapped read mode) and by query execution.
  • Disk: prefer SSDs for fast segment scans, especially with star-tree or sorted indexes.
  • Co-locating real-time and offline servers on the same hosts is possible, but avoid it in production if latency is critical.
  • In Kubernetes or YARN environments, use auto-scaling based on CPU and memory utilization.

For more details on Pinot sizing, see "Capacity Planning in Apache Pinot - Part 1" on the StarTree blog.
