Title
Create new category
Edit page index title
Edit category
Edit link
Superset Pinot testing - Batch Data
Apache Pinot Batch Data Example With Superset
This page covers ingesting batch data into Pinot and querying it from Superset.
Assumption: Apache Pinot and Superset are already installed and running.
Set up: Define Host Variable
xxxxxxxxxx# Set to current machine hostnameexport PINOT_HOST=$(hostname -f) # Verifyecho "Pinot Host: $PINOT_HOST"Step 1: Ingest Batch Data into Pinot
Choose an Example Dataset
Pinot includes several example datasets in examples/batch/:
| Dataset | Description |
|---|---|
| airlineStats | Flight statistics data |
| baseballStats | Baseball player statistics |
| billing | Billing records |
| starbucksStores | Starbucks store locations |
| githubEvents | GitHub event data |
Create Schema and Table
xxxxxxxxxxcd /path/to/pinot # Create schemabin/pinot-admin.sh AddSchema \ -schemaFile examples/batch/airlineStats/airlineStats_schema.json \ -controllerHost $PINOT_HOST \ -controllerPort 9000 \ -exec # Create tablebin/pinot-admin.sh AddTable \ -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \ -schemaFile examples/batch/airlineStats/airlineStats_schema.json \ -controllerHost $PINOT_HOST \ -controllerPort 9000 \ -execVerify:
xxxxxxxxxxcurl http://$PINOT_HOST:9000/schemas # Should list: airlineStatscurl http://$PINOT_HOST:9000/tables # Should list: airlineStats_OFFLINEIf AddTable fails or tables list is empty, use curl instead:
xxxxxxxxxxcurl -X POST "http://$PINOT_HOST:9000/tables" \ -H "Content-Type: application/json" \ -d @examples/batch/airlineStats/airlineStats_offline_table_config.jsonIngest Data
Update the default ingestion job file to use your hostname (creates .bak backup):
xxxxxxxxxxsed -i.bak "s/localhost/$PINOT_HOST/g" examples/batch/airlineStats/ingestionJobSpec.yamlRun the ingestion:
xxxxxxxxxxbin/pinot-admin.sh LaunchDataIngestionJob \ -jobSpecFile examples/batch/airlineStats/ingestionJobSpec.yamlTo restore the original file: mv examples/batch/airlineStats/ingestionJobSpec.yaml.bak examples/batch/airlineStats/ingestionJobSpec.yaml
Verify Data
xxxxxxxxxxbin/pinot-admin.sh PostQuery \ -brokerHost $PINOT_HOST \ -brokerPort 8099 \ -query "SELECT COUNT(*) FROM airlineStats"Results
[root@newsuper-2 pinot]# bin/pinot-admin.sh PostQuery \> -brokerHost $PINOT_HOST \> -brokerPort 8099 \> -query "SELECT COUNT(*) FROM airlineStats"2026/04/02 06:21:17.572 INFO [PostQueryCommand] [main] Executing command: PostQuery -brokerProtocol http -brokerHost newsuper-2.newsuper.harshith.svc.cluster.local -brokerPort 8099 -query SELECT COUNT(*) FROM airlineStats2026/04/02 06:21:17.751 INFO [PostQueryCommand] [main] Result: {"resultTable":{"dataSchema":{"columnNames":["count(*)"],"columnDataTypes":["LONG"]},"rows":[[9746]]},"numRowsResultSet":1,"partialResult":false,"exceptions":[],"numGroupsLimitReached":false,"numGroupsWarningLimitReached":false,"timeUsedMs":131,"requestId":"299917538000000004","clientRequestId":null,"brokerId":"Broker_newsuper-2.newsuper.harshith.svc.cluster.local_8099","numDocsScanned":31,"totalDocs":9746,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":31,"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":31,"numSegmentsProcessed":31,"numSegmentsMatched":31,"numConsumingSegmentsQueried":0,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"minConsumingFreshnessTimeMs":0,"numSegmentsPrunedByBroker":0,"numSegmentsPrunedByServer":0,"numSegmentsPrunedInvalid":0,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":0,"brokerReduceTimeMs":6,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"explainPlanNumEmptyFilterSegments":0,"explainPlanNumMatchAllFilterSegments":0,"traceInfo":{},"tablesQueried":["airlineStats"],"offlineThreadMemAllocatedBytes":0,"realtimeThreadMemAllocatedBytes":0,"offlineResponseSerMemAllocatedBytes":0,"realtimeResponseSerMemAllocatedBytes":0,"offlineTotalMemAllocatedBytes":0,"realtimeTotalMemAllocatedBytes":0,"pools":[-1],"rlsFiltersApplied":false,"groupsTrimmed":false}[root@newsuper-2 pinot]#Step 2: Connect Superset to Pinot
Add Database Connection
- In Superset, go to Settings → Database Connections
- Click + Database
- Select Apache Pinot (or "Other")
- Enter the SQLAlchemy URI:
xxxxxxxxxx# Generate connection stringecho "pinot://${PINOT_HOST}:8099/query/sql?controller=http%3A%2F%2F${PINOT_HOST}%3A9000"Format:
xxxxxxxxxxpinot://<broker>:8099/query/sql?controller=http%3A%2F%2F<controller>%3A9000Example:
pinot://newsuper-2.newsuper.harshith.svc.cluster.local:8099/query/sql?controller=http%3A%2F%2Fnewsuper-2.newsuper.harshith.svc.cluster.local%3A9000
- Click Test Connection → Connect
Create Dataset
- Go to Datasets → + Dataset
- Select your Pinot database
- Select schema:
default - Select table:
airlineStats - Click Create Dataset and Create Chart
Step 3: Query in Superset
SQL Lab Queries
Go to SQL Lab → SQL Editor:
Row count:
xxxxxxxxxxSELECT COUNT(*) as total_flights FROM airlineStats
Flights by carrier:
xxxxxxxxxxSELECT Carrier, COUNT(*) as flight_count, AVG(ArrDelay) as avg_arrival_delayFROM airlineStatsGROUP BY CarrierORDER BY flight_count DESCLIMIT 20
Delays by day of week:
xxxxxxxxxxSELECT DayOfWeek, COUNT(*) as flights, AVG(ArrDelay) as avg_delayFROM airlineStatsGROUP BY DayOfWeekORDER BY DayOfWeek
Top routes:
xxxxxxxxxxSELECT Origin, Dest, COUNT(*) as flight_countFROM airlineStatsGROUP BY Origin, DestORDER BY flight_count DESCLIMIT 10
Create Charts
To create a chart: Datasets → click on airlineStats → Create Chart → select chart type.
Bar Chart - Flights by Carrier
| Field | Value | Description |
|---|---|---|
| X-Axis | Carrier | Airline carrier codes (AA, UA, DL, etc.) shown on the horizontal axis |
| Metrics | COUNT(*) | Number of flights (bar height) |
| Dimensions | (leave empty) | Not needed for a simple bar chart |

Line Chart - Average Delay by Day of Week
| Field | Value | Description |
|---|---|---|
| X-Axis | DayOfWeek | Day of week (1-7) shown on horizontal axis |
| Metrics | AVG(ArrDelay) | Average arrival delay in minutes (line value) |
| Dimensions | (leave empty) | Not needed for single line |

Pie Chart - Flights by Origin Airport
| Field | Value | Description |
|---|---|---|
| X-Axis | (not used) | Pie charts don't use X-axis |
| Metrics | COUNT(*) | Number of flights (slice size) |
| Dimensions | Origin | Airport codes - each slice represents an origin airport |
| Row Limit | 10 | Show top 10 airports only |

Don't put the same column in both X-Axis and Dimensions - this causes a "Duplicate column/metric labels" error.
Create DashBoards
Hit Dashboards on the top bar and create a new dashboard, import charts (which we just created ) as shown by drag and drop, give it a name on the top left, and save.

Optionally, you can publish a chart
- appears in the Charts list
- can be added to dashboards
- can be shared with others
- persists in Superset DB


You can download dashboards or individual charts by clicking the 3-dot menu on the top right.
Quick Reference
Connection String
xxxxxxxxxxexport PINOT_HOST=$(hostname -f)echo "pinot://${PINOT_HOST}:8099/query/sql?controller=http%3A%2F%2F${PINOT_HOST}%3A9000"Pinot SQL Notes
- Supported:
SELECT,GROUP BY,ORDER BY,LIMIT, aggregations - Not supported:
JOIN, subqueries, window functions
Health Checks
xxxxxxxxxxcurl http://$PINOT_HOST:9000/health # Controllercurl http://$PINOT_HOST:8099/health # Broker