Superset Pinot testing - Batch Data

Apache Pinot Batch Data Example With Superset

This page covers ingesting batch data into Pinot and querying it from Superset.

Assumption: Apache Pinot and Superset are already installed and running.

Set up: Define Host Variable

Bash
Copy

Step 1: Ingest Batch Data into Pinot

Choose an Example Dataset

Pinot includes several example datasets in examples/batch/:

DatasetDescription
airlineStatsFlight statistics data
baseballStatsBaseball player statistics
billingBilling records
starbucksStoresStarbucks store locations
githubEventsGitHub event data

Create Schema and Table

Bash
Copy

Verify:

Bash
Copy

If AddTable fails or tables list is empty, use curl instead:

Bash
Copy

Ingest Data

Update the default ingestion job file to use your hostname (creates .bak backup):

Bash
Copy

Run the ingestion:

Bash
Copy

To restore the original file: mv examples/batch/airlineStats/ingestionJobSpec.yaml.bak examples/batch/airlineStats/ingestionJobSpec.yaml

Verify Data

Bash
Copy

Results

Bash
Copy

Step 2: Connect Superset to Pinot

Add Database Connection

  1. In Superset, go to SettingsDatabase Connections
  2. Click + Database
  3. Select Apache Pinot (or "Other")
  4. Enter the SQLAlchemy URI:
Bash
Copy

Format:

Bash
Copy

Example:

Bash
Copy
  1. Click Test ConnectionConnect

Create Dataset

  1. Go to Datasets+ Dataset
  2. Select your Pinot database
  3. Select schema: default
  4. Select table: airlineStats
  5. Click Create Dataset and Create Chart

Step 3: Query in Superset

SQL Lab Queries

Go to SQL LabSQL Editor:

Row count:

Bash
Copy

Flights by carrier:

Bash
Copy

Delays by day of week:

Bash
Copy

Top routes:

Bash
Copy

Create Charts

To create a chart: Datasets → click on airlineStatsCreate Chart → select chart type.

Bar Chart - Flights by Carrier

FieldValueDescription
X-AxisCarrierAirline carrier codes (AA, UA, DL, etc.) shown on the horizontal axis
MetricsCOUNT(*)Number of flights (bar height)
Dimensions(leave empty)Not needed for a simple bar chart

Line Chart - Average Delay by Day of Week

FieldValueDescription
X-AxisDayOfWeekDay of week (1-7) shown on horizontal axis
MetricsAVG(ArrDelay)Average arrival delay in minutes (line value)
Dimensions(leave empty)Not needed for single line

Pie Chart - Flights by Origin Airport

FieldValueDescription
X-Axis(not used)Pie charts don't use X-axis
MetricsCOUNT(*)Number of flights (slice size)
DimensionsOriginAirport codes - each slice represents an origin airport
Row Limit10Show top 10 airports only

Don't put the same column in both X-Axis and Dimensions - this causes a "Duplicate column/metric labels" error.

Create DashBoards

Hit Dashboards on the top bar and create a new dashboard, import charts (which we just created ) as shown by drag and drop, give it a name on the top left, and save.

Optionally, you can publish a chart

  • appears in the Charts list
  • can be added to dashboards
  • can be shared with others
  • persists in Superset DB

You can download dashboards or individual charts by clicking the 3-dot menu on the top right.

Quick Reference

Connection String

Bash
Copy

Pinot SQL Notes

  • Supported: SELECT, GROUP BY, ORDER BY, LIMIT, aggregations
  • Not supported: JOIN, subqueries, window functions

Health Checks

Bash
Copy
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard
  Last updated