Kudu Dashboard

The Kudu Dashboard presents summary panels and charts that cover Master and Tablet Server health, Block Cache and RPC performance, and RowSet activity and transactions, helping you understand service status and operational activity.

Master and Tablet Server Health Overview

Summary Panel

This panel summarizes the health and status of Kudu’s Master and Tablet Servers, including server counts, data directory usage, and Raft leadership roles. It helps you quickly assess availability and storage conditions.

Metric | Description
Number of Master Servers | Shows the number of master servers currently registered and running in the cluster.
Number of Tablet Servers | Shows the number of tablet servers currently registered and running in the cluster.

Master Server

This panel shows the count and status of data directory paths used by the Master Server process.

  • Data Directories Full: Number of directories that have reached full capacity.
  • Total Data Directories Space Available: Combined available storage across all data directories.
  • Failed Data Directories: Number of directories that are no longer accessible or have encountered errors.

Tablet Server

This panel shows the count and status of data directory paths used by the Tablet Server process.

  • Data Directories Full: Number of directories with no remaining space.
  • Total Data Directories Space Available: Total available space across all tablet data directories.
  • Failed Data Directories: Number of tablet data directories that have failed or are inaccessible.

Number of Raft Leaders

Master: Shows the number of master replicas that are Raft leaders.

Tablet: Shows the number of tablet replicas that are Raft leaders.

Charts

The charts display CPU and memory usage for Kudu Master and Tablet Servers, helping you track resource consumption and identify potential bottlenecks on each node.

Metric | Description
CPU Master Server | Shows the CPU time consumed by Kudu Master processes on each node.
CPU Tablet Server | Shows the CPU time consumed by Kudu Tablet Server processes on each node.
Memory Master Server | Shows the memory usage of master server processes on each node.
Memory Tablet Server | Shows the memory usage of tablet server processes on each node.

Block Cache and RPC Performance Summary

Summary Panel

This panel displays the number of block cache insertions and lookup operations performed on the Master and Tablet Servers.

Metric | Description
Master Block Cache Inserts | Shows the number of blocks added to the master’s block cache.
Master Block Cache Lookups | Shows the number of times the master block cache was accessed.
Tserver Block Cache Inserts | Shows the number of blocks added to the tablet server’s block cache.
Tserver Block Cache Lookups | Shows the number of times the tablet server block cache was accessed.

Charts

The charts show RPC connection rates and high-percentile latency metrics for Master and Tablet servers, helping you identify network delays and server responsiveness issues.

Metric | Description
RPC Rates | Shows the number of incoming TCP connections made to the RPC server.
RPC Latency | Shows the 99.99th percentile RPC queue time (in microseconds) for Tablet and Master servers.
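
To cross-check the RPC Latency chart outside the dashboard, the same percentile can be read from a server's metrics JSON. The sketch below parses an inline sample payload rather than calling a live server. It assumes the payload is shaped like the output of a Kudu server's /metrics web endpoint, and that the queue-time histogram is named rpc_incoming_queue_time with a percentile_99_99 field. Those names and the sample values are assumptions, not details taken from this page, so verify them against your Kudu version.

```python
import json

# Sample payload, assumed to mirror the JSON from a Kudu server's /metrics endpoint.
# Metric names and values here are illustrative assumptions.
SAMPLE = """
[
  {
    "type": "server",
    "id": "kudu.tabletserver",
    "attributes": {},
    "metrics": [
      {"name": "rpc_connections_accepted", "value": 92},
      {"name": "rpc_incoming_queue_time",
       "total_count": 1000, "min": 5, "mean": 40.5,
       "percentile_99": 310, "percentile_99_99": 1200, "max": 1500}
    ]
  }
]
"""


def find_metric(entities, entity_type, metric_name):
    """Return the first metric dict with this name on an entity of this type, or None."""
    for entity in entities:
        if entity.get("type") != entity_type:
            continue
        for metric in entity.get("metrics", []):
            if metric.get("name") == metric_name:
                return metric
    return None


entities = json.loads(SAMPLE)
queue_time = find_metric(entities, "server", "rpc_incoming_queue_time")

# The dashboard reports this value in microseconds; convert to milliseconds for readability.
p9999_us = queue_time["percentile_99_99"]
p9999_ms = p9999_us / 1000.0
print(f"p99.99 RPC queue time: {p9999_us} us ({p9999_ms} ms)")
```

A sustained rise in this value alongside a flat RPC Rates chart would suggest that requests are waiting in the queue rather than that more clients are connecting.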

RowSet Activity and Transaction Summary

Summary Panel

This panel summarizes RowSets (the on-disk or in-memory segments of each tablet): their memory and disk footprint, running compactions, and deleted rows. It helps you understand data fragmentation and compaction needs.

Metric | Description
RowSet Memory Size | Shows the amount of memory currently used by in-memory RowSets.
RowSet Disk Size | Shows the total disk space occupied by RowSets persisted on disk.
Running Rowset Compactions | Shows the number of RowSet compaction processes currently in progress.
Rows Deleted | Shows the number of rows marked for deletion across the dataset.

Charts

The charts display row-level operations and transaction activity, helping you monitor data changes, track slow scans, and assess overall write and read performance across nodes.

Metric | Description
Rows Inserted | Shows the number of row insert operations over time for each node.
Rows Updated | Shows the number of row update operations on each node during the selected time range.
Rows Deleted | Shows the number of rows deleted on each node over time.
Total Write Transactions Inflight | Shows the number of write transactions in progress at a given time.
Total Transactions In-flight | Shows the number of all in-flight (active) transactions, both read and write, on each node.
Slow Scans | Shows the number of scan operations that exceed the configured latency threshold.
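
When comparing these charts against raw metric values, it helps to know how a cumulative counter becomes an "over time" series. The sketch below assumes that a row counter such as Rows Inserted only ever increases while the process runs and resets to zero on restart. This page does not state that, so treat it as an assumption. The function name and the sample data are hypothetical.

```python
def counter_rates(samples):
    """Convert (timestamp_seconds, cumulative_count) samples into per-second rates.

    A drop in the counter is treated as a process restart, so the new value
    is counted from zero instead of producing a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, delta / elapsed))
    return rates


# Hypothetical samples taken every 60 seconds; the last one follows a restart.
samples = [(0, 1000), (60, 1600), (120, 1900), (180, 300)]
rates = counter_rates(samples)
for timestamp, rate in rates:
    print(f"t={timestamp}s: {rate:.1f} rows/s")
```

Handling the reset explicitly keeps a server restart from showing up as a large negative spike in the derived series.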