Spark Thrift
Spark Thrift Dashboard
The Spark Thrift Dashboard provides an overview of the Spark Thrift service, which enables JDBC and ODBC clients to execute Spark SQL queries.
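For context, the queries this dashboard aggregates come from clients connecting over JDBC or ODBC. A minimal JDBC sketch is shown below; the host name, database, and credentials are placeholders, 10000 is the Thrift server's conventional port, and the standard Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftClientSketch {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift Server speaks the HiveServer2 protocol, so the
        // standard Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is used.
        // Host, database, and user below are placeholders.
        String url = "jdbc:hive2://thrift-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

Sessions like this one are what the dashboard's summary panels and charts aggregate.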
To view the Spark Thrift Dashboard, click Spark Thrift > Dashboard. The dashboard consists of summary panels, a Sankey diagram with various metrics, and charts that display job information based on criteria such as memory and core utilization.
The default time range is Last 24 hrs. To view statistics for a custom date range, click the icon and select a time frame and timezone of your choice.
Summary Panels
The following table describes the job summary metrics:
Metric | Description |
---|---|
Users | The total number of users. |
# of Applications | The total number of applications. |
Avg. CPU Allocated | The average CPU time allocated across all jobs. |
Avg. Memory Allocated (MB) | The average amount of memory, in megabytes, allocated across all jobs. |
Charts in Spark Thrift
Context Metric Distributions
The Context Metric Distributions panel displays the summary of jobs as a Sankey diagram.
By default, the chart displays the distribution by Duration. You can choose to display the distribution by Input Data, Output Data, Shuffle Reads, or Shuffle Writes from the drop-down list.
Core Usage by Locality
The Core Usage by Locality chart displays core usage broken down by the following locality levels. The chart also displays Core Used and Core Wasted values (in %). A configuration sketch for influencing task locality follows the list.
- Process Local: Tasks run within the same process as the source data.
- Node Local: Tasks run on the same machine as the source data.
- Rack Local: Tasks run on the same rack as the source data.
- Any: Tasks run elsewhere in the cluster, on a different node and rack from the source data.
- No pref: Tasks that have no locality preference.
- Idle: Core time during which no tasks are running.
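A high share of Rack Local or Any cores can indicate that tasks frequently run far from their data. Spark's locality wait settings (spark.locality.wait and its per-level variants, which are standard Spark properties) control how long the scheduler holds a task for a better locality level before falling back. The sketch below is a minimal illustration, with an illustrative application name; for a Thrift server these properties would normally be set in spark-defaults.conf or on the server launch command rather than in application code.

```java
import org.apache.spark.sql.SparkSession;

public class LocalityWaitSketch {
    public static void main(String[] args) {
        // spark.locality.wait controls how long the scheduler waits for a
        // free core at a better locality level (Process -> Node -> Rack -> Any)
        // before falling back to the next level. Per-level overrides exist.
        SparkSession spark = SparkSession.builder()
                .appName("locality-wait-sketch")          // illustrative name
                .master("local[*]")                       // for a standalone demo
                .config("spark.locality.wait", "3s")      // base wait (Spark's default)
                .config("spark.locality.wait.node", "6s") // hold out longer for Node Local
                .config("spark.locality.wait.rack", "1s") // fall through Rack Local sooner
                .getOrCreate();

        spark.sql("SELECT 1").show();
        spark.stop();
    }
}
```

Longer waits trade scheduling latency for better data locality; shorter waits keep cores busy at the cost of more remote reads.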
Zooming in on Core Usage
You can take a closer look at core usage by zooming in on any timeline in the graph.
To zoom in, click and drag the mouse pointer across the section or timeline you want to magnify. A second graph shows a closer view of the selected section or timeline.
Other Charts in Spark Thrift
The following charts are also displayed on the Spark Thrift Dashboard.
Chart Name | Description |
---|---|
VCore Usage | The number of virtual cores (vCores) used by a queue in the cluster. |
Memory Usage | The amount of memory used by a queue in the cluster during a given timeframe. |
Query Duration Distribution | The number of queries grouped by duration. |
Query Execution Count | The number of queries executed within a timeframe. |
Average Query Time | The average time taken to execute queries. This chart also displays the Total Execution Time. |
Top 20 Users (By Query) | The top 20 users who executed the highest number of queries. |
Top 20 Tables (By Query) | The top 20 tables against which the highest number of queries were executed. |
Storage Memory | The amount of storage memory used by the Spark Thrift application, including Used Memory and Total Memory. |
Spark SQL
The Spark SQL panel displays the list of Thrift servers and their state, either Connected or Disconnected. You can filter the data displayed on the page by clicking the Thrift server you want to view.