Databricks Compute
The following tabs are present in Databricks Compute:

Filters
The Data Source Filter allows you to switch the Databricks data source. This enables you to view and analyze data across various sections based on the selected Databricks account or project, providing flexibility for monitoring and managing information across different data sources.

Overview
The Overview page in Databricks Compute provides a comprehensive and detailed view of your Databricks environment. It displays key information as widgets and graphs, offering insights into cluster performance, resource utilization, and potential issues across your Databricks clusters.

Overview Tab
This section helps you monitor critical metrics such as cluster states, resource consumption, and errors. By leveraging the interactive visualizations and adjustable filters, you can drill down into specific aspects of your environment, allowing you to make data-driven decisions to optimize performance and manage costs effectively.
Widgets | Description |
---|---|
Cluster States | Displays the number of clusters in different states (Pending, Running, Resizing, etc.). This widget provides an at-a-glance view of your cluster operations, helping you monitor the current status and identify any clusters that may require attention. Pending: The number of clusters in pending state during the time period selected in the Global Calendar. Running: The number of clusters in running state during the time period selected in the Global Calendar. Restarting: The number of clusters which are restarting during the time period selected in the Global Calendar. Resizing: The number of clusters which are being resized during the time period selected in the Global Calendar. Terminating: The number of clusters which are being terminated during the time period selected in the Global Calendar. Terminated: The number of clusters which were terminated during the time period selected in the Global Calendar. |
Databricks Users and Applications | Shows who and what is using your clusters. This section displays the two widgets described below. Users: The number of users using clusters during the time period selected in the Global Calendar. Applications: The number of applications used during the time period selected in the Global Calendar. |
Average Core Usage Summary | Total Cores: The total number of available cores. Allocated Cores: The total number of cores allocated out of the total available cores. Used Cores: The total number of cores used out of the total number of allocated cores. |
Average Memory Utilization Summary | This section displays three widgets which highlight usage of CPU memory. The values of the widgets are dependent on the filters selected in Global Calendar. Total Memory: The total amount of memory available. Allocated Memory: The amount of memory allocated out of the total available memory. Used Memory: The total amount of memory used out of the total amount of allocated memory. |
Databricks Top 10 Users | This bar chart displays the list of top 10 users who are provisioning clusters on Databricks. Each bar represents a user. When you hover over a bar, you can view the number of clusters provisioned by that user. The x-axis represents the user's Email IDs and the y-axis represents the number of clusters provisioned. |
Cluster Count by Instance Type | This chart shows the distribution of clusters based on different instance types. Each bar represents an instance type, and its height indicates the number of clusters using that type. This helps visualize the usage patterns of various instance types within the system. |
Active Clusters Over Time | This bar graph represents the number of active clusters during the time period selected in the Global Calendar. The x-axis represents a date and time (values change as per the date and time selected in the Global Calendar). The y-axis represents the active clusters. Each bar represents a date and time. When you hover over a bar, you can view the number of active clusters on the selected date and time. |
Cluster Failure Over Time | Visualizes the number of cluster failures recorded over time. This graph is essential for detecting patterns or spikes in cluster failures, which may indicate underlying issues in the environment or specific workloads that require optimization. The x-axis represents a date and time (values change as per the date and time selected in the Global Calendar). The y-axis represents the number of cluster failures. Each bar represents a date and time. An error code is associated with each cluster failure. When you hover over a bar, you can view the number of failed clusters on the selected date and time, along with the error code for each failure. You can also filter the data in this graph to view data specific to error codes. |
Top Cluster Errors | A cluster error is considered a top error if its occurrence frequency is highest compared to other errors. This table represents those top errors. It has two columns: the first displays the number of times an error occurred, and the second displays the error message associated with that error. |
DBU Consumed | Displays the Databricks Units (DBUs) consumed over time. Tracking DBU consumption is vital for understanding your Databricks usage and associated costs. This widget helps you monitor usage trends and identify opportunities to optimize resource allocation. The x-axis represents a date and time (values change as per the date and time selected in the Global Calendar). The y-axis represents the number of DBUs consumed. Each data point represents a date and time. When you hover over a data point, you can view the number of DBUs consumed on that date and time. |
Average CPU Usage | This trend graph represents the amount of CPU used by the nodes of all cluster types for the selected time period. The x-axis displays the date and time (values change as per the date and time selected in the Global Calendar). The y-axis displays the amount of CPU used by the executor node or driver node. |
Average Memory Used | This trend graph represents the amount of CPU memory used by the nodes of all cluster types for the selected time period. The x-axis displays the date and time (values change as per the date and time selected in the Global Calendar). The y-axis displays the amount of CPU memory used by the executor node or driver node. |
Average Core Usage | This trend graph displays the number of CPU cores used during the time period selected in the Global Calendar. The x-axis displays the date and time (values change as per the date and time selected in the Global Calendar). The y-axis displays the number of cores used. Each trend line represents a core category: available cores, allocated cores, and used cores. Each data point represents a date and time. When you hover over a data point, you can view the total number of available cores, allocated cores, and used cores on that date and time. |
Average Memory Utilization | This trend graph displays the amount of CPU memory used during the time period selected in the Global Calendar. The x-axis displays the date and time (values change as per the date and time selected in the Global Calendar). The y-axis displays the amount of memory used. Each trend line represents a memory category: available memory, allocated memory, and used memory. Each data point represents a date and time. When you hover over a data point, you can view the total amount of memory available, allocated, and used on that date and time. |
Core Wastage Over Time | This trend graph represents the amount of wasted or unused CPU cores over a specific time period in a cluster or system. The x-axis represents the time period, and the y-axis represents the core wastage usually as the absolute number of unused cores. This graph is particularly useful for identifying periods of low workload or idle times when CPU cores are not fully utilized. It helps in assessing the efficiency of resource allocation and workload scheduling, allowing you to optimize resource utilization and minimize wastage. |
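Conceptually, the core wastage plotted in this graph is the gap between allocated and used cores at each sampling interval. The sketch below illustrates that arithmetic; the function name, sample values, and timestamps are hypothetical, not ADOC's actual implementation.

```python
# Illustrative only: core wastage per interval is allocated minus used cores.
samples = [
    # (timestamp, allocated_cores, used_cores) -- hypothetical values
    ("2024-05-01T10:00", 16, 12),
    ("2024-05-01T10:05", 16, 6),
    ("2024-05-01T10:10", 16, 15),
]

def core_wastage(allocated, used):
    """Unused cores in one sampling interval."""
    return max(allocated - used, 0)

wastage_over_time = [(ts, core_wastage(a, u)) for ts, a, u in samples]
# Intervals with large values (e.g. 10 unused cores at 10:05) flag idle capacity.
```

Sustained high values in this series are the signal to downsize the cluster or reschedule workloads.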
- Identifying Bottlenecks and Optimizing Performance: Use the Cluster States and Cluster Failures Over Time widgets to quickly identify any clusters that are not performing as expected. This information can guide you in troubleshooting and optimizing those clusters for better performance.
- Cost Management: Leverage the DBU Consumed widget to monitor your usage costs closely. By analyzing trends in DBU consumption, you can make informed decisions on scaling resources up or down to manage costs effectively.
- Error Resolution: The Top Cluster Errors widget allows you to quickly pinpoint the most frequent issues affecting your clusters. Resolving these errors promptly can prevent potential downtime and maintain the stability of your Databricks environment.
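For cost management, the trend behind the DBU Consumed widget is essentially consumption multiplied by a per-DBU rate. A minimal sketch of that estimate follows; the rate values and workload-type names are hypothetical, since actual rates vary by cloud provider, workload tier, and contract.

```python
# Hypothetical per-DBU rates in USD; real rates vary by cloud, tier, and contract.
DBU_RATES = {"jobs_compute": 0.15, "all_purpose": 0.40}

def dbu_cost(dbus_consumed, workload_type):
    """Estimated Databricks cost for a given DBU consumption."""
    return dbus_consumed * DBU_RATES[workload_type]

# e.g. 120 DBUs consumed on jobs compute
estimated_cost = dbu_cost(120, "jobs_compute")
```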
Enhanced Filter on Search
The Enhanced Filter on Search function offers significant filtering options along with several additional features that optimize search functionality on the Compute page. Users can apply multiple filter conditions to efficiently refine search results, and the UI has been decluttered and optimized for usability.
- Expanded Filterable Columns: The search dropdown now includes a broader variety of columns, allowing users to filter results based on more precise criteria such as cluster status, source, duration, and user.
- Decluttering Mechanism: Columns that are already visible can be hidden from the filter dropdown, keeping the list tidy and manageable.
- Contextual Filters: The system offers column-specific alternatives that adapt to the user's current view, resulting in more intuitive filter selections and a smoother navigation experience.
- Preservation of Filters Across Navigations: Filters are now preserved across navigations, reducing the need to continually apply the same filters.
- Primary Focus on Equality Operator: The = operator is now the primary focus of the filtering interface, which helps to expedite interactions and simplify data retrieval.
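The behavior described above can be modeled as a set of column/value conditions combined with the equality operator. The sketch below is a toy model of that filtering, assuming hypothetical column names and values; it is not ADOC's actual query engine.

```python
# Toy model of the equality-first filter: each condition is a (column, value)
# pair, and a row matches only if every condition holds.
clusters = [
    {"status": "Running", "source": "Job", "user": "a@example.com"},
    {"status": "Terminated", "source": "UI", "user": "b@example.com"},
]

def apply_filters(rows, conditions):
    """Keep rows where every (column, value) condition matches exactly."""
    return [r for r in rows if all(r.get(col) == val for col, val in conditions)]

matches = apply_filters(clusters, [("status", "Running"), ("source", "Job")])
```

Combining conditions this way mirrors how stacked filters narrow the cluster list on the Compute page.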

Clusters

Clusters Tab
Widget | Descriptions |
---|---|
Cluster Name | The name of the cluster. This column is frozen. You can view it even when you scroll right. Clicking the cluster name redirects you to the Job Studio page. |
Cluster ID | A system-generated identifier unique to each cluster instance, used for backend tracking and reference. Clicking the Cluster ID redirects you to the past runs associated with that cluster. |
Status | The current state of the cluster, such as Running, Terminated, Pending, or Resizing. Indicates the cluster’s operational status. |
Duration | The total amount of time the cluster has been active, measured from the start time to the end time or current time if still active. |
Total DBU Consumed | Displays the total Databricks Unit (DBU) consumption, representing the compute resources consumed by the cluster. |
Actual Databricks Cost | Total cost incurred from using Databricks services for the workload. |
Actual Cloud Total Cost | Combined cost of all cloud resources consumed during the workload. |
Actual Cloud VM Cost | Cost specifically attributed to virtual machine usage in the cloud environment. |
Recommended Cloud VM Cost | Estimated cost if a more optimal virtual machine configuration were used. |
Recommended Instance Type | Suggested VM instance type that could improve cost-efficiency or performance. |
Start Time | The exact time when the cluster was initiated, helping track when the job or task associated with the cluster began. |
End Time | The time when the cluster terminated, either due to the job completing or a manual termination. If still running, this field is empty. |
Cluster Source | The source that initiated the cluster, such as Job, API, UI, or Pipeline. This helps track how the cluster was created. |
User | The email ID of the user who initiated or is running the cluster, identifying who is responsible for the cluster’s activities. |
Termination Type | The method or reason for the cluster’s termination, such as Success, Client Error, or User Request. |
Termination Code | Provides further details about why the cluster was terminated, such as Job Finished, User Request, or specific error codes. |
Diagnostic Reason | Detailed diagnostic information about the termination or errors encountered during the cluster’s lifecycle. |
Spark Version | Indicates the specific version of Apache Spark running on the cluster, ensuring compatibility with different jobs and tasks. |
Worker Node Type | Specifies the type of worker nodes used in the cluster, which determine the resources allocated for executing tasks. |
Driver Node Type | The type of driver node used in the cluster, which manages job execution and coordinates the tasks running on worker nodes. |
Cluster Details
To proceed to the details page of a particular cluster, click on the cluster name.
On the cluster Details page, you can view the following information: Past Runs chart and Past Job Runs Details table.
The Past Runs chart presents a bar graph that visualizes the count of DBUs and their associated costs on the y-axis, while the x-axis denotes the corresponding date and time when the job consumed a specific number of DBUs.

Column Name | Descriptions |
---|---|
Creation Time | Date and time at which the cluster was created. |
State | The current state of the cluster or job. |
DBU Consumed | Amount of Databricks units consumed. |
Start Time | Time at which the job execution began. |
Termination Time | Time at which the job execution was complete. |
Executor Config | The settings and specifications that determine how the cluster's executors are configured. |
Number of Workers | The total count of worker nodes allocated across clusters for processing tasks. |
Min Workers | The minimum number of worker nodes used for the job to run. |
Max Workers | The maximum number of worker nodes used for the job to run. |
Executor Memory | The memory capacity allocated to your Databricks cluster. |
Duration | The time taken for execution of the job run. |
Balanced Recommendation | Displays recommendations for balanced performance. |
Cost Recommendation | Displays recommendations for cost objectives. |
Runtime Recommendation | Displays runtime recommendation. |
State Message | Displays the message on cluster state. |
Username | Displays the name of the user. |
Job Studio
The Job Studio page provides a comprehensive overview of all Databricks jobs, offering a detailed interface that enables users to track, monitor, and manage various jobs within their system. The intuitive layout presents job-related data in a tabular format, supported by filter options for enhanced navigation and drill-down capabilities.
All cost-related data on ADOC is displayed in US Dollars (USD) as the standard unit of measurement. Note that currency conversion is currently not supported.
For example: If your Azure account shows costs in another currency (e.g., ₹110 or £110), ADOC will display the numerical value in USD (e.g., $110). This applies to all cost charts, including both actual and estimated costs, which are exclusively shown in USD.

At the top of the Job Studio page, a graphical representation displays job counts over time, segmented by the following job statuses:
- Canceled: Jobs that were intentionally halted before completion.
- Failed: Jobs that encountered errors and could not complete.
- Success: Jobs that successfully ran to completion without issues.
The central table displays all jobs that meet the filtering criteria, offering an in-depth view of their details.
Field | Description |
---|---|
Job Name | Displays the name of the job, often used to describe its function or purpose. |
Cluster ID | A unique system-generated identifier for the cluster running the job, helping identify and track the cluster. |
Job Status | The current status of the job, which could be Success, Failed, or Canceled. |
Actual Databricks Cost | The actual cost incurred by running the job, calculated based on the resources used. |
Estimate Databricks Cost | The estimated cost for running the job, specific to Databricks resources used. |
Estimate Vendor Cost | Displays any additional costs related to third-party vendor resources used during the job. |
Total Job Cost | The total cost associated with the job, combining Databricks and vendor estimates. |
Start Time | Indicates the exact time when the job began execution. |
End Time | Displays the time the job finished or was terminated. |
Duration | The total time the job ran, calculated from start to end time. |
Vendor Storage Cost | The storage cost charged by external vendors for storing data related to the job. |
Vendor Virtual Machines Cost | The total expense incurred for using virtual machines provided by the vendor. |
Vendor Virtual Network Cost | Costs incurred for using a virtual network provided by external vendors during the job’s execution. |
Vendor Bandwidth Cost | Bandwidth costs charged by external vendors related to data transfers during job execution. |
Run Page URL | Provides a direct link to the job’s run page for more detailed information and access to logs, metrics, and performance data. |
Cluster State | The state of the cluster used to run the job, such as Running, Terminated, or Pending. |
Creator User | The user who created the job, shown as their registered username or email. |
Trigger | Indicates how the job was initiated, such as PERIODIC (scheduled job) or ONE-TIME (manual trigger). |
Runtime Engine | Specifies the type of engine running the job, typically Photon or Standard. |
Job ID | The unique identifier for the job instance. |
Run ID | A system-generated ID that tracks each specific run of the job. |
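Two of the columns above are simple derivations of other fields: Duration is the end time minus the start time, and Total Job Cost combines the Databricks and vendor figures. A sketch of that arithmetic follows; the timestamps, amounts, and function names are hypothetical examples, not ADOC internals.

```python
from datetime import datetime

def job_duration(start_time, end_time):
    """Duration as shown in the table: end time minus start time."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(end_time, fmt) - datetime.strptime(start_time, fmt)

def total_job_cost(databricks_cost, vendor_cost):
    """Total Job Cost combines the Databricks and vendor components."""
    return round(databricks_cost + vendor_cost, 2)

# Hypothetical row values
duration = job_duration("2024-05-01 10:00:00", "2024-05-01 10:45:30")
total = total_job_cost(24.10, 23.54)
```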
Other Notable Features:
Feature | Description |
---|---|
Top 20 Jobs | The Job Studio page offers pre-configured views such as Top 20 Expensive Jobs and Long Running Jobs, allowing users to quickly identify resource-intensive tasks. |
Download Functionality | Users can export the job data in multiple formats using the Download option, enabling further analysis and reporting outside of the ADOC platform. Use the download function to extract the displayed data for further review, analysis, or sharing with team members. |
Filter Section | Users can apply multiple filters to narrow down the list of jobs based on specific criteria such as status, creator, or runtime engine. These filters can be combined for more granular searches. |
Job Review | Once the desired jobs are filtered, the user can explore detailed information such as cost breakdowns, job triggers, and run times. For deeper analysis, users can access the Run Page URL. |
Graph Analysis | The top graph visually represents job performance over time, allowing users to quickly understand trends and investigate any potential performance bottlenecks or issues. |
Job Details Page
The Job Details page provides users with an in-depth view of their Databricks job's performance and operational metrics. This page offers key insights into both driver and executor performance, trends over time, resource usage, and potential areas for optimization.
Here is a breakdown of the features and data sections visible on the page:

Summary
This section provides a high-level overview of the costs associated with running the job on the selected Databricks cluster. The data presented includes the following key details:

Widget | Descriptions |
---|---|
Actual Databricks Cost | This displays the total cost incurred from using Databricks resources for this specific job. In this example, the cost is $24.10. This cost is calculated based on the consumption of Databricks Units (DBUs) and other Databricks platform resources used during the job's execution. |
Actual Vendor Cost | This reflects any additional costs that come from using external or third-party vendor resources in conjunction with the job. In this case, the vendor cost is $23.54. Vendor costs can include things like cloud storage or virtual networks provided by external vendors. |
Total Cost | The total of both the Databricks and vendor costs, which provides a comprehensive view of the total cost for running the job. In this example, the total cost is $47.64. |
Cluster ID | This is the unique identifier for the specific cluster on which the job was executed. The Cluster ID is important for tracking and analyzing job performance across different clusters. |
Vendor Cost Breakdown
This section provides a detailed breakdown of costs incurred from using third-party vendor resources in conjunction with a data processing job. These costs are in addition to platform-specific costs (such as Databricks) and typically cover external infrastructure services used during the job’s execution. The data includes the following key cost components:
Widget | Description |
---|---|
Virtual Machines Cost | This indicates the cost associated with running virtual machines provided by an external cloud vendor. These machines may be used for compute tasks, supporting services, or extensions outside the primary processing environment. |
Storage Cost | This refers to charges for storing data externally, such as intermediate files, logs, or outputs. Storage cost can vary based on data size, storage type (standard or premium), and duration. |
Virtual Network Cost | This captures the cost of using virtual network infrastructure, such as private IPs, VPC peering, or internal communication between services. These costs apply when network traffic routes through vendor-managed infrastructure. |
Bandwidth Cost | This reflects the cost of data transferred between systems or across network boundaries, especially when large volumes of data move between the processing environment and external systems or storage. |
New Jobs: Job run details, including execution time and resource consumption metrics, will be available only if the user has enabled the Databricks initialization (init) script. This ensures that the necessary monitoring and metrics collection tools are in place before jobs are executed.
Historical Data: For jobs that were executed prior to onboarding the data source or enabling the initialization script, detailed job run metrics and resource utilization data will not be available. Only jobs executed after the onboarding process or script enablement will show detailed metrics in the Job Details Page.
ADOC Recommendation: Ensure that the initialization script is configured at the time of data source onboarding to capture detailed job metrics for future analysis.
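In Databricks, cluster-scoped init scripts are referenced in the cluster definition so they run before any workload starts. The fragment below sketches what such a cluster spec might look like; the cluster name and script path are placeholders, not the actual ADOC script, and the exact script location is provided during data source onboarding.

```python
# Illustrative cluster spec fragment (not the actual ADOC init script).
# A cluster-scoped init script listed here runs on every node at startup,
# before any job executes, so monitoring agents are in place from the start.
cluster_spec = {
    "cluster_name": "etl-cluster",              # hypothetical name
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "init_scripts": [
        # Placeholder path; use the script supplied during onboarding.
        {"workspace": {"destination": "/Shared/adoc/init-monitoring.sh"}}
    ],
}
```

If `init_scripts` is left empty, jobs still run, but the detailed run metrics described above are not collected.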
Node Size Recommendations
Node Size Recommendations in Databricks Compute suggest how to configure Spark executor nodes based on cost, runtime performance, and workload characteristics. These recommendations assist users in optimizing resource allocation and improving job execution efficiency.
Where do Node Size Recommendations Apply?
Static Clusters: Recommendations are offered for jobs running on static clusters where the number of workers is predefined and fixed throughout the execution. The system offers suggestions for:
- Optimal number of cores
- Memory per executor
- Number of workers
Auto-Scale Clusters: For clusters with autoscaling enabled, Databricks automatically scales up or down based on resource needs. In this case, node size recommendations offer:
- Minimum and maximum worker configurations
- Estimated job completion time for various configurations
- Cost estimation for each worker configuration
What are the key metrics that drive the Node Recommendations?
Node size recommendations rely on Spark Job Performance metrics such as:
Metric | Description |
---|---|
CPU Utilization | High CPU usage indicates that more cores per executor are required, while low CPU usage suggests that fewer cores are sufficient. |
Memory Utilization | If memory usage is high, the system suggests increasing the memory per executor. Conversely, low memory usage suggests reducing the allocated memory to avoid resource wastage. |
Shuffle Operations | The recommendations also take into account the shuffle fetch wait time and shuffle remote bytes read, which affect the need for additional executors. |
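The metrics above can be read as threshold rules. The sketch below is a deliberately simplified, hypothetical version of such a rule for the CPU metric alone; the 0.85 and 0.40 thresholds are invented for illustration, and the real recommendation model also weighs memory utilization and shuffle metrics.

```python
def recommend_cores(current_cores, avg_cpu_utilization):
    """Toy threshold rule: scale cores with observed CPU utilization.

    Thresholds (0.85 / 0.40) are illustrative only, not ADOC's actual model.
    """
    if avg_cpu_utilization > 0.85:
        return current_cores * 2           # sustained high CPU: add cores
    if avg_cpu_utilization < 0.40:
        return max(current_cores // 2, 1)  # mostly idle: shrink, keep >= 1
    return current_cores                   # utilization in a healthy band
```

A job averaging 90% CPU on 4 cores would be nudged toward 8 cores, while one idling at 30% would be nudged down to 2.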
It is important to understand the conditions under which node size recommendations are not available:
Single Node Clusters: No recommendations are made for clusters with a single node.
Jobs Without Spark Stages: If a Databricks job does not contain Spark stages, such as non-Spark jobs, no recommendations will be made.
Failed or Cancelled Jobs: Recommendations are unavailable for failed or cancelled jobs, as the Spark context required for analysis is not accessible.
All-Purpose Clusters: Jobs operating on all-purpose clusters do not receive node size recommendations, as these clusters auto-scale dynamically, making static recommendations less useful.
Driver and Executor Summary
This section provides detailed information about the driver used during the job execution. The driver is responsible for managing and orchestrating tasks across executors in a distributed computing environment like Databricks.

Widget | Descriptions |
---|---|
Name | Displays the unique identifier for the driver instance. This name helps track and reference the specific driver used for the job. The driver instance is usually associated with the cluster where the job was executed. |
User | Shows the user account that initiated or controlled the driver. In this case, the user is root, indicating that the job was run with administrative or elevated permissions. |
Duration | Indicates how long the driver was active during the job's execution. In this example, the driver was active for 14.10 minutes. This metric is crucial for understanding the time taken by the driver to manage the job's execution and resource distribution. |
Max Heap Used | Displays the maximum amount of heap memory consumed by the driver during the job. In this case, the driver used up to 5.32 GB of heap memory. Heap memory is critical for the driver's performance, as it is used for object creation, caching, and other memory-intensive tasks. |
Instance Type | Shows the type of virtual machine or hardware configuration used for the driver instance. Here, the instance type specifies the resources (like CPU and memory) allocated to the driver. The instance type impacts the overall performance and efficiency of the driver. |

Widget | Descriptions |
---|---|
Cores | Displays the number of CPU cores allocated to the driver. In this case, the driver is using 4 cores. More cores typically allow for better multitasking and parallel processing. |
Memory Available | Shows the total memory allocated to the driver. Here, the driver has 8.62 GB of memory available. This is crucial for handling data processing and managing job tasks effectively. |
Jobs | This metric indicates the number of jobs processed by the executors. In this case, the executors processed 6,244 jobs. Executors are responsible for running the actual tasks associated with the job. |
Stages | Shows the number of stages executed by the job. This job completed 6,244 stages, which represent different phases of job execution, such as shuffling, sorting, or aggregating data. |
Max used memory | Displays the peak memory usage by the executors during job execution. In this case, the maximum memory used was 743.184 MB. Monitoring this helps ensure that executors are not running out of memory during execution, which could cause performance degradation or task failures. |
Instance type | Indicates the type of instance used for the executors, which in this case is Standard_DS3_v2. This type specifies the compute and memory configuration, impacting the executor’s ability to handle tasks |
Cores per instance | This shows the number of CPU cores available for each executor instance. In this example, each executor is allocated 4 cores, allowing for concurrent processing of tasks. |
Memory available | Reflects the total memory available per executor instance, which is 8.67 GB in this case. Sufficient memory is essential for efficient data processing and task execution. |
Total instances | Indicates the total number of executor instances used in the job. Here, there is 1 executor instance. Increasing the number of instances can improve job performance by parallelizing tasks across more resources. |
Executor Node Recommendation
The Executor Node Recommendation section provides guidance on the optimal configuration of executor nodes based on different optimization criteria such as cost, runtime, or a balanced approach. The section also offers recommendations for both Auto-Scale and Static Cluster configurations. These recommendations help users optimize job performance while managing resource usage and cost.

Optimization Types
Node size recommendations in the Executor Node Recommendation widget are provided for different optimization strategies:
- Cost-Optimized Recommendation: Aims to reduce resource costs while maintaining acceptable performance.
- Runtime-Optimized Recommendation: Focuses on minimizing job execution time, possibly at the expense of higher costs.
- Balanced Recommendation: Strikes a balance between cost efficiency and performance.
The section contains the following widgets and sections:
Widget | Section | Descriptions |
---|---|---|
Recommendations for Auto-Scale Cluster Configuration | Recommendation | The optimization goal (e.g., Cost Optimized, Runtime Optimized, or Balanced Optimized). |
Instance Type | The type of virtual machine instance recommended for the job | |
Estimated Time | The expected time to complete the job with the recommended instance configuration. In this example, the estimated time is approximately 12.79 minutes for all recommendations. | |
Min Worker Count | The minimum number of workers allocated when the job starts. For the Cost Optimized recommendation, the minimum worker count is 1, while for Runtime Optimized and Balanced Optimized, it's 2. | |
Max Worker Count | The maximum number of workers that can be dynamically added as the job's demand grows. In the Cost Optimized configuration, the maximum worker count is 5, while for the other configurations, it is 3. | |
Vendor Cost | The estimated cost incurred by the vendor (e.g., cloud service provider) for running the job. In the Cost Optimized recommendation, the cost is $0.07, whereas for Runtime Optimized and Balanced Optimized, the cost is $0.14. | |
Recommendations for Static Cluster Configuration | Recommendation | The optimization goal for the static cluster configuration (e.g., Cost Optimized, Runtime Optimized, or Balanced Optimized). |
Instance Type | The recommended virtual machine instance type (e.g., Standard_DS3_v2). | |
Estimated Time | The estimated time to complete the job with the given instance configuration, which is 12.79 minutes for all configurations. | |
Worker Count | The fixed number of workers allocated for the entire duration of the job. For all optimization types, the recommended worker count is 2. | |
Vendor Cost | The estimated cost for the static cluster configuration, which remains constant at $0.14 for all optimization types. | |
Recommendation Based on Different Instance Types with Auto-Scale | Instance Type | Lists the different instance types that can be used for the job |
Estimated Time | The expected time to complete the job with the specified instance type. In this case, the estimated time for all configurations is around 12.79 minutes. | |
Min Worker Count | The minimum number of worker nodes assigned at the beginning of the job. Some instance types recommend starting with 1 worker, while others recommend 2. | |
Max Worker Count | The maximum number of workers that can be dynamically added to scale the job. For certain instances, such as Standard_DS3_v2, the maximum worker count is 3, while for others, it's 5. | |
Vendor Cost | The estimated vendor cost associated with using each instance type. For the Standard_DS3_v2 instance, the cost is around $0.14, while for Standard_D3_v2, the cost is $0.07. | |
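The vendor-cost figures above scale with worker-hours. As a minimal sketch of that relationship, the following uses an assumed illustrative rate of $0.33 per worker-hour (not a real Databricks or cloud price); with the 12.79-minute runtime shown above, the arithmetic reproduces the $0.07 and $0.14 figures:

```python
def estimate_vendor_cost(runtime_minutes, avg_workers, hourly_rate_per_worker):
    """Estimate vendor cost as worker-hours times an assumed hourly rate."""
    worker_hours = (runtime_minutes / 60.0) * avg_workers
    return round(worker_hours * hourly_rate_per_worker, 2)

# One worker for 12.79 minutes at the assumed rate is about $0.07;
# a static two-worker cluster for the same runtime is about $0.14.
cost_one_worker = estimate_vendor_cost(12.79, 1, 0.33)
cost_two_workers = estimate_vendor_cost(12.79, 2, 0.33)
```

This also shows why the Cost Optimized configuration, which can scale down to a single worker, roughly halves the vendor cost relative to the fixed two-worker static configuration.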
Trends
The Trends section visualizes key metrics over time for the job run.

- Executor Memory: Shows the amount of memory used by the executors.
- Executor Cores: Displays the number of CPU cores utilized by the executors.
- Input Bytes Read: Reflects the amount of input data read by the job.
These metrics help monitor resource consumption and job performance. Users can also use the Compare Runs option to analyze and compare these trends across different job runs for deeper insights.
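Conceptually, Compare Runs reduces to taking per-metric deltas between two runs. A sketch of that idea, using hypothetical metric names (not the product's internal field names):

```python
def compare_runs(baseline, candidate):
    """Return candidate-minus-baseline deltas for metrics both runs report."""
    shared = baseline.keys() & candidate.keys()
    return {metric: candidate[metric] - baseline[metric] for metric in shared}

# Hypothetical trend samples from two job runs.
run_a = {"executor_memory_mb": 4096, "executor_cores": 4, "input_bytes_read": 1_200_000}
run_b = {"executor_memory_mb": 6144, "executor_cores": 4, "input_bytes_read": 900_000}
deltas = compare_runs(run_a, run_b)
# Positive deltas mean the second run consumed more of that resource.
```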
Limits
The Limits section provides insights into the scalability constraints of a Spark application by analyzing three key metrics:

Wall Clock Time:
Driver Wall Clock Time: Measures the time spent by the driver in coordinating the job execution.
Executor Wall Clock Time: Shows the total time spent by executors in processing tasks.
Total Wall Clock Time: The combined time for the driver and executors, reflecting the overall job duration.
Ideal Times:
Critical Path: Represents the minimum time required for the job to complete under ideal conditions.
Ideal Application Time: The estimated optimal runtime for the application based on resource availability.
Actual Runtime: The real execution time taken by the application.
OCCH (One Core Compute Hour)
- Displays the available and wasted compute hours for the job.
- OCCH wasted by the Executor and Driver highlights inefficiencies in resource usage.
Metrics
The Metrics section provides detailed insights into the performance of the Spark executors. It consolidates various performance metrics to help users analyze resource usage and job behavior.

Metrics | Description |
---|---|
Storage Memory | Shows how much memory is allocated, used, and available for both on-heap and off-heap memory, helping monitor the memory footprint of the job. |
Schedule Information | Tracks active tasks and thread pool size over time, providing insights into task execution concurrency and thread utilization. |
Bytes Read/Written | Displays the total amount of data read and written by the job, giving a clear view of input/output performance. |
File System Bytes Read/Written | Highlights the bytes read and written directly from and to the filesystem, helping identify heavy I/O operations. |
Shuffle Information | Provides details on shuffle operations, including bytes written and bytes read from both local and remote sources. Shuffle operations are critical for job performance, especially in distributed data processing. |
Spark JVM GC and CPU Time | Visualizes the time spent on JVM Garbage Collection (GC) and CPU time, which are key indicators of system performance. High GC times may suggest inefficient memory usage, while CPU time reflects the computational load. |
Records Read/Written | Displays the number of records read and written during job execution, helping measure data throughput. |
Spark Details Aggregate Metrics
Metrics | Description |
---|---|
Task Duration (milliseconds) | Total time spent by the task, starting from its creation. |
JVM GC Time (milliseconds) | Amount of time spent in garbage collection while this task was in progress. |
Executor CPU Time (nanoseconds) | CPU time the executor spent running this task, including time spent fetching shuffle data. |
Executor Deserialize CPU Time (nanoseconds) | CPU time taken on the executor to deserialize this task. |
Executor Deserialize Time (milliseconds) | Elapsed time spent deserializing this task. |
Executor Runtime (milliseconds) | Total time spent by an executor core running this task. |
Peak Execution Memory (bytes) | Maximum execution memory used by the task. |
Input Bytes Read (bytes) | Number of bytes read by the task (using read APIs). |
Output Bytes Written (bytes) | Number of bytes written by the task (using write APIs). |
Disk Bytes Spilled (bytes) | Size of bytes spilled to disk (may differ if compressed). |
Memory Bytes Spilled (bytes) | Number of bytes that were spilled to disk during the task. |
Result Size (bytes) | Number of bytes sent by the task back to the driver. |
Result Serialization Time (milliseconds) | Elapsed time spent serializing the task result. |
Shuffle Read Bytes Read (bytes) | Total bytes read by the task for shuffle data. |
Shuffle Read Fetch Wait Time (milliseconds) | Time spent by the task waiting for shuffle data. |
Shuffle Read Local Blocks (number) | Shuffle blocks fetched from the local machine (disk access). |
Shuffle Read Records Read (number) | Total records read by the task for shuffle data. |
Shuffle Read Remote Blocks (number) | Shuffle blocks fetched from remote machines (network access). |
Shuffle Write Bytes Written (bytes) | Total shuffle bytes written by the task. |
Shuffle Write Records Written (number) | Total shuffle records written by the task. |
Shuffle Write Time (nanoseconds) | Amount of time the task spent writing shuffle data. |
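The aggregate view rolls these per-task counters up: durations and byte counters are summed across tasks, while peak execution memory takes the maximum. A sketch over hypothetical task records (field names chosen for illustration):

```python
def aggregate_task_metrics(tasks):
    """Roll up per-task metrics: counters are summed, peak memory is a max."""
    return {
        "task_duration_ms": sum(t["task_duration_ms"] for t in tasks),
        "jvm_gc_time_ms": sum(t["jvm_gc_time_ms"] for t in tasks),
        "shuffle_read_bytes": sum(t["shuffle_read_bytes"] for t in tasks),
        "peak_execution_memory_bytes": max(t["peak_execution_memory_bytes"] for t in tasks),
    }

tasks = [
    {"task_duration_ms": 1200, "jvm_gc_time_ms": 90,
     "shuffle_read_bytes": 4096, "peak_execution_memory_bytes": 512},
    {"task_duration_ms": 800, "jvm_gc_time_ms": 40,
     "shuffle_read_bytes": 1024, "peak_execution_memory_bytes": 2048},
]
rollup = aggregate_task_metrics(tasks)
```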
Spark SQL Executions
This table provides details about each Spark SQL execution within the job.

Field | Description |
---|---|
Execution ID | A unique identifier for each SQL execution, used to track and differentiate executions. |
Description | A brief description of the SQL query or operation being executed, which provides insight into the query or code being run (e.g., the specific pyspark.sql.functions being used). |
Start Time | The exact timestamp when the SQL execution started, helping track when the query began processing. |
End Time | The timestamp when the SQL execution completed, giving the total execution duration for that query. |
Duration | The total time taken for the execution to complete, displayed in milliseconds or seconds depending on the length of the query execution. |
State | The current status of the SQL execution, which can be Running, Completed, or other statuses depending on the progress of the query. |
More details | A link that provides additional details about the specific SQL execution, including deeper insights into the query’s performance and execution plan. |
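The Duration column follows directly from the Start Time and End Time columns. A sketch of that derivation, assuming timestamps arrive as ISO-like strings:

```python
from datetime import datetime

def execution_duration_ms(start_time, end_time, fmt="%Y-%m-%d %H:%M:%S"):
    """Milliseconds between a SQL execution's start and end timestamps."""
    delta = datetime.strptime(end_time, fmt) - datetime.strptime(start_time, fmt)
    return int(delta.total_seconds() * 1000)

duration = execution_duration_ms("2024-05-01 10:00:00", "2024-05-01 10:00:12")
```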
Stages
The Stages section provides insights into the different stages of a Spark application. There are two available views: List and Timeline.
- The List tab displays a detailed breakdown of each stage in a table format, providing insights into the tasks and performance of each stage.
- The Timeline tab shows the stages of the application in a visual format, where each stage is represented as a horizontal bar.


Driver & Executor Stats
This section shows CPU and memory usage for the driver and each executor. All executors are listed in the charts.

Widgets | Description |
---|---|
CPU Usage (Driver & Executor) | These charts display the percentage of CPU usage over time for both the driver and the executors. Monitoring CPU usage helps identify whether resources are underutilized or overutilized. |
Memory Usage (Driver & Executor) | This chart shows the memory usage percentage for both the driver and the executors, helping users track how efficiently memory is being consumed. |
Heap Usage (Driver & Executor) | Displays the heap memory usage over time, which is important for identifying potential memory leaks or inefficiencies in memory allocation. |
Core Wastage (Driver & Executor) | Tracks the number of CPU cores wasted during job execution. High core wastage may indicate inefficient resource allocation or over-provisioning. |
Other Widgets and Interactive Features
Widgets | Description |
---|---|
Compare Runs | Allows users to compare the current job run with other previous runs. This comparison can help identify performance improvements or degradations over time. |
Spark SQL Executions | If applicable, provides detailed statistics on the execution of SQL queries run during the job. This can help users analyze how efficiently SQL tasks were executed. |
Stages | Displays detailed information about the different stages of the job, helping users track the progress and identify potential bottlenecks in the pipeline. |
Driver & Executor Summary | Users can review the performance of both the driver and executors in detail, allowing for comprehensive analysis of memory and resource usage. |
Metric Analysis | Users can explore the different metrics that provide insights into job execution. Metrics such as Memory Usage and Heap Usage are critical for performance tuning and identifying resource constraints. |
Trends and Stages | When available, users can use the Trends section to understand job behavior over time. The Stages section helps in identifying slow or inefficient stages of the job. |
Comparison | Users can make use of the Compare Runs feature to evaluate how job performance has changed over time, especially after modifying configurations or code optimizations. |
All Purpose Cluster

All Purpose Cluster Tab
Key Features and Sections
Clusters Name Panel
The left sidebar displays a list of all the clusters along with their corresponding total costs. Users can select a specific cluster to view more detailed information, including the total cost breakdown by date.
- Search Bar: Users can search for a specific cluster by entering the cluster name in the search box, which filters the displayed clusters accordingly.
- Cluster Name List: Displays the names of the clusters along with their total costs. The selected cluster is highlighted, and the total cost is updated accordingly in the graphical and tabular sections.
Graphical Representation of Total Costs
A bar chart visualizes the Total Cost for the selected cluster over time. The x-axis represents the dates, and the y-axis shows the costs in USD. This graph allows users to quickly identify cost patterns and spikes on specific dates.
- The chart title updates based on the selected cluster.
- Hovering over individual bars provides detailed cost information for that specific date.
Cost Breakdown Table
A detailed table below the graph breaks down the Databricks Cost, Vendor Cost, and Total Cost by date for the selected cluster. Users can sort the table by each column to view the highest or lowest costs over time.
Field | Description |
---|---|
Date | The date on which the cost was incurred. |
Databricks Cost | The cost associated with Databricks resources for that day. |
Vendor Cost | Any costs related to third-party vendors for that specific day. |
Total Cost | The sum of the Databricks Cost and Vendor Cost for each day. |
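Total Cost is the row-wise sum of the two cost columns. A sketch over hypothetical daily cost rows:

```python
def with_total_cost(rows):
    """Append a total_cost field: Databricks Cost + Vendor Cost per day."""
    return [
        {**row, "total_cost": round(row["databricks_cost"] + row["vendor_cost"], 2)}
        for row in rows
    ]

daily = with_total_cost([
    {"date": "2024-05-01", "databricks_cost": 1.20, "vendor_cost": 0.80},
    {"date": "2024-05-02", "databricks_cost": 0.50, "vendor_cost": 0.25},
])
```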
Interaction Workflows
Cluster Selection: From the left sidebar, users can select a cluster to analyze. Upon selection, the graph and table will automatically update to reflect the costs for the chosen cluster.
Graph Analysis: The graph provides a visual summary of the total costs over time, allowing users to quickly identify cost spikes or changes in resource usage.
Detailed Cost Breakdown: Users can scroll through the detailed cost breakdown in the table, sorting by any column to identify key cost drivers on a daily basis.
Exporting Data: By using the Download button, users can easily export the cost data for further analysis or integration into external reporting tools.
The Databricks Compute interface provides a comprehensive and versatile set of tools and insights that empower users to manage and optimize their Databricks environments effectively. Through various tabs—Overview, Clusters, Job Studio, and All Purpose Cluster—users gain the ability to monitor and analyze critical metrics such as cluster states, job performance, resource utilization, and associated costs.
These tabs, combined with enhanced filter functionalities and interactive data visualizations, give users the ability to make informed, data-driven decisions that improve operational efficiency, optimize resource allocation, and manage costs effectively. The intuitive interface and powerful insights make it easier to detect potential issues, resolve errors, and fine-tune both workflows and infrastructure for optimal performance.
Job Runs

Job Runs Tab
This page provides an overview of Job Runs within the Databricks environment. It includes a detailed table listing information about completed and ongoing jobs, with multiple filtering options to narrow down the data. Key sections and features of the page include:
Filters
- Cluster Type: Allows filtering by the type of cluster used, such as job clusters or all-purpose clusters.
- Status: Filters jobs based on their status, such as Success, Failed, Canceled, or Running.
- Owner: Filters jobs by the user who initiated or owns the job.
Job Runs Aggregate Table
Field | Description |
---|---|
Cluster Name | The name of the cluster associated with the job run. |
Cluster ID | The unique identifier for the cluster. |
Cluster Type | Indicates the type of cluster (e.g., job_cluster or all_purpose_cluster). |
Job Name | The name of the job or query being run. |
Status | The completion status of the job (e.g., SUCCESS, FAILED, CANCELED). |
Job ID | The unique identifier of the job run. |
Duration | Specifies the time taken to complete the job. |
DBU Consumed | Indicates the Databricks Units consumed during the job run. |
Estimated Databricks Cost | The estimated cost incurred for Databricks usage. |
Estimated Vendor Cost | The estimated cost incurred from the underlying cloud vendor. |
Start Time | The time when the job run started. |
End Time | The time when the job run ended. |
Executor Heap Used % | The percentage of heap memory utilized by the executor during job execution. |
CPU Used % | The percentage of CPU resources consumed by the executor. |
Executor Memory | The total memory allocated to the executor for processing tasks. |
Diagnostics | Additional diagnostic information or status for the job, such as errors or workload details. |
Owner | The email address or identifier of the person who owns or initiated the job. |
App Id | A unique identifier assigned to each application or job instance. |
App Name | The name or reference label associated with the job or application. |
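The filters above narrow this table by exact matches on cluster type, status, and owner. A sketch of that filtering logic over hypothetical job-run records:

```python
def filter_job_runs(runs, cluster_type=None, status=None, owner=None):
    """Keep only runs matching every filter that is set (None = no filter)."""
    criteria = {"cluster_type": cluster_type, "status": status, "owner": owner}
    active = {k: v for k, v in criteria.items() if v is not None}
    return [r for r in runs if all(r[k] == v for k, v in active.items())]

runs = [
    {"job_name": "etl_daily", "cluster_type": "job_cluster",
     "status": "SUCCESS", "owner": "a@example.com"},
    {"job_name": "adhoc_query", "cluster_type": "all_purpose_cluster",
     "status": "FAILED", "owner": "b@example.com"},
]
failed = filter_job_runs(runs, status="FAILED")
```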
DLT Pipelines

DLT Pipelines Tab
This page provides an interface for managing and monitoring Delta Live Tables (DLT) pipelines. It displays a list of pipelines along with key details such as their current state, execution mode, recent run information, and performance metrics. You can apply filters to refine the pipeline list based on specific criteria.
Filters
- Current State: Filter by the pipeline's operational status (e.g., Idle, Running, Failed).
- Owner: Filter by the user or team managing the pipeline (e.g., usernames, email addresses).
DLT Pipelines Aggregate Table
Field | Description |
---|---|
Name | The name of the Delta Live Table (DLT) pipeline. |
Current State | The current operational state of the pipeline (e.g., Idle). |
Owner | The user who owns or manages the pipeline. |
Pipeline Execution | The environment in which the pipeline is executed (e.g., Development). |
Pipeline Mode | The mode in which the pipeline runs (e.g., Triggered). |
Total Runs | The total number of times the pipeline has executed. |
Last Job Run | A link or identifier for the most recent pipeline job execution. |
Last Run State | The result of the most recent run (e.g., Failed, Completed). |
Last Run Duration | The amount of time the last run took to complete. |
Last Run Start Time | The date and time when the most recent run started. |
Clicking a pipeline name opens the Pipeline Run Details side panel, displaying key information about the pipeline, such as:

Pipeline Run Details
Field | Description |
---|---|
Pipeline Name | The name of the pipeline being executed. |
Cluster ID | The unique identifier of the cluster associated with the pipeline run. |
State | The current status of the pipeline run (e.g., FAILED, SUCCESSFUL). |
Cause | The reason for the pipeline run state (e.g., JOB_TASK). |
Start Time | The date and time when the pipeline run started. |
End Time | The date and time when the pipeline run ended. |
Is Validate Only | Indicates whether the run was for validation only (true or false). |
Is Full Refresh | Specifies whether the pipeline run performed a full refresh (true or false). |
Execution | Details about the execution. |
Databricks Cost | The cost associated with Databricks resources for the pipeline run. |
Cloud Vendor Cost | The cost incurred for cloud resources used in the pipeline run. |
Total Cost | The combined total cost of Databricks and cloud vendor resources. |