| Keyword | Description |
|---|---|
| Pulse UI | The interface for monitoring and analyzing cluster data. Pulse UI is the central hub where users can view cluster status, metrics, incidents, and alerts. It provides interactive charts, dashboards, and drill-downs to support operational decision-making. |
| Cluster | A group of connected nodes working together to run large-scale data processing tasks. In Pulse, clusters are displayed as the main units for monitoring. Users can view cluster health, performance metrics, and workloads in one place to ensure smooth operations. |
| Node | An individual machine in a cluster that runs Hadoop services and workloads. Pulse provides node-level details, including CPU, memory, disk, and network usage, helping users identify and troubleshoot localized issues quickly. |
| Hadoop Services | Software components (e.g., YARN, HDFS, Hive) that perform specific cluster functions. Pulse monitors Hadoop services to track their availability and performance. Users can see which services are healthy or failing and understand how each service affects overall cluster operations. |
| Metadata | Descriptive information about tasks or resources for monitoring and analysis. Metadata in Pulse provides context for clusters, nodes, and workloads. Users can audit activity, track trends over time, and investigate root causes of issues. |
| Resource Usage | Key metrics showing the usage of resources such as CPU, memory, disk, and I/O at the cluster, node, and service levels. Pulse visualizes these metrics through charts and dashboards, enabling users to monitor processing load, detect bottlenecks, and optimize resource usage. |
| Throughput Analysis | Overview of I/O and network performance across nodes. Pulse visualizes disk read/write times, I/O, and network packet trends to help users monitor node performance and detect bottlenecks. |
| Cluster Health Overview | A consolidated view of a cluster, including nodes, services, workloads, incidents, and logs. Pulse displays the Cluster Health Overview on the Home page, giving users a high-level view of cluster performance and stability for faster decision-making. |
| Search | A feature that enables searching and filtering of records across various Pulse UI pages, including Nodes, Alerts, Logs, and YARN Application Explorer. Pulse Search allows users to quickly locate records, filter data by relevant fields, and analyze cluster activity efficiently to support troubleshooting and operational decision-making. |
| Agent Status | Status of Agents installed on hosts, indicating Up, Down, Uninstalled, or Never Installed. Pulse displays Agent status across hosts to help users identify nodes requiring attention and maintain continuous data collection. |
| YARN Workload or Application | A workload whose resources are allocated and scheduled by YARN, which distributes cluster resources across applications. Pulse shows YARN workloads to help users analyze task distribution, performance trends, and resource allocation for efficient cluster management. |
| Application | A top-level workload in YARN, created when a user submits a job or query. It is managed by an ApplicationMaster and may contain one or more jobs or queries. Pulse monitors applications across services (MapReduce, Spark, Tez, Hive, Impala, etc.) in the Application Explorer. It displays type, status, resource usage, incidents, logs, and recommendations. |
| Job | A unit of execution within an application, such as a Spark or MapReduce job. A job consists of multiple tasks. Pulse tracks job-level metrics, including memory, CPU, containers, and hosts, to help troubleshoot performance and resource usage. |
| Query | A user request (for example, Hive, Tez, or Impala SQL) that is compiled into jobs or stages and executed as a YARN application. Pulse provides query visibility, showing execution status, incidents, and recommendations for performance optimization. |
| Task | The smallest unit of execution in YARN, such as a mapper, reducer, or Spark task, which runs inside a container. Pulse highlights task inefficiencies through recommendations (for example, low mapper memory usage, small tasks, or task runtime skew) to help tune resource allocation. |
| Container | A resource allocation unit in YARN that provides CPU and memory for running tasks. Each mapper, reducer, or Spark executor runs within a container on a specific host. Pulse tracks container usage, availability, and allocation trends, helping identify resource bottlenecks and optimize performance. |
| Queue | A logical resource pool in YARN that controls how resources are shared across applications. Queues enforce capacity, priorities, and scheduling policies for jobs and queries. Pulse monitors queue utilization with CPU and memory charts, helping users analyze workload distribution and detect contention. |
| Concurrent Applications or Jobs | Applications or jobs running at the same time. Pulse lets users select an application or job and view the other applications or jobs that ran during the same period, which helps when analyzing overlapping workloads. |
| Anomaly Detection | Automatically identifies and highlights abnormal data points in time-series charts based on historical trends. |
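The Concurrent Applications or Jobs entry depends on deciding whether two runs overlap in time. The following is a minimal sketch, not Pulse's implementation: it assumes each run is a hypothetical `AppRun` record with start and end timestamps in epoch seconds, and treats two runs as concurrent when their half-open time intervals intersect.

```python
from dataclasses import dataclass


@dataclass
class AppRun:
    # Hypothetical record; field names are illustrative, not a Pulse API.
    app_id: str
    start: int  # start time, epoch seconds (assumed)
    end: int    # end time, epoch seconds (assumed)


def concurrent_with(target: AppRun, runs: list[AppRun]) -> list[str]:
    """Return IDs of runs whose [start, end) interval overlaps the target's."""
    return [
        r.app_id
        for r in runs
        # Two intervals overlap when each one starts before the other ends.
        if r.app_id != target.app_id and r.start < target.end and target.start < r.end
    ]


target = AppRun("app_1", 100, 200)
runs = [target, AppRun("app_2", 150, 250), AppRun("app_3", 200, 300), AppRun("app_4", 50, 120)]
print(concurrent_with(target, runs))  # ['app_2', 'app_4']
```

Here `app_3` starts exactly when `app_1` ends, so under the half-open convention it is not counted as concurrent; a product may choose a different boundary rule.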
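The Anomaly Detection entry says abnormal points are flagged based on historical trends, but the glossary does not name the method Pulse uses. As an illustration only, the sketch below uses a common stand-in technique, a trailing-window z-score: a point is flagged when it lies more than `threshold` standard deviations from the mean of the preceding `window` points. The function name and parameters are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(values: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indices of points far from their trailing-window mean (z-score test).

    Illustrative stand-in technique; not Pulse's documented algorithm.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` points before index i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change from the constant level counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


cpu_percent = [40, 42, 41, 43, 40, 41, 95, 42, 41]
print(flag_anomalies(cpu_percent))  # [6]
```

Only the spike at index 6 is flagged. Because the spike then enters the trailing window and inflates its standard deviation, the following normal points are not flagged; production systems often use more robust baselines for this reason.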