Pulse Services
Pulse Services are the core functional building blocks of Pulse, handling tasks such as data collection, processing, alerting, communication, visualization, and system management. Together, they ensure seamless data flow, real-time analytics, and efficient monitoring within the Pulse ecosystem.
This page provides a high-level view of Pulse services. Pulse services are grouped into the following categories, each described in detail on their respective pages:
- Core Services: These are essential services that handle data collection, processing, and orchestration. For details, see Pulse Core Services.
- Add-on Services: These are optional services that you can deploy based on specific observability requirements. For details, see Pulse Add-on Services.
- Databases: These services provide persistent storage for processed metrics, logs, and metadata. For details, see Pulse Databases.
Service | Purpose | Description | Communication |
---|---|---|---|
Messaging Queue | Serves as an asynchronous messaging queue for various events processed by Pulse services. Critical events, such as Hive, Tez, Yarn, and Spark process events, are stored here. | Built on the NATS messaging framework, it stores critical metadata about big data processes, enabling services to operate independently. It plays a vital role in debugging and allows revisiting and reprocessing past events when needed. | A lightweight text-based publish/subscribe protocol built on top of TCP/IP sockets. |
Connectors | Entities that apply business logic and analytics to raw events, responsible for collecting events from components that require a pull-based mechanism for event retrieval. | Some of the connector processes include Kafka, Yarn, and Spark. These connectors function as clients to big data processes and utilize a pull mechanism to retrieve metadata events. For example, they can fetch all topics from a Kafka cluster or retrieve files from HDFS for Spark listener events. | The connectors use independent protocols based on the specific client. For example, Yarn utilizes an HTTP-based protocol, Spark employs a custom RPC protocol for HDFS, and Kafka uses a binary protocol built on top of TCP. |
Streaming | A stateless service that serves as a gateway to long-term storage. Pulse utilizes MongoDB to store long-term data, including summaries, analytics, and the status of big data processes. | This service performs the final step of analytics or summarization before persisting data into multiple MongoDB collections. It is leveraged by the Pulse UI to display graphs, summaries, trends, analytics, and recommendations. | The connectors transmit data to the streaming microservice using the HTTP protocol, while MongoDB communicates using its binary wire protocol. |
Notifications | A service that delivers notifications to end users through multiple channels, such as Slack, email, and more. | When an alert is breached, a notification is generated and sent to the end user. | Different protocols are used to facilitate communication with target channels. |
Alerts | A service that continuously evaluates system conditions for breaches and notifies the notifications microservice to send appropriate communications to the end user. | The busiest service in the Pulse data pipeline, responsible for evaluating custom rules set by the end user to trigger notifications or actions when anomalous behavior is detected. | Primarily interacts with MongoDB to retrieve data for evaluation, enabling the triggering of actions or notifications when necessary. |
Logstash (+ Curator) | Necessary for the collection and storage of large-scale system logs in a big data environment. | Open-source frameworks for collecting logs from a target cluster. | Uses the Beats protocol, with JSON as the primary format for payloads. |
Dashplot (+ DB Explorer) | A custom "Pulse" tool designed to connect to multiple backend databases, enabling the implementation of custom analytics for time series metrics and their associated graphs. | Provides quick insights into the data stored in the Pulse backend. | HTTP protocol |
Director | A service that implements various actions triggered by custom alerts, serving as the defined responses to maintain the system's stability. | Uses Ansible as one of the methods to perform actions on target hosts, such as cleaning up dormant HDFS files. Additionally, it supports executing these actions through lightweight agents on the target hosts, utilizing a command pattern implemented via a messaging queue. | SSH and NATS protocol |
Hydra | Responsible for deploying, configuring, refreshing, and managing remote Pulse agents. | Hydra monitors and manages Pulse Node Agents, Yarn Optimizer Agents, and others, providing command and control for remote agents. | Custom binary protocol. |
Pulse UI, GraphQL | The gateway for users to view their big data processes, access recommendations and analytics, track actions performed, review generated alerts, monitor service health, detect app anomalies, and explore historical data. | The secure UI (the customer must provide the certificates) serves as the common gateway to all the target clusters owned by the customer. | Interacts with various databases, including MongoDB, VictoriaMetrics, Elasticsearch, and others. |
Databases | MongoDB is used for long-term storage, VictoriaMetrics for time series data, and Elasticsearch for storing container and log information for quick retrieval. | Provides long-term data storage, where a MongoDB cluster can be configured as a sharded cluster to offer retention periods ranging from 6 to 9 months, depending on infrastructure availability. This setup enables efficient storage and quick query performance across sharded MongoDB instances. | Protocols vary by database: MongoDB utilizes its custom binary protocol, while VictoriaMetrics and Elasticsearch communicate via the HTTP protocol. |
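The Connectors row above describes a pull-based pattern: a connector acts as a client to a big data process, fetches metadata events, and hands them to the messaging queue. A minimal sketch of that pattern is shown below; the event shape, the `fetch`/`publish` callables, and the simulated Kafka response are all illustrative assumptions, not Pulse's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical event shape; field names are illustrative, not Pulse's schema.
@dataclass
class MetadataEvent:
    source: str
    payload: Dict

class Connector:
    """Pull-based connector: polls a client for metadata events and hands
    each one to a publisher (in Pulse, the NATS-based messaging queue)."""

    def __init__(self, source: str, fetch: Callable[[], List[Dict]],
                 publish: Callable[[MetadataEvent], None]):
        self.source = source
        self.fetch = fetch      # e.g. wraps Yarn's HTTP API or Kafka's binary protocol
        self.publish = publish

    def poll_once(self) -> int:
        """Fetch the current batch of events, publish each, return the count."""
        events = [MetadataEvent(self.source, p) for p in self.fetch()]
        for event in events:
            self.publish(event)
        return len(events)

# Usage: simulate a "fetch all topics from a Kafka cluster" call and
# collect the published events in memory instead of a real queue.
published: List[MetadataEvent] = []
connector = Connector(
    source="kafka",
    fetch=lambda: [{"topic": "orders"}, {"topic": "clicks"}],
    publish=published.append,
)
count = connector.poll_once()  # → 2
```

Decoupling `fetch` from `publish` mirrors the independence the table describes: the connector does not care whether downstream services are available, because the queue absorbs the events.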
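The Alerts row describes evaluating custom user-defined rules against stored data and triggering the Notifications service on a breach. The sketch below shows the general shape of such a threshold check; the rule fields and metric names are hypothetical, since Pulse's real rule schema is not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative rule shape; Pulse's actual rule definition is not shown here.
@dataclass
class ThresholdRule:
    metric: str
    threshold: float

def evaluate(rule: ThresholdRule, samples: List[Tuple[str, float]]) -> List[float]:
    """Return the sample values that breach the rule's threshold --
    the kind of check the Alerts service runs against data retrieved
    from MongoDB before handing off to the Notifications service."""
    return [value for (name, value) in samples
            if name == rule.metric and value > rule.threshold]

# Usage: two HDFS-usage samples, one of which breaches the 90% threshold.
samples = [("hdfs_usage_pct", 82.0), ("hdfs_usage_pct", 91.5), ("cpu_pct", 40.0)]
breaches = evaluate(ThresholdRule("hdfs_usage_pct", 90.0), samples)  # → [91.5]
```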
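The Director row mentions a command pattern implemented over the messaging queue: an alert-triggered action is encoded as a named command and executed by a lightweight agent on the target host. A minimal in-process sketch of that pattern follows; the command name, handler, and result strings are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical command envelope; in Pulse this would travel over NATS.
@dataclass
class Command:
    name: str
    args: Dict

class Agent:
    """Agent-side dispatcher: maps command names to handlers, in the
    spirit of the Director's queue-based command pattern."""

    def __init__(self):
        self.handlers: Dict[str, Callable[[Dict], str]] = {}

    def register(self, name: str, handler: Callable[[Dict], str]) -> None:
        self.handlers[name] = handler

    def execute(self, cmd: Command) -> str:
        # Here the command is a direct call; in Pulse it would be
        # received from the messaging queue before being dispatched.
        return self.handlers[cmd.name](cmd.args)

# Usage: a hypothetical HDFS-cleanup action registered and invoked.
agent = Agent()
agent.register("cleanup_hdfs", lambda args: f"removed files under {args['path']}")
result = agent.execute(Command("cleanup_hdfs", {"path": "/tmp/dormant"}))
```

Routing actions through named commands keeps the Director decoupled from the agents: new actions only require registering a handler on the target host, not changing the Director itself.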