Google Cloud Pub/Sub
Google Cloud Pub/Sub integration in Acceldata Data Observability Cloud (ADOC) enables comprehensive data reliability, observability, and profiling for your event-driven architecture. Introduced in ADOC v4.7.0, this connector allows you to crawl, profile, and reconcile Pub/Sub data streams using ADOC’s batch reading engine—without complex setup.
Google Cloud Pub/Sub is a fully managed real-time messaging service that allows applications to exchange event data at scale. By integrating it with ADOC, you can ensure continuous data reliability and visibility for your streaming workloads. The ADOC connector reads from Pub/Sub topics using ephemeral (temporary) subscriptions, ensuring isolated execution and automatic cleanup after each batch job.
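For intuition, the ephemeral-subscription lifecycle the connector relies on can be sketched with the google-cloud-pubsub Python client. This is an illustrative sketch of the create, pull, acknowledge, delete pattern, not ADOC's internal implementation; the project IDs, topic, and subscription name below are placeholders.

```python
from google.cloud import pubsub_v1

# Placeholder identifiers; substitute your own projects and topic.
SOURCE_PROJECT = "source-project-123"
SUBSCRIPTION_PROJECT = "subscription-project-xyz"
TOPIC = "orders-topic"

subscriber = pubsub_v1.SubscriberClient()
topic_path = f"projects/{SOURCE_PROJECT}/topics/{TOPIC}"
sub_path = subscriber.subscription_path(SUBSCRIPTION_PROJECT, "adoc-ephemeral-demo")

# 1. Create a temporary subscription attached to the topic.
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
try:
    # 2. Pull a batch of messages and acknowledge them.
    response = subscriber.pull(request={"subscription": sub_path, "max_messages": 100})
    for received in response.received_messages:
        print(received.message.data)
    if response.received_messages:
        ack_ids = [m.ack_id for m in response.received_messages]
        subscriber.acknowledge(request={"subscription": sub_path, "ack_ids": ack_ids})
finally:
    # 3. Delete the subscription so no residual resources remain.
    subscriber.delete_subscription(request={"subscription": sub_path})
```

Note that a newly created subscription only receives messages published after its creation; replaying retained history additionally requires message retention on the topic and a seek, as in the incremental-read sketch later on this page.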
Supported Authentication Methods
| Authentication Type | Description |
|---|---|
| Google Workload Identity (Default) | Uses GCP Workload Identity Federation for secure, identity-based authentication between ADOC and GCP. |
| Service Account Key File (JSON) | Authenticate using a service account JSON key uploaded directly in ADOC. |
Prerequisites and Permissions
Before adding Google Cloud Pub/Sub as a data source, ensure the following:
- You have an existing Data Plane configured in ADOC. Refer to the Data Plane Installation Guide for setup instructions.
- The following GCP IAM permissions must be granted to the service account used for connection:
| Permission | Resource Scope | Purpose |
|---|---|---|
| pubsub.subscriptions.create | Subscription Project ID | Create ephemeral subscriptions during job execution. |
| pubsub.topics.attachSubscription | Each Topic of Interest | Attach ephemeral subscriptions to Pub/Sub topics. |
| pubsub.subscriptions.delete | Subscription Project ID | Delete ephemeral subscriptions post-job to avoid residual resources. |
| pubsub.subscriptions.consume, pubsub.messages.pull, pubsub.messages.acknowledge | Ephemeral Subscriptions | Enable the Spark Batch Reader to read and acknowledge messages. |
The Test Connection validates the lifecycle permissions (create, attach, delete) critical to ADOC’s Pub/Sub batch processing model; a permission pre-check sketch follows this list.
- Ensure the following connection details are available:
  - Source and Subscription Project IDs
  - Google Cloud region (e.g., us-east1)
  - Authentication credentials (Workload Identity or JSON key)
  - Topics to be read by ADOC
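Before running Test Connection, you can optionally pre-check topic-level grants yourself. The sketch below assumes the google-cloud-pubsub Python client and its standard IAM testIamPermissions helper, with a placeholder topic path; the project-scoped permissions (pubsub.subscriptions.create and pubsub.subscriptions.delete) are easiest to verify through ADOC’s Test Connection itself.

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = "projects/source-project-123/topics/orders-topic"  # placeholder

# testIamPermissions returns the subset of requested permissions the caller holds.
response = publisher.test_iam_permissions(
    request={
        "resource": topic_path,
        "permissions": ["pubsub.topics.attachSubscription"],
    }
)
print("Granted on topic:", list(response.permissions))
```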
Configuration Parameters
| Parameter | Description | Mandatory | Example |
|---|---|---|---|
| Data Source Name | Unique identifier for the Pub/Sub source. | ✅ | GCP-PubSub-Prod |
| Description | Optional notes for the data source. | ❌ | Production Pub/Sub data pipeline |
| Data Plane | ADOC Data Plane to use. | ✅ | dp-gcp-us |
| Source Project ID | Project ID where Pub/Sub topics reside. | ✅ | source-project-123 |
| Subscription Project ID | Project ID for temporary subscriptions. | ✅ | subscription-project-xyz |
| Region | GCP region of Pub/Sub topics. | ✅ | us-east1 |
| Authentication Method | Choose Workload Identity or Upload Service Account File. | ✅ | Workload Identity |
| Service Account File | JSON key for Service Account authentication. | ⚙️ Required if JSON file method chosen | /path/to/service-account.json |
| Topics of Interest | Comma-separated list of Pub/Sub topics to monitor. | ✅ | orders-topic, audit-topic |
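For the Service Account File method, the uploaded file is a standard Google service account JSON key. If you want to sanity-check a key before uploading it, here is a minimal sketch using the google-auth library (the path is a placeholder):

```python
from google.oauth2 import service_account
from google.cloud import pubsub_v1

# Placeholder path; point this at the key file you plan to upload to ADOC.
creds = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json"
)
print("Key belongs to:", creds.service_account_email)

# Any Pub/Sub client can then be constructed with these explicit credentials.
subscriber = pubsub_v1.SubscriberClient(credentials=creds)
```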
Adding Google Cloud Pub/Sub as a Data Source
1. Navigate to the Register > Data Sources tab in ADOC.
2. Click Add Data Source.
3. Select Google Cloud Pub/Sub from the list of data sources.
4. Enter a Data Source Name and an optional Description.
5. Ensure the Data Reliability toggle is enabled.
6. Choose an existing Data Plane or create a new one.
7. Click Next to configure Connection Details.
8. Provide the following:
   - Authentication method (Workload Identity or JSON key file)
   - Credentials file (required only for the JSON key method)
   - Source Project ID
   - Subscription Project ID
   - List of topic names
9. Click Test Connection to validate access and permissions.
10. Once the test succeeds, click Next to configure topic details in the Observability Setup step.
Configuring Topic Details
After successfully connecting to your Google Cloud Pub/Sub data source, the Set Up Observability page allows you to configure topic-level settings for monitoring and data reliability.
| Field | Description | Example |
|---|---|---|
| Asset Name | Logical name assigned to the Pub/Sub topic within ADOC. Appears as the asset identifier in the Data Reliability dashboard. | orders_topic_asset |
| Topic Name | Exact name of the Pub/Sub topic in Google Cloud. | orders-topic |
| Message Format | Supported formats: JSON, Avro, Confluent Avro. | JSON |
| Subscriber Parallelism | Number of parallel subscribers used for data reading during job execution. Controls throughput. | 1 |
| Schema ID | Identifier of the schema (for Avro/Confluent Avro). Used for mapping to a Schema Registry entry. | orders-schema-v2 |
| Schema Naming Strategy | Naming convention used to resolve schema identity. Options: TOPIC_NAME, RECORD_NAME, TOPIC_RECORD_NAME. | TOPIC_RECORD_NAME |
| Key or Value | Specifies whether the schema applies to the message key or value. | Value |
| Record Name | Record name for Avro or Confluent Avro messages. | OrderRecord |
| Record Namespace | Avro namespace used to organize record schemas. | com.retail |
| Topic Schema | Full schema definition for the topic, if manually provided. | Inline JSON or Avro schema |
| Schema File Path | Path to an external schema file (e.g., .avsc). | /schemas/order.avsc |
These parameters allow ADOC’s Spark Batch Reader to interpret payloads correctly during data profiling and quality evaluation.
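For context on the Confluent Avro format and the naming-strategy options: a Confluent-framed payload starts with a 5-byte header (a zero magic byte plus a big-endian 4-byte schema ID) ahead of the Avro-encoded body, and the strategy names here match Confluent's subject naming conventions, which the sketch below assumes they follow. The sketch uses the fastavro library and a value schema you have already fetched; it is an illustration, not ADOC's reader.

```python
import io
import struct
import fastavro

def resolve_subject(strategy: str, topic: str, namespace: str, record: str) -> str:
    """Resolve a value-schema subject name, mirroring Confluent's conventions."""
    full_record = f"{namespace}.{record}"
    return {
        "TOPIC_NAME": f"{topic}-value",
        "RECORD_NAME": full_record,
        "TOPIC_RECORD_NAME": f"{topic}-{full_record}",
    }[strategy]

def decode_confluent_avro(payload: bytes, writer_schema: dict) -> dict:
    """Strip the 5-byte Confluent header, then decode the Avro body."""
    magic, schema_id = struct.unpack(">bI", payload[:5])
    assert magic == 0, "not a Confluent-framed message"
    # In a real reader, schema_id would be used to fetch writer_schema
    # from the Schema Registry; here the schema is passed in directly.
    return fastavro.schemaless_reader(io.BytesIO(payload[5:]), writer_schema)

print(resolve_subject("TOPIC_RECORD_NAME", "orders-topic", "com.retail", "OrderRecord"))
# -> orders-topic-com.retail.OrderRecord
```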
Optional Settings
Enable Schema Drift Monitoring
Turn on this setting to track structural changes (schema drift) in your Pub/Sub topic data over time.
Note: Schema drift detection requires Enable Crawler Execution Schedule to be turned on.
Enable Crawler Execution Schedule
Set up scheduled crawlers to automatically scan and profile your Pub/Sub topics at regular intervals.
Options include:
- Frequency: Choose how often the crawler runs (e.g., Daily, Weekly, Hourly).
- Execution Time: Specify the start time for crawler execution.
- Time Zone: Select the appropriate time zone (e.g., UTC, Asia/Calcutta).
- Multiple Execution Windows: Add multiple time slots as needed.
Example:
Every Day at 12:00 AM UTC (Next Execution: 2025-10-28 05:30:00 Asia/Calcutta)
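The rendered next-execution time is just the scheduled UTC instant expressed in the selected time zone: 12:00 AM UTC corresponds to 05:30 in Asia/Calcutta (UTC+05:30). A quick check in Python:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The scheduled run at 12:00 AM UTC on 2025-10-28...
run_utc = datetime(2025, 10, 28, 0, 0, tzinfo=timezone.utc)

# ...rendered in the configured display time zone.
print(run_utc.astimezone(ZoneInfo("Asia/Calcutta")))
# -> 2025-10-28 05:30:00+05:30
```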
Set Notifications
- Notify on Crawler Failure: Select one or more configured notification channels (e.g., Slack, Email) to receive alerts if a crawler run fails.
- Notify on Success: Toggle on to receive notifications when a crawler run completes successfully.
Finally, click Submit to save your topic configuration and begin monitoring Pub/Sub data through ADOC.
Data Reading Options (Batch Mode)
ADOC supports two data reading modes for Google Cloud Pub/Sub:
1. Full Read
- Creates a temporary subscription.
- Reads all messages from the earliest retained message to the job start time.
- Deletes the subscription after processing.
Use Case: Initial ingestion or complete refresh of topic data.
2. Incremental Read
- Creates a new ephemeral subscription for each job run.
- Supports two strategies:
- Timestamp-based: Reads messages newer than the previous job’s watermark.
- Lookback-based: Reads messages within a user-defined time window (e.g., last 24 hours).
Use Case: Continuous processing of new data. Timestamp-based reads are non-overlapping across runs, while lookback-based reads deliberately overlap windows for recovery and fault tolerance (see the sketch below).
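Here is a sketch of the incremental pattern using Pub/Sub's seek API, assuming message retention is enabled on the topic so a fresh subscription can be rewound to a past timestamp. The watermark handling and all names are placeholders; this illustrates the strategy rather than ADOC's implementation.

```python
from datetime import datetime, timedelta, timezone
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
topic_path = "projects/source-project-123/topics/orders-topic"
sub_path = subscriber.subscription_path("subscription-project-xyz", "adoc-incr-demo")

# Timestamp-based: resume from the previous run's stored watermark.
# Lookback-based: use a fixed window instead, e.g. now - timedelta(hours=24).
watermark = datetime.now(timezone.utc) - timedelta(hours=1)  # placeholder

subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
try:
    # Rewind the fresh subscription to the watermark. Replaying messages
    # published before the subscription existed requires topic retention.
    subscriber.seek(request={"subscription": sub_path, "time": watermark})
    while True:
        resp = subscriber.pull(request={"subscription": sub_path, "max_messages": 500})
        if not resp.received_messages:
            break  # simplified: treat an empty pull as "caught up"
        subscriber.acknowledge(
            request={
                "subscription": sub_path,
                "ack_ids": [m.ack_id for m in resp.received_messages],
            }
        )
finally:
    subscriber.delete_subscription(request={"subscription": sub_path})
```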
Next Steps
- View the newly added data source under Data Reliability → Data Sources.
- Schedule crawler runs for continuous profiling and data quality checks.
- Monitor data health, schema drift, and freshness metrics through ADOC dashboards.