Google Cloud Pub/Sub

Google Cloud Pub/Sub integration in Acceldata Data Observability Cloud (ADOC) enables comprehensive data reliability, observability, and profiling for your event-driven architecture. Introduced in ADOC v4.7.0, this connector allows you to crawl, profile, and reconcile Pub/Sub data streams using ADOC’s batch reading engine—without complex setup.

Google Cloud Pub/Sub is a fully managed real-time messaging service that allows applications to exchange event data at scale. By integrating it with ADOC, you can ensure continuous data reliability and visibility for your streaming workloads. The ADOC connector reads from Pub/Sub topics using ephemeral (temporary) subscriptions, ensuring isolated execution and automatic cleanup after each batch job.

Supported Authentication Methods

| Authentication Type | Description |
|---|---|
| Google Workload Identity (Default) | Uses GCP Workload Identity Federation for secure, identity-based authentication between ADOC and GCP. |
| Service Account Key File (JSON) | Authenticate using a service account JSON key uploaded directly in ADOC. |

Prerequisites and Permissions

Before adding Google Cloud Pub/Sub as a data source, ensure the following:

  1. You have an existing Data Plane configured in ADOC. Refer to the Data Plane Installation Guide for setup instructions.
  2. The following GCP IAM permissions must be granted to the service account used for connection:
| Permission | Resource Scope | Purpose |
|---|---|---|
| pubsub.subscriptions.create | Subscription Project ID | Create ephemeral subscriptions during job execution. |
| pubsub.topics.attachSubscription | Each topic of interest | Attach ephemeral subscriptions to Pub/Sub topics. |
| pubsub.subscriptions.delete | Subscription Project ID | Delete ephemeral subscriptions post-job to avoid residual resources. |
| pubsub.subscriptions.consume, pubsub.messages.pull, pubsub.messages.acknowledge | Ephemeral subscriptions | Enable the Spark Batch Reader to read and acknowledge messages. |

The Test Connection step validates the lifecycle permissions (create, attach, delete) that are critical to ADOC’s Pub/Sub batch processing model.
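In GCP, this kind of check is typically performed with the `testIamPermissions` API. The sketch below is illustrative only (it is not ADOC's actual Test Connection implementation); the permission names come from the table above, while the helper itself is an assumption:

```python
# Illustrative check of the lifecycle and read permissions listed above.
# The permission names are from the table; the helper is a sketch, not
# ADOC's actual Test Connection logic.

REQUIRED_LIFECYCLE_PERMS = {
    "pubsub.subscriptions.create",       # create ephemeral subscriptions
    "pubsub.topics.attachSubscription",  # attach them to topics of interest
    "pubsub.subscriptions.delete",       # clean up after each batch job
}
REQUIRED_READ_PERMS = {
    "pubsub.subscriptions.consume",
    "pubsub.messages.pull",
    "pubsub.messages.acknowledge",
}

def missing_permissions(granted: set) -> set:
    """Return the required permissions the service account lacks."""
    return (REQUIRED_LIFECYCLE_PERMS | REQUIRED_READ_PERMS) - granted
```

If `missing_permissions` returns a non-empty set, the corresponding grants must be added before the connection test can pass.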

  3. Ensure the following connection details are available:
    • Source and Subscription Project IDs
    • Google Cloud region (e.g., us-east1)
    • Authentication credentials (Workload Identity or JSON key)
    • Topics to be read by ADOC

Configuration Parameters

| Parameter | Description | Mandatory | Example |
|---|---|---|---|
| Data Source Name | Unique identifier for the Pub/Sub source. | Yes | GCP-PubSub-Prod |
| Description | Optional notes for the data source. | No | Production Pub/Sub data pipeline |
| Data Plane | ADOC Data Plane to use. | Yes | dp-gcp-us |
| Source Project ID | Project ID where Pub/Sub topics reside. | Yes | source-project-123 |
| Subscription Project ID | Project ID for temporary subscriptions. | Yes | subscription-project-xyz |
| Region | GCP region of Pub/Sub topics. | Yes | us-east1 |
| Authentication Method | Choose Workload Identity or Upload Service Account File. | Yes | Workload Identity |
| Service Account File | JSON key for Service Account authentication. | Required if the JSON file method is chosen | /path/to/service-account.json |
| Topics of Interest | Comma-separated list of Pub/Sub topics to monitor. | Yes | orders-topic, audit-topic |
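Assembled from the table above, a connection configuration might look like the following. The field names and the `validate` helper are illustrative assumptions, not ADOC's actual API schema:

```python
# Hypothetical representation of the parameters from the table above.
# Field names are illustrative; ADOC's real payload format may differ.
pubsub_source = {
    "data_source_name": "GCP-PubSub-Prod",
    "description": "Production Pub/Sub data pipeline",
    "data_plane": "dp-gcp-us",
    "source_project_id": "source-project-123",
    "subscription_project_id": "subscription-project-xyz",
    "region": "us-east1",
    "authentication_method": "workload_identity",  # or "service_account_file"
    "service_account_file": None,  # JSON key path, needed only for the file method
    "topics_of_interest": ["orders-topic", "audit-topic"],
}

def validate(cfg: dict) -> list:
    """Collect missing mandatory fields, mirroring the table's rules."""
    required = [
        "data_source_name", "data_plane", "source_project_id",
        "subscription_project_id", "region", "authentication_method",
        "topics_of_interest",
    ]
    errors = [f"missing: {k}" for k in required if not cfg.get(k)]
    if (cfg.get("authentication_method") == "service_account_file"
            and not cfg.get("service_account_file")):
        errors.append("service_account_file is required for the JSON-key method")
    return errors
```

Note the conditional rule: the Service Account File becomes mandatory only when the JSON file authentication method is selected.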

Adding Google Cloud Pub/Sub as a Data Source

  1. Navigate to Register > Data Sources tab in ADOC.

  2. Click Add Data Source.

  3. Select Google Cloud Pub/Sub from the list of data sources.

  4. Enter a Data Source Name and optional Description.

  5. Ensure the Data Reliability toggle is enabled.

  6. Choose an existing Data Plane or create a new one.

  7. Click Next to configure Connection Details.

  8. Provide the following:

    • Authentication method (choose between Workload Identity and JSON key file)
    • Credentials File
    • Source Project ID
    • Subscription Project ID
    • List of Topic Names
  9. Click Test Connection to validate access and permissions.

  10. Once successful, click Next to configure Topic details in the Observability Setup step.

Configuring Topic Details

After successfully connecting to your Google Cloud Pub/Sub data source, the Set Up Observability page allows you to configure topic-level settings for monitoring and data reliability.

| Field | Description | Example |
|---|---|---|
| Asset Name | Logical name assigned to the Pub/Sub topic within ADOC. Appears as the asset identifier in the Data Reliability dashboard. | orders_topic_asset |
| Topic Name | Exact name of the Pub/Sub topic in Google Cloud. | orders-topic |
| Message Format | Supported formats: JSON, Avro, Confluent Avro. | JSON |
| Subscriber Parallelism | Number of parallel subscribers used for data reading during job execution. Controls throughput. | 1 |
| Schema ID | Identifier of the schema (for Avro/Confluent Avro). Used for mapping to a Schema Registry entry. | orders-schema-v2 |
| Schema Naming Strategy | Naming convention used to resolve schema identity. Options: TOPIC_NAME, RECORD_NAME, TOPIC_RECORD_NAME. | TOPIC_RECORD_NAME |
| Key or Value | Specifies whether the schema applies to the message key or value. | Value |
| Record Name | Record name for Avro or Confluent Avro messages. | OrderRecord |
| Record Namespace | Avro namespace used to organize record schemas. | com.retail |
| Topic Schema | Full schema definition for the topic, if manually provided. | Inline JSON or Avro schema |
| Schema File Path | Path to an external schema file (e.g., .avsc). | /schemas/order.avsc |

These parameters allow ADOC’s Spark Batch Reader to interpret payloads correctly during data profiling and quality evaluation.
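The schema naming strategy determines how a Schema Registry subject is resolved from the topic and record fields. The sketch below follows Confluent's common subject-naming conventions (TopicNameStrategy, RecordNameStrategy, TopicRecordNameStrategy); ADOC's exact resolution rules may differ:

```python
def resolve_subject(strategy, topic, record_namespace=None,
                    record_name=None, key_or_value="value"):
    """Resolve a Schema Registry subject name (illustrative sketch).

    Mirrors Confluent's conventions: TOPIC_NAME appends a -key/-value
    suffix to the topic; RECORD_NAME uses the fully qualified record
    name; TOPIC_RECORD_NAME combines both.
    """
    suffix = "-key" if key_or_value.lower() == "key" else "-value"
    fqn = f"{record_namespace}.{record_name}" if record_namespace else record_name
    if strategy == "TOPIC_NAME":
        return topic + suffix
    if strategy == "RECORD_NAME":
        return fqn
    if strategy == "TOPIC_RECORD_NAME":
        return f"{topic}-{fqn}"
    raise ValueError(f"unknown schema naming strategy: {strategy}")
```

Using the table's example values, TOPIC_RECORD_NAME with topic `orders-topic`, namespace `com.retail`, and record `OrderRecord` resolves to `orders-topic-com.retail.OrderRecord`.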

Optional Settings

Enable Schema Drift Monitoring

Turn on this setting to track structural changes (schema drift) in your Pub/Sub topic data over time.

Note: Schema drift detection requires Enable Crawler Execution Schedule to be turned on.

Enable Crawler Execution Schedule

Set up scheduled crawlers to automatically scan and profile your Pub/Sub topics at regular intervals.

Options include:

  • Frequency: Choose how often the crawler runs (e.g., Daily, Weekly, Hourly).
  • Execution Time: Specify the start time for crawler execution.
  • Time Zone: Select the appropriate time zone (e.g., UTC, Asia/Calcutta).
  • Multiple Execution Windows: Add multiple time slots as needed.

Example:

Every Day at 12:00 AM UTC (Next Execution: 2025-10-28 05:30:00 Asia/Calcutta)
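The example above can be reproduced with Python's zoneinfo module. The helper below is a simplified, DST-naive sketch (not ADOC's scheduler) of how the next daily execution is derived and displayed in another time zone:

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def next_daily_run(now: datetime, run_at: time, tz: str) -> datetime:
    """Next daily execution at run_at in tz, strictly after now.

    Simplified sketch: ignores DST edge cases around the transition hour.
    """
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=run_at.hour, minute=run_at.minute,
                                  second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate

# Matches the doc's example: daily at 12:00 AM UTC, shown in Asia/Calcutta.
now = datetime(2025, 10, 27, 12, 0, tzinfo=ZoneInfo("UTC"))
nxt = next_daily_run(now, time(0, 0), "UTC")
print(nxt.astimezone(ZoneInfo("Asia/Calcutta")))  # 2025-10-28 05:30:00+05:30
```

This also shows why the "Next Execution" timestamp differs from the schedule: the run fires at 00:00 UTC, which is 05:30 in Asia/Calcutta.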

Set Notifications

  • Notify on Crawler Failure: Select one or more configured notification channels (e.g., Slack, Email) to receive alerts if a crawler run fails.
  • Notify on Success: Toggle on to receive notifications when a crawler run completes successfully.

Finally, click Submit to save your topic configuration and begin monitoring Pub/Sub data through ADOC.

Data Reading Options (Batch Mode)

ADOC supports two data reading modes for Google Cloud Pub/Sub:

1. Full Read

  • Creates a temporary subscription.
  • Reads all messages from the earliest retained message to the job start time.
  • Deletes the subscription after processing.

Use Case: Initial ingestion or complete refresh of topic data.

2. Incremental Read

  • Creates a new ephemeral subscription for each job run.
  • Supports two strategies:
    • Timestamp-based: Reads messages newer than the previous job’s watermark.
    • Lookback-based: Reads messages within a user-defined time window (e.g., last 24 hours).

Use Case: Continuous, non-overlapping processing (timestamp-based) or overlapping windows for fault-tolerant recovery (lookback-based).
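The two incremental strategies differ only in how the start of the read window is chosen. A minimal sketch of the window logic, with illustrative names (this is not ADOC's internal code):

```python
from datetime import datetime, timedelta, timezone

def incremental_window(job_start, previous_watermark=None, lookback=None):
    """Return the (start, end) read window for one batch run.

    Timestamp-based: start at the previous run's watermark, giving
    non-overlapping windows. Lookback-based: start a fixed duration
    before job start; consecutive windows may overlap, which helps
    fault-tolerant recovery. With neither, fall back to a full read
    (start=None means "earliest retained message").
    """
    if lookback is not None:
        return job_start - lookback, job_start
    if previous_watermark is not None:
        return previous_watermark, job_start
    return None, job_start

start = datetime(2025, 10, 28, 0, 0, tzinfo=timezone.utc)
print(incremental_window(start, lookback=timedelta(hours=24)))
```

In both strategies the window's end is the job start time, matching the batch model described above: each run reads a bounded slice through an ephemeral subscription that is deleted afterward.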

Next Steps

  • View the newly added data source under Data Reliability → Data Sources.
  • Schedule crawler runs for continuous profiling and data quality checks.
  • Monitor data health, schema drift, and freshness metrics through ADOC dashboards.