Iceberg
Apache Iceberg is an open table format designed for large-scale analytic datasets stored on data lakes and cloud object storage.
Integrating Apache Iceberg with Acceldata Data Observability Cloud (ADOC) gives your team full observability over Iceberg tables managed through AWS Glue, REST Catalog, or REST Catalog (Hive JDBC). Once connected, ADOC crawls your Iceberg namespaces and tables, discovers assets, and makes them available for data reliability policies, profiling, and monitoring, all without requiring changes to your existing Iceberg infrastructure.
What You Can Do with Iceberg in ADOC
With an Iceberg integration, you can:
- Discover and inventory Iceberg tables across namespaces in your catalog.
- Apply Data Reliability policies, including format, freshness, volume, and custom SQL rules, directly on Iceberg table assets.
- Schedule automated crawlers to keep your asset inventory current and receive notifications when a crawler fails or succeeds.
- Monitor schema changes and data drift across Iceberg tables over time.
- Use ADOC's Reliability Explorer to track policy results, failed records, and data health trends for Iceberg assets.
Prerequisites
Before integrating Apache Iceberg with ADOC, ensure the following conditions are met:
- A running ADOC data plane is available and reachable from your environment.
- The Iceberg catalog type you intend to connect (AWS Glue, REST Catalog, or REST Catalog with Hive JDBC) is provisioned and accessible.
- For AWS Glue: Valid AWS credentials are available in one of the supported forms: Access Key and Secret Key, EC2 Instance Profile, or IAM Roles for Service Accounts (IRSA). The credentials must have read access to the AWS Glue Data Catalog and the underlying S3 storage location.
- For REST Catalog: The REST Catalog URI is accessible from the data plane. Authentication credentials (username and password or token) are available. If the catalog does not provide storage credentials, you must supply separate storage authentication credentials for the underlying S3 or ECS storage.
- For REST Catalog (Hive JDBC): In addition to the REST Catalog credentials, a valid Hive JDBC connection URL, username, and password or token are required.
- Your ADOC user account has the Integrations permission to add data sources via Control Center.
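Before starting setup, it can help to check your configuration against the prerequisites above. The sketch below is a minimal, hypothetical checklist validator; the field names (`catalog_uri`, `hive_jdbc_url`, and so on) simply mirror the prerequisites listed here and are not an ADOC API.

```python
# Hypothetical prerequisite checklist; field names mirror the list above,
# not any official ADOC schema.
REQUIRED_FIELDS = {
    "aws_glue": ["region", "storage_auth_type"],
    "rest": ["catalog_uri", "username", "secret"],
    "rest_hive_jdbc": ["catalog_uri", "username", "secret",
                       "hive_jdbc_url", "hive_jdbc_user", "hive_jdbc_secret"],
}

def missing_prerequisites(catalog_type: str, config: dict) -> list[str]:
    """Return the prerequisite fields still missing for the chosen catalog type."""
    required = REQUIRED_FIELDS[catalog_type]
    return [field for field in required if not config.get(field)]
```

For example, a REST Catalog configuration that only supplies the URI would report `username` and `secret` as still missing.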
Add Iceberg as a Data Source in ADOC
Step 1: Start Setup
In the left main navigation menu, go to Control Center -> Integrations.
On the Integrations page, click Add Data Source.
From the list of available data sources, select Iceberg.
On the Basic Details page, complete the following details and click Next:
- Name: Enter a unique name to identify this Iceberg integration within ADOC.
- Description: (Optional) Enter a brief description of this data source.
- Data Reliability: Ensure the toggle is enabled to activate data reliability monitoring for this integration.
- Data Plane: Select the data plane that will run crawler and analysis jobs for this integration.
Step 2: Add Connection Details
On the Connection Details page, select the Iceberg catalog type and provide the corresponding connection parameters.
Select one of the following catalog types from the Iceberg Catalog dropdown:
AWS Glue
If you select AWS Glue, provide the following:
| Field | Description |
|---|---|
| Storage Authentication Type | Select one of the following: Access Key and Secret Key, EC2 Instance Profile, or IAM Roles for Service Accounts (IRSA). |
| Region | Enter the AWS region where the Glue Data Catalog and underlying storage reside (for example, us-east-1). |
If you selected Access Key and Secret Key, also provide:
| Field | Description |
|---|---|
| Access Key | Enter the AWS access key ID with read access to Glue and the associated S3 storage. |
| Secret Key | Enter the corresponding AWS secret access key. |
REST Catalog
If you select REST Catalog, provide the following:
| Field | Description |
|---|---|
| REST Catalog URI | Enter the base URI of the REST Catalog endpoint (for example, https://catalog.example.com). |
| Username | Enter the username for authenticating to the REST Catalog. |
| Password / Token | Enter the password or bearer token for authentication. |
Storage Credentials
ADOC supports two approaches for storage authentication when using a REST Catalog:
- Catalog-provided storage credentials: Enable the Catalog Provides Storage Credentials toggle if the REST Catalog is configured to vend temporary storage credentials directly. No additional storage configuration is required.
- Direct storage authentication: If the catalog does not provide storage credentials, leave the toggle disabled and provide the following:
| Field | Description |
|---|---|
| Storage Authentication Type | Select one of the following: Access Key and Secret Key, EC2 Instance Profile, or IAM Roles for Service Accounts (IRSA). |
| Region | Enter the AWS region of the underlying storage (for example, us-west-2). |
| S3 / ECS Storage Endpoint | (Optional) Enter a custom storage endpoint if using a non-AWS S3-compatible object store such as ECS. |
REST Catalog (Hive JDBC)
If you select REST Catalog (Hive JDBC), provide the following:
| Field | Description |
|---|---|
| REST Catalog URI | Enter the base URI of the REST Catalog endpoint. |
| Username | Enter the username for authenticating to the REST Catalog. |
| Password / Token | Enter the password or bearer token for REST Catalog authentication. |
| Hive JDBC URL | Enter the JDBC connection URL for the Hive Metastore (for example, jdbc:hive2://hive-server:10000). |
| Hive JDBC User | Enter the Hive JDBC username (for example, admin). |
| Hive JDBC Password / Token | Enter the password or token for the Hive JDBC connection. |
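A common source of failed connection tests is a malformed Hive JDBC URL. The sketch below validates the `jdbc:hive2://host:port` shape shown in the example above; it is a convenience check, not part of ADOC.

```python
import re

# Hypothetical validator for the jdbc:hive2://host:port URL shape.
JDBC_PATTERN = re.compile(r"^jdbc:hive2://(?P<host>[^:/]+):(?P<port>\d+)(/.*)?$")

def parse_hive_jdbc_url(url: str) -> tuple[str, int]:
    """Return (host, port) from a Hive JDBC URL, or raise on a bad shape."""
    match = JDBC_PATTERN.match(url)
    if not match:
        raise ValueError(f"not a valid Hive JDBC URL: {url}")
    return match.group("host"), int(match.group("port"))
```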
After entering the connection details for your chosen catalog type, click Test Connection to verify that ADOC can connect to the catalog using the credentials provided. Resolve any connection errors before proceeding.
Once the connection test passes, click Next.
Step 3: Set Up Observability
On the Observability Setup page, configure how ADOC will discover and monitor Iceberg assets.
Database / Namespace
Select the database or namespace within the Iceberg catalog that ADOC should crawl. Only namespaces accessible to the authenticated catalog user are listed.
Crawler Execution Schedule (Optional)
Enable the Crawler Execution Schedule toggle to configure automated crawls on a recurring basis. Select the crawl frequency from the available options (for example, hourly, daily, or weekly). When disabled, you can trigger crawls manually from the integration settings at any time.
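The schedule options above map to simple recurrence intervals. The sketch below shows how the next crawl time follows from the last run and the chosen frequency; the frequency names mirror the examples mentioned here and are an assumption about how the options are labeled.

```python
from datetime import datetime, timedelta

# Hypothetical recurrence intervals matching the frequency options above.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_crawl(last_run: datetime, frequency: str) -> datetime:
    """Return when the next scheduled crawl would start after last_run."""
    return last_run + INTERVALS[frequency]
```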
Notifications
| Field | Description |
|---|---|
| Notify on Crawler Failure | Enable this toggle to receive a notification when a scheduled or manual crawler run fails. Select one or more Notification Channels from the list to specify where alerts are sent. |
| Notify on Success | Enable this toggle to receive a notification when a crawler run completes successfully. By default, ADOC notifies on failed policy execution; enabling this option adds success notifications. |
Click Submit to complete the integration. ADOC will initiate the first crawl of the selected Iceberg namespace and populate the asset inventory.
What's Next
After completing the Iceberg integration, explore the following topics:
- Discover Assets – Browse and search the Iceberg tables discovered by ADOC within your connected namespace.
- Data Reliability Policies – Create and apply policies to monitor data quality, freshness, and volume on Iceberg assets.
- Reliability Explorer – Review policy execution results, failed record counts, and asset health trends across your Iceberg tables.
- Notification Channels – Configure Slack, Microsoft Teams, PagerDuty, or other channels to receive alerts from ADOC.
- Manage Integrations – Edit connection details, update credentials, or re-run crawlers from the Integrations page in Control Center.
For additional help, visit www.acceldata.force.com or call our service desk at +1 844 9433282.
Copyright © 2025