Apache Hive

Connect your Hive environment to ADOC for data profiling, governance, quality and lineage tracking.

Prerequisites

Ensure the following requirements are met before you connect Hive as a data source:

Hive Metastore Access: Your Hive metastore URI must be reachable from ADOC.
Authenticated Hive Access: Use credentials or configuration that allow ADOC to read metadata.
Data Plane Configuration: Select or create a Data Plane that can connect to Hive.
Network Connectivity: Ensure there are no firewall or proxy rules blocking ADOC from accessing the Hive metastore.
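A quick way to check the network prerequisite is to attempt a TCP connection to the metastore endpoint. The following is a minimal sketch, not an ADOC tool: the helper name `can_reach` and the host name in the example are hypothetical, and port 3306 is assumed only because the sample JDBC URL on this page uses MySQL. Run it from a machine on the same network as your Data Plane, because a check from elsewhere does not show what the Data Plane can reach.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your metastore host and port.
    print(can_reach("metastore-db.example.internal", 3306))
```

A result of `False` only shows that the TCP connection failed. It does not say whether a firewall, a proxy, or a wrong host name is the cause.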

Add Hive as a Data Source

Follow these steps to set up Hive in ADOC:

Step 1: Start Setup

Select Register from the left main menu.
Select Add Data Source.
Select Hive from the list of data sources.
On the Data Source Details page:

Enter a unique name for this data source.
Optionally, add a brief description to clarify its purpose.
Enable the Data Reliability toggle and select your data plane from the drop-down list.

Select Next to proceed.

Step 2: Add Connection Details

Field	Description
Metastore JDBC URL	JDBC URL used to connect to the Hive metastore (e.g., `jdbc:mysql://host:3306/hive`).
Metastore Username	Hive metastore username.
Metastore Password	Password for the username provided.
Cluster Name	Logical name used to identify this Hive cluster in ADOC.
Hive Connection Type	Connection mode for Hive integration. Select `DEFAULT` or `Hive3` as per requirement.

Select Test Connection. If successful, you’ll see “Connected.” If not, check the values and try again.
Select Next to proceed.
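If Test Connection fails, a malformed Metastore JDBC URL is one possible cause. The sketch below splits a URL of the form shown in the table (`jdbc:<driver>://<host>:<port>/<database>`) into its parts so that each can be checked separately. The function name `parse_metastore_jdbc_url` is hypothetical, and the sketch assumes the simple URL shape from the example. Driver-specific options appended to the URL are not handled.

```python
from urllib.parse import urlsplit


def parse_metastore_jdbc_url(jdbc_url: str) -> dict:
    """Split a JDBC URL of the form jdbc:<driver>://<host>:<port>/<database>."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("JDBC URL must start with 'jdbc:'")
    # Without the 'jdbc:' prefix, the remainder parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    if not parts.scheme or not parts.hostname:
        raise ValueError("JDBC URL must include a driver and a host")
    return {
        "driver": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # The sample value from the Connection Details table.
    print(parse_metastore_jdbc_url("jdbc:mysql://host:3306/hive"))
```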

Step 3: Set Up Observability

Select the databases to monitor from the Databases drop-down list.

Optional Settings

Enable Schema Drift Monitoring to track structural changes in your Hive tables. Note: Schema drift detection requires the scheduled crawler to be enabled.
Enable Job Concurrency and use Maximum Slots to limit how many profiling or schema checks can run in parallel. For more information, see Control Plane Concurrent Connections and Queueing Mechanism.
Enable Crawler Execution Schedule to set up scheduled scans of your Hive tables:
1. Choose how often the crawler runs (e.g., daily)
2. Set execution time and time zone
3. Add multiple execution times if needed
Set Notifications:
1. Notify on Crawler Failure: Choose one or more configured channels to receive failure alerts.
2. Notify on Success: Enable this toggle if you also want to receive success notifications.
Select Submit to save your configuration, register the Hive data source, and begin monitoring it.
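To see how the Crawler Execution Schedule settings combine, it can help to work out when a daily schedule with several execution times would next fire in a given time zone. The sketch below is illustrative only and is not how ADOC computes its schedule. The function name `next_daily_runs` and the example times and time zone are assumptions.

```python
from datetime import datetime, time, timedelta, tzinfo


def next_daily_runs(now: datetime, run_times: list[time], tz: tzinfo) -> list[datetime]:
    """Return the next occurrence of each daily run time, earliest first.

    `now` must be a timezone-aware datetime.
    """
    local_now = now.astimezone(tz)
    upcoming = []
    for t in run_times:
        # Today's occurrence of this run time in the schedule's time zone.
        candidate = datetime.combine(local_now.date(), t, tzinfo=tz)
        if candidate <= local_now:
            # Already passed today, so the next occurrence is tomorrow.
            candidate += timedelta(days=1)
        upcoming.append(candidate)
    return sorted(upcoming)


if __name__ == "__main__":
    from datetime import timezone
    from zoneinfo import ZoneInfo

    # Hypothetical schedule: 06:00 and 18:00 every day, in Asia/Kolkata.
    schedule_tz = ZoneInfo("Asia/Kolkata")
    for run in next_daily_runs(datetime.now(timezone.utc), [time(6, 0), time(18, 0)], schedule_tz):
        print(run.isoformat())
```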

You have successfully added Hive as a data source. A new card for Hive will appear on the Data Sources page, displaying crawler status and basic connection details.

You can choose to run the crawler immediately or schedule it for later.

What’s Next

After you connect Hive, you can go to the following pages to view and manage your data:

Reliability – View data quality scores, profiling status, and rule results for your Hive tables.
Pipelines – Monitor how data flows from Hive to downstream systems.
Alerts – See notifications for quality issues or pipeline delays in real time.
