ADOC Glossary

The terms used in relation to ADOC are listed here, along with a brief description of what each one implies.

A

Alerts

An alert is a notification generated when an observability parameter fails or succeeds.

For example, users receive an alert if a data quality check identifies anomalies in a dataset.

Analysis Service

The Analysis Service performs data profiling, rule executions, and data sampling tasks using various configuration parameters, such as Data Retention Days, Historical Metrics Interval for Anomaly Detection, and Minimum Required Historical Metrics for Anomaly Detection.

For example, the Analysis Service can identify unexpected items or events in a dataset using historical metrics.

Anomaly

An anomaly refers to irregularities detected in a dataset's values using historical metrics. ADOC allows customization of Minimum Required Historical Metrics and Historical Metric Interval.

For example, anomalies can include incorrect data values, unexpected data elements, or outlier records.

API (Application Programming Interface) keys

An API key is a unique identifier used to authenticate a user, developer, or calling program to an API.

For example, the POST Start Profiling API method requires an API key to initiate the asset profiling process.
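As an illustration, authenticating an API call might look like the sketch below. The host, endpoint path, and header name are assumptions for the example, not the documented ADOC API; the request is built but not sent.

```python
import urllib.request

# Hypothetical values -- replace with your ADOC host and a real API key.
ADOC_HOST = "https://adoc.example.com"
API_KEY = "your-api-key"

def build_profiling_request(asset_id: str) -> urllib.request.Request:
    """Build an authenticated POST request to start profiling an asset.

    The endpoint path and header name here are illustrative only."""
    return urllib.request.Request(
        f"{ADOC_HOST}/api/assets/{asset_id}/profile",
        method="POST",
        headers={"X-API-Key": API_KEY},
    )

req = build_profiling_request("1234")
print(req.get_method(), req.full_url)
```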

Asset

An asset is an entity composed of data, such as a warehouse or database containing schemas and tables, or files in storage services like S3, GCS, or ADLS.

For example, a data asset may consist of data records organized into schemas, tables, and columns.

Auto Profile

Auto profiling is the automated processing of information to analyze data. The Auto Profile view lists the data source assets that have auto profiling enabled.

B

Business Glossary

The Business Glossary captures all information about specific assets, pipelines, or business processes for future reference.

C

Compute

The Compute feature provides an estimate of resource utilization and the compute and storage costs of the underlying infrastructure. It offers recommendations to optimize resource allocation and reduce costs.

For example, if instances are running for a specific time at a certain cost, the Compute feature helps understand the cost and provides recommendations on efficient instance usage.

Contract

A contract is a summary that includes the organization's name, account consumption, capacity used as a percentage, and the contract end date. Contract costs can be predicted using previous cost consumption metrics.

For example, viewing costs across accounts and services in an organization shows storage, cloud services, replication, data transfer, and compute costs.

Crawl

Crawling is the process of extracting metadata from data sources. After establishing a connection to the data source, you can crawl metadata from the remote data source into the ADOC database.

For example, crawling retrieves metadata such as owner, table type, and row count.

D

Data Drift Policy

A Data Drift Policy determines the percentage change in certain metrics when the underlying data changes. Users can create data drift rules to validate data changes against tolerance thresholds for each metric type.

For example, setting a data drift rule to alert if the average value of a column changes by more than 5% compared to the previous day.
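The tolerance check described above can be sketched as a simple percent-change comparison (the function name and threshold handling are illustrative, not ADOC's internal implementation):

```python
def drifted(previous: float, current: float, tolerance_pct: float) -> bool:
    """Flag drift when a metric changes by more than tolerance_pct percent
    relative to its previous value."""
    if previous == 0:
        return current != 0
    change_pct = abs(current - previous) / abs(previous) * 100
    return change_pct > tolerance_pct

# An average column value moving from 50.0 to 53.5 is a 7% change,
# which exceeds a 5% tolerance threshold.
print(drifted(50.0, 53.5, 5.0))  # True
```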

Data Protection

Data Protection masks selected table columns for non-admin users and imposes restrictions on columns containing personally identifiable information (PII).

For example, enabling PII protection on sensitive columns hides the data from unauthorized users.

Data Quality Policy

A Data Quality Policy measures how healthy the data is within a data source from a consumer or business standpoint. Multiple policies can be executed to check data quality.

For example, a data quality policy may enforce that no null values are present in critical columns.

Data Reconciliation Policy

A Data Reconciliation Policy refers to comparing the target data to the original source data to ensure that data migration transfers the data correctly. It can be created in ADOC between two assets of similar type or between assets that can be profiled.

For example, reconciling data between a source database and a destination data warehouse after migration.
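A naive reconciliation check can be sketched as comparing row counts and an order-insensitive content fingerprint of the two datasets. This is a generic illustration, not ADOC's actual reconciliation mechanism:

```python
import hashlib

def reconcile(source_rows, target_rows):
    """Compare two datasets by row count and by an order-insensitive
    SHA-256 fingerprint of their contents."""
    def fingerprint(rows):
        digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
        return hashlib.sha256("".join(digests).encode()).hexdigest()
    return {
        "count_match": len(source_rows) == len(target_rows),
        "content_match": fingerprint(source_rows) == fingerprint(target_rows),
    }

src = [("1", "Ada"), ("2", "Grace")]
dst = [("2", "Grace"), ("1", "Ada")]  # same rows, different order
print(reconcile(src, dst))  # {'count_match': True, 'content_match': True}
```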

Data Source

A data source is the origin of the data being used. The database resides on a remote server and is accessed via database connections; to retrieve data, a server must first establish a connection to the database.

Discover Page

The Discover Page lists all the assets present in your ecosystem while configuring ADOC to track them, and provides various filtering capabilities.

E

Entity Relationship (ER) Diagram

An ER Diagram is a graphical representation of the relationships between entities in a database. It helps visualize data structures and connections, aiding in database design and data lineage understanding.
Error Metric

Error metrics are measurements used to evaluate the accuracy of models or data processes. In ADOC, they can be used to track the number of errors detected during data profiling or reconciliation processes.

For example, the Mean Absolute Error (MAE) can be used to compare the predicted values against actual values to identify deviations.
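Mean Absolute Error is the average absolute deviation between paired actual and predicted values; a minimal computation:

```python
def mean_absolute_error(actual, predicted):
    """MAE: the mean of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Deviations are 1, 0, and 1, so the MAE is 2/3.
print(mean_absolute_error([10, 12, 14], [11, 12, 13]))
```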

Event Logs

Event logs capture and store occurrences of system activities, providing a traceable history for auditing and troubleshooting. They contain information such as event types, timestamps, and source details.
Export Policy

An export policy is a set of rules that dictate how data and reports can be exported from the ADOC platform. This includes configuring formats, destinations, and user permissions for exports.

For example, exporting a data quality report as a CSV file to an external storage location.

F

Filter UDT (User Defined Template)

A Filter UDT allows users to define their own data quality rules in languages such as Java, Scala, Python, JavaScript, or Spark SQL to filter records in a data asset (table).
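In Python, the filtering logic amounts to a per-record predicate. The function name, signature, and field names below are illustrative assumptions; the actual entry-point contract is defined by the ADOC template:

```python
def filter_record(record: dict) -> bool:
    """Illustrative filter: keep only completed orders with a positive amount.

    A real Filter UDT's signature depends on the ADOC template contract."""
    return record.get("status") == "COMPLETED" and record.get("amount", 0) > 0

rows = [
    {"status": "COMPLETED", "amount": 120.0},
    {"status": "PENDING", "amount": 40.0},
    {"status": "COMPLETED", "amount": 0},
]
print([r for r in rows if filter_record(r)])  # keeps only the first row
```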

G

Grouping Policy

Grouping policies allow users to aggregate multiple assets or rules under a common group to simplify management and execution. These policies can be applied to a set of data quality rules or assets that share similar characteristics.

For example, grouping multiple customer-related data sources under a single policy for data quality monitoring.

H

Historical Metrics

Historical metrics track data points over time to analyze trends, detect anomalies, and set baselines for future data observations.

For example, historical data quality metrics can determine acceptable thresholds for missing values or data type inconsistencies.
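ADOC's exact baselining algorithm isn't described here, but a generic z-score check against historical values illustrates the idea of using history to flag anomalies (the threshold of 3 standard deviations is an assumption for the sketch):

```python
import statistics

def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` as anomalous if it lies more than z_threshold standard
    deviations from the mean of the historical metric values."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > z_threshold

history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 100))  # False: within the historical range
print(is_anomalous(history, 140))  # True: far outside the baseline
```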

I

Ingestion Pipeline

An ingestion pipeline is a series of data operations that pull raw data into the ADOC platform for further processing and analysis.

For example, setting up an ingestion pipeline to pull transactional data from AWS S3 into ADOC.

J

Jobs

Jobs are operations triggered when an action is performed in ADOC. Various jobs can be viewed and monitored in the Jobs window, such as profile jobs, auto profile queues, data quality jobs, reconciliation jobs, and upcoming jobs.

K

KPIs (Key Performance Indicators)

KPIs are measurable values used to assess the performance and success of data operations or policies within ADOC.

For example, tracking the percentage of successful data quality checks as a KPI for data reliability.
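The success-rate KPI mentioned above is a straightforward percentage; a minimal sketch (the "PASS"/"FAIL" labels are assumptions for the example):

```python
def dq_success_rate(results):
    """KPI sketch: percentage of data quality checks that passed."""
    passed = sum(1 for r in results if r == "PASS")
    return 100.0 * passed / len(results)

# Three of four checks passed, so the KPI is 75.0.
print(dq_success_rate(["PASS", "PASS", "FAIL", "PASS"]))
```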

L

Label

Labels allow data assets to be categorized by purpose, owner, or business function. Labels use key-value pairs defined over an asset to facilitate advanced search methods.

For example, assigning a label like "Confidentiality: High" to sensitive data assets.

Lineage

Lineage depicts how data was obtained from various sources, showing a graphical representation of data flow.

For example, lineage diagrams illustrate how data moves through different ETL processes from source to destination.

Lookup Data Quality Policy

A Lookup Data Quality Policy enables the validation of values in a table against a reference dataset or predefined set of valid values.

For example, ensuring that all state codes in a dataset match a predefined list of US state abbreviations.
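The lookup validation described above reduces to a set-membership check; a minimal sketch with a truncated reference set:

```python
US_STATE_CODES = {"AL", "AK", "AZ", "CA", "NY", "TX"}  # truncated reference set

def lookup_violations(values, reference):
    """Return the values that are absent from the reference set."""
    return [v for v in values if v not in reference]

print(lookup_violations(["CA", "NY", "XX", "TX"], US_STATE_CODES))  # ['XX']
```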

M

Metadata

Metadata describes information about data, making it easier to locate, use, and reuse specific data instances.

For example, metadata for a column "Name" in a table "Customer_Information" indicates that the column's data type is string.

Monitoring Dashboard

The Monitoring Dashboard provides a consolidated view of the performance and status of all assets, jobs, and policies within ADOC.

For example, monitoring the status of ongoing profiling jobs and any data quality issues detected.

Monitors

Monitors are entities that continuously observe data sources and assets to track specific metrics or patterns, providing real-time observability.

For example, setting up a monitor to observe data drift in a dataset and alerting if the change exceeds the tolerance threshold.

N

Notification Channel

A Notification Channel is used to configure notifications via email, Slack, Hangouts, Jira, or a webhook URL. Multiple notification channels can be set up depending on user segregation.

O

Observability

Observability refers to the capability of the platform to measure the internal state of a data system by analyzing the output and logs, enabling users to diagnose and fix issues effectively.

For example, observing a data pipeline to understand how changes in the source system impact data quality downstream.

P

Persistence Path

The Persistence Path specifies the result location at the asset level. Data quality results will be stored in the specified persistence path of any storage, such as Amazon S3, HDFS, Google Cloud Storage, or Azure Data Lake.

The persistence path can be set globally in the admin console but can be overridden if configured at the asset level.

Pipeline

A Pipeline represents the complete ETL (Extract-Transform-Load) workflow and contains asset nodes and associated jobs. It facilitates observability of data movement from source repositories to target repositories.
Policy

A policy is a rule mapped to an asset to perform specific actions. There are three types of policies that can be defined for an asset:

  1. Data Quality Policy – performed on a single asset.
  2. Data Drift Policy – performed by comparing two assets.
  3. Schema Drift Policy – performed by comparing two assets.
Policy Template

A Policy Template contains predefined rules and configurations that can be applied to create new data quality or reconciliation policies, saving time and ensuring consistency.

For example, using a policy template to standardize rules for missing values and unique constraints.

Profile

Data profiling is the process of reviewing, analyzing, and summarizing data into meaningful information. It produces a high-level overview that assists in identifying data quality concerns.

For example, profiling an asset provides statistical data such as minimum, maximum, and average values.

Pushdown Data Engine

The Pushdown Data Engine is a data processing engine within ADOC that performs data operations directly on the source system, reducing the need to move data across the network.

For example, performing a join operation between two tables directly in the data warehouse instead of pulling the data into ADOC.

Q

Queries

A query is a request for data results from the database or an action on the data. The Queries tab displays the top 50 successful or failed queries, along with their estimated cost, user, database, warehouse, and execution status.

For example, viewing the estimated cost by query type (CRUD) and cost per warehouse in graphical form.

R

Rule Set

A Rule Set is a group of data quality rules that exist outside of an asset-level policy. It can be used to automatically create policies by applying them over assets.

Rules

Rules are defined functions used when configuring a policy to validate data during policy execution. A policy can contain multiple rules that check for null values, uniqueness, and other attributes on an asset.
Reference Asset

A reference asset is a dataset or schema used as a baseline to compare with other datasets during reconciliation or validation operations.

For example, using a "Customer Master" dataset as a reference asset to validate data consistency in transactional datasets.

S

Sample Data

Sample Data represents the content of attributes in a table, providing example values for understanding the data.
Schema Drift Policy

A Schema Drift Policy detects changes to a schema or table between previously crawled and currently crawled data sources. In ADOC, schema drift policies are executed every time a data source is crawled.

For example, detecting if a new column has been added or an existing column's data type has changed.
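Comparing two crawled schemas can be sketched as a diff of column-to-type mappings. This is a generic illustration of the comparison, not ADOC's internal representation:

```python
def schema_drift(previous: dict, current: dict) -> dict:
    """Compare two column->type mappings from successive crawls and report
    added columns, removed columns, and type changes."""
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "type_changed": sorted(
            c for c in set(previous) & set(current) if previous[c] != current[c]
        ),
    }

before = {"id": "int", "name": "string", "age": "int"}
after = {"id": "int", "name": "string", "age": "string", "email": "string"}
print(schema_drift(before, after))
# {'added': ['email'], 'removed': [], 'type_changed': ['age']}
```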

Source Connection

A source connection is a configuration that establishes a link between ADOC and a data source, enabling data extraction, profiling, and monitoring.

For example, configuring a source connection to an Azure Data Lake storage account.

Storage

The Storage tab provides a summary of storage costs. Table, database, and high-churn table storage costs can be viewed to take appropriate actions.
Stock Monitor

Stock Monitors are pre-built monitors available in the ADOC platform to track common metrics and observability parameters.

For example, using a stock monitor to track schema changes in a dataset.

T

Tags

Tags are metadata that help describe an asset and allow it to be found through browsing or searching. Tags aid in data discoverability and can be linked to assets and policies. Tags can be generated by the system or by users.
Template

A Template contains multiple rule definitions. These rule definitions are applied when a Data Quality Policy is created. Instead of defining rules for each policy, you can use a policy template that contains predefined rules.

For example, when a policy template is added, all rule definitions in the template are automatically evaluated.

Transform UDT (User Defined Template)

A Transform UDT allows you to extract or manipulate values from a record in a data asset (table), using custom logic defined in languages such as Java, Scala, Python, JavaScript, or Spark SQL.
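In Python, the transform logic amounts to a per-record function that returns a modified record. The function name, signature, and derived field below are illustrative assumptions; the actual entry-point contract is defined by the ADOC template:

```python
def transform_record(record: dict) -> dict:
    """Illustrative transform: derive a full_name field from two columns.

    A real Transform UDT's signature depends on the ADOC template contract."""
    out = dict(record)
    out["full_name"] = (
        f"{record.get('first_name', '')} {record.get('last_name', '')}".strip()
    )
    return out

print(transform_record({"first_name": "Ada", "last_name": "Lovelace"}))
```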

V

Virtual Data Source

A Virtual Data Source refers to a logical representation of a data source that enables the integration of multiple physical data sources into a single entity.

For example, combining multiple Google Cloud Storage buckets into a single virtual data source for unified monitoring.
