ADOC Glossary
The terms used in relation to ADOC are listed here, along with a brief description of what each one means.
A
Term | Definition |
---|---|
ADOC CLI | A command-line interface for ADOC. It can generate binaries with dependencies, upload binaries, and create User Defined Template (UDT) definitions |
Access Control | Regulates access to computer or network resources based on user roles within an organization |
Absolute File Count | Monitors the absolute number of files. |
Absolute File Size | Monitors the absolute size of files |
Absolute Row Count | Related to Data Cadence; includes metrics for the absolute number of rows in a given asset. |
Alerts | An alert is a notification generated when an observability parameter fails or succeeds. For example, users receive an alert if a data quality check identifies anomalies in a dataset. |
Analysis Service | The Analysis Service performs data profiling, rule executions, and data sampling tasks using various configuration parameters, such as Data Retention Days, Historical Metrics Interval for Anomaly Detection, and Minimum Required Historical Metrics for Anomaly Detection. For example, the Analysis Service can identify unexpected items or events in a dataset using historical metrics. |
Anomaly | An anomaly refers to irregularities detected in a dataset's values using historical metrics. ADOC allows customization of Minimum Required Historical Metrics and Historical Metric Interval. For example, anomalies can include incorrect data values, unexpected data elements, or outlier records. |
Anomaly Detection Settings | ADOC allows customization of Minimum Required Historical Metrics and Historical Metric Interval. |
API (Application Programming Interface) keys | An API key is a unique identifier used to authenticate a user, developer, or calling program to an API. For example, the POST Start Profiling API method requires an API key to initiate the asset profiling process (see the sketch after this table). |
Asset | An asset is an entity composed of data, such as a warehouse or database containing schemas and tables, or files in storage services like S3, GCS, or ADLS. For example, a data asset may consist of data records organized into schemas, tables, and columns. |
Asset List View | Display of all the assets discovered in ADOC. |
Asset Similarity | Compares the degree of similarity between columns in multiple tables, calculating similarity percentages and producing a Table Similarity score. |
Audit Log | ADOC logs Data Reliability and Compute events, such as crawler activities and scheduling |
Auto Profile | Auto profiling is the automated processing of information to analyze data. This process displays data source assets that have auto profiling enabled. |
Avro File Format | A data serialization system used by Hadoop. |
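
As a minimal sketch of how an API key might be passed when calling an ADOC REST endpoint such as the Start Profiling method mentioned above: the base URL, endpoint path, and header name below are hypothetical placeholders rather than the documented ADOC API.

```python
import requests

# Hypothetical values, shown only to illustrate API-key authentication.
ADOC_BASE_URL = "https://your-adoc-instance.example.com"  # placeholder base URL
API_KEY = "your-api-key"                                   # issued from the ADOC admin console
ASSET_ID = "12345"                                         # placeholder asset identifier

# Hypothetical "Start Profiling" call; the real path and header name may differ.
response = requests.post(
    f"{ADOC_BASE_URL}/api/assets/{ASSET_ID}/profile",
    headers={"X-API-KEY": API_KEY},
    timeout=30,
)
response.raise_for_status()
print("Profiling request accepted:", response.status_code)
```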
B
Term | Definition |
---|---|
Big Data | Big Data refers to extremely large volumes of data, arriving at high velocity, and encompassing a wide variety of data types (structured, semi-structured, and unstructured) |
Business Glossary | The Business Glossary captures all information about specific assets, pipelines, or business processes for future reference. |
Bulk Policies | Feature that simplifies creating data quality rules, grouping them, and applying them to data sources. It automatically creates a Data Quality policy and applies the rules to assets matching a tag-based condition |
C
Term | Definition |
---|---|
Change in File Count | Tracks variations in file count. |
Change in File Size | Tracks variations in file size. |
Cloud Service | Stores warehouse names, table names, database names, usage and contract metadata, login details, and user details |
Compute | The Compute feature provides an estimate of resource utilization and the compute and storage costs of the underlying infrastructure. It offers recommendations to optimize resource allocation and reduce costs. For example, if instances are running for a specific time at a certain cost, the Compute feature helps understand the cost and provides recommendations on efficient instance usage. |
Context Switching | ADOC supports context switching in heterogeneous pipelines |
Contract | A contract is a summary that includes the organization's name, account consumption, capacity used as a percentage, and the contract end date. Contract costs can be predicted using previous cost consumption metrics. For example, viewing costs across accounts and services in an organization shows storage, cloud services, replication, data transfer, and compute costs. |
Crawl/Crawler | Crawling is the process of extracting metadata from data sources. After establishing a connection to the data source, you can crawl metadata from the remote data source into the ADOC database. For example, crawling retrieves metadata such as owner, table type, and row count. |
D
Term | Definition |
---|---|
Data Drift Policy | A Data Drift Policy determines the percentage change in certain metrics when the underlying data changes. Users can create data drift rules to validate data changes against tolerance thresholds for each metric type. For example, setting a data drift rule to alert if the average value of a column changes by more than 5% compared to the previous day (see the sketch after this table). |
Data Freshness Policy | Tracks whether data is updated within the expected timeframe. |
Data Governance | A composite term for the practices used to ensure data quality and governance. |
Data Lineage | Establishes data lineage by detecting external data sources and finding relationships between them, enhancing cross-system data visibility. It depicts how data was obtained from various sources, showing a graphical representation of data flow. |
Data Plane | The Data Plane is a client-managed layer within the ADOC architecture that directly interacts with and manages the client's data resources. It is required to add a data source in ADOC, facilitates the smooth transfer of data between various software systems, and is essential for leveraging Data Reliability for a data source. The Data Plane list view displays all the Data Planes created in ADOC. |
Data Protection | Data Protection enables non-admin users to have selected columns from a table be masked. It imposes restrictions on columns containing personally identifiable information (PII). For example, enabling PII protection on sensitive columns hides the data from unauthorized users. |
Data Quality Policy | A Data Quality Policy measures how healthy the data is within a data source from a consumer or business standpoint. Multiple policies can be executed to check data quality. For example, a data quality policy may enforce that no null values are present in critical columns. |
Data Reconciliation Policy | A Data Reconciliation Policy refers to comparing the target data to the original source data to ensure that data migration transfers the data correctly. It can be created in ADOC between two assets of similar type or between assets that can be profiled. For example, reconciling data between a source database and a destination data warehouse after migration. |
Data Reliability | ADOC's function for ensuring data quality and governance, providing tools to maintain the integrity, consistency, and reliability of data assets. |
Data Retention | Data Retention sets rules that specify how long data should be kept. |
Data Source | A data source is the origin location of the data being used. The database is located on a remote server and is accessible via database connections. To retrieve data, a server must establish a connection to the database. |
Data Synchronization | ADOC supports metadata synchronization. Data Synchronization within ADOC ensures data consistency between different systems or storage locations by periodically comparing data and metadata to identify and resolve discrepancies. This process involves updating metadata or flagging inconsistencies to maintain data integrity across the ADOC environment. |
Dependencies | The ADOC CLI build command generates fat binaries that bundle all required dependencies. |
Discover Page | The Discover Page provides a list of all the different assets present in your ecosystem while configuring ADOC to track them, with various filtering capabilities. |
Discrepancy Resolution | The process of identifying and resolving differences between source and target data; reconciliation policies support discrepancy resolution to build data trust and reliability. |
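
The Data Drift Policy entry above describes comparing a metric's percentage change against a tolerance threshold. A minimal, hedged sketch of that comparison in plain Python (independent of how ADOC actually evaluates drift rules):

```python
def drift_exceeds_tolerance(previous_value: float, current_value: float,
                            tolerance_pct: float = 5.0) -> bool:
    """Return True if the metric changed by more than tolerance_pct percent."""
    if previous_value == 0:
        # Avoid division by zero; treat any change from zero as drift.
        return current_value != 0
    change_pct = abs(current_value - previous_value) / abs(previous_value) * 100
    return change_pct > tolerance_pct

# Example: yesterday's average column value was 100.0, today's is 106.5 -> 6.5% change.
print(drift_exceeds_tolerance(100.0, 106.5, tolerance_pct=5.0))  # True -> raise an alert
```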
E
Term | Definition |
---|---|
Entity Relationship (ER) Diagram | An ER Diagram is a graphical representation of the relationships between entities in a database. It helps visualize data structures and connections, aiding in database design and data lineage understanding. |
Error Metric | Error metrics are measurements used to evaluate the accuracy of models or data processes. In ADOC, they can be used to track the number of errors detected during data profiling or reconciliation processes. For example, the Mean Absolute Error (MAE) can be used to compare predicted values against actual values to identify deviations (see the sketch after this table). |
Event Logs | Event logs capture and store occurrences of system activities, providing a traceable history for auditing and troubleshooting purposes. They contain information such as event types, timestamps, and source details. |
Export Policy | An export policy is a set of rules that dictate how data and reports can be exported from the ADOC platform. This includes configuring formats, destinations, and user permissions for exports. For example, exporting a data quality report as a CSV file to an external storage location. |
Execute Policy Operator | ADOC provides the Execute Policy Operator. This feature executes a specified data policy (Data Quality or Reconciliation) within a data pipeline, which can be triggered upon span completion, either fully or incrementally, and links the execution to a specific pipeline run. |
External Integrations | These are connections with third-party services and applications through the ADOC platform, with updated capabilities offering a flexible approach to load pipeline monitoring metadata independently of the platform's ongoing activity. They often utilize OAuth for secure, limited access without exposing login details |
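
The Error Metric entry mentions Mean Absolute Error (MAE). The short sketch below illustrates how MAE compares predicted values against actual values; it is generic Python, not ADOC-specific code.

```python
def mean_absolute_error(actual, predicted):
    """MAE = average of |actual - predicted| over all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Example: small deviations between observed and expected row counts.
print(mean_absolute_error([100, 102, 98], [101, 100, 99]))  # 1.333...
```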
F
Term | Definition |
---|---|
Filters | Filters in ADOC help you narrow down and focus on the specific data you need, whether you're searching for assets, policies, or other information. They let you sift through the noise by setting criteria based on various attributes like data source, tags, or time ranges, so you can quickly find what's most relevant to you. |
Filter UDT (User Defined Template) | A Filter UDT allows users to define their own data quality rules in languages such as Java, Scala, Python, JavaScript, or Spark SQL to filter records in a data asset (table); see the sketch below. |
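
A minimal sketch of the kind of custom logic a Filter UDT might express, written in Python. The function name and dictionary-based record format are illustrative assumptions, not ADOC's required template signature.

```python
def keep_record(record: dict) -> bool:
    """Illustrative filter logic: keep rows whose 'email' field is present and non-blank."""
    email = record.get("email")
    return email is not None and email.strip() != ""

rows = [
    {"id": 1, "email": "user@example.com"},
    {"id": 2, "email": "   "},
    {"id": 3},
]
filtered = [row for row in rows if keep_record(row)]
print(filtered)  # [{'id': 1, 'email': 'user@example.com'}]
```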
G
Term | Definition |
---|---|
Gen AI Assisted Metadata Generation | It employs advanced AI algorithms to analyze data assets and generate descriptive metadata automatically to improve the discoverability and understandability of data assets, facilitating data governance and usage. |
Grouping Policy | Grouping policies allow users to aggregate multiple assets or rules under a common group to simplify management and execution. These policies can be applied to a set of data quality rules or assets that share similar characteristics. For example, grouping multiple customer-related data sources under a single policy for data quality monitoring. |
H
Term | Definition |
---|---|
Home Page | The ADOC landing page provides an overview of Acceldata's capabilities and allows navigation to specific dashboards, recommendations, or actions. |
Historical Metrics | Historical metrics track historical data points over time to analyze trends, detect anomalies, and set baselines for future data observations. For example, using historical metrics of data quality to determine acceptable thresholds for missing values or data type inconsistencies. |
I
Term | Definition |
---|---|
Integration Points | Integration Points are the specific interfaces and methods used within ADOC to connect to Hadoop Distributed File System (HDFS) and other related components, such as MapR, to facilitate data loading and management. These points ensure seamless compatibility and robust support for managing and monitoring data sources within the Hadoop ecosystem, thus improving data observability and reliability |
Ingestion Pipeline | An ingestion pipeline is a series of data operations that pull raw data into the ADOC platform for further processing and analysis. For example, setting up an ingestion pipeline to pull transactional data from AWS S3 into ADOC. |
J
Term | Definition |
---|---|
Jobs | Jobs are operations triggered when an action is performed in ADOC. Various jobs can be viewed and monitored in the jobs window, such as profile jobs, auto profile queues, data quality jobs, reconciliation jobs, and upcoming jobs. |
Job State | The final state of the job, providing a more detailed description of the process, such as metrics requests sent, partial analysis received, or the task being fully completed |
K
Term | Definition |
---|---|
KPI | KPIs are measurable values used to assess the performance and success of data operations or policies within ADOC. For example, tracking the percentage of successful data quality checks as a KPI for data reliability. |
L
Term | Definition |
---|---|
Label | Labels allow data assets to be categorized by purpose, owner, or business function. Labels use key-value pairs defined over an asset to facilitate advanced search methods. For example, assigning a label like "Confidentiality: High" to sensitive data assets. |
Lineage | Lineage depicts how data was obtained from various sources, showing a graphical representation of data flow. For example, lineage diagrams illustrate how data moves through different ETL processes from source to destination. |
Lookup Type | A toggle switch in a Validation UDF. When enabled, ADOC recognizes that the Validation UDF will be used in a Lookup rule in a Data Quality policy. |
Lookup Data Quality Policy | A Lookup Data Quality Policy enables the validation of values in a table against a reference dataset or predefined set of valid values. For example, ensuring that all state codes in a dataset match a predefined list of US state abbreviations (see the sketch below). |
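
To make the Lookup Data Quality Policy example concrete, here is a hedged Python sketch of checking state codes against a reference set; ADOC's own lookup rules run inside the platform rather than as standalone scripts.

```python
# Abbreviated reference set, for illustration only.
VALID_STATE_CODES = {"CA", "NY", "TX", "WA"}

records = [{"order_id": 1, "state": "CA"},
           {"order_id": 2, "state": "ZZ"}]

# Rows whose value is missing from the reference dataset fail the lookup check.
failures = [r for r in records if r["state"] not in VALID_STATE_CODES]
print(failures)  # [{'order_id': 2, 'state': 'ZZ'}]
```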
M
Term | Definition |
---|---|
Metadata | Metadata describes information about data, making it easier to locate, use, and reuse specific data instances. For example, metadata for a column "Name" in a table "Customer_Information" indicates that the column's data type is string. |
Metadata Synchronization | ADOC supports metadata synchronization. Metadata Synchronization is a process within ADOC that ensures consistency and alignment of metadata across various systems and clusters. This involves aligning table structures, formats, and other relevant metadata parameters between, for example, MapR HDFS/Hive and Apache HDFS/Hive. The goal is to accurately reflect the current state of data and facilitate efficient data discovery, understanding, and governance. |
Minimal Privilege | In the context of ADOC Data Plane installation, refers to the practice of granting the least amount of permissions necessary for the Data Plane to function correctly. This approach enhances security by restricting the Data Plane's access only to the specific resources and actions it requires, reducing the potential impact of security breaches or unauthorized activities. |
Monitoring Dashboard | The Monitoring Dashboard provides a consolidated view of the performance and status of all assets, jobs, and policies within ADOC. For example, monitoring the status of ongoing profiling jobs and any data quality issues detected. |
Monitors | Monitors are entities that continuously observe data sources and assets to track specific metrics or patterns, providing real-time observability. For example, setting up a monitor to observe data drift in a dataset and alerting if the change exceeds the tolerance threshold. |
Monthly Asset Profiling Schedule | ADOC allows users to schedule asset profiling on a monthly basis, enabling automated execution and ensuring timely evaluation of data. |
N
Term | Definition |
---|---|
Notification Channel | A Notification Channel is used to configure notifications via email, Slack, Hangouts, Jira, or a webhook URL. Multiple notification channels can be set up depending on user segregation (see the webhook sketch below). |
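
For the webhook option, an alert notification is typically delivered as an HTTP POST with a JSON payload. The URL and payload fields below are placeholders for illustration and do not reflect ADOC's documented payload schema.

```python
import requests

# Placeholder webhook URL and payload structure, for illustration only.
WEBHOOK_URL = "https://hooks.example.com/adoc-alerts"
payload = {
    "policy": "orders_null_check",
    "status": "FAILED",
    "message": "Null values detected in column 'customer_id'",
}

# Send the alert to the configured webhook endpoint.
resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)
resp.raise_for_status()
```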
O
Term | Definition |
---|---|
OAuth Integration | OAuth Integration lets ADOC connect securely to other platforms and applications. Instead of sharing your username and password, you grant ADOC limited access to specific resources, such as your data, without exposing your login details. This lets you seamlessly connect ADOC to services like Snowflake or Azure Databricks, streamlining your workflow while keeping your account secure. |
Observability | Observability refers to the capability of the platform to measure the internal state of a data system by analyzing the output and logs, enabling users to diagnose and fix issues effectively. For example, observing a data pipeline to understand how changes in the source system impact data quality downstream. |
Operating System | The Data Plane uses operating system-level metrics. This refers to the Data Plane's capability to gather performance metrics directly from the operating system on which it runs. By collecting these metrics, ADOC gains insight into resource utilization, system health, and the overall performance of the Data Plane. These metrics help identify potential bottlenecks, optimize resource allocation, and ensure the Data Plane operates efficiently. |
P
Term | Definition |
---|---|
Permissions | Define user roles, access levels, and permissions within the ADOC platform based on your organization's requirements |
Persistence Path | The Persistence Path specifies the result location at the asset level. Data quality results will be stored in the specified persistence path of any storage, such as Amazon S3, HDFS, Google Cloud Storage, or Azure Data Lake. The persistence path can be set globally in the admin console but can be overridden if configured at the asset level. |
Pipeline | A Pipeline represents the complete ETL (Extract-Transform-Load) workflow and contains asset nodes and associated jobs. It facilitates observability of data movement from source repositories to target repositories. |
Policy | A policy is a rule mapped to an asset to perform specific actions. Several types of policies can be defined for an asset, such as Data Quality, Data Reconciliation, and Data Drift policies. |
Policy Execution | When you execute a policy, all of the rules stated in your policy are run, and you can view the results for each rule. |
Policy Template | A Policy Template contains predefined rules and configurations that can be applied to create new data quality or reconciliation policies, saving time and ensuring consistency. For example, using a policy template to standardize rules for missing values and unique constraints. |
Profile | Data profiling is the process of reviewing, analyzing, and summarizing data into meaningful information. It produces a high-level overview that assists in identifying data quality concerns. For example, profiling an asset provides statistical data such as minimum, maximum, and average values. |
Pushdown Data Engine | The Pushdown Data Engine is a data processing engine within ADOC that performs data operations directly on the source system, reducing the need to move data across the network. For example, performing a join operation between two tables directly in the data warehouse instead of pulling the data into ADOC. |
Q
Term | Definition |
---|---|
Queries | A query is a request for data results from the database or an action on the data. The Queries tab displays the top 50 successful or failed queries, along with their estimated cost, user, database, warehouse, and execution status. For example, viewing the estimated cost by query type (CRUD) and cost per warehouse in graphical form. |
R
Term | Definition |
---|---|
RBAC | The Role Based Access Control (RBAC) feature in ADOC, which provided authorization control across the entire application, has been deprecated. Starting with ADOC V4.0, RBAC has been superseded by Resource-Based Access Management (RBAM), offering more granular control by introducing domains and resource groups |
RBAM | Resource-Based Access Management is a method of regulating access to resources. RBAM is an enhanced access control system in ADOC that lets administrators define who can access specific resources (like assets, reports, and policies) by using domains and resource groups. This goes beyond traditional Role-Based Access Control (RBAC) by providing granular control over both actions and resource visibility, ensuring users can only see and interact with the resources relevant to their roles. RBAM improves security, aids in regulatory compliance, and supports delegated administration, making resource management more scalable and efficient for enterprises |
Reconciliation | Reconciliation refers to comparing the target data to the original source data to ensure that data migration transfers the data correctly. A reconciliation policy can be created in ADOC between two assets of a similar type or between assets that can be profiled. |
Regex Match | Regex Match is a type of data quality check in ADOC that validates data patterns within columns of a single asset, utilizing regular expressions for structured data such as email addresses. This ensures that column values adhere to a specified pattern. When creating a data quality policy in ADOC, a Pattern Match rule can be selected to check whether column values conform to a given regular expression (see the sketch after this table). |
Rule Set | A Rule Set is a group of data quality rules that exist outside of an asset-level policy. It can be used to automatically create policies by applying them over assets. |
Rules | Rules are defined functions used when configuring a policy to validate data during policy execution. A policy can contain multiple rules that check for null values, uniqueness, and other attributes on an asset. |
Reference Asset | A reference asset is a dataset or schema used as a baseline to compare with other datasets during reconciliation or validation operations. For example, using a "Customer Master" dataset as a reference asset to validate data consistency in transactional datasets |
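
As a simple illustration of the Pattern Match check described in the Regex Match entry above, the snippet below flags values that do not match a simplified email pattern; the exact regular expression is chosen by the policy author.

```python
import re

# Simplified email pattern for illustration; production patterns are usually stricter.
EMAIL_PATTERN = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

values = ["alice@example.com", "not-an-email", "bob@data.io"]
non_matching = [v for v in values if not EMAIL_PATTERN.match(v)]
print(non_matching)  # ['not-an-email'] -> rows that fail the pattern check
```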
S
Term | Definition |
---|---|
Sample Data | Sample Data represents the content of attributes in a table, providing example values for understanding the data. |
Schema Drift Policy | A Schema Drift Policy detects changes to a schema or table between previously crawled and currently crawled data sources. In ADOC, schema drift policies are executed every time a data source is crawled. For example, detecting if a new column has been added or an existing column's data type has changed. |
Schema Registry | A central repository for Apache Avro schemas and includes a REST API for schema storage and retrieval. |
Security Measures | ADOC complies with privacy and security requirements, ensures data never leaves the environment, and supports configurations where PII is not removed |
Segments | The Data Reliability function in Acceldata Data Observability Cloud (ADOC) allows you to apply policies to assets to maintain data quality. |
Source Connection | A source connection is a configuration that establishes a link between ADOC and a data source, enabling data extraction, profiling, and monitoring. For example, configuring a source connection to an Azure Data Lake storage account. |
SQL Rule | SQL Rule is a feature within ADOC's Data Quality Policy that enables users to create custom data validation and transformation logic using SQL expressions. |
Storage | The Storage tab provides a summary of storage costs. Table, database, and high churn table storage costs can be viewed to take appropriate actions. |
Stock Monitor | Stock Monitors are pre-built monitors available in the ADOC platform to track common metrics and observability parameters. For example, using a stock monitor to track schema changes in a dataset. |
System Time | The Data Plane uses operating system-level metrics, such as CPU utilization, load, and system time. |
T
Term | Definition |
---|---|
Tags | Tags are metadata that help describe an asset and allow it to be found through browsing or searching. Tags aid in data discoverability and can be linked to assets and policies. Tags can be generated by the system or by users and can also be used to apply policies to assets. ADOC provides a Tags page where you can manage tags. |
Template | A Template contains multiple rule definitions. These rule definitions are applied when a Data Quality Policy is created. Instead of defining rules for each policy, you can use a policy template that contains predefined rules. For example, when a policy template is added, all rule definitions in the template are automatically evaluated. |
Transform UDT | A Transform User Defined Template allows you to extract or manipulate values from a record in a data asset (table), using custom logic defined in languages such as Java, Scala, Python, JavaScript, or Spark SQL. |
V
Term | Definition |
---|---|
Validation UDF | A Validation UDF can be used in a Lookup rule in a Data Quality policy. You can create a lookup rule that compares multiple target columns with multiple reference columns by using User Defined Templates (UDT). |
Virtual Data Source | A Virtual Data Source refers to a logical representation of a data source that enables the integration of multiple physical data sources into a single entity. For example, combining multiple Google Cloud Storage buckets into a single virtual data source for unified monitoring. |
W
Term | Definition |
---|---|
Webserver | A Web Server is software that handles requests to access the ADOC platform's interface. Think of it as the middleman that takes your instructions and displays the ADOC platform to you through a web-based interface, often using software like Apache or Nginx. |