ADOC Glossary

This glossary lists the terms used in Acceldata Data Observability Cloud (ADOC), with a brief description of each.

A

Term / Definition

ADOC CLI

A command-line interface for ADOC. It can generate binaries with dependencies, upload binaries, and create User Defined Template (UDT) definitions.

Access Control

Regulates access to computer or network resources based on user roles within an organization.

Absolute File Count

Monitors the absolute number of files.

Absolute File Size

Monitors the absolute size of files.

Absolute Row Count

Related to Data Cadence, which can include metrics for the absolute number of rows in a given asset.

Alerts

An alert is a notification generated when an observability parameter fails or succeeds.

For example, users receive an alert if a data quality check identifies anomalies in a dataset.

Analysis Service

The Analysis Service performs data profiling, rule executions, and data sampling tasks using various configuration parameters, such as Data Retention Days, Historical Metrics Interval for Anomaly Detection, and Minimum Required Historical Metrics for Anomaly Detection.

For example, the Analysis Service can identify unexpected items or events in a dataset using historical metrics.

Anomaly

An anomaly refers to irregularities detected in a dataset's values using historical metrics. ADOC allows customization of Minimum Required Historical Metrics and Historical Metric Interval.

For example, anomalies can include incorrect data values, unexpected data elements, or outlier records.
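As a rough illustration of how historical metrics can be used to flag an anomaly, the sketch below marks a value as anomalous when it lies far from the mean of past observations. The z-score approach, the function name `is_anomaly`, and its parameters are assumptions made for this example; they are not ADOC's algorithm or API. The `min_required` parameter only loosely mirrors the idea of Minimum Required Historical Metrics.

```python
from statistics import mean, stdev


def is_anomaly(history, value, min_required=5, z_threshold=3.0):
    """Return True if value deviates strongly from the historical metrics.

    Hypothetical helper: uses a plain z-score, not ADOC's actual method.
    """
    if len(history) < min_required:
        return False  # not enough historical metrics to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # any change from a constant history is unusual
    return abs(value - mu) / sigma > z_threshold


# Hypothetical daily row counts for one asset
row_counts = [1000, 1010, 990, 1005, 995, 1002]
typical = is_anomaly(row_counts, 1003)
spike = is_anomaly(row_counts, 5000)
print(typical, spike)
```

With fewer than `min_required` historical points the helper never reports an anomaly, which is one simple way to avoid false alarms on new assets.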

Anomaly Detection Settings

ADOC allows customization of Minimum Required Historical Metrics and Historical Metric Interval.

API (Application Programming Interface) keys

An API key is a unique identifier used to authenticate a user, developer, or calling program to an API.

For example, the POST Start Profiling API method requires an API key to initiate the asset profiling process.

Asset

An asset is an entity composed of data, such as a warehouse or database containing schemas and tables, or files in storage services like S3, GCS, or ADLS.

For example, a data asset may consist of data records organized into schemas, tables, and columns.

Asset List View

Displays all the assets discovered in ADOC.

Asset Similarity

Compares the degree of similarity between columns in multiple tables, calculating similarity percentages and producing a Table Similarity score.

Audit Log

ADOC logs Data Reliability and Compute events, such as crawler activities and scheduling.

Auto Profile

Auto profiling is the automated processing of information to analyze data. This process displays data source assets that have auto profiling enabled.

Avro File Format

A data serialization system used by Hadoop.

B

Term / Definition

Big Data

Big Data refers to extremely large volumes of data, arriving at high velocity, and encompassing a wide variety of data types (structured, semi-structured, and unstructured).

Business Glossary

The Business Glossary captures all information about specific assets, pipelines, or business processes for future reference.

Bulk Policies

A feature that simplifies creating data quality rules, grouping them, and applying them to data sources. It automatically creates a Data Quality policy and applies the rules to assets matching a tag-based condition.

C

Term / Definition

Change in File Count

Tracks variations in file count.

Change in File Size

Tracks variations in file size.

Cloud Service

Stores warehouse names, table names, database names, usage and contract metadata, login details, and user details.

Compute

The Compute feature provides an estimate of resource utilization and the compute and storage costs of the underlying infrastructure. It offers recommendations to optimize resource allocation and reduce costs.

For example, if instances are running for a specific time at a certain cost, the Compute feature helps understand the cost and provides recommendations on efficient instance usage.

Context Switching

ADOC supports context switching in heterogeneous pipelines.

Contract

A contract is a summary that includes the organization's name, account consumption, capacity used as a percentage, and the contract end date. Contract costs can be predicted using previous cost consumption metrics.

For example, viewing costs across accounts and services in an organization shows storage, cloud services, replication, data transfer, and compute costs.

Crawl/Crawler

Crawling is the process of extracting metadata from data sources. After establishing a connection to the data source, you can crawl metadata from the remote data source into the ADOC database.

For example, crawling retrieves metadata such as owner, table type, and row count.

D

Term / Definition

Data Drift Policy

A Data Drift Policy determines the percentage change in certain metrics when the underlying data changes. Users can create data drift rules to validate data changes against tolerance thresholds for each metric type.

For example, setting a data drift rule to alert if the average value of a column changes by more than 5% compared to the previous day.
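The 5% example above can be sketched as a percentage-change check against a tolerance threshold. The function names and the sample averages are hypothetical and only illustrate the arithmetic; they are not part of ADOC.

```python
def percent_change(previous, current):
    """Percentage change from previous to current (previous must be non-zero)."""
    return (current - previous) * 100.0 / abs(previous)


def violates_drift_rule(previous, current, tolerance_pct=5.0):
    """Hypothetical rule check: True when the change exceeds the tolerance."""
    return abs(percent_change(previous, current)) > tolerance_pct


# Hypothetical column averages for two consecutive days
yesterday_avg = 200.0
today_avg = 212.0
change = percent_change(yesterday_avg, today_avg)
alert = violates_drift_rule(yesterday_avg, today_avg)
print(change, alert)
```

Using the absolute value of the change means the rule fires on both increases and decreases beyond the tolerance.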

Data Freshness Policy

Tracks whether data is updated within the expected timeframe.

Data Governance

A composite term for the practices used to ensure data quality and governance.

Data Lineage

Establishes data lineage by detecting external data sources and finding relationships between them, enhancing cross-system data visibility. It depicts how data was obtained from various sources, showing a graphical representation of data flow.

Data Plane

The Data Plane is a client-managed layer within the ADOC architecture that directly interacts with and manages the client's data resources. A Data Plane is required to add a data source in ADOC. It facilitates the smooth transfer of data between various software systems and is essential for leveraging Data Reliability for a data source. The Data Plane list view displays all the Data Planes created in ADOC.

Data Protection

Data Protection masks selected columns of a table for non-admin users. It imposes restrictions on columns containing personally identifiable information (PII).

For example, enabling PII protection on sensitive columns hides the data from unauthorized users.

Data Quality Policy

A Data Quality Policy measures how healthy the data is within a data source from a consumer or business standpoint. Multiple policies can be executed to check data quality.

For example, a data quality policy may enforce that no null values are present in critical columns.
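A null-value rule of this kind can be sketched as a count of missing values per critical column. The row structure, the column names, and the `null_check` helper are assumptions for illustration only and do not reflect how ADOC evaluates rules internally.

```python
def null_check(rows, critical_columns):
    """Count missing (None) values per critical column; hypothetical rule."""
    counts = {col: 0 for col in critical_columns}
    for row in rows:
        for col in critical_columns:
            if row.get(col) is None:
                counts[col] += 1
    return counts


# Hypothetical sample rows from a table
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": None, "email": "c@example.com"},
]
result = null_check(rows, ["id", "email"])
passed = all(n == 0 for n in result.values())
print(result, passed)
```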

Data Reconciliation Policy

A Data Reconciliation Policy refers to comparing the target data to the original source data to ensure that data migration transfers the data correctly. It can be created in ADOC between two assets of similar type or between assets that can be profiled.

For example, reconciling data between a source database and a destination data warehouse after migration.
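Conceptually, a reconciliation compares the rows of a source and a target and reports what differs. The sketch below does this for two small in-memory row lists matched on a key column. The `reconcile` function and the sample data are hypothetical; a real reconciliation runs against the profiled assets rather than Python lists.

```python
def reconcile(source_rows, target_rows, key):
    """Compare two lists of row dicts by key; hypothetical helper."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical rows before and after a migration
source_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
target_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},
    {"id": 4, "amount": 40},
]
report = reconcile(source_rows, target_rows, key="id")
print(report)
```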

Data Reliability

ADOC's function for ensuring data quality and governance, providing tools to maintain the integrity, consistency, and reliability of data assets.

Data Retention

Sets rules that specify how long data should be kept.

Data Source

A data source is the origin location of the data being used. The database is located on a remote server and is accessible via database connections. To retrieve data, a server must establish a connection to the database.

Data Synchronization

ADOC supports metadata synchronization. Data Synchronization within ADOC ensures data consistency between different systems or storage locations by periodically comparing data and metadata to identify and resolve discrepancies. This process involves updating metadata or flagging inconsistencies to maintain data integrity across the ADOC environment.

Dependencies

The ADOC CLI build command generates fat binaries with all the dependencies.

Discover Page

The Discover Page provides a list of all the different assets present in your ecosystem while configuring ADOC to track them, with various filtering capabilities.

Discrepancy Resolution

Reconciliation policies support discrepancy resolution, building data trust and reliability.

E

Term / Definition

Entity Relationship (ER) Diagram

An ER Diagram is a graphical representation of the relationships between entities in a database. It helps visualize data structures and connections, aiding in database design and data lineage understanding.

Error Metric

Error metrics are measurements used to evaluate the accuracy of models or data processes. In ADOC, they can be used to track the number of errors detected during data profiling or reconciliation processes.

For example, the Mean Absolute Error (MAE) can be used to compare the predicted values against actual values to identify deviations.
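The Mean Absolute Error is the average of the absolute differences between actual and predicted values. A minimal sketch follows; the sample values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted values
actual = [10, 12, 15, 11]
predicted = [11, 12, 13, 11]
mae = mean_absolute_error(actual, predicted)
print(mae)  # (1 + 0 + 2 + 0) / 4 = 0.75
```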

Event Logs

Event logs capture and store occurrences of system activities, providing a traceable history for auditing and troubleshooting purposes. They contain information such as event types, timestamps, and source details.

Export Policy

An export policy is a set of rules that dictate how data and reports can be exported from the ADOC platform. This includes configuring formats, destinations, and user permissions for exports.

For example, exporting a data quality report as a CSV file to an external storage location.

Execute Policy Operator

ADOC provides the Execute Policy Operator. This feature executes a specified data policy (Data Quality or Reconciliation) within a data pipeline, which can be triggered upon span completion, either fully or incrementally, and links the execution to a specific pipeline run.

External Integrations

These are connections with third-party services and applications through the ADOC platform, with updated capabilities offering a flexible approach to load pipeline monitoring metadata independently of the platform's ongoing activity. They often utilize OAuth for secure, limited access without exposing login details.

F

Term / Definition

Filters

Filters in ADOC help you narrow down and focus on the specific data you need, whether you're searching for assets, policies, or other information. They let you sift through the noise by setting criteria based on various attributes like data source, tags, or time ranges, so you can quickly find what's most relevant to you.

Filter UDT (User Defined Template)

A Filter UDT allows users to define their own data quality rules in languages such as Java, Scala, Python, JavaScript, or Spark SQL to filter records in a data asset (table).

G

Term / Definition

Gen AI Assisted Metadata Generation

Employs advanced AI algorithms to analyze data assets and automatically generate descriptive metadata, improving the discoverability and understandability of data assets and facilitating data governance and usage.

Grouping Policy

Grouping policies allow users to aggregate multiple assets or rules under a common group to simplify management and execution. These policies can be applied to a set of data quality rules or assets that share similar characteristics.

For example, grouping multiple customer-related data sources under a single policy for data quality monitoring.

H

Term / Definition

Home Page

The ADOC landing page provides an overview of Acceldata's capabilities and allows navigation to specific dashboards, recommendations, or actions.

Historical Metrics

Historical metrics track historical data points over time to analyze trends, detect anomalies, and set baselines for future data observations.

For example, using historical metrics of data quality to determine acceptable thresholds for missing values or data type inconsistencies.

I

Term / Definition

Integration Points

Integration Points are the specific interfaces and methods used within ADOC to connect to Hadoop Distributed File System (HDFS) and other related components, such as MapR, to facilitate data loading and management. These points ensure seamless compatibility and robust support for managing and monitoring data sources within the Hadoop ecosystem, thus improving data observability and reliability.

Ingestion Pipeline

An ingestion pipeline is a series of data operations that pull raw data into the ADOC platform for further processing and analysis.

For example, setting up an ingestion pipeline to pull transactional data from AWS S3 into ADOC.

J

Term / Definition

Jobs

Jobs are operations triggered when an action is performed in ADOC. Various jobs can be viewed and monitored in the jobs window, such as profile jobs, auto profile queues, data quality jobs, reconciliation jobs, and upcoming jobs.

Job State

The final state of the job, providing a more detailed description of the process, such as metrics requests sent, partial analysis received, or the task being fully completed.

K

Term / Definition

KPI (Key Performance Indicator)

KPIs are measurable values used to assess the performance and success of data operations or policies within ADOC.

For example, tracking the percentage of successful data quality checks as a KPI for data reliability.
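The KPI in this example is simple arithmetic: the share of checks that passed, expressed as a percentage. A minimal sketch follows, with a hypothetical `success_rate` helper and made-up results.

```python
def success_rate(check_results):
    """Percentage of checks that passed; returns 0.0 when there are no checks."""
    if not check_results:
        return 0.0
    passed = sum(1 for ok in check_results if ok)
    return 100.0 * passed / len(check_results)


# Hypothetical outcomes of four data quality checks (True means passed)
results = [True, True, False, True]
kpi = success_rate(results)
print(kpi)  # 3 of 4 checks passed: 75.0
```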

L

Term / Definition

Label

Labels allow data assets to be categorized by purpose, owner, or business function. Labels use key-value pairs defined over an asset to facilitate advanced search methods.

For example, assigning a label like "Confidentiality: High" to sensitive data assets.
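Because labels are key-value pairs, searching by label amounts to matching a key and a value. The sketch below models this with plain dictionaries; the asset names, label keys, and the `find_by_label` helper are hypothetical and are not an ADOC API.

```python
# Hypothetical assets, each with its own key-value labels
assets = {
    "customers": {"Confidentiality": "High", "Owner": "sales"},
    "web_logs": {"Confidentiality": "Low", "Owner": "platform"},
    "payments": {"Confidentiality": "High", "Owner": "finance"},
}


def find_by_label(assets, key, value):
    """Return the names of assets whose label `key` equals `value`."""
    return sorted(
        name for name, labels in assets.items() if labels.get(key) == value
    )


sensitive = find_by_label(assets, "Confidentiality", "High")
print(sensitive)
```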

Lineage

Lineage depicts how data was obtained from various sources, showing a graphical representation of data flow.

For example, lineage diagrams illustrate how data moves through different ETL processes from source to destination.

Lookup Type

A toggle switch in a Validation UDF. When enabled, ADOC recognizes that the Validation UDF will be used in a Lookup rule in a Data Quality policy.

Lookup Data Quality Policy

A Lookup Data Quality Policy enables the validation of values in a table against a reference dataset or predefined set of valid values.

For example, ensuring that all state codes in a dataset match a predefined list of US state abbreviations.
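A lookup check of this kind can be sketched as a membership test against a reference set. The reference set below is deliberately abbreviated, and the `invalid_values` helper is a hypothetical name used only for illustration.

```python
# Abbreviated reference set for illustration; a real list would hold all states
VALID_STATES = {"CA", "NY", "TX", "WA"}


def invalid_values(values, reference):
    """Return the values that do not appear in the reference set."""
    return [v for v in values if v not in reference]


# Hypothetical column values to validate
state_column = ["CA", "ZZ", "TX", "N.Y."]
failures = invalid_values(state_column, VALID_STATES)
print(failures)
```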

M

Term / Definition

Metadata

Metadata describes information about data, making it easier to locate, use, and reuse specific data instances.

For example, metadata for a column "Name" in a table "Customer_Information" indicates that the column's data type is string.

Metadata Synchronization

ADOC supports metadata synchronization. Metadata Synchronization is a process within ADOC that ensures consistency and alignment of metadata across various systems and clusters. This involves aligning table structures, formats, and other relevant metadata parameters between, for example, MapR HDFS/Hive and Apache HDFS/Hive. The goal is to accurately reflect the current state of data and facilitate efficient data discovery, understanding, and governance.

Minimal Privilege

In the context of ADOC Data Plane installation, minimal privilege refers to the practice of granting the least amount of permissions necessary for the Data Plane to function correctly. This approach enhances security by restricting the Data Plane's access only to the specific resources and actions it requires, reducing the potential impact of security breaches or unauthorized activities.

Monitoring Dashboard

The Monitoring Dashboard provides a consolidated view of the performance and status of all assets, jobs, and policies within ADOC.

For example, monitoring the status of ongoing profiling jobs and any data quality issues detected.

Monitors

Monitors are entities that continuously observe data sources and assets to track specific metrics or patterns, providing real-time observability.

For example, setting up a monitor to observe data drift in a dataset and alerting if the change exceeds the tolerance threshold.

Monthly Asset Profiling Schedule

ADOC allows users to schedule asset profiling on a monthly basis, enabling automated execution and ensuring timely evaluation of data.

N

Term / Definition

Notification Channel

A Notification Channel is used to configure notifications via email, Slack, Hangouts, Jira, or a webhook URL. Multiple notification channels can be set up depending on user segregation.

O

Term / Definition

OAuth Integration

OAuth Integration lets ADOC connect securely to other platforms and applications. Instead of sharing your username and password, it's like giving ADOC a special key to access specific things, like your data, without exposing your personal login details, making it more secure.

This means you can seamlessly connect ADOC to services like Snowflake or Azure Databricks, streamlining your workflow while keeping your account safe.

Observability

Observability refers to the capability of the platform to measure the internal state of a data system by analyzing the output and logs, enabling users to diagnose and fix issues effectively.

For example, observing a data pipeline to understand how changes in the source system impact data quality downstream.

Operating System

The Data Plane uses operating-system-level metrics. This refers to the Data Plane's capability to gather performance metrics directly from the operating system on which it runs.

By collecting these metrics, ADOC can gain insights into resource utilization, system health, and overall performance of the Data Plane. These metrics help identify potential bottlenecks, optimize resource allocation, and ensure the Data Plane operates efficiently.

P

Term / Definition

Permissions

Define user roles, access levels, and permissions within the ADOC platform based on your organization's requirements.

Persistence Path

The Persistence Path specifies the result location at the asset level. Data quality results will be stored in the specified persistence path of any storage, such as Amazon S3, HDFS, Google Cloud Storage, or Azure Data Lake.

The persistence path can be set globally in the admin console but can be overridden if configured at the asset level.

Pipeline

A Pipeline represents the complete ETL (Extract-Transform-Load) workflow and contains asset nodes and associated jobs. It facilitates observability of data movement from source repositories to target repositories.

Policy

A policy is a rule mapped to an asset to perform specific actions. There are three types of policies that can be defined for an asset:

  1. Data Quality Policy – performed on a single asset.
  2. Data Drift Policy – performed by comparing two assets.
  3. Schema Drift Policy – performed by comparing two assets.

Policy Execution

When you execute a policy, all of the rules stated in your policy are run, and you can view the results for each rule.

Policy Template

A Policy Template contains predefined rules and configurations that can be applied to create new data quality or reconciliation policies, saving time and ensuring consistency.

For example, using a policy template to standardize rules for missing values and unique constraints.

Profile

Data profiling is the process of reviewing, analyzing, and summarizing data into meaningful information. It produces a high-level overview that assists in identifying data quality concerns.

For example, profiling an asset provides statistical data such as minimum, maximum, and average values.
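The statistics mentioned above can be sketched for a single numeric column as follows. The `profile_column` helper and the sample values are hypothetical; an actual profile covers many more measures and runs against the asset itself.

```python
from statistics import mean


def profile_column(values):
    """Summarize a numeric column: counts, null count, min, max, and average."""
    non_null = [v for v in values if v is not None]
    summary = {"count": len(values), "nulls": len(values) - len(non_null)}
    if non_null:
        summary.update(min=min(non_null), max=max(non_null), avg=mean(non_null))
    else:
        # No usable values: the statistics are undefined
        summary.update(min=None, max=None, avg=None)
    return summary


# Hypothetical column values, including one missing entry
stats = profile_column([10, 20, None, 30])
print(stats)
```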

Pushdown Data Engine

The Pushdown Data Engine is a data processing engine within ADOC that performs data operations directly on the source system, reducing the need to move data across the network.

For example, performing a join operation between two tables directly in the data warehouse instead of pulling the data into ADOC.

Q

Term / Definition

Queries

A query is a request for data results from the database or an action on the data. The Queries tab displays the top 50 successful or failed queries, along with their estimated cost, user, database, warehouse, and execution status.

For example, viewing the estimated cost by query type (CRUD) and cost per warehouse in graphical form.

R

Term / Definition

RBAC

The Role-Based Access Control (RBAC) feature in ADOC, which provided authorization control across the entire application, has been deprecated. Starting with ADOC V4.0, RBAC has been superseded by Resource-Based Access Management (RBAM), which offers more granular control by introducing domains and resource groups.

RBAM

Resource-Based Access Management is a method of regulating access to resources. RBAM is an enhanced access control system in ADOC that lets administrators define who can access specific resources (like assets, reports, and policies) by using domains and resource groups.

This goes beyond traditional Role-Based Access Control (RBAC) by providing granular control over both actions and resource visibility, ensuring users can only see and interact with the resources relevant to their roles.

RBAM improves security, aids in regulatory compliance, and supports delegated administration, making resource management more scalable and efficient for enterprises.

Reconciliation

Data Reconciliation Policy refers to comparing the target data to the original source data to ensure that data migration transfers the data correctly. It can be created in ADOC between two assets of similar type or between assets that can be profiled.

Regex Match

Regex Match is a type of data quality check in ADOC that validates data patterns within columns of a single asset, utilizing regular expressions for structured data like email addresses. This ensures that the column values adhere to a specified pattern. When creating a data quality policy in ADOC, a Pattern Match rule can be selected to check if column values conform to a given regular expression.

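As an illustration of the Regex Match check, the sketch below uses Python's standard `re` module to list column values that do not conform to a pattern. The simplified email pattern and the `non_matching` helper are assumptions for this example; they are not the pattern or API that ADOC uses.

```python
import re

# Simplified email pattern, for illustration only
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def non_matching(values, pattern):
    """Return the values that do not fully match the compiled pattern."""
    return [v for v in values if not pattern.fullmatch(v)]


# Hypothetical column values
emails = ["a@example.com", "bad-address", "b@example.org"]
violations = non_matching(emails, EMAIL_PATTERN)
print(violations)
```

`fullmatch` is used so that the whole value must conform to the pattern, not just a substring of it.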
Rule Set

A Rule Set is a group of data quality rules that exist outside of an asset-level policy. It can be used to automatically create policies by applying them over assets.

Rules

Rules are defined functions used when configuring a policy to validate data during policy execution. A policy can contain multiple rules that check for null values, uniqueness, and other attributes on an asset.

Reference Asset

A reference asset is a dataset or schema used as a baseline to compare with other datasets during reconciliation or validation operations.

For example, using a "Customer Master" dataset as a reference asset to validate data consistency in transactional datasets.

S

Term / Definition

Sample Data

Sample Data represents the content of attributes in a table, providing example values for understanding the data.

Schema Drift Policy

A Schema Drift Policy detects changes to a schema or table between previously crawled and currently crawled data sources. In ADOC, schema drift policies are executed every time a data source is crawled.

For example, detecting if a new column has been added or an existing column's data type has changed.
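Schema drift detection can be pictured as a comparison of two column-to-type mappings, one from the previous crawl and one from the current crawl. The sketch below is a hypothetical illustration with made-up column names and types, not ADOC's implementation.

```python
def schema_drift(previous, current):
    """Compare two {column: type} mappings and report the differences."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "type_changed": sorted(c for c in shared if previous[c] != current[c]),
    }


# Hypothetical schemas from two consecutive crawls
previous = {"id": "int", "name": "string", "amount": "int"}
current = {
    "id": "int",
    "name": "string",
    "amount": "decimal",
    "created_at": "timestamp",
}
drift = schema_drift(previous, current)
print(drift)
```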

Schema Registry

A central repository for Apache Avro schemas that includes a REST API for schema storage and retrieval.

Security Measures

ADOC complies with privacy and security requirements, ensuring data never leaves the environment, and supports configurations where PII is not removed.

Segments

The Data Reliability function in Acceldata Data Observability Cloud (ADOC) allows you to apply policies on assets to maintain data quality in your assets.

Source Connection

A source connection is a configuration that establishes a link between ADOC and a data source, enabling data extraction, profiling, and monitoring.

For example, configuring a source connection to an Azure Data Lake storage account.

SQL Rule

SQL Rule is a feature within ADOC's Data Quality Policy that enables users to create custom data validation and transformation logic using SQL expressions.

Storage

The Storage tab provides a summary of storage costs. Table, database, and high churn table storage costs can be viewed to take appropriate actions.

Stock Monitor

Stock Monitors are pre-built monitors available in the ADOC platform to track common metrics and observability parameters.

For example, using a stock monitor to track schema changes in a dataset.

System Time

The Data Plane uses operating-system-level metrics, such as CPU utilization, load, and system time.

T

Term / Definition

Tags

Tags are metadata that help describe an asset and allow it to be found through browsing or searching. Tags aid in data discoverability and can be linked to assets and policies. Tags can be generated by the system or by users. They are also used to apply policies to assets. ADOC provides a Tags page where you can manage tags.

Template

A Template contains multiple rule definitions. These rule definitions are applied when a Data Quality Policy is created. Instead of defining rules for each policy, you can use a policy template that contains predefined rules.

For example, when a policy template is added, all rule definitions in the template are automatically evaluated.

Transform UDT

A Transform User Defined Template allows you to extract or manipulate values from a record in a data asset (table), using custom logic defined in languages such as Java, Scala, Python, JavaScript, or Spark SQL.

V

Term / Definition

Validation UDF

You can use a Validation UDF in a Lookup rule in a Data Quality policy. You can create a lookup rule that compares multiple target columns with multiple reference columns by using User Defined Templates (UDT).

Virtual Data Source

A Virtual Data Source refers to a logical representation of a data source that enables the integration of multiple physical data sources into a single entity.

For example, combining multiple Google Cloud Storage buckets into a single virtual data source for unified monitoring.

W

Term / Definition

Webserver

A Web Server is software that handles requests to access the ADOC platform's interface. Think of it as the middleman that takes your instructions and displays the ADOC platform to you through a web-based interface, often using software like Apache or Nginx.