Discover
Concepts | Descriptions |
---|---|
Anomaly Detection | Anomaly detection is the identification of events, observations, or data patterns that deviate from expected patterns or historical trends. ADOC lets you create anomaly policies with user-defined thresholds and receive notifications when a threshold is breached, helping users take action and make informed business decisions (see the anomaly-threshold sketch after this table). |
Asset groups | Like Resource Groups, Asset Groups are essential to ADOC. They help administrators organize assets and grant access permissions, ensuring users interact only with assets relevant to their roles, which improves security and operational efficiency. |
Assets | In ADOC, an asset is any data-holding entity that ADOC monitors, manages, and optimizes within an organization's data ecosystem. Warehouses, databases, schemas, tables, and files on S3, GCS, or ADLS are all data assets. After a data source is added, ADOC regularly examines its assets to ensure data quality and trust. Once a data source is added and crawled, its assets appear in the Asset List View, and ADOC's Discover Assets page lets users view, modify, and monitor policy mappings, data quality scores, and alert statuses. |
Auto lineage | Auto lineage helps the platform identify, track, and visualize data flow and dependencies across the sources and pipelines in an organization's data ecosystem. The auto lineage graph shows how data flows through multiple sources and processes, which is essential for detecting data modifications and quality issues. ADOC's auto lineage is supported by the Query Analyzer Service. |
Cataloging | Cataloging is the process of automatically discovering, extracting, managing, and organizing metadata from an organization's various data assets into a searchable repository. ADOC relies on this feature to provide comprehensive visibility into data operations and to ensure data reliability, quality, and performance. |
Crawling | Crawling is the process of extracting metadata from data sources. After connecting to a data source, ADOC crawls it to load metadata into the database; the ADOC Data Plane Catalog Server then aggregates and indexes this metadata. Crawling gives ADOC's data quality, anomaly, and transformation checks more context, and continuous observation of assets keeps the gathered metadata accurate. Crawl history matters because ADOC cannot link data sources such as Power BI reports until they have been crawled or cataloged, and ADOC features such as Schema Drift Monitoring start when new assets are crawled. |
Data Lineage | Data lineage shows how data flows across multiple sources, recording each phase and transformation it passes through in your data ecosystem. ADOC tracks data from origin to consumption, letting organizations see how it changes across the ecosystem. |
Data Quality | Data quality refers to the overall utility, reliability, and accuracy of data for its intended purpose. Important data quality traits include accuracy, completeness, consistency, timeliness, validity, and uniqueness. High-quality data helps enterprises trust their data, improve business results, and meet regulatory compliance. |
Data Warehouse | A data warehouse is a crucial part of enterprise data infrastructure: a platform where big data is stored, processed, and analyzed. Data sources, including data warehouses, list an organization's data assets along with their location, format, owner, and usage. Data warehouses and other data sources must be added and configured in ADOC to leverage its features. ADOC provides complete visibility by connecting to a company's data warehouse, and once a warehouse is added, ADOC continuously examines its assets to ensure reliable data. ADOC's Data Plane crawls, processes, and monitors client data sources in near real time. |
Discovery | Discovery in ADOC is the practice of identifying, investigating, and managing assets throughout the enterprise data ecosystem, primarily through ADOC's Home page and Discover Assets page. The Home page summarizes observability, alerts, and recommendations, while the Discover Assets page provides a 360-degree view for tracking and filtering the assets in your ecosystem. Together, these streamline data asset exploration, monitoring, and administration across the enterprise. |
Events | Events are records of specific actions that happen in a system or process. They carry business or process data useful for analysis, observability, and tracking. In ADOC, events describe pipeline run actions and are linked to spans. Events let ADOC users monitor and view pipeline tasks in real time, and alerts can be generated from them. Selecting the root span displays all of a pipeline's events. |
Extract | The step of pulling data from a source system so it can be processed. ADOC performs extract operations during data ingestion to begin analyzing data quality. For example, extracting customer data from a PostgreSQL database before running profiling. |
Freshness | How recently data has been updated. ADOC monitors freshness to ensure your dashboards and reports reflect the latest information. For example, alerting you when your daily sales data hasn’t been updated in over 24 hours (see the freshness sketch after this table). |
Governance | Rules and policies that help manage data responsibly. ADOC enforces governance through data quality checks, access controls, and audit tracking. For example, setting up rules that prevent access to sensitive columns like customer SSNs. |
Ingest | The process of bringing data from its original source into a system. ADOC supports ingestion from over 15 sources to begin monitoring. For example, pulling daily logs from Amazon S3 into ADOC for profiling and quality analysis. |
Job nodes | Individual steps within a data pipeline. ADOC uses job nodes to show the sequence and status of tasks like data extraction or transformation. For example, a job node might handle just the “cleaning” part of the data pipeline. |
Jobs | A group of related tasks executed to move or transform data. ADOC tracks each job to show if it completed successfully or failed. For example, a job might run every night to move data from a transactional database to a warehouse. |
Load | The step where processed data is saved into a storage system or database. ADOC monitors the success and timing of load operations. For example, making sure customer records are correctly written to BigQuery at the end of the pipeline. |
Metadata enrichment | Adding extra information to data to help explain it better. ADOC enriches datasets with details like data lineage and quality scores. For example, tagging a table with when it was last updated and which job created it. |
Metrics overlay | Showing key metrics like freshness and volume on top of visual dashboards. ADOC overlays these metrics so users can instantly assess data health. For example, a freshness score next to each data asset in a dashboard view. |
Pipelines | Automated flows that move data through various steps like ingest, transform, and load. ADOC monitors the health and timing of each pipeline. For example, a pipeline that ingests user logs, cleans them, and stores them in Redshift. |
Profiles | Summaries that describe what a dataset looks like. ADOC creates profiles automatically to show data types, distributions, and outliers. For example, showing that 95% of “Age” values fall between 20 and 60. |
Pushdown | Running data processing tasks directly within the database to reduce load time and improve performance. ADOC uses pushdown to apply rules without moving data. For example, filtering invalid email addresses inside Snowflake before pulling data into ADOC (see the pushdown sketch after this table). |
Resource groups | Logical groups used to manage access and permissions in ADOC. Teams can share or limit visibility based on resource groups. For example, giving only the finance team access to financial data sources and dashboards. |
Schema | The structure of a dataset, including tables, columns, and types. ADOC tracks schema changes to prevent breakages. For example, alerting when a column is dropped from a key table used in a dashboard (see the schema-drift sketch after this table). |
Span | A specific time slice of a job or task used for monitoring. ADOC uses spans to measure how long each part of a job took and whether it succeeded. For example, a 5-minute span showing the execution of a data quality rule. |
Spark | A powerful engine used to process large amounts of data. ADOC integrates with Apache Spark to run heavy data profiling or transformation tasks. For example, profiling a massive 10TB table in a distributed Spark environment. |
Timelines | A visual or structured way to track events or changes over time. In ADOC, timelines help monitor the flow and health of data, jobs, and quality rules. For example, viewing a timeline of when schema changes happened in a specific data source. |
Transactional DB | A database used to handle real-time operations like inserts, updates, and deletes. ADOC connects to transactional databases to monitor operational data and ensure it remains reliable. For example, monitoring a MySQL database used for online customer orders. |
Transformation | The process of converting raw data into a structured and usable format. ADOC applies transformations to clean and standardize data, ensuring consistency and readiness for analysis. For example, converting date formats from "MM/DD/YYYY" to "YYYY-MM-DD" across datasets for uniformity (see the transformation sketch after this table). |
Volume | The amount of data in a dataset, usually measured by row count or file size. ADOC tracks data volume to detect sudden changes that could indicate issues. For example, if a table normally has a million rows per day but suddenly has only 10, ADOC will raise an alert (see the volume sketch after this table). |
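
The sketches below illustrate some of the concepts above. They are minimal, hypothetical examples under stated assumptions, not ADOC APIs.

Anomaly-threshold sketch: a user-defined threshold check in the spirit of an anomaly policy, flagging a metric value that deviates from the historical trend by more than a chosen number of standard deviations. The function name and the print stand-in for a notification are illustrative assumptions.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it deviates from the historical trend by more
    than `threshold` standard deviations (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough history to establish a trend
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is a deviation
    return abs(latest - mu) / sigma > threshold

daily_counts = [1_000_200, 998_750, 1_001_430, 999_900, 1_000_050]
if is_anomalous(daily_counts, latest=10):
    print("Anomaly threshold breached: send notification")  # stand-in for an alert
```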
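
Freshness sketch: a staleness check, assuming the table exposes a last-updated timestamp (for example, the result of a MAX(updated_at) query). The 24-hour window mirrors the example in the table.

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_updated: datetime, max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True when the data has not been updated within `max_age`."""
    return datetime.now(timezone.utc) - last_updated > max_age

last_load = datetime(2024, 1, 1, 6, 30, tzinfo=timezone.utc)  # e.g. MAX(updated_at)
if is_stale(last_load):
    print("Freshness alert: daily sales data is over 24 hours old")
```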
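
Pushdown sketch: the email filter runs inside the warehouse as SQL, so only the aggregate result leaves the database. The `customers` table, regular expression, and connection object are assumptions; any DB-API-style warehouse client would work similarly.

```python
# The WHERE clause is evaluated in-database; raw rows never move.
INVALID_EMAILS_SQL = """
    SELECT COUNT(*) AS invalid_count
    FROM customers
    WHERE NOT REGEXP_LIKE(email, '^[^@ ]+@[^@ ]+\\.[^@ ]+$')
"""

def count_invalid_emails(conn) -> int:
    """Run the rule where the data lives and fetch only the count."""
    with conn.cursor() as cur:  # conn: any DB-API connection (hypothetical)
        cur.execute(INVALID_EMAILS_SQL)
        return cur.fetchone()[0]
```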
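
Schema-drift sketch: diff the column set from a previous crawl against the current one and flag dropped columns. The column sets are hypothetical stand-ins for crawled metadata.

```python
def dropped_columns(previous: set[str], current: set[str]) -> set[str]:
    """Columns present in the last crawl but missing from the current one."""
    return previous - current

before = {"order_id", "customer_id", "amount", "created_at"}
after = {"order_id", "customer_id", "created_at"}
if missing := dropped_columns(before, after):
    print(f"Schema drift alert: dropped column(s) {sorted(missing)}")
```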
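
Transformation sketch: a standardizing conversion matching the date-format example in the table, parsing "MM/DD/YYYY" strings and reformatting them as "YYYY-MM-DD".

```python
from datetime import datetime

def to_iso_date(raw: str) -> str:
    """Convert an MM/DD/YYYY date string to YYYY-MM-DD."""
    return datetime.strptime(raw, "%m/%d/%Y").strftime("%Y-%m-%d")

assert to_iso_date("03/28/2024") == "2024-03-28"
```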
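
Volume sketch: compare today's row count with a recent average and alert on a sharp drop, matching the row-count example in the table. The 50% threshold is an illustrative assumption.

```python
def volume_dropped(today: int, recent: list[int], min_ratio: float = 0.5) -> bool:
    """Return True when today's row count falls below `min_ratio` of the
    recent average."""
    expected = sum(recent) / len(recent)
    return today < expected * min_ratio

if volume_dropped(today=10, recent=[1_000_000, 1_000_300, 999_800]):
    print("Volume alert: row count far below the recent daily average")
```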