Torch Glossary

The terms used in relation to Torch are listed here, along with a brief description of what each one means.

Alerts

An Alert is raised when a data quality policy or a reconciliation policy fails or succeeds.
Analysis Service

The Analysis service performs data profiling, rule execution, and data sampling tasks using configuration parameters such as Data Retention Days, Historical Metrics Interval For Anomaly Detection, and Minimum Required Historical Metrics For Anomaly Detection.

For example, you can identify unexpected items or events in a dataset using the historical metrics parameters.

Anomaly

Anomaly detection is the process of checking the values in a dataset for irregularities by using historical metrics.

For example, you can view incorrect data values, data elements, or records.

Torch allows you to customize the minimum required historical metrics and the historical metrics interval.
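
To make the idea concrete, here is a minimal sketch of historical-metric anomaly detection, assuming a simple standard-deviation test. The function name, the threshold, and the interface are illustrative only, not Torch's actual detection algorithm:

```python
from statistics import mean, stdev

def is_anomaly(value, historical_metrics, threshold=3.0, min_required=5):
    """Flag a value as anomalous when it falls more than `threshold`
    standard deviations from the mean of the historical metrics.
    The min_required guard mirrors the idea behind the "Minimum
    Required Historical Metrics" parameter (illustrative only)."""
    if len(historical_metrics) < min_required:
        return False  # not enough history to judge
    mu = mean(historical_metrics)
    sigma = stdev(historical_metrics)
    if sigma == 0:
        return value != mu  # any deviation from a constant history is anomalous
    return abs(value - mu) / sigma > threshold

history = [100, 102, 98, 101, 99, 100]
print(is_anomaly(101, history))  # within the historical range -> False
print(is_anomaly(500, history))  # far outside the historical range -> True
```

A larger historical window makes the mean and standard deviation estimates more stable, which is why a minimum number of historical metrics is required before anomalies are reported.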

API (Application Programming Interface) Keys

An API key is a unique identifier used to authenticate a user, developer, or calling program to an API.

You can authenticate your application and control its access. API keys identify projects, while authentication identifies users.

For example, the POST Start Profiling API method initiates the asset profiling process.
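
As a sketch of how an API-key-authenticated call might be constructed, the snippet below builds (but does not send) a POST request for a profiling endpoint. The base URL, endpoint path, header names, and payload shape are all assumptions for illustration; consult the Torch API reference for the real values:

```python
import json
import urllib.request

# Hypothetical base URL -- replace with your Torch instance.
BASE_URL = "https://torch.example.com/api"

def build_start_profiling_request(asset_id, access_key, secret_key):
    """Construct (without sending) a POST request that would start
    profiling for an asset, authenticated with an API key pair.
    Header names and path are assumed, not Torch's documented API."""
    payload = json.dumps({"assetId": asset_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/assets/{asset_id}/profile",  # assumed path
        data=payload,
        method="POST",
        headers={
            "accessKey": access_key,   # assumed header name
            "secretKey": secret_key,   # assumed header name
            "Content-Type": "application/json",
        },
    )

req = build_start_profiling_request(42, "my-access-key", "my-secret-key")
print(req.method, req.full_url)
```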

Asset

Assets are entities made up of data: a warehouse or database with schemas and tables, or files in the case of S3, GCS, or ADLS.

For example, a data asset is made up of data records. Schemas, tables, and table columns are examples of asset types.

Auto Profile

Auto profiling is the automated processing of information to analyze data. This process displays data source assets that have auto profiling enabled.

Business Glossary

The glossary is where all information about a specific asset, pipeline, or business process is captured for future reference.
Compute

The Compute feature provides an overall estimate of resource utilization, compute and storage costs of underlying infrastructure, as well as recommendations to optimize resource allocation and thereby costs.

For example, if you run instances for a specific time at a specific cost, the Compute feature shows what those instances cost while running and recommends how to use them at the lowest cost.

Contract

A Contract is a summary that includes the organization name, account consumption, predictions, capacity used (as a percentage), and the contract end date. Contract costs can be predicted from previous cost consumption metrics.

For example, you can view cost across accounts and services in a given organization, broken down into storage, cloud services, replication, data transfer, and compute costs.

Crawl

Crawling is the process of extracting metadata from data sources.

After establishing a connection to the datasource, you can crawl metadata from the remote datasource into the Torch database.

For example, metadata such as owner, table type, row count, and so on.

Data Drift Policy

Data Drift determines the percentage change in certain metrics when the underlying data changes.

For example, you can create data drift rules to validate the data change against a tolerance threshold for each type of metric.
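
A minimal sketch of that idea, assuming drift is measured as a simple percentage change in a metric between two runs; the function name and interface are illustrative, not Torch's actual policy engine:

```python
def drift_exceeds_threshold(previous, current, tolerance_pct):
    """Return True when the percentage change in a metric between
    the previous and current run exceeds the tolerance threshold."""
    if previous == 0:
        return current != 0  # any change from zero counts as drift
    change_pct = abs(current - previous) / abs(previous) * 100
    return change_pct > tolerance_pct

# Row count grew from 1000 to 1200: a 20% change.
print(drift_exceeds_threshold(1000, 1200, tolerance_pct=10))  # 20% > 10% -> True
print(drift_exceeds_threshold(1000, 1050, tolerance_pct=10))  # 5% <= 10% -> False
```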

Data Protection

When data protection is enabled, selected table columns are masked for non-admin users. Data protection imposes restrictions on PII-enabled data columns.

For example, if you don't want a specific user to see the data in a specific column, enable PII on the columns of interest.

Data Quality Policy

Data quality is a measure of how healthy the data is within the data source, whether from a consumer or business standpoint.

To check the quality of data, multiple policies can be executed.

Data Reconciliation Policy

Data Reconciliation Policy refers to a data migration verification phase in which the target data is compared to the original source data to ensure that the migration architecture transfers the data correctly.

A data reconciliation policy can be created in Torch between two assets of similar type or between assets that can be profiled.
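
A toy sketch of the source-versus-target comparison behind reconciliation; a real reconciliation policy also compares aggregates, checksums, and column-level profiles, and this record-as-dict interface is purely illustrative:

```python
def reconcile(source_rows, target_rows, key):
    """Compare source and target records by key and report rows
    that are missing from the target or differ after migration."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    missing = sorted(set(src) - set(tgt))
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"missing_in_target": missing, "mismatched": mismatched}

source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
print(reconcile(source, target, key="id"))
# {'missing_in_target': [3], 'mismatched': [2]}
```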

Data Source

A data source is the location where the data being used originates.

The database is located on a remote server and is accessible via multiple database connections. To retrieve data from the database, the Torch server must establish a connection to it.

Discover Page

The Discover page lists all the assets in your ecosystem that Torch is configured to track, with various filtering capabilities.
Filter UDT (User Defined Template)

A UDT lets users define their own data quality rules in Java, Scala, Python, JavaScript, or Spark SQL.

Filter UDT template allows you to filter a record in a data asset (table).
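
Since Python is among the supported UDT languages, here is an illustrative filter rule that keeps only records with a well-formed email column. The record-as-dict interface is an assumption for illustration, not Torch's actual UDT contract:

```python
def filter_invalid_emails(record):
    """Illustrative filter rule: keep only records whose email
    column contains an '@' and a dotted domain."""
    email = record.get("email", "")
    return "@" in email and "." in email.split("@")[-1]

rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "not-an-email"},
]
kept = [r for r in rows if filter_invalid_emails(r)]
print([r["id"] for r in kept])  # [1]
```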

Jobs

Jobs are operations that are triggered when an action is performed on Torch.

Various Jobs can be viewed and monitored in the jobs window, such as profile jobs, auto profile queues, data quality jobs, reconciliation jobs and upcoming jobs.

Label

Using Labels, data assets can be categorized by purpose, owner or business function.

Labels support searching for data using a key and a value defined over an asset.

For example, labels such as “Confidentiality or Sensitivity: High, Medium, Low”.
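
The key/value model can be sketched as follows; the asset names and labels are made up for illustration:

```python
# Labels as key/value pairs attached to assets (illustrative model).
assets = {
    "customers": {"Sensitivity": "High", "Owner": "finance"},
    "web_logs": {"Sensitivity": "Low", "Owner": "platform"},
    "invoices": {"Sensitivity": "High", "Owner": "finance"},
}

def find_by_label(assets, key, value):
    """Return the names of assets carrying the given label."""
    return sorted(name for name, labels in assets.items()
                  if labels.get(key) == value)

print(find_by_label(assets, "Sensitivity", "High"))  # ['customers', 'invoices']
```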

Lineage

Lineage depicts how data was obtained from various sources. It provides a graphical representation of a data flow.
Metadata

Metadata describes information about data, making it easier to locate, use, and reuse specific instances of data.

For example, the metadata of a column Name in a table customer_information would indicate that the column's data type is string type.

Notification Channel

A notification channel is used to configure notifications via Email, Slack, Hangout, Jira, or a Webhook URL. There can be multiple notification channels, depending on the segregation at the user's end.
Persistence Path

Persistence path can be used to specify the result location at the asset level.

The data quality results are stored in the specified persistence path on any supported storage, for example, Amazon S3, HDFS, Google Cloud Storage, or Azure Data Lake.

Persistence path can be set globally in the admin console. However, the path will be overridden if the path is configured at the asset level.

Pipeline

A Pipeline represents the complete ETL (Extract-Transform-Load) workflow and contains Asset nodes and their associated Jobs. A pipeline is a set of processes that facilitates observability of the movement of data from a source repository to a target repository.
Policy

A policy is a rule mapped to an asset to perform specific actions. There are three policies you can define for an asset.

  1. DQ policy - performed on a single asset.
  2. Data drift - performed by comparing two assets.
  3. Schema drift - performed by comparing two assets.
Profile

Data profiling is the process of reviewing, analyzing, and synthesizing data into meaningful summaries. The approach produces a high-level overview that assists in the identification of data quality concerns.

Profiling an asset gives statistical data of an asset, such as min, max, average values, etc.
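
A simplified sketch of the summary statistics a profile run produces for a numeric column; the function and output shape are illustrative, not Torch's actual profiler:

```python
from statistics import mean

def profile_column(values):
    """Compute basic profile statistics for a numeric column,
    treating None as a null value (simplified illustration)."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null),
        "max": max(non_null),
        "avg": mean(non_null),
        "distinct": len(set(non_null)),
    }

print(profile_column([10, 20, 20, None, 40]))
# {'count': 5, 'null_count': 1, 'min': 10, 'max': 40, 'avg': 22.5, 'distinct': 3}
```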

Rule Set

Rule set is a group of data quality rules that exist outside of an asset-level policy.

It can be used to automatically create policies by applying it over assets.

Rules

Rules are defined functions that are used when configuring a policy to validate data when the policy is executed.

A policy can contain multiple rules that check for null values, uniqueness, and other attributes on an asset.
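
The null-value and uniqueness checks mentioned above can be sketched as simple rule functions bundled into a policy; the names and interface are illustrative only:

```python
def not_null_rule(values):
    """Rule: the column contains no null values."""
    return all(v is not None for v in values)

def uniqueness_rule(values):
    """Rule: every value in the column is unique."""
    return len(values) == len(set(values))

def run_policy(rules, values):
    """Evaluate each rule in a policy and report its result."""
    return {rule.__name__: rule(values) for rule in rules}

print(run_policy([not_null_rule, uniqueness_rule], [1, 2, 3]))
# {'not_null_rule': True, 'uniqueness_rule': True}
print(run_policy([not_null_rule, uniqueness_rule], [1, 1, None]))
# {'not_null_rule': False, 'uniqueness_rule': False}
```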

Sample Data

Sample data represents the content of an attribute in a table.
Schema Drift Policy

The Schema Drift Policy detects changes to a schema or table between the previously crawled and currently crawled data sources.

In Torch, schema drift policies are executed every time a data source is crawled.

Storage

The Storage tab provides a summary of storage costs.

Table, database, and high churn table storage costs can be viewed in order to take appropriate actions.

Tags

Tags are a type of metadata that help describe an asset and allow it to be found through browsing or searching.

Tags aid data discoverability. They can be linked to assets and policies, and can be generated by the system or by the user.

Transform UDT

A Transform User Defined Template allows you to extract values from a record in a data asset (table).
Template

A Template contains a number of rule definitions. Rule definitions are applied when a Data Quality Policy is created.

Instead of defining rules for each policy, you can use a policy template that contains rules.

When a policy template is added to a policy, all of the rule definitions in the template are automatically evaluated.
