Torch Glossary

The terms used in relation to Torch are listed here, along with a brief description of what each one means.

Alerts

An Alert is raised when a data quality policy or a reconciliation policy fails or succeeds.
Analysis Service

The Analysis service performs data profiling, rule execution, and data sampling tasks using configuration parameters such as Data Retention Days, Historical Metrics Interval For Anomaly Detection, and Minimum Required Historical Metrics For Anomaly Detection.

For example, you can identify unexpected items or events in a dataset using the historical metrics parameters.

Anomaly

Anomaly detection is the process of checking the values in a dataset for irregularities by using historical metrics.

For example, you can view incorrect data values, data elements, or records.

Torch allows you to customize the minimum required historical metrics and the historical metrics interval.
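
To make the idea concrete, here is a minimal sketch of historical-metric anomaly detection, assuming a simple standard-deviation test. The function name, the threshold, and the interface are illustrative only, not Torch's actual detection algorithm:

```python
from statistics import mean, stdev

def is_anomaly(value, historical_metrics, threshold=3.0, min_required=5):
    """Flag a value as anomalous when it falls more than `threshold`
    standard deviations from the mean of the historical metrics.
    The min_required guard mirrors the idea behind the "Minimum
    Required Historical Metrics" parameter (illustrative only)."""
    if len(historical_metrics) < min_required:
        return False  # not enough history to judge
    mu = mean(historical_metrics)
    sigma = stdev(historical_metrics)
    if sigma == 0:
        return value != mu  # any deviation from a constant history is anomalous
    return abs(value - mu) / sigma > threshold

history = [100, 102, 98, 101, 99, 100]
print(is_anomaly(101, history))  # within the historical range -> False
print(is_anomaly(500, history))  # far outside the historical range -> True
```

A larger historical window makes the mean and standard deviation estimates more stable, which is why a minimum number of historical metrics is required before anomalies are reported.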

API (Application Programming Interface) Keys

An API key is a unique identifier used to authenticate a user, developer, or calling program to an API.

You can authenticate your application and control its access. API keys identify projects, while authentication identifies users.

For example, the POST Start Profiling API method initiates the asset profiling process.
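
As a sketch of how an API-key-authenticated call might be constructed, the snippet below builds (but does not send) a POST request for a profiling endpoint. The base URL, endpoint path, header names, and payload shape are all assumptions for illustration; consult the Torch API reference for the real values:

```python
import json
import urllib.request

# Hypothetical base URL -- replace with your Torch instance.
BASE_URL = "https://torch.example.com/api"

def build_start_profiling_request(asset_id, access_key, secret_key):
    """Construct (without sending) a POST request that would start
    profiling for an asset, authenticated with an API key pair.
    Header names and path are assumed, not Torch's documented API."""
    payload = json.dumps({"assetId": asset_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/assets/{asset_id}/profile",  # assumed path
        data=payload,
        method="POST",
        headers={
            "accessKey": access_key,   # assumed header name
            "secretKey": secret_key,   # assumed header name
            "Content-Type": "application/json",
        },
    )

req = build_start_profiling_request(42, "my-access-key", "my-secret-key")
print(req.method, req.full_url)
```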

Asset

Assets are entities made up of data: a warehouse or database with schemas and tables, or files in the case of S3, GCS, or ADLS.

For example, a data asset is made up of data records. Schemas, tables, and table columns are examples of asset types.

Auto Profile

Auto profiling is the automated processing of information to analyze data. This process displays data source assets that have auto profiling enabled.

Business Glossary

The glossary is where all information about a specific asset, pipeline, or business process is captured for future reference.
Compute

The Compute feature provides an overall estimate of resource utilization, compute and storage costs of underlying infrastructure, as well as recommendations to optimize resource allocation and thereby costs.

For example, if you run instances for a specific time at a specific cost, the Compute feature shows what those instances cost while running and recommends how to use them at the lowest cost.

Contract

A Contract is a summary that includes the organization name, account consumption, predictions, capacity used (as a percentage), and the contract end date. Contract costs can be predicted from previous cost consumption metrics.

For example, you can view cost across accounts and services in a given organization, broken down into storage, cloud services, replication, data transfer, and compute costs.

Crawl

Crawling is the process of extracting metadata from data sources.

After establishing a connection to the datasource, you can crawl metadata from the remote datasource into the Torch database.

For example, metadata such as owner, table type, row count, and so on.

Data Drift Policy

Data Drift determines the percentage change in certain metrics when the underlying data changes.

For example, you can create data drift rules to validate the data change against a tolerance threshold for each type of metric.
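
A minimal sketch of that idea, assuming drift is measured as a simple percentage change in a metric between two runs; the function name and interface are illustrative, not Torch's actual policy engine:

```python
def drift_exceeds_threshold(previous, current, tolerance_pct):
    """Return True when the percentage change in a metric between
    the previous and current run exceeds the tolerance threshold."""
    if previous == 0:
        return current != 0  # any change from zero counts as drift
    change_pct = abs(current - previous) / abs(previous) * 100
    return change_pct > tolerance_pct

# Row count grew from 1000 to 1200: a 20% change.
print(drift_exceeds_threshold(1000, 1200, tolerance_pct=10))  # 20% > 10% -> True
print(drift_exceeds_threshold(1000, 1050, tolerance_pct=10))  # 5% <= 10% -> False
```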

Data Protection

When data protection is enabled, selected table columns are masked for non-admin users. Data protection imposes restrictions on PII-enabled data columns.

For example, if you don't want a specific user to see the data in a specific column, enable PII on the columns of interest.

Data Quality Policy

Data quality is a measure of how healthy the data is within the data source, whether from a consumer or business standpoint.

To check the quality of data, multiple policies can be executed.

Data Reconciliation Policy

Data Reconciliation Policy refers to a data migration verification phase in which the target data is compared to the original source data to ensure that the migration architecture transfers the data correctly.

A data reconciliation policy can be created in Torch between two assets of similar type or between assets that can be profiled.
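
A toy sketch of the source-versus-target comparison behind reconciliation; a real reconciliation policy also compares aggregates, checksums, and column-level profiles, and this record-as-dict interface is purely illustrative:

```python
def reconcile(source_rows, target_rows, key):
    """Compare source and target records by key and report rows
    that are missing from the target or differ after migration."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    missing = sorted(set(src) - set(tgt))
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"missing_in_target": missing, "mismatched": mismatched}

source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
print(reconcile(source, target, key="id"))
# {'missing_in_target': [3], 'mismatched': [2]}
```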

Data Source

A data source is the location where the data being used originates.

The database is located on a remote server and is accessible via multiple database connections. To retrieve data from the database, the Torch server must establish a connection to it.

Discover Page

The Discover page lists all the assets in your ecosystem that Torch is configured to track, with various filtering capabilities.
Filter UDT (User Defined Template)

A UDT lets users define their own data quality rules in Java, Scala, Python, JavaScript, or Spark SQL.

Filter UDT template allows you to filter a record in a data asset (table).
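
Since Python is among the supported UDT languages, here is an illustrative filter rule that keeps only records with a well-formed email column. The record-as-dict interface is an assumption for illustration, not Torch's actual UDT contract:

```python
def filter_invalid_emails(record):
    """Illustrative filter rule: keep only records whose email
    column contains an '@' and a dotted domain."""
    email = record.get("email", "")
    return "@" in email and "." in email.split("@")[-1]

rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "not-an-email"},
]
kept = [r for r in rows if filter_invalid_emails(r)]
print([r["id"] for r in kept])  # [1]
```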

Jobs

Jobs are operations that are triggered when an action is performed on Torch.

Various Jobs can be viewed and monitored in the jobs window, such as profile jobs, auto profile queues, data quality jobs, reconciliation jobs and upcoming jobs.

Label

Using Labels, data assets can be categorized by purpose, owner or business function.

Labels support searching for data using a key and a value defined over an asset.

For example, labels such as “Confidentiality or Sensitivity: High, Medium, Low”.
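
The key/value model can be sketched as follows; the asset names and labels are made up for illustration:

```python
# Labels as key/value pairs attached to assets (illustrative model).
assets = {
    "customers": {"Sensitivity": "High", "Owner": "finance"},
    "web_logs": {"Sensitivity": "Low", "Owner": "platform"},
    "invoices": {"Sensitivity": "High", "Owner": "finance"},
}

def find_by_label(assets, key, value):
    """Return the names of assets carrying the given label."""
    return sorted(name for name, labels in assets.items()
                  if labels.get(key) == value)

print(find_by_label(assets, "Sensitivity", "High"))  # ['customers', 'invoices']
```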

Lineage

Lineage depicts how data was obtained from various sources. It provides a graphical representation of a data flow.
Metadata

Metadata describes information about data, making it easier to locate, use, and reuse specific instances of data.

For example, the metadata of a column Name in a table customer_information would indicate that the column's data type is string type.

Notification Channel

A notification channel is used to configure notifications via Email, Slack, Hangout, Jira, or a Webhook URL. There can be multiple notification channels, depending on the segregation at the user's end.
Persistence Path

Persistence path can be used to specify the result location at the asset level.

The data quality results are stored in the specified persistence path on any supported storage, for example, Amazon S3, HDFS, Google Cloud Storage, or Azure Data Lake.

Persistence path can be set globally in the admin console. However, the path will be overridden if the path is configured at the asset level.

Pipeline

A Pipeline represents the complete ETL (Extract-Transform-Load) workflow and contains Asset nodes and their associated Jobs. A pipeline is a set of processes that facilitates observability of the movement of data from a source repository to a target repository.
Policy

A policy is a rule mapped to an asset to perform specific actions. There are three policies you can define for an asset.

  1. DQ policy - performed on a single asset.
  2. Data drift - performed by comparing two assets.
  3. Schema drift - performed by comparing two assets.
Profile

Data profiling is the process of reviewing, analyzing, and synthesizing data into meaningful summaries. The approach produces a high-level overview that assists in the identification of data quality concerns.

Profiling an asset gives statistical data of an asset, such as min, max, average values, etc.
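
A simplified sketch of the summary statistics a profile run produces for a numeric column; the function and output shape are illustrative, not Torch's actual profiler:

```python
from statistics import mean

def profile_column(values):
    """Compute basic profile statistics for a numeric column,
    treating None as a null value (simplified illustration)."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null),
        "max": max(non_null),
        "avg": mean(non_null),
        "distinct": len(set(non_null)),
    }

print(profile_column([10, 20, 20, None, 40]))
# {'count': 5, 'null_count': 1, 'min': 10, 'max': 40, 'avg': 22.5, 'distinct': 3}
```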

Rule Set

Rule set is a group of data quality rules that exist outside of an asset-level policy.

It can be used to automatically create policies by applying it over assets.

Rules

Rules are defined functions that are used when configuring a policy to validate data when the policy is executed.

A policy can contain multiple rules that check for null values, uniqueness, and other attributes on an asset.
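
The null-value and uniqueness checks mentioned above can be sketched as simple rule functions bundled into a policy; the names and interface are illustrative only:

```python
def not_null_rule(values):
    """Rule: the column contains no null values."""
    return all(v is not None for v in values)

def uniqueness_rule(values):
    """Rule: every value in the column is unique."""
    return len(values) == len(set(values))

def run_policy(rules, values):
    """Evaluate each rule in a policy and report its result."""
    return {rule.__name__: rule(values) for rule in rules}

print(run_policy([not_null_rule, uniqueness_rule], [1, 2, 3]))
# {'not_null_rule': True, 'uniqueness_rule': True}
print(run_policy([not_null_rule, uniqueness_rule], [1, 1, None]))
# {'not_null_rule': False, 'uniqueness_rule': False}
```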

Sample Data

Sample data represents the content of an attribute in a table.
Schema Drift Policy

The Schema Drift Policy detects changes to a schema or table between the previously crawled and currently crawled data sources.

In Torch, schema drift policies are executed every time a data source is crawled.

Storage

The Storage tab provides a summary of storage costs.

Table, database, and high churn table storage costs can be viewed in order to take appropriate actions.

Tags

Tags are a type of metadata that help describe an asset and allow it to be found through browsing or searching.

Tags aid data discoverability. They can be linked to assets and policies, and can be generated by the system or by the user.

Transform UDT

A Transform User Defined Template allows you to extract values from a record in a data asset (table).
Template

A Template contains a number of rule definitions. Rule definitions are applied when a Data Quality Policy is created.

Instead of defining rules for each policy, you can use a policy template that contains rules.

When a policy template is added to a policy, all of the rule definitions in the template are automatically evaluated.
