Add Data Reliability Policy
Policies enable the creation of rules to ensure data quality in systems. In Acceldata Data Observability Cloud (ADOC), data quality policies are used to assess the health of data sources.
A policy can contain multiple rules, and the policy passes only if every rule passes. You can run policies manually or at predefined intervals, and each run generates an execution that tracks the policy's result.
Notifications can be configured via email, Slack, or a Webhook URL. Reviewing executions over time provides insight into data quality, and the reported failure locations help you identify issues in the source system.
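For Webhook notifications, each alert is delivered as an HTTP POST to the URL you configure. The snippet below is only a minimal sketch of a receiver for such a notification; the endpoint path and the payload fields it reads are assumptions for illustration and are not defined by ADOC.

```python
# Minimal sketch of a Webhook receiver for policy notifications.
# Hypothetical: the route and payload field names below are NOT defined by ADOC;
# inspect a real notification payload before relying on any field names.
from flask import Flask, request

app = Flask(__name__)

@app.route("/adoc-alert", methods=["POST"])
def adoc_alert():
    payload = request.get_json(force=True, silent=True) or {}
    # Assumed field names, for illustration only.
    policy = payload.get("policyName", "<unknown policy>")
    status = payload.get("status", "<unknown status>")
    print(f"Policy {policy!r} finished with status {status!r}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```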
ADOC supports four types of Data Reliability Policies to maintain data quality and schema stability.
This guide explains how to:
- Create a Data Quality policy
- Execute it
- View the results
Quality Policy
Data quality measures data source health from consumer or business perspectives. ADOC's quality policy ensures data quality for single assets by checking properties like Null Values, Asset Data Type, and Regex Match.
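ADOC evaluates these checks for you, but the logic they express is straightforward. The following pandas sketch is only a conceptual illustration of what a Null Values, data type, and regex (Pattern Match) check compute; the column names and sample data are made up and not taken from ADOC.

```python
# Conceptual illustration of three quality checks (not ADOC code).
import re
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, None, 104],              # made-up sample data
    "email": ["a@x.com", "b@y.org", "bad-email", None],
})

# Null Values: how complete is the column?
null_ratio = df["customer_id"].isna().mean()

# Asset Data Type / Schema Match: does the column hold the expected type?
type_ok = pd.api.types.is_numeric_dtype(df["customer_id"])

# Pattern Match: do the (non-null) values satisfy a regex?
email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
pattern_ok = df["email"].dropna().map(lambda v: bool(email_re.match(v)))

print(f"null ratio={null_ratio:.2f}, numeric dtype={type_ok}, "
      f"regex pass rate={pattern_ok.mean():.2f}")
```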
1. Add a Quality Policy
To create a data quality policy, complete the following steps in the Policies section of Data Reliability:
- Click the Add New button and select Quality Policy.
- Select the asset from the Asset List.
- Click Select.
The following panels are displayed in the Create Data Quality Policy window:
Panels | Description |
---|---|
Asset Info | The Asset Info panel displays the hierarchy of the asset, along with all of the asset's tags generated when it was crawled. To add tags, click the Add Tag button. |
Info | In the Info panel, specify a name and a description for the data quality policy. To add tags to the policy, click Add Tag. |
User Defined Transformations | To select and apply pre-configured user-defined transforms that can be viewed as a column, click the Add Transformation button. The Attach User-Defined Function pop-up window appears. Choose a pre-configured filter function from the drop-down menu, or click the Click Here link to add a new user-defined function. Once you select a pre-configured filter function, you can provide values for the function variables and change the name if required. Then, click Validate to save the changes. Next, provide a column name for the user-defined transform function and click the Add button. The transform column, along with its description and the number of variables used, is displayed in the Attached Transformation table. You can also view the source code in the Code tab and the asset variables in the Asset Variables tab. |
Sample Data | The Sample Data panel displays all the columns in the asset. Here, select the columns to add a rule definition. Only the selected columns will appear while trying to add a rule definition. If none of the columns are selected, then all columns will appear while adding a rule definition. |
Rule Definitions | The Rule Definitions panel displays all the rules that can be defined for the columns of an asset. Click a rule definition type and specify values accordingly (a conceptual sketch of a couple of these rules follows these steps). The rule definition types you can add to the asset are as follows: Null Values: Checks whether the selected column contains any null values, i.e., checks for completeness. If you enable the Include Empty Values in good records? toggle switch, rows with empty values are also considered good records. Schema Match: Checks the data type of the selected column against the selected data type, validating consistency in the data. Include Empty Values in good records: If you enable this toggle switch, rows with empty values are also considered good records. Include Null Values in good records: Similarly, if you enable this toggle switch, rows with null values are considered good records. Pattern Match: Checks whether the column values adhere to the given regex (regular expression); applies to string columns. Enumerations: Verifies whether the selected column values are in the provided list. Include Empty Values in good records: If you enable this toggle switch, rows with empty values are also considered good records. Include Null Values in good records: Similarly, if you enable this toggle switch, rows with null values are considered good records. Tags Match: Checks whether the selected column values are present in the Tag provided. Range Match: Checks whether a value falls within the selected range. Distinct Values: Checks whether all the values in the selected column are unique. Row Check: Checks the number of rows. Data Policy Templates: Matches a set of rules that are pre-configured; select a policy template from the drop-down list. Custom: Creates a custom condition involving one or more columns, for example C1+C2>C3; the expression must be a Spark SQL statement. User Defined: User-defined rules allow you to write custom code and create templates. Lookup Rules: Lookup rules allow you to perform a lookup of data in a target column against data in a reference column. |
Check Incrementally | Click the toggle switch to check data incrementally, so that each execution processes only the data added since the previous run, based on the incremental strategy configured for the policy. |
Schedule Execution | Click the toggle switch to run the policy automatically at a predefined interval, and configure the schedule. |
Alert Configurations | Choose a notification group. When the policy exceeds its thresholds, the recipients on the channel are notified. Create notification groups on the Notification Channels page by clicking Add a Notification Channel. You can choose to send notifications for specific policy execution statuses, such as Error, Warning, or Success, or for all of them. Click the Enable Policy toggle switch to activate the policy. |
Persistent Path Configuration | The result location of a data quality policy can be configured at the asset level. The data quality results will be stored at the configured location for the asset. |
Spark Job Configurations | You can configure the Spark Job Configurations for a particular policy. The configuration parameters to be entered vary depending on the option selected during deployment. |
- Click the Save Policy button.
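To make the rule semantics concrete, here is a small PySpark sketch of how an Enumerations rule and a Custom rule expressed as a Spark SQL statement (for example, C1+C2>C3) classify rows into good and bad records. The column names, allowed values, and data are assumptions for illustration; this is not ADOC's implementation.

```python
# Sketch: evaluating an Enumerations rule and a Custom (Spark SQL) rule.
# Column names and data are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rule-sketch").getOrCreate()

df = spark.createDataFrame(
    [(10, 5, 12, "US"), (1, 1, 5, "XX"), (7, 2, 8, "IN")],
    ["C1", "C2", "C3", "country"],
)

# Enumerations: the value must be in the allowed list.
allowed = ["US", "IN", "UK"]
df = df.withColumn("enum_ok", F.col("country").isin(allowed))

# Custom rule: a Spark SQL expression over several columns.
df = df.withColumn("custom_ok", F.expr("C1 + C2 > C3"))

# Rows passing every rule are the "good records".
df.withColumn("good_record", F.col("enum_ok") & F.col("custom_ok")).show()

spark.stop()
```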
2. Execute the Policies
After you have created a policy, you must put it into action to check for inconsistencies in your data. When you execute a policy, all of the rules stated in your policy are run, and you can view the results for each rule.
To execute a policy do the following steps:
- Navigate to Data Reliability.
- Click Policies in the left pane.
- To execute the policy, click the Execute icon on the right side of the pane.
- Select one of the following options for policy execution:
- All your data: This option executes the policy on all the data for the asset.
- Incremental: Policy will be executed based on the incremental strategy set inside the policy.
- Selective: Selective execution runs jobs over a subset of data, bounded either by an id-based column or a date-based column.
If you opted for Selective execution on your policy, input the following information to perform a selective execution (a conceptual sketch of this date-bounded windowing follows these steps):
Column | Description |
---|---|
Column Type | Select either Id or Timestamp from the drop-down list. |
Date Column | Select the column containing the dates that bound the execution. |
Date Format | Enter the format in which the dates appear, for example dd/mm/yyyy or mm/dd/yy. |
Initial Offset | Select the date from which execution of the policy should begin. |
Final Offset | Select the date at which execution of the policy should end. |
- Click the Execute button.
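Conceptually, a selective execution evaluates only the rows whose bounding column falls between the initial and final offsets, interpreted with the date format you provide. The sketch below illustrates that windowing on made-up data; the column name and format are assumptions, and this is not how ADOC implements it.

```python
# Sketch: date-bounded "selective" window over a dataset (illustrative only).
from datetime import datetime

date_format = "%d/%m/%Y"                 # corresponds to dd/mm/yyyy
initial_offset = datetime.strptime("01/01/2024", date_format)
final_offset = datetime.strptime("31/01/2024", date_format)

rows = [
    {"order_date": "15/12/2023", "amount": 10},   # made-up data
    {"order_date": "10/01/2024", "amount": 25},
    {"order_date": "05/02/2024", "amount": 40},
]

selected = [
    r for r in rows
    if initial_offset <= datetime.strptime(r["order_date"], date_format) <= final_offset
]
print(selected)   # only the 10/01/2024 row falls inside the window
```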

When the execution is complete, a result window appears, displaying the execution status, data quality status, and a description. View the policy execution result by clicking the See execution details button.
If the Data Quality Status of the execution is Success, all the rules defined in the policy have passed.
If the Data Quality Status indicates an error or warning, one or more rules did not pass; open the execution details to identify the failing rules.
3. View the Results
To view the execution result of a data quality policy, perform the following steps:
- Click the Data Reliability tab.
- Click the Policies tab.
- Check the Data Quality box in the filter panel to filter the list, or search for the policy by its name in the search bar. The policy is displayed.

- Click the policy to view its execution result.
For more information on how to create other policy types in Data Reliability, see the related policy documentation.