UDT Helper Guide

This section explains how programmers can create UDTs using the interfaces provided by Acceldata, and how to use the unit test case suite to write test cases that cover them.

Please refer to the ADOC-cli-readme.pdf usage documentation for information on how to create projects and manage their lifecycles, which include test runs, packaging, and pushing.

Prerequisites:

  1. Zip and unzip software or command line utility
  2. Java 8
  3. Spark 2.4 or 3.2
  4. Scala 2.12.x for Spark 3.2, or Scala 2.11.x for Spark 2.4
  5. Gradle 7.4.2
  6. An IDE (Integrated Development Environment) that supports Gradle projects (preferably IntelliJ)

At this time, you must contact an Acceldata Support Representative to download this package.

1. Create a new project

Bash

The repository structure will look like:

Bash

Interfaces

There are four types of UDTs:

  1. Transform: Use this when you create a new transform column for a data quality policy and want a transformation applied at runtime. The transformation can also be used as part of a rule.
  2. Filter: Use this as a straightforward filter-driven rule in a data quality policy.
  3. Lookup Filter: Use this as a filter-driven rule in a data quality policy that uses a reference lookup.
  4. Group By Lookup Filter: Use this as a filter-driven rule in a data quality policy that uses a group-by reference lookup.

1. Transform

  • Scala Interface - io.ad.udt.types.scalatype.TransformUDT
  • Java Interface - io.ad.udt.types.javatype.TransformUDT

For the same sample in each programming language and project, refer to udt/sample src.

Using Scala:

Scala

Steps:

  • Extend TransformUDT.
  • variables: Define variables if they are part of the UDT; otherwise, leave them empty.
  • evaluate: Takes two parameters and returns the transformed value.
    1. row: A Row object that contains the row data with the corresponding columns.
    2. parameters: Map[String, String]. These are dynamic variables; their values are passed when the UDT is attached during creation of the DQ policy.
    3. Return type: Primitive types are supported.
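
The steps above can be sketched in Java as follows. This is a minimal illustration, not the Acceldata API: the nested `TransformUDT` interface is a local stand-in for io.ad.udt.types.javatype.TransformUDT, a `Map<String, Object>` stands in for the Spark Row, and the shape of `variables` is assumed. The column name `name` and the parameter key `upper` are hypothetical. The real signatures may differ, so treat the samples in udt/sample src as authoritative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: every type here is a local stand-in, not the Acceldata API.
class TransformUdtSketch {

    // Assumed shape of a transform UDT, based on the steps above.
    interface TransformUDT<T> {
        Map<String, String> variables();

        T evaluate(Map<String, Object> row, Map<String, String> parameters);
    }

    // Hypothetical transform: trims a string column and optionally upper-cases it.
    static final class NormalizeName implements TransformUDT<String> {
        @Override
        public Map<String, String> variables() {
            return Collections.emptyMap(); // no UDT-level variables in this sketch
        }

        @Override
        public String evaluate(Map<String, Object> row, Map<String, String> parameters) {
            Object raw = row.get("name"); // hypothetical column name
            if (raw == null) {
                return null;
            }
            String trimmed = raw.toString().trim();
            // "upper" is a hypothetical parameter supplied when the UDT is attached.
            boolean upper = Boolean.parseBoolean(parameters.getOrDefault("upper", "false"));
            return upper ? trimmed.toUpperCase(Locale.ROOT) : trimmed;
        }
    }

    // Convenience entry point: evaluates a single row with the given "upper" setting.
    static String normalize(String name, String upper) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", name);
        Map<String, String> parameters = new HashMap<>();
        parameters.put("upper", upper);
        return new NormalizeName().evaluate(row, parameters);
    }
}
```

Reading the behavior switch from `parameters` rather than hard-coding it keeps one UDT reusable across policies that need different settings.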

2. Filter

  • Scala Interface - io.ad.udt.types.scalatype.FilterUDT
  • Java Interface - io.ad.udt.types.javatype.FilterUDT

Using Scala:

Scala

Steps:

  • Extend FilterUDT.
  • variables: Define variables if they are part of the UDT; otherwise, leave them empty.
  • evaluate: Takes two parameters and returns a Boolean.
    1. row: A Row object that contains the row data with the corresponding columns.
    2. parameters: Map[String, String]. These are dynamic variables; their values are passed when the UDT is attached during creation of the DQ policy.
    3. Return type: Always Boolean.
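
A filter differs from a transform mainly in its return type. The sketch below illustrates that under the same assumptions as before: `FilterUDT` here is a local stand-in for io.ad.udt.types.javatype.FilterUDT, a `Map<String, Object>` stands in for the Spark Row, and the column `age` and parameter key `min_age` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: local stand-ins, not the Acceldata API.
class FilterUdtSketch {

    // Assumed shape of a filter UDT: evaluate must return a Boolean.
    interface FilterUDT {
        Boolean evaluate(Map<String, Object> row, Map<String, String> parameters);
    }

    // Hypothetical filter: a record is good when its age meets a configurable minimum.
    static final class MinAgeFilter implements FilterUDT {
        @Override
        public Boolean evaluate(Map<String, Object> row, Map<String, String> parameters) {
            // "min_age" is a hypothetical parameter; it defaults to 18 when absent.
            int minAge = Integer.parseInt(parameters.getOrDefault("min_age", "18"));
            Object age = row.get("age"); // hypothetical column name
            // Missing or non-numeric values are treated as bad records.
            return age instanceof Number && ((Number) age).intValue() >= minAge;
        }
    }

    // Convenience entry point: evaluates a single row against the given minimum.
    static boolean passes(Object age, String minAge) {
        Map<String, Object> row = new HashMap<>();
        row.put("age", age);
        Map<String, String> parameters = new HashMap<>();
        parameters.put("min_age", minAge);
        return new MinAgeFilter().evaluate(row, parameters);
    }
}
```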

3. Filter Using Reference Lookup

  • Scala Interface - io.ad.udt.types.scalatype.LookUpFilterUDT
  • Java Interface - io.ad.udt.types.javatype.LookUpFilterUDT

Using Scala:

Scala

Steps:

  • Extend LookUpFilterUDT.
  • variables: Define variables if they are part of the UDT; otherwise, leave them empty.
  • evaluate: Takes three parameters and returns a Boolean.
    1. row: A Row object that contains the row data with the corresponding columns.
    2. parameters: Map[String, String]. These are dynamic variables; their values are passed when the UDT is attached during creation of the DQ policy.
    3. store: HelperStore, a lookup store used to reference an asset and retrieve the data of the relevant columns for subsequent use.
    4. Return type: Always Boolean.

HelperStore provides a lookup API. For the list of available methods, visit https://docs.acceldata.io/torch/torch/lookup-data-quality-policy-rule#lookup-rule-usage-methods.
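
To show how the extra store argument fits into evaluate, here is a rough Java sketch. It does not use the real HelperStore: `LookupStore` and its `contains` method are hypothetical stand-ins invented for this illustration, as are the asset alias `countries`, the columns `code` and `country_code`, and the parameter key `asset_alias`. Consult the lookup method list linked above for the actual calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: local stand-ins, not the Acceldata API.
class LookupFilterSketch {

    // Hypothetical stand-in for HelperStore; the real method names may differ.
    interface LookupStore {
        boolean contains(String assetAlias, String column, Object value);
    }

    // Assumed shape of a lookup filter UDT: row, parameters, and a store.
    interface LookUpFilterUDT {
        Boolean evaluate(Map<String, Object> row,
                         Map<String, String> parameters,
                         LookupStore store);
    }

    // Hypothetical filter: a record is good when its country code exists in the reference asset.
    static final class KnownCountryFilter implements LookUpFilterUDT {
        @Override
        public Boolean evaluate(Map<String, Object> row,
                                Map<String, String> parameters,
                                LookupStore store) {
            String alias = parameters.getOrDefault("asset_alias", "countries");
            return store.contains(alias, "code", row.get("country_code"));
        }
    }

    // Convenience entry point backed by an in-memory reference set.
    static boolean isKnown(String countryCode) {
        Set<Object> codes = new HashSet<>(Arrays.asList("US", "IN", "DE"));
        LookupStore store = (assetAlias, column, value) ->
                "countries".equals(assetAlias) && "code".equals(column) && codes.contains(value);
        Map<String, Object> row = new HashMap<>();
        row.put("country_code", countryCode);
        return new KnownCountryFilter().evaluate(row, new HashMap<>(), store);
    }
}
```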

4. Filter Using Reference Lookup Group By

This is the same as Filter Using Reference Lookup. The only difference is that a group-by reference is used, so the UDT must call the HelperStore get API, and the same group-by reference must also be passed from the UI.

  • Scala Interface - io.ad.udt.types.scalatype.LookUpFilterUDT
  • Java Interface - io.ad.udt.types.javatype.LookUpFilterUDT

Scala

Steps:

  • Extend LookUpFilterUDT.
  • variables: Define variables if they are part of the UDT; otherwise, leave them empty.
  • evaluate: Takes three parameters and returns a Boolean.
    1. row: A Row object that contains the row data with the corresponding columns.
    2. parameters: Map[String, String]. These are dynamic variables; their values are passed when the UDT is attached during creation of the DQ policy.
    3. store: HelperStore, a lookup store used to reference an asset and retrieve the data of the relevant columns for subsequent use.
    4. Return type: Always Boolean.

5. Plain Filter or Reference Lookup Filter with Failure Metadata

Failure metadata is an additional feature of Plain Filter and Reference Lookup Filter. It lets the user return a true or false result together with any error feedback as metadata.

Here you must return an io.ad.udt.model.PredicateResult object with two parameters:

  1. result: A boolean that marks the record as good or bad according to the business logic.
  2. metadata: Any native collection. This is the failure metadata for each record; it is used later, when Data Quality runs, to collect feedback on bad records.

Syntax for Scala PredicateResult

Scala

Syntax for Java PredicateResult

Java

  • Scala Interface - io.ad.udt.types.scalatype.FilterUDT
  • Java Interface - io.ad.udt.types.javatype.FilterUDT
  • Scala Interface - io.ad.udt.types.scalatype.LookUpFilterUDT
  • Java Interface - io.ad.udt.types.javatype.LookUpFilterUDT

For the same sample in each programming language and project, refer to udt/sample src. The sample below uses Scala:

Scala

Steps:

  • Extend FilterUDT.
  • variables: Define variables if they are part of the UDT; otherwise, leave them empty.
  • evaluate: Takes two parameters and returns a PredicateResult.
    1. row: A Row object that contains the row data with the corresponding columns.
    2. parameters: Map[String, String]. These are dynamic variables; their values are passed when the UDT is attached during creation of the DQ policy.
    3. Return type: Always PredicateResult, with result as a boolean and metadata as any native collection.
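
The idea of returning a verdict together with per-record failure metadata can be sketched in Java as follows. The nested `PredicateResult` class is a local stand-in for io.ad.udt.model.PredicateResult, reduced to the two parameters described above; its constructor shape is an assumption. The `email` column and the failure messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: local stand-ins, not the Acceldata API.
class PredicateResultSketch {

    // Assumed shape: a boolean verdict plus a collection of failure details.
    static final class PredicateResult {
        final boolean result;
        final List<String> metadata;

        PredicateResult(boolean result, List<String> metadata) {
            this.result = result;
            this.metadata = metadata;
        }
    }

    // Hypothetical filter body: validates the "email" column and records why it failed.
    static PredicateResult evaluate(Map<String, Object> row, Map<String, String> parameters) {
        List<String> failures = new ArrayList<>();
        Object email = row.get("email"); // hypothetical column name
        if (email == null) {
            failures.add("email is missing");
        } else if (!email.toString().contains("@")) {
            failures.add("email has no @ sign");
        }
        // The record is good only when no failure was recorded.
        return new PredicateResult(failures.isEmpty(), failures);
    }

    // Convenience entry point: evaluates a single row with no parameters.
    static PredicateResult check(String email) {
        Map<String, Object> row = new HashMap<>();
        row.put("email", email);
        return evaluate(row, new HashMap<>());
    }
}
```

Keeping the metadata empty for good records means only bad records carry feedback.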

Unit Test Case Suite

  • Scala Test Suite Class - io.ad.udt.suite.UDTScalaTestSuite
  • Java Test Suite Class - io.ad.udt.suite.UDTJavaTestSuite

Here is an example test case for Group By Lookup:

Scala

Steps:

  • Extend UDTScalaTestSuite or UDTJavaTestSuite, depending on the language.
  • input: Create the input data frame and return it.
  • configureReferenceAssets: Optional; leave it blank if unused. It applies to UDTs that involve lookups. Build a map that uses the asset alias as the key and the reference table data frame as the value. The relevant sample shows how to create it.
  • validate: Call validate with two parameters for each test case.
    1. udt: An object created from the respective UDT definition.
    2. parameters: The map of keys and values that the UDT uses to resolve its internal dependencies.
    3. Return value: The data frame, with the data type as provided and computed in the UDT.

For test case examples in each language, refer to the test module. It provides test case examples for each of the UDT options.
