Microsoft | Azure Data Factory (ADF)
Azure Data Factory (ADF) is Azure's cloud-based ETL (Extract, Transform, Load) service for scalable, serverless data integration and transformation. It provides a code-free user interface for straightforward configuration, along with comprehensive monitoring and management capabilities in a single pane.
In addition to supporting ADF's compute observability, ADOC now supports ADF integration for Data Reliability. This integration allows ADOC to take advantage of ADF's data cleansing and standardization capabilities, feeding standardized and cleansed files directly into ADOC for further data quality analysis.
Configuring Permissions for Azure Data Factory Integration
To set up an Azure Data Factory (ADF) data source on the ADOC platform, you must grant certain permissions. These permissions allow ADOC to read ADF metadata such as pipelines, activities, and other components. You can authenticate using one of two methods:
- Azure Service Principal
- Azure Managed Identity
The following steps will guide you through granting the necessary permissions using either method.
Granting Permissions
You can grant access to a Service Principal or Managed Identity at several levels:
- Through Azure Managed Identities
- Through Azure Subscriptions
- Through Azure Data Factory
Data Plane Setup for Managed Identity (MSI)
If you are using Azure Managed Identities to authenticate with Azure services like ADF, Azure Data Lake Storage, or Azure SQL, you must take these extra steps to set up the data plane.
Step 1: Map the Managed Identity to Kubernetes Service Accounts
1. Navigate to Federated Credentials:
- In your Managed Identity settings, select Federated Credentials.
2. Add Credentials:
- Click Add Credentials.
- Create a new credential for each of the Kubernetes service accounts listed below:
- analysis-service
- analysis-standalone-service
- spark-scheduler
- sparkoperator
- torch-monitors
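If you prefer the command line, the federated credentials can also be created with the Azure CLI. The sketch below assumes a managed identity named adoc-identity in resource group adoc-rg and an adoc namespace; the identity name, resource group, OIDC issuer URL, and namespace are all placeholders you must replace with your own values.

```shell
# Hypothetical names -- substitute your identity, resource group, issuer URL, and namespace.
IDENTITY_NAME="adoc-identity"
RESOURCE_GROUP="adoc-rg"
OIDC_ISSUER="https://<your-cluster-oidc-issuer-url>"
NAMESPACE="adoc"

# Create one federated credential per ADOC service account.
for SA in analysis-service analysis-standalone-service spark-scheduler sparkoperator torch-monitors; do
  az identity federated-credential create \
    --name "${SA}-federated" \
    --identity-name "$IDENTITY_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --issuer "$OIDC_ISSUER" \
    --subject "system:serviceaccount:${NAMESPACE}:${SA}" \
    --audience api://AzureADTokenExchange
done
```

The subject of each credential follows the Kubernetes convention `system:serviceaccount:<namespace>:<service-account-name>`, which is what binds the managed identity to that specific service account.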
Step 2: Annotate Kubernetes Service Accounts
For each of the service accounts listed above:
1. Add Annotation:
- Add the following annotation, using the client ID of your Managed Identity:
azure.workload.identity/client-id: <client id of the managed identity>
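As a sketch, an annotated service account manifest would look like the following (analysis-service is shown; the namespace and client ID are placeholders, and the same annotation applies to each of the five service accounts):

```yaml
# Hypothetical namespace and client ID -- substitute your own values.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analysis-service
  namespace: adoc
  annotations:
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
```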
Step 3: Label Deployments in Kubernetes
In the deployment YAML file for each of the services below, add the following label under both the metadata and template sections:
azure.workload.identity/use: "true"
Services to update:
- analysis-service
- analysis-standalone-service
- spark-scheduler
- sparkoperator
- torch-monitors
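The steps above can be sketched as a deployment manifest. This example shows analysis-service only; the app label, image, and other fields are placeholders, and the same label goes into the remaining four deployments:

```yaml
# Sketch of one labeled deployment -- apply the same label to all five services.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analysis-service
  labels:
    azure.workload.identity/use: "true"
spec:
  selector:
    matchLabels:
      app: analysis-service
  template:
    metadata:
      labels:
        app: analysis-service
        azure.workload.identity/use: "true"
    spec:
      containers:
        - name: analysis-service
          image: <your-registry>/analysis-service:<tag>
```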
By completing these steps, you'll enable your data plane to use Azure Managed Identities when connecting to Azure Data Factory and other Azure resources.
Ensure you have the necessary administrative privileges to perform these actions. If you encounter any issues, consult your Azure administrator or refer to the official Azure documentation for more detailed guidance.
Configuring an Azure Data Factory Data Source in ADOC
Step 1: Begin the Registration Process
- Access Data Sources: In the ADOC platform, click on Register from the left navigation menu to open the data sources window.
- Add a New Data Source: Click the Add Data Source button.
- Select Azure Data Factory: Choose Azure Data Factory from the list of available data source types.

Step 2: Enter Data Source Details
Provide Basic Information:
- Name: Enter a meaningful name for your data source.
- Description: Optionally, add a description for future reference.
Enable Observability Options:
- Compute Observability: Toggle this on if you want to monitor compute resources.
- Data Reliability: Toggle this on to enable pipeline observability.
- Data Plane Selection: If Data Reliability is enabled, select the appropriate Data Plane.
Proceed to Connection Details:
- Click Next to move to the connection details page.

Step 3: Choose Authentication Method and Enter Credentials
You can authenticate using one of two methods:
Option 1: Service Principal
Select Authentication Method:
- Choose Service Principal.
Enter Credentials:
- Tenant ID: Input your Azure Active Directory tenant ID.
- Client ID: Enter the application (client) ID of your Service Principal.
- Client Secret: Provide the client secret value.
- Subscription ID: Input the Azure subscription ID where your Data Factories are located.
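If you do not yet have a Service Principal, one way to create it is with the Azure CLI. This is a sketch, not ADOC's prescribed procedure; the name "adoc-adf-reader", the role, and the scope are placeholder assumptions you should adapt to your environment.

```shell
# Hypothetical name, role, and scope -- substitute your subscription ID
# and the role your organization requires.
az ad sp create-for-rbac \
  --name "adoc-adf-reader" \
  --role "Reader" \
  --scopes "/subscriptions/<subscription-id>"
```

The command's JSON output contains the values needed above: appId (Client ID), password (Client Secret), and tenant (Tenant ID).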
Option 2: Azure Managed Identity (MSI)
Select Authentication Method:
- Choose Managed Identities.
Enter Subscription ID:
- Subscription ID: Provide the Azure subscription ID where your Data Factories are located.

Step 4: Test the Connection
Initiate Connection Test:
- Click Test Connection to verify that the provided credentials can access your Azure Data Factory.
Review Results:
- If the test is successful, proceed to the next step.
- If the test fails, double-check your credentials and ensure the correct permissions are in place.
Step 5: Configure Data Source Settings
Following a successful connection test, you are directed to the data source configuration page.
Set Up Observability Preferences:
Compute Observability:
- Polling Frequency: Determine how frequently ADOC should retrieve compute data.
Data Reliability / Pipeline Observability:
- Select Resource Groups: Select the relevant resource groups from the dropdown.
- Select Data Factories: Select the specific Data Factories you want to monitor.
- Crawler Scheduling: Configure if needed.
- Schema Drift Monitoring: Enable and set up if applicable.
Review and Confirm Settings:
- Ensure all selections align with your monitoring needs.
Step 6: Finalize the Integration
- Save the Data Source: Click Submit or Save to complete the integration process.
- Data Fetching Schedule: Data will be fetched or crawled according to the configured schedule, typically every 24 hours.
Pipeline Observability Features
Once your ADF data source is registered, you gain access to enhanced pipeline observability features:
Automatic Detection:
- Pipelines and Runs: All pipelines, their runs, and associated metadata are automatically detected. All pipeline capabilities within ADOC are applicable to ADF pipelines.
Enhanced Filtering Options:
- Source Type Filter: On the Pipeline Listing page, you can filter pipelines based on the ETL orchestrator (e.g., ADF, other sources).
- Data Factory Filter:
- For ADF sources, you can filter pipelines by Data Factory or specific ADF Data Sources.
- This helps in narrowing down the pipelines for better analysis and monitoring.

See also: Pipeline.
Role(s) Required to Fetch ADF (Cost + Operations) Data in Azure
You can assign roles in any of three ways to enable the ADF integration.

| Role Assignment Method | Description |
| --- | --- |
| Assign the Reader role at the resource group level | Provides read access to all resources (including cost data) in that resource group. |
| Assign the Cost Management Reader and Data Factory Contributor roles at the resource group level | Provides read access to cost data for all resources and read access to all data factories in the resource group. |
| Assign the Cost Management Reader role at the resource group level and the Reader role at the data factory level | Provides read access to cost data for all resources in the resource group and read access to the specified Data Factory. |
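These assignments can be made from the Azure CLI as well. The sketch below illustrates the second option from the table; the assignee ID, subscription ID, and resource group are placeholders to replace with your own values.

```shell
# Hypothetical IDs -- substitute your own. Illustrates Cost Management Reader
# plus Data Factory Contributor at the resource group level.
ASSIGNEE="<service-principal-client-id-or-managed-identity-principal-id>"
SCOPE="/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

az role assignment create --assignee "$ASSIGNEE" --role "Cost Management Reader" --scope "$SCOPE"
az role assignment create --assignee "$ASSIGNEE" --role "Data Factory Contributor" --scope "$SCOPE"
```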
For more information, see the Microsoft Azure documentation.