Data Reliability Settings
The Data Reliability Settings page gives you full control over how reliability data is stored, protected, optimized, validated, and audited. These settings directly affect compliance, operational cost, and system performance, making them critical for managing large-scale enterprise data operations.
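ADOC groups reliability configurations into five categories: Data Retention, Data Protection, Data Persistence and Optimization, Asset Validation Schedule, and Audit.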
Accessing Reliability Settings
To access Reliability settings:
- In the ADOC UI, click the Settings icon in the left navigation pane.
- Under Data Reliability, use the tabs to navigate between Retention, Protection, Data Persistence and Optimization, Asset Validation Schedule, and Audit.
1. Data Retention
Data Retention controls how long reliability data (such as profiling metrics, validation results, and rule execution logs) is stored in ADOC before being purged.
- Keeping data longer supports compliance requirements, auditing, and long-term trend analysis.
- Shorter retention reduces storage usage and system costs.
Examples:
- A financial services company may keep reliability logs for 365 days to meet audit requirements.
- A cost-sensitive project may set retention to 30–60 days.
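A retention period can be pictured as a cutoff date: anything older than the cutoff becomes eligible for purging. The sketch below only illustrates that arithmetic. `purge_cutoff` and `is_expired` are hypothetical helper names, not part of ADOC, and ADOC's actual purge logic may differ.

```python
from datetime import datetime, timedelta, timezone


def purge_cutoff(retention_days: int, now: datetime) -> datetime:
    """Return the oldest timestamp still retained (hypothetical helper)."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return now - timedelta(days=retention_days)


def is_expired(record_time: datetime, retention_days: int, now: datetime) -> bool:
    """True if a record is older than the retention window."""
    return record_time < purge_cutoff(retention_days, now)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
old_result = datetime(2024, 4, 1, tzinfo=timezone.utc)  # 90 days before `now`

print(is_expired(old_result, retention_days=60, now=now))   # True: outside a 60-day window
print(is_expired(old_result, retention_days=365, now=now))  # False: inside a 365-day window
```

The same 90-day-old validation result is purged under a 60-day policy but kept under a 365-day policy, which is the trade-off between storage cost and audit coverage described above.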
MinHash Calculation
MinHash is a statistical technique used to identify similar or duplicate datasets without scanning every record.
- When enabled, ADOC can highlight duplicate or overlapping data assets, which helps with data governance, quality monitoring, and cost optimization.
- When disabled, the system skips duplicate detection, saving compute resources but reducing visibility into redundancy.
Examples:
- An enterprise managing large data lakes may enable MinHash to detect duplicate customer tables across business units.
- A small project with limited resources may disable MinHash to reduce compute load.
Limitations of MinHash
- Results are probabilistic, not exact: near-duplicate datasets may be missed, and false positives may occur.
- The technique works best on large, diverse datasets; for small datasets, detection accuracy is limited.
- Enabling MinHash can increase compute and storage overhead, especially on very large data sources.
Recommendation: Use MinHash for enterprise-scale environments where duplicate datasets are common and governance is a priority. For cost-sensitive or smaller projects, weigh the benefits of duplicate detection against the added resource usage.
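To make the idea concrete, here is a minimal, generic MinHash sketch in Python. It is not ADOC's implementation; the function names, the number of hash functions (128), and the use of salted BLAKE2b hashes are all assumptions chosen for illustration. Each dataset is reduced to a short signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two datasets.

```python
import hashlib


def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per salted hash function."""
    values = [str(item).encode("utf-8") for item in items]
    if not values:
        raise ValueError("items must not be empty")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # each salt acts as a different hash function
        signature.append(
            min(hashlib.blake2b(v, digest_size=8, salt=salt).digest() for v in values)
        )
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two hypothetical customer tables that share 800 of 1,200 distinct keys.
table_a = {f"customer-{i}" for i in range(1000)}
table_b = {f"customer-{i}" for i in range(200, 1200)}

exact = len(table_a & table_b) / len(table_a | table_b)
estimate = estimate_similarity(minhash_signature(table_a), minhash_signature(table_b))
print(f"exact Jaccard: {exact:.3f}, MinHash estimate: {estimate:.3f}")
```

Once signatures exist, comparing two datasets touches only the signatures, not the rows. The estimate differs from the exact value by a sampling error that shrinks as `num_hashes` grows, which is the probabilistic behavior noted in the limitations above.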
How to Configure
- Navigate to the Data Retention tab.
- To manage MinHash, toggle the setting ON (enabled) or OFF (disabled).
- Enter the desired number of days for retention.
- Click Save Changes.
- Use Reset to Default if you want to return to system defaults.
Recommended retention periods:
- Compliance-focused organizations: 180–365 days.
- Cost-focused teams: 30–90 days.
2. Data Protection
Data Protection enforces security rules on sensitive data columns (such as PII, financial information, or regulated fields) across ADOC. When enabled, ADOC automatically applies masking or restrictions to these columns if they are flagged as protected.
Note: PII (Personally Identifiable Information) refers to any data that can identify an individual directly (e.g., name, SSN, email) or indirectly when combined with other data (e.g., date of birth, zip code). Handling PII requires stricter privacy, security, and compliance controls under regulations such as GDPR, HIPAA, and CCPA.
- When enabled: PII data is masked and hidden from both unauthorized users and automated workflows such as crawlers, profilers, or reporting jobs.
- When disabled: Masking is not applied, and flagged PII fields remain fully visible.
Enabling Data Protection enforces your organization's governance and compliance policies at the system level, reducing the risk of data leaks or unauthorized access.
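The effect of the toggle can be sketched as a simple column-level mask. This is an illustration only: `mask_row`, the `****` mask string, and the sample column names are assumptions, and ADOC's real masking format and enforcement points are not described here.

```python
def mask_row(row, protected_columns, protection_enabled=True, mask="****"):
    """Return a copy of `row` with protected column values replaced by a mask."""
    if not protection_enabled:
        return dict(row)  # protection off: values pass through unchanged
    return {
        column: (mask if column in protected_columns and value is not None else value)
        for column, value in row.items()
    }


row = {"customer_id": 42, "email": "jane@example.com", "zip_code": "94105"}
flagged_as_pii = {"email", "zip_code"}  # hypothetical columns flagged as protected

print(mask_row(row, flagged_as_pii, protection_enabled=True))
print(mask_row(row, flagged_as_pii, protection_enabled=False))
```

With protection enabled, only the flagged columns are masked and unflagged columns such as `customer_id` stay readable. With protection disabled, the flag alone has no effect.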
How to Configure
- Navigate to the Data Protection tab.
- Toggle Data Protection Enabled to ON or OFF.
  - ON: PII fields are masked automatically.
  - OFF: No masking is applied, even if fields are flagged as PII.
- Click Save Changes to apply.
- Use Reset to Default if you need to revert the setting.
3. Data Persistence and Optimization
Data persistence settings control how ADOC handles the storage and availability of policy execution results. These options let you balance between data availability and system efficiency.
Data Persistence Enabled
- When ON: Policy execution results are stored and available for review and auditing.
- When OFF: Results are not persisted, which reduces storage usage but prevents historical tracking.
Data Violation Download Enabled
- When ON: Users can export or download data violation results for further analysis, reporting, or integration with external systems.
- When OFF: Violation results remain visible in ADOC but cannot be downloaded.
How to Configure
- Navigate to the Data Persistence and Optimization tab.
- Toggle the switches for Data Persistence Enabled and Data Violation Download Enabled according to your needs.
- Click Save Changes to apply.
- Use Reset to Default if you need to restore original settings.
Recommendations:
- Enable Data Persistence for production or compliance-focused environments where audit trails and policy history are important.
- Disable Data Persistence for test or cost-sensitive environments where long-term storage is not required.
- Enable Data Violation Download if your teams need to export violations for external reporting, ticketing, or investigations.
- Keep Data Violation Download disabled if you want to prevent sensitive violation data from being downloaded outside ADOC.
4. Asset Validation Schedule
Asset validation schedules define when ADOC runs reference asset validation to check the consistency and health of lookup or reference datasets. Scheduling validation helps ensure that reference data stays accurate and aligned with your operational needs.
- Frequent validation: Detects issues early in business-critical reference datasets.
- Less frequent validation: Reduces system overhead for non-critical datasets.
How to Configure
- Navigate to the Asset Validation Schedule tab.
- Set the frequency (for example, daily, weekly, or a specific interval).
- Choose the time zone and select the day(s) and time for validation.
- Enter the parallelization count (1–5) to control how many validation jobs run in parallel.
  - Higher values: Faster completion, but more resource usage.
  - Lower values: Less resource usage, but longer execution times.
- Toggle Enable Asset Validation to turn the schedule on or off.
- Click Apply to save your schedule.
Examples
- Weekly, Monday 12:00 AM (Asia/Calcutta): Suitable for moderately critical reference datasets.
- Daily, off-peak hours: Recommended for business-critical datasets where stale reference data could impact downstream processes.
Recommendations:
- Use weekly schedules for non-critical or slowly changing reference data.
- Use daily schedules for high-impact datasets (e.g., currency exchange rates, compliance lists).
- Set parallelization higher in production clusters with more resources, and lower in cost-sensitive environments.
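The two scheduling ideas in this section, a recurring run time and a bounded parallelization count, can be sketched in a few lines of Python. Everything here is hypothetical: `next_weekly_run`, `validate_asset`, and `run_validations` are illustrative names, the fixed UTC+05:30 offset stands in for the Asia/Calcutta time zone, and ADOC's scheduler is not implemented this way as far as this page states.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

# Fixed offset standing in for Asia/Calcutta (assumption for this sketch).
IST = timezone(timedelta(hours=5, minutes=30))


def next_weekly_run(now, weekday=0, hour=0, minute=0):
    """Next occurrence of weekday/time strictly after `now` (Monday is 0)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def validate_asset(name):
    """Placeholder for a real validation job; it only reports the asset name."""
    return f"{name}: validated"


def run_validations(assets, parallelization=2):
    """Run validation jobs with at most `parallelization` (1 to 5) at once."""
    if not 1 <= parallelization <= 5:
        raise ValueError("parallelization must be between 1 and 5")
    with ThreadPoolExecutor(max_workers=parallelization) as pool:
        return list(pool.map(validate_asset, assets))


now = datetime(2024, 7, 3, 9, 30, tzinfo=IST)  # a Wednesday morning
print(next_weekly_run(now))  # the following Monday at 12:00 AM
print(run_validations(["currency_rates", "country_codes", "compliance_list"], 2))
```

A higher `parallelization` value lets more jobs overlap and finish sooner at the cost of more concurrent load, which mirrors the trade-off listed in the configuration steps.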
5. Audit
The Audit tab provides a detailed record of system and user activities in ADOC. It supports accountability, compliance, and troubleshooting by capturing execution history across crawlers, policies, and other system processes.
- Tracks action type, status, method, and timestamp.
- Captures system-initiated calls as well as user actions.
- Allows filtering by Action, Type, User, Action Status, Method, or Entity ID.
- Logs include success or failure outcomes to support debugging.
- Export option available for offline review and audits.
How to View
- Navigate to the Audit tab.
- Use the filter dropdowns (Action, Type, User, Action Status, Method) to narrow results.
- Search by Entity ID for specific executions.
- Adjust the date range (for example, last 7 days).
- Optionally, download logs for audit or compliance records.
Example Use Cases
- Compliance reviews: Verify that policy executions and crawls were performed successfully.
- Security monitoring: Track which users or system accounts performed key actions.
- Debugging: Identify failed or delayed executions by checking action status and method.
When reviewing audit logs, watch for:
- Unauthorized changes or unusual system calls.
- Repeated failures in policy execution or data crawls.
- Activities outside normal business hours.
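If you download audit logs for offline review, checks like these can be scripted. The sketch below assumes the export has already been loaded into a list of dictionaries; the field names (`action`, `user`, `status`, `timestamp`), the status value `FAILED`, and the sample records are all hypothetical and may not match the actual export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; real exports may use different fields and values.
records = [
    {"action": "POLICY_EXECUTION", "user": "svc-scheduler", "status": "FAILED",
     "timestamp": "2024-07-01T02:15:00"},
    {"action": "POLICY_EXECUTION", "user": "svc-scheduler", "status": "FAILED",
     "timestamp": "2024-07-01T03:15:00"},
    {"action": "CRAWL", "user": "alice", "status": "SUCCESS",
     "timestamp": "2024-07-01T10:05:00"},
    {"action": "SETTINGS_UPDATE", "user": "bob", "status": "SUCCESS",
     "timestamp": "2024-07-01T23:40:00"},
]


def repeated_failures(records, threshold=2):
    """Count failures per (action, user) and keep those at or above threshold."""
    counts = Counter(
        (r["action"], r["user"]) for r in records if r["status"] == "FAILED"
    )
    return {key: n for key, n in counts.items() if n >= threshold}


def outside_business_hours(records, start_hour=9, end_hour=18):
    """Return records whose timestamp hour falls outside [start_hour, end_hour)."""
    return [
        r for r in records
        if not start_hour <= datetime.fromisoformat(r["timestamp"]).hour < end_hour
    ]


print(repeated_failures(records))
print([r["user"] for r in outside_business_hours(records)])
```

The business-hours window and the failure threshold are arbitrary defaults here; adjust them to match your own operating hours and alerting tolerance.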