What Is Global Storage?
Acceldata Data Plane processes a huge amount of data, from profiling and quality checks to advanced analytics and monitoring. All of that data needs to go somewhere, whether it's logs, results, temporary files, or models.
That “somewhere” is what we call Global Storage, a central storage location used by Data Plane services to read and write data.
Depending on your environment, this storage could be:
- Google Cloud Storage (GCS) – If you're running in GCP
- Amazon S3 – If you're on AWS
- Azure Data Lake (ADLS) – If you're on Microsoft Azure
- HDFS or MAPRFS – For on-prem or Hadoop-based setups
- Local disk – For quick tests or minimal deployments (not recommended for production)
Where Is This Configured?
All global storage settings live in a JSON configuration file at:
`/opt/acceldata/globalstorage.json`

This file tells Data Plane:
- What type of storage you’re using (gcs, s3, adls, etc.)
- Where to find the storage (bucket name, project ID, etc.)
- How to securely connect to it (credentials, roles, or keys)
Sample Configuration (GCS Example)
Here’s what this JSON file might look like if you're using Google Cloud Storage:
{ "MEASURE_RESULT_FS_TYPE": "gcs", "MEASURE_RESULT_FS_GCS_BUCKET": "your-bucket-name", "MEASURE_RESULT_FS_GCS_PROJECT_ID": "your-gcp-project-id", "MEASURE_RESULT_FS_GCS_MODE": "SERVICE_ACCOUNT_INLINE", "MEASURE_RESULT_FS_GCS_CLIENT_EMAIL": "example@project.iam.gserviceaccount.com", "MEASURE_RESULT_FS_GCS_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\nMIIEv....\n-----END PRIVATE KEY-----", "MEASURE_RESULT_FS_GCS_PRIVATE_KEY_ID": "1234567890abcdef"}Specifying Custom Folder Paths for Good/Bad Records
When a Data Quality (DQ) policy executes, ADOC stores the good and bad records generated during validation in the configured global storage bucket on the data plane.
- Good records are rows that pass all DQ rules.
- Bad records are rows that violate one or more rules.
Previously, users could only specify the bucket name where these results were saved, using environment variables in the Data Plane configuration. All policy execution results were written to the root of that bucket, which could quickly become cluttered as each policy and execution date created a new folder.
New Environment Variable: MEASURE_RESULT_FS_SAVE_PATH
From ADOC v4.7.0, a new environment variable has been introduced to help you better organize these records.
| Variable Name | Description | Example |
|---|---|---|
| `MEASURE_RESULT_FS_SAVE_PATH` | Specifies a subpath (folder) inside the configured bucket where all good/bad records will be stored. | `user/` or `dq-results/finance/` |
Example Configuration:
```
MEASURE_RESULT_FS_BUCKET=educ-global-storage-queue
MEASURE_RESULT_FS_SAVE_PATH=user/
```
In this example, all Data Quality policy execution results are stored in:
```
s3://educ-global-storage-queue/user/
```
ADOC automatically organizes results by policy name and execution date:
```
user/
├── dq_policy_01/
│   ├── 2024-08-07/
│   │   ├── good/
│   │   └── bad/
│   └── 2024-08-10/
│       ├── good/
│       └── bad/
```
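If you want to confirm this layout outside of ADOC, you can browse the bucket with your cloud provider's CLI. The command below is a minimal sketch using the AWS CLI against the example bucket above; it assumes the CLI is installed and configured with credentials that can read the bucket.

```bash
# List everything written under the configured save path (example bucket and path from above).
aws s3 ls s3://educ-global-storage-queue/user/ --recursive
```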
How to Provide Credentials (If Using Cloud Storage)
For cloud-based storage (like GCS or S3), the system needs credentials to authenticate and access the storage bucket.
Rather than putting raw credentials in the file or environment, Data Plane reads them from Kubernetes Secrets for security.
Here’s how that works:
Step 1: Create or update /opt/acceldata/globalstorage.json
Define your storage type (e.g., gcs, s3) and connection settings.
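Before moving on, it can help to confirm that the file parses as valid JSON. This is an optional check rather than an ADOC requirement; the sketch below assumes `python3` is available on the host, but any JSON validator works.

```bash
# Exits non-zero and prints an error if the file is not valid JSON.
python3 -m json.tool /opt/acceldata/globalstorage.json > /dev/null && echo "globalstorage.json is valid JSON"
```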
Step 2: Base64-encode the file
Kubernetes requires secrets to be stored in base64-encoded form.
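For example, on a Linux host with GNU coreutils you can encode the file as shown below; `-w 0` keeps the output on a single line (on macOS, use `base64 -i /opt/acceldata/globalstorage.json` instead).

```bash
# Print a single-line base64 string that you can paste into the Secret in Step 3.
base64 -w 0 /opt/acceldata/globalstorage.json
```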
Step 3: Inject the config into the cluster
Run:
```bash
kubectl edit secret global-storage -n <your-namespace>
```
In the `data:` section, add:
```yaml
globalstorage.json: <your-base64-encoded-content>
```
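If you prefer a non-interactive approach over `kubectl edit`, a merge patch achieves the same result. This is a generic kubectl sketch rather than an ADOC-specific command; it only modifies the `globalstorage.json` key and leaves any other keys in the secret untouched.

```bash
# Replace <your-base64-encoded-content> with the string produced in Step 2.
kubectl patch secret global-storage -n <your-namespace> --type=merge \
  -p '{"data":{"globalstorage.json":"<your-base64-encoded-content>"}}'
```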
Step 4 (GCP Only): Provide GCP credentials
If using GCS, you'll also need to base64-encode your gcp_cred.json file (your service account credentials) and add it to the gcp-cred Kubernetes Secret:
```bash
kubectl edit secret gcp-cred -n <your-namespace>
```
Inside the `data:` section, add or update:
```yaml
gcp_cred.json: <base64-encoded-credentials>
```
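As an optional sanity check, you can view the secret and confirm that the `gcp_cred.json` key is present; the value shown will be the base64-encoded string. This is a standard kubectl command, not an ADOC-specific step.

```bash
# Inspect the secret; the data: section should contain a gcp_cred.json entry.
kubectl get secret gcp-cred -n <your-namespace> -o yaml
```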
Deploying the Configuration
After updating the secrets with the base64-encoded configuration, restart the Data Plane services by running the following command:
```bash
kubectl rollout restart deploy -n <your-namespace>
```
Once completed, navigate to the Data Plane's Application Configuration page in the UI to verify that the Global Storage is properly set up.
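Before checking the UI, you can optionally confirm from the command line that the pods have come back up. This is a generic kubectl check under the assumption that all Data Plane workloads run in the same namespace.

```bash
# After the restart, all pods should return to Running with fresh AGE values.
kubectl get pods -n <your-namespace>
```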