S3 Data Store

What is S3 Data Store?

The S3 Data Store capability allows you to register and manage connections to Amazon S3 buckets and S3-compatible object storage systems directly within the Acceldata xDP platform. By creating a centralized catalog of your S3 data sources, you enable platform-wide data observability, governance, and accessibility for your data processing workloads. This simplifies data pipeline development and provides a single pane of glass for managing critical data connections.

Key Concepts

Data Store: A registered connection profile in xDP that points to an external data system like S3. It is the foundational object for accessing external data, containing the necessary location, configuration, and credential details.
Metadata Management: By registering an S3 bucket, xDP can catalog its technical metadata (e.g., object names, sizes, modification dates). This catalog is essential for data discovery, lineage tracking, and enforcing data governance policies.
Data Access Control: The configuration uses standard, secure AWS authentication methods (such as IAM Roles or Access Keys) to govern how xDP services interact with your S3 data, ensuring that access is secure and auditable.

Capabilities

Centralized Connection Management: Register, configure, and manage all your S3 bucket connections from a single, unified interface.
Secure Authentication: Connect to S3 using standard AWS security principals, including IAM Roles for Service Accounts, EC2 Instance Profiles, or long-lived Access Keys.
Data Source for Processing: Use registered S3 data stores as sources or sinks for data pipelines, Spark jobs, and other applications running on the xDP platform.
S3-Compatible Storage Support: Extend connectivity beyond AWS to other object stores that provide an S3-compatible API, such as MinIO, Ceph, or Google Cloud Storage.

Tutorial (Getting Started)

This tutorial guides you through creating your first S3 Data Store connection in xDP.

Prerequisites

You have a user role with permissions to create and manage Data Stores in xDP.
You have an active AWS S3 bucket that you want to connect to.
You have valid AWS credentials (e.g., an Access Key/Secret Key pair) with permissions to access the specified S3 bucket.
An xDP Compute Cluster is running and available.

Your First Workflow: Create an S3 Data Store

From the left navigation menu, go to Platform > Data Store. The page displays all existing data stores for your selected cluster.
Click Create Data Store.
On the Select Data Store Type screen, choose S3 and click Next.

On the S3 Connection Details page, enter the basic information for your data store.

Example: To connect to a bucket named my-prod-data in the us-west-2 region, you would enter those values here.

In the Authentication Type dropdown, select AWS Access Key / Secret Key.
Provide your AWS Access Key ID and Secret Access Key in the corresponding fields.

Warning

Treat your AWS credentials as sensitive information. Use credentials with the minimum required permissions for the target bucket.

Click Next to finalize the creation.
You are redirected to the Data Stores page, where a success notification appears and your new S3 data store is listed.

How-to Guides

Connect to an S3-Compatible Object Store

Use these steps to connect to a self-hosted or third-party object store with an S3-compatible API.

Navigate to Data Store and click + Create Data Store.
Select the S3 data store type.
Check the box for Use custom S3-compatible endpoint.
An Endpoint field appears. Enter the full URL for your object storage service (e.g., https://minio.example.com).
Fill in the Bucket Name and authentication details as required by your S3-compatible provider. > Note: The Region field is typically not required for custom endpoints but may be needed depending on the provider's implementation.
Click Next.
Verify that the connection is established and the data store is created successfully.

Best Practices

Use Consistent Naming Conventions: Adopt a clear naming scheme for your data stores, such as <project>-<environment>-<bucket-purpose>, to make them easily identifiable as your platform grows.
Specify the Correct Region: Always define the correct AWS region for your S3 bucket to prevent cross-region latency issues and unexpected data transfer costs.
Apply the Principle of Least Privilege: Create a dedicated IAM policy for xDP that grants only the necessary permissions (e.g., s3:GetObject, s3:PutObject, s3:ListBucket) for the specific S3 bucket. Avoid using credentials with administrative privileges.

Last updated on

Was this page helpful?