Amazon | Athena

Amazon Athena is a serverless, interactive query service from Amazon Web Services (AWS) that lets you analyze data in Amazon S3 using standard SQL.

Take a look at this video which explains the process of adding AWS Athena as a data source.

Athena in ADOC

ADOC provides data reliability capabilities for data stored in your Athena data source. To add Athena as a Data Source in ADOC, you must create a Data Plane or use an existing one. Once Athena is added, you can view the details of your Athena usage in the Data Reliability tab in ADOC.

Steps to add Athena as a Data Source

To add Athena as a Data Source:

  1. Click Register from the left pane.
  2. Click Add Data Source.
  3. Select the AWS Athena Data Source. The Athena Data Source Basic Details page is displayed.
AWS Athena Data Source


  4. Enter a name for the data source in the Data Source name field.
  5. (Optional) Enter a description for the Data Source in the Description field.
  6. Enable the Data Reliability capability by switching on the toggle switch.
  7. Select a Data Plane from the Select Data Plane drop-down menu.

To create a new Data Plane, click Setup Data Plane.

You must either create a Data Plane or use an existing Data Plane to enable the Data Reliability capability.

  8. Click Next. The AWS Athena Connection Details page is displayed.
AWS Athena Connection Details


AWS Athena Authentication Type


  9. Choose the authentication method:
  • IAM Instance Profile: Uses the IAM role associated with your EC2 instance or Kubernetes service account. In this scenario, leave the Access Key and Secret Key fields empty.
  • Access Key or Secret Key: Enter your AWS access key and secret key in the AWS Access Key and AWS Secret Key fields. For details on how to view your AWS access key and secret key, refer to this AWS document; for details on how to view your AWS region, refer to this AWS document.
  • IAM Roles for Service Accounts (IRSA): Used in Kubernetes environments to securely manage permissions without hard-coded credentials. IRSA allows IAM roles to be associated with Kubernetes service accounts, ensuring least-privilege access.

EKS Pod Identity builds on this by offering native integration with AWS IAM, providing a more seamless authentication experience for workloads running in EKS clusters. However, it is currently not available for Athena.
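
As a rough illustration of how these methods differ on the client side, the sketch below builds the keyword arguments for a boto3 Athena client. It is not ADOC's implementation: the function names `build_client_kwargs` and `make_athena_client` and the `auth_type` labels are hypothetical, chosen only for this example. With IAM Instance Profile or IRSA, no keys are passed and boto3 falls back to its default credential chain; with an access key pair, the keys are passed explicitly.

```python
from typing import Optional

# Hypothetical labels for the keyless authentication methods listed above.
KEYLESS_METHODS = {"instance_profile", "irsa"}


def build_client_kwargs(auth_type: str, region: str,
                        access_key: Optional[str] = None,
                        secret_key: Optional[str] = None) -> dict:
    """Return keyword arguments suitable for boto3.client("athena", ...)."""
    kwargs = {"region_name": region}
    if auth_type in KEYLESS_METHODS:
        # The role comes from the instance or service account, so keys must be empty.
        if access_key or secret_key:
            raise ValueError(f"leave the access key and secret key empty for {auth_type}")
        return kwargs
    if auth_type == "access_key":
        if not (access_key and secret_key):
            raise ValueError("both the access key and the secret key are required")
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
        return kwargs
    raise ValueError(f"unknown auth_type: {auth_type!r}")


def make_athena_client(**kwargs):
    """Create the client; requires the third-party boto3 package."""
    import boto3  # imported lazily so build_client_kwargs works without boto3
    return boto3.client("athena", **kwargs)
```

For example, `build_client_kwargs("irsa", "us-east-1")` returns only the region, which mirrors leaving the key fields empty in the form.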

  10. (Optional) Toggle the Use Secret Manager option if you want to use AWS Secrets Manager to manage your credentials.
  11. Enter the AWS region you use for Athena in the AWS Region field.
  12. Enter the S3 bucket path where Athena query results are stored in the S3 Location field. You must enter the full path starting from s3://.
  13. Select the Dataplane Engine, either Spark or Pushdown Data Engine, for profiling and data quality.
  14. Click Test Connection.
AWS Athena Dataplane Engine


If your credentials are valid, you receive a Connected message. If you get an error message, validate the AWS credentials you entered.
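
Before testing the connection, it can help to check that the S3 Location value is a full path starting from s3://. The following is a minimal sketch using only the Python standard library; the function name `parse_s3_location` is hypothetical and is not part of ADOC.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_location(location: str) -> Tuple[str, str]:
    """Split an S3 location into (bucket, key_prefix), or raise ValueError."""
    parsed = urlparse(location.strip())
    if parsed.scheme != "s3":
        raise ValueError("S3 location must start with s3://")
    if not parsed.netloc:
        raise ValueError("S3 location is missing a bucket name")
    # The path keeps a leading slash after the bucket; drop it for the prefix.
    return parsed.netloc, parsed.path.lstrip("/")
```

A value such as `my-bucket/results` (no scheme) is rejected, while `s3://my-bucket/athena/results/` is split into the bucket `my-bucket` and the prefix `athena/results/`.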

AWS Athena Data Source Connected


  15. Click Next. The Set Up Observability page is displayed.
  16. Select the databases to be included from the Database Names field. The assets of the selected databases are monitored by the Data Reliability capability of ADOC.
  17. Enable Crawler Execution Schedule: Turn on this toggle switch to select a time tag and time zone to schedule the execution of crawlers for Data Reliability.
  18. Click Submit.
AWS Athena Setup Observability


AWS Athena is now added to ADOC as a Data Source. You can choose to crawl your Athena account now or later.

Required IAM Policy for AWS Athena

The IAM user or role used to connect the Athena Data Source requires the following AWS permissions.

Note: Replace the placeholder <s3-result-storage-bucket> with the name of the S3 bucket that you specified for storing query results when you created the Athena Data Source.
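
The placeholder substitution can be scripted. The sketch below assumes you have the policy document available as text; `fill_policy` and `EXAMPLE_TEMPLATE` are hypothetical names, and the template is a single illustrative statement, not the full set of permissions ADOC requires.

```python
import json

PLACEHOLDER = "<s3-result-storage-bucket>"


def fill_policy(template: str, bucket: str) -> dict:
    """Replace the bucket placeholder and return the parsed policy document."""
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain " + PLACEHOLDER)
    # json.loads also confirms the result is still valid JSON.
    return json.loads(template.replace(PLACEHOLDER, bucket))


# Illustrative one-statement fragment only, NOT the complete required policy.
EXAMPLE_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::<s3-result-storage-bucket>/*"
    }
  ]
}
"""

# "my-athena-results" is an assumed bucket name for the example.
policy = fill_policy(EXAMPLE_TEMPLATE, "my-athena-results")
print(json.dumps(policy, indent=2))
```

Parsing the result with `json.loads` catches a malformed document before you attach the policy to an IAM user or role.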


If you need help configuring IRSA for Athena, please contact our support team at www.acceldata.force.com or call our service desk at +1 844 9433282.
