Amazon Athena

Amazon Athena is a serverless AWS query service that lets you run SQL queries directly on data stored in Amazon S3, with no infrastructure to manage. Connecting Athena to Acceldata’s Observability Cloud (ADOC) gives you insight into the health and usage of your Athena data. Once connected, you can monitor usage on the Data Reliability dashboard in ADOC.

Prerequisites

Ensure the following requirements are met before you connect Athena as a data source:

  • An active AWS account with access to Athena and S3.

  • An S3 bucket to store query results.

  • Access to AWS Glue, which Athena uses as its data catalog.

  • Proper IAM permissions for Athena, S3, and Glue. These include:

    • Running Athena queries
    • Reading and writing to the S3 results bucket
    • Listing and reading Glue databases and tables
  • For integration with ADOC, a Data Plane that is ready to use (either existing or newly created).
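
The AWS-side prerequisites above can be checked from a script before you start the setup in ADOC. The following is a minimal sketch, not an ADOC feature: it assumes the boto3 SDK and receives already-constructed S3, Glue, and Athena clients. The function name check_prerequisites and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
from typing import Any, Dict


def check_prerequisites(s3: Any, glue: Any, athena: Any, results_bucket: str) -> Dict[str, bool]:
    """Return a pass/fail flag per prerequisite (hypothetical helper, not an ADOC API).

    The clients are expected to behave like boto3 clients for "s3", "glue",
    and "athena". Any exception (missing permission, missing bucket, network
    error) is reported as a failed check.
    """
    checks = {
        # The results bucket exists and is reachable with these credentials.
        "s3_results_bucket": lambda: s3.head_bucket(Bucket=results_bucket),
        # Glue databases can be listed (Athena uses Glue as its data catalog).
        "glue_catalog": lambda: glue.get_databases(),
        # The Athena API is reachable with these credentials.
        "athena": lambda: athena.list_work_groups(),
    }
    results: Dict[str, bool] = {}
    for name, call in checks.items():
        try:
            call()
            results[name] = True
        except Exception:
            results[name] = False
    return results
```

With boto3 installed and credentials configured, a call could look like check_prerequisites(boto3.client("s3"), boto3.client("glue"), boto3.client("athena"), "my-results-bucket"), where my-results-bucket is a placeholder. The sketch only exercises read and list calls; it does not confirm write access to the results bucket or the ability to run a query.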

Required IAM Policy for AWS Athena

Note: Replace the placeholder <s3-result-storage-bucket> with the name of the S3 bucket that you specify for storing query results when you create the Athena data source.


This IAM policy does the following:

  1. Athena Permissions – Allows the user to run queries in Athena, manage prepared statements, and list/query Athena resources.
  2. S3 Permissions – Grants access to the S3 bucket you specify (<s3-result-storage-bucket>) so Athena can read from it and write query results to it.
  3. Glue Permissions – Provides read-only access to AWS Glue’s catalog, databases, and tables so Athena can look up table and schema information.
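
As an illustration of the three permission groups listed above, the sketch below builds a policy document with one statement per group and prints it as JSON. This is an assumption-based example, not the policy published by Acceldata: the specific action lists, the Sid values, the wildcard resources, and the function name build_athena_policy are choices made here for illustration. Compare it against the official policy and narrow the resources before using it.

```python
import json


def build_athena_policy(results_bucket: str) -> dict:
    """Build an illustrative IAM policy document (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{results_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Run queries, manage prepared statements, list Athena resources.
                "Sid": "AthenaAccess",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:StopQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:CreatePreparedStatement",
                    "athena:GetPreparedStatement",
                    "athena:ListPreparedStatements",
                    "athena:ListDataCatalogs",
                    "athena:ListDatabases",
                    "athena:ListTableMetadata",
                    "athena:GetTableMetadata",
                    "athena:GetWorkGroup",
                ],
                # Assumption: left broad here; scope to workgroups in practice.
                "Resource": "*",
            },
            {
                # Read from the results bucket and write query results to it.
                "Sid": "S3ResultsBucketAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                ],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                # Read-only access to the Glue catalog, databases, and tables.
                "Sid": "GlueReadOnly",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                # Assumption: left broad here; scope to catalog ARNs in practice.
                "Resource": "*",
            },
        ],
    }


# The placeholder is kept as-is; replace it with your results bucket name.
print(json.dumps(build_athena_policy("<s3-result-storage-bucket>"), indent=2))
```

Generating the document from a function keeps the bucket name in one place, so the bucket ARN and the object ARN (the bucket ARN followed by /*) cannot drift apart.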

Add Amazon Athena as a Data Source

Step 1: Start Setup

  1. Select Register from the left main menu.
  2. Select Add Data Source > AWS Athena from the list of data sources.
  3. On the Data Source Details page:
    • Name the data source.
    • Add a short description (optional).
    • Pick a Data Plane from the dropdown. If you don’t have one yet, select Set Up Data Plane to create one.
  4. Click Next.

Step 2: Add Connection Details

  1. On the Connection Details page, choose your AWS S3 authentication method:
    • IAM Instance Profile: uses the IAM role attached to your EC2 instance or Kubernetes service account (no keys are needed).
    • Access Key / Secret Key: enter your AWS credentials manually.
    • IAM Roles for Service Accounts (IRSA): provides secure access without stored credentials in Kubernetes environments. Note: access through EKS Pod Identity is not yet available for Athena.
  2. (Optional) Turn on Use Secret Manager to store credentials securely in AWS Secrets Manager.
  3. Enter your AWS Region.
  4. Enter the full S3 bucket path (starting with s3://) where Athena stores query results.
  5. Select your Data Plane Engine, either Spark or Pushdown Data Engine, for profiling and quality checks.
  6. Click Test Connection. If the credentials are valid, a Connected message appears. If not, double-check your credentials and retry.
  7. Click Next when ready.
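
A failed connection test is often caused by a malformed Region or results path rather than by bad credentials. The sketch below shows one way to check both values before entering them. It is an illustration only: the function name validate_connection_inputs and the Region pattern are assumptions, and the pattern checks the usual shape of a Region code (for example us-east-1), not whether the Region exists.

```python
import re
from typing import List

# Usual shape of an AWS Region code, e.g. "us-east-1" or "us-gov-west-1".
# Assumption: a format check only; it does not confirm the Region exists.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def validate_connection_inputs(region: str, results_path: str) -> List[str]:
    """Return a list of problems; an empty list means both values look valid."""
    errors: List[str] = []
    if not _REGION_RE.match(region):
        errors.append(f"Region {region!r} does not look like an AWS Region code")
    if not results_path.startswith("s3://"):
        errors.append("Results path must start with s3://")
    else:
        # Everything between "s3://" and the next "/" is the bucket name.
        bucket = results_path[len("s3://"):].split("/", 1)[0]
        if not bucket:
            errors.append("Results path must include a bucket name")
    return errors


# Placeholder values for illustration.
print(validate_connection_inputs("us-east-1", "s3://my-bucket/athena-results/"))
print(validate_connection_inputs("useast1", "my-bucket/athena-results/"))
```

The first call returns an empty list; the second reports one problem for the Region and one for the missing s3:// prefix.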

Step 3: Set Up Observability

  1. On the Set Up Observability page, select the databases you want ADOC to monitor.

Optional Configurations

  • Enable Schema Drift Monitoring to detect changes in file schemas (e.g., added, removed, or renamed columns) over time.
    Note: Schema drift detection requires a scheduled crawler.
  • Enable Job Concurrency and set the maximum number of parallel jobs using Maximum Slots. For more information, see Control Plane Concurrent Connections and Queueing Mechanism.
  • Turn on the Crawler Execution Schedule toggle to set when background jobs crawl Athena, scan files, and collect metadata for observability:
    1. Select how often the crawler runs (e.g., daily).
    2. Set the execution time and time zone.
    3. Add multiple execution times if needed.
    4. Set Notifications:
        • Notify on Crawler Failure: Select one or more channels for failure alerts.
        • Notify on Success: Turn the toggle on or off to control success notifications.

When you finish, click Submit.

Now ADOC is set up to monitor your Athena data source, and you can run the crawler immediately or let it run based on the crawler schedule.
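
To make the Schema Drift Monitoring option more concrete, the sketch below compares two snapshots of a table schema, each a mapping of column name to type, as a crawler might collect them on consecutive runs. This is not ADOC's implementation: the function name diff_schemas, the snapshot format, and the type_changed category are assumptions made for illustration. A renamed column appears here as one removed and one added column, since a name-based comparison cannot tell the two cases apart.

```python
from typing import Dict, List


def diff_schemas(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column: type} snapshots and report the differences."""
    prev_cols, curr_cols = set(previous), set(current)
    return {
        # Columns present only in the newer snapshot.
        "added": sorted(curr_cols - prev_cols),
        # Columns present only in the older snapshot.
        "removed": sorted(prev_cols - curr_cols),
        # Columns in both snapshots whose declared type differs.
        "type_changed": sorted(
            col for col in prev_cols & curr_cols if previous[col] != current[col]
        ),
    }


# Hypothetical snapshots from two crawler runs.
previous = {"id": "bigint", "name": "string", "amount": "double"}
current = {
    "id": "bigint",
    "full_name": "string",
    "amount": "decimal(10,2)",
    "created_at": "timestamp",
}
drift = diff_schemas(previous, current)
print(drift)
```

In this example, name was renamed to full_name, so the result lists full_name and created_at as added, name as removed, and amount as type-changed. Comparisons like this need at least two snapshots taken at different times, which is why drift detection depends on a scheduled crawler.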

What’s Next

After you connect Athena as a data source, you can start using Acceldata’s reliability features to better understand and manage your data:

  • Discover assets: Navigate to Reliability > Discover Assets to locate and explore your newly added Athena data source.
  • Profile your data source: Run profiling to get a summary of data structure, distribution, and quality.
  • Apply reliability policies: Set up rules and checks to monitor data quality, freshness, and consistency over time.
  • Track insights: Use the Data Reliability dashboards to see trends, alerts, and overall health of your Athena data.