Snowflake

Snowflake is a cloud data platform that offers various cloud services. You can host data-intensive applications on Snowflake without facing operational complexities. This document explains how to add Snowflake as a Data Source in ADOC. Once you add Snowflake as a Data Source, ADOC can monitor your Snowflake account, display critical information, and raise alerts as configured.

Performance and Cost Optimization on Snowflake

This section of the guide provides step-by-step instructions to set up Snowflake for performance monitoring and cost optimization using ADOC. Follow the steps to ensure proper configuration and access, enabling ADOC to effectively monitor your Snowflake account, display critical information, and raise alerts as configured.

Prerequisites

  • Ensure that you have the ACCOUNTADMIN role in Snowflake to execute the necessary commands.
  • Define variables at the start of the script, as you will use them throughout the configuration process.
  • Following these instructions is crucial to fully utilize ADOC's capabilities in monitoring and optimizing your Snowflake account.

Step 1: Define Variables

Begin by setting up essential variables for the roles, warehouse, users, and databases that will be used throughout the script. These variables serve as identifiers in subsequent steps.

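A minimal sketch of the variable definitions. The names acceldata_role and AD_MONITOR_DB appear elsewhere in this guide; the warehouse and user names are illustrative placeholders, so substitute values that fit your environment:

```sql
-- Session variables reused throughout the script (names are examples).
SET acceldata_role      = 'ACCELDATA_ROLE';
SET acceldata_warehouse = 'ACCELDATA_WAREHOUSE';  -- placeholder name
SET acceldata_user      = 'ACCELDATA_USER';       -- placeholder name
SET acceldata_database  = 'AD_MONITOR_DB';
SET acceldata_password  = '<password>';
```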

Note Replace <password> with a secure password of your choice.

Step 2: Create Role and Warehouse

Initialize the role and warehouse defined in Step 1.

  1. Switch to ACCOUNTADMIN Role: Use the ACCOUNTADMIN role to create warehouses, grant permissions and create other roles.
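For example:

```sql
USE ROLE ACCOUNTADMIN;
```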
  2. Create Warehouse: Create a warehouse of size X-Small for running lightweight queries to fetch metadata.
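A sketch using the Step 1 variables; the auto-resume and initially-suspended options are typical choices for a lightweight metadata warehouse, not confirmed settings:

```sql
CREATE WAREHOUSE IF NOT EXISTS IDENTIFIER($acceldata_warehouse)
  WAREHOUSE_SIZE      = 'XSMALL'
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;
```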
  3. Configure Auto-suspend: Update the auto-suspend time to ensure minimal credit usage.
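For example, assuming a 60-second auto-suspend window (choose a value that suits your credit budget):

```sql
ALTER WAREHOUSE IDENTIFIER($acceldata_warehouse) SET AUTO_SUSPEND = 60;
```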
  4. Create Role: Create a role to be assigned to the user for fetching metadata.
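For example:

```sql
CREATE ROLE IF NOT EXISTS IDENTIFIER($acceldata_role);
```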

Step 3: Grant Necessary Privileges

Assign the required privileges for monitoring and accessing Snowflake resources.

  1. Grant Imported Privileges: Allow the role to access shared tables or views imported from the SNOWFLAKE database. This is crucial to enable resource monitoring and metadata access.
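A sketch using the Step 1 variables:

```sql
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE IDENTIFIER($acceldata_role);
```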

This grant allows the acceldata_role role to access the privileges imported from the SNOWFLAKE database.

Example: If the SNOWFLAKE database contains shared tables or views, the acceldata_role can now access them.

  2. Grant MONITOR Privilege on Account: Enable the role to monitor account-level metrics for resource usage and performance analysis.
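A sketch of the account-level grants; Snowflake exposes account monitoring through the MONITOR USAGE and MONITOR EXECUTION privileges, and which of the two your setup needs is an assumption to verify:

```sql
GRANT MONITOR USAGE ON ACCOUNT TO ROLE IDENTIFIER($acceldata_role);
-- Optionally, for query/warehouse execution visibility:
GRANT MONITOR EXECUTION ON ACCOUNT TO ROLE IDENTIFIER($acceldata_role);
```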

The MONITOR privilege is essential for:

  1. Monitoring Resource Usage: A team or user needs access to account-level metrics to ensure efficient use of Snowflake resources (e.g., credits, storage, query performance).
  2. Performance Analysis: A team might want to analyze query and warehouse performance to optimize workloads.
  3. Grant USAGE on Warehouse: Allow the role to use the warehouse for query execution and operations.
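For example:

```sql
GRANT USAGE ON WAREHOUSE IDENTIFIER($acceldata_warehouse) TO ROLE IDENTIFIER($acceldata_role);
```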

Shared tables or views in the SNOWFLAKE database may contain essential metadata or configurations that the role needs for monitoring purposes.

Step 4: Create User

Create a user that will execute the monitoring tasks.

  1. Create User:
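A sketch with the defaults set up front, shown with literal names matching the Step 1 values:

```sql
CREATE USER IF NOT EXISTS ACCELDATA_USER
  PASSWORD             = '<password>'
  DEFAULT_ROLE         = ACCELDATA_ROLE
  DEFAULT_WAREHOUSE    = ACCELDATA_WAREHOUSE
  DEFAULT_NAMESPACE    = AD_MONITOR_DB
  MUST_CHANGE_PASSWORD = FALSE;
```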

Note Setting default values ensures the user can pull metadata without additional configurations. If not set, the user might encounter issues fetching data.

  2. Existing Users: If you have an existing user you wish to reuse, ensure all the default roles and settings are assigned.
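For an existing user, a sketch of setting the same defaults (literal names shown; match them to your Step 1 values):

```sql
ALTER USER ACCELDATA_USER SET
  DEFAULT_ROLE      = ACCELDATA_ROLE
  DEFAULT_WAREHOUSE = ACCELDATA_WAREHOUSE
  DEFAULT_NAMESPACE = AD_MONITOR_DB;
```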
  3. Grant Role to User: Assign the created role to the user.
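For example:

```sql
GRANT ROLE IDENTIFIER($acceldata_role) TO USER IDENTIFIER($acceldata_user);
```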

Note Granting the role to the user is essential for enabling permissions required to make JDBC connections and fetch metadata from Snowflake.

Step 5: Database and Schema Permissions

Ensure the necessary database and schema permissions are assigned to the role.

  1. Create Database (if none exists): Create the monitoring database to store grants and permissions for Snowflake warehouses and resource monitors.
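A sketch, assuming the AD_MONITOR_DB name set in Step 1 (this matches the default database name used later in this guide):

```sql
CREATE DATABASE IF NOT EXISTS IDENTIFIER($acceldata_database);
```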

Note This database will be used to run the stored procedures for managing grants and permissions.

  2. Grant Usage on Database: Allow the role to access the newly created database.
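For example:

```sql
GRANT USAGE ON DATABASE IDENTIFIER($acceldata_database) TO ROLE IDENTIFIER($acceldata_role);
```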
  3. Use the Database: Change the active database to the monitoring database for subsequent operations.
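For example:

```sql
USE DATABASE IDENTIFIER($acceldata_database);
```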
  4. Grant Schema Privileges: Assign the necessary permissions on the PUBLIC schema of the database.
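A sketch of the schema grants; the exact privilege list is an assumption based on the stage and file-format privileges checked later in the Observability Set Up section:

```sql
GRANT USAGE, MONITOR, CREATE STAGE, CREATE FILE FORMAT
  ON SCHEMA PUBLIC TO ROLE IDENTIFIER($acceldata_role);
```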

Note These permissions allow the role to monitor, use, and manage objects in the schema.

Step 6: Execute Stored Procedure

Grant monitor privileges on all warehouses and resource monitors.

  1. Create Stored Procedure:
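A sketch of such a procedure in JavaScript: it loops over all warehouses and resource monitors and grants MONITOR on each to the given role. The procedure name and signature are illustrative:

```sql
CREATE OR REPLACE PROCEDURE PUBLIC.GRANT_MONITOR_PRIVILEGES(ROLE_NAME VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER  -- SHOW commands require caller's rights
AS
$$
  var granted = 0;

  // Grant MONITOR on every warehouse in the account.
  var whs = snowflake.execute({sqlText: "SHOW WAREHOUSES"});
  while (whs.next()) {
    var wh = whs.getColumnValue("name");
    snowflake.execute({
      sqlText: "GRANT MONITOR ON WAREHOUSE \"" + wh + "\" TO ROLE " + ROLE_NAME
    });
    granted++;
  }

  // Grant MONITOR on every resource monitor in the account.
  var rms = snowflake.execute({sqlText: "SHOW RESOURCE MONITORS"});
  while (rms.next()) {
    var rm = rms.getColumnValue("name");
    snowflake.execute({
      sqlText: "GRANT MONITOR ON RESOURCE MONITOR \"" + rm + "\" TO ROLE " + ROLE_NAME
    });
    granted++;
  }

  return "Applied " + granted + " MONITOR grants.";
$$;
```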
  2. Grant Usage on Procedure:
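Assuming the illustrative name and signature above:

```sql
GRANT USAGE ON PROCEDURE PUBLIC.GRANT_MONITOR_PRIVILEGES(VARCHAR) TO ROLE IDENTIFIER($acceldata_role);
```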
  3. Call Stored Procedure:
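For example:

```sql
CALL PUBLIC.GRANT_MONITOR_PRIVILEGES($acceldata_role);
```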

Grant Monitor Privilege for All Warehouses: This allows the role to monitor the usage and performance of all warehouses.

Grant Monitor Privilege on All Resource Monitors: This allows the role to monitor all resource monitors for improved governance and audit processes.

Use this procedure in scenarios such as:

  • Authorizing a certain role to monitor all warehouses and resource monitors.
  • Creating roles for system administrators, auditors, and monitoring tools.
  • Establishing governance or audit mechanisms to monitor resource utilization across the Snowflake account.

Optional Steps

Optional Step 1: Grant Create Integration Privilege

If you need to create integrations (e.g., for external stages using storage integration):

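For example:

```sql
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE IDENTIFIER($acceldata_role);
```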

This privilege allows the role to create integrations for secure connections with external services like AWS S3.

Optional Step 2: Enable ORGADMIN Role

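A sketch; ORGADMIN must be granted by ACCOUNTADMIN, and whether you grant it to the monitoring user or to a separate role is a design choice for your account:

```sql
USE ROLE ACCOUNTADMIN;
GRANT ROLE ORGADMIN TO USER IDENTIFIER($acceldata_user);
```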

The ORGADMIN role is required to access organization-level usage data.

Optional Step 3: OAuth & Snowflake API Client

For OAuth integration and additional configurations, refer to: Snowflake Integration with ADOC.

Troubleshooting Steps

  1. Adding Snowflake as a Data Source: For detailed procedure, see Adding Snowflake as a Data Source.
  2. Post-configuration steps: Ensure all steps have been completed successfully by following the verification checklist. Also see Snowflake Checklist.
  3. Changing User Role After OAuth Configuration: If you need to change the role of an existing user:
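A sketch, with <user_name> and <new_role> as placeholders:

```sql
GRANT ROLE <new_role> TO USER <user_name>;
ALTER USER <user_name> SET DEFAULT_ROLE = <new_role>;
```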
  4. Stage Creation Failure: If stage creation fails:
    1. Log in using ACCOUNTADMIN.
    2. Select the corresponding database used during onboarding.
    3. Run:
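A sketch of the ownership grant suggested at the end of this section, with placeholder names:

```sql
GRANT OWNERSHIP ON STAGE <stage_name> TO ROLE <role_name> COPY CURRENT GRANTS;
```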
  • When to Enable Storage Integration?

    • Enable storage integration if your Snowflake account does not allow stage creation using AWS access and secret keys directly. Storage integration allows a secure connection without storing AWS secrets on Snowflake.
  • Re-authentication via OAuth: If the default role, warehouse, or namespace is not set for the user after configuration (OAuth flow), re-authenticate the user via the OAuth flow in the ADOC platform.

  • Why Enable the ORGADMIN Role?

    • Enabling the ORGADMIN role is necessary to access data from the ORGANIZATION_USAGE schema, which provides organization-level usage metrics.
  • If Stage Creation Fails in Snowflake: Ensure you have the necessary privileges and consider granting ownership of the stage to the appropriate role.

Data Reliability on Snowflake

Ensuring data reliability inside Snowflake systems is critical for enterprises to retain the integrity and availability of their data. ADOC enables enterprises to monitor, evaluate, and optimize data reliability in Snowflake, ensuring that data is accurate, consistent, and easily accessible.

Setting Up User and Role for Data Reliability

To leverage all functionalities of ADOC for Data Reliability operations on Snowflake, a specific user and role setup is required. This section outlines the minimal permissions necessary to achieve comprehensive data reliability monitoring and optimization with ADOC.

  1. Create a Custom Role: Begin by creating a custom role that will be assigned the necessary permissions for data reliability tasks within Snowflake. Optionally, you can add a comment to describe the role's purpose.
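A sketch, with <custom_role_name> as a placeholder and an optional comment:

```sql
CREATE ROLE IF NOT EXISTS <custom_role_name>
  COMMENT = 'Role for ADOC Data Reliability operations';
```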
  2. Grant Usage on Warehouse: The custom role must have usage permissions on the warehouse that will be used for Data Reliability operations. Replace <warehouse_name> and <custom_role_name> with your specific details.
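For example:

```sql
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE <custom_role_name>;
```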
  3. Grant Usage on Database: For each database that will be used for Data Reliability operations, grant usage permissions to the custom role. This step should be repeated for every relevant database.
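For example:

```sql
GRANT USAGE ON DATABASE <database_name> TO ROLE <custom_role_name>;
```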
  4. Grant Usage on Schema: Within each database, schema-level usage permissions are required. Execute this command for each schema in the database involved in Data Reliability tasks.
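For example:

```sql
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO ROLE <custom_role_name>;
```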
  5. Grant Select on All Tables Within the Schema: To perform data reliability checks, the custom role needs select permissions on all tables within the target schema.
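For example (the FUTURE TABLES grant is an optional addition for tables created later, not a confirmed requirement):

```sql
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <custom_role_name>;
-- Optional: also cover tables created after this grant.
GRANT SELECT ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <custom_role_name>;
```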
  6. Create User: Create a new user that will be associated with the custom role. This ensures that every session initiated by this user will automatically utilize the specified role.
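For example:

```sql
CREATE USER <user_name> DEFAULT_ROLE = <custom_role_name>;
```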
  7. Set Password for the User: For authentication purposes, define a password for the newly created user. While this example uses simple username/password authentication, Snowflake supports various authentication methods, including Multi-Factor Authentication (MFA).
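For example:

```sql
ALTER USER <user_name> SET PASSWORD = '<password>';
```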
  8. Grant the Custom Role to the User: Finally, assign the previously created custom role to the user to enable them to perform Data Reliability operations within Snowflake.
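For example:

```sql
GRANT ROLE <custom_role_name> TO USER <user_name>;
```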

Following these steps will prepare your Snowflake setup for integration, resulting in complete monitoring and data reliability optimization.

Adding Snowflake as a Data Source

To add Snowflake as a Data source:

  1. Click Register from the left pane.
  2. Click Add Data Source.
  3. Select the Snowflake Data Source. The Snowflake Data Source Basic Details page is displayed.

Enter the following details on this page:

  4. Enter a name for the data source in the Data Source name field.
  5. (Optional) Enter a description for the Data Source in the Description field.
  6. (Optional) Enable the Compute capability by switching on the Compute toggle switch.
  7. (Optional) Enable the Data Reliability capability by switching on the Data Reliability toggle switch.

You must enable either the Compute or Data Reliability capability. You cannot add Snowflake as a Data Source without enabling at least one of these capabilities.

  8. Select a Data Plane from the Select Data Plane drop-down menu.

To create a new Data Plane, click Setup Dataplane.

You must either create a Data Plane or use an existing Data Plane to enable the Data Reliability capability.

  9. Click Next. The Snowflake Connection Details page is displayed.
  10. Specify the Snowflake URL, which is used to locate the database schema, in the Snowflake URL field.
  11. (Optional) Enable Snowflake OAuth by toggling the Enable OAuth switch.

You can connect to your Snowflake account with SQL APIs rather than JDBC by clicking the toggle button and entering the Authorization Endpoint and Token Endpoint.

From ADOC v3.6.0 onwards, Proof Key for Code Exchange (PKCE) is supported.

To enable PKCE:

  1. Toggle the Enable PKCE switch to enable Proof Key for Code Exchange (PKCE) for enhanced security.
  2. Set the Refresh Token Validity (in seconds) as needed, for example, 7776000 seconds for a 90-day validity period.
  12. (Optional) Provide the following information to configure Snowflake OAuth:
    1. Authorization Endpoint: This is the URL where Snowflake redirects users to authenticate and authorize access to their data.
    2. Token Endpoint: This is the URL where Snowflake exchanges authorization credentials for an access token.
    3. Client ID: This is a unique identifier assigned to your application by Snowflake when you register it for OAuth.
    4. Client Secret: This is a secret key known only to the application and Snowflake, used to authenticate the identity of the application to the authorization server.
    5. Refresh Token Validity: This specifies the duration for which a refresh token issued by Snowflake remains valid before it expires and needs to be renewed.
    6. OAuth User Name: The user name used for authentication.

12.1. Once you have entered all the details, click the Authenticate button, and you'll be directed to the sign-in page for Snowflake.

12.2. Enter your Username and Password. Upon successful sign-in, the data source will be authenticated.

These details are typically found in the OAuth configuration section or application settings within the Snowflake platform. If you're unable to locate this information, you need to consult Snowflake's documentation or contact their support for assistance.

Ensure that the OAuth user does not have the account admin role assigned. A user with the account admin role assigned cannot sign in during authorization.

  13. Specify the username with access to the Snowflake account in the Username field.

The user must have a default warehouse assigned on Snowflake and must have access to the databases that are expected to be connected from ADOC, along with everything in the hierarchy of those databases (including the information schema). Do not add a user who is configured with multi-factor authentication in Snowflake; ADOC cannot fetch data from such user accounts.

  14. Specify the password to connect to the Snowflake database in the Password field.
  15. Enter the user role in the Role field.
  16. Select the Dataplane Engine, either Spark or Pushdown Data Engine, for profiling and data quality.
  17. Click Test Connection to check if the connection created is working.

If the connection is successful, a Connected message is displayed. If there is an error, verify the Snowflake details provided and enter the correct details.

Observability Set Up

The Set Up Observability page allows you to set up the Compute and Data Reliability capabilities. These sections are active only if you enabled them on the Snowflake Data Source Basic Details page.

This section has the following panels:

  • Compute Observability: Enter the following details to set up Compute Capabilities:

    • Warehouse: Select the warehouses from the drop-down list for which you want to set up Compute Observability.

    • Database: Enter the name of the database whose public schema could be used to configure external storage. By default, the database name is AD_MONITOR_DB.

    • Cost per Credit: Enter the cost per credit for your Snowflake account.

    • Snowflake Fetch Past Data From: Select the historical time period for which you want to fetch the data from Snowflake. The available options are Last 15 Days, Last 30 Days, Last 60 Days, Last 90 Days, Last 180 Days, and Last 1 Year.

    • Snowflake Polling Schedule: Configure the polling schedule to determine when data is fetched from Snowflake. By default, the polling time is set to 12:00 AM (00:00) in the selected time zone (e.g., UTC). Additional timings can be added and customized as per your requirements.

    • (Optional) Configure external stage using storage integration: Enable this toggle button to configure an external stage that references a Snowflake storage integration. On clicking the Submit button, the following privileges on a role are checked by ADOC:

      • CREATE INTEGRATION privilege
      • CREATE FILE FORMAT privilege on given schema
      • CREATE STAGE privilege on given schema
      • USAGE privilege on given database
    • If the Configure external stage using storage integration toggle is not enabled, then the following privileges on a role are checked by ADOC:

      • CREATE FILE FORMAT privilege on given schema
      • CREATE STAGE privilege on given schema
      • USAGE privilege on given database

Relying on a single Cost per Credit (CPC) for all Snowflake services can result in inaccurate cost attribution when varying rates are applied. To address this, the system retrieves service-specific rates directly from Snowflake metadata for more accurate calculations. If these rates are unavailable, the user-provided CPC serves as a fallback.

  • If a role possesses the OWNERSHIP privilege for a database, it automatically gains the CREATE FILE FORMAT and CREATE STAGE privileges for the associated schema, as well as the USAGE privilege on that database.
  • Similarly, if a role possesses the OWNERSHIP privilege for a schema, it automatically gains the CREATE FILE FORMAT and CREATE STAGE privileges specific to that schema.

On clicking the Submit button, the above privileges are checked by ADOC; if any of the privileges is not met, an error is displayed.

  • Data Reliability: Enter the following details to set up Data Reliability:
    • Warehouse: Select the warehouses from the drop-down list for which you want to set up Data reliability.
    • Databases: Select the database(s) from the drop-down list for which you want to set up Data reliability.
    • Enable Query Analysis: Turn on this toggle switch to enable query analysis on your queries.
    • Enable Crawler Execution Schedule: Turn on this toggle switch to select a time tag and time zone to schedule the execution of crawlers for Data Reliability.
    • Click the Submit button.

Snowflake is now added as a Data Source. You can choose to crawl your Snowflake account now or later.

Important If you have enabled only the Compute capability, the Start Crawler & Go to Data Sources button is not displayed. You can only view the Go to Data Sources button. The Data Source crawl feature is not applicable for the Compute data sources.

You can navigate to the manage page to view the options available after adding the Data Source.

Optimizing Data Partitioning

Controlling the partition size of data being loaded is critical for improving performance and scalability in ADOC's Snowflake integration. The environment option SNOWFLAKE_PARTITION_SIZE_IN_MB allows you to fine-tune the partition size, which affects how the Snowflake connection generates data-loading tasks. This section explains how to set up this environment variable to increase parallelism and data processing efficiency.

| ADOC Default Behavior | Description |
| --- | --- |
| Environment Variable | SNOWFLAKE_PARTITION_SIZE_IN_MB |
| Default Value | 2000 MB |
| Functionality | Determines the size of each partition that the Snowflake connection attempts to construct, with each partition handling 2000 MB of uncompressed data. |
| Purpose | To balance the number of partitions with the amount of data processed by each partition, which affects the parallelism and performance of Snowflake data operations in ADOC. |

The SNOWFLAKE_PARTITION_SIZE_IN_MB parameter can be adjusted to increase or reduce partition sizes based on your data volume and processing requirements. Lowering the value from the default 2000 MB results in more partitions being created, which is especially useful for large datasets that benefit from greater parallelism.

Note Default Snowflake partition size: 100 MB. | ADOC Default partition size: 2000 MB.

Determine Optimal Size: Choose an optimal partition size based on the size of your dataset and the processing needs. Decreasing the size will result in more partitions and possibly improved parallelism.

Set Environment Variable: Within the analysis service of your ADOC data plane, set the SNOWFLAKE_PARTITION_SIZE_IN_MB environment variable to the desired partition size.

Example: To change the partition size to 10 MB:

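A sketch, assuming the variable is injected into the analysis service's environment; how you set it in practice depends on your data plane deployment (for example, a container or pod spec):

```bash
# Hypothetical: set in the analysis service environment of the ADOC data plane.
export SNOWFLAKE_PARTITION_SIZE_IN_MB=10
```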

Reproducing and Monitoring Changes

  1. Initial Data Processing: Run a profile or data quality (DQ) job on a dataset several GB in size. Note the number of partitions and the parallelism level achieved.
  2. Modify Partition Size: Adjust the SNOWFLAKE_PARTITION_SIZE_IN_MB to a value lower than 2000, such as 10 MB, to increase the number of partitions.
  3. Observe Performance Improvement:
    1. Run the same profile/DQ job.
    2. Compare the number of partitions and parallelism level to the initial run. A significant increase in partition count and parallelism should be observed, improving job performance.

After updating the SNOWFLAKE_PARTITION_SIZE_IN_MB to 10 in the analysis service, a notable increase in partition count from 1 to 15 was observed for the same job, enhancing parallelism and reducing job completion time.

Partition Count Before: **1** | Partition Count After: **15**

Performance Metrics: Data compression size, uncompressed size, query time, and query ID metrics improved significantly, demonstrating the effect of altering partition sizes.

Note: The performance figures are based on ADOC running in a specific environment and may vary depending on environmental factors and the data involved.

Setting Up AWS PrivateLink

This guide provides step-by-step instructions to set up AWS PrivateLink for secure connectivity between Snowflake and the Acceldata platform in the USW2 region.

Prerequisites

  • AWS account with the necessary permissions.
  • Snowflake account configured.
  • VPC set up in the USW2 region.

Step 1: Share AWS Account ID

  • Provide your AWS account ID to the Acceldata support team to authorize PrivateLink from that account.

Step 2: Create VPC Endpoints

  1. Navigate to VPC Console:

    • Open the Amazon VPC console.
  2. Create Endpoints:

    • Create two VPC endpoints with the following details:

Note: Our assistance is restricted to the us-west-2 region.

  1. ADOC Control Plane: com.amazonaws.vpce.us-west-2.vpce-svc-091c001843d33bbaa
  2. Secure Relay: com.amazonaws.vpce.us-west-2.vpce-svc-02830f09899d40f01

Step 3: Configure Route53 DNS

  1. Set Up DNS Routing:

    • Configure Route 53 with the following records:

      • ADOC Control Plane:

        • Record Type: A
        • Record Name: <tenant>.acceldata.app
        • Value: PrivateLink interface IP of ADOC Control Plane VPC Endpoint
      • Secure Relay:

        • Record Type: A
        • Record Name: dataplane.acceldata.app
        • Value: PrivateLink interface IP of Secure Relay VPC Endpoint

Security Considerations

Custom ARN: Utilize a custom ARN with least/restricted access.

IAM Roles and Policies: Ensure proper IAM roles and policies are in place for secure access.

Additional Help
Snowflake Checklist

For additional reference and help, see here.

Take a look at this video, which explains the process of adding Snowflake as a data source.

Try our demo here.
