Configuring ODP with GCS

To access your GCS buckets from your cluster, first add the GCS connector jar. Use gcs-connector-hadoop3-2.2.16-shaded.jar: the shaded jar contains the classes and resources of the GCS Connector for Hadoop together with its dependencies.

Once you have downloaded the jar to your Hadoop cluster, complete the following steps to add it to the classpath of each relevant component.

Configuration Steps for Respective Classpaths

hadoop-env.sh
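
A minimal sketch of this change, assuming the jar was downloaded to /usr/odp/current/hadoop-client/lib (a hypothetical location; substitute the directory you actually used). It appends the connector to HADOOP_CLASSPATH and keeps any entries that are already set.

```shell
# Assumed location of the downloaded connector jar; adjust to your cluster.
GCS_CONNECTOR_JAR="${GCS_CONNECTOR_JAR:-/usr/odp/current/hadoop-client/lib/gcs-connector-hadoop3-2.2.16-shaded.jar}"

# Append the jar to HADOOP_CLASSPATH without discarding existing entries.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${GCS_CONNECTOR_JAR}"
```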

hive-env.sh
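
A comparable sketch for Hive, assuming extra jars are supplied through the HIVE_AUX_JARS_PATH environment variable and that the jar sits in a hypothetical /usr/odp/current/hive-client/lib directory. Both the path and the choice of separator are assumptions to check against your installation.

```shell
# Assumed location of the connector jar for Hive; adjust to your cluster.
GCS_CONNECTOR_JAR="${GCS_CONNECTOR_JAR:-/usr/odp/current/hive-client/lib/gcs-connector-hadoop3-2.2.16-shaded.jar}"

# Add the jar to Hive's auxiliary jars, keeping anything already configured.
export HIVE_AUX_JARS_PATH="${HIVE_AUX_JARS_PATH:+${HIVE_AUX_JARS_PATH}:}${GCS_CONNECTOR_JAR}"
```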

mapreduce.application.classpath

core-site.xml

Configure Access using core-site.xml

core-site.xml

Configure Access using Hadoop Credential JCEKS
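
A minimal sketch of storing a secret with the `hadoop credential create` command. The provider URI is hypothetical, and the aliases in the comments assume the service-account key properties of connector 2.2.x; confirm both against the connector's configuration reference before use.

```shell
# Hypothetical keystore location; any jceks:// URI the cluster can read works.
JCEKS_PROVIDER="${JCEKS_PROVIDER:-jceks://file/etc/hadoop/conf/gcs.jceks}"

# Store one secret under the given alias in the JCEKS keystore.
#   $1 = alias (the configuration key), $2 = secret value
store_gcs_secret() {
  hadoop credential create "$1" -value "$2" -provider "$JCEKS_PROVIDER"
}

# Example calls (assumed property names, placeholder values):
# store_gcs_secret fs.gs.auth.service.account.private.key.id "<private_key_id>"
# store_gcs_secret fs.gs.auth.service.account.private.key    "<private_key>"
```

Passing the secret with -value leaves it in the shell history; omitting -value makes the command prompt for it instead.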

After creating the JCEKS file, validate it by using it to access your GCS buckets.
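
Such a check might look like the sketch below. The provider URI and bucket name are hypothetical; the credential provider is passed for a single command through the generic -D option, so no cluster configuration has to change first.

```shell
# Hypothetical values; replace with your keystore URI and bucket.
JCEKS_PROVIDER="${JCEKS_PROVIDER:-jceks://file/etc/hadoop/conf/gcs.jceks}"
GCS_BUCKET="${GCS_BUCKET:-my-test-bucket}"

# List the bucket root, resolving secrets from the JCEKS keystore.
list_bucket_with_jceks() {
  hadoop fs \
    -D hadoop.security.credential.provider.path="$JCEKS_PROVIDER" \
    -ls "gs://${GCS_BUCKET}/"
}

# Usage: list_bucket_with_jceks
```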

For these changes to take effect, restart the affected components on your cluster.

Access to Google Cloud Storage via ADC

Validate connectivity between ODP 3.3.6.x and Google Cloud Storage, and confirm whether the bundled Google GCS connector supports Application Default Credentials.

Scope

Validated

  • ODP 3.3.6.x connectivity to GCS
  • Hadoop filesystem access using gs://
  • GCS connector availability and filesystem implementation
  • ADC authentication using GOOGLE_APPLICATION_CREDENTIALS
  • Configuration using:

google.cloud.auth.type=APPLICATION_DEFAULT

Not Validated

  • Workload Identity Federation
  • fs.gs.auth.type=APPLICATION_DEFAULT
  • fs.gs.auth.type=ACCESS_TOKEN_PROVIDER
  • Spark BigQuery Connector
  • Hive BigQuery Storage Handler
  • BigQuery read/write operations

Environment

Component                  Details
ODP Version                3.3.6.x
Hadoop Version             ODP Hadoop Client
GCS Connector              gcs-connector-hadoop3-2.2.26-shaded.jar
Cloud Provider             Google Cloud Platform
Storage                    Google Cloud Storage Bucket
Authentication Tested      Application Default Credentials
Authentication Not Tested  Workload Identity Federation

Connector Discovery

The GCS connector was found under the Spark client jars:

Example Output:

Because the connector was not available in the Hadoop runtime classpath by default, it was manually copied into the Hadoop client library path:
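
The discovery and copy steps can be sketched as follows. The install root /usr/odp and the Hadoop client library directory are assumptions about the ODP layout, so both are overridable variables.

```shell
# Assumed ODP install root and Hadoop client library directory.
ODP_ROOT="${ODP_ROOT:-/usr/odp}"

# Print every GCS connector jar found under the install root.
find_gcs_connector() {
  find "$ODP_ROOT" -name 'gcs-connector-*.jar' 2>/dev/null
}

# Copy one connector jar ($1) into the Hadoop client library directory.
copy_to_hadoop_lib() {
  cp "$1" "${HADOOP_LIB_DIR:-$ODP_ROOT/current/hadoop-client/lib}/"
}

# Usage:
#   find_gcs_connector
#   copy_to_hadoop_lib /path/printed/above/gcs-connector-hadoop3-2.2.26-shaded.jar
```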

GCP Setup

The following GCP resources were created:

  • GCS test bucket
  • GCP service account
  • Service account JSON credential file
  • Required IAM permissions on the bucket

The credential file was placed on the ODP host:

/etc/gcp/odp-gcs-sa.json

Network Validation

Connectivity from ODP to GCS was verified:

Output:

This confirms:

  • DNS resolution is successful
  • HTTPS connectivity is available
  • No firewall restrictions are blocking access to GCS endpoints
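
A connectivity check of this kind can be sketched with curl. The endpoint https://storage.googleapis.com is assumed to be the one the cluster needs to reach; receiving any HTTP status code shows that DNS resolution, the TCP connection and the TLS handshake all succeeded.

```shell
# Print the HTTP status code returned by the endpoint ($1, optional).
# A non-zero exit status means DNS, TCP or TLS failed before any response.
check_gcs_endpoint() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' \
    "${1:-https://storage.googleapis.com}"
}

# Usage: check_gcs_endpoint
```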

Hadoop Configuration

core-site.xml

  1. In Ambari UI, go to HDFS → Configs → Custom core-site.xml.
  2. In Custom core-site.xml, add the following properties:
google.cloud.auth.type=APPLICATION_DEFAULT

hadoop-env.sh

Configure the following variable:

export GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/odp-gcs-sa.json

Validation Steps

  1. Verify GCS Connector Classes

Command:

Output:

Result:

The connector contains the required Hadoop filesystem implementation class.
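
One way to run such a check, assuming unzip is installed on the host: list the jar contents and look for the GoogleHadoopFileSystem class, which is the connector's Hadoop filesystem implementation.

```shell
# Succeed if the jar ($1) contains the GCS Hadoop filesystem class.
jar_has_gcs_filesystem() {
  unzip -l "$1" 2>/dev/null \
    | grep -q 'com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class'
}

# Usage:
#   jar_has_gcs_filesystem /path/to/gcs-connector-hadoop3-2.2.26-shaded.jar \
#     && echo "filesystem class present"
```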

  2. Validate Hadoop Filesystem Access to GCS

Command:

Output:

Result: Hadoop successfully listed objects from the GCS bucket.
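
A listing of this kind can be sketched as below. The credential path is the one used in this validation; the bucket name is hypothetical.

```shell
# Credential file from this validation; override if yours lives elsewhere.
export GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS:-/etc/gcp/odp-gcs-sa.json}"

# Hypothetical bucket name; replace with your test bucket.
GCS_BUCKET="${GCS_BUCKET:-my-test-bucket}"

# List the bucket root through the Hadoop filesystem layer.
list_gcs_bucket() {
  hadoop fs -ls "gs://${GCS_BUCKET}/"
}

# Usage: list_gcs_bucket
```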

Findings

The validation confirms that:

  • ODP 3.3.6.x can successfully connect to Google Cloud Storage.
  • The bundled GCS connector supports:

google.cloud.auth.type=APPLICATION_DEFAULT

  • Application Default Credentials are successfully consumed using:

GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/odp-gcs-sa.json

  • Hadoop filesystem access to GCS using gs:// is functional.

Additional Connector Inspection

The connector JAR was inspected and contains classes related to:

  • Access token providers
  • External account credentials
  • STS token exchange
  • Identity pool subject token suppliers

Examples:

These components are commonly associated with:

  • Application Default Credentials
  • Workload Identity Federation
  • External account authentication
  • Token exchange authentication
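
An inspection like this can be repeated with the sketch below, assuming unzip is installed. The search pattern is an assumption based on class names used by the Google auth library and the connector; because the jar is shaded, matches may appear under relocated package paths.

```shell
# Print jar entries ($1 = jar path) whose names suggest token-provider,
# external-account, STS or identity-pool authentication classes.
list_auth_classes() {
  unzip -l "$1" 2>/dev/null \
    | grep -E 'AccessTokenProvider|ExternalAccountCredentials|StsRequestHandler|IdentityPool'
}

# Usage: list_auth_classes /path/to/gcs-connector-hadoop3-2.2.26-shaded.jar
```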

Limitations

Workload Identity Federation was not tested end-to-end. The presence of WIF-related authentication classes indicates potential support, but this should not be treated as validated until tested using a GCP-generated WIF credential configuration file.

Example not tested:
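
An untested sketch of what such a setup might look like, assuming the WIF credential configuration file generated by GCP is saved as wif-config.json in a hypothetical /etc/gcp directory and consumed through the same GOOGLE_APPLICATION_CREDENTIALS variable as the service account key.

```shell
# Untested assumption: point ADC at a WIF credential configuration file
# instead of a service account key. The path is hypothetical.
WIF_CONFIG="${WIF_CONFIG:-/etc/gcp/wif-config.json}"
export GOOGLE_APPLICATION_CREDENTIALS="$WIF_CONFIG"
```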

Conclusion

The lab validation successfully confirmed that ODP 3.3.6.x can access Google Cloud Storage using Application Default Credentials (ADC) with:

google.cloud.auth.type=APPLICATION_DEFAULT

and

GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/odp-gcs-sa.json

While Workload Identity Federation (WIF) was not tested end-to-end, inspection of the bundled GCS connector indicates the presence of Google authentication libraries commonly used for WIF and external account authentication flows.

A future validation can be performed using a GCP-generated WIF credential configuration file (wif-config.json) to confirm end-to-end WIF support.
