Configuring ODP with GCS
To configure access to your GCS buckets from your cluster, start by adding the GCS connector jar. Use gcs-connector-hadoop3-2.2.16-shaded.jar: the shaded jar contains the classes and resources for the GCS Connector for Hadoop together with its dependencies.
Once you have downloaded this jar to your Hadoop cluster, follow the configuration steps below to add it to the classpath in each of the relevant files.
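The classpath settings in the steps that follow are colon-separated path lists. As a minimal sketch, assuming you script these changes yourself, the helper below appends a jar only when it is not already listed, so re-running a setup script does not duplicate entries. The function name `append_classpath` is hypothetical and not part of Hadoop or ODP.

```shell
#!/usr/bin/env bash
# Append a jar to a colon-separated classpath unless it is already listed.
# append_classpath is a hypothetical helper, not a Hadoop or ODP command.
append_classpath() {
  local current="$1" jar="$2"
  case ":${current}:" in
    *":${jar}:"*) printf '%s\n' "$current" ;;          # already present: unchanged
    "::")         printf '%s\n' "$jar" ;;              # classpath was empty
    *)            printf '%s\n' "${current}:${jar}" ;; # append at the end
  esac
}

# Example usage (jar path taken from this guide):
# export HADOOP_CLASSPATH="$(append_classpath "$HADOOP_CLASSPATH" \
#   /usr/odp/3.3.6.2-1/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar)"
```

Wrapping the current value in colons before matching avoids false positives on jars whose path is a prefix of another entry.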
Configuration Steps for Respective Classpaths
Hadoop-env.sh
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/odp/3.3.6.2-1/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
Hive-env.sh
    export HIVE_AUX_JARS_PATH=/usr/odp/current/hive-client/gcs-connector-hadoop3-2.2.16-shaded.jar
Mapreduce.application.classpath
    /usr/odp/3.3.6.2-1/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
Core-Site.xml
    <property>
      <name>fs.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
      <description>The FileSystem for gs: (GCS) uris.</description>
    </property>
    <property>
      <name>fs.AbstractFileSystem.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    </property>
Configure Access using Core-Site.xml
Core-Site.xml
    <property>
      <name>google.cloud.auth.type</name>
      <value>SERVICE_ACCOUNT_JSON_KEYFILE</value>
    </property>
    <property>
      <name>google.cloud.auth.service.account.json.keyfile</name>
      <value>PATH TO JSON KEYFILE</value>
    </property>
Configure Access using Hadoop Credential JCEKS
    hdfs dfs -mkdir /app/gcpclient
    hadoop credential create fs.gs.auth.service.account.email -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "SERVICE ACCOUNT EMAIL"
    hadoop credential create fs.gs.auth.service.account.private.key.id -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "PRIVATE KEY ID VALUE"
    hadoop credential create fs.gs.auth.service.account.private.key -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "ACCOUNT PRIVATE KEY"
    hadoop fs -chown -R hdfs:hdfs /app/gcpclient
    hadoop fs -chmod 500 /app/gcpclient
    hadoop fs -chmod 400 /app/gcpclient/dataget.jceks
All three credentials must be written to the same keystore, jceks://hdfs/app/gcpclient/dataget.jceks, because that is the provider path used for validation. After successfully creating the JCEKS file, validate it by accessing your GCS buckets, as shown below.
    hadoop fs -Dhadoop.security.credential.provider.path=jceks://hdfs/app/gcpclient/dataget.jceks -ls gs://trialodp/
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/odp/3.3.6.2-1/hadoop/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/odp/3.3.6.2-1/hadoop-hdfs/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/odp/3.3.6.2-1/tez/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
    Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
    WARNING: Detected potential high latency for operation op_get_file_status. latencyMs=1553; previousMaxLatencyMs=0; operationCount=1; context=gs://trialodp/
    Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
    WARNING: Detected potential high latency for operation op_glob_status. latencyMs=1633; previousMaxLatencyMs=0; operationCount=1; context=path=gs://trialodp/; pattern=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$$Lambda$8/1628998132@39ce27f2
    Found 1 items
    drwx------   - hdfs hdfs          0 2024-01-10 16:29 gs://trialodp/testfolder
For these changes to take effect, restart the required components on your cluster.
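Every credential alias has to land in the keystore that the validation command reads, so it can help to define the provider path once. The following is a minimal dry-run sketch under that assumption: it only prints the `hadoop credential create` commands rather than running them. The helper names `provider_uri` and `print_credential_commands` are hypothetical; the directory, file name, and alias names are the ones used above.

```shell
#!/usr/bin/env bash
# Keystore location used in this guide.
JCEKS_DIR="/app/gcpclient"
JCEKS_FILE="dataget.jceks"

# Build the provider URI for a keystore stored in HDFS.
# (hypothetical helper, not a Hadoop command)
provider_uri() {
  printf 'jceks://hdfs%s/%s\n' "$1" "$2"
}

# Dry run: print one `hadoop credential create` command per alias,
# all pointing at the same provider. The -value argument is left out
# on purpose so that no secret appears in the printed commands.
print_credential_commands() {
  local provider name
  provider="$(provider_uri "$JCEKS_DIR" "$JCEKS_FILE")"
  for name in \
    fs.gs.auth.service.account.email \
    fs.gs.auth.service.account.private.key.id \
    fs.gs.auth.service.account.private.key
  do
    printf 'hadoop credential create %s -provider %s\n' "$name" "$provider"
  done
}

print_credential_commands
```

Review the printed commands, then run them on a cluster node, supplying each secret as in the steps above.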
Access to Google Cloud Storage via ADC
Validate connectivity between ODP 3.3.6.x and Google Cloud Storage, and confirm whether the bundled Google GCS connector supports Application Default Credentials.
Scope
Validated
- ODP 3.3.6.x connectivity to GCS
- Hadoop filesystem access using gs://
- GCS connector availability and filesystem implementation
- ADC authentication using GOOGLE_APPLICATION_CREDENTIALS
- Configuration using: google.cloud.auth.type=APPLICATION_DEFAULT
Not Validated
- Workload Identity Federation
- fs.gs.auth.type=APPLICATION_DEFAULT
- fs.gs.auth.type=ACCESS_TOKEN_PROVIDER
- Spark BigQuery Connector
- Hive BigQuery Storage Handler
- BigQuery read/write operations
Environment
| Component | Details |
|---|---|
| ODP Version | 3.3.6.x |
| Hadoop Version | ODP Hadoop Client |
| GCS Connector | gcs-connector-hadoop3-2.2.26-shaded.jar |
| Cloud Provider | Google Cloud Platform |
| Storage | Google Cloud Storage Bucket |
| Authentication Tested | Application Default Credentials |
| Authentication Not Tested | Workload Identity Federation |
Connector Discovery
The GCS connector was found under the Spark client jars:
    find /usr/odp/ -name "*gcs*jar"
Example Output:
    /usr/odp/3.3.6.2-1/spark3/jars/gcs-connector-hadoop3-2.2.26-shaded.jar
Because the connector was not available in the Hadoop runtime classpath by default, it was manually copied into the Hadoop client library path:
    cp /usr/odp/3.3.6.2-1/spark3/jars/gcs-connector-hadoop3-2.2.26-shaded.jar \
      /usr/odp/3.3.6.2-1/hadoop/client/lib/
GCP Setup
The following GCP resources were created:
- GCS test bucket
- GCP service account
- Service account JSON credential file
- Required IAM permissions on the bucket
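The source does not record how these resources were created. As one possible approach, the dry-run sketch below prints `gcloud` commands that would create equivalent resources. Everything here is an assumption for illustration: the project ID is a placeholder, the service account name is inferred from the key file name, and `roles/storage.objectAdmin` is only an example role, not necessarily the permissions used in the lab.

```shell
#!/usr/bin/env bash
# All values below are illustrative assumptions; replace them with your own.
BUCKET="odptestlab01"     # bucket name that appears later in this guide
SA_NAME="odp-gcs-sa"      # assumed from the key file name odp-gcs-sa.json
PROJECT_ID="my-project"   # placeholder project ID
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Dry run: print the gcloud commands instead of executing them.
# The role is an example only; grant the narrowest role that fits your use.
print_gcp_setup() {
  cat <<EOF
gcloud storage buckets create gs://${BUCKET} --project=${PROJECT_ID}
gcloud iam service-accounts create ${SA_NAME} --project=${PROJECT_ID}
gcloud storage buckets add-iam-policy-binding gs://${BUCKET} --member=serviceAccount:${SA_EMAIL} --role=roles/storage.objectAdmin
gcloud iam service-accounts keys create odp-gcs-sa.json --iam-account=${SA_EMAIL}
EOF
}

print_gcp_setup
```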
The credential file was placed on the ODP host:
    mkdir -p /etc/gcp
    cp odp-gcs-sa.json /etc/gcp/
    chmod 400 /etc/gcp/odp-gcs-sa.json
Network Validation
Connectivity from ODP to GCS was verified:
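    curl -I https://storage.googleapis.com
Output:
    HTTP/2 400
The request does not name a bucket or object, so an error status is expected here. Receiving any HTTP response from the endpoint confirms: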
- DNS resolution is successful
- HTTPS connectivity is available
- No firewall restrictions are blocking access to GCS endpoints
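The reasoning above (any HTTP status line means the endpoint was reached) can be sketched as a small check. This is an illustrative helper, not part of the original validation; the function names `http_status` and `is_reachable` are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the numeric status code from the first line of `curl -I` output.
# A status line looks like "HTTP/2 400" or "HTTP/1.1 200 OK".
http_status() {
  awk 'NR == 1 && $1 ~ /^HTTP\// { print $2 }' | tr -d '\r'
}

# Any three-digit HTTP status shows that DNS resolution, the TCP connection,
# and the TLS handshake all succeeded, even when the status is an error.
is_reachable() {
  case "$1" in
    [1-5][0-9][0-9]) return 0 ;;
    *)               return 1 ;;
  esac
}

# Example usage (requires network access):
# status="$(curl -sI https://storage.googleapis.com | http_status)"
# if is_reachable "$status"; then
#   echo "GCS endpoint reachable (HTTP $status)"
# else
#   echo "No HTTP response from GCS endpoint" >&2
# fi
```

An empty status (for example when `curl` cannot resolve the host or the connection is blocked) is treated as unreachable.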
Hadoop Configuration
core-site.xml
- In Ambari UI, go to HDFS → Configs → Custom core-site.xml.
- In Custom core-site.xml, add the following properties:
    fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
    google.cloud.auth.type=APPLICATION_DEFAULT
hadoop-env.sh
Configure the following variable:
    export GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/odp-gcs-sa.json
Validation Steps
- Verify GCS Connector Classes
Command:
    jar tf gcs-connector-hadoop3-2.2.26-shaded.jar | grep GoogleHadoopFileSystem
Output:
    com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class
Result:
The connector contains the required Hadoop filesystem implementation class.
- Validate Hadoop Filesystem Access to GCS
Command:
    hadoop fs -ls gs://odptestlab01
Output:
    Found 1 items
    -rwx------   3 hdfs hdfs          9 2026-06-10 02:20 gs://odptestlab01/Test.txt
Result:
Hadoop successfully listed objects from the GCS bucket.
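If this listing check is repeated in a script, the item count can be read from the "Found N items" line of the output. The sketch below is a hypothetical helper (`listed_item_count` is not a Hadoop command) that works on the captured text, so it can be tried without cluster access.

```shell
#!/usr/bin/env bash
# Print the item count from the "Found N items" line of `hadoop fs -ls` output.
# Prints nothing when no such line is present.
listed_item_count() {
  sed -n 's/^Found \([0-9][0-9]*\) items.*$/\1/p'
}

# Example usage (requires a configured cluster):
# count="$(hadoop fs -ls gs://odptestlab01 2>/dev/null | listed_item_count)"
# if [ -n "$count" ]; then
#   echo "Listed $count item(s) from GCS"
# fi
```

Redirecting stderr keeps SLF4J and connector warnings out of the parsed text.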
Findings
The validation confirms that:
- ODP 3.3.6.x can successfully connect to Google Cloud Storage.
- The bundled GCS connector supports:
google.cloud.auth.type=APPLICATION_DEFAULT
- Application Default Credentials are successfully consumed using:
GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/odp-gcs-sa.json
- Hadoop filesystem access to GCS using gs:// is functional.
Additional Connector Inspection
The connector JAR was inspected and contains classes related to:
- Access token providers
- External account credentials
- STS token exchange
- Identity pool subject token suppliers
Examples:
    com/google/cloud/hadoop/util/AccessTokenProvider.class
    com/google/auth/oauth2/ExternalAccountCredentials.class
    com/google/auth/oauth2/StsTokenExchangeRequest.class
    com/google/auth/oauth2/IdentityPoolSubjectTokenSupplier.class
These components are commonly associated with:
- Application Default Credentials
- Workload Identity Federation
- External account authentication
- Token exchange authentication
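The inspection above can be repeated against another connector version by comparing a `jar tf` listing with the expected class entries. This is a minimal sketch; `missing_classes` is a hypothetical helper, and the class paths are the ones listed above.

```shell
#!/usr/bin/env bash
# Read a `jar tf` listing on stdin and print each expected entry
# (given as arguments) that does not appear in the listing.
missing_classes() {
  local listing entry
  listing="$(cat)"
  for entry in "$@"; do
    # -x matches whole lines, -F treats the entry as a literal string.
    printf '%s\n' "$listing" | grep -xF "$entry" >/dev/null || printf '%s\n' "$entry"
  done
}

# Example usage (requires the JDK `jar` tool and the connector jar):
# jar tf gcs-connector-hadoop3-2.2.26-shaded.jar | missing_classes \
#   com/google/cloud/hadoop/util/AccessTokenProvider.class \
#   com/google/auth/oauth2/ExternalAccountCredentials.class \
#   com/google/auth/oauth2/StsTokenExchangeRequest.class \
#   com/google/auth/oauth2/IdentityPoolSubjectTokenSupplier.class
```

Empty output means every expected class is present. As with the manual inspection, this shows only that the classes are packaged, not that the authentication flow works.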
Limitations
Workload Identity Federation was not tested end-to-end. The presence of WIF-related authentication classes indicates potential support, but this should not be treated as validated until tested using a GCP-generated WIF credential configuration file.
Example not tested:
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/wif-config.json
Conclusion
The lab validation successfully confirmed that ODP 3.3.6.x can access Google Cloud Storage using Application Default Credentials (ADC) with:
    google.cloud.auth.type=APPLICATION_DEFAULT
and
    export GOOGLE_APPLICATION_CREDENTIALS=<credential-file>
While Workload Identity Federation (WIF) was not tested end-to-end, inspection of the bundled GCS connector indicates the presence of Google authentication libraries commonly used for WIF and external account authentication flows.
A future validation can be performed using a GCP-generated WIF credential configuration file (wif-config.json) to confirm end-to-end WIF support.
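When running that future validation, it may help to confirm which kind of credential file `GOOGLE_APPLICATION_CREDENTIALS` points to. Google credential JSON files carry a `type` field: `service_account` for a service account key and `external_account` for a WIF credential configuration. The sketch below reads that field with a simple text match. It is a heuristic that assumes the top-level `type` is the first `"type"` key in the file; a real JSON parser would be more robust. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of the first "type" field found in a credential JSON file.
# Heuristic: assumes the top-level "type" key appears before any nested one.
credential_type() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Map the credential type to what this guide has and has not validated.
describe_credentials() {
  case "$(credential_type "$1")" in
    service_account)  echo "service account key (validated in this guide)" ;;
    external_account) echo "WIF credential configuration (not validated in this guide)" ;;
    *)                echo "unrecognized credential type" ;;
  esac
}

# Example usage:
# describe_credentials "$GOOGLE_APPLICATION_CREDENTIALS"
```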