Configuring ODP with GCS
To configure access to your GCS buckets from your cluster, start by adding the GCS connector jar. Be sure to use gcs-connector-hadoop3-2.2.16-shaded.jar, as it contains not only the classes and resources for the GCS Connector for Hadoop but also its dependencies.
Once you have downloaded the jar to your Hadoop cluster, follow the configuration steps below to add the gcs-connector jar to the classpath of the relevant components.
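For example, the shaded jar can be fetched from Maven Central under the com.google.cloud.bigdataoss coordinates (the URL and target directory below are assumptions; adjust them to your ODP version and cluster layout):

# Download the shaded GCS connector into the ODP Hadoop lib directory
cd /usr/odp/3.2.3.3-2/hadoop
curl -fLO https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.2.16/gcs-connector-hadoop3-2.2.16-shaded.jar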
Configuration Steps for Respective Classpaths
hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/odp/3.2.3.3-2/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
hive-env.sh
export HIVE_AUX_JARS_PATH=/usr/odp/current/hive-client/gcs-connector-hadoop3-2.2.16-shaded.jar
mapreduce.application.classpath
/usr/odp/3.2.3.3-2/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
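Once the entries above are in place, you can sanity-check that the jar is actually visible to Hadoop before moving on to the filesystem configuration:

# Print the effective Hadoop classpath, one entry per line, and look for the connector jar
hadoop classpath --glob | tr ':' '\n' | grep gcs-connector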
core-site.xml
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem for gs: (GCS) URIs.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
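A quick way to confirm these properties are being picked up is the standard hdfs getconf utility:

# Should print com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
hdfs getconf -confKey fs.gs.impl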
Configure Access using Core-Site.xml
core-site.xml
<property>
  <name>google.cloud.auth.type</name>
  <value>SERVICE_ACCOUNT_JSON_KEYFILE</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>PATH TO JSON KEYFILE</value>
</property>
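The keyfile here is the JSON key downloaded from the Google Cloud console for your service account. Its fields supply the same values used in the JCEKS approach below; a redacted sketch (all values illustrative):

{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "PRIVATE KEY ID VALUE",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "SERVICE ACCOUNT EMAIL"
}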
Configure Access using Hadoop Credential JCEKS
hdfs dfs -mkdir -p /app/gcpclient
hadoop credential create fs.gs.auth.service.account.email -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "SERVICE ACCOUNT EMAIL"
hadoop credential create fs.gs.auth.service.account.private.key.id -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "PRIVATE KEY ID VALUE"
hadoop credential create fs.gs.auth.service.account.private.key -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "ACCOUNT PRIVATE KEY"
hadoop fs -chown -R hdfs:hdfs /app/gcpclient
hadoop fs -chmod 500 /app/gcpclient
hadoop fs -chmod 400 /app/gcpclient/dataget.jceks
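To avoid passing the provider path on every command, you can also persist it in core-site.xml via the standard hadoop.security.credential.provider.path property:

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs/app/gcpclient/dataget.jceks</value>
</property>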
After successfully creating the JCEKS file, you can validate it by listing your GCS bucket, as shown in the example below.
hadoop fs -Dhadoop.security.credential.provider.path=jceks://hdfs/app/gcpclient/dataget.jceks -ls gs://trialodp/
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/odp/3.2.3.3-2/hadoop/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/odp/3.2.3.3-2/hadoop-hdfs/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/odp/3.2.3.3-2/tez/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
WARNING: Detected potential high latency for operation op_get_file_status. latencyMs=1553; previousMaxLatencyMs=0; operationCount=1; context=gs://trialodp/
Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
WARNING: Detected potential high latency for operation op_glob_status. latencyMs=1633; previousMaxLatencyMs=0; operationCount=1; context=path=gs://trialodp/; pattern=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$$Lambda$8/1628998132@39ce27f2
Found 1 items
drwx------ - hdfs hdfs 0 2024-01-10 16:29 gs://trialodp/testfolder
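With the listing above confirming access, the bucket behaves like any other Hadoop filesystem. For example, a DistCp copy into GCS (the source path is illustrative):

# Copy a directory from HDFS into the GCS bucket
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/app/gcpclient/dataget.jceks /tmp/sample gs://trialodp/testfolder/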
For the classpath and configuration changes above to take effect, you must restart the affected components (HDFS, YARN/MapReduce, and Hive) on your cluster.