Configuring ODP with GCS
To configure access to your GCS buckets from your cluster, start by adding the GCS connector JAR. Use gcs-connector-hadoop3-2.2.16-shaded.jar, which contains not only the classes and resources of the GCS Connector for Hadoop but also its dependencies.
Once you have downloaded the JAR to your Hadoop cluster, follow the configuration steps below to add the gcs-connector JAR to the classpath of the relevant components.
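For example, you might download and stage the JAR as follows. This is a minimal sketch: the Maven Central URL follows the standard repository layout for the com.google.cloud.bigdataoss:gcs-connector artifact, and the target directories are the ones referenced in the classpath steps below; verify both for your environment.

# Fetch the shaded connector from Maven Central (URL assumed from the standard layout)
wget https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.2.16/gcs-connector-hadoop3-2.2.16-shaded.jar

# Place it in the directories referenced by the classpath steps below
cp gcs-connector-hadoop3-2.2.16-shaded.jar /usr/odp/3.2.2.0-1/hadoop/
cp gcs-connector-hadoop3-2.2.16-shaded.jar /usr/odp/current/hive-client/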
Configuration Steps for Respective Classpaths
hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/odp/3.2.2.0-1/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
hive-env.sh
export HIVE_AUX_JARS_PATH=/usr/odp/current/hive-client/gcs-connector-hadoop3-2.2.16-shaded.jar
mapreduce.application.classpath
/usr/odp/3.2.2.0-1/hadoop/gcs-connector-hadoop3-2.2.16-shaded.jar
core-site.xml
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem for gs: (GCS) URIs.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
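With the connector on the classpath and the gs: scheme registered, you can optionally run a quick sanity check. This is a sketch: the grep filter is illustrative, and listing a bucket will only succeed once authentication is configured in the sections that follow.

# Verify the connector JAR appears on the Hadoop classpath
hadoop classpath | tr ':' '\n' | grep gcs-connector

# After authentication is set up (below), a simple listing exercises the gs:// scheme
hadoop fs -ls gs://YOUR_BUCKET/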
Configure Access using core-site.xml
core-site.xml
<property>
  <name>google.cloud.auth.type</name>
  <value>SERVICE_ACCOUNT_JSON_KEYFILE</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>PATH TO JSON KEYFILE</value>
</property>
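The keyfile referenced above is the standard JSON key exported for a GCP service account. A trimmed sketch of its shape follows, with placeholder values; the client_email, private_key_id, and private_key fields are the ones mirrored into the JCEKS credentials in the next section.

{
  "type": "service_account",
  "project_id": "YOUR_PROJECT_ID",
  "private_key_id": "PRIVATE KEY ID VALUE",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "SERVICE ACCOUNT EMAIL"
}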
Configure Access using Hadoop Credential JCEKS
hdfs dfs -mkdir -p /app/gcpclient
hadoop credential create fs.gs.auth.service.account.email -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "SERVICE ACCOUNT EMAIL"
hadoop credential create fs.gs.auth.service.account.private.key.id -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "PRIVATE KEY ID VALUE"
hadoop credential create fs.gs.auth.service.account.private.key -provider jceks://hdfs/app/gcpclient/dataget.jceks -value "ACCOUNT PRIVATE KEY"
hadoop fs -chown -R hdfs:hdfs /app/gcpclient
hadoop fs -chmod 500 /app/gcpclient
hadoop fs -chmod 400 /app/gcpclient/dataget.jceks

After successfully creating the JCEKS file, you can validate it by accessing your GCS buckets, as shown in the code below.
hadoop fs -Dhadoop.security.credential.provider.path=jceks://hdfs/app/gcpclient/dataget.jceks -ls gs://trialodp/
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/odp/3.2.2.0-2/hadoop/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/odp/3.2.2.0-2/hadoop-hdfs/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/odp/3.2.2.0-2/tez/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
WARNING: Detected potential high latency for operation op_get_file_status. latencyMs=1553; previousMaxLatencyMs=0; operationCount=1; context=gs://trialodp/
Jan 10, 2024 5:42:35 PM com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics updateStats
WARNING: Detected potential high latency for operation op_glob_status. latencyMs=1633; previousMaxLatencyMs=0; operationCount=1; context=path=gs://trialodp/; pattern=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$$Lambda$8/1628998132@39ce27f2
Found 1 items
drwx------   - hdfs hdfs          0 2024-01-10 16:29 gs://trialodp/testfolder

For these changes to take effect, you must restart the required components on your cluster.
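To avoid passing the provider path on every command, you can optionally register it cluster-wide through the standard Hadoop hadoop.security.credential.provider.path property. A minimal sketch, assuming the JCEKS path created above, added to core-site.xml:

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs/app/gcpclient/dataget.jceks</value>
</property>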