Hive

This guide covers configuring Hive to access S3A storage at the session level, enabling users to work with S3 buckets without requiring cluster-wide configuration changes or service restarts.

Configuration Methods

Method 1: Session-Level Configuration (JCEKS Credential File)

This method stores S3A credentials in a secure JCEKS credential file on HDFS and references it per session, so no cluster-wide configuration changes or service restarts are needed.

Step 1: Create a credential file on HDFS

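The original snippet was not preserved; a minimal sketch using the standard Hadoop credential CLI follows. The HDFS path, alias names, and environment variables are illustrative placeholders, not fixed names from this guide.

```shell
# Store the S3A access and secret keys as aliases in a JCEKS file on HDFS.
# Replace the provider path and the $S3_* variables with your own values.
hadoop credential create fs.s3a.access.key \
  -provider jceks://hdfs/user/hive/s3.jceks \
  -value "$S3_ACCESS_KEY"

hadoop credential create fs.s3a.secret.key \
  -provider jceks://hdfs/user/hive/s3.jceks \
  -value "$S3_SECRET_KEY"

# Lock the file down so only the owner can read it.
hdfs dfs -chmod 600 /user/hive/s3.jceks
```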

Step 2: Add to hive-site.xml

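The exact snippet was not preserved; a plausible sketch is whitelisting the S3A properties so HiveServer2 accepts session-level SET commands. The property name is standard Hive; the value shown is an assumption to adapt to your deployment.

```shell
# Illustrative hive-site.xml addition (apply via your management UI or by editing
# the file, then restart HiveServer2): allow fs.s3a.* and the credential provider
# path to be SET at the session level.
cat >> hive-site.xml <<'EOF'
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>fs\.s3a\..*|hadoop\.security\.credential\.provider\.path</value>
</property>
EOF
```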

Step 3: Set session properties in Beeline

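A sketch of the Beeline session, assuming the JCEKS file from Step 1. The JDBC URL, jceks path, endpoint, table, and bucket names are all placeholders.

```shell
# Point the session at the JCEKS file and S3A endpoint, then use the bucket.
beeline -u "jdbc:hive2://hs2-host:10000/default" <<'EOF'
SET hadoop.security.credential.provider.path=jceks://hdfs/user/hive/s3.jceks;
SET fs.s3a.endpoint=s3.example.com;
CREATE EXTERNAL TABLE s3_demo (id INT, name STRING)
LOCATION 's3a://my-bucket/warehouse/s3_demo';
EOF
```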

Method 2: Static Configuration (Cluster-Wide)

Add the following properties to HDFS, Hive, Tez, and Spark XML configuration files:

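The original property list was not preserved; a sketch of the typical static S3A properties follows. Values are placeholders, and the same block would be repeated in each of the configuration files named above.

```shell
# Illustrative static properties (plain-text credentials) for core-site.xml,
# hive-site.xml, tez-site.xml, and the Spark Hadoop configuration.
cat >> core-site.xml <<'EOF'
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.example.com</value>
</property>
EOF
```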

Update hive.conf.hidden.list in Custom hiveserver2-site to exclude S3A credentials:

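A sketch of the change, assuming the stock behavior where `hive.conf.hidden.list` masks `fs.s3a.access.key` and `fs.s3a.secret.key`. The value shown is illustrative; retain every non-S3A entry from your cluster's current default list.

```shell
# Redefine hive.conf.hidden.list in Custom hiveserver2-site without the S3A
# credential entries so session SET commands on them are not rejected/masked.
cat >> hiveserver2-site.xml <<'EOF'
<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password</value>
</property>
EOF
```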

Static configuration exposes credentials in plain text across the cluster and is not recommended for multi-tenant environments.

Known Limitations

  1. Multiple buckets: Accessing multiple S3 buckets within a single session requires additional HMS changes and is currently not supported.
  2. Impala: Session-level S3 credential configuration is not supported.
  3. Spark SQL INSERT: S3 credentials do not propagate to HMS; use the Spark DataFrame API instead.

Support Matrix

Supported Versions

  • ODP 3.2: 3.2.3.5-2 and later
  • ODP 3.3: 3.3.6.3-101 and later
| Operation | Hive (Beeline) | Spark (Direct) | Spark + HWC |
| --- | --- | --- | --- |
| Create Table | | | |
| Insert (Static/Dynamic) | | ⚠️* | |
| Show Partitions | | | |
| MSCK Repair | | | |
| Alter Table | | | |
| Drop Table | | | |
| Alter / Drop Partition | | | |

Legend

  • ✅ Supported
  • ❌ Not Supported
  • ⚠️* Use DataFrame API; run MSCK REPAIR in Beeline after insert

Spark + HWC Example

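The original example was not preserved; a sketch of a spark-shell session using the Hive Warehouse Connector follows. The jar path, JDBC URL, jceks path, and table names are placeholders for your environment.

```shell
# Launch spark-shell with HWC and the session credential provider path.
spark-shell \
  --jars /usr/odp/current/hive_warehouse_connector/hive-warehouse-connector-assembly.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10000/default" \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/hive/s3.jceks <<'EOF'
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// DDL runs through HiveServer2, so the session credentials reach HMS.
hive.executeUpdate("CREATE EXTERNAL TABLE s3_hwc (id INT) LOCATION 's3a://my-bucket/warehouse/s3_hwc'")
EOF
```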

Spark DataFrame Write (Workaround for INSERT Limitation)

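A sketch of the workaround, assuming the table created earlier in this guide; all table, bucket, and partition names are placeholders. Since Spark SQL INSERT cannot propagate S3 credentials to HMS, write through the DataFrame API and then sync partition metadata from Beeline.

```shell
# Write with the DataFrame API straight to the table's S3A location,
# bypassing SQL INSERT.
spark-shell \
  --conf spark.hadoop.fs.s3a.access.key="$S3_ACCESS_KEY" \
  --conf spark.hadoop.fs.s3a.secret.key="$S3_SECRET_KEY" <<'EOF'
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.write.mode("append").format("orc")
  .save("s3a://my-bucket/warehouse/s3_demo/dt=2024-01-01")
EOF

# Afterwards, in Beeline, register the new partition with HMS:
#   MSCK REPAIR TABLE s3_demo;
```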