Hive

This guide covers configuring Hive to access S3A storage at the session level, enabling users to work with S3 buckets without requiring cluster-wide configuration changes or service restarts.

Configuration Methods

Method 1: Session-Level Configuration (JCEKS)

This method uses a secure credential file (JCEKS) stored on HDFS.

Step 1: Create a credential file on HDFS
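
A minimal sketch of this step, assuming the standard `hadoop credential` CLI is on the PATH. The provider path is a hypothetical placeholder. Because `-value` is omitted, the CLI prompts for each secret, which keeps the keys out of shell history.

```shell
# Store the S3A access and secret keys in a JCEKS keystore on HDFS.
# The provider path is supplied by the caller; the one shown in the usage
# line is a placeholder, not a required location.
create_s3a_credentials() {
  local provider="$1"

  # Each create command prompts for the secret value (no -value flag).
  hadoop credential create fs.s3a.access.key -provider "$provider" || return 1
  hadoop credential create fs.s3a.secret.key -provider "$provider" || return 1

  # List the stored aliases to confirm both entries exist.
  hadoop credential list -provider "$provider"
}

# Usage (hypothetical path):
#   create_s3a_credentials "jceks://hdfs/user/<user>/s3a.jceks"
```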

Step 2: Add to hive-site.xml
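
The exact properties for this step are not reproduced here. As an assumption, when SQL-standard-based authorization is enabled, Hive only lets a session `SET` properties that match a whitelist, so one plausible addition is an entry for `hive.security.authorization.sqlstd.confwhitelist.append`. The sketch below prints such a fragment; the helper name is hypothetical and the regular expression is illustrative, not a documented value.

```shell
# Print a hive-site.xml fragment that would allow sessions to SET the S3A and
# credential-provider properties. ASSUMPTION: the regex value is illustrative
# and must be adapted to the properties actually set per session.
print_whitelist_fragment() {
  cat <<'EOF'
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>fs\.s3a\..*|hadoop\.security\.credential\.provider\.path</value>
</property>
EOF
}

# Usage: print the fragment, then add it to hive-site.xml.
print_whitelist_fragment
```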

Step 3: Set session properties in Beeline
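
A sketch of this step, assuming the session points Hadoop at the keystore through `hadoop.security.credential.provider.path` and sets the S3A endpoint with `fs.s3a.endpoint`. The JDBC URL, keystore path, endpoint, bucket, and table name are placeholders, and the property set a given cluster needs may differ.

```shell
# Open a Beeline session, point it at the JCEKS keystore, and touch S3A.
# All arguments are caller-supplied placeholders.
run_s3a_session() {
  local jdbc_url="$1" provider="$2" endpoint="$3"

  # The SET statements apply only to this session; no restart is involved.
  beeline -u "$jdbc_url" -e "
    SET hadoop.security.credential.provider.path=${provider};
    SET fs.s3a.endpoint=${endpoint};
    CREATE EXTERNAL TABLE IF NOT EXISTS s3_demo (id INT)
      LOCATION 's3a://example-bucket/s3_demo/';
    SELECT COUNT(*) FROM s3_demo;
  "
}

# Usage (hypothetical values):
#   run_s3a_session "jdbc:hive2://hs2-host:10000/default" \
#     "jceks://hdfs/user/<user>/s3a.jceks" "https://s3.example.com"
```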

Method 2: Static Configuration (Cluster-Wide)

Add the following properties to HDFS, Hive, Tez, and Spark XML configuration files:
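
A sketch of what such properties might look like, using the standard S3A keys `fs.s3a.access.key`, `fs.s3a.secret.key`, and `fs.s3a.endpoint`. The helper name is hypothetical, all values are placeholders, and the exact set a given cluster requires is an assumption here.

```shell
# Print an XML fragment with static S3A credentials and endpoint.
# Values are inserted verbatim (no XML escaping), so this is only a sketch.
print_static_s3a_properties() {
  local access_key="$1" secret_key="$2" endpoint="$3"
  cat <<EOF
<property>
  <name>fs.s3a.access.key</name>
  <value>${access_key}</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>${secret_key}</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>${endpoint}</value>
</property>
EOF
}

# Usage (placeholder values):
#   print_static_s3a_properties "<access-key>" "<secret-key>" "https://s3.example.com"
```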

Update hive.conf.hidden.list in Custom hiveserver2-site so that the S3A credential properties are no longer in the list:
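
One way to derive the new value is to take the current `hive.conf.hidden.list` and drop the `fs.s3a.access.key` and `fs.s3a.secret.key` entries. The helper name below is hypothetical and the sample input is only an illustration, so start from the value actually configured on the cluster.

```shell
# Remove the S3A credential properties from a comma-separated hidden list.
strip_s3a_from_hidden_list() {
  printf '%s\n' "$1" |
    tr ',' '\n' |
    grep -vxF -e 'fs.s3a.access.key' -e 'fs.s3a.secret.key' |
    paste -sd, -
}

# Usage with an illustrative input value:
strip_s3a_from_hidden_list \
  "javax.jdo.option.ConnectionPassword,fs.s3a.access.key,fs.s3a.secret.key,hive.server2.keystore.password"
# prints: javax.jdo.option.ConnectionPassword,hive.server2.keystore.password
```

The printed string is what would then be set as hive.conf.hidden.list in Custom hiveserver2-site.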

Static configuration exposes credentials in plain text across the cluster and is not recommended for multi-tenant environments.

Known Limitations

  1. Multiple Buckets: Accessing multiple S3 buckets within a single session requires additional HMS changes and is currently not supported.
  2. Impala: Session-level S3 credential configuration is not supported.
  3. Spark SQL INSERT: S3 credentials do not propagate to HMS. Use the Spark DataFrame API instead.

Support Matrix

Supported Versions

  • ODP 3.2: 3.2.3.5-2 and later
  • ODP 3.3: 3.3.6.3-101 and later

Operation | Hive (Beeline) | Spark (Direct) | Spark + HWC
Create Table
Insert (Static/Dynamic) ⚠️*
Show Partitions
MSCK Repair
Alter Table
Drop Table
Alter / Drop Partition

Legend

  • ✅ Supported
  • ❌ Not Supported
  • ⚠️* Use DataFrame API; run MSCK REPAIR in Beeline after insert

Spark + HWC Example
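
A hedged sketch, assuming the connector exposes the `com.hortonworks.hwc.HiveWarehouseSession` API known from HDP-derived distributions. The jar path, JDBC URL, bucket, and table name are placeholders. Credential setup for the HiveServer2 session and any further HWC settings a cluster requires are not shown.

```shell
# Launch spark-shell with the HWC jar and run DDL and a query through
# HiveServer2. ASSUMPTION: the HWC package and class names match the
# com.hortonworks.hwc API; verify against the connector shipped on the cluster.
run_hwc_example() {
  local hwc_jar="$1" jdbc_url="$2"

  spark-shell \
    --jars "$hwc_jar" \
    --conf "spark.sql.hive.hiveserver2.jdbc.url=${jdbc_url}" <<'SCALA'
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// DDL is executed by HiveServer2, not by Spark itself.
hive.executeUpdate("CREATE EXTERNAL TABLE IF NOT EXISTS s3_demo (id INT) LOCATION 's3a://example-bucket/s3_demo/'")
hive.executeQuery("SELECT COUNT(*) FROM s3_demo").show()
SCALA
}

# Usage (hypothetical values):
#   run_hwc_example "/path/to/hive-warehouse-connector-assembly.jar" \
#     "jdbc:hive2://hs2-host:10000/"
```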

Spark DataFrame Write (Workaround for INSERT Limitation)
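
A sketch of the workaround: write the data with the DataFrame API directly to the table's S3A location, then register the new partitions from Beeline. It assumes Spark reads the keystore through `spark.hadoop.fs.s3a.security.credential.provider.path`, that the Beeline session uses `hadoop.security.credential.provider.path`, and that the table is partitioned by `dt` and stored as Parquet. Paths, JDBC URL, table name, and sample rows are placeholders.

```shell
# Write partitioned data to S3A with the DataFrame API, then repair the table.
write_then_repair() {
  local provider="$1" table_path="$2" jdbc_url="$3" table="$4"

  # 1. Write Parquet files straight to the table location, bypassing SQL INSERT.
  spark-shell \
    --conf "spark.hadoop.fs.s3a.security.credential.provider.path=${provider}" <<SCALA
val df = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "dt")
df.write.mode("append").partitionBy("dt").parquet("${table_path}")
SCALA

  # 2. Make the metastore aware of the partitions Spark just created.
  beeline -u "$jdbc_url" -e "
    SET hadoop.security.credential.provider.path=${provider};
    MSCK REPAIR TABLE ${table};
  "
}

# Usage (hypothetical values):
#   write_then_repair "jceks://hdfs/user/<user>/s3a.jceks" \
#     "s3a://example-bucket/warehouse/events" \
#     "jdbc:hive2://hs2-host:10000/default" "events"
```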
