Copying Encrypted Data Between Two ODP Clusters Using Ranger KMS

ODP Hadoop supports at-rest encryption by designating HDFS directories as encryption zones. To support this, Hadoop integrates with a Key Management Service (KMS) that stores and manages encryption keys securely. Ranger KMS is an open-source, scalable key management solution that integrates with Hadoop. These capabilities simplify security for critical workloads running on Hadoop, but they also complicate fault tolerance and disaster recovery (DR) planning.

Organizations typically replicate data to an off-site DR cluster to ensure business continuity. When the data is encrypted, however, replication becomes more involved: to keep the data confidential and avoid unnecessary overhead, organizations must decide how Ranger KMS keys are shared or managed between clusters. The following scenarios illustrate three approaches to copying encrypted data between two ODP clusters using Ranger KMS.

Scenario 1: Separate the KMS Instances Completely

In this scenario, Cluster A (Production) and Cluster B (DR) each run independent Ranger KMS instances, with no key exchange. This approach offers enhanced security isolation:

- If Cluster A’s KMS is compromised, Cluster B’s KMS remains unaffected.
- Each cluster generates its own encryption keys, even for identical HDFS paths.

However, this method adds overhead to replication, because the data must be decrypted with Cluster A’s keys and re-encrypted with Cluster B’s keys during the copy.

Data Copy Procedure

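Use DistCp as you would in a non-encrypted environment. DistCp transparently decrypts the data it reads from the source and encrypts it as it writes to the target, so the user running the job needs the Decrypt EEK permission on the relevant keys in both clusters' Ranger KMS policies. Because the target file is encrypted with a different key, its block checksums never match those of the source; add -skipcrccheck together with -update so that DistCp does not fail on the checksum comparison:

```bash
# On ClusterA (Production):
hadoop distcp -update -skipcrccheck \
  hdfs://ClusterA:8020/data/encrypted/fileA1.txt \
  hdfs://ClusterB:8020/data/encrypted/
```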
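
Because each cluster encrypts the file with its own key, the stored bytes differ and HDFS checksums cannot confirm that the copy is correct. One way to check it afterwards is to compare digests of the decrypted contents as read through the HDFS client. The following is a minimal sketch, assuming Bash on Linux with GNU sha256sum and a user allowed to decrypt files on both clusters; verify_copy, plaintext_sha256, and the HDFS_CAT override are hypothetical names, not ODP tools. Each file is hashed as a stream, so no plaintext is written to local disk.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a Scenario 1 copy decrypts to the same bytes as its source.
# HDFS_CAT is a hypothetical override hook; it defaults to the HDFS client.
HDFS_CAT=${HDFS_CAT:-"hdfs dfs -cat"}

# Print the SHA-256 digest of a file's plaintext as read through HDFS.
# The subshell body keeps pipefail local, so a failed read is reported
# as an error instead of silently hashing empty input.
plaintext_sha256() (
  set -o pipefail
  # Word splitting is intended: "hdfs dfs -cat" becomes a command plus arguments.
  $HDFS_CAT "$1" | sha256sum | awk '{print $1}'
)

# Succeed (and print MATCH) only if both files decrypt to identical bytes.
verify_copy() {
  local src_sum dst_sum
  src_sum=$(plaintext_sha256 "$1") || { echo "cannot read $1" >&2; return 1; }
  dst_sum=$(plaintext_sha256 "$2") || { echo "cannot read $2" >&2; return 1; }
  if [ "$src_sum" = "$dst_sum" ]; then
    echo "MATCH $1 $2"
  else
    echo "MISMATCH $1 $2" >&2
    return 1
  fi
}
```

For example: verify_copy hdfs://ClusterA:8020/data/encrypted/fileA1.txt hdfs://ClusterB:8020/data/encrypted/fileA1.txt. Reading each file in full doubles the I/O of the copy, so this suits spot checks better than routine validation of large datasets.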

Scenario 2: Two KMS Instances, One Shared Database

Here, Cluster A and Cluster B each run a separate Ranger KMS server but share the same database. Both KMS servers access identical keys, enabling transparent encryption and decryption across clusters.

Configure Ranger KMS on ClusterA (Production): Follow the standard ODP Ranger KMS setup process, including configuring the database.
Configure ClusterB (DR) to Use the Same Database:
- Replicate the KMS database configuration from ClusterA to ClusterB, including the KMS master key password, so that ClusterB's KMS can read the keys stored in the shared database.
- Disable any “Setup Database and Database User” option to prevent re-initializing or overwriting the shared database.
Create Encryption Keys and Zones:
- On ClusterA (Production):

```bash
hadoop key create ProdKey1
# The zone root must be an existing, empty directory.
hdfs dfs -mkdir -p /data/encrypted
hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
```

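- On ClusterB (DR), where ProdKey1 is already available through the shared database:

```bash
hdfs dfs -mkdir -p /data/encrypted
hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
```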

Copy Data Without Re-Encryption: Because both clusters share the same keys, the data can be copied as raw encrypted bytes through the virtual /.reserved/raw path, without being decrypted and re-encrypted. Only the HDFS superuser can access /.reserved/raw, so run DistCp as that user. Include the -px flag to preserve extended attributes, which carry the EDEK (Encrypted Data Encryption Key) needed to decrypt each file:

```bash
hadoop distcp -px \
  hdfs://ClusterA:8020/.reserved/raw/data/encrypted/fileA1.txt \
  hdfs://ClusterB:8020/.reserved/raw/data/encrypted/
```
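
For a raw copy, both DistCp arguments need the /.reserved/raw prefix inserted directly after the scheme and authority (for example, after hdfs://ClusterA:8020). The helper below is a minimal sketch of that rewrite, assuming Bash; to_raw_path is a hypothetical name, and it only rewrites strings, without checking that the paths exist or that the caller is the HDFS superuser.

```bash
#!/usr/bin/env bash
# Sketch: rewrite an HDFS path so that it points into the /.reserved/raw namespace.
# Accepts hdfs://authority/abs/path URIs and bare absolute paths.
to_raw_path() {
  local uri=$1 prefix path rest authority
  case "$uri" in
    hdfs://*)
      rest=${uri#hdfs://}                 # authority/abs/path
      authority=${rest%%/*}               # e.g. ClusterA:8020
      prefix="hdfs://$authority"
      if [ "$rest" = "$authority" ]; then
        path=/                            # the URI had no path component
      else
        path=/${rest#*/}                  # e.g. /data/encrypted/
      fi ;;
    /*)
      prefix="" path=$uri ;;
    *)
      echo "unsupported path: $uri" >&2
      return 1 ;;
  esac
  case "$path" in
    /.reserved/raw|/.reserved/raw/*) ;;   # already in the raw namespace
    *) path="/.reserved/raw$path" ;;
  esac
  printf '%s%s\n' "$prefix" "$path"
}

# Example (run as the HDFS superuser):
# hadoop distcp -px \
#   "$(to_raw_path hdfs://ClusterA:8020/data/encrypted/fileA1.txt)" \
#   "$(to_raw_path hdfs://ClusterB:8020/data/encrypted/)"
```

Leaving already-raw paths unchanged makes the helper safe to apply twice, which keeps wrapper scripts simple.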

Scenario 3: Two KMS Instances, Two Separate Databases

In this scenario, each cluster operates its own Ranger KMS server and maintains independent databases. Keys must be exported from ClusterA (Production) and imported into ClusterB (DR) to allow ClusterB to decrypt the data.

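Set Up Each KMS Independently: Install and configure both KMS servers according to the ODP documentation, each with its own database.
Create Keys on ClusterA (Production): Create the keys and encryption zones on ClusterA as usual, for example with hadoop key create ProdKey1 followed by hdfs crypto -createZone.
Export Keys from ClusterA (Production): On the node where the ClusterA KMS runs, run the export script: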

```bash
cd /usr/odp/current/ranger-kms
./exportKeysToJCEKS.sh ClusterAProd.keystore JCEKS
```

You will be prompted for passwords for the keystore and for the individual keys.

Securely Transfer the Keystore File

```bash
scp ClusterAProd.keystore ClusterBNode:/usr/odp/current/ranger-kms/
```
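
Because the keystore carries the exported key material, it is worth confirming that the copy on the DR node is intact before importing it. A minimal sketch, assuming Bash, GNU sha256sum on both hosts, and SSH access to the node called ClusterBNode above; verify_keystore_copy and the REMOTE_SHELL hook (which defaults to ssh and exists only so the check can be tried locally) are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch: compare SHA-256 digests of the local keystore and its copy on the DR node.
# REMOTE_SHELL is a hypothetical hook; it defaults to ssh.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

verify_keystore_copy() {
  local local_file=$1 host=$2 remote_file=$3 local_sum remote_sum
  local_sum=$(sha256sum "$local_file" | awk '{print $1}')
  remote_sum=$($REMOTE_SHELL "$host" sha256sum "$remote_file" | awk '{print $1}')
  # An empty digest means a file could not be read, which counts as a failure.
  if [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]; then
    echo "keystore copy verified on $host"
  else
    echo "keystore digest mismatch on $host" >&2
    return 1
  fi
}

# Example:
# verify_keystore_copy ClusterAProd.keystore ClusterBNode \
#   /usr/odp/current/ranger-kms/ClusterAProd.keystore
```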

Import Keys into ClusterB (DR) on the node where the Cluster B KMS runs:

```bash
cd /usr/odp/current/ranger-kms
./importJCEKSKeys.sh ClusterAProd.keystore JCEKS
```

Use the same passwords entered during export.

Create the Corresponding Encryption Zone on ClusterB

```bash
hdfs dfs -mkdir -p /data/encrypted
hdfs crypto -createZone -keyName ProdKey1 -path /data/encrypted
```

Copy Encrypted Data: Now that Cluster B’s KMS has the corresponding keys, the data can be replicated via /.reserved/raw to avoid unnecessary decryption and re-encryption:

```bash
hadoop distcp -px \
  hdfs://ClusterA:8020/.reserved/raw/data/encrypted/fileA1.txt \
  hdfs://ClusterB:8020/.reserved/raw/data/encrypted/
```

Each time new keys are created or existing keys are rotated on Cluster A (Production), you must repeat the export–import procedure to keep Cluster B’s KMS in sync.
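
To tell when an export and import round is due, the key names in the two KMS instances can be compared periodically. A minimal sketch follows, assuming Bash, a user with permission to list keys in both Ranger KMS instances, and that hadoop key list prints a header line beginning with "Listing keys" followed by one key name per line. The KMS host names, the port (9292 here), and all function names are assumptions for illustration. It compares names only, so a rotation (a new version of an existing key) does not show up and still has to be tracked separately.

```bash
#!/usr/bin/env bash
# Hypothetical provider URIs; replace the hosts with your own KMS nodes.
KMS_PROD=${KMS_PROD:-kms://http@kms-prod.example.com:9292/kms}
KMS_DR=${KMS_DR:-kms://http@kms-dr.example.com:9292/kms}

list_prod_keys() { hadoop key list -provider "$KMS_PROD"; }
list_dr_keys()   { hadoop key list -provider "$KMS_DR"; }

# Reduce a raw key listing (stdin) to sorted, unique key names.
# Assumes a "Listing keys ..." header line followed by one name per line.
normalize_key_list() {
  grep -v '^Listing keys' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
    | LC_ALL=C sort -u
}

# Print names produced by the first listing command but missing from the
# second, i.e. keys that still need to be exported and imported.
keys_missing_on_dr() {
  LC_ALL=C comm -23 <("$1" | normalize_key_list) <("$2" | normalize_key_list)
}

# Example: keys_missing_on_dr list_prod_keys list_dr_keys
```

An empty result means every key name on ClusterA also exists on ClusterB; any name printed still needs an export and import round.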

Known Issues and Resolutions

Below are some common issues you might encounter when setting up or using Ranger KMS in ODP Hadoop, along with recommended solutions.

KMS Key Export Script Fails
- Symptom: Running the exportKeysToJCEKS.sh script ends with an error or does not proceed.
- Cause: JAVA_HOME is not set or not set correctly.
- Solution:
  1. Verify that Java is installed on the system.
  2. Export JAVA_HOME as the JDK installation directory (not the path of the java binary itself): export JAVA_HOME=/path/to/jdk
  3. Rerun the export script.
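
If Java is installed but its location is unknown, a JAVA_HOME candidate can often be derived from the java binary on the PATH. A minimal sketch, assuming Bash, GNU readlink -f, and a layout in which the resolved binary is <JAVA_HOME>/bin/java; detect_java_home is a hypothetical helper. On some JDK 8 installations the result is the bundled jre directory, which also contains bin/java.

```bash
#!/usr/bin/env bash
# Sketch: derive a JAVA_HOME candidate from the java binary found on PATH.
detect_java_home() {
  local java_bin
  java_bin=$(command -v java) || { echo "java not found on PATH" >&2; return 1; }
  java_bin=$(readlink -f "$java_bin")   # follow alternatives/symlinks to the real binary
  dirname "$(dirname "$java_bin")"      # strip the trailing /bin/java
}

# Example:
# export JAVA_HOME="$(detect_java_home)"
# cd /usr/odp/current/ranger-kms && ./exportKeysToJCEKS.sh ClusterAProd.keystore JCEKS
```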
Export to JCEKS Fails with a “Rejected by the jceks.key.serialFilter” Exception
- Symptom: The following exception appears:
  Caused by: java.security.UnrecoverableKeyException: Rejected by the jceks.key.serialFilter or jdk.serialFilter property
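- Cause: The JCEKS key serialization filter (jceks.key.serialFilter in java.security) allows only a small set of JDK classes by default and rejects everything else (the trailing !*), including the Hadoop key classes in the org.apache.hadoop.crypto.key package used for the key entries.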
- Solution:
  1. Edit the java.security file (location may vary depending on your Java installation).
  2. Add or modify the filter to allow the org.apache.hadoop.crypto.key package:

```properties
jceks.key.serialFilter = java.lang.Enum;java.security.KeyRep;\
    java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;\
    org.apache.hadoop.crypto.key.**;!*
```

  3. Save the changes, restart the KMS or other relevant services, and rerun the export script.
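
The java.security file lives in different places depending on the Java version: conf/security/java.security under JAVA_HOME for Java 9 and later, and jre/lib/security/java.security for a JDK 8 (or lib/security/java.security when JAVA_HOME points at a JRE). A minimal sketch, assuming Bash, that checks these locations in order; find_java_security is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print the first java.security file found under a Java home directory.
find_java_security() {
  local home=${1:-$JAVA_HOME} f
  for f in "$home/conf/security/java.security" \
           "$home/jre/lib/security/java.security" \
           "$home/lib/security/java.security"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no java.security found under ${home:-<unset JAVA_HOME>}" >&2
  return 1
}

# Example: show the current filter (and its continuation lines) before editing.
# grep -n -A3 '^jceks.key.serialFilter' "$(find_java_security)"
```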

DistCp Job Fails with Ranger KMS HA Enabled: “Unable to Get Kerberos TGT”
- Symptom: A DistCp job aborts or fails to authenticate when using a high-availability (HA) Ranger KMS setup.
- Cause: Ranger KMS is not properly handling Kerberos tickets, and the Hadoop DistCp job cannot acquire valid credentials.
- Solution:
  - Change the Ranger KMS authentication type from Kerberos to simple:

```properties
# In the Ranger KMS configuration:
hadoop.authentication.type = simple
```

  - Restart the Ranger KMS service.
  - Rerun the DistCp job, which should now complete successfully.
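
Because the error reports a missing TGT, it can also be worth confirming that the user submitting the DistCp job holds a valid Kerberos ticket, independently of the KMS authentication change. A minimal sketch, assuming Bash and MIT Kerberos client tools (klist -s produces no output and only sets its exit status); ensure_tgt, the keytab path, and the principal are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: obtain a TGT from a keytab only when no valid ticket is cached.
# The keytab path and principal in the example are hypothetical.
ensure_tgt() {
  local keytab=$1 principal=$2
  if klist -s; then
    return 0        # exit status 0: a valid, unexpired ticket cache exists
  fi
  echo "No valid TGT in the credential cache; running kinit for $principal" >&2
  kinit -kt "$keytab" "$principal"
}

# Example:
# ensure_tgt /etc/security/keytabs/distcp.keytab distcp@EXAMPLE.COM
```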

Last updated on Jan 22, 2025
