Enabling Cross-Cluster and Cross-Realm Kerberos Authentication for Hadoop Data Migration

Overview

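Migrating data between Hadoop clusters located in different Kerberos realms requires establishing cross-realm authentication. This guide provides step-by-step instructions for the following scenarios:

  • Scenario 1: Cross-realm trust between clusters that both use MIT KDCs.
  • Scenario 2: Cross-realm trust between an MIT KDC and an Active Directory KDC.
  • Scenario 3: Cross-realm trust between clusters that both use Active Directory KDCs.
  • Scenario 4: Data migration between a secure (Kerberos-enabled) cluster and an unsecure cluster.

Follow the instructions on this page to set up cross-realm trusts, configure Kerberos and Hadoop settings, and migrate data using distcp.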

Prerequisites

  • Administrative Access:

    • For MIT KDCs: Root or administrative access to both KDC servers.
    • For Active Directory Domains: Domain Administrator privileges.
  • Network Connectivity:

    • Ensure all clusters and their respective KDCs or domain controllers can communicate over the network.
  • Consistent User and Group Identities:

    • Usernames and group names should be consistent across clusters for seamless access control.
  • DNS Configuration:

    • Proper DNS setup for name resolution between clusters and KDCs.
  • Time Synchronization:

    • All systems must have synchronized clocks (use NTP) to prevent Kerberos authentication failures.

Configure and Validate the DNS Settings

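Correct DNS configuration is crucial for Kerberos authentication and Hadoop operations. Kerberos service principals include fully qualified hostnames, so nodes must resolve hosts in the other realm (forward and reverse) consistently with the names used in those principals.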

Configure DNS Forwarding or Conditional Forwarders

On each domain controller and KDC:

  • Active Directory Domain Controllers:

    • DNS Forwarding:
      • Open DNS Manager.
      • Right-click the DNS server and select Properties.
      • On the Forwarders tab, add the IP address of the other domain's DNS server.
    • Conditional Forwarders:
      • In DNS Manager, expand the server and right-click Conditional Forwarders.
      • Select New Conditional Forwarder.
      • Enter the domain name of the other realm and the IP address of its DNS server.
  • MIT KDC Servers:

    • Update /etc/resolv.conf:

      • Add the nameserver entries for the other realm's DNS servers.
    • Configure DNS Zones:

      • Modify your DNS server to include zones for the other domain, if applicable.

Validate the DNS Resolution

On a node in each domain:

  1. Test Forward Lookup:
    • Replace hostname.otherdomain.com with an actual hostname from the other domain.
    • Verify that it resolves to the correct IP address.
  2. Test Reverse Lookup:
    • Replace IP_ADDRESS with the IP address of a host in the other domain.
    • Verify that it resolves to the correct hostname.
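Both lookups can be run with a small helper. This is a minimal sketch that assumes a Linux host with getent and awk; check_dns is a hypothetical name. getent goes through the system resolver (NSS), which is the same path Kerberos and Hadoop use, so /etc/hosts entries are included. To query the DNS servers directly, use dig or nslookup instead.

```shell
#!/usr/bin/env bash
# check_dns HOST: forward-resolve HOST, then reverse-resolve the first address returned.
# Hypothetical helper. getent uses the system resolver (NSS), so /etc/hosts entries count.
check_dns() {
  local host="$1" ip name
  # Forward lookup: first address field of the getent output
  ip=$(getent hosts "$host" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "forward lookup failed: $host" >&2
    return 1
  fi
  # Reverse lookup: official hostname recorded for that address
  name=$(getent hosts "$ip" | awk '{print $2; exit}')
  if [ -z "$name" ]; then
    echo "reverse lookup failed: $ip" >&2
    return 1
  fi
  echo "$host -> $ip -> $name"
}
```

For example, check_dns hostname.otherdomain.com should print the host, its address, and a reverse name that matches the host's fully qualified name.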

Verify Network Connectivity

  1. Ping Test
  2. Port Connectivity: Test connectivity to critical ports (for example, Kerberos port 88).
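The port test can be sketched as a small helper. It assumes bash, for its /dev/tcp redirection, and the coreutils timeout command; check_port is a hypothetical name. It tests TCP only. Kerberos also uses UDP port 88, so a successful check does not prove that UDP traffic is allowed.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeed if a TCP connection to HOST:PORT opens within 5 seconds.
# Hypothetical helper; relies on bash's /dev/tcp redirection, so nc or telnet is not needed.
check_port() {
  local host="$1" port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open: ${host}:${port}"
  else
    echo "closed or unreachable: ${host}:${port}" >&2
    return 1
  fi
}
```

For the ping test, run ping -c 4 followed by the hostname of a KDC or domain controller in the other domain. Then run check_port with the same hostname and port 88.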

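For a cross-realm trust to function properly, both Key Distribution Centers (KDCs) must contain the same cross-realm krbtgt principal (for example, krbtgt/REALM_B.COM@REALM_A.COM). That principal must have the same password and key version number on both KDCs, and it must use the same encryption types on both sides.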

Scenario 1: Cross-Realm Trust Between Clusters with Both MIT KDCs

Configure Each MIT KDC Server

On MIT KDC Server A (REALM_A.COM):

  1. Create Cross-Realm Principal
  2. Export Keytab
  3. Transfer Keytab to KDC B

On MIT KDC Server B (REALM_B.COM):

  1. Create Cross-Realm Principal
  2. Export Keytab
  3. Transfer Keytab to KDC A
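The same three steps run on each KDC, with the realm names swapped. The helper below is a sketch that assumes it runs as root on the KDC with kadmin.local and scp available. The function name, principal names, keytab paths, and peer hostnames are illustrative. The -norandkey option matters here: by default, ktadd generates new random keys, and those would no longer match the password shared with the other KDC.

```shell
#!/usr/bin/env bash
# export_crossrealm_principal PRINCIPAL KEYTAB PEER: run on a KDC as root.
# Hypothetical helper covering the three steps: create, export, transfer.
export_crossrealm_principal() {
  local principal="$1" keytab="$2" peer="$3"
  # 1. Create the cross-realm principal; kadmin.local prompts for its password.
  #    Use the same password (and encryption types) when it is created on the other KDC.
  kadmin.local -q "addprinc ${principal}" || return 1
  # 2. Export the existing keys; -norandkey (kadmin.local only) stops ktadd
  #    from generating new random keys, which would break the shared password.
  kadmin.local -q "ktadd -norandkey -k ${keytab} ${principal}" || return 1
  # 3. Copy the keytab to the peer KDC over SSH.
  scp "${keytab}" "root@${peer}:${keytab}"
}
```

For example, on KDC A run export_crossrealm_principal krbtgt/REALM_B.COM@REALM_A.COM /tmp/krbtgt_REALM_B.COM.keytab kdc-b.realm_b.com. On KDC B, make the same call with REALM_A.COM and REALM_B.COM swapped and kdc-a.realm_a.com as the peer.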

Merge Keytabs on Both KDCs:

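  • On KDC A: merge the keytab received from KDC B with the keytab exported locally.
  • On KDC B: merge the keytab received from KDC A with the keytab exported locally.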
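Merging can be done with MIT ktutil, which reads keytabs with rkt and writes the combined key list with wkt. This is a sketch that assumes MIT ktutil is installed; merge_keytabs and the paths in the example are hypothetical.

```shell
#!/usr/bin/env bash
# merge_keytabs OUTPUT INPUT...: combine the entries of every INPUT keytab into OUTPUT.
# Hypothetical helper; drives MIT ktutil by feeding it commands on stdin.
merge_keytabs() {
  local out="$1" kt script=""
  shift
  for kt in "$@"; do
    script+="rkt ${kt}"$'\n'   # read each keytab into ktutil's key list
  done
  script+="wkt ${out}"$'\n'"quit"$'\n'   # write the combined key list, then exit
  printf '%s' "$script" | ktutil
}
```

For example, run merge_keytabs /tmp/krbtgt_cross.keytab /tmp/krbtgt_REALM_B.COM.keytab /tmp/krbtgt_REALM_A.COM.keytab. Then check the result with klist -kte /tmp/krbtgt_cross.keytab.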

Edit /etc/krb5.conf on Both KDCs:

  • Add the [realms] and [capaths] configuration for both realms.
  • Explanation:
    • The [capaths] section defines the authentication paths between realms.
    • The dot (.) indicates a direct trust relationship.
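For two realms that trust each other directly, the [capaths] section could look like the fragment below. This sketch writes the fragment to a local file for review instead of editing /etc/krb5.conf in place; the file name is arbitrary. The [realms] section must also list the KDCs of both realms.

```shell
#!/usr/bin/env bash
# Writes the [capaths] fragment for a direct two-way trust between REALM_A.COM and
# REALM_B.COM to ./krb5.conf.capaths; merge it into /etc/krb5.conf on both KDCs.
cat > krb5.conf.capaths <<'EOF'
[capaths]
  REALM_A.COM = {
    REALM_B.COM = .
  }
  REALM_B.COM = {
    REALM_A.COM = .
  }
EOF
```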

Scenario 2: Cross-Realm Trust Between an MIT KDC and an Active Directory KDC

Configure the Active Directory Domain Controller

On the AD Domain Controller (ADDOMAIN.COM):

  1. Create a User for Cross-Realm Trust:

    • Open Active Directory Users and Computers.
    • Create a user named krbtgt/MITREALM.COM.
    • Set a strong password and select Password never expires.
    • Uncheck User must change password at next logon.
  2. Map the MIT Realm to the AD Domain:

    • Open Command Prompt as Administrator.
    • Run the following command and enter the password when prompted.
  3. Copy the Keytab to the MIT KDC Server:
    • Transfer krbtgt_MITREALM.COM.keytab securely to the MIT KDC server.

Configure the MIT KDC Server

On the MIT KDC Server (MITREALM.COM):

  1. Create the Trust Principal:
    • Use the same password that was set on the AD side.
  2. Import the Keytab from AD.
  3. Verify the Keytab Entries.
  4. Update /etc/krb5.conf:
    • Add the AD realm under [realms] and update [capaths].
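The first three steps can be sketched as one helper. It assumes the helper runs as root on the MIT KDC with kadmin.local, ktutil, and klist available. The helper name and paths are hypothetical. The principal name krbtgt/MITREALM.COM@ADDOMAIN.COM used in the example is an assumption based on the AD account created above.

```shell
#!/usr/bin/env bash
# setup_ad_trust_principal PRINCIPAL AD_KEYTAB OUT_KEYTAB: run as root on the MIT KDC.
# Hypothetical helper covering the create, import, and verify steps.
setup_ad_trust_principal() {
  local principal="$1" ad_keytab="$2" out_keytab="$3"
  # 1. Create the trust principal; enter the same password that was set in AD.
  kadmin.local -q "addprinc ${principal}" || return 1
  # 2. Import the keytab copied from the AD domain controller (ktutil reads stdin).
  printf 'rkt %s\nwkt %s\nquit\n' "$ad_keytab" "$out_keytab" | ktutil || return 1
  # 3. List the entries with timestamps (-t) and encryption types (-e).
  klist -kte "$out_keytab"
}
```

For example, run setup_ad_trust_principal krbtgt/MITREALM.COM@ADDOMAIN.COM /tmp/krbtgt_MITREALM.COM.keytab /tmp/krbtgt_imported.keytab. The klist output should list the principal with the encryption types configured on the AD side.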

Scenario 3: Cross-Realm Trust Between Clusters with Both Active Directory KDCs

Types of Trust

  • External Trust: Domain-to-domain trust outside the forest.
  • Forest Trust: Trust between two AD forests, allowing all domains within to trust each other.

Trust Direction and Authentication Scope

  • Trust Direction:

    • One-Way Trust: Only one domain trusts the other.
    • Two-Way Trust: Both domains trust each other.
  • Authentication Scope:

    • Forest-Wide Authentication: All users can authenticate.
    • Selective Authentication: Only specified users/groups can authenticate.

Choose the Appropriate Trust Type

  • Recommended: Two-Way Forest Trust with Forest-Wide Authentication.

Create a Two-Way Forest Trust

Option A: Using the GUI (Active Directory Domains and Trusts)

On Domain Controller of DOMAIN_A.COM:

  1. Open Active Directory Domains and Trusts:

    • Navigate to Start > Administrative Tools > Active Directory Domains and Trusts.
  2. Create New Trust:

    • Right-click DOMAIN_A.COM > Properties > Trusts tab > New Trust.
  3. Follow the Wizard:

    • Enter DOMAIN_B.COM as the trust name.
    • Select Forest Trust.
    • Choose Two-way trust.
    • Select Both this domain and the specified domain.
    • Choose Forest-wide authentication.
    • Set a secure trust password.
    • Complete the wizard.

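If you selected This domain only in the wizard, repeat the steps on the DOMAIN_B.COM domain controller. Selecting Both this domain and the specified domain creates both sides of the trust in one pass.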

Option B: Using the Command-Line (netdom)

On DOMAIN_A.COM Domain Controller:

  1. Run the following command and enter the passwords when prompted.

On DOMAIN_B.COM Domain Controller:


Option C: Using PowerShell

On DOMAIN_A.COM Domain Controller:


On DOMAIN_B.COM Domain Controller:


Validate and Confirm the Trust

Using the GUI:

  • On both domain controllers:
    • Active Directory Domains and Trusts > Right-click domain > Properties > Trusts tab > Select trust > Properties > Validate.

Using the Command Line (netdom)

On DOMAIN_A.COM:


On DOMAIN_B.COM:


Using PowerShell

On DOMAIN_A.COM:

On DOMAIN_B.COM:


Scenario 4: Data Migration Between Secure and Unsecure Clusters

When migrating data between a secure (Kerberos-enabled) and an unsecure Hadoop cluster, specific configurations are required.

Configure the Unsecure Cluster

Replace SECURE_REALM.COM with your secure cluster's realm name.

Update the core-site.xml

Add the following properties:

  1. Set the Secure Cluster Realm.
  2. Modify the hadoop.security.auth_to_local rules.
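The auth_to_local step can be sketched as follows, assuming SECURE_REALM.COM is the secure cluster's realm. The script writes the property to a local file so it can be reviewed and then pasted inside the configuration element of core-site.xml. The rules map user@SECURE_REALM.COM to user and service/host@SECURE_REALM.COM to service. DEFAULT keeps Hadoop's standard mapping for any other principal.

```shell
#!/usr/bin/env bash
# Writes a sample auth_to_local property for the unsecure cluster's core-site.xml
# to ./core-site.auth_to_local.xml. The rules map user@SECURE_REALM.COM to user
# and service/host@SECURE_REALM.COM to service.
cat > core-site.auth_to_local.xml <<'EOF'
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@SECURE_REALM\.COM)s/@.*//
    RULE:[2:$1@$0](.*@SECURE_REALM\.COM)s/@.*//
    DEFAULT
  </value>
</property>
EOF
```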

Restart Affected Services

  • Restart the HDFS, YARN, and MapReduce services.

Perform the distcp Operation

From the Secure Cluster to the Unsecure Cluster:

Explanation:

  • The -D ipc.client.fallback-to-simple-auth-allowed=true option lets the distcp client fall back to simple authentication when it connects to the unsecure cluster. The client itself authenticates with Kerberos on the secure cluster.
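The copy can be sketched as a helper that assumes it runs on the secure cluster after kinit. The helper name and the example NameNode URIs are hypothetical.

```shell
#!/usr/bin/env bash
# distcp_to_unsecure SRC DST: run on the secure cluster after kinit.
# Hypothetical helper; SRC and DST are full URIs, for example
# hdfs://secure-nn.example.com:8020/data and hdfs://unsecure-nn.example.com:8020/data.
distcp_to_unsecure() {
  # Generic -D options must come before the source and destination arguments.
  hadoop distcp \
    -D ipc.client.fallback-to-simple-auth-allowed=true \
    "$1" "$2"
}
```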

Common Steps for All Scenarios

Configure Kerberos (krb5.conf) on All Cluster Nodes

On all nodes in both clusters, update /etc/krb5.conf so that it recognizes all involved realms.

Sample /etc/krb5.conf:
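Below is a sample for a node in REALM_A.COM that must also reach REALM_B.COM. Hostnames and domains are placeholders. On REALM_B.COM nodes, default_realm would be REALM_B.COM instead. The script writes the sample to a local file so it can be compared with the existing /etc/krb5.conf before anything is replaced.

```shell
#!/usr/bin/env bash
# Writes a sample client krb5.conf for a REALM_A.COM node that also trusts REALM_B.COM
# to ./krb5.conf.sample. Hostnames are placeholders; review before replacing /etc/krb5.conf.
cat > krb5.conf.sample <<'EOF'
[libdefaults]
  default_realm = REALM_A.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false

[realms]
  REALM_A.COM = {
    kdc = kdc-a.realm_a.com
    admin_server = kdc-a.realm_a.com
  }
  REALM_B.COM = {
    kdc = kdc-b.realm_b.com
    admin_server = kdc-b.realm_b.com
  }

[domain_realm]
  .realm_a.com = REALM_A.COM
  realm_a.com = REALM_A.COM
  .realm_b.com = REALM_B.COM
  realm_b.com = REALM_B.COM

[capaths]
  REALM_A.COM = {
    REALM_B.COM = .
  }
  REALM_B.COM = {
    REALM_A.COM = .
  }
EOF
```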

Update the Hadoop Configuration

Modify core-site.xml: add or update the hadoop.security.auth_to_local property.
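With several trusted realms, the rules need one pair per realm. Below is a hypothetical helper that prints the property for any list of realms. For each listed realm, it maps user@REALM to user and service/host@REALM to service, and it escapes the dots in realm names. The rule list ends with DEFAULT. Review the output before pasting it into core-site.xml, because service principals are often mapped to a specific local user (for example, hdfs) instead.

```shell
#!/usr/bin/env bash
# auth_to_local_property REALM...: print a core-site.xml property whose rules map
# user@REALM and service/host@REALM to their first component for each listed realm.
# Hypothetical helper; review the output before pasting it into core-site.xml.
auth_to_local_property() {
  local realm escaped
  printf '<property>\n  <name>hadoop.security.auth_to_local</name>\n  <value>\n'
  for realm in "$@"; do
    # Escape dots so they match literally in the rule's regular expression.
    escaped=$(printf '%s' "$realm" | sed 's/\./\\./g')
    printf '    RULE:[1:$1@$0](.*@%s)s/@.*//\n' "$escaped"
    printf '    RULE:[2:$1@$0](.*@%s)s/@.*//\n' "$escaped"
  done
  printf '    DEFAULT\n  </value>\n</property>\n'
}
```

For example, auth_to_local_property REALM_A.COM REALM_B.COM prints four RULE lines followed by DEFAULT, wrapped in the property element.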

Update hdfs-site.xml: add the following property.
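The property itself is not named here. One property commonly set for cross-realm distcp is dfs.namenode.kerberos.principal.pattern; treat it as an assumption about what this step intends. It controls which NameNode principals the HDFS client accepts, and the value * accepts NameNode principals from any realm. The sketch writes the property to a local file for review.

```shell
#!/usr/bin/env bash
# Writes a sample hdfs-site.xml property to ./hdfs-site.principal-pattern.xml.
# dfs.namenode.kerberos.principal.pattern controls which NameNode principals the
# HDFS client accepts; "*" accepts NameNode principals from any realm.
cat > hdfs-site.principal-pattern.xml <<'EOF'
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <value>*</value>
</property>
EOF
```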

Synchronize Time Across Clusters

  • Ensure that all systems use NTP or similar services for time synchronization.

Restart Hadoop Services

  • Restart the Hadoop services on all clusters to apply new configurations.

Verification Steps

Test Kerberos Authentication

On a Node in Each Cluster:

  1. Obtain a Kerberos Ticket: run kinit as a user in the local realm.
  2. Access HDFS on the Other Cluster: list a directory on the other cluster with hdfs dfs -ls.

Expected Result: The contents of the remote HDFS directory are listed without authentication errors.
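Both steps can be sketched as one helper. It assumes MIT kinit and klist and the hdfs client are on the PATH. The helper name, principal, and NameNode URI are hypothetical.

```shell
#!/usr/bin/env bash
# verify_cross_realm_access PRINCIPAL REMOTE_URI: get a ticket in the local realm,
# then list a directory on the other cluster. Hypothetical helper.
verify_cross_realm_access() {
  local principal="$1" remote_uri="$2"
  kinit "$principal" || return 1   # prompts for the password (use -kt KEYTAB for a keytab)
  klist                            # shows the ticket-granting ticket for the local realm
  hdfs dfs -ls "$remote_uri"       # succeeds only if the cross-realm trust works
}
```

For example, on a REALM_A.COM node, run verify_cross_realm_access user@REALM_A.COM hdfs://nn-b.realm_b.com:8020/tmp.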

Troubleshooting

  • Check Kerberos Tickets: run klist to list the tickets in the current cache.
  • Review the Hadoop Logs

    • Check logs under /var/log/hadoop/ for errors.
  • Common Issues

    • Clock skew between servers.
    • Incorrect krb5.conf configurations.
    • Firewall blocking necessary ports.
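The sketch below combines the ticket check with client-side Kerberos debugging. sun.security.krb5.debug is a standard JVM system property, and HADOOP_OPTS passes it to the hdfs client. The helper name is hypothetical. For MIT command-line tools such as kinit, setting KRB5_TRACE=/dev/stderr prints a similar trace.

```shell
#!/usr/bin/env bash
# debug_remote_ls URI: show cached tickets, then list URI with JVM Kerberos
# debugging enabled for the Hadoop client. Hypothetical helper.
debug_remote_ls() {
  # Current tickets and their encryption types (MIT klist).
  klist -e
  # Kerberos debug messages from the JVM are printed alongside the command output.
  HADOOP_OPTS="${HADOOP_OPTS:-} -Dsun.security.krb5.debug=true" hdfs dfs -ls "$1"
}
```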

Performing the Data Migration

Run distcp Between Clusters

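  • For Secure Clusters: run hadoop distcp with the source and destination NameNode URIs after obtaining a Kerberos ticket.
  • For Secure to Unsecure Cluster (Scenario 4): also pass -D ipc.client.fallback-to-simple-auth-allowed=true, as described in Scenario 4.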
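The migration step can be sketched as follows. It assumes a keytab for the migration user and MIT kinit on the PATH; migrate, the principal, and the paths are hypothetical. Extra distcp arguments are placed before the distcp options, so for Scenario 4 you can add -D ipc.client.fallback-to-simple-auth-allowed=true there.

```shell
#!/usr/bin/env bash
# migrate PRINCIPAL KEYTAB SRC DST [EXTRA_ARGS...]: authenticate with a keytab,
# then copy SRC to DST with distcp. Hypothetical helper.
migrate() {
  local principal="$1" keytab="$2" src="$3" dst="$4"
  shift 4
  kinit -kt "$keytab" "$principal" || return 1
  # Extra arguments (such as generic -D options) go first;
  # -update skips files that already match at the destination.
  hadoop distcp "$@" -update "$src" "$dst"
}
```

With -update, distcp copies the contents of the source directory into the destination directory instead of creating a subdirectory named after the source.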

Additional Options:

  • If you experience issues, force Kerberos to use TCP by setting udp_preference_limit = 1 in the [libdefaults] section of /etc/krb5.conf.

Verify Data Migration

  • Use hdfs dfs -ls to check the destination directory.
  • Verify file integrity and permissions.
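A quick first check is to compare the counts that hdfs dfs -count reports for the source and the destination. The helper is a hypothetical sketch, and matching counts do not prove that file contents are identical. For cross-cluster comparisons, pass full URIs so each path is read from the right cluster.

```shell
#!/usr/bin/env bash
# compare_counts SRC DST: compare directory count, file count, and total bytes
# reported by "hdfs dfs -count" for two paths. Hypothetical helper.
compare_counts() {
  local src dst
  # -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME; keep the first three fields.
  src=$(hdfs dfs -count "$1" | awk '{print $1, $2, $3}') || return 1
  dst=$(hdfs dfs -count "$2" | awk '{print $1, $2, $3}') || return 1
  if [ "$src" = "$dst" ]; then
    echo "match: $src"
  else
    echo "mismatch: source=$src destination=$dst" >&2
    return 1
  fi
}
```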

Summary of Steps

  1. DNS Configuration:

    • Configure DNS forwarding or conditional forwarders.
    • Validate DNS resolution and network connectivity.
  2. Establish Cross-Realm Trust:

    • Scenario 1: Configure cross-realm principals on MIT KDCs.
    • Scenario 2: Set up trust between AD DC and MIT KDC.
    • Scenario 3: Create a two-way forest trust between AD domains.
    • Scenario 4: Configure the unsecure cluster to accept connections from the secure cluster.
  3. Kerberos Configuration:

    • Update /etc/krb5.conf with realm and KDC details.
    • Define [capaths] for authentication paths.
  4. Hadoop Configuration:

    • Modify core-site.xml and hdfs-site.xml with necessary properties.
    • Distribute configurations across all nodes.
  5. Time Synchronization:

    • Ensure all systems have synchronized clocks using NTP.
  6. Restart Services:

    • Restart Hadoop services to apply changes.
  7. Verification:

    • Test Kerberos authentication.
    • Access HDFS across clusters.
  8. Data Migration:

    • Use hadoop distcp for data transfer.
    • Verify successful data migration.

This guide assumes familiarity with Kerberos and Hadoop administration. Always ensure you have backups before making significant changes to production systems. Consult with your organization's security policies before implementing cross-realm trusts.
