Deploying ODP In Production Data Centers With Firewalls

A typical Open Source Data Platform (ODP) installation requires internet access to fetch software packages from a remote repository. Corporate networks typically sit behind several layers of firewalls, which may limit or block internet access and make it impossible for your cluster nodes to reach the ODP repository during installation.

The solution for this is to either:

  • Create a local mirror repository inside your firewall hosted on a local mirror server; or
  • Provide a trusted proxy server inside your firewall that can access the hosted repositories.

Note: Many of the descriptions in this section assume you are using RHEL or CentOS 7.

This document covers these two options in detail, discusses the trade-offs, and provides configuration guidelines and recommendations for your deployment strategy.

In general, before installing Open Source Data Platform in a production data center, it is best to ensure that both the Data Center Security team and the Data Center Networking team are informed and engaged to assist with these aspects of the deployment.

Terminology

The table below lists the various terms used throughout this section.

Table 6.1. Terminology

Item | Description
Yum Package Manager (yum) | A package management tool that fetches and installs software packages and performs automatic dependency resolution.
Local Mirror Repository | The yum repository hosted on your Local Mirror Server that serves the ODP software.
Local Mirror Server | The server in your network that will host the Local Mirror Repository. This server must be accessible from all hosts in your cluster where you install ODP.
ODP Repositories | A set of repositories hosted by Acceldata that contains the ODP software packages. These include the ODP Repository and the ODP-UTILS Repository.
ODP Repository Tarball | A tarball image that contains the complete contents of the ODP Repositories.

Mirroring or Proxying

ODP uses yum to install the software, and this software is obtained from the ODP Repositories. If your firewall prevents internet access, you must mirror or proxy the ODP Repositories in your Data Center.

Mirroring a repository involves copying the entire repository and all its contents onto a local server and enabling an HTTPD service on that server to serve the repository locally. Once the local mirror server setup is complete, the *.repo configuration files on every cluster node must be updated, so that the given package names are associated with the local mirror server instead of the remote repository server.

Two methods exist for setting up a local mirror server, with detailed explanations provided in subsequent sections of this document.

  • Mirror server has no access to the internet at all: Use a web browser on your workstation to download the ODP Repository Tarball, move the tarball to the selected mirror server using scp or a USB drive, and extract it to create the repository on the local mirror server.
  • Mirror server has temporary access to internet: Temporarily configure a server to have internet access, download a copy of the ODP Repository to this server using the reposync command, then reconfigure the server so that it is back behind the firewall.

Note

  • Option I is probably the least effort and, in some respects, the most secure deployment option.
  • Option II is best if you want to be able to update your Hadoop installation periodically from the Acceldata Repositories.

Trusted proxy server: Proxying a repository involves setting up a standard HTTP proxy on a local server to forward repository access requests to the remote repository server and route responses back to the original requestor. Effectively, the proxy server makes the repository server accessible to all clients, by acting as an intermediary.

Once the proxy is configured, change the /etc/yum.conf file on every cluster node, so that when the client attempts to access the repository during installation, the request goes through the local proxy server instead of going directly to the remote repository server.

Considerations for Choosing a Mirror or Proxy Solution

The following table lists some benefits provided by these alternative deployment strategies:

However, each of these approaches also has known disadvantages:

  • Mirrors have to be managed for updates, upgrades, new versions, and bug fixes.
  • Proxy servers rely on the repository provider to not change the underlying files without notice.
  • Caching proxies are necessary, because non-caching proxies do not decrease WAN traffic and do not speed up the install process.

Recommendations for Deploying ODP

This section provides information on the various components of the Apache Hadoop ecosystem.

In many data centers, using a mirror for the ODP Repositories can be the best deployment strategy. The ODP repositories are small and easily mirrored, allowing you secure control over the contents of the Hadoop packages accepted for use in your data center.

Note: The installer pulls many packages from the base OS repositories (repos). If a complete set of base OS repositories is not available to all your machines at the time of installation, you may run into issues. If you encounter problems with base OS repos being unavailable, contact your system administrator to arrange for these additional repos to be proxied or mirrored.

Detailed Instructions for Creating Mirrors and Proxies

Option I - Mirror server has no access to the internet

Complete the following instructions to set up a mirror server that has no access to the internet:

  1. Check Your Prerequisites.

Select a mirror server host with the following characteristics:

  • The server OS is CentOS 7, RHEL 7, RHEL 8, RL 8, or Ubuntu 20/22, and has several GB of storage available.
  • This server and the cluster nodes must all run the same OS.
  • The firewall lets all cluster nodes (the servers on which you want to install ODP) access this server.

  2. Install the Repos.

a. Use a workstation with access to the internet and download the tarball image of the appropriate Acceldata ODP repository.

Table 6.2. Acceldata ODP Repositories

Cluster OS | ODP Repository Tarballs
RHEL/CentOS 7 | wget [INSERT_URL]
RHEL 8/RL 8 | wget [INSERT_URL]
Ubuntu 20/22 | wget [INSERT_URL] wget [INSERT_URL]

b. Create an HTTP server.

• On the mirror server, install an HTTP server (such as Apache httpd) using the instructions provided here.

• Activate this web server.

• Ensure that the firewall settings (if any) allow inbound HTTP access from your cluster nodes to your mirror server.

Note

  • If you are using EC2, make sure that SELinux is disabled.
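
The HTTP server steps above might look like the following on a RHEL/CentOS mirror server. This is a minimal sketch, assuming Apache httpd from the base OS repositories, systemd, and firewalld; the function name setup_mirror_httpd is hypothetical, and an Ubuntu host would use apt and the apache2 package instead.

```shell
# Sketch only: assumes RHEL/CentOS with systemd and firewalld.
# setup_mirror_httpd is a hypothetical helper name, not part of ODP.
setup_mirror_httpd() {
    # Install Apache httpd from the base OS repositories.
    yum install -y httpd || return 1
    # Start the web server on every boot, then start it now.
    systemctl enable httpd || return 1
    systemctl start httpd || return 1
    # Allow inbound HTTP from the cluster nodes (skip if firewalld is not in use).
    firewall-cmd --permanent --add-service=http || return 1
    firewall-cmd --reload
}

# Usage (as root on the mirror server):
#   setup_mirror_httpd
```

If your data center uses a different firewall product, the last two commands do not apply and the equivalent rule has to be added there.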

c. On your mirror server, create a directory for your web server.

For example, from a shell window, type:

  • For RHEL/CentOS 7:
  • For Ubuntu 20/22:

If you are using a symlink, enable the FollowSymLinks option on your web server.

d. Copy the ODP Repository Tarball to the directory created in step c, and untar it.
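
Creating the web server directory and extracting the tarball could be sketched as follows. The helper name stage_odp_tarball is hypothetical, /var/www/html is assumed because it is Apache's default document root, and the tarball file name in the usage line is a placeholder for whatever file you downloaded.

```shell
# Sketch only. stage_odp_tarball is a hypothetical helper; the default web
# root /var/www/html is an assumption (Apache's default document root).
stage_odp_tarball() {
    local tarball="$1"                    # path to the downloaded ODP Repository Tarball
    local webroot="${2:-/var/www/html}"   # directory served by the web server
    mkdir -p "$webroot" || return 1
    # Extract in place; GNU tar detects gzip/bzip2/xz compression automatically.
    tar -xf "$tarball" -C "$webroot"
}

# Usage (as root on the mirror server):
#   stage_odp_tarball /tmp/<odp-tarball-file> /var/www/html
```

After extraction, check that the directory tree under the web root matches the URL path you plan to use in the repository configuration.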

e. Verify the configuration.

  • The configuration is successful if you can access the above directory through your web browser.

To test this, browse to the following location: http://$yourwebserver/odp/$os/ODP-3.2.3.3-2/.

You should see a directory listing for all the ODP components, along with the RPMs, at $os/ODP-3.2.3.3-2.

Note: Use the following table to choose the value of the $os parameter.

Table 6.3. ODP Component Options

Operating System | Value
RHEL/CentOS 7 | centos7
RHEL 8/RL 8 | rhel8
Ubuntu 20 | ubuntu20
Ubuntu 22 | ubuntu22
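
The browser check can also be done from a shell on any cluster node. The sketch below builds the URL from the pattern given above and probes it with curl; both function names are hypothetical, and mirror.example.com in the usage line is a placeholder host name.

```shell
# Sketch only. Both helpers are hypothetical names.
odp_mirror_url() {
    # $1 = FQDN of the mirror server, $2 = $os value from the table,
    # $3 = ODP version directory (for example ODP-3.2.3.3-2)
    printf 'http://%s/odp/%s/%s/\n' "$1" "$2" "$3"
}

check_odp_mirror() {
    # -f: fail on HTTP errors, -s: quiet, -S: still print errors, -I: headers only.
    curl -fsSI "$(odp_mirror_url "$@")" > /dev/null
}

# Usage:
#   check_odp_mirror mirror.example.com centos7 ODP-3.2.3.3-2 && echo "mirror reachable"
```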

f. Configure the yum clients on all the nodes in your cluster.

  • Fetch the yum configuration file from your mirror server.
  • Store the odp.repo file to a temporary location.
  • Edit the odp.repo file, changing the value of the baseurl property to point to your local repositories based on your cluster OS.

where

  • $yourwebserver is the FQDN of your local mirror server.
  • $os identifies your cluster OS. Use the following table to choose the value of the $os parameter:

Table 6.4. Yum Client Options

Operating System | Value
RHEL/CentOS 7 | centos7
RHEL 8/RL 8 | rhel8
Ubuntu 20 | ubuntu20
Ubuntu 22 | ubuntu22

For RHEL/CentOS 7 and RHEL 8/RL 8:

  • Add the following file on every node in the cluster.

For Ubuntu 20/22:

  • Add the following file on every node in the cluster.
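
For yum-based nodes, a repository file pointing at the local mirror might be generated as sketched below. This is an illustration under stated assumptions, not the file shipped by Acceldata: the helper name write_odp_repo, the repo id, the name line, and the gpgcheck setting are all assumptions, and only the baseurl pattern comes from this section. Ubuntu nodes use apt source entries instead and are not covered here.

```shell
# Sketch only. write_odp_repo is a hypothetical helper; the repo id, name,
# and gpgcheck setting are assumptions, not values taken from the ODP packages.
write_odp_repo() {
    local webserver="$1" os="$2" version="$3"
    local dest="${4:-/etc/yum.repos.d/odp.repo}"
    cat > "$dest" <<EOF
[$version]
name=$version (local mirror)
baseurl=http://$webserver/odp/$os/$version/
enabled=1
# Assumption: signature checking is off for illustration only. Set gpgcheck=1
# and add a gpgkey= line once you know where your mirror publishes its key.
gpgcheck=0
EOF
}

# Usage (as root on each cluster node):
#   write_odp_repo mirror.example.com centos7 ODP-3.2.3.3-2
```

Running yum clean all followed by yum repolist on a node afterwards is one way to confirm that the new repository is picked up.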

Option II - Mirror server has temporary or continuous access to the internet

Complete the following instructions to set up a mirror server that has temporary access to the internet:

  1. Check Your Prerequisites.

Select a local mirror server host with the following characteristics:

  • The server OS is CentOS 7, RHEL 7, RHEL 8, RL 8, or Ubuntu 20/22, and has several GB of storage available.
  • The local mirror server and the cluster nodes must have the same OS. If they are not running CentOS or RHEL, the mirror server must not be a member of the Hadoop cluster.

Note: Supporting repository mirroring for heterogeneous clusters requires a more complex procedure than the one documented here.

  • The firewall allows all cluster nodes (the servers on which you want to install ODP) to access this server.
  • Ensure that the mirror server has yum installed.
  • Install the yum-utils and createrepo packages on the mirror server:
    yum install yum-utils createrepo

  2. Install the Repos.
  • Temporarily reconfigure your firewall to allow internet access from your mirror server host.
  • Execute the following command to download the appropriate Acceldata yum client configuration file and save it in /etc/yum.repos.d/ directory on the mirror server host.

Table 6.5. ODP Client Configuration Commands

Cluster OS | Client Configuration Command
RHEL/CentOS 7 | wget [INSERT_URL]
RHEL 8/RL 8 | wget [INSERT_URL]
Ubuntu 20 | wget [INSERT_URL]
Ubuntu 22 | wget [INSERT_URL]

  • Create an HTTP server.
    • On the mirror server, install an HTTP server (such as Apache httpd) using the instructions provided here.
    • Activate this web server.
    • Ensure that the firewall settings (if any) allow inbound HTTP access from your cluster nodes to your mirror server.

Note: If you are using EC2, make sure that SELinux is disabled.

On your mirror server, create a directory for your web server.

For example, from a shell window, type:

  • For RHEL/CentOS 7:
  • For Ubuntu 20/22:

If you are using a symlink, enable the FollowSymLinks option on your web server.

  • Copy the contents of the entire ODP repository for your desired OS from the remote repository server.
  • Continuing the previous example, from a shell window, type:
  • For RHEL/CentOS 7/Ubuntu 20/22:

Then for all hosts, type:

  • ODP Repository

You should see both an ODP-3.2.3.1-2 directory and an ODP-UTILS-1.1.0.21 directory, each with several subdirectories.
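
The copy step described above can be sketched with reposync from yum-utils. The helper name sync_odp_repos is hypothetical, the target directory /var/www/html/odp is an assumption based on Apache's default document root, and the repo ids in the usage line are assumed to match the directory names mentioned above; check the section names in the .repo file you downloaded before running it.

```shell
# Sketch only. sync_odp_repos is a hypothetical helper. The repo ids passed to
# it must match the [section] names in the .repo file you downloaded; the ids
# in the usage line are assumed from the directory names in the text.
sync_odp_repos() {
    local dest="$1"
    shift
    mkdir -p "$dest" || return 1
    local id
    for id in "$@"; do
        # reposync downloads every package of one repo into $dest/<repo id>.
        reposync --repoid="$id" -p "$dest" || return 1
    done
}

# Usage (as root on the mirror server, while it has internet access):
#   sync_odp_repos /var/www/html/odp ODP-3.2.3.1-2 ODP-UTILS-1.1.0.21
```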

  • Generate appropriate metadata.

This step defines each directory as a yum repository. From a shell window, type:

  • For RHEL/CentOS 7:
    • ODP Repository:

You should see a new folder called repodata inside both ODP directories.
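
Generating the metadata might look like the sketch below, which runs createrepo once per repository directory. The helper name build_repo_metadata is hypothetical, and the paths in the usage line assume the repositories were copied under /var/www/html/odp.

```shell
# Sketch only. build_repo_metadata is a hypothetical helper name.
build_repo_metadata() {
    local dir
    for dir in "$@"; do
        # createrepo writes a repodata/ folder that lets yum treat $dir as a repository.
        createrepo "$dir" || return 1
    done
}

# Usage (as root on the mirror server):
#   build_repo_metadata /var/www/html/odp/ODP-3.2.3.1-2 /var/www/html/odp/ODP-UTILS-1.1.0.21
```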

  • Verify the configuration.
  • The configuration is successful if you can access the above directory through your web browser.

To test this, browse to the following location:

  • ODP: http://$yourwebserver/odp/ODP-3.2.3.3-2/
  • You should now see a directory listing for all the ODP components.
  • At this point, you can disable external internet access for the mirror server, so that the mirror server is again entirely within your data center firewall.
  • Depending on your cluster OS, configure the yum clients on all the nodes in your cluster.
  • Edit the /etc/yum.repos.d/odp.repo file, changing the value of the baseurl property to point to your local repositories based on your cluster OS.

where

  • $yourwebserver is the FQDN of your local mirror server.
  • $os can be CentOS 7, RHEL 8, or Ubuntu 20/22. Use the following table to choose the value of the $os parameter:

Table 6.6. $OS Parameter Values

Operating System | Value
RHEL 7/CentOS 7 | centos7
RHEL 8/RL 8 | rhel8
Ubuntu 20 | ubuntu20
Ubuntu 22 | ubuntu22
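
Editing the baseurl property in an existing odp.repo file can be scripted. The sketch below assumes GNU sed and a repo file with one baseurl= line per section; the helper name set_repo_baseurl is hypothetical, and the repo id and URL in the usage line are placeholders for your own values.

```shell
# Sketch only. set_repo_baseurl is a hypothetical helper; it assumes GNU sed
# and a repo file with one baseurl= line per [section].
set_repo_baseurl() {
    local file="$1" section="$2" url="$3"
    # Only touch baseurl= between the named [section] header and the next header.
    sed -i "/^\[$section\]/,/^\[/ s|^baseurl=.*|baseurl=$url|" "$file"
}

# Usage (as root on each cluster node):
#   set_repo_baseurl /etc/yum.repos.d/odp.repo <repo id> <local mirror URL>
```

Restricting the substitution to one section keeps the other repositories in the same file, such as ODP-UTILS, pointing at their own URLs until you update them explicitly.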

For RHEL/CentOS 7 and RHEL 8/RL 8:

  • Add the following file on every node in the cluster.
  • If using Ambari, verify the configuration by deploying an Ambari server on one of the cluster nodes.

For Ubuntu 20/22:

  • Add the following file on every node in the cluster.
  • If using Ambari, verify the configuration by deploying an Ambari server on one of the cluster nodes.

Set up a Trusted Proxy Server

Complete the following instructions to set up a trusted proxy server:

  1. Check Your Prerequisites.

Select a proxy server host with the following characteristics:

  • This server runs RHEL/CentOS 7, RHEL 8/RL 8, or Ubuntu 20/22, and has several GB of storage available.
  • The firewall allows all cluster nodes (the servers on which you want to install ODP) to access this server, and allows this server to access the internet (at least those internet servers for the repositories to be proxied).

  2. Install the Repos.

  • Create a caching HTTP proxy server on the selected host.

  • It is beyond the scope of this document to show how to set up an HTTP proxy server, given the many variations that may be required, depending on your data center’s network security policy. If you choose to use the Apache HTTPD server, start by installing httpd, using the instructions provided here, and then add the mod_proxy and mod_cache modules, as stated here. Please engage your network security specialists to correctly set up the proxy server.

  • Activate this proxy server and configure its cache storage location.
  • Ensure that the firewall settings (if any) allow inbound HTTP access from your cluster nodes to your proxy server, and outbound access to the desired repo sites, including: public-repo-1.acceldata.com.

Note: If you are using EC2, make sure that SELinux is disabled.

  • Depending on your cluster OS, configure the yum clients on all the nodes in your cluster.

The following description is taken from the CentOS documentation. On each cluster node, add the following lines to the /etc/yum.conf file. (As an example, the settings below enable yum to use the proxy server mycache.mydomain.com, connecting to port 3128, with the following credentials: yum-user/query.)

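
The proxy settings described above could be appended to /etc/yum.conf as sketched below, using the example host, port, and credentials from the text. The helper name add_yum_proxy is hypothetical; proxy, proxy_username, and proxy_password are standard yum.conf options for the [main] section.

```shell
# Sketch only. add_yum_proxy is a hypothetical helper name.
add_yum_proxy() {
    local conf="${1:-/etc/yum.conf}"
    # Appending assumes [main] is the last (or only) section in the file,
    # which is the case for a stock /etc/yum.conf.
    cat >> "$conf" <<'EOF'
# The proxy server - proxy server:port number
proxy=http://mycache.mydomain.com:3128
# The account details for yum connections
proxy_username=yum-user
proxy_password=query
EOF
}

# Usage (as root on each cluster node):
#   add_yum_proxy
```

Because the password is stored in clear text, it is worth checking that /etc/yum.conf is readable only by the accounts that need it.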
  • Once all nodes have their /etc/yum.conf file updated with appropriate configuration info, you can proceed with the ODP installation just as though the nodes had direct access to the internet repositories.
  • If this proxy configuration does not seem to work, try adding a / at the end of the proxy URL. For example:
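
Adding the trailing slash can be done by hand or with a one-line edit. The sketch below assumes GNU sed and a proxy= line like the example above; the helper name add_proxy_trailing_slash is hypothetical.

```shell
# Sketch only. add_proxy_trailing_slash is a hypothetical helper; assumes GNU sed.
add_proxy_trailing_slash() {
    local conf="${1:-/etc/yum.conf}"
    # Append "/" to the proxy= value unless it already ends with one.
    # Lines such as proxy_username= do not match and are left alone.
    sed -i 's|^\(proxy=.*[^/]\)$|\1/|' "$conf"
}

# Usage (as root on each cluster node):
#   add_proxy_trailing_slash
# turns  proxy=http://mycache.mydomain.com:3128
# into   proxy=http://mycache.mydomain.com:3128/
```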