1. Pre-installation Preparation
Check DNS and NSCD
All hosts in your system must be configured for both forward and reverse DNS. If you are unable to configure DNS in this way, you must edit the /etc/hosts file on every host in your cluster so that it contains the IP address and Fully Qualified Domain Name (FQDN) of each of your hosts.
The following instructions are provided as an overview and cover a basic network setup for generic Linux hosts. Different versions and flavors of Linux might require slightly different commands and procedures. Refer to the documentation for the operating system(s) deployed in your environment.
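As an illustration of the forward and reverse DNS requirement, the following Python sketch resolves a host name to an IP address and then checks that the address resolves back to the same name. The function names check_host_dns and hosts_line are hypothetical helpers introduced for this example, and the host names and addresses used with them are placeholders. It is a minimal check under those assumptions, not a replacement for your operating system's own DNS tooling.

```python
import socket


def check_host_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward lookup of fqdn followed by a reverse lookup.

    The forward and reverse parameters default to the standard resolver calls;
    they are parameters only so the check can be exercised without a network.
    """
    try:
        ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed: {exc}"
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {ip} failed: {exc}"
    # Compare case-insensitively and ignore a trailing root dot.
    names = {n.lower().rstrip(".") for n in [name, *aliases]}
    if fqdn.lower().rstrip(".") not in names:
        return False, f"{ip} resolves back to {name}, not {fqdn}"
    return True, f"{fqdn} <-> {ip}"


def hosts_line(ip, fqdn):
    """Format one /etc/hosts entry: the IP address followed by the FQDN."""
    return f"{ip} {fqdn}"
```

Calling check_host_dns once for each host in the cluster, from each host, shows which names fail in either direction. For hosts that cannot be fixed in DNS, hosts_line("10.0.0.5", "master1.example.com") produces an entry in the form expected by /etc/hosts; both values here are placeholders.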
Hadoop relies heavily on DNS and performs many DNS lookups during normal operation. To reduce the load on your DNS infrastructure, it is highly recommended that you run the Name Service Caching Daemon (NSCD) on cluster nodes running Linux. This daemon caches host, user, and group lookups, which improves resolution performance.
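One way to confirm that NSCD is set up to cache the lookups described above is to inspect its configuration file. The Python sketch below parses the enable-cache directives from the text of an nscd.conf file and reports which of the hosts, passwd (user), and group caches are not enabled. The function names are hypothetical, and the file location (typically /etc/nscd.conf) is an assumption that may differ on your distribution.

```python
def enabled_caches(conf_text):
    """Map each service named on an `enable-cache` line to True or False."""
    caches = {}
    for raw in conf_text.splitlines():
        # Drop comments, then split the directive into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "enable-cache":
            caches[fields[1]] = fields[2].lower() == "yes"
    return caches


def missing_caches(conf_text, wanted=("hosts", "passwd", "group")):
    """Return the wanted services whose cache is disabled or not configured."""
    caches = enabled_caches(conf_text)
    return [service for service in wanted if not caches.get(service, False)]
```

For example, reading the file with open("/etc/nscd.conf").read() and passing the text to missing_caches returns an empty list when all three caches are enabled. This checks the configuration only; it does not verify that the daemon itself is running.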