Configuring Dynamic Management for HDFS NameNode and DataNode

In some scenarios, it is crucial to mark specific configuration properties as reconfigurable, allowing them to be updated dynamically at runtime without restarting critical Hadoop components like the NameNode or DataNodes. This capability is especially beneficial in large, production-grade HDFS clusters, where restarts can be disruptive—impacting system availability, client throughput, and active data pipeline jobs.

This configuration helps optimize HDFS performance and reliability while ensuring minimal service downtime.

Reconfigurable properties allow administrators to fine-tune system behavior in response to changing workloads, performance bottlenecks, or operational incidents—without compromising high availability SLAs or triggering failovers.

To view the list of properties that can be dynamically reconfigured on a running NameNode or DataNode, use the properties action of the hdfs dfsadmin -reconfig command.

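A minimal sketch, assuming a DataNode IPC address of dn_host:9867 and a NameNode RPC address of nn_host:8020 (substitute the addresses used in your cluster):

```bash
# List the properties the DataNode supports reconfiguring at runtime
hdfs dfsadmin -reconfig datanode dn_host:9867 properties

# List the properties the NameNode supports reconfiguring at runtime
hdfs dfsadmin -reconfig namenode nn_host:8020 properties
```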

DataNode Properties

| Property Name | Configuration Name | Description | Related Apache JIRA / Notes | Configuration File | Default Value |
| --- | --- | --- | --- | --- | --- |
| DFS_DATANODE_DATA_DIR_KEY | dfs.datanode.data.dir | Directories where the DataNode stores HDFS block data. | HDFS-6727 | hdfs-site.xml | Configured per cluster |
| DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY | dfs.datanode.balance.max.concurrent.moves | Maximum concurrent block moves during rebalancing. | — | hdfs-site.xml | 100 |
| DFS_BLOCKREPORT_INTERVAL_MSEC_KEY | dfs.blockreport.intervalMsec | Interval (ms) between block reports sent to the NameNode. | — | hdfs-site.xml | 21600000 |
| DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY | dfs.blockreport.split.threshold | Threshold to split large block reports. | — | hdfs-site.xml | 1000000 |
| DFS_BLOCKREPORT_INITIAL_DELAY_KEY | dfs.blockreport.initialDelay | Initial delay (s) before the first block report after startup. | — | hdfs-site.xml | 120 (ODP) |
| DFS_DATANODE_MAX_RECEIVER_THREADS_KEY | dfs.datanode.max.transfer.threads | Maximum number of threads for transferring block data. | — | hdfs-site.xml | 16384 (ODP) |
| DFS_CACHEREPORT_INTERVAL_MSEC_KEY | dfs.cachereport.intervalMsec | Interval (ms) between cache reports. | — | hdfs-site.xml | 10000 |
| DFS_DATANODE_PEER_STATS_ENABLED_KEY | dfs.datanode.peer.stats.enabled | Enable peer statistics collection. | — | hdfs-site.xml | false |
| DFS_DATANODE_MIN_OUTLIER_DETECTION_NODES_KEY | dfs.datanode.min.outlier.detection.nodes | Minimum nodes required for peer outlier detection. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 10 |
| DFS_DATANODE_SLOWPEER_LOW_THRESHOLD_MS_KEY | dfs.datanode.slowpeer.low.threshold.ms | Threshold (ms) to classify a peer as slow. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 5 |
| DFS_DATANODE_PEER_METRICS_MIN_OUTLIER_DETECTION_SAMPLES_KEY | dfs.datanode.peer.metrics.min.outlier.detection.samples | Minimum samples for peer metrics outlier detection. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 1000 |
| DFS_DATANODE_FILEIO_PROFILING_SAMPLING_PERCENTAGE_KEY | dfs.datanode.fileio.profiling.sampling.percentage | Percentage of file I/O operations to profile. | HDFS-16397; requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 0 |
| DFS_DATANODE_OUTLIERS_REPORT_INTERVAL_KEY | dfs.datanode.outliers.report.interval | Interval for reporting outlier metrics. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first | hdfs-site.xml | 30m |
| DFS_DATANODE_MIN_OUTLIER_DETECTION_DISKS_KEY | dfs.datanode.min.outlier.detection.disks | Minimum disks required for disk outlier detection. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first | hdfs-site.xml | 5 |
| DFS_DATANODE_SLOWDISK_LOW_THRESHOLD_MS_KEY | dfs.datanode.slowdisk.low.threshold.ms | Threshold (ms) to classify a disk as slow. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first | hdfs-site.xml | 20 |
| FS_DU_INTERVAL_KEY | fs.du.interval | Interval for disk usage calculation. | — | core-site.xml | 600000 |
| FS_GETSPACEUSED_JITTER_KEY | fs.getspaceused.jitter | Jitter interval to stagger getSpaceUsed calls. | — | core-site.xml | 60000 |
| FS_GETSPACEUSED_CLASSNAME | fs.getspaceused.classname | Custom class for the getSpaceUsed implementation (see the supported classes below). | HDFS-16457 | core-site.xml | NULL |

The following four implementation classes are supported for fs.getspaceused.classname:

  1. org.apache.hadoop.fs.DU (default)
  2. org.apache.hadoop.fs.WindowsGetSpaceUsed
  3. org.apache.hadoop.fs.DFCachingGetSpaceUsed
  4. org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
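For example, a sketch of switching the getSpaceUsed implementation on a live DataNode; the class choice and the address dn_host:9867 are illustrative:

```bash
# 1. In core-site.xml on the DataNode host, set:
#    <property>
#      <name>fs.getspaceused.classname</name>
#      <value>org.apache.hadoop.fs.DFCachingGetSpaceUsed</value>
#    </property>

# 2. Trigger the reconfiguration without restarting the DataNode
hdfs dfsadmin -reconfig datanode dn_host:9867 start

# 3. Poll until the reconfiguration task reports completion
hdfs dfsadmin -reconfig datanode dn_host:9867 status
```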

NameNode Properties

| Property Name | Configuration Name | Description | Related Apache JIRA / Notes | Configuration File | Default Value |
| --- | --- | --- | --- | --- | --- |
| DFS_HEARTBEAT_INTERVAL_KEY | dfs.heartbeat.interval | Interval (s) between DataNode heartbeats to the NameNode. | — | hdfs-site.xml | 3 |
| DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY | dfs.namenode.heartbeat.recheck-interval | Interval for the NameNode to recheck DataNode heartbeat status. | — | hdfs-site.xml | 300000 |
| FS_PROTECTED_DIRECTORIES | fs.protected.directories | Directories protected from accidental deletion or modification. | HDFS-9349; set in core-site.xml for shared FS configurations | core-site.xml | NULL |
| HADOOP_CALLER_CONTEXT_ENABLED_KEY | hadoop.caller.context.enabled | Enable caller context tracking for debugging. | — | hdfs-site.xml | false |
| DFS_STORAGE_POLICY_SATISFIER_MODE_KEY | dfs.storage.policy.satisfier.mode | Mode for the storage policy satisfier (e.g., NONE, EXTERNAL). | — | hdfs-site.xml | NONE |
| DFS_NAMENODE_REPLICATION_MAX_STREAMS_KEY | dfs.namenode.replication.max-streams | Maximum concurrent replication streams. | — | hdfs-site.xml | 2 |
| DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY | dfs.namenode.replication.max-streams-hard-limit | Hard limit on replication streams to prevent overload. | — | hdfs-site.xml | 4 |
| DFS_NAMENODE_REPLICATION_WORK_MULTIPLIER_PER_ITERATION | dfs.namenode.replication.work.multiplier.per.iteration | Multiplier to adjust replication workload per iteration. | — | hdfs-site.xml | 2 |
| DFS_BLOCK_REPLICATOR_CLASSNAME_KEY | dfs.block.replicator.classname | Custom block replicator class. | — | hdfs-site.xml | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault |
| DFS_BLOCK_PLACEMENT_EC_CLASSNAME_KEY | dfs.block.placement.ec.classname | Custom block placement class for erasure coding. | HDFS-15120; verified with org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy; other policy classes may require a NameNode restart | hdfs-site.xml | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant |
| DFS_IMAGE_PARALLEL_LOAD_KEY | dfs.image.parallel.load | Enable parallel fsimage loading for faster startup. | HDFS-15830 | hdfs-site.xml | false |
| DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY | dfs.namenode.avoid.read.slow.datanode | Avoid slow DataNodes during read operations. | — | hdfs-site.xml | false |
| DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY | dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled | Exclude slow nodes in the block placement policy. | — | hdfs-site.xml | false |
| DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY | dfs.namenode.max.slowpeer.collect.nodes | Maximum slow peers to track for metrics. | — | hdfs-site.xml | 5 |
| DFS_BLOCK_INVALIDATE_LIMIT_KEY | dfs.block.invalidate.limit | Maximum blocks invalidated per iteration. | — | hdfs-site.xml | 1000 |
| DFS_DATANODE_PEER_STATS_ENABLED_KEY | dfs.datanode.peer.stats.enabled (shared with the DataNode) | Enable peer stats collection (NameNode side). | — | hdfs-site.xml | false |
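As a worked example, a sketch of raising the replication work multiplier on a running NameNode; the address nn_host:8020 and the value 4 are illustrative, not recommendations:

```bash
# 1. In /etc/hadoop/conf/hdfs-site.xml on the NameNode host, set:
#    <property>
#      <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
#      <value>4</value>
#    </property>

# 2. Ask the running NameNode to re-read and apply the changed property
hdfs dfsadmin -reconfig namenode nn_host:8020 start

# 3. Check that the property was applied without errors
hdfs dfsadmin -reconfig namenode nn_host:8020 status
```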

Configure Runtime Changes

Follow the steps below to apply runtime configuration changes.

  1. Dynamic Reconfiguration: For runtime-changeable properties, edit the hdfs-site.xml file located at /etc/hadoop/conf/, then trigger the reconfiguration with the commands shown below.

DataNode example:

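A sketch assuming the DataNode's IPC address is dn_host:9867 (substitute your own host and port):

```bash
# Apply the edited hdfs-site.xml to the running DataNode
hdfs dfsadmin -reconfig datanode dn_host:9867 start
```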

NameNode example:

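A sketch assuming the NameNode's RPC address is nn_host:8020 (substitute your own host and port):

```bash
# Apply the edited hdfs-site.xml to the running NameNode
hdfs dfsadmin -reconfig namenode nn_host:8020 start
```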
  2. Verify the reconfiguration status, as shown below.
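A sketch using the same illustrative addresses as above; the status action reports which properties changed and any per-property errors:

```bash
hdfs dfsadmin -reconfig datanode dn_host:9867 status
hdfs dfsadmin -reconfig namenode nn_host:8020 status
```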
  3. Permanent Configuration: For properties that require a restart:
    1. Go to HDFS > Advanced Configurations in Ambari.
    2. Make the necessary changes.
    3. Restart the HDFS service to apply the updates.

Temporary vs. Persistent Configuration

  • Temporary (Runtime) Changes: Runtime reconfigurable properties take effect immediately but are lost after a service restart.
  • Persistent Changes: To make changes permanent, update the hdfs-site.xml file and restart the service.

Log Monitoring

After applying changes, monitor the logs for errors: tail -f /var/log/hadoop/hdfs/*.log.

Cluster Impact

Always validate changes in a non-production environment before applying them to a live cluster.
