Configuring Dynamic Management for HDFS NameNode and DataNode
In some scenarios, it is crucial to mark specific configuration properties as reconfigurable, allowing them to be updated dynamically at runtime without restarting critical Hadoop components like the NameNode or DataNodes. This capability is especially beneficial in large, production-grade HDFS clusters, where restarts can be disruptive—impacting system availability, client throughput, and active data pipeline jobs.
Dynamic reconfiguration helps optimize HDFS performance and reliability while ensuring minimal service downtime.
Reconfigurable properties allow administrators to fine-tune system behavior in response to changing workloads, performance bottlenecks, or operational incidents—without compromising high availability SLAs or triggering failovers.
To view the list of properties that can be dynamically reconfigured, run the following commands:
hdfs dfsadmin -reconfig datanode <datanode_host>:8010 properties
hdfs dfsadmin -reconfig namenode <namenode_host>:8020 properties
DataNode Properties
PropertyName | ConfigName | Description | Related Jira / Notes | Configuration File | Default Value |
---|---|---|---|---|---|
DFS_DATANODE_DATA_DIR_KEY | dfs.datanode.data.dir | Directories where the DataNode stores HDFS block data. | HDFS-6727 | hdfs-site.xml | Configured per cluster |
DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY | dfs.datanode.balance.max.concurrent.moves | Maximum concurrent block moves during rebalancing. | | hdfs-site.xml | 100 |
DFS_BLOCKREPORT_INTERVAL_MSEC_KEY | dfs.blockreport.intervalMsec | Interval (ms) for block reports sent to the NameNode. | | hdfs-site.xml | 21600000 |
DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY | dfs.blockreport.split.threshold | Threshold to split large block reports. | | hdfs-site.xml | 1000000 |
DFS_BLOCKREPORT_INITIAL_DELAY_KEY | dfs.blockreport.initialDelay | Initial delay (s) before the first block report after startup. | | hdfs-site.xml | 120 (ODP) |
DFS_DATANODE_MAX_RECEIVER_THREADS_KEY | dfs.datanode.max.transfer.threads | Maximum threads for transferring block data. | | hdfs-site.xml | 16384 (ODP) |
DFS_CACHEREPORT_INTERVAL_MSEC_KEY | dfs.cachereport.intervalMsec | Interval (ms) for cache reports. | | hdfs-site.xml | 10000 |
DFS_DATANODE_PEER_STATS_ENABLED_KEY | dfs.datanode.peer.stats.enabled | Enable peer statistics collection. | | hdfs-site.xml | false |
DFS_DATANODE_MIN_OUTLIER_DETECTION_NODES_KEY | dfs.datanode.min.outlier.detection.nodes | Minimum nodes required for peer outlier detection. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 10 |
DFS_DATANODE_SLOWPEER_LOW_THRESHOLD_MS_KEY | dfs.datanode.slowpeer.low.threshold.ms | Threshold (ms) to classify a peer as slow. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 5 |
DFS_DATANODE_PEER_METRICS_MIN_OUTLIER_DETECTION_SAMPLES_KEY | dfs.datanode.peer.metrics.min.outlier.detection.samples | Minimum samples for peer metrics outlier detection. | Requires dfs.datanode.peer.stats.enabled=true | hdfs-site.xml | 1000 |
DFS_DATANODE_FILEIO_PROFILING_SAMPLING_PERCENTAGE_KEY | dfs.datanode.fileio.profiling.sampling.percentage | Percentage of file I/O operations to profile. | | hdfs-site.xml | 0 |
DFS_DATANODE_OUTLIERS_REPORT_INTERVAL_KEY | dfs.datanode.outliers.report.interval | Interval for reporting outlier metrics. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first. | hdfs-site.xml | 30m |
DFS_DATANODE_MIN_OUTLIER_DETECTION_DISKS_KEY | dfs.datanode.min.outlier.detection.disks | Minimum disks required for disk outlier detection. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first. | hdfs-site.xml | 5 |
DFS_DATANODE_SLOWDISK_LOW_THRESHOLD_MS_KEY | dfs.datanode.slowdisk.low.threshold.ms | Threshold (ms) to classify a disk as slow. | Requires dfs.datanode.fileio.profiling.sampling.percentage to be reconfigured to a non-zero value first. | hdfs-site.xml | 20 |
FS_DU_INTERVAL_KEY | fs.du.interval | Interval (ms) for disk usage calculation. | | core-site.xml | 600000 |
FS_GETSPACEUSED_JITTER_KEY | fs.getspaceused.jitter | Jitter (ms) to stagger getSpaceUsed calls. | | core-site.xml | 60000 |
FS_GETSPACEUSED_CLASSNAME | fs.getspaceused.classname | Custom class for the getSpaceUsed implementation; four implementation classes are supported for tracking disk space usage. | HDFS-16457 | core-site.xml | NULL |
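As an illustration, a DataNode property from the table above can be set persistently in hdfs-site.xml as a standard property stanza. The value 150 below is only an example, not a recommendation; editing the file prepares the change, and the `hdfs dfsadmin -reconfig` command applies it at runtime.

```xml
<!-- hdfs-site.xml: raise the concurrent block-move limit used during
     rebalancing. The value 150 is an example; tune it for your cluster. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>150</value>
</property>
```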
NameNode Properties
PropertyName | ConfigurationName | Description | Related Jira / Notes | Configuration File | Default Value |
---|---|---|---|---|---|
DFS_HEARTBEAT_INTERVAL_KEY | dfs.heartbeat.interval | Interval (s) for DataNode heartbeats to the NameNode. | | hdfs-site.xml | 3 |
DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY | dfs.namenode.heartbeat.recheck-interval | Interval (ms) for the NameNode to recheck DataNode heartbeat status. | | hdfs-site.xml | 300000 |
FS_PROTECTED_DIRECTORIES | fs.protected.directories | Directories protected from accidental deletion or modification. | HDFS-9349 | core-site.xml | NULL |
HADOOP_CALLER_CONTEXT_ENABLED_KEY | hadoop.caller.context.enabled | Enable caller context tracking for debugging. | | hdfs-site.xml | false |
DFS_STORAGE_POLICY_SATISFIER_MODE_KEY | dfs.storage.policy.satisfier.mode | Mode for the storage policy satisfier (e.g., EXTERNAL, NONE). | | hdfs-site.xml | NONE |
DFS_NAMENODE_REPLICATION_MAX_STREAMS_KEY | dfs.namenode.replication.max-streams | Maximum concurrent replication streams. | | hdfs-site.xml | 2 |
DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY | dfs.namenode.replication.max-streams-hard-limit | Hard limit on replication streams to prevent overload. | | hdfs-site.xml | 4 |
DFS_NAMENODE_REPLICATION_WORK_MULTIPLIER_PER_ITERATION | dfs.namenode.replication.work.multiplier.per.iteration | Multiplier to adjust replication workload per iteration. | | hdfs-site.xml | 2 |
DFS_BLOCK_REPLICATOR_CLASSNAME_KEY | dfs.block.replicator.classname | Custom block replicator class. | | hdfs-site.xml | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault |
DFS_BLOCK_PLACEMENT_EC_CLASSNAME_KEY | dfs.block.placement.ec.classname | Custom block placement class for erasure coding. | HDFS-15120; verified with org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy, other policies may require a NameNode restart. | hdfs-site.xml | org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant |
DFS_IMAGE_PARALLEL_LOAD_KEY | dfs.image.parallel.load | Enable parallel fsimage loading for faster startup. | HDFS-15830 | hdfs-site.xml | false |
DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY | dfs.namenode.avoid.read.slow.datanode | Avoid slow DataNodes during read operations. | | hdfs-site.xml | false |
DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY | dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled | Exclude slow nodes in the block placement policy. | | hdfs-site.xml | false |
DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY | dfs.namenode.max.slowpeer.collect.nodes | Maximum slow peers to track for metrics. | | hdfs-site.xml | 5 |
DFS_BLOCK_INVALIDATE_LIMIT_KEY | dfs.block.invalidate.limit | Maximum blocks invalidated per iteration. | | hdfs-site.xml | 1000 |
DFS_DATANODE_PEER_STATS_ENABLED_KEY | dfs.datanode.peer.stats.enabled | Enable peer statistics collection (shared with the DataNode; NameNode-side). | | hdfs-site.xml | false |
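For a NameNode-side example, fs.protected.directories from the table above is set in core-site.xml as a comma-separated list of paths; the directories shown below are purely illustrative.

```xml
<!-- core-site.xml: protect critical directories from accidental
     deletion or modification. The paths listed are examples only. -->
<property>
  <name>fs.protected.directories</name>
  <value>/user,/warehouse</value>
</property>
```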
Configure Runtime Changes
Follow the steps below to apply runtime changes.
- Dynamic Reconfiguration: For runtime-changeable properties, edit the hdfs-site.xml file located at /etc/hadoop/conf/, then start the reconfiguration:
DataNode example:
hdfs dfsadmin -reconfig datanode datanode_host:8010 start
NameNode example:
hdfs dfsadmin -reconfig namenode namenode_host:8020 start
- Verify the reconfiguration status.
hdfs dfsadmin -reconfig datanode datanode_host:8010 status
hdfs dfsadmin -reconfig namenode namenode_host:8020 status
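On a large cluster, the start and status commands above are typically issued against every DataNode. The sketch below is a hypothetical dry-run helper: it only prints the commands for a host list (the hostnames are placeholders; the 8010 IPC port matches the examples in this document). Pipe the output to sh on a node with HDFS client configuration to actually execute it.

```shell
#!/bin/sh
# Dry run: print the dfsadmin reconfig commands for each DataNode in a
# host list instead of executing them. Hostnames below are placeholders.
print_reconfig_cmds() {
  action="$1"                    # "start" or "status"
  shift
  for host in "$@"; do
    echo "hdfs dfsadmin -reconfig datanode ${host}:8010 ${action}"
  done
}

print_reconfig_cmds start  dn1.example.com dn2.example.com
print_reconfig_cmds status dn1.example.com dn2.example.com
```

Printing first and piping to sh afterwards gives you a chance to review exactly which nodes will be touched before any reconfiguration starts.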
- Permanent Configuration: For properties that require a restart:
- Go to HDFS > Advanced Configurations in Ambari.
- Make the necessary changes.
- Restart the HDFS service to apply the updates.
Temporary vs. Persistent Configuration
- Temporary (Runtime) Changes: Runtime reconfigurable properties take effect immediately but are lost after a service restart.
- Persistent Changes: To make changes permanent, update the hdfs-site.xml file and restart the service.
Log Monitoring
After applying changes, monitor the logs for errors:
tail -f /var/log/hadoop/hdfs/*.log
Cluster Impact
Always validate changes in a non-production environment before applying them to a live cluster.