Troubleshooting premature swapping
If you encounter either of the following issues in your ODP ecosystem, use the information below to understand and troubleshoot it.
- Premature swapping due to a change in swap behavior between RHEL 7 and RHEL 8 kernels.
- High consumer lag on the newly deployed ODP-Kafka cluster on Rocky Linux 8.9 and Rocky Linux 8.5 servers.
Observation
High I/O wait and 100% swap usage slow down the Kafka consumers. The same behavior is not observed on CentOS 7 and RHEL 7 servers.
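To confirm the swap side of this observation on a host, swap usage can be derived from /proc/meminfo. The following is a minimal sketch and not part of the original procedure: the function name swap_used_percent is hypothetical, and the optional file argument exists only so the calculation can be checked against a sample file.

```shell
# Print the percentage of swap in use, based on the SwapTotal and SwapFree
# fields of a meminfo-style file (defaults to /proc/meminfo).
swap_used_percent() {
  meminfo=${1:-/proc/meminfo}
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { unused = $2 }
    END {
      # Avoid dividing by zero on hosts with no swap configured.
      if (total > 0) printf "%d\n", (total - unused) * 100 / total
      else print 0
    }
  ' "$meminfo"
}
```

Calling swap_used_percent with no argument reads /proc/meminfo; a value close to 100 matches the observation above. I/O wait can be checked separately with a tool such as vmstat.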
Root Cause Analysis (RCA)
This issue is related to a change in swap behavior between RHEL 7 and RHEL 8 kernels. For more information, see The change in swap behavior between RHEL 7 and RHEL 8 kernels.
Impact
RHEL 8.x host systems can use swap space even though the swappiness value is set very low and plenty of memory is available. This significantly slows down application behavior, because in-memory activity is pushed to swap space on the device (RHEL swap space on disk or flash storage).
Description
Numerous virtual memory management changes were made between RHEL 7 and RHEL 8 to account for much faster storage subsystem interaction with virtual memory operations.
One of the changes involved the RHEL 8 swappiness algorithm. The swappiness setting in the algorithm was changed and requires resetting back to the RHEL 7 default value.
In RHEL 9, the swappiness algorithm was reverted to the RHEL 7 default value, so RHEL 9 does not require the fix described below.
In RHEL 8, all existing cgroups have memory.swappiness=60, compared to the default value of 1 on the RHEL 7 servers.
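To see which cgroups are affected on a given host, the per-cgroup values can be listed and compared with the expected one. This is a sketch under assumptions: the function name report_swappiness_mismatch is hypothetical, and it assumes cgroup v1, where each memory cgroup directory exposes a memory.swappiness file.

```shell
# Print "<value> <path>" for every memory.swappiness file under ROOT whose
# current value differs from EXPECTED.
report_swappiness_mismatch() {
  root=$1
  expected=$2
  find "$root" -type f -name 'memory.swappiness' | while IFS= read -r f; do
    current=$(cat "$f")
    if [ "$current" != "$expected" ]; then
      printf '%s %s\n' "$current" "$f"
    fi
  done
}
```

For example, report_swappiness_mismatch /sys/fs/cgroup 1 lists every cgroup whose swappiness is not 1; an empty result means no cgroup deviates from the expected value.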
Several workarounds are available for premature swapping that occurs while there is still plenty of page cache to reclaim. Acceldata has focused on the steps below because they are runtime changes, which avoids migrating to cgroup v2 and rebooting the servers.
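- Adjust the swappiness of all memory cgroups to the desired value; 1 is considered optimal for Kafka servers here. The command below copies the current value of /proc/sys/vm/swappiness into each cgroup, so make sure vm.swappiness is already set to 1, and run the command as root. Command: for cgfile in $(find /sys/fs/cgroup -name '*swappiness'); do cat /proc/sys/vm/swappiness > "$cgfile"; done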
- Use the following commands to disable or re-enable swapping.
- To disable: sudo swapoff -a
- To re-enable: sudo swapon -a
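The cgroup adjustment step above can also be written as a small reusable function. This is a minimal sketch rather than the documented command: the function name set_cgroup_swappiness is hypothetical, the value is passed explicitly instead of being read from /proc/sys/vm/swappiness, and cgroup v1 paths are assumed.

```shell
# Write VALUE into every memory.swappiness file found under ROOT.
# Files that cannot be written are reported on stderr and skipped.
set_cgroup_swappiness() {
  root=$1
  value=$2
  find "$root" -type f -name 'memory.swappiness' -exec sh -c '
    v=$1; shift
    for f in "$@"; do
      printf "%s\n" "$v" > "$f" || echo "could not update $f" >&2
    done
  ' sh "$value" {} +
}
```

Run as root, set_cgroup_swappiness /sys/fs/cgroup 1 applies the value 1 to the existing cgroups. Quoting the file name pattern and passing paths through find keeps the loop safe for paths that contain spaces.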