kswapd0 hogging CPU, freezing device

I have a problem with kswapd0 hogging the CPU and freezing the system. Has anyone else dealt with this?

I am running Linux on a Colibri iMX8x:
Linux colibri-imx8x 5.15.77-0+git.ddc6ca4d76ea #1 SMP PREEMPT Thu Jun 29 10:14:22 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

If our system is left running long enough, it freezes and stops responding. The user interface stops moving or responding, and the target machine cannot be accessed. This problem happens whether a monitoring run is underway or not. Sometimes it only takes an hour or two of operation, sometimes longer. Sometimes the system recovers and starts running again, but not always.

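Running top until the problem occurs shows kswapd0 suddenly using most or all of the CPU time. kswapd0 is the kernel thread that reclaims memory when free memory runs low, either by swapping pages out or by dropping cached data. I looked up possible options to address the issue.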

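One option is to set vm.swappiness=0 in /etc/sysctl.conf. The default is 60. The value is not a percentage threshold; it is a weight that controls how readily the kernel swaps out application memory rather than reclaiming page cache, and at 0 swapping is avoided until it is absolutely necessary. I tried this approach. It seemed to delay the problem, but not fix it.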
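For anyone following along, here is a minimal sketch of how the setting can be checked and applied. The helper name vm_setting and its optional second argument (an alternative directory, only there so the function can be tried against a test file) are my own and not part of any tool. The commented commands assume root and a standard sysctl binary.

```shell
# Print the current value of a vm.* setting as exposed under procfs.
# $1 = setting name, e.g. swappiness
# $2 = optional directory to read from (hypothetical override for testing)
vm_setting() {
    cat "${2:-/proc/sys/vm}/$1"
}

# Usage:
#   vm_setting swappiness
#
# Changing the value (needs root):
#   sysctl -w vm.swappiness=0      # immediate, but lost at reboot
#   sysctl -p                      # reload /etc/sysctl.conf after editing it
```

Reading the value back after a reboot is a quick way to confirm that the line in /etc/sysctl.conf is actually being applied on this image.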

A second proposed option is to drop the page cache:

echo 1 | sudo tee /proc/sys/vm/drop_caches

But this is a one-time fix, to be applied as needed, and we need the system to run perpetually and unattended.

The third option I found is to increase the size of the swap file, which starts with disabling the existing one, e.g.

    sudo swapoff /swapfile

But as far as I can see, no swap file exists on this system.
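One way to confirm whether any swap is active is to look at /proc/swaps, which holds a header line followed by one line per active swap area. A small sketch, where has_swap is a made-up helper name and its optional argument (an alternative file to read) exists only for testing:

```shell
# Exit status 0 if at least one swap area is active, non-zero otherwise.
# $1 = optional file to read instead of /proc/swaps (hypothetical override)
has_swap() {
    # Any line after the header means a swap area is in use.
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

if has_swap; then
    echo "swap is active:"
    cat /proc/swaps
else
    echo "no active swap"
fi
```

If this reports no active swap, then the kswapd0 load cannot come from swapping to disk, and it is presumably spent reclaiming cached file data instead.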

Another approach might be to find the cause of swapping, and reduce it somehow. But before digging through that, I would like to hear from anyone else with insight.

When kswapd0 dominates the CPU, it typically indicates that the system is actively working to free up memory. You’ve adjusted vm.swappiness to 0, making the kernel less inclined to swap. This can potentially postpone the issue if it’s swap-related, but it’s not a complete solution. It’s crucial to pinpoint which processes are using the most memory, especially moments before the system freezes.

Is there a specific application running when the system has been “left running long enough”?

To monitor memory usage, consider tools like free -m, vmstat, and htop. Inspect the kernel logs using dmesg or journalctl -k to identify any warnings or errors, particularly those associated with memory or storage.
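Since the freeze happens unattended, it may help to record memory usage periodically so the last entries before a freeze can be read afterwards. Below is a rough sketch, not a tested recipe for this board. The function names, the log path and the 60-second interval are arbitrary choices of mine. The ps options assume the procps version of ps; a BusyBox ps may not accept --sort.

```shell
# Print MemAvailable in kB.
# $1 = optional meminfo-style file (hypothetical override for testing)
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Append one timestamped snapshot to the log file given as $1:
# available memory plus the five processes with the largest resident set.
log_snapshot() {
    {
        echo "$(date '+%F %T') MemAvailable=$(mem_available_kb) kB"
        ps -eo pid,rss,comm --sort=-rss | head -n 6   # header line + top 5
    } >> "$1"
    sync   # flush to storage so the latest entries survive a freeze
}

# Example loop (path and interval are assumptions, adjust as needed):
#   while true; do log_snapshot /var/log/memwatch.log; sleep 60; done
```

A process whose RSS keeps growing from one snapshot to the next would be the first suspect for a leak. After a recovery, something like dmesg | grep -i -E 'oom|out of memory' should show whether the OOM killer was involved.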

Note that creating a swap file on flash-based storage isn’t recommended, because the frequent writes can wear out the flash quickly.