Hello folks, after upgrading from BSP 5.7.3 to 6.4.0 (Colibri iMX7D), my systemd services that use the watchdog functionality behave strangely around suspend/resume. When the system resumes from suspend after a timespan exceeding the watchdog limit, systemd considers the services stuck and kills them immediately.
I’ve also implemented thread-level supervision in C++ using deadlines on std::chrono::steady_clock, which was working on BSP 5.7.3 and now shows behaviour similar to systemd. Thus the issues seem to be connected somehow.
I guess that the monotonic clock behaves differently with kernel 6; more precisely, it seems to keep running in suspend mode, but I was unable to track the issue down. Does anybody have similar problems or any clue how to deal with it?
Cheers, Marc
Hi @marc.windisch ,
I assume you’re referring to the software watchdog interface systemd can use for its services, i.e. not the hardware one that can reboot the SoM.
I guess that the monotonic clock behaves differently with kernel 6; more precisely, it seems to keep running in suspend mode, but I was unable to track the issue down. Does anybody have similar problems or any clue how to deal with it?
I think you’re on the right track here. While searching for your issue I found a RHEL bug report that may describe the same problem you’re having:
In summary, when using the newer sleep mode called s2idle, the kernel’s monotonic clock can be resumed by some interrupts that wake up the kernel but not the entire system. According to the link above, this can be solved by using the old deep sleep mode.
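One way to check whether the monotonic clock really keeps running across a suspend cycle is to compare it with CLOCK_BOOTTIME, which the Linux clock_gettime documentation describes as identical to CLOCK_MONOTONIC except that it also includes time the system is suspended. The following is only a minimal sketch; the function name suspended_seconds is made up for illustration. Sample it before suspend and again after resume: if the value grows by roughly the time spent asleep, the monotonic clock was stopped during suspend; if it barely changes, the monotonic clock kept running.

```cpp
#include <time.h>

// Hypothetical helper: returns how far CLOCK_BOOTTIME is ahead of
// CLOCK_MONOTONIC, in seconds. This is the time the kernel has accounted
// as "suspended" since boot.
double suspended_seconds() {
    timespec mono{};
    timespec boot{};
    // Read the monotonic clock first, so a scheduling delay between the two
    // calls can only make the result slightly larger, not smaller.
    clock_gettime(CLOCK_MONOTONIC, &mono);
    clock_gettime(CLOCK_BOOTTIME, &boot);
    return static_cast<double>(boot.tv_sec - mono.tv_sec) +
           static_cast<double>(boot.tv_nsec - mono.tv_nsec) / 1e9;
}
```

Since std::chrono::steady_clock is typically backed by CLOCK_MONOTONIC on Linux, the same observation should apply to deadlines built on it.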
Checking our BSP 6 minimal reference image, it uses s2idle by default:
root@colibri-imx7-emmc-06674594:~# cat /etc/os-release
ID=tdx-xwayland-upstream
NAME="TDX Wayland with XWayland Upstream"
VERSION="6.4.0+build.8 (kirkstone)"
VERSION_ID=6.4.0-build.8
PRETTY_NAME="TDX Wayland with XWayland Upstream 6.4.0+build.8 (kirkstone)"
DISTRO_CODENAME="kirkstone"
root@colibri-imx7-emmc-06674594:~# cat /sys/power/mem_sleep
[s2idle]
Whereas our BSP 5 images (both downstream and upstream kernel versions) use deep:
root@colibri-imx7-emmc-06674594:~# cat /etc/os-release
ID=tdx-xwayland
NAME="TDX Wayland with XWayland"
VERSION="5.7.2+build.21 (dunfell)"
VERSION_ID=5.7.2-build.21
PRETTY_NAME="TDX Wayland with XWayland 5.7.2+build.21 (dunfell)"
DISTRO_CODENAME="dunfell"
root@colibri-imx7-emmc-06674594:~# cat /sys/power/mem_sleep
s2idle shallow [deep]
root@colibri-imx7-emmc-06674594:~# cat /etc/os-release
ID=tdx-xwayland-upstream
NAME="TDX Wayland with XWayland Upstream"
VERSION="5.7.2+build.21 (dunfell)"
VERSION_ID=5.7.2-build.21
PRETTY_NAME="TDX Wayland with XWayland Upstream 5.7.2+build.21 (dunfell)"
DISTRO_CODENAME="dunfell"
root@colibri-imx7-emmc-06674594:~# cat /sys/power/mem_sleep
s2idle [deep]
Can you try changing the sleep mode and see if that solves your problem?
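For checking this from application code, the content of /sys/power/mem_sleep can be parsed as in the outputs above: all offered modes are listed on one line and the active one is shown in square brackets. This is only a sketch; MemSleep, parse_mem_sleep and offers are hypothetical names, not part of any library.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for one line read from /sys/power/mem_sleep.
struct MemSleep {
    std::vector<std::string> modes;  // every mode the kernel offers
    std::string active;              // the mode that was shown in [brackets]
};

// Splits the line on whitespace and strips the brackets from the active mode.
MemSleep parse_mem_sleep(const std::string& line) {
    MemSleep result;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token.size() > 2 && token.front() == '[' && token.back() == ']') {
            token = token.substr(1, token.size() - 2);
            result.active = token;
        }
        result.modes.push_back(token);
    }
    return result;
}

// True if the given mode (e.g. "deep") is among the offered ones.
bool offers(const MemSleep& ms, const std::string& mode) {
    return std::find(ms.modes.begin(), ms.modes.end(), mode) != ms.modes.end();
}
```

According to the kernel's sleep-states documentation, writing one of the listed names to the same file selects that mode, so a mode can only be switched to if offers() returns true for it.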
Best regards,
Lucas Akira
Hi @lucas_a.tx, thanks for the reply!
Your cat on /sys/power/mem_sleep shows that s2idle is not only the default but also the only sleep mode available. So switching the mode is not possible.
I tried to reconfigure the kernel, but I did not find any options to enable additional modes. Was the deep sleep mode dropped in kernel 6?
Cheers, Marc
Hi @marc.windisch ,
Your cat on /sys/power/mem_sleep shows that s2idle is not only the default but also the only sleep mode available. So switching the mode is not possible.
You’re right, I didn’t notice s2idle was the only option on BSP 6.
I tried to reconfigure the kernel, but I did not find any options to enable additional modes. Was the deep sleep mode dropped in kernel 6?
I don’t think the S3/deep sleep mode was dropped from the kernel. The BSP 5 upstream reference minimal image and BSP 6 have pretty much the same suspend-related configs enabled, so I don’t think a kernel config option is missing either.
BSP 5 Upstream configs:
root@colibri-imx7-emmc-06674594:~# cat /etc/os-release
ID=tdx-xwayland-upstream
NAME="TDX Wayland with XWayland Upstream"
VERSION="5.7.2+build.21 (dunfell)"
VERSION_ID=5.7.2-build.21
PRETTY_NAME="TDX Wayland with XWayland Upstream 5.7.2+build.21 (dunfell)"
DISTRO_CODENAME="dunfell"
root@colibri-imx7-emmc-06674594:~# zcat /proc/config.gz | grep -i suspend
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_SUSPEND_SKIP_SYNC is not set
CONFIG_PM_TEST_SUSPEND=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARM_CPU_SUSPEND=y
CONFIG_OLD_SIGSUSPEND3=y
# CONFIG_BT_HCIBTUSB_AUTOSUSPEND is not set
CONFIG_USB_AUTOSUSPEND_DELAY=2
BSP 6 configs:
root@colibri-imx7-emmc-06674594:~# cat /etc/os-release
ID=tdx-xwayland-upstream
NAME="TDX Wayland with XWayland Upstream"
VERSION="6.4.0+build.8 (kirkstone)"
VERSION_ID=6.4.0-build.8
PRETTY_NAME="TDX Wayland with XWayland Upstream 6.4.0+build.8 (kirkstone)"
DISTRO_CODENAME="kirkstone"
root@colibri-imx7-emmc-06674594:~# zcat /proc/config.gz | grep -i suspend
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_SUSPEND_SKIP_SYNC is not set
CONFIG_PM_TEST_SUSPEND=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARM_CPU_SUSPEND=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
I’ll ask the team internally for a more thorough investigation of this matter. Does this issue block your development?
Best regards,
Lucas Akira
Hi @lucas_a.tx,
this is indeed blocking us: we had to remove the functionality from our code. That can only be a temporary workaround, because disabling thread supervision violates our requirements. This is a medical device close to its final release, so we’re in quite a hurry to solve this issue.
Cheers, Marc
Hi @marc.windisch ,
Can you do a quick test on BSP 5 to see if this issue occurs when you change the suspend-to-RAM mode to s2idle? If it does, then this newer sleep mode is most likely the cause of this problem.
I’ll see if we can reproduce this on our side in the coming days.
Best regards,
Lucas Akira
@marc.windisch
We picked this up as a bug, but I was told this may be a little more complicated to solve and it will probably take some time. At the moment, I cannot give a timeline for when this is going to be solved. We will keep you informed of the progress.
Best regards,
Rafael
Hi @lucas_a.tx and @rafael.tx,
first of all, sorry for the late reply. I was too busy to check anything because of our release, which was shipped without watchdog functionality, unfortunately.
However, over the last few days I ran some tests to pin the issue down.
In my original post I mentioned two problems: my C++ thread watcher implementation and systemd.
C++ part
For my thread watcher I fixed the issue by implementing suspend/resume methods that flush the priority queue holding the deadlines on suspend and re-calculate the deadlines on resume. I’ve connected to D-Bus and subscribed to systemd-logind’s PrepareForSleep signal to automatically suspend and resume thread watching, so the current implementation does not care about clocks that continue running. The thread watcher is also responsible for generating heartbeats for systemd by invoking sd_notify(0, "WATCHDOG=1"); every 5s. On resume, the watchdog is immediately reset. This solves the issue in the part of the code that is under my control.
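The suspend/resume handling described above could be sketched roughly as follows. This is not the actual implementation from this post: the class and method names are hypothetical, a std::map stands in for the priority queue, and time points are passed in explicitly so the logic does not depend on how the clock behaves during suspend. The D-Bus subscription and the sd_notify() call are left out; suspend() and resume() are assumed to be called from a PrepareForSleep handler (the signal carries true before sleep and false after resume).

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical thread watcher: each supervised thread must check in before
// its deadline, and deadlines are re-armed after a suspend/resume cycle.
class ThreadWatcher {
public:
    // Register a thread that must check in at least once per `timeout`.
    void add(const std::string& name, Clock::duration timeout, Clock::time_point now) {
        entries_[name] = Entry{timeout, now + timeout};
    }

    // Called by a supervised thread to prove it is still alive.
    void kick(const std::string& name, Clock::time_point now) {
        auto it = entries_.find(name);
        if (it != entries_.end()) it->second.deadline = now + it->second.timeout;
    }

    // PrepareForSleep(true): stop supervising, old deadlines become meaningless.
    void suspend() { suspended_ = true; }

    // PrepareForSleep(false): re-calculate every deadline from the resume time.
    void resume(Clock::time_point now) {
        for (auto& kv : entries_) kv.second.deadline = now + kv.second.timeout;
        suspended_ = false;
    }

    // Names of all threads whose deadline has passed; always empty while suspended.
    std::vector<std::string> expired(Clock::time_point now) const {
        std::vector<std::string> out;
        if (suspended_) return out;
        for (const auto& kv : entries_) {
            if (now > kv.second.deadline) out.push_back(kv.first);
        }
        return out;
    }

private:
    struct Entry {
        Clock::duration timeout;
        Clock::time_point deadline;
    };
    std::map<std::string, Entry> entries_;
    bool suspended_ = false;
};
```

With this structure it no longer matters whether the clock advanced during sleep, because no deadline computed before suspend is ever compared against a time point taken after resume.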
systemd part
I backported everything to BSP 5.7.3 and switched the sleep mode to s2idle. Using this mode, BSP 5 behaves exactly the same as BSP 6. In my test scenario the system is mostly sleeping, but is woken up every two minutes. As long as a GPIO is not pulled low, the system falls asleep again after 10s. Keeping the board awake after a while and checking the logs shows repeated watchdog-triggered restarts of all services that have the WatchdogSec=x parameter set:
Nov 17 15:07:35 connectivity-node systemd[1]: xxx.service: Failed with result 'watchdog'.
Nov 17 15:14:07 connectivity-node systemd[1]: xxx.service: Failed with result 'watchdog'.
Nov 17 15:14:07 connectivity-node systemd[1]: yyy.service: Failed with result 'watchdog'.
Nov 17 15:16:19 connectivity-node systemd[1]: zzz.service: Failed with result 'watchdog'.
Nov 17 15:16:19 connectivity-node systemd[1]: xxx.service: Failed with result 'watchdog'.
...
To me it seems that systemd cannot deal with the monotonic clock continuing to run during suspend-to-idle. I verified that my implementation instantly resets systemd’s watchdog after resume, so it must be a deadline miscalculation on systemd’s side, maybe the same problem I had to solve in my C++ code.
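On the service side, the only inputs to that deadline are the WatchdogSec= value and the WATCHDOG=1 notifications. For a service with WatchdogSec= set, systemd passes the timeout to the process in the WATCHDOG_USEC environment variable, and the sd_watchdog_enabled documentation recommends sending a notification every half of that time. A small sketch of deriving the heartbeat interval from it, instead of hard-coding it, might look like this; heartbeat_interval is a hypothetical helper name, and a zero result is used here to mean "watchdog not enabled or value unusable".

```cpp
#include <cctype>
#include <cerrno>
#include <chrono>
#include <cstdlib>

// Hypothetical helper: takes the content of WATCHDOG_USEC (may be nullptr)
// and returns half of it as the heartbeat interval, or zero if the value is
// missing, not a plain positive integer, or out of range.
std::chrono::microseconds heartbeat_interval(const char* watchdog_usec) {
    const std::chrono::microseconds disabled = std::chrono::microseconds::zero();
    if (watchdog_usec == nullptr) return disabled;
    // Reject empty strings, signs and leading whitespace up front.
    if (!std::isdigit(static_cast<unsigned char>(*watchdog_usec))) return disabled;

    char* end = nullptr;
    errno = 0;
    const unsigned long long usec = std::strtoull(watchdog_usec, &end, 10);
    if (errno != 0 || *end != '\0' || usec == 0) return disabled;

    // Half of the configured timeout, as recommended for keep-alive pings.
    return std::chrono::microseconds(
        static_cast<std::chrono::microseconds::rep>(usec / 2));
}
```

A service would call it as heartbeat_interval(std::getenv("WATCHDOG_USEC")) at startup. This does not address the suspend behaviour itself; it only keeps the ping rate consistent with whatever WatchdogSec= is configured.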
Thank you for picking this up as a bug, I’m looking forward to your feedback.
Cheers, Marc
Thank you for the information.
Just to be clear, in our test scenario, whenever we started the watchdog on systemd by adding WatchdogSec=30 to /etc/systemd/system.conf and put the system to sleep, it would reset during sleep because the watchdog timer expired.
Your test on BSP 5 also seems to confirm that this is related to s2idle, and what we’re going to investigate is why there’s no deep sleep option on BSP 6 anymore.
Best regards,
Rafael