Watchdog rebooting iMX8MP out of suspend

Hello,

I don’t understand. Is this the deep sleep mode? What’s the output of the following?

cat /sys/power/mem_sleep
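For reference, that file lists the supported modes and marks the active one with square brackets, for example "s2idle [deep]". A minimal sketch for extracting just the active mode (active_mem_sleep is a made-up helper name, not a standard tool):

```shell
# Print the active suspend mode from text like "s2idle [deep]".
# active_mem_sleep is a hypothetical helper, named here only for illustration.
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# On the target:  active_mem_sleep < /sys/power/mem_sleep
```

If this prints s2idle rather than deep, the s2idle explanation given in this thread would be the first thing to rule out.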

The other alternative you propose seems a bit risky because that would imply no watchdog monitoring during the bootloader stage.

Yes, that’s correct. It may be possible to change the bootloader to enable the WDW bit when it’s enabling the watchdog. But that would only be relevant if the issue you’re seeing is the one I reproduced, which I’m not sure is the case.

This is interesting. If it’s the same for @rodring10, then I really don’t know what’s going on. I tried to reproduce the behavior you both are describing on our Reference Image, and the only way it worked was with mem_sleep set to s2idle. The reason for that behavior is described in my previous response.

From my research, the imx2_wdt driver already does exactly this. Please take a look at the imx2_wdt_suspend function:

/* Disable watchdog if it is active or non-active but still running */
static int __maybe_unused imx2_wdt_suspend(struct device *dev)
{
        struct watchdog_device *wdog = dev_get_drvdata(dev);
        struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);

        /* The watchdog IP block is running */
        if (imx2_wdt_is_running(wdev)) {
                /*
                 * Don't update wdog->timeout, we'll restore the current value
                 * during resume.
                 */
                __imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME);
                imx2_wdt_ping(wdog);
        }

        if (wdev->no_ping) {
                clk_disable_unprepare(wdev->clk);

                wdev->clk_is_on = false;
        }

        return 0;
}

I also checked, and in my case the function runs during suspend.
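A side note on what that function does to the timer: the driver's WDOG_SEC_TO_COUNT macro stores a timeout of s seconds as (2*s - 1) in the upper byte of WCR, and IMX2_WDT_MAX_TIME is 128 seconds. A rough sketch of that arithmetic in shell (the helper names are invented for illustration), which may help when reading register dumps:

```shell
# Seconds -> WT field value (upper byte of WCR), mirroring WDOG_SEC_TO_COUNT.
# Both helper names are hypothetical.
wdt_sec_to_wt() {
    printf '0x%02X\n' $(( $1 * 2 - 1 ))
}

# WCR register value -> timeout in whole seconds (WT counts in 0.5 s steps,
# so odd half-second values are rounded down here).
wdt_wcr_to_sec() {
    echo $(( ((($1 >> 8) & 0xFF) + 1) / 2 ))
}
```

So the 128 s maximum written during suspend corresponds to WT = 0xFF. If the counter keeps running while the system is asleep, a reset roughly 128 seconds after suspending is what one would expect.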

When I was looking into it, back in June, I too put a print inside static int __maybe_unused imx2_wdt_suspend(struct device *dev) to verify that function was being called, and it was.

I remember talking to a few AI bots about the issue. One idea was that something else was keeping the clock that feeds the WDT active. I remember stripping my device tree right back to bare bones, and I still couldn’t get it to stay asleep.

So it seems what Rodrigo and I have in common is using Buildroot rather than Yocto. I wonder if there is something set by the init system causing it not to work. I’m using BusyBox for init. I could maybe try changing to something else and see if it starts working.

I’ve also loaded a Toradex image and verified it can sleep indefinitely, so it’s not related to hardware.

Thinking about it now, I never actually confirmed that the watchdog was running in the Toradex image. Rafael, could you please confirm it is running?

I wonder if it has something to do with the system not actually getting into the low power state when using BusyBox as the init system.

I changed U-Boot to set bit 7 (WDW, which suspends the timer in WAIT mode), hoping that would fix it, but still no luck.

Yes, the watchdog is running, I also confirmed that when I was trying to reproduce the issue described here.

I think this would only make a difference if your sleep mode was s2idle. In deep mode it’s not necessary, as can be seen on our Reference Image.

During my tests, one hypothesis I had was that your kernel configuration might be different from ours. I removed the WDT driver completely from the kernel, and I still got the same result of being able to sleep indefinitely. Without the WDT driver, the module will be reset by the watchdog constantly unless it’s sleeping. The suspend function of the WDT driver will not be called in this case.

This somewhat supports the idea there’s a shared clock somewhere that’s being disabled by our Reference Image on suspend, but it’s not being disabled in your case.

Thank you @rafael.tx and @phil for your comments.
Rafael, what I meant in my previous post is that my system currently has sleep mode = deep.

I couldn’t find a solution so far.
I wonder if you guys at Toradex are applying any patches during the build process that could change the behaviour for your image compared to the one Phil and I are building with Buildroot?

If I run the following:

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

I get

[  613.423667] PM: suspend entry (deep)
[  613.434093] Filesystems sync: 0.010 seconds
[  613.465858] Freezing user space processes
[  613.467220] Freezing user space processes completed (elapsed 0.001 seconds)
[  613.467233] OOM killer disabled.
[  613.467237] Freezing remaining freezable tasks
[  613.468441] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  613.468451] printk: Suspending console(s) (use no_console_suspend to debug)
[  613.496879] imx-dwmac 30bf0000.ethernet eth0: Link is Down
[  613.497611] imx-dwmac 30bf0000.ethernet eth0: FPE workqueue stop
[  613.600764] Disabling non-boot CPUs ...
[  613.601835] psci: CPU1 killed (polled 4 ms)
[  613.603713] psci: CPU2 killed (polled 0 ms)
[  613.605469] psci: CPU3 killed (polled 0 ms)
[  613.605853] PM: suspend debug: Waiting for 5 second(s).
[  613.605854] Enabling non-boot CPUs ...
[  613.606222] Detected VIPT I-cache on CPU1
[  613.606248] GICv3: CPU1: found redistributor 1 region 0:0x00000000388a0000
[  613.606276] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[  613.606731] CPU1 is up
[  613.607061] Detected VIPT I-cache on CPU2
[  613.607080] GICv3: CPU2: found redistributor 2 region 0:0x00000000388c0000
[  613.607098] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[  613.607501] CPU2 is up
[  613.607824] Detected VIPT I-cache on CPU3
[  613.607841] GICv3: CPU3: found redistributor 3 region 0:0x00000000388e0000
[  613.607857] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[  613.608309] CPU3 is up
[  613.613083] caam 30900000.crypto: registering rng-caam
[  613.614098] imx-dwmac 30bf0000.ethernet eth0: configuring for phy/rgmii-id link mode
[  613.626956] imx-dwmac 30bf0000.ethernet eth0: No Safety Features support found
[  613.626979] imx-dwmac 30bf0000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[  613.627132] imx-dwmac 30bf0000.ethernet eth0: FPE workqueue start
[  613.656618] flexcan 308c0000.can can0: can_put_echo_skb: BUG! echo_skb 0 is occupied!
[  613.656814] flexcan 308d0000.can can1: can_put_echo_skb: BUG! echo_skb 0 is occupied!
[  613.659603] usb-conn-gpio 38100000.usb:connector: repeated role: device
[  613.714077] imx-uart 30860000.serial: RX flood detected: soft reset.
[  613.722081] imx-uart 30860000.serial: RX flood detected: soft reset.
[  613.722126] imx-uart 30860000.serial: RX flood detected: soft reset.
[  613.722212] imx-sdma 30bd0000.dma-controller: restart cyclic channel 1
[  613.722262] OOM killer enabled.
[  613.722265] Restarting tasks ... done.
[  613.723451] random: crng reseeded on system resumption
[  613.727248] PM: suspend exit
[  615.936495] imx-dwmac 30bf0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Which to me looks normal (but I am no expert).

Also here are my watchdog register contents:

# devmem 0x30280000 16
0x77BF
# devmem 0x30280002 16
0xAAAA
# devmem 0x30280004 16
0x0010
# devmem 0x30280006 16
0x0004

I wonder if I can find where exactly the kernel puts the system into the low power mode that should cause the watchdog to halt, and verify it’s actually entering that state.
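To make the dump above easier to read: offset 0x0 is the control register WCR, and its low byte holds the mode flags. A minimal decoding sketch, assuming the bit layout documented for the i.MX WDOG block and used by the Linux imx2_wdt driver (WDZST bit 0, WDBG bit 1, WDE bit 2, WDT bit 3, SRS bit 4, WDA bit 5, SRE bit 6, WDW bit 7, WT bits 15:8). decode_wcr is a hypothetical helper name:

```shell
# Print each WCR flag and the WT timeout field for a 16-bit register value.
# Bit names are assumed from the i.MX WDOG register description.
decode_wcr() {
    v=$(( $1 ))
    i=0
    for name in WDZST WDBG WDE WDT SRS WDA SRE WDW; do
        echo "$name=$(( ($v >> $i) & 1 ))"
        i=$(( $i + 1 ))
    done
    printf 'WT=0x%02X\n' $(( ($v >> 8) & 0xFF ))
}

# On the target:  decode_wcr "$(devmem 0x30280000 16)"
```

For the value 0x77BF this reports WDE=1 (watchdog enabled), WDW=1 and WT=0x77.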

Hi Phil,

So, in my case these are the values for the watchdog registers:

# devmem 0x30280000 16
0x773F
# devmem 0x30280002 16
0xAAAA
# devmem 0x30280004 16
0x0010
# devmem 0x30280006 16
0x0004

So, the first value is different from yours.
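A quick way to see exactly which bits differ between the two readings is to XOR them. A small sketch (wcr_diff is a made-up helper name):

```shell
# Print the bits that differ between two 16-bit register readings.
wcr_diff() {
    printf '0x%04X\n' $(( $1 ^ $2 ))
}

# Example:  wcr_diff 0x77BF 0x773F
```

For 0x77BF and 0x773F the result is 0x0080, which is bit 7 only. Assuming the usual i.MX WDOG layout, that is the WDW flag, so the two boards would differ in that one setting and nothing else in WCR.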

On the other hand, I can’t run

echo core > /sys/power/pm_test

and when I run

echo mem > /sys/power/state

I get the lines below and after some time it reboots:

[  177.466772] PM: suspend entry (deep)
[  177.481487] Filesystems sync: 0.011 seconds
[  177.487160] Freezing user space processes
[  177.492452] Freezing user space processes completed (elapsed 0.001 seconds)
[  177.499450] OOM killer disabled.
[  177.502692] Freezing remaining freezable tasks
[  177.508355] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  177.515777] printk: Suspending console(s) (use no_console_suspend to debug)

We are not applying patches on top of the downstream kernels as can be seen here:

All patches are committed to our downstream kernel repository, which is used directly by the Yocto recipe. Here is the source:

We should maybe compare the kernel configs; perhaps there’s something missing there.

Here’s the config file I took out of the running module:

config.gz (45.1 KB)
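For the comparison itself, one possible approach is to extract only the watchdog and power-management options from each config and diff those. A rough sketch; the function names and the choice of option patterns are assumptions, and the kernel tree's own scripts/diffconfig can be used for a full comparison:

```shell
# Print watchdog- and suspend-related options from a kernel config
# (plain or gzipped), sorted so two configs line up for diff.
wdt_pm_options() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -E '^(# )?CONFIG_[A-Z0-9_]*(WATCHDOG|WDT|SUSPEND|PM_)' | sort
}

# Show only the lines that differ; returns diff's exit status.
compare_wdt_pm() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    wdt_pm_options "$1" > "$a"
    wdt_pm_options "$2" > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example:  compare_wdt_pm config.gz /path/to/buildroot/linux/.config
```

The second path in the example is a placeholder for wherever the Buildroot kernel config lives.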

Hello @rodring10 , @phil,

Did you compare your kernel configuration to the one I posted? Have you progressed on solving this issue?

I just wanted to add that starting with BSP 7 we use a kernel cache configuration repository, which you can base your configurations on:

I run the following small script from the kernel directory to configure it for a manual build. This assumes that the kernel cache repository was cloned into the home directory:

# $1 selects the cfg/<name>/<name>.cfg fragment to merge on top of the base config
export TDX_BASE_KCONFIG="$HOME/toradex-kernel-cache/cfg/base/base.cfg"
export TDX_ARM_KCONFIG="$HOME/toradex-kernel-cache/cfg/$1/$1.cfg"
./scripts/kconfig/merge_config.sh "$TDX_BASE_KCONFIG" "$TDX_ARM_KCONFIG"

Best regards,

Rafael

Hello @rafael.tx, thanks for all your help.

I am currently working on some other stuff (the application running on Linux) but will get back to the watchdog issue in a few days and let you know.

I was able to do some quick comparisons and noticed that the kernel configuration you attached has a few watchdog-related settings that differ from mine, so I want to modify my config file and run some tests.

Regards

Hi Guys,

I too am distracted with other tasks, but will be getting back to focusing on this one soon.

I just compared the configs, and there were some differences. So I took your whole config, compiled with it and tested. The device still wouldn’t stay asleep.

Can you please post your bootloader config so I can compare that too?

Thanks,

Phil

Can you please post your bootloader config so I can compare that too?

The bootloader configuration is part of the U-boot repository:

Hi Rafael,

So far I have, at the same time:

  • Used your kernel config
  • Used your bootloader config
  • Used your imx8mp.dtsi and imx8mp-verdin.dtsi files

Still no success.

To be clear about how I put it to sleep: I power it on, then simply execute echo mem > /sys/power/state, and it immediately sleeps. It then reboots 128 seconds later.

Could it be related to the init system or mdev?

I will also update to the exact same kernel and bootloader as you. I last updated 6 months ago (3 months before posting this). I only just considered that it could be something that was fixed in the meantime.

Any further ideas or anyone else you could ask?

Thanks,
Phil

OK, I now have toradex_6.6-2.2.x-imx with the config you provided, and U-Boot toradex_imx_lf_v2024.04 with the verdin-imx8mp_defconfig.
Still no luck.

Hey guys,

So, on our end we have compared the kernel.config used by Toradex (attached above by Rafael) to the one we are using, and although there are some differences, it seems none of them are very relevant to the issue.
Also, because Phil has tried with the kernel configuration file provided by Rafael and it is still not working, then probably the issue is not there?

In our case we are, in theory, already using the same U-Boot, kernel and device tree as Toradex uses on their Yocto image, with no luck.

Currently we have implemented a sleep module that sleeps in chunks of 100 seconds to make sure the watchdog doesn’t reboot the board, but it is not ideal to wake up 36 times an hour, as it affects power consumption (which we still need to quantify).
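For readers who need the same stopgap, here is one way a chunked sleep could look from user space. This is only a sketch under assumptions: it is not the sleep module described above, it assumes an RTC that can wake the board and the rtcwake tool (util-linux or BusyBox), and the function names and the SUSPEND_CMD override are invented for illustration.

```shell
# Number of wake-ups needed to cover total seconds in chunks of a given size.
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Sleep for $1 seconds in total, waking every $2 seconds.
# SUSPEND_CMD can replace the default "rtcwake -m mem -s" (e.g. for a dry run).
chunked_sleep() {
    remaining=$1
    chunk=$2
    while [ "$remaining" -gt 0 ]; do
        step=$chunk
        if [ "$remaining" -lt "$step" ]; then
            step=$remaining
        fi
        ${SUSPEND_CMD:-rtcwake -m mem -s} "$step" || return 1
        remaining=$(( $remaining - $step ))
    done
}

# Example:  chunked_sleep 3600 100
```

chunk_count 3600 100 gives the 36 wake-ups per hour mentioned above. The chunk has to stay below the 128 s timeout the driver programs on suspend, which is why 100 s leaves some margin.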

Another piece to the puzzle. I loaded on the easy installer recovery kernel.

This appears not to use systemd (/etc/init.d/ is present), runs Linux 5.5, and uses the imx2_wdt.c driver for the watchdog. It also sets the watchdog timeout down to 60s.

I verified the watchdog kernel module was running too.

This software setup can sleep indefinitely.

Linux 5.5 was quite a while ago so it can’t be a recent change that has added support.

I wonder if, by chance, Rodrigo and I have something in user space that is preventing the system from going to sleep properly.

I’ll do some tests killing off everything before sleeping and will report back.

Just killed everything apart from kernel tasks and init and it still reboots from sleep.

Running low on ideas to try now.

I’m also running low on ideas. The only way I could reproduce something similar to what you’re seeing is when the sleep mode was set to s2idle.
I doubt that the init system could have an influence, but at this point I wouldn’t be too surprised either.

What I understood from my analysis of the driver is the following:

On probe, it will check whether the watchdog is running and set up the watchdog subsystem to start pinging the watchdog.

I can offer you the patch I created to add some debugging information to the driver. Maybe if you add it to your kernel and post the outputs when it runs, we can get other ideas of what’s going on.

imxwdt_debug.patch (3.2 KB)

# dmesg | grep wdt
[    0.175347] imxwdt: imx2_wdt_probe:331
[    0.175354] imxwdt: imx2_wdt_probe:337
[    0.175360] imxwdt:  running

EDIT: Ignore this message, it was only working because the watchdog wasn’t actually running lol. I incorrectly assumed it would start the dog if it wasn’t running.

!!!

Okay, I think I understand what is going on.

If I disable the watchdog support in U-Boot and then let Linux start the watchdog with the fsl,suspend-in-wait flag set, it sets bit 7 (WDW):

WDW
Watchdog Disable for Wait. This bit determines the operation of WDOG during Low Power WAIT mode.
This is a write once only bit.
0 - Continue WDOG timer operation (Default).
1 - Suspend WDOG timer operation.

Then everything works.
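For anyone wanting to check the same two things on their own board, a rough sketch: look for the fsl,suspend-in-wait property in the live device tree and test bit 7 of the WCR value. The helper names are made up, /proc/device-tree is assumed to be available, and the register address is the one from the devmem dumps earlier in this thread.

```shell
# List device-tree nodes that carry the fsl,suspend-in-wait property.
# Optional argument: device-tree root (defaults to /proc/device-tree).
find_suspend_in_wait() {
    find "${1:-/proc/device-tree}/" -name 'fsl,suspend-in-wait' 2>/dev/null
}

# Succeed if bit 7 (WDW) is set in the given WCR value.
wcr_has_wdw() {
    [ $(( $1 & 0x80 )) -ne 0 ]
}

# On the target:
#   find_suspend_in_wait
#   wcr_has_wdw "$(devmem 0x30280000 16)" && echo "WDW set"
```

Because WDW is write-once, whichever stage first writes it (bootloader or kernel) decides its value until the next reset, so the property alone does not guarantee the bit ends up set.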

However, the u-boot driver doesn’t even know about the WDW flag, even in master: u-boot/include/fsl_wdog.h at master · u-boot/u-boot · GitHub

What I don’t understand is this: the U-Boot config you linked earlier has the watchdog turned on and autostarting. WDW is a write-once bit, so how can you have it set?

Is there something else overriding this config as part of your Yocto build, or something in the U-Boot environment?

Anyway, I don’t think it really matters, but it would be good to understand.

Thanks again for your help, I really appreciate it.

I’m going to patch U-Boot so it sets that flag and see if that works. Will report back.