Enable HW Watchdog for Apalis TK1 at Boot

Florian_K · May 3, 2017, 6:07am

How can I enable the HW watchdog for the Apalis TK1 for SoM version V1.1A?

One precondition: I cannot use systemd.

To use the HW watchdog, I assume something like the following has to be done (based on the instructions for the Colibri VF61):

enable the HW watchdog in U-Boot by executing something like mw.w 0x4003e000 xx34 at the U-Boot prompt (detects: kernel crash during boot)
add the watchdog driver to the device tree
configure the Linux driver so that it does not reset the watchdog to "disabled" at the start of boot
configure the kernel to panic on a soft lockup with (a) the sysctl "kernel.softlockup_panic", (b) the kernel parameter "softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for details), or (c) the compile option "BOOTPARAM_SOFTLOCKUP_PANIC" (detects: soft lockup = a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run → timer/interrupt can still service the watchdog)
configure the kernel to panic on a hard lockup with (a) the sysctl "kernel.hardlockup_panic", (b) the compile-time knob "BOOTPARAM_HARDLOCKUP_PANIC", or (c) the kernel parameter "nmi_watchdog" (see "Documentation/admin-guide/kernel-parameters.rst" for details) (detects: hard lockup = a bug that causes the CPU to loop in kernel mode for more than 10 seconds without letting other interrupts have a chance to run → timer/interrupt can not service the watchdog)
boot the kernel with an option like tegra_wdt.heartbeat=$seconds so that the kernel services the watchdog from a kernel-level timer
service the watchdog from the init process during init
run a user-space application that services the watchdog regularly (detects: kernel crash, crash of this application, application not being scheduled regularly)
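The last step (a user-space application that services the watchdog) could look roughly like the following minimal Python sketch. It assumes the TK1 watchdog driver registers the standard Linux watchdog character device at /dev/watchdog; the function name service_watchdog and the 10 s default interval are hypothetical, and the interval has to stay below whatever timeout the driver is configured with. In the standard Linux watchdog interface any write counts as a keepalive, and writing 'V' just before closing (the "magic close") asks the driver to disarm, which is ignored on kernels built with CONFIG_WATCHDOG_NOWAYOUT.

```python
import os
import time


def service_watchdog(device="/dev/watchdog", interval_s=10.0,
                     iterations=None, magic_close=True):
    """Kick the watchdog every interval_s seconds; return the kick count.

    Opening the device arms the watchdog. iterations=None loops forever,
    which is what a real daemon would do; a finite value is for testing.
    """
    fd = os.open(device, os.O_WRONLY)
    kicks = 0
    try:
        while iterations is None or kicks < iterations:
            os.write(fd, b"\0")  # any write resets the watchdog timer
            kicks += 1
            time.sleep(interval_s)
        if magic_close:
            # Reached only on a deliberate stop: 'V' before close() asks the
            # driver to disarm (ignored with CONFIG_WATCHDOG_NOWAYOUT=y).
            os.write(fd, b"V")
    finally:
        # On an exception no 'V' was written, so the watchdog stays armed and
        # is expected to reset the board unless something re-opens and kicks it.
        os.close(fd)
    return kicks
```

Writing 'V' only after a deliberate stop is the key design choice: if the loop dies with an exception, the device is closed without the magic character, the watchdog stays armed and the board resets, which is the application-crash detection the last step asks for.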

BTW: Is the Watchdog (Linux) article also valid for the TK1?

dominik.tx · May 3, 2017, 10:50am

Watchdog is enabled by default during kernel boot.
Yes, most of the Tegra section in the watchdog article applies to the TK1.

Florian_K · May 3, 2017, 11:08am

For the TK1, do I have to make sure that the watchdog is not disabled during startup, as with the Colibri VF61?

dominik.tx · May 3, 2017, 12:07pm

By default we build our TK1 kernels with the option CONFIG_TEGRA_WATCHDOG_ENABLE_ON_PROBE enabled, which leaves the watchdog re-enabled after the probe has finished configuring it. If you're using a custom kernel, just make sure that this config option is enabled.

Florian_K · May 3, 2017, 1:52pm

Ok

(Note: the option CONFIG_TEGRA_WATCHDOG_ENABLE_ON_PROBE was renamed to TEGRA_WATCHDOG_ENABLE_HEARTBEAT)
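For a custom kernel, one way to confirm the option is set on the running system is to read the built-in kernel configuration. This Python sketch assumes the kernel was built with CONFIG_IKCONFIG_PROC=y so that /proc/config.gz exists (otherwise point it at the .config used for the build); kernel_config_enabled is a hypothetical helper, and it is meant to be called with both the old and the new option name because of the rename.

```python
import gzip


def kernel_config_enabled(options, config_path="/proc/config.gz"):
    """Return {option: True/False}, True only for options set to =y.

    Accepts the gzip-compressed /proc/config.gz or a plain-text .config.
    """
    opener = gzip.open if config_path.endswith(".gz") else open
    enabled = set()
    with opener(config_path, "rt") as f:
        for line in f:
            line = line.strip()
            # Built-in options look like "CONFIG_FOO=y"; disabled ones are
            # comments such as "# CONFIG_FOO is not set".
            if line.startswith("CONFIG_") and "=" in line:
                name, value = line.split("=", 1)
                if value == "y":
                    enabled.add(name)
    return {opt: opt in enabled for opt in options}
```

On the board, kernel_config_enabled(["CONFIG_TEGRA_WATCHDOG_ENABLE_ON_PROBE", "CONFIG_TEGRA_WATCHDOG_ENABLE_HEARTBEAT"]) would be expected to report True for whichever of the two names the kernel version uses.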

Then the following should be a valid way on the TK1 to implement a watchdog for kernel panics (but does it cover soft lockups or hard lockups?):

"When using the kernel level heartbeat, the kernel will not necessarily reboot when a kernel panic occurs since interrupts might still be handled. In order to reboot on kernel panic, use the command line option 'panic=' or the sysctl.conf option 'kernel.panic ='."
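Setting the panic timeout at runtime does the same thing as sysctl -w: the dotted sysctl name maps to a file under /proc/sys. A minimal Python sketch, assuming root privileges; set_sysctl is a hypothetical helper and the values mentioned below are only examples. Since systemd is not available here, whether /etc/sysctl.conf is applied at boot depends on the init scripts, so writing /proc/sys from an init script is one alternative.

```python
import os


def set_sysctl(name, value, proc_sys="/proc/sys"):
    """Set a sysctl at runtime, like `sysctl -w name=value` (needs root).

    Dots in the name become path separators, so "kernel.panic" maps to
    /proc/sys/kernel/panic. The change does not survive a reboot.
    """
    path = os.path.join(proc_sys, *name.split("."))
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

On the board, set_sysctl("kernel.panic", 10) would make the kernel reboot 10 seconds after a panic, and set_sysctl("kernel.softlockup_panic", 1) would additionally turn soft lockups into panics so that the timeout applies to them as well; the persistent equivalent is a line such as kernel.panic = 10 in /etc/sysctl.conf.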

Florian_K · May 9, 2017, 5:14pm

@dominik.tx what is the easiest way to test this?

dominik.tx · May 10, 2017, 9:49am

To test a kernel panic, you'll need to recompile the kernel with CONFIG_MAGIC_SYSRQ=y.
Then all you need to do is:
echo c > /proc/sysrq-trigger
To test a hardware hang, you can try accessing an inaccessible address with devmem2, e.g.:
devmem2 0x70009000 w
(the NOR peripheral, which by default is held in reset and unclocked)
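Both tests can also be scripted. The Python sketch below mirrors the two commands: trigger_sysrq_crash writes 'c' to /proc/sysrq-trigger, and devmem_read32 performs the same kind of 32-bit read through /dev/mem that devmem2 ... w does. Both helper names and the confirm guard are hypothetical; both need root, the first needs CONFIG_MAGIC_SYSRQ=y, and when pointed at these targets on a real board neither call is expected to return.

```python
import mmap
import os
import struct


def trigger_sysrq_crash(trigger="/proc/sysrq-trigger", confirm=False):
    """Same as `echo c > /proc/sysrq-trigger`: deliberately crash the kernel."""
    if not confirm:
        # Hypothetical guard so the crash is never triggered by accident.
        raise RuntimeError("pass confirm=True to really crash the kernel")
    if not os.path.exists(trigger):
        # The file only exists on kernels built with CONFIG_MAGIC_SYSRQ=y.
        raise FileNotFoundError(f"{trigger} not found (CONFIG_MAGIC_SYSRQ=y?)")
    with open(trigger, "w") as f:
        f.write("c")


def devmem_read32(addr, mem="/dev/mem"):
    """Read one little-endian 32-bit word at a physical address, like devmem2."""
    if addr % 4:
        raise ValueError("address must be 4-byte aligned")
    page = mmap.PAGESIZE
    base = addr & ~(page - 1)  # mmap offsets must be page aligned
    fd = os.open(mem, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, page, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=base) as m:
            # Per this thread, reading 0x70009000 on the TK1 (NOR controller
            # in reset and unclocked) locks up the chip at this point.
            return struct.unpack_from("<I", m, addr - base)[0]
    finally:
        os.close(fd)
```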

Florian_K · May 10, 2017, 3:50pm

@dominik.tx Without recompiling the kernel with CONFIG_MAGIC_SYSRQ=y and without triggering SysRq, the HW hang triggered with sudo devmem2 0x70009000 w should still be detected by the watchdog, right? What is the default behaviour of the TK1 kernel in that case?

dominik.tx · May 10, 2017, 3:59pm

At that point the chip is locked up and no kernel code will be executed. The watchdog will reset the SoC after its timeout.

Florian_K · May 10, 2017, 4:25pm

@dominik.tx I'm just trying to avoid recompiling the kernel with CONFIG_MAGIC_SYSRQ=y. That would not be required if the watchdog behaved the same for (a) sudo devmem2 0x70009000 w and (b) echo c > /proc/sysrq-trigger (system crash by a NULL pointer dereference, see https://web.archive.org/web/20160816230132/https://www.kernel.org/doc/Documentation/sysrq.txt). I assume that (a) and (b) lead to the same behavior/result, so I should be fine testing only with sudo devmem2 0x70009000 w, right? I am just not sure about this statement from the Toradex Developer Center: "the kernel will not necessarily reboot when a kernel panic occurs since interrupts might still be handled. In order to reboot on kernel panic, use the command line option 'panic=' or the sysctl.conf option 'kernel.panic ='."

dominik.tx · May 10, 2017, 4:41pm

If the kernel is mostly intact after a panic (for example after a simple NULL pointer dereference), it will still service the HW watchdog. If no panic parameter is passed on the kernel command line, the default behaviour is panic=0 (meaning wait forever).

(a) and (b) lead to different system states, but in both cases the watchdog works the same way: it reboots the system if it is not 'kicked' within the set period. In (b) it is quite likely that the system will carry on servicing the watchdog and will not reboot, which is why specifying panic_timeout is recommended.
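To check which case applies on a running board, the current panic timeout can be read back from /proc/sys/kernel/panic, which reflects both panic= on the command line and any later sysctl. A small Python sketch; the helper names are hypothetical, and the descriptions follow the documented semantics of the panic= kernel parameter (0 waits forever, a positive value reboots after that many seconds, a negative value reboots immediately).

```python
def describe_panic_timeout(timeout):
    """Translate a panic timeout into what happens after a kernel panic."""
    timeout = int(timeout)
    if timeout == 0:
        # Default: a mostly intact kernel may keep kicking the HW watchdog,
        # so the board can stay hung indefinitely.
        return "wait forever"
    if timeout < 0:
        return "reboot immediately"
    return f"reboot after {timeout} s"


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Read the current panic timeout (panic= or kernel.panic) as an int."""
    with open(path) as f:
        return int(f.read().strip())
```

On the board, describe_panic_timeout(read_panic_timeout()) returning "wait forever" would indicate that panic= still needs to be set.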

Florian_K · May 11, 2017, 7:42am

@dominik.tx What does "Watchdog is enabled by default during kernel boot" mean? Do I have to enable the watchdog in U-Boot (I use a customized U-Boot from the Apalis TK1 Linux BSP image V2.7b1), or is it enabled by default in the U-Boot of the Apalis TK1 Linux BSP image V2.7b1?

Florian_K · May 12, 2017, 6:34am

@dominik.tx OK, then I should just add panic=<seconds> to the U-Boot boot command (bootcmd), right?
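Whichever U-Boot variable ends up in bootargs, the change boils down to adding or replacing a panic= token on the kernel command line, which can then be verified in /proc/cmdline after boot. The Python helpers below are a hypothetical sketch of that string handling; they assume whitespace-separated parameters without quoted values containing spaces. Which environment variable to change with setenv/saveenv depends on the BSP's U-Boot environment and is not specified in this thread.

```python
def set_cmdline_param(cmdline, key, value):
    """Return cmdline with key=value set, dropping any earlier key=... tokens."""
    tokens = [t for t in cmdline.split() if not t.startswith(key + "=")]
    tokens.append(f"{key}={value}")
    return " ".join(tokens)


def get_cmdline_param(cmdline, key):
    """Return the value of the last key=... token, or None if it is absent.

    Taking the last occurrence mirrors the kernel, where a later occurrence
    of a parameter normally overrides an earlier one.
    """
    value = None
    for tok in cmdline.split():
        if tok.startswith(key + "="):
            value = tok[len(key) + 1:]
    return value
```

After booting, get_cmdline_param(open("/proc/cmdline").read(), "panic") would be expected to return the configured number of seconds as a string.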

dominik.tx · May 12, 2017, 12:58pm

There is no watchdog support in U-Boot for the TK1.