Enable HW Watchdog for Apalis TK1 at Boot

How can I enable the HW watchdog for the Apalis TK1 for SoM version V1.1A?

One precondition: I cannot use systemd.

To use the HW watchdog, I assume something like the following needs to be done (based on the instructions for the Colibri VF61):

  • enable the HW watchdog in U-Boot by executing something like mw.w 0x4003e000 xx34 1) at the U-Boot prompt (detects: kernel crash during boot)
  • add the watchdog driver to the device tree
  • configure the Linux driver so that it does not reset the watchdog settings to “watchdog disabled” at the beginning of startup
  • configure the kernel to panic on a soft lockup with one of (a) the sysctl “kernel.softlockup_panic”, (b) the kernel parameter “softlockup_panic” (see “Documentation/admin-guide/kernel-parameters.rst” for details), or (c) the compile option “BOOTPARAM_SOFTLOCKUP_PANIC” (detects: soft lockup = bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run → timer/interrupt can still service the watchdog)
  • configure the kernel to panic on a hard lockup with one of (a) the sysctl “hardlockup_panic”, (b) the compile-time knob “BOOTPARAM_HARDLOCKUP_PANIC”, or (c) the kernel parameter “nmi_watchdog” (see “Documentation/admin-guide/kernel-parameters.rst” for details) (detects: hard lockup = bug that causes the CPU to loop in kernel mode for more than 10 seconds without letting other interrupts have a chance to run → timer/interrupt cannot service the watchdog)
  • boot the kernel with a boot option such as tegra_wdt.heartbeat=$seconds so that the kernel services the watchdog with a kernel-level timer
  • service the watchdog from the init process during init
  • run a user space application that services the watchdog regularly (detects: crash of kernel, crash of this user space application, user space application process not executed regularly)
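The last step of the list (servicing the watchdog from user space) could look roughly like the sketch below. It assumes the driver exposes the standard Linux watchdog character device (typically /dev/watchdog), that any write counts as a keepalive, and that the driver honours the "magic close" character; none of this has been verified on the TK1 here. The function name feed_watchdog and its parameters are made up for illustration.

```shell
#!/bin/sh
# Hypothetical user-space watchdog feeder (sketch, not a tested TK1 setup).
#
# feed_watchdog DEVICE INTERVAL [COUNT]
#   DEVICE    watchdog device node, e.g. /dev/watchdog (assumed path)
#   INTERVAL  seconds between kicks; must be shorter than the HW timeout
#   COUNT     number of kicks to send; 0 (default) means loop forever
feed_watchdog() {
    dev=$1
    interval=$2
    count=${3:-0}
    n=0
    {
        # The device stays open on fd 3 for the whole loop. Opening it
        # starts the watchdog, and every write is treated as a keepalive.
        while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
            printf '.' >&3
            n=$((n + 1))
            sleep "$interval"
        done
        # "Magic close": writing 'V' right before closing asks the driver
        # to stop the watchdog (ignored if the driver runs in nowayout mode).
        printf 'V' >&3
    } 3>"$dev"
}

# Example call (runs until killed; if it is killed, no 'V' is written,
# so the watchdog keeps counting and eventually resets the board):
#   feed_watchdog /dev/watchdog 10
```

With COUNT left at 0 the loop never reaches the magic close, which is the behaviour wanted here: if the feeder or the kernel stops running, the kicks stop and the hardware resets the SoC.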

BTW: Is the Watchdog (Linux) article also valid for the TK1?

The watchdog is enabled by default during kernel boot.
Yes, most of the Tegra section in the watchdog article applies to the TK1.

For the TK1, do I have to make sure that the watchdog is not disabled during startup, as with the Colibri VF61?

By default we build our TK1 kernels with the option CONFIG_TEGRA_WATCHDOG_ENABLE_ON_PROBE enabled, which leaves the watchdog re-enabled after the probe is done with its configuration. If you’re using a custom kernel, just make sure that this config option is enabled.

Ok

(Note: the option CONFIG_TEGRA_WATCHDOG_ENABLE_ON_PROBE was renamed to TEGRA_WATCHDOG_ENABLE_HEARTBEAT)
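For a custom kernel, a quick way to check whether the option is set is to grep the kernel config, as in the sketch below. The helper name tegra_wdt_option_enabled is hypothetical, and the sketch assumes the renamed option shows up in .config as CONFIG_TEGRA_WATCHDOG_ENABLE_HEARTBEAT (the usual CONFIG_ prefix added to the name given in the note above).

```shell
#!/bin/sh
# tegra_wdt_option_enabled CONFIG_FILE
# Returns success if either the old or the renamed option is set to "y"
# in the given kernel config file (hypothetical helper, for illustration).
tegra_wdt_option_enabled() {
    config_file=$1
    grep -Eq '^CONFIG_TEGRA_WATCHDOG_ENABLE_(ON_PROBE|HEARTBEAT)=y$' "$config_file"
}

# Example usage against the .config of a kernel build tree:
#   tegra_wdt_option_enabled .config && echo "watchdog stays enabled after probe"
#
# On a running board the config is only available if the kernel was built
# with CONFIG_IKCONFIG_PROC:
#   zcat /proc/config.gz > /tmp/running.config
#   tegra_wdt_option_enabled /tmp/running.config
```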

Then the following should be a valid way on the TK1 to implement a watchdog for a kernel panic (but does it cover a soft lockup or a hard lockup?):

"When using the kernel level heartbeat, the Kernel will not necessarily reboot when a kernel panic occurs since interrupts might still be handled. In order to reboot on kernel panic, use the command line option panic= or the sysctrl.conf option "kernel.panic = “.”

@dominik.tx what is the easiest way to test that?

To test a kernel panic, you’ll need to recompile the kernel with CONFIG_MAGIC_SYSRQ=y.
Then all you need to do is:
echo c > /proc/sysrq-trigger
To test a HW hang, you can try accessing an inaccessible address with devmem2, e.g.:
devmem2 0x70009000 w
(a NOR peripheral that is in reset and unclocked by default)

@dominik.tx Without recompiling the kernel with CONFIG_MAGIC_SYSRQ=y and without triggering sysrq the HW hang triggered with sudo devmem2 0x70009000 w should be detected by the watchdog, right? What is the TK1 default kernel behaviour then?

At that point the chip is locked and no kernel code will be executed. The watchdog will reset the SoC after a timeout.

@dominik.tx I am just trying to avoid recompiling the kernel with CONFIG_MAGIC_SYSRQ=y. That would not be required if the watchdog behaved the same for (a) sudo devmem2 0x70009000 w and for (b) echo c > /proc/sysrq-trigger (system crash by a NULL pointer dereference https://web.archive.org/web/20160816230132/https://www.kernel.org/doc/Documentation/sysrq.txt). I assume that (a) and (b) lead to the same behaviour/result, so I should be fine with just testing with sudo devmem2 0x70009000 w, right? I am just not sure about: "the Kernel will not necessarily reboot when a kernel panic occurs since interrupts might still be handled. In order to reboot on kernel panic, use the command line option panic= or the sysctrl.conf option "kernel.panic = “.” http://developer.toradex.com/knowledge-base/watchdog-(linux)#NVIDIA_Tegra_based_Modules

If the kernel is mostly intact after a panic (for example after a simple NULL pointer dereference), it will still service the HW watchdog. If no panic parameter is passed on the kernel command line, the default behaviour is panic=0 (meaning wait forever).

(a) and (b) lead to different system states, but in both the watchdog’s function is the same: it reboots the system if it’s not ‘kicked’ within the set period. In (b), the panic case, it is quite possible that the system will carry on servicing the watchdog, so it will not reboot; that’s why specifying panic_timeout (the panic= parameter) is recommended.
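To see which panic timeout a given kernel command line would result in, something like the following sketch can help. The helper name panic_timeout_from_cmdline is made up; it only parses a string and falls back to 0 (wait forever) when no panic= argument is present, matching the default described above.

```shell
#!/bin/sh
# panic_timeout_from_cmdline "CMDLINE"
# Prints the value of the last panic= argument in CMDLINE, or 0 if there
# is none (hypothetical helper, for illustration only).
panic_timeout_from_cmdline() (
    set -f                      # no filename expansion while splitting
    timeout=0
    for arg in $1; do
        case $arg in
            panic=*) timeout=${arg#panic=} ;;
        esac
    done
    printf '%s\n' "$timeout"
)

# Example usage on a running system:
#   panic_timeout_from_cmdline "$(cat /proc/cmdline)"
#
# The value the kernel is currently using can be read and changed at
# runtime through the kernel.panic sysctl (as root):
#   cat /proc/sys/kernel/panic
#   sysctl -w kernel.panic=5    # reboot 5 seconds after a panic
```

A non-zero value makes the kernel reboot on its own after a panic instead of relying on the watchdog to expire.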

@dominik.tx What does “Watchdog is enabled by default during kernel boot” mean? Do I have to enable the watchdog in U-Boot (I use a customized U-Boot from the Apalis TK1 Linux BSP image v2.7b1)? Or is it enabled by default in the U-Boot of the Apalis TK1 Linux BSP image v2.7b1?

@dominik.tx ok, then I should just add panic=<seconds> to the u-boot boot command bootcmd, right?

There is no watchdog support in u-boot for TK1.