TK1 kernel freezes after varying time

Hi,

I am working on an Apalis TK1 based device running the BSP 2.8.7 (not mainline) Linux image. I am seeing unpredictable crashes/freezes of the complete Linux system (kernel).

# cat /etc/issue
The Angstrom Distribution \n \l
Angstrom v2017.12 - Kernel 
Apalis-TK1_LXDE-Image 2.8b7. 20200610

# uname -a
Linux apalis-tk1 3.10.40-2.8.7+g063d16eceb57 #1 SMP PREEMPT Wed Jun 10 16:02:37 UTC 2020 armv7l GNU/Linux

Most of the freezes appear when a user starts to interact with the system after a longer idle period.
The journal log shows the following entries right before the crash:

[upload|B2mzLKCchxPRuGxRYqDaC7holcA=]

Due to this error message I tried to disable the dyntick-idle mode by adding the nohz=off parameter to the kernel options.

After that the freezes disappeared for some time and I have not seen NOHZ: local_softirq_pending messages anymore.
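
To confirm that the parameter really reached the running kernel, the command line can be checked at runtime. The following is only a minimal sketch: `has_kernel_arg` is a hypothetical helper name, and the optional second argument exists purely so the check can be tried against a file other than `/proc/cmdline`.

```shell
# Hypothetical helper: succeeds if the given parameter appears as a whole
# word on the kernel command line. The second argument (optional) overrides
# the file that is read; the default is /proc/cmdline.
has_kernel_arg() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qx -- "$1"
}

if has_kernel_arg nohz=off; then
    echo "nohz=off is active"
else
    echo "nohz=off is NOT on the kernel command line"
fi
```

If the second message is printed, the bootloader environment did not pass the parameter on and the earlier test result would need to be re-checked.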

Nevertheless today I had some new kernel crashes. This time the journal shows a different message before the system freezes:

[upload|q71iLDN9lyYPIILT3sVUmwMLDSk=]

Both errors seem to be related to kernel functionality that controls the CPU frequency or puts the CPU into an idle mode.
Is this error caused by a known kernel bug or missing CPU support for that feature, or might any additional kernel settings help?
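
Since both messages point at CPU frequency scaling and idle handling, it may help to record the current cpufreq and cpuidle state before a freeze. This is a rough sketch that assumes the standard Linux sysfs layout under `/sys/devices/system/cpu`; whether every attribute exists on the 3.10.40 Tegra kernel is an assumption, so missing files are reported instead of treated as errors. `show_attr` and `CPU_SYSFS_ROOT` are hypothetical names used only for this illustration.

```shell
# Print one sysfs attribute, or a placeholder if it does not exist
# (or is not readable) on this kernel.
show_attr() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: (not available)\n' "$1"
    fi
}

# CPU_SYSFS_ROOT is a hypothetical override for trying this on another tree.
cpu_root="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

for cpu in "$cpu_root"/cpu[0-9]*; do
    show_attr "$cpu/online"
    show_attr "$cpu/cpufreq/scaling_governor"
    show_attr "$cpu/cpufreq/scaling_cur_freq"
    # One directory per idle state the cpuidle driver exposes.
    for state in "$cpu"/cpuidle/state[0-9]*; do
        show_attr "$state/name"
        show_attr "$state/usage"
    done
done
```

Running this periodically (for example from a loop that appends to a file on persistent storage) would show which governor and idle states were in use shortly before a freeze.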

Many thanks for any ideas or help to solve this issue!

Hi @soff and Welcome to the Toradex Community!

Could you provide the version of the hardware of your module (including carrier board)?

What is your application?

Most of the freezes appear when a user starts to interact with the system after a longer idle period. The journal log shows the following entries right before the crash:

Are you putting the SoM to sleep mode?
How often does it happen?
Have you seen this error on one module only or also on other modules?

Both errors seem to be related to kernel functionality that controls the CPU frequency or puts the CPU into an idle mode. Is this error caused by a known kernel bug or missing CPU support for that feature, or might any additional kernel settings help?

We are not aware of such errors. Did you change the kernel config or the device tree? If yes, could you share these changes?

Thanks and best regards,
Jaski

Hi @Jaski,

Could you provide the version of the hardware of your module (including carrier board)?

The module is labeled with “Apalis TK1 2GB V1.2A”.

The carrier board is a custom board, not one of the developer boards offered by Toradex. I am only doing the software part of the device, so I cannot provide any exact details about the board design.

What is your application?

My application is a Node.js app running in an Electron framework environment. So basically the app is a local HTML/JavaScript website running in an embedded Chromium browser. The application is packaged as an executable AppImage and runs under a user account without any special additional rights (default rights, no root rights).

Are you putting the SoM to sleep mode?

The application does not activate any sleep or power save modes, but maybe the operating system does because of inactivity.

How often does it happen?
Have you seen this error on one module only or also on other modules?

Currently the app runs on 5-6 identical devices and each board has had this problem from time to time. Unfortunately the freezes appear at varying intervals. Sometimes the system freezes multiple times per day (mostly after around 10 to 15 minutes without any interaction) and on other days there is no freeze at all. Moreover, the different devices behave differently. Some crash more often than others, but this changes over time.

It may be important to highlight that the complete system freezes and not only the running app. All processes, such as SSH connections, X11, …, are frozen. So I think this is not a problem of the application, because in normal scenarios it should not be possible to crash the whole system with an app running with restricted user rights.

Did you change the kernel config or the device tree?

No. Initially the device was just set up with the Toradex Easy Installer by installing the BSP 2.8.7 image.
Currently the only change in the kernel configuration is the additional nohz=off parameter. But this was just an attempt to fix the problem, and since adding this parameter the last messages in the journal before a freeze have changed.

Many thanks for any help.

Greetings,
Sven

Hi Sven

The carrier board is a custom board, not one of the developer boards offered by Toradex. I am only doing the software part of the device, so I cannot provide any exact details about the board design.

Could you test this using one of the Toradex boards, such as the Ixora 1.2?

Regarding the crash, could you try to capture the log before the crash happens and provide the output of ‘journalctl -b -1’ after the crash?
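
Note that ‘journalctl -b -1’ can only show the previous boot if the journal is stored persistently. A minimal sketch of how this could be checked, assuming journald runs with its default `Storage=auto` setting (in which case logs persist only when `/var/log/journal` exists); `journal_is_persistent` is a hypothetical helper name, and its optional argument is there only so the check can be tried on another directory:

```shell
# Hypothetical helper: succeeds if the persistent journal directory exists.
# The first argument (optional) overrides the directory that is checked.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

if journal_is_persistent && command -v journalctl >/dev/null 2>&1; then
    # List the boots journald knows about, then the tail of the previous one.
    journalctl --list-boots --no-pager
    journalctl -b -1 --no-pager | tail -n 200
else
    echo "No persistent journal found (or journalctl is not installed)."
    echo "As root: mkdir -p /var/log/journal, then restart systemd-journald,"
    echo "reproduce the freeze and run 'journalctl -b -1' after the reboot."
fi
```

The last 200 lines of the previous boot are usually enough to see what the kernel logged right before the freeze; if the journal was volatile, those lines are lost at reboot.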

Thanks and best regards,
Jaski