Periodic reboots of my Toradex - How to debug?

masterfoo · May 9, 2025, 8:44am

Hello,

I have a Colibri IMX8DX and I use the Linux image torizon-core-docker-colibri-imx8x-Tezi_6.6.1+build.14.tar with torizoncore-builder.

My system reboots after a few hours:

torizon@colibri-imx8x-14726876:~$ journalctl --list-boots
 -8 b367d3bd0e6b40b3a511cff30375a589 Mon 2025-05-05 10:21:03 UTC—Mon 2025-05-05 13:00:27 UTC
 -7 1a7efccf862f454bbbb6b07f78bb8bb0 Mon 2025-05-05 13:01:16 UTC—Tue 2025-05-06 02:02:48 UTC
 -6 dcdb36422f364226bf04a1ac9394e967 Tue 2025-05-06 02:03:16 UTC—Tue 2025-05-06 15:00:51 UTC
 -5 4785fab2dc734fe4b53ad6b83dff2e96 Tue 2025-05-06 15:01:18 UTC—Wed 2025-05-07 03:52:33 UTC
 -4 4646fe8975484f769d71813f98a46bdc Wed 2025-05-07 03:52:59 UTC—Wed 2025-05-07 16:46:39 UTC
 -3 fd3a114ec0294543a7418a262826ea70 Wed 2025-05-07 16:47:25 UTC—Thu 2025-05-08 05:38:14 UTC
 -2 ab95a844d76247cd869bb9c63223f038 Thu 2025-05-08 05:38:33 UTC—Thu 2025-05-08 18:38:01 UTC
 -1 5c65443adb5a4080a4d58dfe186fdd77 Thu 2025-05-08 18:38:45 UTC—Thu 2025-05-08 23:33:14 UTC
  0 a6f1cf13274649abaeabaebd59be8d73 Thu 2025-05-08 23:33:28 UTC—Fri 2025-05-09 07:10:48 UTC

Thank you,
Greg

ToradexAI · May 9, 2025, 8:44am

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.

Hello Greg,

Thank you for your detailed question. Random or unexplained reboots can be challenging to diagnose. Here’s a recommended approach to make further progress:

1. Enable Core Dumps

Capturing core dumps can help identify if a specific process is crashing and triggering the reboot.

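Execute these commands to enable core dumps. Writing under /proc/sys requires root, and ulimit only affects the current shell and the processes started from it:

ulimit -c unlimited
echo 1 | sudo tee /proc/sys/kernel/core_uses_pid
echo "/tmp/core-%e-%s-%u-%g-%p-%t" | sudo tee /proc/sys/kernel/core_pattern

To enable dumps for setuid processes:

echo 2 | sudo tee /proc/sys/fs/suid_dumpable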

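Core dumps will appear in /tmp with filenames that include the process details. Note that /tmp is typically a tmpfs that is cleared on reboot, so point core_pattern at persistent storage if a dump has to survive the reboot.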

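Use gdb to analyze dumps. Torizon OS does not provide a package manager such as opkg, so install gdb inside a Debian-based container (or copy the dump to a development host):

apt-get update && apt-get install -y gdb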
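
Once gdb is available, a backtrace can be pulled from a dump non-interactively. This is only a sketch: core_backtrace is a hypothetical helper, and the binary and core file paths in the example are placeholders, not files known to exist on the device.

```shell
# Hypothetical helper: print a backtrace from a core dump without
# opening an interactive gdb session.
# $1 = path to the executable that crashed (must match the dump)
# $2 = path to the core file written by the kernel
core_backtrace() {
    binary="$1"
    core="$2"
    gdb -batch -ex 'bt' "$binary" "$core"
}

# Example (placeholder paths):
# core_backtrace /usr/bin/myapp /tmp/core-myapp-11-0-0-1234-1746700000
```

The backtrace is most useful when the binary was built with debug symbols; without them gdb will mostly show raw addresses.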

2. Check for Hardware Watchdog, Integrity Checks, and Uptime

Torizon includes integrations for hardware watchdogs and other reliability features. Ensure your device isn’t rebooting due to a watchdog timeout or integrity check failure. Review logs around each reboot for messages about “watchdog” or “integrity”.
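
As a rough sketch of that log review, the helpers below search one boot's journal for the two keywords and show the watchdog timeout systemd is configured with. The function names are made up for illustration, and the example assumes the journal of the previous boot (-1) is still available.

```shell
# Keep only lines mentioning "watchdog" or "integrity" (case-insensitive).
filter_watchdog() {
    grep -i -E 'watchdog|integrity'
}

# Search the journal of one boot; $1 is a boot offset such as -1.
check_watchdog_logs() {
    boot="${1:--1}"
    journalctl --boot="$boot" --no-pager | filter_watchdog
}

# Show the runtime watchdog timeout systemd uses (0 means disabled).
show_watchdog_config() {
    systemctl show --property=RuntimeWatchdogUSec
}

# Example:
# check_watchdog_logs -1
# show_watchdog_config
```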

More on Torizon’s reliability features:
https://www.toradex.com/torizon

3. Inspect Peripheral and Power Supply

The most common causes for unexplained resets on embedded systems (besides software faults) include:

Power supply interruptions or instability
Overheating (ensure there are no silent thermal events)
Hardware watchdogs getting triggered due to application hangs
Kernel panics or OOM (Out-Of-Memory) events

Although you’ve already searched for most keywords, check for subtle clues in dmesg and journalctl (messages around the last timestamps before reboot).
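
One way to look at the messages just before each reboot is to print the tail of every recorded boot. The sketch below assumes a persistent journal (which the --list-boots output in the question suggests); boot_offsets and dump_boot_tails are hypothetical helper names, and the 50-line default is arbitrary.

```shell
# Extract the boot offsets (first column) from `journalctl --list-boots`
# output, skipping any header line.
boot_offsets() {
    awk '$1 ~ /^-?[0-9]+$/ { print $1 }'
}

# Print the last N journal lines of every recorded boot.
# $1 = number of lines per boot (default 50)
dump_boot_tails() {
    lines="${1:-50}"
    journalctl --list-boots --no-pager | boot_offsets | while read -r off; do
        echo "=== boot $off ==="
        journalctl --boot="$off" -n "$lines" --no-pager
    done
}

# Example:
# dump_boot_tails 100
# Kernel messages only, previous boot: journalctl -k --boot=-1 --no-pager
```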

4. Instrument and Monitor your Application and System

Use process supervision (systemd) to automatically restart crashed apps and log reasons.
Add custom logs at application start/stop for more visibility.
Schedule periodic status or resource reports (CPU/mem usage).
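
A periodic resource report could look roughly like the sketch below, which writes the 1-minute load average and available memory to the journal. The helper names and the "sysreport" tag are invented for this example, and it assumes the logger utility is present on the image.

```shell
# Build one report line.
# $1 = contents of /proc/loadavg, $2 = MemAvailable in kB
format_report() {
    loadavg_line="$1"
    mem_kb="$2"
    load1="${loadavg_line%% *}"   # first field: 1-minute load average
    printf 'load1=%s mem_available_kB=%s\n' "$load1" "$mem_kb"
}

# Sample the system once and send the line to the system log.
log_status() {
    mem_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    format_report "$(cat /proc/loadavg)" "$mem_kb" | logger -t sysreport
}

# Example: report once a minute until interrupted.
# while true; do log_status; sleep 60; done
```

With a persistent journal, journalctl -t sysreport --boot=-1 would then show how load and memory developed before the last reboot.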

5. Enable Persistent Journals

Ensure you’re not losing logs at reboot:

sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald

Then future logs will persist across reboots.

By enabling core dumps and reviewing the logs immediately before each reboot, you should get closer to identifying whether it’s a software, kernel, or external issue. If you collect a core dump or a suspicious kernel/syslog message, you can analyze with gdb and further narrow down the cause.

Best regards,
Toradex AI Assistant.

williamclark · May 9, 2025, 12:03pm

It looks like your Colibri IMX8DX is rebooting unexpectedly without leaving clear logs, which makes it hard to say concretely what the issue is. Since you’ve already searched for terms like “thermal”, “watchdog”, or “fatal” with no luck, it may be worth looking into hardware-related stability issues or low-level power management. First, I suggest looking for hardware-related errors or warnings in less common log sources (e.g. dmesg), which may contain clues (power? thermal?) that journalctl did not capture. Another approach is to raise the kernel log level or enable more verbose kernel logging. You could also try disabling power-saving features (e.g. dynamic CPU frequency scaling, deep sleep states), as these can trigger instability. If the reboots persist, make sure the power supply is stable, as this is a common cause of unexpected reboots in embedded designs. Finally, check whether the firmware is up to date, as some of these issues may have been resolved in later releases.
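
To rule out dynamic frequency scaling, the cpufreq governor can be pinned through sysfs. This is a sketch only: set_governor is a hypothetical helper, it assumes the kernel exposes the standard cpufreq sysfs files and offers the "performance" governor, and writing them on a real device requires root.

```shell
# Write the given governor to every CPU's scaling_governor file.
# $1 = governor name, e.g. performance
# $2 = CPU sysfs root (defaults to the standard location)
set_governor() {
    governor="$1"
    cpu_root="${2:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue   # skip missing or read-only entries
        echo "$governor" > "$f"
    done
}

# Example (as root):
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# set_governor performance
```

The setting does not survive a reboot, so it would have to be reapplied after each boot for the duration of the test.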

jeremias.tx · May 9, 2025, 8:23pm

Greetings @williamclark,

Without logs or a hint of some kind it is quite difficult to think of a cause.

Do you have only the one Colibri i.MX8DX? If you have several, are they all rebooting randomly, or just a specific one?

Software-wise, have you tried running something other than version 6.6.1? Does the OS that is running seem to make a difference regarding the periodic reboots?

It would help to try and narrow down whether the issue seems to be more software or hardware related if possible.

Best Regards,
Jeremias

masterfoo · May 13, 2025, 8:58am

Thank you for your reply.
The problem is visible on several boards, so it’s clearly a SW issue.
I have 3 Docker containers running. Suddenly the load on the system increases, and after a few minutes the system hangs and eventually reboots. I have no idea why it’s stable for hours and then the load increases. CPU usage is not increasing; it’s always around 50%, so the load is more likely linked to processes waiting on I/O. This is confirmed by the number of processes in state “D”, which increases at the same pace as the system load.
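
To see which processes are stuck in state “D” and what they are waiting on, something like the following sketch could be run while the load is climbing. list_dstate is a hypothetical helper that reads /proc directly; reading the wchan of other users' processes generally needs root, in which case a "?" is printed instead.

```shell
# List processes in uninterruptible sleep (state D) as
# "PID NAME WCHAN", where WCHAN is the kernel function being waited in.
# $1 = proc filesystem root (defaults to /proc)
list_dstate() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        name=$(awk '/^Name:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s %s %s\n' "${dir##*/}" "$name" "${wchan:-?}"
    done
}

# Example: sample every 10 seconds while the load increases.
# while true; do date; list_dstate; sleep 10; done
```

If the same kernel function keeps showing up in the last column, that points at the device or filesystem the processes are blocked on.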

jeremias.tx · May 13, 2025, 9:18pm

I have 3 Docker containers running. Suddenly the load on the system increases, and after a few minutes the system hangs and eventually reboots. I have no idea why it’s stable for hours and then the load increases.

Okay then, so the reboots are due to these containers that suddenly consume too much system resources, correct?

That is to say if these containers are not running then the system is fine, correct?

It’s hard to say what is happening in your containers without further knowledge. Though since your system is rebooting I imagine it’s hard for you to examine these containers afterwards.

Perhaps try this. Docker has settings to limit the amount of resources containers can use; see “Resource constraints” in the Docker Docs.

Maybe try constraining the resources so the containers don’t consume all the system resources.
This could prevent a system reboot and allow you to examine the containers when they start increasing in load. I’m not sure what will happen to the containers when they use all the resources allotted to them. Perhaps they’ll stop or crash, but hopefully it won’t affect the whole system since they have been constrained.
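
As an illustration of such constraints, the sketch below applies a memory and CPU cap to an already running container with docker update. apply_limits is a hypothetical helper, and the container name and limit values in the example are placeholders that would need tuning for the actual workload.

```shell
# Cap memory and CPU of a running container.
# $1 = container name or ID
# $2 = memory limit (e.g. 256m); swap is capped to the same value
# $3 = number of CPUs the container may use (e.g. 0.5)
apply_limits() {
    container="$1"
    mem="$2"
    cpus="$3"
    docker update --memory "$mem" --memory-swap "$mem" --cpus "$cpus" "$container"
}

# Example (placeholder name and values):
# apply_limits my-container 256m 0.5
```

The same --memory and --cpus options can be passed to docker run when the container is created. Since the symptoms here point to I/O wait rather than CPU, the block I/O options that docker run documents (such as --blkio-weight or --device-write-bps) may be worth testing as well, although whether they take effect depends on the kernel and cgroup setup.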

Best Regards,
Jeremias