Hi
We are using the Colibri iMX8X with custom images based on TorizonCore 5.1.0+build.1, 5.3.0+build.7 and 5.5.0+build.11 and Docker containers. We build 2 products based on this platform:
1x Controller with display
1x Controller without display
Both products run the same software with a different configuration. Our software communicates with our backend: it receives commands from it and sends states to it. For the controller with display, we implemented a simple browser without its own controls (hardened kiosk mode) that shows our UI as a web page, which is served by our software.
Now we sometimes have the problem that the whole device freezes. We first noticed it on the device with display, because no interaction was possible anymore, and we thought that we might have a problem with the display. But then we saw the same behaviour on devices without a display.
Behaviour is:
No UI interaction possible (no reaction on touch screen)
When the device is in the hung state and you are connected over local SSH, you should be able to use the systemd journaling commands to view the logs of various services.
@alex.tx’s suggestion of using a serial console is advisable as well in case there is a kernel oops or something on the console. Of course that means that you need to be connected to the console before the issue happens.
Additionally, the output of sudo dmesg might be instructive.
The device is still in the frozen state. I attached the output of the command “sudo dmesg”: dmesg_output_10.5.2023.txt (40.3 KB)
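As a rough aid for reading a long kernel log, here is a minimal Python sketch that keeps only the lines mentioning input-related keywords. It is meant to be run on a workstation against a saved log file (for example, the output of sudo dmesg redirected to a text file). The keyword list and the function name filter_log_lines are assumptions for illustration, not something taken from TorizonCore; adjust the keywords to the touch controller actually fitted.

```python
import re
import sys

# Assumed keywords (not from the thread); adapt them to the actual touch controller.
DEFAULT_KEYWORDS = ("touch", "input", "i2c")


def filter_log_lines(text, keywords=DEFAULT_KEYWORDS):
    """Return the lines of `text` that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


# Only runs when a log file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for line in filter_log_lines(log_file.read()):
            print(line)
```

Usage would be something like python3 filter_log.py saved_dmesg.txt, where both file names are placeholders. A plain keyword filter can miss relevant messages, so the full log is still worth reading around any hit.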
On this device we have the following software setup:
1 container with the business logic software that provides the UI as simple HTML (available on http://localhost:9090)
1 container based on Weston that implements a simple browser in kiosk mode, which shows the UI provided by the first container
We connected a mouse, and with it the website is basically usable and seems to work. With the touchscreen, no navigation is possible. We have no errors or warnings in our logs, and it seems that there is a problem with the touch input device.
What else can we do? Where can we find system log files (maybe some driver log files)?
I don’t know how to reproduce it. It appears from time to time on devices that are in production use (customer installations). On our test installation in the office, we had to wait 2 weeks until the problem appeared again.
We can give it a try and create a new image based on the latest LTS version, but we need time for that.
It occurs some time after a device boot; it’s hard to tell under which conditions.
Yes, usually the devices are working as expected. We delivered our first device 2 years ago, and so far we have delivered about 35 devices to our customers.
Please find the output of the commands attached.
One more thing. Today we found out that we have 2 issues and not 1. The issue with the controller with display is a different issue than the one we have with the controller without display.
Issue 1, with display: Our application is working and we have a connection (to our backend and to Toradex OTA), and we can start an SSH connection. It seems that this is an issue with the display / display driver.
Issue 2, without display: There is no connection (to our backend or to the Toradex OTA system), we don’t know the state, and the device has to be restarted. This happens on all our devices with an image based on TorizonCore 5.5; here we want to downgrade to 5.3, which in our experience works.
I think it’s better if I open a new thread for issue 2. dockerps.txt (393 Bytes) docker-stats (586 Bytes) free-h.txt (207 Bytes) tdx-info.txt (27.6 KB)
Since you found out that the two issues are different, could you please create another thread here in Community? So we can focus on one issue here and on the other issue on the new thread?
We are aware of this, and it has been discussed with Toradex technical sales.
However, after a reboot of the system, the touch input works again as desired, which supports our assumption that it is rather a problem at the software level (outside Docker).
Therefore it would be helpful if there were a way to check whether something has crashed at the operating system level, e.g. the touch driver.
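One possible check, sketched here in Python under stated assumptions: the Linux kernel lists registered input devices in /proc/bus/input/devices, so comparing that file on a healthy device and on a frozen one shows whether the touch device is still registered. The helper names parse_input_devices and find_touch_devices and the keyword "touch" are hypothetical choices for illustration. The sketch expects a saved copy of the file and is meant to be run on a workstation.

```python
import sys


def parse_input_devices(text):
    """Parse the content of /proc/bus/input/devices into a list of dicts.

    Each dict holds the device name and its event handlers.
    Device blocks in that file are separated by blank lines.
    """
    devices = []
    for block in text.strip().split("\n\n"):
        name, handlers = None, []
        for line in block.splitlines():
            if line.startswith("N: Name="):
                name = line[len("N: Name="):].strip().strip('"')
            elif line.startswith("H: Handlers="):
                handlers = line[len("H: Handlers="):].split()
        if name is not None:
            devices.append({"name": name, "handlers": handlers})
    return devices


def find_touch_devices(devices, keyword="touch"):
    """Return devices whose name contains `keyword` (assumed naming, case-insensitive)."""
    return [d for d in devices if keyword in d["name"].lower()]


# Only runs when a saved copy of the file is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as saved:
        found = find_touch_devices(parse_input_devices(saved.read()))
    for device in found:
        print(device["name"], device["handlers"])
    if not found:
        print("no input device with 'touch' in its name is registered")
```

This only shows whether the device is still registered with the kernel. A driver that stays registered but stops delivering events would not be caught this way, so the kernel log remains the more informative source in that case.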
So, as you said, let’s make this thread about the controller with the display.
And you can open another thread about the controller without display, ok?
Could you please share with us a minimal way of reproducing the issue you are facing?
Hi @rudhi.tx
Last week we had a talk with Michael and Drew and we agreed on this approach:
We will try to set up a new image based on the latest 5.7.2 LTS TorizonCore image (and maybe on the latest 6).
Then we will test the image for about 1-2 weeks to see whether the problem is gone. We will come back with the results then.
As soon as we know how to reproduce it, we will inform you immediately. At the moment we are not able to reproduce the issue (or rather, we always have to wait about 2 weeks until it occurs).