Weston crashes on IMX8mp verdin + Dahlia with HDMI quick plugin/plugout sequence

Hi,

We are running into some issues with HDMI output on our product, which is built around the Verdin iMX8M Plus. When we quickly and repeatedly connect and disconnect the HDMI cable, Weston or our application crashes.

I was able to reproduce the issue on a Toradex Verdin + Dahlia setup, using image BSP 6 Downstream Linux Reference Multimedia, version 6.6.0+build.12 from the Toradex download site.

This image supports HDMI output and runs the QT5 cinematicexperience demo (using service wayland-app-launch).

The issue is that when the HDMI cable is quickly and repeatedly plugged and unplugged, the demo application crashes with a segfault. The cause seems to lie somewhere between Weston, Wayland and the application.

Way to reproduce:

  • Install the BSP 6 image on the Verdin/Dahlia

  • Boot the board and log in

  • Adapt /lib/systemd/system/wayland-app-launch.service so that the service does not restart, which makes the failure more visible. I commented out the following lines:
    #Restart=on-failure
    #RestartSec=1

  • Reboot (or daemon-reload and restart wayland-app-launch to let the change take effect)

  • Follow the journal of the wayland-app-launch service:
    journalctl -fu wayland-app-launch

  • plug/unplug the HDMI cable quickly / frequently (I tested it both with actual unplugging and plugging, but also with an HDMI switch that allows connecting/disconnecting the HDMI with a button).

  • After some plugin/plugout events, the application crashes and the journal shows the segfault.

root@verdin-imx8mp-06848993:~# journalctl -fu wayland-app-launch
Apr 28 17:56:44 verdin-imx8mp-06848993 systemd[1]: Started Start a wayland application.
Apr 28 17:56:55 verdin-imx8mp-06848993 systemd[1]: wayland-app-launch.service: Main process exited, code=killed, status=11/SEGV
Apr 28 17:56:55 verdin-imx8mp-06848993 systemd[1]: wayland-app-launch.service: Failed with result 'signal'.

  • In some cases when I reproduce this, Weston keeps running while the application crashes (i.e. the Weston compositor is still shown on the HDMI display). In other cases, Weston also crashes with a segfault.
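
As an alternative to editing the packaged unit file in place, the restart behaviour could also be switched off with a systemd drop-in. The following is only a sketch, assuming the usual drop-in layout under /etc/systemd/system; the function name write_no_restart_dropin and the file name no-restart.conf are made up for illustration.

```shell
# Sketch: disable automatic restart of a service through a systemd drop-in
# instead of commenting out lines in the packaged unit file.
# write_no_restart_dropin and no-restart.conf are illustrative names.
write_no_restart_dropin() {
    unit="$1"                          # e.g. wayland-app-launch.service
    root="${2:-/etc/systemd/system}"   # drop-in root, overridable for a dry run
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Restart=no overrides the Restart=on-failure setting of the original unit.
    printf '[Service]\nRestart=no\n' > "$dir/no-restart.conf"
    echo "$dir/no-restart.conf"
}

# On the target (as root), followed by a reload so the change takes effect:
#   write_no_restart_dropin wayland-app-launch.service
#   systemctl daemon-reload
#   systemctl restart wayland-app-launch
```

Removing the drop-in file and reloading again restores the original restart behaviour.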

As reference, the output of tdx-info:
Software summary

Bootloader: U-Boot
Kernel version: 5.15.148-6.6.0-6.6.0+git.23a8e831749d #1 SMP PREEMPT Thu Feb 29 20:25:21 UTC 2024
Kernel command line: root=PARTUUID=0da7f2ce-02 ro rootwait console=tty1 console=ttymxc2,115200 consoleblank=0 earlycon
Distro name: NAME="TDX Wayland with XWayland"
Distro version: VERSION_ID=6.6.0-build.12
Distro variant: -
Hostname: verdin-imx8mp-06848993

Hardware info

HW model: Toradex Verdin iMX8M Plus WB on Verdin Development Board
Toradex version: 0058 V1.0B
Serial number: 06848993
Processor arch: aarch64

Is this issue known to you? I already took a look at the i.MX version of the Weston compositor. It seems that NXP has patched a lot in it compared to mainline Weston (e.g. to get g2d support in).

Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!


Hello,

The issue you’re encountering with HDMI connectivity on the Verdin iMX8M Plus module, particularly with the rapid connect/disconnect sequence causing crashes in Weston or the application, is not uncommon in scenarios involving dynamic video output changes. This behavior can stem from a variety of factors, including how the underlying graphics stack (Weston/Wayland) handles hot-plugging events and the resilience of the application to display server restarts or interruptions.

Given the detailed context you’ve provided, including the use of BSP 6 and the specific failure mode, there are a few steps and considerations that might help in diagnosing and potentially mitigating the issue:

  1. Weston and Wayland Debugging: Increase the verbosity of Weston and Wayland logging to capture more detailed information about what happens internally during the plug/unplug events. This can sometimes highlight specific operations or sequences that lead to the crash.

  2. Graphics Stack and Driver Stability: Ensure that the graphics drivers and the Weston compositor are up to date with the latest patches from NXP, as you’ve noted there are significant modifications to support specific hardware features like g2d. Sometimes, the issue might have been addressed in a more recent patch or version.

  3. Workaround Implementation: If the issue is isolated to rapid plug/unplug scenarios, implementing a debounce mechanism in the application or modifying the system’s response to HDMI connect/disconnect events could provide a more stable experience. This might involve ignoring rapid changes within a certain timeframe.

  4. Application Resilience: Enhancing the application’s resilience to display server restarts or crashes can also be beneficial. This could involve catching signals for unexpected terminations and gracefully handling reinitialization of the graphical context.

  5. Consult NXP Resources and Community: Given the modifications to Weston by NXP, exploring NXP forums or resources might reveal similar issues encountered by others and potential patches or workarounds specific to the iMX8M Plus.

  6. Kernel and Driver Logs: Monitor kernel and driver logs (dmesg) for any errors or warnings that coincide with the HDMI plug/unplug events. This can sometimes reveal lower-level issues with the HDMI driver or hardware interface that manifest as application or compositor crashes.

  7. Hardware Test: Although less likely, it’s worth considering if there’s any hardware sensitivity with the HDMI interface on the Verdin Development Board, especially under rapid connect/disconnect scenarios. Testing with another board or HDMI interface could help rule out hardware-specific issues.
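
To illustrate the debounce idea from point 3, here is a minimal sketch of the filtering logic only. It does not hook into Weston or the kernel. It assumes a recorded list of events, one per line, in the made-up format "timestamp_ms state", and the function name debounce_events is hypothetical.

```shell
# Sketch: drop connector states that did not last at least HOLD_MS.
# Reads "timestamp_ms state" lines on stdin (an assumed, illustrative format)
# and prints only the states that were stable long enough.
debounce_events() {
    awk -v hold="$1" '
        # Print a state once, and only if it differs from the last reported one.
        function report(t, s) { if (s != last) { print t, s; last = s } }
        # The previous state is accepted if it lasted at least "hold" ms.
        NR > 1 && $1 - ts >= hold { report(ts, st) }
        { ts = $1; st = $2 }
        # The final state has no successor, so it is taken as settled.
        END { if (NR > 0) report(ts, st) }
    '
}
```

With a hold time of 500 ms, a burst such as on/off/on within 150 ms collapses into a single "on", which is the kind of behaviour a twitchy HPD line would need.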

Addressing such issues often requires

Please, also check if these links can help you solve your problem:

2D/3D Graphics Acceleration (GPU) | Toradex Developer Center.

Hello @mathijs,

Thanks for the description of the problem.

Just to confirm, are you using the native HDMI output or the DSI to HDMI adapter?

Does your application require support for frequent and repeated HDMI hotplug?
If so, could you explain your use case?

I will try to reproduce the problem here as well.

Best Regards,
Bruno

Hi,

I am using the native HDMI output on the Dahlia board.

It is not that we need this as a feature, the typical use case is that the HDMI will be plugged in and then the display should work, and after some time the HDMI will be disconnected again by the user.

However, it is a fault condition that could happen in the field due to factors outside of our control: a bad HDMI cable, a misbehaving display or another HDMI peripheral such as a mux could cause the HPD signal to become twitchy and trigger these events within a short amount of time, making Weston or the application crash. It is a requirement that the application does not crash or restart.

Kind regards,

Mathijs

Hello @mathijs,

Thanks for your explanation.
I understand your concern and this is a valid problem.

Trying to reproduce the issue, I saw two failure scenarios for the demo application on the Reference Multimedia Image 6.6.0:

Random failure after unplugging and plugging the native HDMI interface:

Jun 04 16:11:58 verdin-imx8mp-06849027 systemd[1]: wayland-app-launch.service: Main process exited, code=exited, status=1/FAILURE
Jun 04 16:11:58 verdin-imx8mp-06849027 systemd[1]: wayland-app-launch.service: Failed with result 'exit-code'.

The above failure can happen randomly; there is no clear pattern to it.

Failure caused by memory leak in demo application:

Jun 04 16:20:56 verdin-imx8mp-06849027 systemd[1]: wayland-app-launch.service: Main process exited, code=killed, status=11/SEGV
Jun 04 16:20:56 verdin-imx8mp-06849027 systemd[1]: wayland-app-launch.service: Failed with result 'signal'.

This error happens after around 15 reconnections of the HDMI cable, with some variation.
Your problem seems to be the same as this one.

I call the failure above a memory leak because you can see the memory usage of the application increase after each reconnection:

The first problem is likely linked to the second one, as the memory management of the application is clearly done improperly in some way.
To be sure, testing another graphical framework would be advisable.


This issue could be the consequence of either a problem with the application itself or the Qt platform plugin being used, as Weston itself never crashed or had significant changes to its memory usage during my testing.
Therefore, this problem does not seem to be an OS-level issue.
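
The memory growth mentioned above could be logged with something like the following sketch, which reads VmRSS from /proc/<pid>/status at a fixed interval. The function names rss_kb and log_rss are illustrative and not part of any Toradex tooling.

```shell
# Print the resident set size (VmRSS, in kB) from a /proc/<pid>/status file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Log "unix_time rss_kb" for PID every INTERVAL seconds, COUNT times,
# stopping early if the process disappears (e.g. after a crash).
log_rss() {
    pid="$1"; interval="${2:-5}"; count="${3:-10}"
    i=0
    while [ "$i" -lt "$count" ] && [ -r "/proc/$pid/status" ]; do
        echo "$(date +%s) $(rss_kb "/proc/$pid/status")"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example on the target, while reconnecting the HDMI cable:
#   log_rss "$(systemctl show -p MainPID --value wayland-app-launch)" 5 60
```

A steadily rising second column across reconnections would match the leak described here; the same call with the Weston PID allows a comparison with the compositor.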

What framework do you intend to use with your application?
If you plan to use Qt, which version of Qt?

Best Regards,
Bruno

Hi,

Thank you for your analysis. It indeed seems that there is some issue with the example application. However, I did see Weston crash on certain occasions. I will look into whether and how I can reproduce that.

We have a QT 5.15 application that we are working on, which provides the UI for our device.

About the other graphical environments: over the past year we have identified and tried 4 solution directions:

  1. Weston.imx compositor (advised by NXP)
  2. WLroots compositor (both a custom one and open source ones like tinywl and sway)
  3. QT application directly with EGLFS
  4. QTWayland compositor (Qt Wayland Compositor 5.15.17)

With option 2 we can get fairly stable behavior with respect to HDMI hotplugging. However, Wayland-EGL does not seem to work or makes wlroots crash, and we see some weird artifacts. This seems in line with what NXP reports: they don’t support wlroots and advise using their patched Weston instead (https://community.nxp.com/t5/i-MX-Processors/GLFW-Vulkan-applications-crash-on-imx8mp-with-wlroots-based-WM/td-p/1572600, not 100% the same issue, but NXP reports it does not support wlroots).

With options 3 and 4 we get robust performance. However, neither supports hotplugging (we contacted QT about it, and they referred to the open bug: Loading...).

Do you have other graphical environments that we can try?

The intent of the product we are designing is to have an LVDS display and a secondary HDMI output, so being able to hotplug HDMI is a must.

If the issue is indeed in weston.imx, I hope your contacts at NXP can look into it.

Hello @mathijs,

Thanks for the explanation.
As you are using the exact Qt version that is having the issue, I think there will be limited utility in testing other graphical frameworks.

The Weston compositor for i.MX is what is used by default on the Reference Multimedia Image, so we also recommend its usage.


I did some further testing and this seems to be somewhat related to the Qt platform plugin for Wayland, when interacting with Weston.
Other Qt applications on the Toradex Reference Multimedia Image 6.6.0 also have this memory leak.
A much smaller memory leak also seems to be present in the same scenario on BSP 5.7.6.


When Weston crashed, did you have any other displays connected?
If yes, I can try to reproduce the problem with a display connected here.
Also, in general, have you seen an application or Weston crash when another display was connected?

I don’t believe that this is a problem inherent to the Weston for i.MX, because I did additional tests using the Torizon OS Containers and the problem was not present in Qt 5 or Qt 6.
Torizon OS containers also use the Weston for i.MX as a compositor, therefore this is a configuration that can work without problems.

Just to confirm, do you see this same problem with your application?

We will look into what may be the cause for the memory leak when reconnecting the HDMI output on the Reference Multimedia Image 6.6.0. When we have more updates on this, I will send them on this thread.

Best Regards,
Bruno

Hi,
Thank you for your reply.

I did some additional tests. I used 2 types of hardware setups:
setup 1: With only the native HDMI present. So no DSI to HDMI or DSI to LVDS board present, and the overlays for these boards were also removed from overlays.txt.
setup 2: With the native HDMI present + the Toradex DSI to LVDS board + Toradex 10.1 inch capacitive touch LVDS display attached (in this case, the needed devicetree overlay was added to overlays.txt again).

The issue is hard to reproduce, but seems most present when the display is updating. To exclude the QT issue, I disabled the cinematic demo by disabling its service and I tested with 2 other applications that are not QT based:

  • weston-presentation-shm
  • weston-terminal

The weston-presentation-shm was only usable with hardware setup 1, because in setup 2 it is spawned on the LVDS screen. The weston-terminal is movable, so I moved it with the touchscreen to the HDMI output. To get the terminal to actively update the screen, I let it execute this command, which prints a constant stream of numbers:
od -An -i /dev/urandom

It seems that by just unplugging/ plugging in the HDMI cable I can’t reproduce the Weston segfault. It seems I can’t physically unplug / plug in the connector fast enough (around 1 cycle per 2 seconds) to be able to trigger this segfault.

One way to toggle it faster is to use an HDMI multiplexer that, at the press of a button, switches the HDMI + HPD lines of the monitor between the Dahlia/Verdin and some other source. With this, I can simulate a plug/unplug sequence much faster (around 1-1.5 cycles per second).

I also checked whether there is some other way to trigger it, as a tool such as an HDMI multiplexer may not be available on your side. I found that the unplug/plug can be simulated by echoing “on” or “off” to the DRM sysfs entry:
/sys/class/drm/card0-HDMI-A-1/status (when using setup 1)
/sys/class/drm/card1-HDMI-A-1/status (when using setup 2)
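
Since the card number differs between the two setups, the status file could be looked up instead of hard-coded. This is a small sketch that assumes the connector naming seen above (card*-HDMI-A-*); the function name find_hdmi_status is illustrative.

```shell
# List the status files of all HDMI connectors below a DRM sysfs directory.
# The directory defaults to /sys/class/drm and can be overridden for a dry run.
find_hdmi_status() {
    drm_root="${1:-/sys/class/drm}"
    for f in "$drm_root"/card*-HDMI-A-*/status; do
        # An unmatched glob stays literal, so only print paths that exist.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Example on the target:
#   cat "$(find_hdmi_status | head -n 1)"
```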

I created a small bash script that repeats this a number of times in sequence.

#!/bin/bash
counter=1
 
while [ $counter -le 100 ]
do
    counter=$((counter + 1))
    echo off > /sys/class/drm/card1-HDMI-A-1/status
    sleep 0.5;
    echo on > /sys/class/drm/card1-HDMI-A-1/status
    sleep 0.5;
done
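
A slightly more general variant of the loop above, as a sketch: the status file, cycle count and delay are parameters, and an optional check command is run after every cycle so the loop stops as soon as the compositor is gone. The function name toggle_connector is made up; the example check relies on systemctl is-active.

```shell
# toggle_connector STATUS_FILE CYCLES DELAY [CHECK_CMD...]
# Writes off/on to STATUS_FILE for CYCLES cycles, sleeping DELAY seconds
# after each write. If CHECK_CMD is given, it runs after each cycle and the
# loop stops with status 1 at the first cycle where it fails.
toggle_connector() {
    status_file="$1"; cycles="$2"; delay="$3"
    shift 3
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo off > "$status_file"
        sleep "$delay"
        echo on > "$status_file"
        sleep "$delay"
        if [ "$#" -gt 0 ] && ! "$@"; then
            echo "check failed after cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $cycles cycles"
}

# Example on the target (setup 2), stopping once weston.service is not active:
#   toggle_connector /sys/class/drm/card1-HDMI-A-1/status 100 0.5 \
#       systemctl is-active --quiet weston
```

The reported cycle number gives a rough measure of how many reconnections it takes before the crash.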

With this sequence or with the HDMI multiplexer, I can in some cases get Weston to crash, the journal then states:

root@verdin-imx8mp-06848993:~# journalctl -u weston
Apr 28 20:56:46 verdin-imx8mp-06848993 systemd[1]: Starting Weston, a Wayland compositor, as a system service...
Apr 28 20:56:47 verdin-imx8mp-06848993 systemd[1]: Started Weston, a Wayland compositor, as a system service.
Apr 28 21:04:03 verdin-imx8mp-06848993 systemd[1]: weston.service: Main process exited, code=killed, status=11/SEGV
Apr 28 21:04:03 verdin-imx8mp-06848993 systemd[1]: weston.service: Failed with result 'signal'.

Of course, as indicated, this is not normal behaviour. However, it could be that an attached HDMI device triggers this unintended behaviour.

On our own hardware (which uses native LVDS and native HDMI), I originally found this issue, both with our QT application running and with the cinematic demo running, but also with other non QT based applications such as the Weston example applications. To exclude any problems with our software environment or custom hardware as root cause, I then moved to the Dahlia board with Toradex reference image.

Hope this information helps. If I can provide further details/ logs, please let me know.

Kind regards,

Mathijs

Hello @mathijs,

Thanks for the detailed explanation.

I could only reproduce the issue with Weston in the following scenario:

  • Native HDMI output only
  • Qt Cinematic Demo Running
  • Run the script to repeatedly “reconnect” the HDMI

If I use the native LVDS together with the native HDMI, Weston does appear to stop working, but it does not exit.
The same can be observed when not using the Qt Cinematic Demo, with or without a native LVDS output.

The overall memory usage also increases slowly when running just Weston, so it may be an indication of what is wrong with Weston.

I have escalated this issue, if more information is needed we will ask again on this thread.

Best Regards,
Bruno

Hello @mathijs,

We have found some more ways to reproduce the problem but we have not found a workaround yet.
When there are further updates I will send them here.

Best Regards,
Bruno

Hello @mathijs,

This issue is now being investigated by our BSP team.
A fix would likely come in a subsequent patch release of BSP 6.7 as we are very close to the release of BSP 6.7.0.

When there are further updates I will send them here.

Best Regards,
Bruno Mello

Thanks for the update!

If the BSP team has a solution with which we can help test, please let us know!

Hello @mathijs,

I can inform you that the BSP team is actively working on this topic with high priority. We will update you as soon as we have some news. Thanks in advance for your patience!

Hi, is there any news from the BSP team?

Hello @mathijs,

We are still waiting for the BSP team to review and merge the solution. It should happen in a couple of days. Thanks for your patience.

Hi, any news on this issue? Thanks in advance!

Hey @mathijs,

Sorry for the delay! The team found some issues on the merge requests and they are reviewing them. This is why it is taking a little longer. We are trying to speed up the process. I will update you as soon as this is done.

Hi @rudhi.tx, Is there any update on the progress?
kind regards,
Mathijs

Hello @mathijs,

I am very sorry that this is taking longer than usual.
The current status is that we are still working with high priority on fixing a conflict we have in the merge request. We had a working solution but we discovered a regression while testing it. So far we weren’t able to figure out a different path that both solves the issue and doesn’t bring the regression back.