The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning: Unsupported maximum keycode 569, clipping
> X11 cannot support keycodes above 255
Errors from xkbcomp are not fatal to the X server
(EE)
(EE) Backtrace:
(EE)
(EE) Segmentation fault at address 0x40
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning: Unsupported maximum keycode 569, clipping
> X11 cannot support keycodes above 255
Errors from xkbcomp are not fatal to the X server
weston: …/…/cairo-1.16.0/src/cairo-xcb-screen.c:219: _get_screen_index: Assertion `!"reached"' failed.
(EE) failed to read Wayland events: Broken pipe
The ticket #1035 you referenced eventually seems to be resolved with a couple of patches. See here for details. Can you try to apply those in your local Yocto build and see if it resolves the issue?
I can, but if you look at the actual error for #1035, “Fatal server error: (EE) [destroyed object]: error 7: importing the supplied dmabufs failed”, it’s nothing resembling the assertion I’m seeing.
The exact reason why #407 references #1035 is unclear. It only says: “This might be linked to xorg/xserver#1035 (closed) which happens in the exact same environment, except without using `DRI_PRIME=1`”
I can try the patches but it seems quite a stretch to me. I’m not even sure how cleanly the patches will apply.
Creating a smaller test case is another option, but it comes down to which approach takes the least effort.
I have a follow-on question, plus one that is still unanswered from the first post.
I suspect the issue will not occur with X.org. Does Toradex have any pre-canned bbappend files or instructions on how to disable Wayland in BSP5 and go back to X.org? Obviously I can figure it out myself.
I would still like to know the best method of enabling debug logging for XWayland. Per the bugs referenced in the first post, it is supposedly achieved by starting XWayland with WAYLAND_DEBUG set, but I have not been able to pass this through from Weston.
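One approach that might get WAYLAND_DEBUG into the XWayland environment is to have Weston start a small wrapper script instead of the real binary. The sketch below assumes Weston honours the `path=` key in the `[xwayland]` section of weston.ini and that the real server is /usr/bin/Xwayland. The function name, the XWAYLAND_BIN and XWAYLAND_LOG variables, and the log location are hypothetical, and none of this has been tested on BSP5.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; both defaults below are assumptions.
XWAYLAND_BIN="${XWAYLAND_BIN:-/usr/bin/Xwayland}"
XWAYLAND_LOG="${XWAYLAND_LOG:-/tmp/xwayland-debug.log}"

launch_xwayland_debug() {
    # WAYLAND_DEBUG=1 asks libwayland to print each protocol message to stderr.
    WAYLAND_DEBUG=1
    export WAYLAND_DEBUG
    # exec replaces this shell, so the PID that Weston spawned becomes Xwayland.
    # stderr (protocol trace plus any (EE) lines) is appended to the log file.
    exec "$XWAYLAND_BIN" "$@" 2>>"$XWAYLAND_LOG"
}
```

A wrapper script built from this would end with `launch_xwayland_debug "$@"`, and weston.ini would point `path=` under `[xwayland]` at that script. Using `exec` rather than a child process is deliberate: Weston tracks the process it forked, so the wrapper should turn into Xwayland instead of sitting in front of it.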
I recommend having a look at this community post here on how to disable or remove Wayland. There’s no out-of-the-box solution for including X.org in BSP5, as Toradex has switched to Wayland by default.
I’ll try running weston free of the weston@.service when I have a chance.
We are also trying to come up with a smaller test case that we can pass along that demonstrates the assertion failure bug, and it is clearly a bug.
I also see the following related issue; output of /var/log/weston.log. Lines starting with “#” are my inline comments.
Date: 2022-05-03 UTC
[04:18:34.591] weston 9.0.0
https://wayland.freedesktop.org
Bug reports to: https://gitlab.freedesktop.org/wayland/weston/issues/
Build: 9.0.0-35-g230e9bc3+
...
...
[04:18:34.976] xserver listening on display :1
# the following is the on-demand spawn from our app starting and making an X request
[04:18:35.382] Spawned Xwayland server, pid 646
[04:18:35.654] xfixes version: 5.0
[04:18:35.755] created wm, root 71
# this is where Xwayland crashes due to its assertion failure
[04:24:53.561] xserver exited, code 6
# our app exits with failure because the connection to X died and is restarted by a separate systemd service, the following is the respawn from this second run
[04:24:54.871] Spawned Xwayland server, pid 1098
[04:24:55.045] xfixes version: 5.0
[04:24:55.110] created wm, root 71
# this is what I am confused about, at this point weston unexpectedly stops and is restarted by the existing /usr/bin/weston-launch; why?
Date: 2022-05-03 UTC
[04:24:56.625] weston 9.0.0
https://wayland.freedesktop.org
Bug reports to: https://gitlab.freedesktop.org/wayland/weston/issues/
Build: 9.0.0-35-g230e9bc3+
So what happens is the following timeline:
1. weston@.service starts /usr/bin/weston-start, which runs /usr/bin/weston-launch, which runs /usr/bin/weston.
2. Our systemd service (similar to your wayland-app-launch.service) has a Requires on weston@root.service.
3. Your layers/meta-toradex-demos/recipes-graphics/wayland-app-launch/wayland-app-launch.sh does the following, which we copied:
   # wait for weston
   while [ ! -e $XDG_RUNTIME_DIR/wayland-0 ] ; do sleep 0.1; done
   sleep 1
   Once wayland-0 exists, our app starts.
4. We quit our app, but Xwayland crashes due to the assertion error. Since our app quit, it is restarted by our systemd service. wayland-0 exists, so our app runs. It makes an X call, which results in the second “[04:24:54.871] Spawned Xwayland server, pid 1098” line in weston.log.
5. weston then, for reasons unknown, stops and is restarted by the existing /usr/bin/weston-launch, but this causes the second run of our app to also fail.
6. Our systemd service restarts the app again, but now the timing is off (100% reproducible). When our app restarts, the wayland-0 socket still exists from the previous run, but /usr/bin/weston-launch hasn’t yet started /usr/bin/weston. So our app is already running and trying to make X calls before “[04:24:56.625] weston 9.0.0” appears in the weston log.
Checking for the existence of ‘$XDG_RUNTIME_DIR/wayland-0’ doesn’t seem sufficiently robust. Augmenting it with a sleep loop checking for $(pidof /usr/bin/weston) “fixes” the issue but there is still the issue of weston respawning Xwayland and then quitting which causes the app to have to start again.
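The combined check described above could be sketched roughly as follows. This is only an illustration: wait_for_weston, weston_running, WESTON_BIN and the timeout argument are names made up for this sketch, and the loop assumes a `sleep` that accepts fractional seconds, as the original script does.

```shell
#!/bin/sh
# Sketch: wait until the Wayland socket exists AND a weston process is alive.
WESTON_BIN="${WESTON_BIN:-/usr/bin/weston}"   # assumed install path

weston_running() {
    # True if some process is running the weston binary.
    pidof "$WESTON_BIN" >/dev/null 2>&1
}

wait_for_weston() {
    # $1: timeout in seconds (hypothetical parameter, default 30).
    tries=$(( ${1:-30} * 10 ))
    while [ "$tries" -gt 0 ]; do
        # A stale socket alone is not enough; the compositor must be running too.
        if [ -e "$XDG_RUNTIME_DIR/wayland-0" ] && weston_running; then
            return 0
        fi
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1   # gave up: caller can exit and let systemd retry
}
```

Calling `wait_for_weston 30 || exit 1` before starting the app would replace the bare socket loop. It does not close the race completely, since weston could be running but not yet accepting connections on a socket left over from the previous run, so keeping the trailing `sleep 1` may still be sensible.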
I’m back from vacation now but not sure I can add much here. It sounds like fixing the initial XWayland crash may resolve all your issues. Is that right? It may require working with the upstream Wayland community. Any luck on a smaller test case? It would definitely be interesting to see if this is specific to the Toradex BSP or if it can happen on other platforms.
I am Tony’s colleague, and can fill in a little bit here…
Replacing the check of ‘$XDG_RUNTIME_DIR/wayland-0’ with a check of $(pidof /usr/bin/weston) reduces the severity of the bug. Instead of requiring a user to power cycle the device, it now just briefly displays some ugly diagnostic text before restarting the app.
This problem has only shown up in our full app on a modified BSP5 image. A minimal app does not show the problem, nor does it appear on an unmodified Toradex BSP5 image or on an Ubuntu PC.
The relevant modification to the BSP5 image is the video output: the standard image uses 1920x1080 HDMI output, ours uses 1024x768 LVDS output. The other modifications configure the LCD backlight driver and add the Wi-Fi driver to the kernel, which seem irrelevant.
At this point, the most helpful thing would probably be a method for getting better diagnostic data from the wayland/weston server.
Starting the App service produces an odd warning regarding the keyboard interface, although we aren’t using a keyboard on this hardware.
[10:55:51.874] xfixes version: 5.0
[10:55:51.940] created wm, root 71
The XKEYBOARD keymap compiler (xkbcomp) reports:
Warning: Unsupported maximum keycode 569, clipping.
X11 cannot support keycodes above 255.
Errors from xkbcomp are not fatal to the X server
Exiting the app produces the following:
(EE)
(EE) Backtrace:
(EE)
(EE) Segmentation fault at address 0x40
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
[10:56:31.697] xserver exited, code 6
When running weston from its service, the output adds the following after the app exits, after the signal 11 error:
Is this more informative than the weston.log output? I looked briefly at the cairo-xcb-screen code and it seemed to say that it was looking for the index of a screen that doesn’t exist.
That does seem to be consistent with the bug report @tonyjones mentioned above. Unfortunately, there is no fix in the community. The theory of a missing screen is also mentioned here. Can you debug further to find out what the screen value is and post it in that ticket? It is evidently a fairly rare case, so if you can help the wayland/weston developers, that may lead to a proper fix.
I made a brief attempt at debugging, but I don’t know how to find the screen value.
With the more verbose debugging turned on, there are no extra messages that occur between the app’s rendering loop and the crash.
We have not tested Xorg with BSP 5 but it should be doable. Yocto Dunfell still supports it so you should be able to switch but I don’t have a simple recipe for how to do that.
The failure sequence isn’t quite what we originally thought. It seems that this is what was happening:
1. The app shuts down.
2. The Xwayland server segfaults sometime between the app’s last two SDL cleanup calls: SDL_DestroyWindow and SDL_Quit.
3. The Weston process keeps running, but without a screen.
4. The app service restarts the app, which tries to open a window using the Xwayland server.
5. Weston generates the assertion failure and exits because it has no screen to provide to the app.
6. The app exits because it is unable to open a display.
7. GOTO 4
The band-aid solution is to improve the monitoring of weston/Xwayland and restart them as needed. The other option, intentionally not calling the SDL cleanup functions, seems risky.
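As a rough idea of what such monitoring could look like, the sketch below scans weston.log for the line formats quoted earlier in this thread and reports whether the most recent Xwayland instance exited with a non-zero code. The function name xwayland_crashed is made up for this example, and the patterns are assumptions based only on the log excerpts above.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the latest Xwayland seen in the log has crashed.
xwayland_crashed() {
    # $1: path to the weston log, e.g. /var/log/weston.log
    awk '
        /\] weston [0-9]/              { crashed = 0 }   # weston itself (re)started
        /Spawned Xwayland server/      { crashed = 0 }   # a fresh Xwayland was spawned
        /xserver exited, code [0-9]+/  { crashed = ($NF != 0) }
        END { exit(crashed ? 0 : 1) }
    ' "$1"
}
```

A monitor could call this periodically, or before each app start, and run something like `systemctl restart weston@root.service` when it returns success, so that the app never tries to connect to a Weston that has lost its Xwayland screen. Whether restarting the whole compositor is acceptable depends on what else is on screen at the time.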