Assertion failure in Wayland

tonyjones · April 15, 2022, 8:06pm

I have an app, written in Go, that uses libsdl2 and the go-sdl2 wrapper (veandco/go-sdl2 on GitHub, SDL2 bindings for Go).

Everything was working fine in BSP2 using native Xorg. I got the code working with BSP5 using Xwayland.

The app is run from a systemd service that starts a script. The script will restart the app if it exits.

If I kill the app (kill -9) the script restarts the app without issue.

However, if I terminate the app normally (via its GUI “quit” function), instead of it being restarted I see the assertion failure below.

Weston issue #407 (“Weston crashes on launching dolphin-emu on a Nouveau GPU using PRIME”) shows the same error and points to xserver issue #1035 (“Xwayland crash on launching dolphin-emu with dmabuf import error”), which talks about enabling WAYLAND_DEBUG logging for /usr/bin/Xwayland. However, I’m not sure (a) how relevant #1035 is to the issue I’m seeing, or (b) how to enable that logging, given that Xwayland is started via the weston-start (bash) → weston-launch (ELF) → weston (ELF) chain.

The XKEYBOARD keymap compiler (xkbcomp) reports:
Warning: Unsupported maximum keycode 569, clipping.
X11 cannot support keycodes above 255.
Errors from xkbcomp are not fatal to the X server
(EE)
(EE) Backtrace:
(EE)
(EE) Segmentation fault at address 0x40:
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
The XKEYBOARD keymap compiler (xkbcomp) reports:
Warning: Unsupported maximum keycode 569, clipping.
X11 cannot support keycodes above 255.
Errors from xkbcomp are not fatal to the X server
weston: …/…/cairo-1.16.0/src/cairo-xcb-screen.c:219: _get_screen_index: Assertion `!"reached"' failed.
(EE) failed to read Wayland events: Broken pipe

Apalis IMX6
Ixora 1.1 carrier
BSP5 5.6.0 tag

drew.tx · April 20, 2022, 7:45pm

Hi @tonyjones

The ticket #1035 you referenced eventually seems to be resolved with a couple of patches. See here for details. Can you try to apply those in your local Yocto build and see if it resolves the issue?

Drew

tonyjones · April 20, 2022, 8:15pm

@drew.tx

I can, but if you look at the actual error for #1035 (“Fatal server error: (EE) [destroyed object]: error 7: importing the supplied dmabufs failed”), it’s nothing like the assertion I’m seeing.

The reason #407 references #1035 is unclear. It only says: “This might be linked to xorg/xserver#1035 which happens in the exact same environment, except without using DRI_PRIME=1”.

I can try the patches but it seems quite a stretch to me. I’m not even sure how cleanly the patches will apply.

#407 never seems to have been resolved.

drew.tx · April 21, 2022, 2:31pm

Yeah, ok. Shot in the dark anyway.

Is it possible for you to share your app or ideally a smaller test case that exhibits the issue so I can reproduce it locally?

Drew

tonyjones · April 28, 2022, 7:44pm

Hi.

Sharing the app: probably not, but I will check.

Creating a smaller test case: maybe, but it comes down to what takes the least effort.

I have a follow-on question, plus one from my first post that is still unanswered.

I suspect the issue will not occur with X.org. Does Toradex have any pre-canned bbappend files or instructions for disabling Wayland in BSP5 and going back to X.org? Obviously I can figure it out myself.
I’d still like to know the best way to enable debug logging for Xwayland. Per the bugs referenced in my first post, this is supposedly done by starting Xwayland with WAYLAND_DEBUG set, but I haven’t been able to pass it through from Weston.

kevin.tx · May 2, 2022, 3:16pm

Hi @tonyjones ,

I recommend having a look at this community post on how to disable or remove Wayland. There’s no out-of-the-box solution for including X.org in BSP5, as Toradex has switched to Wayland by default.


Did you try something like weston --tty=1 on the command line? This, for example, prints the output to tty1.

Best Regards
Kevin

tonyjones · May 3, 2022, 5:44am

I’ll try running weston free of the weston@.service when I have a chance.

We are also trying to come up with a smaller test case that we can pass along that demonstrates the assertion failure bug, and it is clearly a bug.

I also see the following related issue in the output of /var/log/weston.log. Lines starting with “#” are my inline comments.

Date: 2022-05-03 UTC
[04:18:34.591] weston 9.0.0
               https://wayland.freedesktop.org
               Bug reports to: https://gitlab.freedesktop.org/wayland/weston/issues/
               Build: 9.0.0-35-g230e9bc3+
...
...
[04:18:34.976] xserver listening on display :clock1: 
# the following is the on-demand spawn from our app starting and making an X request
[04:18:35.382] Spawned Xwayland server, pid 646   
[04:18:35.654] xfixes version: 5.0
[04:18:35.755] created wm, root 71
# this is where Xwayland crashes due to its assertion failure
[04:24:53.561] xserver exited, code 6  
# our app exits with a failure because the connection to X died, and it is restarted by a separate systemd service; the following is the respawn from this second run
[04:24:54.871] Spawned Xwayland server, pid 1098  
[04:24:55.045] xfixes version: 5.0
[04:24:55.110] created wm, root 71
# this is what I am confused about: at this point weston unexpectedly stops and is restarted by the existing /usr/bin/weston-launch. Why?
Date: 2022-05-03 UTC
[04:24:56.625] weston 9.0.0
               https://wayland.freedesktop.org
               Bug reports to: https://gitlab.freedesktop.org/wayland/weston/issues/
               Build: 9.0.0-35-g230e9bc3+

So what happens is the following timeline:

1. weston@.service starts /usr/bin/weston-start, which runs /usr/bin/weston-launch, which runs /usr/bin/weston.
2. Our systemd service (similar to your wayland-app-launch.service) has a Requires= dependency on weston@root.service.
3. Your layers/meta-toradex-demos/recipes-graphics/wayland-app-launch/wayland-app-launch.sh does the following, which we copied:

# wait for weston
while [ ! -e $XDG_RUNTIME_DIR/wayland-0 ] ; do sleep 0.1; done
sleep 1

4. Once wayland-0 exists, our app starts.
5. We quit our app, but Xwayland crashes due to the assertion error. Since our app quit, it is restarted by our systemd service. wayland-0 exists, so our app runs. It makes an X call, which results in the second “[04:24:54.871] Spawned Xwayland server, pid 1098” line in weston.log. Weston then, for reasons unknown, stops and is restarted by the existing /usr/bin/weston-launch, but this causes the second run of our app to fail as well.
6. Our systemd service restarts the app again, but now the timing is somehow off (100% reproducible). When our app restarts, the wayland-0 socket still exists from the previous run, but /usr/bin/weston-launch hasn’t yet started /usr/bin/weston. So our app is already running and trying to make X calls before “[04:24:56.625] weston 9.0.0” appears in the weston log.

Checking for the existence of ‘$XDG_RUNTIME_DIR/wayland-0’ doesn’t seem robust enough. Augmenting it with a sleep loop that checks $(pidof /usr/bin/weston) “fixes” the timing problem, but weston still respawns Xwayland and then quits, which forces the app to start again.

drew.tx · May 9, 2022, 2:33pm

Hi @tonyjones,

I’m back from vacation now but not sure I can add much here. It sounds like fixing the initial XWayland crash may resolve all your issues. Is that right? It may require working with the upstream Wayland community. Any luck on a smaller test case? It would definitely be interesting to see if this is specific to the Toradex BSP or if it can happen on other platforms.

Drew

lukejb · May 11, 2022, 6:46pm

Hi Drew,

I am Tony’s colleague, and can fill in a little bit here…

Replacing the check of ‘$XDG_RUNTIME_DIR/wayland-0’ with a check of $(pidof /usr/bin/weston) reduces the severity of the bug. Instead of requiring a user to power cycle the device, it now just briefly displays some ugly diagnostic text before restarting the app.

This problem has only shown up in our full app on a modified BSP5 image. A minimal app does not exhibit the problem, nor does it occur on an unmodified Toradex BSP5 image or on an Ubuntu PC.

The relevant modification to the BSP5 image is the video output: the standard image uses 1920x1080 HDMI output, while ours uses 1024x768 LVDS output. The other modifications configure the LCD backlight driver and add the Wi-Fi driver to the kernel, which seem irrelevant.

At this point, the most helpful thing would probably be a method for getting better diagnostic data from the wayland/weston server.

-Luke

drew.tx · May 11, 2022, 7:19pm

Hi Luke,

Have you tried the suggestion above from @kevin.tx, i.e., running weston --tty=1 manually?

Drew

lukejb · May 12, 2022, 7:57am

Hi Drew,
Yes. Startup looks reasonable and produces the following:

Startup messages

root@apalis-imx6-10565529:~# weston --tty=1 --backend=fbdev-backend.so
Date: 2022-05-06 UTC
[10:54:33.993] weston 9.0.0
https://wayland.freedesktop.org
Bug reports to: https://gitlab.freedesktop.org/wayland/weston/issues/
Build: rel_imx_5.4.70_2.3.2_rc1+
[10:54:33.993] Command line: weston --tty=1 --backend=fbdev-backend.so
[10:54:33.994] OS: Linux, 5.4.129-5.4.0-devel+git.cb88cc157bfb, #1 SMP Wed Sep 29 18:17:21 UTC 2021, armv7l
[10:54:33.994] Using config file ‘/etc/xdg/weston/weston.ini’
[10:54:33.994] Output repaint window is 7 ms maximum.
[10:54:33.995] Loading module ‘/usr/lib/libweston-9/fbdev-backend.so’
[10:54:34.006] initializing fbdev backend
[10:54:34.010] logind: not running in a systemd session
[10:54:34.011] logind: cannot setup systemd-logind helper (-61), using legacy fallback
[10:54:34.011] Loading module ‘/usr/lib/libweston-9/gl-renderer.so’
[10:54:34.025] EGL client extensions: EGL_EXT_client_extensions
EGL_EXT_platform_base EGL_KHR_platform_wayland
EGL_EXT_platform_wayland
[10:54:34.025] warning: either no EGL_EXT_platform_base support or specific platform support; falling back to eglGetDisplay.
[10:54:34.032] EGL version: 1.5
[10:54:34.032] EGL vendor: Vivante Corporation
[10:54:34.033] EGL client APIs: OpenGL_ES
[10:54:34.033] EGL extensions: EGL_KHR_fence_sync EGL_KHR_reusable_sync
EGL_KHR_wait_sync EGL_KHR_image EGL_KHR_image_base
EGL_KHR_image_pixmap EGL_KHR_gl_texture_2D_image
EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image
EGL_EXT_image_dma_buf_import
EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_lock_surface
EGL_KHR_create_context EGL_KHR_no_config_context
EGL_KHR_surfaceless_context EGL_KHR_get_all_proc_addresses
EGL_EXT_buffer_age EGL_ANDROID_native_fence_sync
EGL_WL_bind_wayland_display
EGL_WL_create_wayland_buffer_from_image EGL_KHR_partial_update
EGL_EXT_swap_buffers_with_damage
EGL_KHR_swap_buffers_with_damage EGL_EXT_pixel_format_float
[10:54:34.033] EGL_KHR_surfaceless_context available
[10:54:34.042] GL version: OpenGL ES 3.0 V6.4.3.p1.305572
[10:54:34.042] GLSL version: OpenGL ES GLSL ES 3.00
[10:54:34.042] GL vendor: Vivante Corporation
[10:54:34.042] GL renderer: Vivante GC2000
[10:54:34.042] GL extensions: GL_OES_vertex_type_10_10_10_2
GL_OES_vertex_half_float GL_OES_element_index_uint
GL_OES_mapbuffer GL_OES_vertex_array_object
GL_OES_compressed_ETC1_RGB8_texture
GL_OES_compressed_paletted_texture GL_OES_texture_npot
GL_OES_rgb8_rgba8 GL_OES_depth_texture
GL_OES_depth_texture_cube_map GL_OES_depth24 GL_OES_depth32
GL_OES_packed_depth_stencil GL_OES_fbo_render_mipmap
GL_OES_get_program_binary GL_OES_fragment_precision_high
GL_OES_standard_derivatives GL_OES_EGL_image
GL_OES_EGL_image_external GL_OES_EGL_image_external_essl3
GL_OES_EGL_sync GL_OES_required_internalformat
GL_OES_surfaceless_context GL_OES_texture_border_clamp
GL_OES_texture_half_float GL_OES_texture_float
GL_EXT_texture_type_2_10_10_10_REV
GL_EXT_texture_filter_anisotropic
GL_EXT_texture_compression_dxt1 GL_EXT_texture_format_BGRA8888
GL_EXT_texture_compression_s3tc GL_EXT_read_format_bgra
GL_EXT_multi_draw_arrays GL_EXT_frag_depth
GL_EXT_discard_framebuffer GL_EXT_blend_minmax
GL_EXT_multisampled_render_to_texture GL_EXT_robustness
GL_EXT_texture_sRGB_decode GL_EXT_texture_border_clamp
GL_EXT_texture_rg GL_EXT_sRGB GL_VIV_direct_texture
[10:54:34.043] GL ES 2 renderer features:
read-back format: BGRA
wl_shm sub-image to texture: yes
EGL Wayland extension: yes
[10:54:34.043] Opening fbdev frame buffer.
[10:54:34.043] Calculating pixman format from:
- type: 0 (aux: 0)
- visual: 2
- bpp: 16 (grayscale: 0)
- red: offset: 11, length: 5, MSB: 0
- green: offset: 5, length: 6, MSB: 0
- blue: offset: 0, length: 5, MSB: 0
- transp: offset: 0, length: 0, MSB: 0
[10:54:34.100] Created head ‘/dev/fb0’ for device /dev/fb0 (DISP4 BG - DI1)
[10:54:34.127] event0 - gpio-keys: is tagged by udev as: Keyboard
[10:54:34.128] event0 - gpio-keys: device is a keyboard
[10:54:34.163] libinput: configuring device “gpio-keys”.
[10:54:34.163] Creating fbdev output.
[10:54:34.164] Chosen EGL config details: id: 41 rgba: 8 8 8 0 buf: 24 dep: 0 stcl: 0 int: 0-10 type: win|pix|pbf|swap_preserved vis_id: 0
[10:54:34.165] fbdev output 1024×768 px
guessing 65 Hz and 96 dpi
[10:54:34.165] associating input device event0 with output /dev/fb0 (none by udev)
[10:54:34.165] Output ‘/dev/fb0’ enabled with head(s) /dev/fb0
[10:54:34.165] Compositor capabilities:
arbitrary surface rotation: yes
screen capture uses y-flip: yes
presentation clock: CLOCK_MONOTONIC_RAW, id 4
presentation clock resolution: 0.000000001 s
[10:54:34.166] Loading module ‘/usr/lib/weston/kiosk-shell.so’
[10:54:34.167] Loading module ‘/usr/lib/libweston-9/xwayland.so’
[10:54:34.202] Registered plugin API ‘weston_xwayland_v1’ of size 16
[10:54:34.202] Registered plugin API ‘weston_xwayland_surface_v1’ of size 8
[10:54:34.203] xserver listening on display :0

Starting the App service produces an odd warning regarding the keyboard interface, although we aren’t using a keyboard on this hardware.

[10:55:51.874] xfixes version: 5.0
[10:55:51.940] created wm, root 71
The XKEYBOARD keymap compiler (xkbcomp) reports:
Warning: Unsupported maximum keycode 569, clipping.
X11 cannot support keycodes above 255.
Errors from xkbcomp are not fatal to the X server

Exiting the app produces the following:

(EE)
(EE) Backtrace:
(EE)
(EE) Segmentation fault at address 0x40
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
[10:56:31.697] xserver exited, code 6

When weston runs from its service, the following also appears after the app exits, following the signal 11 error:

weston: …/…/cairo-1.16.0/src/cairo-xcb-screen.c:219: _get_screen_index: Assertion `!"reached"' failed.
(EE) failed to read Wayland events: Broken pipe

Is this more informative than the weston.log output? I looked briefly at the cairo-xcb-screen code, and it seemed to be looking for the index of a screen that doesn’t exist.

-Luke

drew.tx · May 12, 2022, 4:36pm

Hi Luke,

That does seem to be consistent with the bug report @tonyjones mentioned above; unfortunately, there is no fix from the community. The theory of a missing screen is also mentioned here. Can you debug further to find out what the screen value is and post it in that ticket? It is evidently a fairly rare case, so helping the wayland/weston developers may be what gets a proper fix.

Drew

drew.tx · May 12, 2022, 4:44pm

Also, I wonder if removing the xkeyboard-config and xkbcomp packages will address the warnings there.

Drew

lukejb · May 12, 2022, 7:44pm

Hi Drew,

I made a brief attempt at debugging, but I don’t know how to find the screen value.
With the more verbose debugging turned on, there are no extra messages that occur between the app’s rendering loop and the crash.

[2953504.457] wl_callback@62.done(41038060)
[2953687.702] → wl_surface@18.attach(wl_buffer@61, 0, 0)
[2953688.016] → wl_surface@18.damage(0, 0, 768, 1024)
[2953688.670] → wl_surface@18.frame(new id wl_callback@62)
[2953688.952] → wl_surface@18.commit()
[2953689.418] wl_surface@18.attach(wl_buffer@61, 0, 0)
[2953689.520] wl_surface@18.damage(0, 0, 768, 1024)
[2953689.654] wl_surface@18.frame(new id wl_callback@62)
[2953689.710] wl_surface@18.commit()
[2953707.358] → wl_buffer@61.release()
[2953708.452] → wl_callback@62.done(41038264)
[2953708.557] → wl_display@1.delete_id(62)
[2953708.777] wl_display@1.delete_id(62)
[2953709.082] wl_callback@62.done(41038264)
(EE)
(EE) Backtrace:
(EE)
(EE) Segmentation fault at address 0x40
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
[21:39:05.843] xserver exited, code 6

The keyboard will probably be removed for production, but it is working and is useful when running an unmodified BSP.

It seems that the choices now are to either live with the issue or replace Weston with Xorg. Is there a procedure for doing this with BSP5?

-Luke

drew.tx · May 13, 2022, 4:57pm

We have not tested Xorg with BSP 5, but it should be doable. Yocto Dunfell still supports it, so you should be able to switch, but I don’t have a simple recipe for how to do that.

Drew

lukejb · May 17, 2022, 6:24pm

Followup:

The failure sequence isn’t quite what we originally thought. It seems that this is what was happening:

1. The app shuts down.
2. The Xwayland server segfaults sometime between the app’s last two SDL cleanup calls: SDL_DestroyWindow and SDL_Quit.
3. The Weston process keeps running, but without a screen.
4. The app service restarts the app, which tries to open a window using the Xwayland server.
5. Weston hits the assertion failure and exits because it has no screen to provide to the app.
6. The app exits because it is unable to open a display.
7. GOTO 4

The band-aid solution is to improve the monitoring of weston/Xwayland and restart them as needed. The other option, intentionally not calling the SDL cleanup functions, seems risky.