Weston-vivante launch with gray screen

Hi Jeremias,

Unfortunately, it’s somewhat of a heisenbug, and at this point all I have is our post-mortem analysis and observations. I’ll outline what I can below:

> Also when you say “one of our products”, do you mean seeing this on one device or multiple devices on the same product line?

We have observed it on exactly two devices so far. One was an SQA unit that did it once or twice and never did it again. We didn’t get a chance to inspect it before it was re-deployed (which recreates containers and therefore “fixes” it). The other is the one I was looking at yesterday, which would boot into this state fairly reliably (though I don’t think that’s conclusive, since the underlying issue only needs to happen once and the damaged container will then persist). Both are the same product. Note that in our case, both weston and weston-dev have been customized with branding, so if the “grey” background is being observed it means there is definitely some corruption or loss of Weston’s configuration, rather than just an incorrect entry into the “developer” state.

> I’m curious if this is reproducible with the 3 tag, or not.

As are we; unfortunately, we have not yet found steps to reproduce the actual issue (see below). We have prepared an update with the new Weston image and will continue to monitor for reports of this issue.

> Interesting, I don’t recall ever seeing a similar corruption issue like this with containers before. If you remove and restart the container does it persist or does the issue go away?

Yes, that fixed it, and I was unable to make the issue recur after restarting the device ~10 times. What’s odd is that our representative reported this unit would occasionally boot into a “correct” state. Presumably, at some point the container would fail to the point that Docker decided to re-create it, which would fix the issue again for a little while.

> For example, is it enough to just restart the Weston container a bunch of times? Does Cog/Chromium need to be involved? Any other factors?

Our device does run a Cog container. The only additional data point I have is that this was a unit used for expos, and at some point its LVDS display connector became loose enough that the display would flicker and become unstable. My current best guess is that interference or back-EMF from the unreliable connection upset the i.MX8 GPU in some way, which caused Weston to crash. As far as I know, LVDS doesn’t have an active hotplug-detection mechanism, so I don’t think it was a case of Weston seeing the display repeatedly appearing and vanishing in a short time. It might be possible to reproduce the issue by re-introducing this fault, but at this point I’m not sure it’s worth the risk of hardware damage to the i.MX8 or the display. If it is at all useful to you, I did save a snapshot of the damaged Docker image, and I could pull some of the core files.

I did find this related issue here (we don’t use any of the mentioned variables, but I noticed entry.sh runs dos2unix repeatedly on the configuration files at container startup). So it’s plausible that an inopportune crash or power loss could leave an empty or partially written Weston configuration file on disk, persisting across further boots.
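One mitigation we’re considering on our side is to do that conversion via a temporary file plus an atomic rename, rather than rewriting the config in place, so an ill-timed crash or power loss can never leave a truncated file behind. A minimal sketch of the idea (the /tmp/weston.ini path and the sed-based CRLF conversion are illustrative stand-ins, not the actual entry.sh contents):

```shell
#!/bin/sh
set -eu

# Sample config with Windows (CRLF) line endings, standing in for the
# real weston.ini managed by entry.sh.
cfg="/tmp/weston.ini"
printf '[core]\r\nidle-time=0\r\n' > "$cfg"

# Write the converted output to a temp file on the same filesystem...
tmp="$(mktemp "${cfg}.XXXXXX")"
sed 's/\r$//' "$cfg" > "$tmp"

# ...then rename it over the original. rename(2) is atomic, so the path
# always points at either the complete old file or the complete new one;
# a crash mid-conversion cannot leave a partial or empty config.
mv -f "$tmp" "$cfg"
```

The same pattern would apply to any other file entry.sh rewrites at startup; the key property is that the destination path is never open for writing directly.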

Regards,
~BW