We have a custom carrier board using the Apalis i.MX8 QuadMax 4GB WB IT SOM running a custom Yocto image currently based on the 5.15 BSP. We have an intermittent problem on some SOMs where the USB 3503 hub on the SOM stops working. When that happens, the kernel logs show errors such as:
[ 1.473941] usb 3-1: new high-speed USB device number 2 using ci_hdrc
[ 1.613978] usb 3-1: device descriptor read/64, error -71
[ 1.869956] usb 3-1: device descriptor read/64, error -71
[ 2.105952] usb 3-1: new high-speed USB device number 3 using ci_hdrc
[ 2.245936] usb 3-1: device descriptor read/64, error -71
[ 2.497982] usb 3-1: device descriptor read/64, error -71
[ 2.606010] usb usb3-port1: attempt power cycle
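For anyone wanting to detect this automatically, here is a minimal sketch of counting these enumeration failures in captured kernel log text (error -71 is -EPROTO on Linux). The function name and the regular expression are illustrative assumptions, not part of our actual test setup, and the pattern only covers the "device descriptor read/64" message shown above.

```python
import re

# Matches enumeration failures such as:
#   [ 1.613978] usb 3-1: device descriptor read/64, error -71
# Assumption: only this exact message form is of interest.
DESCRIPTOR_ERROR = re.compile(
    r"usb (?P<port>[\d.-]+): device descriptor read/64, error (?P<errno>-\d+)"
)


def count_descriptor_errors(log_text):
    """Return a dict mapping USB port (e.g. '3-1') to its failure count."""
    counts = {}
    for match in DESCRIPTOR_ERROR.finditer(log_text):
        port = match.group("port")
        counts[port] = counts.get(port, 0) + 1
    return counts
```

The log text could come from dmesg or journalctl output saved during a test run; a non-empty result for port 3-1 would flag the failure described here.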
The USB hub and the Bluetooth device no longer show up in lsusb output, and we cannot connect to the Bluetooth device anymore. The carrier board has a Nordic nRF52840 based device connected to the USB H2 data pins of the SOM, which go to the same hub. When these errors occur, the USB hub does not respond on the I2C bus, and toggling the reset GPIO does not bring it back. A software reset does not bring the hub back either; a power cycle is required.
The issue occurs on some SOMs, about a quarter of the ones we have installed recently. On those SOMs it is generally very intermittent in the lab, but it seems more prevalent in production. We have been running overnight tests that power the board on, wait 5 minutes, and check the lsusb output; that check fails 3-15 times out of 200 tries. The 5-minute wait is there because the USB hub sometimes fails on boot and sometimes fails up to 5 minutes after boot.
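The lsusb check in that test can be sketched roughly as follows. This is a simplified illustration rather than our exact script: the helper names are hypothetical, 0424:3503 is assumed to be the ID the USB 3503 hub enumerates with by default, and 1915:0000 is only a placeholder for the nRF52840 device ID.

```python
import subprocess

# Assumed IDs (vendor:product). 0424:3503 is the USB 3503 default as far as
# we know; 1915:0000 is a placeholder for the nRF52840 based device.
EXPECTED_IDS = {"0424:3503", "1915:0000"}


def missing_devices(lsusb_output, expected=EXPECTED_IDS):
    """Return the sorted list of expected IDs absent from lsusb output."""
    found = set()
    for line in lsusb_output.splitlines():
        fields = line.split()
        # Typical line: "Bus 003 Device 002: ID 0424:3503 <description>"
        if "ID" in fields:
            idx = fields.index("ID")
            if idx + 1 < len(fields):
                found.add(fields[idx + 1].lower())
    return sorted(set(expected) - found)


def check_once():
    """Run lsusb on the target and report which expected devices are missing."""
    result = subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    )
    return missing_devices(result.stdout)
```

A test iteration counts as a failure when check_once() returns a non-empty list after the 5-minute wait.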
Replacing the SOM with a “Good” SOM running the same software image on the same carrier board shows no USB errors.
However, with the “Intermittent” SOM running the Toradex Minimal Reference Yocto image 6.8.0 with roughly the same 5.15 kernel running on our carrier board, we also don’t see the USB issues. And running the Toradex image on the Ixora 1.2 dev board on the “Intermittent” SOM shows no USB errors.
After enough testing with our software on our carrier board, three of the “Intermittent” SOMs have transitioned to “Damaged” SOMs where the USB fails on every boot. In this state, when I install the Toradex reference image and run it on the Ixora dev board, it will show USB errors on every boot also.
In all failure cases the i.MX8QM boots fine and everything works except for the USB bus. So there seems to be both a hardware and a software component to this. We measured the 1.2V and 3.3V power rails going to the USB 3503, and they looked OK in both the good and bad cases, with no spikes. We are looking for more ideas on what to investigate.
- What could damage the USB 3503 without damaging the CPU or other SOM components?
- What could be different between our image and the reference image that could cause “Intermittent” SOMs to fail with our image but succeed with the Toradex image?
- We are not sure if the SOMs are received in an intermittent state or if something we are doing causes it. Is there a good way to investigate the SOMs before putting them into production?
Thanks,
Aaron