Image Info: Linux 5.15.129-6.5.0+git.6f8fd49366db #1-TorizonCore SMP PREEMPT Fri Dec 22 11:15:52 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
Issue:
I observed that the Wi-Fi interfaces (mlan0 and uap0) are sometimes not visible through NetworkManager after startup. When inspecting the dmesg output, I noticed that mwifiex isn’t referenced anywhere. Comparing the dmesg output of a boot with the issue against one without it, I think the issue is connected to this line: imx6q-pcie 5f010000.pcie: Phy link never came up
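The comparison described above can be scripted so that each boot is classified the same way. This is only a minimal sketch: the function name `check_boot_log` is hypothetical, and it assumes that the two markers mentioned in this post (the "Phy link never came up" message and any mention of mwifiex) are enough to tell a failed boot from a good one.

```shell
#!/bin/sh
# Classify a kernel log by the two markers named in the report.
# Pass a file name, or "-" (the default) to read the log from stdin.
check_boot_log() {
    log=$(cat "${1:--}")
    link_failed=no
    mwifiex_seen=no
    # Marker 1: the PCIe controller reported that the link did not train.
    if printf '%s\n' "$log" | grep -q 'Phy link never came up'; then
        link_failed=yes
    fi
    # Marker 2: the Wi-Fi driver appears anywhere in the log.
    if printf '%s\n' "$log" | grep -qi 'mwifiex'; then
        mwifiex_seen=yes
    fi
    echo "pcie_link_failed=$link_failed mwifiex_seen=$mwifiex_seen"
}
```

On a running module this could be called as `dmesg | check_boot_log -`, and the one-line result stored per boot to see how often the failure occurs.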
Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!
The issue you’re encountering with the Wi-Fi interfaces not appearing on your Colibri iMX8 DualX 1GB Wi-Fi / Bluetooth module seems to be related to the PCIe link not coming up consistently. The dmesg logs you provided indicate that the problem occurs when the PCIe link fails to establish (Phy link never came up), which in turn affects the initialization of the Wi-Fi module connected via PCIe.
This problem can be caused by several factors, including power supply instability, signal integrity issues on the PCIe interface, or software/configuration problems. Here are some steps you can take to troubleshoot and potentially resolve the issue:
Power Supply Check: Ensure that your custom carrier board provides a stable and sufficient power supply to the SoM, especially during the boot process. Power fluctuations can cause the PCIe link to fail.
Hardware Design Review: If possible, review the PCIe layout and design on your custom carrier board to ensure it meets the specifications and recommendations provided by NXP for the iMX8 series and by Toradex for the Colibri module. Pay special attention to the impedance of the PCIe traces and the quality of the signal.
Software Configuration: Verify that the PCIe root complex and endpoint configurations in the device tree (.dts/.dtsi files) are correct for your specific hardware setup. You might need to adjust the settings for the PCIe controller on the iMX8 to ensure compatibility with your Wi-Fi module.
Kernel Parameters: Experiment with kernel boot parameters that might affect PCIe initialization. For example, some platforms benefit from delaying the enumeration of PCIe devices. While this is more of a workaround than a solution, it can help identify if timing is part of the issue.
Update Firmware and Software: Ensure that you are using the latest version of the TorizonCore and the firmware for the Wi-Fi module. Updates often include fixes for hardware compatibility issues, including PCIe.
Debugging PCIe: If you have access to debugging tools or can enable additional debug output for the PCIe subsystem in the Linux kernel, this might provide further insights into why the link fails to come up.
Alternative Configurations: As a temporary measure, if your application allows, consider using a USB Wi-Fi dongle supported by TorizonCore. This can help isolate whether the issue is specific to the PCIe interface or related to other aspects of the system configuration.
Consult Hardware Documentation: Review the Toradex and
At least 25% of the modules in use have this issue. On most of them the problem occurs at least once a day, and Wi-Fi only works again after multiple restarts. So far I cannot find a pattern to reproduce the issue.
The modules are running a customized Torizon OS 6.5.0 image created with TorizonCore builder.
I will try to reproduce the error on the carrier board and get back to you with more info.
I tried to reproduce the error using a Toradex carrier board, but the Wi-Fi worked every time. I also switched between the Toradex board and our own hardware, but the issue did not occur in my tests since I initially reported it. The modules tested did exhibit the problem before, but now it seems to have disappeared. Nevertheless, the issue is still present in many delivered machines, and I need a solution for this. Have you observed this problem with other customers before?
I also have some additional info about this issue:
It has only appeared since we moved to Torizon 6. There were some minor changes to my device-tree overlay, the biggest being that the fec1 Ethernet node is no longer disabled.
There was no significant hardware change on our carrier board.
I wonder if there is something marginal on your carrier board? My colleague @matthias.tx might have some ideas as he is our hardware expert.
I don’t know of anything obvious in TC6 vs TC5 that would cause this but I suppose it is possible. The only concrete thing I can think of would be to detect the issue and restart NetworkManager or other bits if you can identify a mechanism to get the devices up. Not an ideal solution but unless we have a better understanding of the root cause I cannot suggest anything better.
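The detection half of that suggestion could look roughly like the sketch below. It assumes that a missing entry under /sys/class/net is a reliable sign of the failure, and the function names and the `NET_DIR` variable are hypothetical. What to do once the interfaces are reported missing is left to the caller, since no working recovery mechanism has been identified in this thread.

```shell
#!/bin/sh
# Directory holding one entry per network interface.
# Overridable so the logic can be exercised outside a real target.
NET_DIR="${NET_DIR:-/sys/class/net}"

# Print each expected interface name that has no entry in NET_DIR.
missing_wifi_ifaces() {
    for iface in "$@"; do
        [ -e "$NET_DIR/$iface" ] || echo "$iface"
    done
}

# Report the state of the two interfaces named in this thread.
# Returns 0 when both exist, 1 when at least one is missing.
report_wifi_state() {
    missing=$(missing_wifi_ifaces mlan0 uap0)
    if [ -n "$missing" ]; then
        # Unquoted on purpose: joins the names onto one line.
        echo "missing: $(echo $missing)"
        return 1
    fi
    echo "ok"
}
```

A systemd unit or a start-up script could call `report_wifi_state` some time after boot and log or act on a non-zero exit status.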
If I’m not mistaken, I’ve contacted @matthias.tx before regarding another issue with our carrier board. I’ll check with him if anything from our hardware side could cause this issue.
In the meantime, is there a possibility to “reinitialize” the PCI connection to the Wifi chip? I tried to restart NetworkManager, rescan for PCI connections (echo 1 | sudo tee /sys/bus/pci/rescan), modprobe -r mwifiex_pcie and modprobe mwifiex_pcie. All without success.
Thanks for the suggested workaround.
Unfortunately, when testing it I got the following errors:
$ echo "0000:01:00.0" > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/driver/unbind
sh: /sys/bus/pci/drivers/pcieport/0000:00:00.0/0000:01:00.0/driver/unbind: No such file or directory
$ echo 1 > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/enable
sh: /sys/bus/pci/drivers/pcieport/0000:00:00.0/0000:01:00.0/enable: No such file or directory
Is there anything else I could try to reload the WiFi driver?
Can you send the contents (ls) of the /sys/bus/pci/drivers/pcieport/ and /sys/bus/pci/drivers/pcieport/0000\:00\:00.0 directories when the WiFi fails?
Also, can you try the following commands to reload the WiFi driver?
I think there might be some differences when the WiFi fails to load and that is why the previous commands did not work.
Here are the contents of the directories, along with the error from the first command to reload the WiFi driver:
$ ls /sys/bus/pci/drivers/pcieport/
bind new_id remove_id uevent unbind
$ ls /sys/bus/pci/drivers/pcieport/0000\:00\:00.0
ls: cannot access '/sys/bus/pci/drivers/pcieport/0000:00:00.0': No such file or directory
$ echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/unbind
sh: echo: write error: No such device
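Both errors above would be expected if the endpoint 0000:01:00.0 was never enumerated, because there is then no device to unbind. A small check like the following can confirm this before any unbind or bind is attempted. It is a sketch only: `pci_dev_state` and `PCI_SYS` are hypothetical names, and it relies on the standard sysfs layout in which each enumerated device has a directory under /sys/bus/pci/devices with a `driver` symlink when a driver is bound.

```shell
#!/bin/sh
# Root of the PCI sysfs tree; overridable so the logic can be tested
# against a scratch directory.
PCI_SYS="${PCI_SYS:-/sys/bus/pci}"

# Print the state of one PCI device given its address (e.g. 0000:01:00.0):
#   not-enumerated  no sysfs entry, the device was never found on the bus
#   unbound         device exists but no driver is attached
#   bound:<name>    device exists and <name> is its driver
pci_dev_state() {
    bdf=$1
    if [ ! -e "$PCI_SYS/devices/$bdf" ]; then
        echo "not-enumerated"
    elif [ -e "$PCI_SYS/devices/$bdf/driver" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        echo "bound:$(basename "$(readlink "$PCI_SYS/devices/$bdf/driver")")"
    else
        echo "unbound"
    fi
}
```

If `pci_dev_state 0000:01:00.0` prints `not-enumerated`, reloading mwifiex_pcie cannot help on its own, because the driver has no device to attach to.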
Thanks for the information.
It looks like the WiFi is still not detected.
I think we should look into the hardware aspect and try to reproduce the issue so we can test other workarounds. I know Matthias is in touch with you about this, and we are coordinating on how best to deal with the problem.
When I have more information, I will send it here.