Wi-Fi interfaces not appearing on Colibri iMX8X after power on

Hi everyone,

I’m having issues with the Wi-Fi interfaces not appearing after startup.
(@drew.tx, this is the issue I talked about in my last mail)

Hardware:

Image Info:
Linux 5.15.129-6.5.0+git.6f8fd49366db #1-TorizonCore SMP PREEMPT Fri Dec 22 11:15:52 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

Issue:
I observed that the Wi-Fi interfaces (mlan0 and uap0) are sometimes not visible in NetworkManager after startup. In the dmesg output of such a boot, mwifiex isn’t referenced anywhere. Comparing the dmesg output of a boot with the issue against one without it, I think the issue is connected to this line: imx6q-pcie 5f010000.pcie: Phy link never came up

dmesg with the issue:

[    1.879795] imx6q-pcie 5f010000.pcie: iATU unroll: disabled
[    1.879820] imx6q-pcie 5f010000.pcie: Detected iATU regions: 6 outbound, 6 inbound
[    1.879845] imx6q-pcie 5f010000.pcie: host bridge /bus@5f000000/pcie@0x5f010000 ranges:
[    1.879913] imx6q-pcie 5f010000.pcie:       IO 0x007ff80000..0x007ff8ffff -> 0x0000000000
[    1.879950] imx6q-pcie 5f010000.pcie:      MEM 0x0070000000..0x007fefffff -> 0x0070000000
[    1.880128] imx6q-pcie 5f010000.pcie: iATU unroll: disabled
[    1.880139] imx6q-pcie 5f010000.pcie: Detected iATU regions: 6 outbound, 6 inbound
[...]
[    2.881335] imx6q-pcie 5f010000.pcie: Phy link never came up
[    2.881899] Wi-Fi_PDn: Underflow of regulator enable count
[    2.887430] regulator-dummy: Underflow of regulator enable count
[    2.894371] imx6q-pcie: probe of 5f010000.pcie failed with error -110
[    2.897720] Freeing unused kernel memory: 4480K

dmesg without the issue:

[    1.870624] imx6q-pcie 5f010000.pcie: iATU unroll: disabled
[    1.870649] imx6q-pcie 5f010000.pcie: Detected iATU regions: 6 outbound, 6 inbound
[    1.870674] imx6q-pcie 5f010000.pcie: host bridge /bus@5f000000/pcie@0x5f010000 ranges:
[    1.870744] imx6q-pcie 5f010000.pcie:       IO 0x007ff80000..0x007ff8ffff -> 0x0000000000
[    1.870778] imx6q-pcie 5f010000.pcie:      MEM 0x0070000000..0x007fefffff -> 0x0070000000
[    1.870957] imx6q-pcie 5f010000.pcie: iATU unroll: disabled
[    1.870969] imx6q-pcie 5f010000.pcie: Detected iATU regions: 6 outbound, 6 inbound
[    1.971064] imx6q-pcie 5f010000.pcie: Link up
[    1.971079] imx6q-pcie 5f010000.pcie: Link: Gen2 disabled
[    1.971091] imx6q-pcie 5f010000.pcie: Link up, Gen1
[    2.090477] imx6q-pcie 5f010000.pcie: Link up
[    2.090695] imx6q-pcie 5f010000.pcie: PCI host bridge to bus 0000:00
[    2.090712] pci_bus 0000:00: root bus resource [bus 00-ff]
[    2.090728] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    2.090742] pci_bus 0000:00: root bus resource [mem 0x70000000-0x7fefffff]
[    2.090794] pci 0000:00:00.0: [1957:0000] type 01 class 0x060400
[    2.090823] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[    2.090846] pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x00ffffff pref]
[    2.090919] pci 0000:00:00.0: supports D1 D2
[    2.090932] pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot
[    2.097509] pci 0000:01:00.0: [1b4b:2b42] type 00 class 0x020000
[    2.097609] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit pref]
[    2.097665] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x000fffff 64bit pref]
[    2.098052] pci 0000:01:00.0: supports D1 D2
[    2.098065] pci 0000:01:00.0: PME# supported from D0 D1 D3hot D3cold
[    2.098309] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:00.0 (capable of 4.000 Gb/s with 5.0 GT/s PCIe x1 link)
[    2.126308] pci 0000:00:00.0: BAR 0: assigned [mem 0x70000000-0x70ffffff]
[    2.126339] pci 0000:00:00.0: BAR 6: assigned [mem 0x71000000-0x71ffffff pref]
[    2.126358] pci 0000:00:00.0: BAR 15: assigned [mem 0x72000000-0x721fffff 64bit pref]
[    2.126382] pci 0000:01:00.0: BAR 0: assigned [mem 0x72000000-0x720fffff 64bit pref]
[    2.126430] pci 0000:01:00.0: BAR 2: assigned [mem 0x72100000-0x721fffff 64bit pref]
[    2.126479] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[    2.126495] pci 0000:00:00.0:   bridge window [mem 0x72000000-0x721fffff 64bit pref]
[    2.127059] pcieport 0000:00:00.0: PME: Signaling with IRQ 465
[    2.127542] pcieport 0000:00:00.0: AER: enabled with IRQ 465
[    2.129781] Freeing unused kernel memory: 4480K
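For comparing boots, a small shell helper can classify a kernel log by the two messages that differ between the logs above. This is only a sketch: the function name pcie_link_state is made up for illustration, and the match strings are taken from the dmesg excerpts in this post, so they may read differently on other kernel versions.

```shell
#!/bin/sh
# Classify the PCIe link state from kernel log text read on stdin.
# Prints "link-failed", "link-up", or "unknown".
# NOTE: pcie_link_state is a hypothetical helper; the patterns are copied
# from the dmesg excerpts in this thread.
pcie_link_state() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'imx6q-pcie .*Phy link never came up'; then
        echo "link-failed"
    elif printf '%s\n' "$log" | grep -q 'imx6q-pcie .*Link up'; then
        echo "link-up"
    else
        echo "unknown"
    fi
}
```

On the module it could be used as `dmesg | pcie_link_state` (dmesg may need sudo, depending on the image configuration).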

What can I do to prevent this problem?

Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!

Handshake,

The issue you’re encountering with the Wi-Fi interfaces not appearing on your Colibri iMX8 DualX 1GB Wi-Fi / Bluetooth module seems to be related to the PCIe link not coming up consistently. The dmesg logs you provided indicate that the problem occurs when the PCIe link fails to establish (Phy link never came up), which in turn affects the initialization of the Wi-Fi module connected via PCIe.

This problem can be caused by several factors, including power supply instability, signal integrity issues on the PCIe interface, or software/configuration problems. Here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Power Supply Check: Ensure that your custom carrier board provides a stable and sufficient power supply to the SoM, especially during the boot process. Power fluctuations can cause the PCIe link to fail.

  2. Hardware Design Review: If possible, review the PCIe layout and design on your custom carrier board to ensure it meets the specifications and recommendations provided by NXP for the iMX8 series and by Toradex for the Colibri module. Pay special attention to the impedance of the PCIe traces and the quality of the signal.

  3. Software Configuration: Verify that the PCIe root complex and endpoint configurations in the device tree (.dts/.dtsi files) are correct for your specific hardware setup. You might need to adjust the settings for the PCIe controller on the iMX8 to ensure compatibility with your Wi-Fi module.

  4. Kernel Parameters: Experiment with kernel boot parameters that might affect PCIe initialization. For example, some platforms benefit from delaying the enumeration of PCIe devices. While this is more of a workaround than a solution, it can help identify if timing is part of the issue.

  5. Update Firmware and Software: Ensure that you are using the latest version of the TorizonCore and the firmware for the Wi-Fi module. Updates often include fixes for hardware compatibility issues, including PCIe.

  6. Debugging PCIe: If you have access to debugging tools or can enable additional debug output for the PCIe subsystem in the Linux kernel, this might provide further insights into why the link fails to come up.

  7. Alternative Configurations: As a temporary measure, if your application allows, consider using a USB Wi-Fi dongle supported by TorizonCore. This can help isolate whether the issue is specific to the PCIe interface or related to other aspects of the system configuration.

  8. Consult Hardware Documentation: Review the Toradex and

Please also check whether these links can help you solve your problem:

Toradex’s Linux Images - How to connect to a Wi-Fi network | Toradex Developer Center.

Hello @MARO3,

Welcome to the Toradex Community!

How often do you see this issue?
Does it only happen with some modules or did you observe this with all SoMs?

From the information you sent, it appears you are using Torizon OS 6.5.0, is this correct?

Best Regards,
Bruno

@MARO3

In addition to the suggestions from @bruno.tx, can you attempt to reproduce this in a Toradex carrier board?

Drew

Hi @bruno.tx, Hi @drew.tx,

At least 25% of the modules in use have this issue. For most of the affected modules, the problem occurs at least once a day, and Wi-Fi only works again after multiple restarts. At the moment I cannot find a pattern to reproduce the issue.

The modules are running a customized Torizon OS 6.5.0 image created with TorizonCore Builder.

I will try to reproduce the error on the carrier board and get back to you with more info.

Kind Regards,
Robin

Hi @drew.tx,

I tried to reproduce the error using a Toradex carrier board, but the Wi-Fi worked every time. I also switched between the Toradex board and our own hardware, but the issue has not occurred in any of my tests since I initially reported it. The modules I tested did exhibit the problem before, but now it seems to have disappeared. Nevertheless, the issue is still present in many delivered machines, and I need a solution for this. Have you observed this problem with other customers before?

Kind Regards,
Robin

I also have some additional info about this issue:

  • It has only appeared since moving to Torizon 6. There were some minor changes in my device-tree overlay, the biggest being that the fec1 Ethernet node is no longer disabled.
  • There was no significant hardware change on our carrier board.
  • A Wi-Fi AP is configured with hostapd.

Hi @MARO3,

I wonder if there is something marginal on your carrier board? My colleague @matthias.tx might have some ideas as he is our hardware expert.

I don’t know of anything obvious in TC6 vs TC5 that would cause this but I suppose it is possible. The only concrete thing I can think of would be to detect the issue and restart NetworkManager or other bits if you can identify a mechanism to get the devices up. Not an ideal solution but unless we have a better understanding of the root cause I cannot suggest anything better.
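A minimal sketch of the detection half of that idea, assuming the interfaces show up under /sys/class/net once the driver has loaded. The function name missing_wifi_ifaces is hypothetical; the interface names mlan0 and uap0 come from the original report.

```shell
#!/bin/sh
# Print the expected Wi-Fi interfaces that are missing, separated by spaces.
# Empty output means every expected interface is present.
# $1: sysfs net directory (defaults to /sys/class/net; overridable for testing)
# Remaining arguments: interface names (default: mlan0 uap0, as in the report).
# NOTE: missing_wifi_ifaces is a hypothetical helper, not a Torizon tool.
missing_wifi_ifaces() {
    net_dir=${1:-/sys/class/net}
    if [ "$#" -gt 0 ]; then
        shift
    fi
    if [ "$#" -eq 0 ]; then
        set -- mlan0 uap0
    fi
    missing=""
    for iface in "$@"; do
        if [ ! -e "$net_dir/$iface" ]; then
            missing="$missing $iface"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}
```

A boot-time check could call `missing_wifi_ifaces` and log or act on non-empty output. Which recovery action actually brings the devices back is a separate question that this sketch does not answer.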

Drew

Hi @drew.tx,

If I’m not mistaken, I’ve contacted @matthias.tx before regarding another issue with our carrier board. I’ll check with him if anything from our hardware side could cause this issue.

In the meantime, is there a way to “reinitialize” the PCI connection to the Wi-Fi chip? I tried restarting NetworkManager, rescanning the PCI bus (echo 1 | sudo tee /sys/bus/pci/rescan), and reloading the driver (modprobe -r mwifiex_pcie followed by modprobe mwifiex_pcie), all without success.

Kind regards,

Robin

Hello @MARO3,

As a workaround, you could try to reload the WiFi driver and rescan the PCIe bus.

The following commands can be used for this:

echo "0000:01:00.0" > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/driver/unbind
echo 1 > /sys/bus/pci/rescan
echo 1 > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/enable
modprobe -r mwifiex_pcie
modprobe mwifiex_pcie

Restarting the network manager is not necessary after doing this procedure.

Best Regards,
Bruno

Hi @bruno.tx,

Thanks for the suggested workaround.
Unfortunately, when testing it, I got the following errors:

$ echo "0000:01:00.0" > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/driver/unbind
sh: /sys/bus/pci/drivers/pcieport/0000:00:00.0/0000:01:00.0/driver/unbind: No such file or directory

$ echo 1 > /sys/bus/pci/drivers/pcieport/0000\:00\:00.0/0000\:01\:00.0/enable
sh: /sys/bus/pci/drivers/pcieport/0000:00:00.0/0000:01:00.0/enable: No such file or directory

Is there anything else I could try to reload the WiFi driver?

Kind regards,
Robin

Hello @MARO3,

Can you send the contents (ls) of the /sys/bus/pci/drivers/pcieport/ and /sys/bus/pci/drivers/pcieport/0000\:00\:00.0 directories when the WiFi fails?

Also, can you try the following commands to reload the WiFi driver?
I think there might be some differences when the WiFi fails to load and that is why the previous commands did not work.

echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/unbind
echo 1 > /sys/bus/pci/rescan
modprobe mwifiex_pcie
echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/bind
modprobe -r mwifiex_pcie
modprobe mwifiex_pcie

Best Regards,
Bruno

Hi @bruno.tx,

here are the contents of the directories and also the error from the first command to reload the WiFi driver:

$ ls /sys/bus/pci/drivers/pcieport/
bind  new_id  remove_id  uevent  unbind

$ ls /sys/bus/pci/drivers/pcieport/0000\:00\:00.0
ls: cannot access '/sys/bus/pci/drivers/pcieport/0000:00:00.0': No such file or directory

$ echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/unbind 
sh: echo: write error: No such device

Kind Regards,
Robin

Hello @MARO3,

Thanks for the information.
So the observed behavior is the same as what I see on a SoM without the Wi-Fi module.

Did you try to run the subsequent commands after you got the error?
The unbind is likely not necessary because there was no bind in the first place.

echo 1 > /sys/bus/pci/rescan
modprobe mwifiex_pcie
echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/bind
modprobe -r mwifiex_pcie
modprobe mwifiex_pcie

Best Regards,
Bruno

Hi @bruno.tx,

I tried them as well but without success.

echo "0000:01:00.0" > /sys/bus/pci/drivers/mwifiex_pcie/bind

Just returns the same error as the unbind.
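The "No such device" errors are consistent with the endpoint never being enumerated: in the failing boot the imx6q-pcie probe itself fails with error -110, and the pcieport directory listing shows no 0000:00:00.0 entry. Before trying bind or unbind, it may help to check whether the device exists on the bus at all. The sketch below assumes the standard sysfs layout under /sys/bus/pci/devices; the function name pci_dev_status is made up for illustration.

```shell
#!/bin/sh
# Report whether a PCI device is enumerated and which driver, if any, is bound.
# Prints "absent", "unbound", or "bound:<driver name>".
# $1: PCI address, e.g. 0000:01:00.0 (the Wi-Fi endpoint in this thread)
# $2: sysfs PCI devices directory (defaults to /sys/bus/pci/devices)
# NOTE: pci_dev_status is a hypothetical helper, not an existing tool.
pci_dev_status() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    if [ ! -e "$dev_dir" ]; then
        echo "absent"
    elif [ -L "$dev_dir/driver" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        echo "bound:$(basename "$(readlink "$dev_dir/driver")")"
    else
        echo "unbound"
    fi
}
```

If `pci_dev_status 0000:01:00.0` prints "absent", writing the address to a driver's bind or unbind file cannot succeed, because there is no device for the driver to attach to.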

Kind regards,
Robin

Hello @MARO3,

Thanks for the information.
It looks like the Wi-Fi module is still not detected.
I think we should look into the hardware aspect and try to reproduce the issue so that we can test other workarounds. I know Matthias is in touch with you to look into this, and we are coordinating on how best to deal with this problem.
When I have more information, I will send it here.

Best Regards,
Bruno