WiFi on Apalis iMX8 stops working under simultaneous heavy load and WiFi scanning

We’ve been running into WiFi reliability issues on our own Yocto build, and I was able to reproduce the issue with the daily build:

Apalis-iMX8_Reference-Minimal-Image-Tezi_5.7.0-devel-20220620+build.698

To reproduce, connect to a WiFi network using iw and wpa_supplicant, then start aggressive background scanning (ordinary periodic scanning triggers the failure as well; it just takes longer):

(while true; do iw mlan0 scan >/dev/null; sleep 2; done) &

Then download a large file with wget to put the card under heavy load:

wget https://git.kernel.org/torvalds/t/linux-5.18-rc1.tar.gz

Usually a single transfer is enough, but sometimes additional transfers are needed. The transfer hangs and all network traffic stops until the interface is disabled and re-enabled. During this time, iw mlan0 scan still returns a list of access points.

Note that the problem seems harder to reproduce when there are few or no other access points in the area.

Hello @Russ,
Could you explain the use case that leads to this situation?

Is it a requirement that you run background scans with a short wait period, or did you come up with this test when trying to reproduce the issue?

Best regards,
Rafael Beims

I came up with this test to reproduce the issue quickly rather than waiting several hours.

In that case, I understand that you will eventually hit the problem during normal operation, even though it could take a long time.

Have you tried disabling background scanning? If the problem is caused by a WiFi scan running simultaneously with a data transfer, this could work around the issue. You can do that by editing /etc/connman/main.conf and setting:

BackgroundScanning = false
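For reference, a minimal sketch of how that setting could look in the file, assuming the standard ConnMan layout where this option lives in the [General] section (other options already present in your main.conf should be kept):

```ini
# /etc/connman/main.conf (excerpt)
[General]
# Disable periodic background scans while connected.
# Note: ConnMan then no longer scans for better APs, so roaming is affected.
BackgroundScanning = false
```

ConnMan reads main.conf at startup, so the change should only take effect after restarting the daemon (for example with systemctl restart connman, assuming the systemd unit is named connman) or rebooting.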

We will try to reproduce the issue and I’ll get back as soon as I have news.

I have verified that this disables background scanning, but it also breaks roaming.

I was able to reproduce your problem using the test setup you described. I then enabled the proprietary NXP WiFi drivers and repeated the test, and this time the issue did not occur.
As a next step, I suggest that you enable the proprietary drivers on your end and check whether your application works better with them.
I’m currently updating our documentation on how to enable the drivers, and I’ll come back with instructions as soon as I’m finished.

Hello @russ,
I updated our meta-toradex-wifi Yocto layer. This is a special layer that we created for cases where the proprietary NXP WiFi drivers are necessary.
Could you go to the toradex/meta-toradex-wifi repository on GitHub (it contains recipes and metadata to integrate the proprietary NXP wireless drivers, also called mlan or the C-Driver, into the Toradex BSP for SOMs containing the Azurewave 88W8997 chip), follow the installation instructions there, and check whether the issue is also resolved on your end with the proprietary driver enabled?

Please let me know if you have difficulties enabling the driver.
Best regards,
Rafael