Bluetooth disconnection from an external device crashes the Bluetooth module/driver on a Verdin AM62 board

Hello,

We are running into an issue with Bluetooth on a Verdin AM62 board.
The Bluetooth module or driver becomes unresponsive (crashes) after 2 to 10 connect/disconnect cycles with a master (central) device.

Board: Verdin AM62 Dual 1GB Wi-Fi / Bluetooth IT (u-blox MAYA-W160-00 module)
Image: non-customized Torizon OS 6 (6.5.0+build.8, 2024/01/17) for Verdin AM62, installed using Tezi.
Carrier board: Yavia

After the installation has finished, we SSH into the board and execute the following procedure:

bluetoothctl
power on
menu advertise
name test
tx-power on
back
advertise on
  • We can see the AM62 advertising packets using nRF Sniffer.
  • We connect to the AM62 board using a laptop (Windows 10 using BLEConsole).
  • BLEConsole on the laptop and bluetoothctl on the AM62 both confirm that they are connected.
  • We confirm that the connection is successful by verifying that the advertising packets have stopped and there are now Master/Slave packets being transmitted.
  • We disconnect from the AM62 board by closing the connection on the laptop.
    After a few seconds bluetoothctl recognizes that the laptop has disconnected.
  • We verify that the Master/Slave packets also stop.
  • We then restart the advertising via
advertise off
advertise on

and repeat the connection process.
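The manual cycle above could be partly automated on the board side. The following is a minimal Python sketch, assuming that Python 3 is available where bluetoothctl runs, that bluetoothctl accepts commands on a piped stdin, and that it reports disconnections with a line of the form "[CHG] Device <address> Connected: no". These assumptions may not hold for every BlueZ version, and the helper names (parse_connection_event, run_cycles) are hypothetical and untested on a Verdin AM62. The connect and disconnect themselves still have to be triggered from the central (laptop or phone).

```python
import re
import subprocess
import time
from typing import List, Optional, Tuple

# Commands from the procedure above, as typed at the bluetoothctl prompt.
SETUP: List[str] = ["power on", "menu advertise", "name test", "tx-power on", "back", "advertise on"]
# Commands used after each disconnect to restart advertising.
RESTART: List[str] = ["advertise off", "advertise on"]

# Assumption: bluetoothctl prints connection changes as
# "[CHG] Device AA:BB:CC:DD:EE:FF Connected: no", possibly wrapped in ANSI colour codes.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
CONNECTED_LINE = re.compile(r"Device ((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}) Connected: (yes|no)")


def parse_connection_event(line: str) -> Optional[Tuple[str, bool]]:
    """Return (address, connected) for a connection-change line, otherwise None."""
    match = CONNECTED_LINE.search(ANSI_ESCAPE.sub("", line))
    if match is None:
        return None
    return match.group(1), match.group(2) == "yes"


def run_cycles(max_cycles: int, delay_s: float = 0.5) -> int:
    """Hypothetical, untested helper: restart advertising after every disconnect.

    Returns the number of disconnects observed before max_cycles is reached
    or bluetoothctl stops producing output.
    """
    proc = subprocess.Popen(["bluetoothctl"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

    def send(commands: List[str]) -> None:
        for command in commands:
            proc.stdin.write(command + "\n")
            proc.stdin.flush()
            time.sleep(delay_s)  # give bluetoothctl time to process each command

    send(SETUP)
    cycles = 0
    try:
        for line in proc.stdout:
            event = parse_connection_event(line)
            if event is not None and not event[1]:  # a central disconnected
                cycles += 1
                if cycles >= max_cycles:
                    break
                send(RESTART)
    finally:
        try:
            send(["quit"])
        except OSError:
            pass  # bluetoothctl already exited (e.g. broken pipe)
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
    return cycles
```

If the board enters the failure state described in this thread, no "Connected: no" line arrives and the loop blocks waiting for it; that stall is itself the symptom, and dmesg should then show the trace below.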

After 2 to 10 such cycles (the number appears to be random), the AM62 board no longer receives the disconnect event and still considers itself connected.
After about 3 seconds, the following trace is printed to the kernel ring buffer:

[  835.213717] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.219727] ------------[ cut here ]------------
[  835.219761] serial serial0: receive_buf returns -84 (count = 20)
[  835.219965] WARNING: CPU: 0 PID: 27 at ttyport_receive_buf+0xb0/0x100
[  835.220074] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge stp llc bnep rpmsg_ctrl rpmsg_char irq_pruss_intc pru_rproc crct10dif_ce btnxpuart pvrsrvkm(O) ti_k3_r5_remoteproc virtio_rpmsg_bus rpmsg_ns rtc_ti_k3 mwifiex_sdio ti_k3_m4_remoteproc mwifiex ti_k3_common tidss drm_dma_helper sa2ul pruss mcrc snd_soc_davinci_mcasp snd_soc_ti_udma snd_soc_ti_edma snd_soc_ti_sdma pwm_tiehrpwm m_can_platform m_can snd_soc_nau8822 can_dev ina2xx ti_ads1015 industrialio_triggered_buffer kfifo_buf lontium_lt8912b industrialio lm75 tc358768 display_connector overlay drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops spi_omap2_mcspi uio_pdrv_genirq uio cfg80211 libcomposite fuse
[  835.221445] CPU: 0 PID: 27 Comm: kworker/u4:1 Tainted: G           O       6.1.46-6.5.0+git.8e6a2ddd4fe6 #1-TorizonCore
[  835.221496] Hardware name: Toradex Verdin AM62 WB on Verdin Development Board (DT)
[  835.221531] Workqueue: events_unbound flush_to_ldisc
[  835.221599] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  835.221655] pc : ttyport_receive_buf+0xb0/0x100
[  835.221716] lr : ttyport_receive_buf+0xb0/0x100
[  835.221775] sp : ffff800009e6bd40
[  835.221801] x29: ffff800009e6bd40 x28: 0000000000000000 x27: 0000000000000000
[  835.221884] x26: ffff00000ada4cb8 x25: ffff000000019005 x24: ffff000000cb36e0
[  835.221970] x23: ffff00000023ae80 x22: ffff000000cb3708 x21: ffff000000f4b800
[  835.222052] x20: 00000000ffffffac x19: 0000000000000014 x18: ffffffffffffffff
[  835.222133] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800009b7ef46
[  835.222215] x14: 0000000000000001 x13: ffff800009897638 x12: 0000000000000630
[  835.222295] x11: 0000000000000210 x10: ffff800009947638 x9 : ffff800009897638
[  835.222376] x8 : 00000000ffffdfff x7 : ffff800009947638 x6 : 0000000000000000
[  835.222456] x5 : ffff00003fd7eb60 x4 : 0000000000000000 x3 : 0000000000000027
[  835.222534] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00000023ae80
[  835.222617] Call trace:
[  835.222641]  ttyport_receive_buf+0xb0/0x100
[  835.222705]  flush_to_ldisc+0xb8/0x1d0
[  835.222751]  process_one_work+0x1d8/0x44c
[  835.222816]  worker_thread+0x14c/0x444
[  835.222864]  kthread+0x110/0x120
[  835.222909]  ret_from_fork+0x10/0x20
[  835.222963] ---[ end trace 0000000000000000 ]---
[  835.223044] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.257805] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.301071] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.346094] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.352128] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.358035] Bluetooth: hci0: Frame reassembly failed (-84)
[  835.366175] Bluetooth: hci0: Frame reassembly failed (-84)
[  869.179293] Bluetooth: hci0: Frame reassembly failed (-84)
[  869.185063] Bluetooth: hci0: Frame reassembly failed (-84)
[  871.258567] Bluetooth: hci0: command 0x0406 tx timeout

The Bluetooth module remains completely unresponsive afterwards, and restarting the Bluetooth service does not fix it. The only "fix" we have found so far is to power cycle the whole board. It is unclear whether the problem lies in the module or in the driver.
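Two details in the log above can be decoded mechanically. The -84 in "Frame reassembly failed (-84)" and "receive_buf returns -84" is -EILSEQ on Linux (errno 84), and the opcode 0x0406 in the final "tx timeout" line splits into OGF 0x01 / OCF 0x0006, the HCI Disconnect command, so the timeout appears to concern the disconnect itself. The register dump seems consistent with this: x20 holds 0x00000000ffffffac, whose low 32 bits are -84, and x19 holds 0x14 = 20, matching "count = 20". In the mainline H:4 receive helper (h4_recv_buf), -EILSEQ appears to be returned for an unrecognized packet-type byte, which would point to the UART byte stream losing synchronization; this is an interpretation, not a confirmed root cause. The Python sketch below (function names are illustrative only) summarizes saved dmesg output along these lines, with the errno and opcode tables hard-coded to the few values seen here.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Linux errno values relevant here (include/uapi/asm-generic/errno.h).
# Hard-coded instead of using the errno module, because errno numbers differ
# between platforms (EILSEQ is 84 on Linux but not on e.g. macOS).
LINUX_ERRNO = {84: "EILSEQ"}

# HCI opcode = (OGF << 10) | OCF; only the command seen in this log is listed.
HCI_COMMANDS = {(0x01, 0x0006): "HCI_Disconnect"}

REASSEMBLY = re.compile(r"Frame reassembly failed \((-?\d+)\)")
TX_TIMEOUT = re.compile(r"command (0x[0-9a-fA-F]{4}) tx timeout")


def as_int32(register: int) -> int:
    """Interpret the low 32 bits of a register value as a signed integer."""
    value = register & 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def split_opcode(opcode: int) -> Tuple[int, int]:
    """Split a 16-bit HCI opcode into (OGF, OCF)."""
    return opcode >> 10, opcode & 0x03FF


def summarize(dmesg: str) -> Dict[str, object]:
    """Count reassembly errors by errno name and decode tx-timeout opcodes."""
    errors: Counter = Counter()
    timeouts: List[str] = []
    for line in dmesg.splitlines():
        match = REASSEMBLY.search(line)
        if match:
            code = -int(match.group(1))  # the kernel prints negative errno values
            errors[LINUX_ERRNO.get(code, str(code))] += 1
        match = TX_TIMEOUT.search(line)
        if match:
            ogf, ocf = split_opcode(int(match.group(1), 16))
            timeouts.append(HCI_COMMANDS.get((ogf, ocf), f"OGF 0x{ogf:02x} OCF 0x{ocf:04x}"))
    return {"reassembly_errors": dict(errors), "tx_timeouts": timeouts}
```

For the log quoted in this post, summarize() should report ten EILSEQ reassembly failures and one HCI_Disconnect tx timeout.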

A similar issue is described here and here, but in those cases it occurred at board boot-up, not during disconnection handling.

What could be causing this behaviour and how could we mitigate it?
Thanks in advance

Hi @SmartStuff!

Welcome to Toradex Community! :tada: :partying_face:
Feel free to browse around :smiley:

Thanks a lot for such a detailed description of the issue you are facing. The steps to reproduce are really helpful!

As you found on the Linux kernel mailing list, Toradex R&D has run into an issue that seems related to the one you are reporting here.

The investigation of that issue on the Linux kernel mailing list is ongoing.

Please allow us some time to investigate whether the issue you are facing is related to the one under investigation.

We will get back to you as soon as possible.

Best regards,

Hi @SmartStuff !

I just tested this on my side (and @rafael.tx tested on his side).

It turns out we couldn’t reproduce the issue you described in your first message.

Here is what we tested:

Modules & OS tested

  • Verdin AM62 Quad 2GB WB IT V1.1A (Torizon OS 6.5.0+build.8)
  • Verdin AM62 Quad 2GB WB IT V1.1A (TDX Wayland with XWayland 6.6.0-devel-20240125+build.506)
  • Verdin AM62 Dual 1GB WB IT V1.1A (TDX Wayland with XWayland 6.5.0+build.9 (kirkstone))

Carrier boards

  • Verdin Development Board V1.1A
  • Verdin Development Board V1.1B

Devices connected

  • LightBlue app v1.9.9(50) on Android (no Windows PC around :man_shrugging:)

We executed the steps you shared well over 10 times, and the issue never occurred.

Some comments about what we could see (or not see) during the tests:

  1. Verdin AM62 always detected the disconnection from the LightBlue app, and bluetoothctl on Verdin AM62 printed the corresponding disconnection messages
  2. The kernel never showed any WARNING message
  3. Verdin AM62 never failed to connect or disconnect (@Rafael Beims’s phone sometimes failed to connect, though never permanently, and this was possibly not Verdin’s fault)
  4. After disconnecting in LightBlue and before issuing advertise off in bluetoothctl, a strange device called verdin-am6 (yes, it is missing the 2 in AM62) appears in LightBlue’s device list, and the app fails to connect to it. After advertise off, it disappears. After advertise on, Verdin AM62 appears in LightBlue’s device list as expected, with the name set via name <advertised name>.

Regarding item 4: we currently do not know what this verdin-am6 device is. For now, we recommend not connecting to it, as the connection simply fails.

Could you please try connecting to the module with something other than BLEConsole on Windows? It may be that BLEConsole or Windows is misbehaving and causing the connection to fail.

Best regards,

Hello @henrique.tx,

I was also able to reproduce the issue with an Android device using the nRF Connect application.

Our primary use case consists of multiple centrals, usually mobile devices, connecting to a single peripheral device simultaneously.

I created a sample Python script that keeps advertising after a device connects or disconnects. It works for a short period of time; however, if enough attempts are made, the same scenario described by @SmartStuff can be reproduced.

To reproduce the issue, use two Android devices and connect to and disconnect from the Toradex board multiple times, in random order and at random intervals. Note that the Python script might not be notified immediately when a device disconnects, but this should not stop you from reconnecting from the same device before the disconnection message appears in the logs.

https://gist.github.com/giorgib/a74f15e591979036c154fab5520b22c2
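The gist itself is not reproduced here. As a hypothetical aid for the two-phone procedure described above, the Python sketch below pre-generates a seeded, random interleaving of connect/disconnect actions and pauses, so that a run which triggers the failure can be repeated step by step. The device labels, step count and maximum pause are placeholders; the actions are still performed by hand in nRF Connect (or any other central app). Short pauses deliberately allow reconnecting before the disconnection has been logged, as mentioned above.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    at_s: float   # seconds since the start of the test run
    device: str   # which central performs the action (placeholder label)
    action: str   # "connect" or "disconnect"


def make_schedule(devices: List[str], steps: int, max_gap_s: float, seed: int) -> List[Step]:
    """Random interleaving of connect/disconnect actions for several centrals.

    Each device alternates connect -> disconnect, but which device acts next and
    the pause before it are random. The seed makes a failing run repeatable.
    """
    rng = random.Random(seed)
    connected: Dict[str, bool] = {device: False for device in devices}
    elapsed = 0.0
    schedule: List[Step] = []
    for _ in range(steps):
        device = rng.choice(devices)
        action = "disconnect" if connected[device] else "connect"
        connected[device] = not connected[device]
        elapsed += rng.uniform(0.0, max_gap_s)
        schedule.append(Step(round(elapsed, 2), device, action))
    return schedule
```

For example, printing each step of make_schedule(["phone-A", "phone-B"], steps=40, max_gap_s=5.0, seed=7) gives a checklist to follow; reusing the same seed reproduces the same sequence.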

Hi @baska !

Thanks for sharing!

We will perform more tests targeting your use case and let you know here as soon as we have more news.

Best regards,