We are using the kernel's Remote Processor support and RPMsg to manage, and communicate with, the RTOS running on the M4 coprocessor from the Linux side. For the most part, it works well.
Occasionally we get a kernel panic when using Remote Processor to “stop” the coprocessor:
[25016.237134] Unable to handle kernel paging request at virtual address ffff800015b3a002
[25016.245244] Mem abort info:
[25016.248053] ESR = 0x0000000096000007
[25016.251824] EC = 0x25: DABT (current EL), IL = 32 bits
[25016.257140] SET = 0, FnV = 0
[25016.260216] EA = 0, S1PTW = 0
[25016.263363] FSC = 0x07: level 3 translation fault
[25016.268242] Data abort info:
[25016.271147] ISV = 0, ISS = 0x00000007
[25016.274991] CM = 0, WnR = 0
[25016.277960] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000049c2d000
[25016.284680] [ffff800015b3a002] pgd=10000000bffff003, p4d=10000000bffff003, pud=10000000bfffe003, pmd=1000000075692003, pte=0000000000000000
[25016.297273] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[25016.302859] Modules linked in: rpmsg_ctrl rpmsg_char imx_rpmsg_tty xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge stp llc mwifiex_sdio mwifiex bnep overlay cfg80211 mcp251xfd can_dev cm
[25016.356332] CPU: 1 PID: 95780 Comm: python Tainted: G O 5.15.129-6.4.0+git.67c3153d20ff #1-TorizonCore
[25016.366955] Hardware name: Toradex Verdin iMX8M Mini WB on Yavia Board (DT)
[25016.373924] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[25016.380891] pc : virtqueue_get_buf_ctx_split+0x28/0x180
[25016.386132] lr : virtqueue_get_buf+0x30/0x40
[25016.390411] sp : ffff800015db3a80
[25016.393727] x29: ffff800015db3a80 x28: ffff80000a7022a0 x27: 0000000000000007
[25016.400870] x26: ffff0000077dec00 x25: ffff00000e76c0c0 x24: ffff00000709bf00
[25016.408015] x23: 0000000000000007 x22: 0000000000000100 x21: ffff0000014e1f40
[25016.415162] x20: ffff0000014e1f00 x19: ffff000006c3cd00 x18: 0000000000000000
[25016.422306] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffa5db3fb0
[25016.429452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[25016.436596] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800015db3eb0
[25016.443742] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000075c6e40
[25016.450888] x5 : 0000000000000001 x4 : ffff800015db3ae0 x3 : ffff0000014e1f40
[25016.458033] x2 : 0000000000000000 x1 : 00000000000002cf x0 : ffff800015b3a000
[25016.465182] Call trace:
[25016.467631] virtqueue_get_buf_ctx_split+0x28/0x180
[25016.472515] virtqueue_get_buf+0x30/0x40
[25016.476441] rpmsg_send_offchannel_raw+0x44c/0x4f0
[25016.481240] virtio_rpmsg_send+0x28/0x34
[25016.485167] rpmsg_send+0x20/0x40
[25016.488488] rpmsgtty_write+0x54/0xb0 [imx_rpmsg_tty]
[25016.493551] n_tty_write+0x2c0/0x48c
[25016.497134] file_tty_write.constprop.0+0x130/0x294
[25016.502016] tty_write+0x14/0x20
[25016.505248] new_sync_write+0xec/0x18c
[25016.509004] vfs_write+0x24c/0x2b0
[25016.512409] ksys_write+0x6c/0x100
[25016.515817] __arm64_sys_write+0x1c/0x30
[25016.519744] invoke_syscall+0x48/0x114
[25016.523499] el0_svc_common.constprop.0+0xd4/0xfc
[25016.528209] do_el0_svc+0x28/0xa0
[25016.531526] el0_svc+0x28/0x80
[25016.534589] el0t_64_sync_handler+0xa4/0x130
[25016.538863] el0t_64_sync+0x1a0/0x1a4
[25016.542533] Code: 35000700 f9403660 aa0103e4 79409261 (79400400)
[25016.548634] ---[ end trace bc845368ab15e73f ]---
[25016.553257] Kernel panic - not syncing: Oops: Fatal exception
[25016.559009] SMP: stopping secondary CPUs
[25016.563249] Kernel Offset: disabled
[25016.566739] CPU features: 0x0,00002001,20000846
[25016.571276] Memory Limit: none
[25016.574336] Rebooting in 5 seconds..
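To me, this looks like a write on the tty that occurred after the coprocessor shutdown unmapped the memory region used to communicate with it: the fault is a read (WnR = 0) at x0 + 2 inside virtqueue_get_buf_ctx_split, and the pte for that address is zero.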
The immediate answer seems to be “don’t do that”, i.e. we should shut down our communication and close the tty before attempting to stop the coprocessor.
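A minimal sketch of that ordering in Python, under some assumptions: the M4 is exposed as remoteproc0 in sysfs (the index may differ on a given board), tty_fd is the descriptor the application obtained when it opened the RPMsg tty, and stop_coprocessor is a hypothetical helper name used only for illustration.

```python
import os
from pathlib import Path

# Assumed sysfs path: the remoteproc index is not stated in this post
# and may differ on a given board.
REMOTEPROC_STATE = Path("/sys/class/remoteproc/remoteproc0/state")


def stop_coprocessor(tty_fd, state_path=REMOTEPROC_STATE):
    """Close the RPMsg tty first, then ask remoteproc to stop the M4.

    Returns True if a stop was requested, False if the coprocessor
    was not reported as running.
    """
    # Close our end of the tty so this process can no longer issue
    # writes that end up in rpmsg_send() once the vrings are gone.
    if tty_fd is not None:
        os.close(tty_fd)

    # Only request a stop when remoteproc reports the core as running.
    if state_path.read_text().strip() == "running":
        state_path.write_text("stop")
        return True
    return False
```

This only guards against writes from the process that owns tty_fd. Any other process that still holds the tty open could presumably reach the same code path, so it is a workaround rather than a fix.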
Even so, it would be good if the Remote Processor code handled this case gracefully instead of panicking.
Can anybody confirm my suspicions and, if so, suggest how to go about getting a fix patched in?