Loading NXP M7 hello_world via RemoteProc crashes the Linux kernel on iMX8M-Plus / Yavia

I have a 0058 Verdin iMX8M-Plus. Here’s the tdx-info:

Software summary
------------------------------------------------------------
Bootloader:               U-Boot
Kernel version:           5.15.129-6.4.0+git.67c3153d20ff #1-TorizonCore SMP PREEMPT Wed Sep 27 12:30:36 UTC 2023
Kernel command line:      root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.1/torizon/ef4c7d153a661abaee3124825cb354ad214e6fb2fece45f99041e0b2e1be75e5/0
Distro name:              NAME="TorizonCore"
Distro version:           VERSION_ID=6.4.0-build.5
Hostname:                 verdin-imx8mp-15230201
------------------------------------------------------------

Hardware info
------------------------------------------------------------
HW model:                 Toradex Verdin iMX8M Plus WB on Yavia Board
Toradex version:          0058 V1.1B
Serial number:            15230201
Processor arch:           aarch64
------------------------------------------------------------

Here’s the boot log:

U-Boot SPL 2022.04-6.4.0+git.dc27426aa417 (Jan 01 1970 - 00:00:00 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as dual rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888
Download 941056, Total size 941904
NOTICE:  BL31: v2.6(release):lf_v2.6-g3c1583ba0a
NOTICE:  BL31: Built : 00:00:00, Jan  1 1970


U-Boot 2022.04-6.4.0+git.dc27426aa417 (Jan 01 1970 - 00:00:00 +0000)

CPU:   i.MX8MP[8] rev1.1 1600 MHz (running at 1200 MHz)
CPU:   Industrial temperature grade (-40C to 105C) at 40C
Reset cause: POR
DRAM:  4 GiB
Core:  89 devices, 23 uclasses, devicetree: separate
WDT:   Started watchdog@30280000 with servicing (60s timeout)
MMC:   FSL_SDHC: 1, FSL_SDHC: 2
Loading Environment from MMC... OK
In:    serial
Out:   serial
Err:   serial
Model: Toradex 0058 Verdin iMX8M Plus Quad 4GB WB IT V1.1B
Serial#: 15230201
Carrier: Toradex Yavia V1.0A, Serial# 35128035
SEC0:  RNG instantiated

 BuildInfo:
  - ATF 3c1583b

flash target is MMC:2
Net:   eth1: ethernet@30be0000, eth0: ethernet@30bf0000 [PRIME]
Fastboot: Normal
Normal Boot
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc2(part 0) is current device
Scanning mmc 2:1...
Found U-Boot script /boot.scr
973 bytes read in 0 ms
## Executing script at 50280000
6672 bytes read in 1 ms (6.4 MiB/s)
89730 bytes read in 2 ms (42.8 MiB/s)
148 bytes read in 2 ms (72.3 KiB/s)
Applying Overlay: verdin-imx8mp_dsi-to-hdmi_overlay.dtbo
3037 bytes read in 2 ms (1.4 MiB/s)
Applying Overlay: verdin-imx8mp_spidev_overlay.dtbo
433 bytes read in 2 ms (210.9 KiB/s)
Applying Overlay: verdin-imx8mp_hmp_overlay.dtbo
2503 bytes read in 2 ms (1.2 MiB/s)
Applying Overlay: verdin-imx8mp_abc_overlay.dtbo
941 bytes read in 2 ms (459 KiB/s)
13426599 bytes read in 42 ms (304.9 MiB/s)
11574520 bytes read in 37 ms (298.3 MiB/s)
   Uncompressing Kernel Image
## Flattened Device Tree blob at 50200000
   Booting using the fdt blob at 0x50200000
   Loading Device Tree to 00000000ffac1000, end 00000000ffaf9fff ... OK

Starting kernel ...

[    1.057147] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    1.072947] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    1.083375] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    2.363418] regulator-dummy: Underflow of regulator enable count
[    2.589536] [drm:drm_bridge_attach] *ERROR* failed to attach bridge /soc@0/bus@32c00000/mipi_dsi@32e60000 to encoder DSI-34: -517
[    2.601247] imx_sec_dsim_drv 32e60000.mipi_dsi: Failed to attach bridge: 32e60000.mipi_dsi
[    2.609522] imx_sec_dsim_drv 32e60000.mipi_dsi: failed to bind sec dsim bridge: -517
Starting version 250.5+

As you can see, I’m loading in the verdin-imx8mp_hmp_overlay for RemoteProc, and it does appear present in the running system.

The content of verdin-imx8mp_abc_overlay just removes some devices for use by the M7:

/dts-v1/;
/plugin/;

/ {
        compatible = "toradex,verdin-imx8mp";
};

// Removing UART1
&uart1 {
        status = "disabled";
};

// Removing UART2
&uart2 {
        status = "disabled";
};

&iomuxc {
        // Removing GPIO 1-4. I.e. they are normally in this list.
        pinctrl-0 = <&pinctrl_gpio7>, <&pinctrl_gpio8>,
                    <&pinctrl_gpio_hog2>, <&pinctrl_gpio_hog3>, <&pinctrl_gpio_hog4>,
                    <&pinctrl_hdmi_hog>;
};

I follow the documented instructions to download the M7 SDK and it gives me SDK_2.15.000_MIMX8ML8xxxKZ. I get the appropriate ARM cross-compiling gcc toolchain downloaded.

I compile the release version of hello_world found in the NXP SDK with zero modifications to any files. I copy the resulting .elf to the target and launch it:

root@verdin-imx8mp-15230201:/var/rootdirs/home/torizon# echo -n /var/rootdirs/home/torizon > /sys/module/firmware_class/parameters/path
root@verdin-imx8mp-15230201:/var/rootdirs/home/torizon# echo hello_world.elf > /sys/class/remoteproc/remoteproc0/firmware
root@verdin-imx8mp-15230201:/var/rootdirs/home/torizon# echo start > /sys/class/remoteproc/remoteproc0/state

At this point the module is locked up hard and responds to nothing. There is no panic on the console and within 20-30 seconds it appears a hardware watchdog hits and the board reboots.
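For anyone repeating these steps, the three commands can be wrapped in a small function that first checks that the ELF is where the firmware loader will look and that the core reports "offline". This is only a sketch: rproc_start is a made-up helper name, and the two path variables simply default to the sysfs paths used above (they are overridable so the function can be tried against a scratch directory). It does not avoid the lockup described here.

```shell
#!/bin/sh
# Sketch: load and start M7 firmware through the remoteproc sysfs interface.
# RPROC_DIR and FW_PATH_PARAM default to the paths used in the post.
RPROC_DIR="${RPROC_DIR:-/sys/class/remoteproc/remoteproc0}"
FW_PATH_PARAM="${FW_PATH_PARAM:-/sys/module/firmware_class/parameters/path}"

# rproc_start <firmware-dir> <firmware-file>   (hypothetical helper name)
rproc_start() {
    fw_dir=$1
    fw_name=$2

    # Refuse to continue if the ELF is not in the directory we point the loader at.
    if [ ! -f "$fw_dir/$fw_name" ]; then
        echo "firmware not found: $fw_dir/$fw_name" >&2
        return 1
    fi

    # Only start a core that currently reports "offline".
    state=$(cat "$RPROC_DIR/state")
    if [ "$state" != "offline" ]; then
        echo "remote processor is '$state', not starting" >&2
        return 1
    fi

    # Same three writes as the manual procedure.
    printf '%s' "$fw_dir" > "$FW_PATH_PARAM"
    echo "$fw_name" > "$RPROC_DIR/firmware"
    echo start > "$RPROC_DIR/state"
}

# Example (as root on the target):
#   rproc_start /var/rootdirs/home/torizon hello_world.elf
```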

I’ve also done dmesg --follow on the console while starting the M7 and it gets this far before locking up:

[ 1958.028581] remoteproc remotep

I can do this exact same procedure on an iMX8M-Mini and it all works as expected. Another engineer sees the same thing on another iMX8M-Plus after repeating the steps above in his own development environment.

We could use some help figuring out what we are doing wrong…

Ping @gustavo.tx and @alvaro.tx who have been very helpful with these sorts of issues for other folks!

Hi David, it’s been a while since I touched FreeRTOS but hopefully I can help you :slight_smile:

Sorry but I couldn’t see that in the log. Can you run and paste the log?

# dmesg | grep -E "remote|rproc"

I’m not 100% sure (I haven’t used FreeRTOS in a while), but are you using the same ELF for the Mini and the Plus? I’m not sure that is possible: one has an M4 core and the other has an M7, and there might be some other changes.

I’m away from my computer at the moment but will add more logging from dmesg tomorrow.

We definitely are not trying to run M4 binaries on the M7. I mean we are setting up the SDK and compiler toolchain for the M4 and setting up the SDK and compiler toolchain for the M7. Two different installations of the correct tools for the two different Cortex-M processors on the two different boards.

Remote Proc works on the Mini and kills the kernel on the Plus.

Hi @davidkhess
As far as I saw on my project with the iMX8M-Plus, U-Boot has to load a firmware into the M7 core first.
Otherwise remoteproc cannot do it without crashing the core (sooner or later).
If U-Boot has loaded a firmware, remoteproc is able to stop it and reload another firmware (if needed).
I know that Toradex has been working on this, but unless there is news, this is the situation.

Here’s the full dmesg log of a boot up:

dmesg.log (34.5 KB)

And in particular:

torizon@verdin-imx8mp-15230201:~$ dmesg | grep -E "remote|rproc"
[    1.088239] remoteproc remoteproc0: imx-rproc is available
torizon@verdin-imx8mp-15230201:~$ 

Yes, there’s this ominous message at the top of the Plus HMP overlay:

/* Enable RPMSG and the RemoteProc M7 driver.
 * Note: This overlay is working only on nonwifi modules. For more information, please
 * check the Verdin iMX8MP Datasheet section 5.4 Wi-Fi and Bluetooth.
 */

It’s clear the overlay disables UART4 so Bluetooth is not available but otherwise, it’s not really clear what this means nor what to do about it on a WiFi board.

For your solution, was there any particular overlay or uboot magic involved? Or was it as simple as using the documented approach to starting the M7 from uboot and then booting Linux?

Thanks!

I also found this:

(If the link doesn’t quite work right, it’s a section on that page displaying overlay needed to run code on the M7.)

I have not applied the overlay settings in that file - and don’t see anything along those lines in the device-tree repo for the Plus. The HMP overlay does disable UART4 like this does but it doesn’t do any of the rest of what’s linked above.

Ok, so I follow the documentation for loading hello_world.bin into the M7 from uboot:

setenv load_cmd "ext4load mmc 2:1"
setenv m4image "/ostree/deploy/torizon/var/rootdirs/home/torizon/hello_world.bin"
setenv m4image_size 15000                       
setenv loadm4image "${load_cmd} ${loadaddr} ${m4image}"
setenv m4boot "${loadm4image}; cp.b ${loadaddr} 0x7e0000 ${m4image_size}; dcache flush; bootaux 0x7e0000"
run m4boot

That works and I see “hello world.” on UART4.
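A side note on m4image_size: U-Boot memory commands such as cp.b read their numeric arguments as hexadecimal by default, so the 15000 above means 0x15000 bytes. If you would rather copy exactly the size of the binary, a sketch like this on the build host prints the value in the form U-Boot expects (uboot_hex_size is a made-up helper name):

```shell
#!/bin/sh
# uboot_hex_size <file>   (hypothetical helper name)
# Prints the file size in bytes as bare hexadecimal, without a 0x prefix.
uboot_hex_size() {
    # The arithmetic expansion strips any padding some wc implementations add.
    printf '%x\n' "$(( $(wc -c < "$1") ))"
}

# Example (on the build host):
#   uboot_hex_size hello_world.bin
```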

I then run boot in uboot to bring Linux up and it promptly panics in the remote processor device driver code:

Starting kernel ...

[    1.049119] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    1.064748] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    1.074382] clk: failed to reparent hsio_axi to sys_pll2_500m: -16
[    1.088419] Unable to handle kernel paging request at virtual address ffff80000a4ccfff
[    1.096410] Mem abort info:
[    1.099207]   ESR = 0x0000000096000007
[    1.102971]   EC = 0x25: DABT (current EL), IL = 32 bits
[    1.108284]   SET = 0, FnV = 0
[    1.111377]   EA = 0, S1PTW = 0
[    1.114520]   FSC = 0x07: level 3 translation fault
[    1.119398] Data abort info:
[    1.122286]   ISV = 0, ISS = 0x00000007
[    1.126172]   CM = 0, WnR = 0
[    1.129193] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000049c2d000
[    1.135914] [ffff80000a4ccfff] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013fffe003, pmd=100000010019d003, pte=0000000000000000
[    1.148504] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[    1.154079] Modules linked in:
[    1.157136] CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.129-6.4.0+git.67c3153d20ff #1-TorizonCore
[    1.166533] Hardware name: Toradex Verdin iMX8M Plus WB on Yavia Board (DT)
[    1.173495] Workqueue: events_unbound deferred_probe_work_func
[    1.179338] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    1.186303] pc : rproc_handle_resources.constprop.0+0xb8/0x1a4
[    1.192146] lr : rproc_boot+0x45c/0x614
[    1.195987] sp : ffff80000a3e39e0
[    1.199303] x29: ffff80000a3e39e0 x28: ffff80000a4cd003 x27: 00000000ffffffff
[    1.206447] x26: ffff8000098cfe48 x25: 0000000000000000 x24: ffff0000c5693038
[    1.213591] x23: ffffffffffffffff x22: ffff80000a4cd000 x21: 00000000000003fd
[    1.220737] x20: ffff0000c5693000 x19: 0000000000000000 x18: 0000000000000001
[    1.227881] x17: 73203220746e656d x16: 67657320746b6d20 x15: 6c6261745f63441f
[    1.235025] x14: 0000000000000001 x13: 0000000000000000 x12: 0000000000000003
[    1.242171] x11: 0101010101010101 x10: 0000000000000037 x9 : 0000000000000000
[    1.249316] x8 : ffff0000c4bb5380 x7 : 0000000000000000 x6 : 000000000000003f
[    1.256462] x5 : 0000000000000040 x4 : ffff0000c5693400 x3 : 0000000000000000
[    1.263608] x2 : 0000000000000400 x1 : ffff80000a0f2560 x0 : ffff80000a4cd000
[    1.270755] Call trace:
[    1.273203]  rproc_handle_resources.constprop.0+0xb8/0x1a4
[    1.278693]  rproc_boot+0x45c/0x614
[    1.282189]  rproc_add+0xd0/0x170
[    1.285508]  imx_rproc_probe+0x510/0x710
[    1.289435]  platform_probe+0x68/0xe0
[    1.293100]  really_probe+0xbc/0x46c
[    1.296683]  __driver_probe_device+0x100/0x160
[    1.301132]  driver_probe_device+0x40/0x120
[    1.305319]  __device_attach_driver+0xbc/0x160
[    1.309766]  bus_for_each_drv+0x7c/0xdc
[    1.313607]  __device_attach+0xac/0x1f0
[    1.317446]  device_initial_probe+0x14/0x20
[    1.321633]  bus_probe_device+0x9c/0xa4
[    1.325479]  deferred_probe_work_func+0x94/0xe4
[    1.330020]  process_one_work+0x1d4/0x4a0
[    1.330548] imx6q-pcie 33800000.pcie: iATU unroll: enabled
[    1.334033]  worker_thread+0x2c0/0x490
[    1.339520] imx6q-pcie 33800000.pcie: Detected iATU regions: 4 outbound, 4 inbound
[    1.343267]  kthread+0x150/0x160
[    1.343274]  ret_from_fork+0x10/0x20
[    1.350873] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
[    1.354072] Code: aa1a03e2 f94037e0 aa1803e1 97d0da62 (b8776ac1) 
[    1.357668] imx6q-pcie 33800000.pcie:       IO 0x001ff80000..0x001ff8ffff -> 0x0000000000
[    1.364866] ---[ end trace 973d3c1ce191acef ]---
[    1.364870] Kernel panic - not syncing: Oops: Fatal exception
[    1.370990] imx6q-pcie 33800000.pcie:      MEM 0x0018000000..0x001fefffff -> 0x0018000000
[    1.379137] SMP: stopping secondary CPUs
[    1.383894] imx6q-pcie 33800000.pcie: iATU unroll: enabled
[    1.389800] Kernel Offset: disabled
[    1.389802] CPU features: 0x0,00002001,20000846
[    1.389806] Memory Limit: none
[    1.418455] Rebooting in 5 seconds..

OK, I have some success to report! I didn’t realize it, but hello_world is not a good example app to try if you enable the HMP overlay. It isn’t built with the things the Remote Proc driver looks for (judging by the rproc_handle_resources frame in the trace above, presumably a resource table), and the driver panics when it doesn’t find them.
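One way to check a firmware ELF before trying it, assuming the missing piece really is the resource table: the Linux remoteproc ELF loader looks for a section named .resource_table, so listing the section headers shows whether the demo was built with one. A rough sketch (has_resource_table is a made-up helper name, and READELF is just a variable I introduced so the toolchain's arm-none-eabi-readelf can be used instead of the host readelf):

```shell
#!/bin/sh
# has_resource_table <elf>   (hypothetical helper name)
# Succeeds if the section header listing contains a section named .resource_table.
has_resource_table() {
    # -S lists section headers, -W keeps each header on one line.
    "${READELF:-readelf}" -S -W "$1" |
        grep -q '[[:space:]]\.resource_table[[:space:]]'
}

# Example:
#   has_resource_table hello_world.elf || echo "no resource table section"
```

This only inspects the ELF file. It says nothing about what sits in memory when U-Boot has already started the core.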

Having figured that out, and thanks to the hint from @vix, I switched over to the rpmsg_lite_pingpong_rtos_linux_remote demo and built that instead. It still causes the Linux kernel to lock up if you attempt to load it into the M7 after Linux is up, as @vix indicated.

But if you load it in from uboot first, you can manage the M7 successfully via Remote Proc afterwards once Linux is booted and running.

Thank you @vix !

Hi @davidkhess ,

We’re currently addressing a known issue with loading Cortex-M7 firmware on iMX8MP from Linux using remoteproc. In addition to loading the firmware from U-boot using bootaux, as suggested by @vix , you can also resolve the issue by adding clk-imx8mp.mcore_booted=1 to your kernel command line parameters in U-boot. This parameter is specific to the imx8mp Linux clock driver and prevents Linux from disabling the root clock of the Cortex M7, which is causing the problem.
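To confirm the parameter actually reached the running kernel, you can look at /proc/cmdline. A minimal sketch (has_mcore_booted is a made-up helper name; the optional argument exists only so the check can be run against a saved copy of the command line):

```shell
#!/bin/sh
# has_mcore_booted [cmdline-file]   (hypothetical helper name)
# Succeeds if the kernel command line contains clk-imx8mp.mcore_booted=1.
has_mcore_booted() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each argument on its own line, then require an exact fixed-string match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF 'clk-imx8mp.mcore_booted=1'
}

# Example (on the target):
#   has_mcore_booted && echo "parameter is set" || echo "parameter is missing"
```

Note that this is an exact string match. The kernel treats '-' and '_' in parameter names as equivalent, so a command line that spells it clk_imx8mp.mcore_booted=1 would not be caught by this check.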

@vix, maybe this will help you too. We just figured this out recently.

Regards,
João Paulo S. Goncalves

@joao.tx - I can confirm that kernel command line parameter worked - no more lockups.

Thank you!

@davidkhess
do you mean that adding the kernel parameter is enough (i.e., it is no longer necessary to have U-Boot load a firmware into the M7)?

Yes. I added that kernel param to our image via the TorizonCore Builder Tool (which apparently builds and compiles a device tree overlay that sets the additional kernel parameters in the chosen node) and Remote Proc works correctly from the start. Loading something from U-Boot before Linux comes up is not required.

Note, I’m on Torizon 6.4.0 so that might factor into it.