kexec freezes

Hi, this is a follow-up question, but since it doesn't really fit the last topic, I started a new one.

I am trying to restart the Linux kernel without interrupting the safety-critical app on the M4. Watchdogs and resets on the iMX7 reset the whole SoC, so now I am trying to reboot via kexec, as suggested in SW reset via SRC_A7RCR0 - Toradex Community.

However, if the M4 is not running, the reboot hangs a couple of seconds into booting the new kernel; if an app is running on the M4, the new kernel won’t even start to boot after the shutdown.

To reproduce the issue, here is what I did.

I used the content of /proc/cmdline for *--append*, so the kernel command line is no different from a normal boot.

root@colibri-imx7-emmc:~# kexec \
    --load /media/mmcblk0p1/zImage \
    --type=zImage \
    --dtb=/media/mmcblk0p1/imx7d-colibri-emmc-eval-v3.dtb \
    --append='ip=off root=/dev/mmcblk0p2 ro rootfstype=ext4 rootwait console=tty1 console=ttymxc0,115200n8 consoleblank=0 video=mxsfb:640x480M-16@60'

Followed by kexec -e, which shuts down the current kernel.
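
For anyone repeating this, the load command above can be assembled from the running kernel's command line instead of copying it by hand. This is only a minimal sketch: `build_kexec_load` is a helper name made up for illustration, the paths are the ones from this post, and the function prints the command for review rather than running it.

```shell
#!/bin/sh
# Sketch: print a "kexec --load" command that reuses an existing kernel
# command line. build_kexec_load is a hypothetical helper, not part of
# kexec-tools.
#   $1 = kernel image, $2 = device tree blob, $3 = file holding the command line
build_kexec_load() {
    kernel="$1"
    dtb="$2"
    # The command line file (normally /proc/cmdline) is a single line.
    cmdline="$(cat "$3")"
    printf "kexec --load %s --type=zImage --dtb=%s --append='%s'\n" \
        "$kernel" "$dtb" "$cmdline"
}

# On the target, print the command for the paths used in this post.
if [ -r /proc/cmdline ]; then
    build_kexec_load /media/mmcblk0p1/zImage \
        /media/mmcblk0p1/imx7d-colibri-emmc-eval-v3.dtb /proc/cmdline
fi
```

After checking the printed command, it can be pasted into the shell and followed by kexec -e as described here.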

After the shutdown, the behaviour differs depending on whether the auxiliary M4 core was started or not:

**with M4 running:** freeze after successful shutdown
kexec_core: Starting new kernel
Disabling non-boot CPUs ...
CPU1: shutdown
Bye!

**without M4 running:** freeze early in boot, at varying outputs, at around 1.9 seconds

[    1.921539] nf_tables: (c) 2007-2009 Patrick McHardy <kaber@trash.net>
[    1.921792] ip_tables: (C) 2000-2006 Netfilter Core Team
[    1.922322] NET: Registered protocol family 10
[    1.923240] ip6_tables: (C) 2000-2006 Netfilter Core Team

NOTE: it doesn’t freeze at the same output every time, but always at around 1.92 seconds.

One warning occurred:

[    0.103000] ------------[ cut here ]------------
[    0.103037] WARNING: CPU: 0 PID: 1 at /kernel-source//arch/arm/mm/ioremap.c:303 __arm_ioremap_pfn_caller+0xd8/0x1b4
[    0.103070] Modules linked in:
[    0.103097] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.166-2.8.6+g011d796 #6
[    0.103127] Hardware name: Freescale i.MX7 Dual (Device Tree)
[    0.103167] [<8010e6ac>] (unwind_backtrace) from [<8010ade0>] (show_stack+0x10/0x14)
[    0.103207] [<8010ade0>] (show_stack) from [<8041094c>] (dump_stack+0x88/0x9c)
[    0.103247] [<8041094c>] (dump_stack) from [<801240b0>] (__warn+0xe8/0x100)
[    0.103276] [<801240b0>] (__warn) from [<80124178>] (warn_slowpath_null+0x20/0x28)
[    0.103314] [<80124178>] (warn_slowpath_null) from [<801148dc>] (__arm_ioremap_pfn_caller+0xd8/0x1b4)
[    0.103353] [<801148dc>] (__arm_ioremap_pfn_caller) from [<80114a04>] (__arm_ioremap_caller+0x4c/0x54)
[    0.103394] [<80114a04>] (__arm_ioremap_caller) from [<80610424>] (imx_rpmsg_find_vqs+0xc0/0x218)
[    0.103434] [<80610424>] (imx_rpmsg_find_vqs) from [<8060fe90>] (rpmsg_probe+0xb4/0x420)
[    0.103473] [<8060fe90>] (rpmsg_probe) from [<8047e274>] (virtio_dev_probe+0x204/0x2e4)
[    0.103514] [<8047e274>] (virtio_dev_probe) from [<804b49ec>] (driver_probe_device+0x1e8/0x2b4)
[    0.103553] [<804b49ec>] (driver_probe_device) from [<804b2fc4>] (bus_for_each_drv+0x44/0x94)
[    0.103592] [<804b2fc4>] (bus_for_each_drv) from [<804b4724>] (__device_attach+0xb0/0x114)
[    0.103630] [<804b4724>] (__device_attach) from [<804b3dbc>] (bus_probe_device+0x84/0x8c)
[    0.103667] [<804b3dbc>] (bus_probe_device) from [<804b20ec>] (device_add+0x3ac/0x59c)
[    0.103705] [<804b20ec>] (device_add) from [<8047def0>] (register_virtio_device+0xdc/0xf8)
[    0.103743] [<8047def0>] (register_virtio_device) from [<80610a40>] (imx_rpmsg_probe+0x368/0x4b0)
[    0.103783] [<80610a40>] (imx_rpmsg_probe) from [<804b62e0>] (platform_drv_probe+0x50/0xac)
[    0.103823] [<804b62e0>] (platform_drv_probe) from [<804b49ec>] (driver_probe_device+0x1e8/0x2b4)
[    0.103861] [<804b49ec>] (driver_probe_device) from [<804b4b5c>] (__driver_attach+0xa4/0xa8)
[    0.103899] [<804b4b5c>] (__driver_attach) from [<804b2f1c>] (bus_for_each_dev+0x4c/0x9c)
[    0.103937] [<804b2f1c>] (bus_for_each_dev) from [<804b405c>] (bus_add_driver+0x188/0x20c)
[    0.103975] [<804b405c>] (bus_add_driver) from [<804b5408>] (driver_register+0x78/0xf4)
[    0.104014] [<804b5408>] (driver_register) from [<80b2c70c>] (imx_rpmsg_init+0x14/0x34)
[    0.104052] [<80b2c70c>] (imx_rpmsg_init) from [<80101790>] (do_one_initcall+0x44/0x170)
[    0.104092] [<80101790>] (do_one_initcall) from [<80b00d90>] (kernel_init_freeable+0x15c/0x1e8)
[    0.104133] [<80b00d90>] (kernel_init_freeable) from [<807a7af8>] (kernel_init+0x8/0x110)
[    0.104172] [<807a7af8>] (kernel_init) from [<80107650>] (ret_from_fork+0x14/0x24)
[    0.104212] ---[ end trace ea35a24a4a549cbe ]---

Any suggestions as to what might cause the different behaviours, and how to avoid the freezes?

Disabling CONFIG_MXC_PXP_V3 solved both issues!

Thank you very much!

Perfect, I’m glad we already found it!

If you need that Pixel Pipeline stuff from NXP, you can fix the bug in the driver with the following patch:
git.toradex.com/cgit/linux-toradex.git/commit/?h=toradex_4.14-2.0.x-imx-next&id=f7f6322469c630eecc72151a256771b09a1e989c

Best regards, Philippe

Hi @tik

I fixed an issue that has to do with that not long ago. See this link:

Could you please first check whether disabling CONFIG_MXC_PXP_V3 already solves your issue?
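
To see whether the option is set in the kernel that is currently running, something like the following might help. It is a rough sketch: `kconfig_state` is a hypothetical helper, and reading /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case on every image. Otherwise the same function can be fed the .config from the kernel build tree.

```shell
#!/bin/sh
# Sketch: report how a Kconfig option is set in a kernel config read from stdin.
# kconfig_state is a hypothetical helper name.
kconfig_state() {
    option="$1"
    # "CONFIG_FOO=y" or "CONFIG_FOO=m" means enabled; a line such as
    # "# CONFIG_FOO is not set", or no line at all, means disabled.
    state="$(sed -n "s/^${option}=\(.*\)\$/\1/p" | head -n 1)"
    if [ -n "$state" ]; then
        echo "$option is enabled ($state)"
    else
        echo "$option is not enabled"
    fi
}

# Assumption: /proc/config.gz only exists if CONFIG_IKCONFIG_PROC is enabled.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | kconfig_state CONFIG_MXC_PXP_V3
fi
```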

Some thoughts from my side on your questions:

I used the content of /proc/cmdline for --append. So Kernel command line will be no different to normal boot.
You have an eMMC module, so no worries, that will work. But be aware that with a NAND module, U-Boot passes the NAND structure (the partition layout) to the kernel on boot. If you use kexec with a NAND module, you have to add the NAND structure yourself with the kernel parameter mtdparts.
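
For the NAND case, a small helper along these lines could add the parameter to the string passed to --append. It is only a sketch: `with_mtdparts` is a made-up name, and the mtdparts value itself is not shown here because it has to match the layout U-Boot would normally pass for your module.

```shell
#!/bin/sh
# Sketch: append an mtdparts= parameter to a kernel command line unless one
# is already present. with_mtdparts is a hypothetical helper name.
#   $1 = existing command line, $2 = full "mtdparts=..." string for the module
with_mtdparts() {
    cmdline="$1"
    mtdparts="$2"
    # Pad with spaces so a parameter at the very start is matched as well.
    case " $cmdline " in
        *" mtdparts="*) printf '%s\n' "$cmdline" ;;
        *)              printf '%s %s\n' "$cmdline" "$mtdparts" ;;
    esac
}
```

The result would then be used as the --append value in the kexec --load call.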

with M4 running: freeze after successful shutdown

U-Boot carves out some memory for RPMSG to work. I guess that could be the reason: on a boot with kexec, the kernel is not aware that there is another guy writing to a certain memory region. See this commit

without M4 running: freeze immediately at different outputs at around 1.9 seconds

That looks familiar to me. I guess that could be a hint that it’s the CONFIG_MXC_PXP_V3 bug.

Best regards,
Philippe