Hello
I am attempting to get kernel crash dumps working on the Colibri iMX7. I’m using Toradex BSP 2.7, with some slight modifications to the kernel and rootfs configuration. I’ve been unable to get crash dumps working, and I’m hoping someone can point me to what I’m doing wrong.
The mechanism for kernel crash dumps is that a second “dump kernel” starts when the first kernel crashes and collects state information about the crashed kernel. However, I’m seeing the dump kernel hang as soon as it starts. From the digging I’ve done, the hang seems to happen as the dump kernel jumps to the start_kernel routine in init/main.c. The new kernel executes the assembly leading up to the jump to __enter_kernel in arch/arm/boot/compressed/head.S, but after it jumps to the entry point I see no response from it. I see the same behavior both when I induce a crash and have a crash kernel take over (kexec -p) and when I load the dump kernel using ‘kexec -l’ and execute it without a crash using ‘kexec -e’.
Is there any documentation on getting kernel crash dumps working on the Colibri iMX7, or is anyone able to diagnose what I’m doing incorrectly? Information on this feature seems pretty sparse on the internet, and what information there is often conflicts.
Here is what I am doing to reproduce.
1) Sync the Yocto Project setup used to build the BSP: “repo init -u Index of /toradex-bsp-platform.git -b LinuxImageV2.7”, and “repo sync”.
2) Make the following modifications to the recipe files:
2.1) Add the following to layers/meta-toradex-nxp/recipes-kernel/linux/linux-toradex-4.1-2.0.x/defconfig. I also modified linux-toradex-4.1-2.0.x/mx7/defconfig and several other kernel defconfigs, as I was unsure which one would be picked up. Confirm that these options are enabled in your build by checking build/tmp-glibc/work-shared/colibri-imx7/kernel-build-artifacts/.config.
CONFIG_KEXEC=y
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_DEBUG_LL=y
CONFIG_EARLY_PRINTK=y
CONFIG_MAGIC_SYSRQ=y
2.2) Add the corresponding userspace utilities to your rootfs by modifying layers/openembedded-core/meta/recipes-core/images/core-image-minimal.bb and adding the following:
IMAGE_INSTALL_append = " kexec-tools makedumpfile"
2.3) Change your machine to colibri-imx7 in local.conf
3) Build the image, kernel, and bootloader: bitbake core-image-minimal linux-toradex u-boot-toradex
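Once the build finishes, the .config check from step 2.1 can be scripted. The following is only a minimal sketch, assuming the build directory layout given in step 2.1; check_kconfig is a hypothetical helper name, not something provided by the BSP.

```shell
#!/bin/sh
# Hypothetical helper: report which options are set to "y" in a kernel .config.
# Usage: check_kconfig <path-to-.config> OPTION...
# Returns 0 if every option is enabled, 1 otherwise.
check_kconfig() {
    config_file=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Path taken from step 2.1; adjust it if your build directory differs.
CONFIG=build/tmp-glibc/work-shared/colibri-imx7/kernel-build-artifacts/.config
if [ -f "$CONFIG" ]; then
    check_kconfig "$CONFIG" CONFIG_KEXEC CONFIG_SYSFS CONFIG_DEBUG_INFO \
        CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_DEBUG_LL \
        CONFIG_EARLY_PRINTK CONFIG_MAGIC_SYSRQ \
        || echo "at least one option is not enabled"
fi
```

Run it from the top of the build tree. Any line starting with MISSING points to an option that did not make it into the final kernel configuration, for example because a different defconfig was picked up.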
4) Add the resulting zImage and zImage-imx7d-colibri-aster.dtb to the top level (/) of your root filesystem. These are the binaries that will be used as the dump kernel; they are identical copies of the versions in NAND that are used to boot the system. Do this by mounting core-image-minimal-colibri-imx7.ubifs, copying zImage and the dtb to the root of that mount point, and saving the result as a new UBIFS filesystem.
Side note: I am doing it this way because the kernel and dtb are stored as raw binaries spread across a UBI backing in our system. They are not on a UBIFS that can be mounted at runtime, and hence are not accessible to the kexec command. I believe providing copies of the binaries in this way should work.
5) Flash zImage, u-boot-toradex.imx, and the new filesystem you created in step 4 to the device.
6) Boot into U-Boot. If you want to reproduce using the method of inducing a kernel panic, you will have to reserve a section of memory for the dump kernel. Do this by interrupting the boot (press Enter) and adding the following argument to the ‘bootargs’ U-Boot environment variable:
crashkernel=128M@2058M
This will reserve a 128M block of memory in a place that doesn’t overlap with the reserved kernel memory (past the 0x80000000 start of DDR). You can verify it worked from the running system by issuing ‘cat /proc/iomem’ and noting the reserved block of memory for crash kernel.
This isn’t necessary if you want to reproduce using ‘kexec -l’ followed by ‘kexec -e’, though I’m not totally certain the two hangs are caused by the same problem.
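To make the reservation easier to compare against /proc/iomem, here is a small sketch that converts the crashkernel=128M@2058M values into byte addresses. mib_to_hex is a hypothetical helper name, and the sketch assumes a shell with 64-bit arithmetic. The /proc/iomem and /sys/kernel/kexec_crash_size checks are only meaningful on the target itself.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in MiB to a hexadecimal byte value.
mib_to_hex() {
    printf '0x%08X\n' $(( $1 * 1024 * 1024 ))
}

# Values taken from crashkernel=128M@2058M.
start=$(mib_to_hex 2058)
size=$(mib_to_hex 128)
end=$(printf '0x%08X' $(( (2058 + 128) * 1024 * 1024 - 1 )))
echo "expected crash region: $start - $end (size $size)"

# On the target, the reservation shows up as a "Crash kernel" entry.
if [ -r /proc/iomem ]; then
    grep -i 'crash kernel' /proc/iomem || echo "no crash kernel region listed"
fi

# The reserved size in bytes is also exposed through sysfs when kexec is enabled.
if [ -r /sys/kernel/kexec_crash_size ]; then
    echo "kexec_crash_size: $(cat /sys/kernel/kexec_crash_size)"
fi
```

With these values the region starts 10 MiB above the 0x80000000 start of DDR, so the range printed here should match the “Crash kernel” line reported by the running system.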
7) You are now in Linux. You’ll see a bunch of init output go by. Wait until you are at a root prompt, then issue the following:
export DUMPK_CMDLINE="1 console=tty1 console=ttymxc0,115200n8 consoleblank=0 root=ubi0:rootfs rootfstype=ubifs rootwait init=/sbin/init maxcpus=1 reset_devices"
kexec --type zImage -l /zImage --dtb=/zImage-imx7d-colibri-aster.dtb --append=${DUMPK_CMDLINE}
kexec -e
[ 120.679519] kexec: Starting new kernel
[ 120.685639] Disabling non-boot CPUs ...
[ 120.721259] CPU1: shutdown
[ 120.751820] Bye!
Uncompressing Linux... done, booting the kernel.
This is the last thing I see. I expect to see the new kernel doing its early init, but it appears to hang instead. If you wish to reproduce using the kernel panic method, ensure the ‘crashkernel’ argument is on your kernel command line as described above, and issue the following:
# export DUMPK_CMDLINE="1 console=tty1 console=ttymxc0,115200n8 consoleblank=0 root=ubi0:rootfs rootfstype=ubifs rootwait init=/sbin/init maxcpus=1 reset_devices"
# kexec --type zImage -p /zImage --dtb=/zImage-imx7d-colibri-aster.dtb --append=${DUMPK_CMDLINE}
# echo c > /proc/sysrq-trigger
[ 87.302114] sysrq: SysRq : Trigger a crash
[ 87.310430] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 87.323020] pgd = 94840000
[ 87.327862] [00000000] *pgd=94885831, *pte=00000000, *ppte=00000000
[ 87.336382] Internal error: Oops: 817 [#1] SMP ARM
[ 87.343282] Modules linked in:
[ 87.348419] CPU: 0 PID: 233 Comm: sh Not tainted 4.1.44-2.7.6+g18717e2b1ca9 #1
[ 87.359822] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 87.367712] task: 94310a80 ti: 9487e000 task.ti: 9487e000
[ 87.375251] PC is at sysrq_handle_crash+0x48/0x50
[ 87.382072] LR is at __handle_sysrq+0x120/0x174
[ 87.388661] pc : [<80346b7c>] lr : [<803473d8>] psr: 60080013
[ 87.388661] sp : 9487feb0 ip : 00000000 fp : 7eeca668
[ 87.404273] r10: 00000000 r9 : 00000002 r8 : 00000000
[ 87.411540] r7 : 808dbfcc r6 : 00000007 r5 : 00000063 r4 : 808c5ba8
[ 87.420079] r3 : 00000000 r2 : 00000001 r1 : 97b8031c r0 : 00000063
[ 87.428585] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 87.437722] Control: 10c5387d Table: 9484006a DAC: 00000015
[ 87.445434] Process sh (pid: 233, stack limit = 0x9487e210)
[ 87.452975] Stack: (0x9487feb0 to 0x94880000)
[ 87.459270] fea0: 00000002 00000000 00000000 94327b80
[ 87.471280] fec0: 00b37d88 00000002 00000000 80347818 803477e0 8013f8d0 9474ba80 8013f874
[ 87.483317] fee0: 9487ff80 00000002 00b37d88 800ea4b0 9400ac00 00000001 808bb880 9400ad88
[ 87.495413] ff00: 00000001 00000000 00000000 800ec9ac 00000020 00000301 00000020 000000bc
[ 87.507649] ff20: 00000000 00000000 94310a78 9487ff60 94310a80 9474ba80 00b37d88 9474ba80
[ 87.520016] ff40: 00b37d88 9487ff80 00000002 00b37d88 00000002 800ead28 00000000 800300dc
[ 87.532421] ff60: 00000003 9474ba80 9474ba80 00000000 00000000 00b37d88 00000002 800eb5e4
[ 87.544945] ff80: 00000000 00000000 00000200 00070878 00000002 00b37d88 00000004 8000f4c4
[ 87.557637] ffa0: 9487e000 8000f340 00070878 00000002 00000001 00b37d88 00000002 00070878
[ 87.570477] ffc0: 00070878 00000002 00b37d88 00000004 00000020 00b37d88 00000000 7eeca668
[ 87.583527] ffe0: 00000000 7eeca434 0000dcb1 76e997f0 40080010 00000001 97fbe821 97fbec21
[ 87.596668] [<80346b7c>] (sysrq_handle_crash) from [<00000000>] ( (null))
[ 87.606105] Code: e5c32000 e8bd8010 e3a03000 e3a02001 (e5c32000)
[ 87.614787] CPU 1 will stop doing anything useful since another CPU has crashed
[ 87.627964] Loading crashdump kernel...
[ 87.634395] Bye!
Uncompressing Linux... done, booting the kernel.
Again, that is the last thing I see.
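One detail that may be worth ruling out, though I can’t say whether it is related to the hang: both kexec invocations above pass ${DUMPK_CMDLINE} unquoted, so the shell splits the value on whitespace before kexec sees it. The sketch below only counts the resulting arguments. count_args is a hypothetical helper, and the sketch assumes a POSIX-style shell (sh, bash, or busybox ash).

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

DUMPK_CMDLINE="1 console=tty1 console=ttymxc0,115200n8 consoleblank=0 root=ubi0:rootfs rootfstype=ubifs rootwait init=/sbin/init maxcpus=1 reset_devices"

# Unquoted: the value is split on whitespace, so only "1" stays attached
# to --append= and the remaining words become separate arguments.
unquoted=$(count_args --append=${DUMPK_CMDLINE})

# Quoted: the whole command line is passed as a single --append= argument.
quoted=$(count_args --append="${DUMPK_CMDLINE}")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

If the dump kernel is expected to receive the full command line, writing --append="${DUMPK_CMDLINE}" in the kexec calls keeps it in one piece.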
Can anyone offer any pointers on how to correctly configure the system for crash dumps?
Thanks in advance,
Conor