Colibri iMX7D 1GB issue with V1.1B - not booting

Hi,

I am working on a custom carrier board using the following setup:

Colibri iMX7D 1 GB V1.1B (or V1.1A)
Kernel 5.4.193-5.7.3-devel+git.f5d73fd6e9f8
Based on Toradex BSP Layers and Reference Images for Yocto Project 5.7.3

I am seeing sporadic failures when booting the Linux kernel on several Colibri iMX7D V1.1B modules. It looks very much as if the root partition on the eMMC flash cannot be found (see the log below).
To reproduce the problem, I trigger repeated reboots from a systemd timer service that calls ‘sudo reboot’ shortly after boot.
The failure sometimes occurs on the 2nd reboot, and sometimes only after, e.g., the 30th.

Here is the log containing the kernel panic:

U-Boot 2020.07-5.7.3-devel+git.7683835c191e (Jun 14 2023 - 09:05:20 +0000)

CPU:   Freescale i.MX7D rev1.3 1000 MHz (running at 792 MHz)
CPU:   Extended Commercial temperature grade (-20C to 105C) at 39C
Reset cause: POR
DRAM:  1 GiB
PMIC:  RN5T567 LSIVER=0x01 OTPVER=0x0d
MMC:   FSL_SDHC: 1, FSL_SDHC: 0
Loading Environment from MMC... OK
In:    serial
Out:   serial
Err:   serial
Model: Toradex Colibri iMX7 Dual 1GB (eMMC) V1.1B, Serial# 07339347
SEC0: RNG instantiated
Net:   eth0: ethernet@30be0000
Hit any key to stop autoboot:  0
7799296 bytes read in 184 ms (40.4 MiB/s)
53577 bytes read in 17 ms (3 MiB/s)
Kernel image @ 0x80800000 [ 0x000000 - 0x770200 ]
## Flattened Device Tree blob at 82000000
   Booting using the fdt blob at 0x82000000
   Loading Device Tree to 8ffef000, end 8ffff148 ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.4.193-5.7.3-devel+git.f5d73fd6e9f8 (oe-user@oe-host) (gcc version 9.5.0 (GCC)) #1 SMP Fri Jun 24 10:15:32 UTC 2022
[    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Toradex Colibri iMX7D 1GB (eMMC) on Getinge Connectivity Board
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 64 MiB at 0xbc000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 20 pages/cpu s51020 r8192 d22708 u81920
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 260416
[    0.000000] Kernel command line: root=PARTUUID=24e246a8-02 rw console=ttymxc0,115200n8
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 956128K/1048576K available (10240K kernel code, 783K rwdata, 3600K rodata, 1024K init, 423K bss, 26912K reserved, 65536K cma-reserved, 196608K highmem)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] random: get_random_bytes called from start_kernel+0x314/0x4f0 with crng_init=0
[    0.000000] arch_timer: cp15 timer(s) running at 8.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 440795202120 ns
[    0.000007] sched_clock: 56 bits at 8MHz, resolution 125ns, wraps every 2199023255500ns
[    0.000020] Switching to timer-based delay loop, resolution 125ns
[    0.000499] Switching to timer-based delay loop, resolution 41ns
[    0.000516] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
[    0.000531] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635851949 ns
[    0.001929] Console: colour dummy device 80x30
[    0.001970] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000)
[    0.001985] pid_max: default: 32768 minimum: 301
[    0.002185] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.002203] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.003237] CPU: Testing write buffer coherency: ok
[    0.004293] Setting up static identity map for 0x80100000 - 0x80100060
[    0.004440] rcu: Hierarchical SRCU implementation.
[    0.007768] EFI services will not be available.
[    0.008059] smp: Bringing up secondary CPUs ...
[    0.009087] smp: Brought up 1 node, 2 CPUs
[    0.009100] SMP: Total of 2 processors activated (96.00 BogoMIPS).
[    0.009108] CPU: All CPU(s) started in HYP mode.
[    0.009114] CPU: Virtualization extensions available.
[    0.009735] devtmpfs: initialized
[    0.020038] VFP support v0.3: implementor 41 architecture 2 part 30 variant 7 rev 5
[    0.020307] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.020331] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.022911] pinctrl core: initialized pinctrl subsystem
[    0.024169] NET: Registered protocol family 16
[    0.030782] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.032225] cpuidle: using governor ladder
[    0.032262] cpuidle: using governor menu
[    0.032871] No ATAGs?
[    0.032993] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
[    0.033007] hw-breakpoint: maximum watchpoint size is 8 bytes.
[    0.033329] Serial: AMBA PL011 UART driver
[    0.038900] imx7d-pinctrl 302c0000.iomuxc-lpsr: initialized IMX pinctrl driver
[    0.039592] debugfs: Directory 'dummy-iomuxc-gpr@30340000' with parent 'regmap' already present!
[    0.040016] imx7d-pinctrl 30330000.iomuxc: initialized IMX pinctrl driver
[    0.041741] vdd1p0d: Bringing 0uV into 800000-800000uV
[    0.042422] vdd1p2: supplied by regulator-dummy
[    0.059348] cryptd: max_cpu_qlen set to 1000
[    0.077587] mxs-dma 33000000.dma-apbh: initialized
[    0.079053] vgaarb: loaded
[    0.079654] SCSI subsystem initialized
[    0.080265] usbcore: registered new interface driver usbfs
[    0.080329] usbcore: registered new interface driver hub
[    0.080382] usbcore: registered new device driver usb
[    0.080549] usb_phy_generic usbphynop1: usbphynop1 supply vcc not found, using dummy regulator
[    0.080787] usb_phy_generic usbphynop3: usbphynop3 supply vcc not found, using dummy regulator
[    0.081071] usb_phy_generic usbphynop2: usbphynop2 supply vcc not found, using dummy regulator
[    0.082190] i2c i2c-0: IMX I2C adapter registered
[    0.083095] i2c i2c-3: IMX I2C adapter registered
[    0.083346] pps_core: LinuxPPS API ver. 1 registered
[    0.083356] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.083382] PTP clock support registered
[    0.084256] Bluetooth: Core ver 2.22
[    0.084317] NET: Registered protocol family 31
[    0.084325] Bluetooth: HCI device and connection manager initialized
[    0.084340] Bluetooth: HCI socket layer initialized
[    0.084352] Bluetooth: L2CAP socket layer initialized
[    0.084376] Bluetooth: SCO socket layer initialized
[    0.084971] clocksource: Switched to clocksource arch_sys_counter
[    0.655163] VFS: Disk quotas dquot_6.6.0
[    0.655267] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.665550] thermal_sys: Registered thermal governor 'step_wise'
[    0.665936] NET: Registered protocol family 2
[    0.666156] IP idents hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.667417] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.667457] TCP established hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.667545] TCP bind hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.667690] TCP: Hash tables configured (established 8192 bind 8192)
[    0.667812] UDP hash table entries: 512 (order: 2, 16384 bytes, linear)
[    0.667869] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear)
[    0.668084] NET: Registered protocol family 1
[    0.668724] RPC: Registered named UNIX socket transport module.
[    0.668737] RPC: Registered udp transport module.
[    0.668743] RPC: Registered tcp transport module.
[    0.668750] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.668765] PCI: CLS 0 bytes, default 64
[    0.669513] hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
[    0.672404] Initialise system trusted keyrings
[    0.672596] workingset: timestamp_bits=14 max_order=18 bucket_order=4
[    0.680598] NFS: Registering the id_resolver key type
[    0.680634] Key type id_resolver registered
[    0.680643] Key type id_legacy registered
[    0.727166] NET: Registered protocol family 38
[    0.727187] Key type asymmetric registered
[    0.727196] Asymmetric key parser 'x509' registered
[    0.727272] bounce: pool size: 64 pages
[    0.727326] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    0.727468] io scheduler mq-deadline registered
[    0.727479] io scheduler kyber registered
[    0.731875] debugfs: Directory 'dummy-src@30390000' with parent 'regmap' already present!
[    0.793202] 30860000.serial: ttymxc0 at MMIO 0x30860000 (irq = 44, base_baud = 1500000) is a IMX
[    1.537933] printk: console [ttymxc0] enabled
[    1.558758] brd: module loaded
[    1.571442] loop: module loaded
[    1.594168] pps pps0: new PPS source ptp0
[    1.601891] fec 30be0000.ethernet eth0: registered PHC device 0
[    1.608084] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.614640] ehci-pci: EHCI PCI platform driver
[    1.619209] ehci-mxc: Freescale On-Chip EHCI Host driver
[    1.625310] usbcore: registered new interface driver usb-storage
[    1.631415] usbcore: registered new interface driver usbserial_generic
[    1.637997] usbserial: USB Serial support registered for generic
[    1.645596] imx_usb 30b10000.usb: No over current polarity defined
[    1.656200] imx_usb 30b20000.usb: No over current polarity defined
[    1.662447] imx_usb 30b20000.usb: 30b20000.usb supply vbus not found, using dummy regulator
[    1.673731] ci_hdrc ci_hdrc.1: EHCI Host Controller
[    1.678678] ci_hdrc ci_hdrc.1: new USB bus registered, assigned bus number 1
[    1.714989] ci_hdrc ci_hdrc.1: USB 2.0 started, EHCI 1.00
[    1.721327] hub 1-0:1.0: USB hub found
[    1.725209] hub 1-0:1.0: 1 port detected
[    1.738721] rtc-pcf85063 3-0051: registered as rtc0
[    1.744428] snvs_rtc 30370000.snvs:snvs-rtc-lp: registered as rtc1
[    1.750822] i2c /dev entries driver
[    1.757082] imx2-wdt 30280000.wdog: timeout 60 sec (nowayout=0)
[    1.763455] /cpus/cpu@0: unsupported enable-method property: psci
[    1.770140] sdhci: Secure Digital Host Controller Interface driver
[    1.776344] sdhci: Copyright(c) Pierre Ossman
[    1.780761] Synopsys Designware Multimedia Card Interface Driver
[    1.787101] sdhci-pltfm: SDHCI platform and OF driver helper
[    1.827501] mmc0: SDHCI controller on 30b40000.usdhc [30b40000.usdhc] using ADMA
[    1.865021] mmc1: SDHCI controller on 30b60000.usdhc [30b60000.usdhc] using ADMA
[    1.877594] caam 30900000.caam: device ID = 0x0a16030000000000 (Era 8)
[    1.884178] caam 30900000.caam: job rings = 3, qi = 0
[    1.890973] caam_jr 30901000.jr0: failed to flush job ring 0
[    1.896835] caam_jr: probe of 30901000.jr0 failed with error -5
[    1.902992] caam_jr 30902000.jr1: failed to flush job ring 1
[    1.908865] caam_jr: probe of 30902000.jr1 failed with error -5
[    1.915021] caam_jr 30903000.jr1: failed to flush job ring 2
[    1.925124] caam_jr: probe of 30903000.jr1 failed with error -5
[    1.926407] random: fast init done
[    1.932951] NET: Registered protocol family 17
[    1.939033] Key type dns_resolver registered
[    1.941320] mmc0: new high speed SDIO card at address 0001
[    1.943728] imx-cpufreq-dt imx-cpufreq-dt: cpu speed grade 2 mkt segment 1 supported-hw 0x4 0x2
[    1.959204] Registering SWP/SWPB emulation handler
[    1.965656] registered taskstats version 1
[    1.969756] Loading compiled-in X.509 certificates
[    2.012092] vdd1p0d: supplied by DCDC3
[    2.017409] imx_thermal tempmon: Extended Commercial CPU temperature grade - max:105C critical:105C passive:95C
[    2.028936] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[    2.046190] rtc-pcf85063 3-0051: setting system clock to 2023-10-10T11:27:58 UTC (1696937278)
[    2.069872] VFS: Cannot open root device "PARTUUID=24e246a8-02" or unknown-block(0,0): error -6
[    2.072715] mmc1: new HS200 MMC card at address 0001
[    2.078626] Please append a correct "root=" boot option; here are the available partitions:
[    2.084501] mmcblk1: mmc1:0001 S40004 3.64 GiB
[    2.091991] 0100           65536 ram0
[    2.091994]  (driver?)
[    2.096907] mmcblk1boot0: mmc1:0001 S40004 partition 1 4.00 MiB
[    2.100271] 0101           65536 ram1
[    2.100273]  (driver?)
[    2.102955] mmcblk1boot1: mmc1:0001 S40004 partition 2 4.00 MiB
[    2.108576] 0102           65536 ram2
[    2.108579]  (driver?)
[    2.120889] mmcblk1rpmb: mmc1:0001 S40004 partition 3 4.00 MiB, chardev (246:0)
[    2.124381] 0103           65536 ram3
[    2.124383]  (driver?)
[    2.131206]  mmcblk1: p1 p2 p3
[    2.134066] 0104           65536 ram4
[    2.134069]  (driver?)
[    2.149411] 0105           65536 ram5
[    2.149417]  (driver?)
[    2.155546] 0106           65536 ram6
[    2.155548]  (driver?)
[    2.161661] 0107           65536 ram7
[    2.161663]  (driver?)
[    2.167804] 0108           65536 ram8
[    2.167808]  (driver?)
[    2.173921] 0109           65536 ram9
[    2.173924]  (driver?)
[    2.180061] 010a           65536 ram10
[    2.180064]  (driver?)
[    2.186281] 010b           65536 ram11
[    2.186284]  (driver?)
[    2.192483] 010c           65536 ram12
[    2.192485]  (driver?)
[    2.198703] 010d           65536 ram13
[    2.198705]  (driver?)
[    2.204904] 010e           65536 ram14
[    2.204907]  (driver?)
[    2.211124] 010f           65536 ram15
[    2.211127]  (driver?)
[    2.217347] b300         3817472 mmcblk1
[    2.217352]  driver: mmcblk
[    2.224161]   b301         1048576 mmcblk1p1 24e246a8-01
[    2.224164]
[    2.230989]   b302         1048576 mmcblk1p2 24e246a8-02
[    2.230991]
[    2.237814]   b303         1048576 mmcblk1p3 24e246a8-03
[    2.237817]
[    2.244625] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    2.252916] CPU0: stopping
[    2.255634] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.193-5.7.3-devel+git.f5d73fd6e9f8 #1
[    2.264157] Hardware name: Freescale i.MX7 Dual (Device Tree)
[    2.269961] [<c0112dc8>] (unwind_backtrace) from [<c010d2fc>] (show_stack+0x10/0x14)
[    2.277721] [<c010d2fc>] (show_stack) from [<c0a45098>] (dump_stack+0x90/0xa4)
[    2.284954] [<c0a45098>] (dump_stack) from [<c01110e4>] (handle_IPI+0x338/0x370)
[    2.292367] [<c01110e4>] (handle_IPI) from [<c0548b78>] (gic_handle_irq+0x8c/0x90)
[    2.299949] [<c0548b78>] (gic_handle_irq) from [<c0101b0c>] (__irq_svc+0x6c/0x90)
[    2.307431] Exception stack(0xc1001ed8 to 0xc1001f20)
[    2.312485] 1ec0:                                                       00000000 ef6c5200
[    2.320668] 1ee0: 0000689c c1003d80 ec7f0400 ffffffff 86486aeb c10a597c ef6c0eb8 00000000
[    2.328849] 1f00: 00000000 8648a653 00000000 c1001f28 c01bbb28 c0812334 60000013 ffffffff
[    2.337037] [<c0101b0c>] (__irq_svc) from [<c0812334>] (cpuidle_enter_state+0x16c/0x52c)
[    2.345134] [<c0812334>] (cpuidle_enter_state) from [<c0812730>] (cpuidle_enter+0x28/0x38)
[    2.353407] [<c0812730>] (cpuidle_enter) from [<c0160ae4>] (do_idle+0x1f4/0x2a0)
[    2.360810] [<c0160ae4>] (do_idle) from [<c0160e54>] (cpu_startup_entry+0x18/0x1c)
[    2.368389] [<c0160e54>] (cpu_startup_entry) from [<c0f010b0>] (start_kernel+0x4ac/0x4f0)
[    2.376573] [<c0f010b0>] (start_kernel) from [<00000000>] (0x0)
[    2.382505] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

However, if I use different Colibri iMX7D V1.1A modules, then I don’t observe these issues even during longer reboot tests.

Do you have any idea what the problem could be?

Thanks for your support.

Cheers,
Rob

How did you flash your custom image? Have you used the Toradex Easy Installer? If so, could you please share your custom image JSON file? If not, how did you perform eMMC partitioning?

If you flashed Toradex-provided OS builds, have you observed any issues on Colibri iMX7D V1.1B?

Sometimes the storage hardware or its driver (here, MMC) takes a little longer to initialize. Your log suggests exactly that: the root mount fails at 2.069872 s, while the eMMC is only detected at 2.072715 s and its partitions appear at 2.131206 s. To account for this, you can add a delay before the kernel tries to mount the root filesystem. Add rootdelay=5 (a 5-second delay) to your kernel command line and see if that helps.

Thanks for your answers.

I flashed the custom image using Toradex Easy Installer. Here is the content of the JSON file. We are using 3 partitions (one data and two rootfs partitions for the A/B software update).

{
    "config_format": "2",
    "autoinstall": false,
    "name": "Board Core Image (UPSTREAM)",
    "description": "Image without graphical interface that boots the Board",
    "version": "5.7.3-devel-20231009120448+build.0",
    "release_date": "2023-10-09",
    "u_boot_env": "u-boot-initial-env",
    "prepare_script": "prepare.sh",
    "wrapup_script": "wrapup.sh",
    "marketing": "marketing.tar",
    "icon": "toradexlinux.png",
    "supported_product_ids": [
        "0039"
    ],
    "blockdevs": [
        {
            "name": "mmcblk0",
            "partitions": [
                {
                    "partition_size_nominal": 1024,
                    "want_maximised": false,
                    "content": {
                        "filesystem_type": "ext4",
                        "label": "DATA"
                    }
                },
                {
                    "partition_size_nominal": 1024,
                    "want_maximised": false,
                    "content": {
                        "label": "RFS",
                        "filesystem_type": "ext4",
                        "mkfs_options": "-E nodiscard",
                        "filename": "image-embsys-connectivity-node.tar.xz",
                        "uncompressed_size": 373.95703125
                    }
                },
                {
                    "partition_size_nominal": 1024,
                    "want_maximised": false,
                    "content": {
                        "filesystem_type": "ext4",
                        "label": "rootfs2"
                    }
                }
            ]
        },
        {
            "name": "mmcblk0boot0",
            "erase": true,
            "content": {
                "filesystem_type": "raw",
                "rawfiles": [
                    {
                        "filename": "u-boot.imx",
                        "dd_options": "seek=2"
                    }
                ]
            }
        }
    ]
}
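As a quick sanity check that this layout is what actually ended up on the eMMC, the sizes in the JSON can be compared with the partition list printed just before the panic. The sketch below assumes that partition_size_nominal is given in MiB and that the kernel's partition list reports sizes in 1 KiB blocks; both are assumptions on my side, and the numbers are copied by hand from the JSON and the log above.

```python
# Assumption: partition_size_nominal is in MiB (Easy Installer JSON) and the
# kernel's partition list reports sizes in 1 KiB blocks.
nominal_mib = [1024, 1024, 1024]  # DATA, RFS, rootfs2 from the JSON above

# Sizes copied from the "available partitions" list in the panic log.
kernel_kib = {
    "mmcblk1p1": 1048576,
    "mmcblk1p2": 1048576,
    "mmcblk1p3": 1048576,
}
device_kib = 3817472  # whole mmcblk1 device, same list

# Each partition should be exactly its nominal size.
for (name, kib), want_mib in zip(kernel_kib.items(), nominal_mib):
    assert kib == want_mib * 1024, f"{name}: {kib} KiB != {want_mib} MiB"

# Space on the device that is not covered by the three partitions.
unallocated_mib = (device_kib - sum(kernel_kib.values())) // 1024
print(f"all partitions match; {unallocated_mib} MiB not covered by partitions")
```

Under those assumptions the three partitions match the JSON, which would point away from a partitioning problem. Note that the JSON names the eMMC mmcblk0 while the booted kernel enumerates it as mmcblk1; since the root device is selected by PARTUUID, that naming difference should not matter here.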

I will check this and report back.

I added rootdelay=5 as suggested and ran an endurance test overnight, with the system rebooting constantly. With this change there were no kernel panics! :+1:
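For anyone repeating such an endurance test, a captured serial log can be checked automatically rather than by eye. This is only a minimal sketch: it assumes the serial console was captured to one text file, uses the "Starting kernel ..." line from U-Boot to separate boots, and the sample capture in it is synthetic, not real output from my board.

```python
BOOT_MARKER = "Starting kernel ..."
PANIC_MARKER = "Kernel panic - not syncing"


def summarize_boots(capture: str) -> tuple[int, list[int]]:
    """Return (number of boots, 1-based indices of boots that panicked)."""
    # Text before the first marker is boot-loader output that precedes any
    # kernel log, so it is dropped.
    boots = capture.split(BOOT_MARKER)[1:]
    failed = [i for i, text in enumerate(boots, start=1) if PANIC_MARKER in text]
    return len(boots), failed


# Synthetic capture for illustration only (hypothetical, heavily shortened).
sample = (
    "U-Boot ...\nStarting kernel ...\nlogin:\n"
    "U-Boot ...\nStarting kernel ...\n"
    "Kernel panic - not syncing: VFS: Unable to mount root fs\n"
    "U-Boot ...\nStarting kernel ...\nlogin:\n"
)
total, failed = summarize_boots(sample)
print(f"{total} boots, panics in boots: {failed}")
```

In practice one would read the capture from the file written by the serial terminal program and pass its contents to summarize_boots().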

Now I am wondering whether we can consider this a fix. Were the 5 seconds chosen arbitrarily, or is there supporting data (e.g., from data sheets)?

I suggested a 5-second delay to make sure the eMMC driver has finished initializing before the root filesystem is mounted. Based on the boot log you provided, the delay could probably be reduced to 3 seconds. However, you should validate this with a series of reboot tests to confirm that the chosen delay gives a reliable boot every time.
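To put a number on the margin, the relevant timestamps from the failing boot can be compared directly. The sketch below is illustrative only: the three log lines are copied from the panic log earlier in this thread, and the helper name first_timestamp is made up for this example.

```python
import re

# Lines copied from the failing boot log earlier in this thread.
LOG = """\
[    2.069872] VFS: Cannot open root device "PARTUUID=24e246a8-02" or unknown-block(0,0): error -6
[    2.072715] mmc1: new HS200 MMC card at address 0001
[    2.131206]  mmcblk1: p1 p2 p3
"""

# Kernel log prefix: "[  seconds.micros] message"
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def first_timestamp(log: str, needle: str) -> float:
    """Return the kernel timestamp of the first line containing `needle`."""
    for line in log.splitlines():
        m = STAMP.match(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    raise ValueError(f"{needle!r} not found in log")


mount_attempt = first_timestamp(LOG, "VFS: Cannot open root device")
partitions_ready = first_timestamp(LOG, "mmcblk1: p1 p2 p3")
gap_ms = (partitions_ready - mount_attempt) * 1000
print(f"root mount attempted {gap_ms:.1f} ms before the partitions were scanned")
```

In this one sample the kernel gave up roughly 61 ms before the eMMC partitions became available, so a delay of a few seconds leaves a wide margin, but a single boot says nothing about the worst case. The kernel also has a rootwait parameter, which waits for the root device to show up instead of sleeping for a fixed time; it may be worth evaluating as an alternative to a hand-tuned rootdelay.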