iMX8MP u-boot reset loop

[ Related to IMX8MP Reboot Infinitly - #11 by SvenAlmgren ]

Hi! We also have this issue on two iMX8MP Q 4GB WB IT 1.1A modules.

We run our custom Linux with custom u-boot on a custom carrier, but we’ve also verified this on a Dahlia board running the latest 6.5.0+build.9 (2024-01-08) Toradex Embedded Linux Reference Minimal Image. We also tried 6.5.0 upstream, and had the same issue with 6.4.0.

5.7.5+build.27 seems to work (I have reset the board over 100 times without the issue manifesting), but looking at the output CAAM seems not to be enabled on that build.

If I press reset a couple of times the board will eventually boot.

If I disable CAAM during SPL init the issue goes away entirely.

diff --git a/board/toradex/verdin-imx8mp/spl.c b/board/toradex/verdin-imx8mp/spl.c
index b901038e90..2b66bab5be 100644
--- a/board/toradex/verdin-imx8mp/spl.c
+++ b/board/toradex/verdin-imx8mp/spl.c
@@ -50,6 +50,7 @@ void spl_dram_init(void)

 void spl_board_init(void)
 {
+#if 0
        if (IS_ENABLED(CONFIG_FSL_CAAM)) {
                struct udevice *dev;
                int ret;
@@ -58,6 +59,7 @@ void spl_board_init(void)
                if (ret)
                        printf("Failed to initialize caam_jr: %d\n", ret);
        }
+#endif

        /*
         * Set GIC clock to 500Mhz for OD VDD_SOC. Kernel driver does

The first board we saw this on booted most of the time: probably around 49 of 50 boots worked, or even more. The second board instead fails about 5 times out of 6.
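To put those rates in perspective, here is a rough sketch of how many clean resets are needed before "it didn't fail" means anything. It assumes each boot fails independently with a fixed probability, which is an assumption and not something measured here; the function names are made up for illustration.

```python
import math


def p_all_clean(p_fail, n_resets):
    """Probability that n_resets boots in a row all succeed,
    assuming independent boots with failure probability p_fail."""
    return (1.0 - p_fail) ** n_resets


def resets_needed(p_fail, confidence=0.95):
    """Smallest number of resets after which at least one failure
    should have shown up with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Failure rates taken from the two boards described above.
    for label, p_fail in (("first board, ~1 in 50", 1 / 50),
                          ("second board, ~5 in 6", 5 / 6)):
        print(label)
        print("  chance of 100 clean resets:", p_all_clean(p_fail, 100))
        print("  resets needed for 95% confidence:", resets_needed(p_fail))
```

Under that assumption, 100 clean resets on a board that fails about 1 time in 50 would still happen by chance roughly 13% of the time, so the 5.7.5 result is only convincing if it was obtained on the board that fails often.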

With our fix, CAAM is still enabled during the second (main) u-boot initialization, so the issue seems to be related to CAAM during SPL. The reset always happens near Find img info in the console output, but the exact point varies a bit with the config used for the u-boot build. I don’t have the configs saved from my tests, but I have some outputs saved.

The most common output has the reset occurring right after BL31. BL31 is only printed on the first (cold) boot, or after an external reset (the reset button). The loop seems to reach BL31 (or maybe it’s skipped, I’m not an expert here…)

This particular example got to print half of the secondary u-boot header before the reset.

U-Boot SPL 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:09:35 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888
Download 1079296, Total size 1079856
NOTICE:  BL31: v2.6(release):lf_v2.6-g3c1583ba0a
NOTICE:  BL31: Built : 11:00:38, Nov 21 2022


U-Boot 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:
U-Boot SPL 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:09:35 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888

U-Boot SPL 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:09:35 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888
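For anyone collecting longer captures, a small sketch like the following can split a saved console log into boot attempts and show the last line printed before each reset. It only relies on the "U-Boot SPL" banner seen in the output above; the function names and the shortened SAMPLE text are illustrative, not part of any Toradex tooling.

```python
from collections import Counter

SPL_BANNER = "U-Boot SPL "

# Shortened excerpt of the capture above, used only to demonstrate the functions.
SAMPLE = """\
U-Boot SPL 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:09:35 +0000)
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888
Download 1079296, Total size 1079856
NOTICE:  BL31: Built : 11:00:38, Nov 21 2022


U-Boot 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:
U-Boot SPL 2022.04-6.4.0-devel+git.15fa90038d82 (Jan 31 2024 - 13:09:35 +0000)
Boot Stage: Primary boot
Find img info 0x&4802e000, size 888
"""


def split_boot_attempts(log_text):
    """Return one list of non-empty lines per SPL banner found in the log."""
    attempts = []
    # Everything before the first banner is ignored.
    for part in log_text.split(SPL_BANNER)[1:]:
        lines = [ln.strip() for ln in (SPL_BANNER + part).splitlines()]
        attempts.append([ln for ln in lines if ln])
    return attempts


def last_lines(log_text):
    """Last line printed in each boot attempt."""
    return [attempt[-1] for attempt in split_boot_attempts(log_text)]


if __name__ == "__main__":
    attempts = split_boot_attempts(SAMPLE)
    # Every banner after the first one means the previous attempt was reset.
    print("resets observed:", len(attempts) - 1)
    for line, count in Counter(last_lines(SAMPLE)).most_common():
        print(count, "x", line)
```

The last attempt in a capture may simply be where logging stopped, so its final line is not necessarily a reset point.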

Disabling CAAM during SPL has so far fixed the issue on both of the problematic modules, and works with all other modules we have too.

The same thing happens with the upstream image from Toradex Easy Installer:

U-Boot SPL 2022.04-6.5.0+git.28dc906f6107 (Dec 22 2023 - 09:31:10 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802a600, size 888
Need continue download 1024

U-Boot SPL 2022.04-6.5.0+git.28dc906f6107 (Dec 22 2023 - 09:31:10 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802a600, size 888
Need continue download 1024

U-Boot SPL 2022.04-6.5.0+git.28dc906f6107 (Dec 22 2023 - 09:31:10 +0000)
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
Training FAILED
DDRINFO: start DRAM init
DDRINFO: DRAM rate 4000MTS
DDRINFO:ddrphy calibration done
DDRINFO: ddrmix config done
DDR configured as single rank
SEC0:  RNG instantiated
Normal Boot
WDT:   Started watchdog@30280000 with servicing (60s timeout)
Trying to boot from BOOTROM
Boot Stage: Primary boot
Find img info 0x&4802a600, size 888
Need continue download 1024


Hi @SvenAlmgren !

Thanks for creating this new thread :slight_smile:

Some questions:

Which version of Dahlia are you using?

I have here a Verdin iMX8MP Q 8GB WB IT V1.1A. Not exactly the same module as yours, but it should be good enough to try to reproduce the issues.

It is connected to a Dahlia V1.1C and it is installed with the Reference Minimal Image from BSP 6.5.0-build.9.

I just reset the module 50 times and none of them got stuck.

Would you be able to share a way to reproduce the issue?

Also, do you have other modules to test? Is the issue reproducible on those other modules?

Have you tried reflashing the OS to the module? Does the issue persist even after reflashing?

Does the issue also happen with unmodified Reference Images from Toradex?
What kind of modifications did you make in your custom U-Boot?

Best regards,

Dahlia was 1.0B, but also confirmed on Verdin Development Board V1.1F. (I know there are newer Dahlia boards… we’ve mainly used the Development Board when not running on our custom carrier).

I believe it’s an issue with this particular SoM, as we have only ever had one other that showed similar (but less frequent) issues. I’ve reached out to Diana U via email to ask whether you want access to this particular SoM, in case there’s anything you want to check on it. It could be a defect on this particular board, but it could also be that some tolerances are too tight. We’ve seen this issue on 2 of about 60 SoMs, so it’s not common, and only this particular board shows it this frequently. (Maybe you want to check with JTAG where it actually crashes?)

I’m sorry, I don’t have any good repro case for this, as it’s most likely a defective SoM.

I originally posted this to highlight that disabling CAAM is a workaround.

And yes, it happens when I flash both the regular reference image and the upstream reference image, both unmodified. We can’t run the reference image on our carrier because we have an LVDS display directly connected, but the issue exists also when running on your carriers with stock reference images.

And yes, other SoMs work as expected.

And yes, I have reflashed the module multiple times, and read back the flashed image and verified that it reads correctly.

So I think the module is defective, but maybe there’s some insight to be gained by running it with JTAG? We only have JTAG interfaces for our microcontrollers, and even if we had one for this, I’m not an expert on the boot steps of the iMX8MP.

Hi @SvenAlmgren !

Sorry for the delay here.

We brought this up internally. We will discuss it and update you here on the outcome.

Thanks for bringing this to our attention and taking the time to do this investigation :slight_smile:

Best regards,

Hi @SvenAlmgren !

Could you please send us back the module(s) via RMA?

Here is the link to trigger the process:

Best regards,

Hello @SvenAlmgren,
I don’t know if you already sent the modules back to us or not. Did you keep a module that reproduces the issue with you?

The reason I ask is because we have a potential fix for the issue and it would be great if you could try it out and see if you can still reproduce the reset loop.

The fix is here:
https://git.toradex.com/cgit/u-boot-toradex.git/commit/?h=toradex_imx_lf_v2022.04&id=3428b470191ca0f5bb48d95e77e8ba48d9708a42

It’s already available on our latest nightly builds if you want to give it a quick try.

Thank you,
Rafael

Hi @rafael.tx

I haven’t had time for the RMA yet. We also have another board that shows the same symptoms even with CAAM disabled, so I’ll try your patch on that board today and let you know the results.

/Sven

So far this seems to work. I’m currently implementing this in our custom u-boot to have it running on more units.

I’m reading through the incoming changes since we last updated our SRCREF, is 4c53a37ef6ec3de1d7115f39e57ac3af7e587a24 also relevant for the iMX8M-plus? I’m not an expert in what’s different between the -mini and -plus versions related to booting, but I know both have the same size of TCM.

Great news! I don’t expect you’ll see issues on the other units either. I think we can cancel the RMA in this case.

I’m reading through the incoming changes since we last updated our SRCREF, is 4c53a37ef6ec3de1d7115f39e57ac3af7e587a24 also relevant for the iMX8M-plus? I’m not an expert in what’s different between the -mini and -plus versions related to booting, but I know both have the same size of TCM.

I’ll ask around, see if I can get more information on that.

Regarding this:

I’m reading through the incoming changes since we last updated our SRCREF, is 4c53a37ef6ec3de1d7115f39e57ac3af7e587a24 also relevant for the iMX8M-plus? I’m not an expert in what’s different between the -mini and -plus versions related to booting, but I know both have the same size of TCM.

We modified this because, during our development cycle, the SPL got too big to fit inside the available memory on the iMX8MM. The SPL on the iMX8MP is smaller than on the iMX8MM because the ROM can take care of more steps of the boot process, so the SPL needs fewer features enabled. Because of this, we haven’t faced issues like this on the iMX8MP.

It probably wouldn’t hurt to enable LTO on the iMX8MP, but currently, there’s no real reason to do it.

Thank you for helping out! We’re satisfied that this issue is solved now.