IMX8MP Remote proc fails to load M7

Hi,

we are facing an interesting challenge with our M7 system.
We use remoteproc to start the M7 firmware.
Usually the process works fine, but we found that when the M7 image grows above a certain size (whether it is only data, only code, or a mixture of both is hard to say), remoteproc fails to load the FW.
This is particularly bad because we are never able to load a debug build and always have to build in release mode.

Our FW is all sitting in the TCM (both data and code):

Linker script

MEMORY
{
  m_itcm_interrupts      (RX)  : ORIGIN = 0x00000000, LENGTH = 0x00000400
  m_itcm_text            (RX)  : ORIGIN = 0x00000400, LENGTH = 0x0001FC00
  m_dtcm_data            (RW)  : ORIGIN = 0x20000000, LENGTH = 0x00020000
  m_ddr_data             (RW)  : ORIGIN = 0x80000000, LENGTH = 0x01000000
}

Compilation output

Memory region         Used Size  Region Size  %age Used
m_itcm_interrupts:        680 B         1 KB     66.41%
m_itcm_text:            66340 B       127 KB     51.01%
m_dtcm_data:            62448 B       128 KB     47.64%
m_ddr_data:                 0 B        16 MB      0.00%

The error shown while booting is:

[    7.253579] remoteproc remoteproc0: bad phdr da 0x1de78 mem 0xd5a0
[    7.259905] remoteproc remoteproc0: Failed to load program segments: -22
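
The numbers in that log line can be checked by hand against the linker script. The sketch below is only a simplified model of a device-address range check, not the actual remoteproc or imx_rproc code, and `region_contains` is a hypothetical helper. It assumes the loader requires the whole range [da, da + memsz) to lie inside one mapped region. Under that assumption, the segment at da 0x1de78 with memsz 0xd5a0 ends at 0x2b418, which is past the 0x20000 end of the ITCM region:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true if the range
 * [da, da + len) lies completely inside [base, base + size).
 * Written so that the arithmetic cannot overflow.
 */
static bool region_contains(uint64_t base, uint64_t size,
                            uint64_t da, uint64_t len)
{
    return da >= base && len <= size && da - base <= size - len;
}

/* ITCM as declared in the linker script: 0x400 vectors + 0x1FC00 text. */
#define ITCM_BASE 0x00000000ULL
#define ITCM_SIZE 0x00020000ULL

/* Values copied from the "bad phdr" kernel log line. */
#define LOG_DA    0x0001de78ULL
#define LOG_MEMSZ 0x0000d5a0ULL
```

With these values, `region_contains(ITCM_BASE, ITCM_SIZE, LOG_DA, LOG_MEMSZ)` is false. The largest length that would still fit at that address is 0x2188 bytes.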

The same binary (okay, the .bin, not the .elf, but the output of the same build) always runs correctly when booted via U-Boot.

We went the remoteproc way because it’s better supported in TCB. If we want to switch to U-Boot, we then have to “hack” the u-boot-initial-env-sd after building the image, since we didn’t find a way to pass the necessary U-Boot variables in the TCB input parameters.

Thank you for your support.

Additional info:

tdx-info

verdin-imx8mp-15604834:/var/rootdirs/home/torizon# tdx-info

Software summary
------------------------------------------------------------
Bootloader:               U-Boot
Kernel version:           6.6.54-7.1.0-g3493ccd66900 #1-Torizon SMP PREEMPT Tue Dec 17 21:04:41 UTC 2024
Kernel command line:      root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.0/torizon/a45c8357cdd318d41ef5d9b14198e03a8856550b9b33490848a61855e500cf04/0 clk-imx8mp.mcore_booted=1 firmware_class.path=/var/m7-firmware
Distro name:              NAME="Torizon OS"
Distro version:           VERSION_ID=7.1.0-build.4
Distro variant:           VARIANT="Docker"
Hostname:                 verdin-imx8mp-15604834
------------------------------------------------------------

Hardware info
------------------------------------------------------------
HW model:                 Toradex Verdin iMX8M Plus WB on Verdin Development Board
Toradex version:          0058 V1.1B
Serial number:            15604834
Processor arch:           aarch64
------------------------------------------------------------

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.


Hello,

Thank you for your detailed description and the provided information.

Your observation is aligned with known behavior on the Verdin iMX8M Plus with Torizon OS 7.1.0: the Linux remoteproc framework enforces memory and segment integrity checks when loading M7 firmware, and has stricter limitations compared to U-Boot, especially regarding ELF program headers and memory regions.

Key Points from Your Scenario:

  • Loading via remoteproc fails when the binary exceeds a certain size, but succeeds with U-Boot.
  • The load error:
    [    7.253579] remoteproc remoteproc0: bad phdr da 0x1de78 mem 0xd5a0
    [    7.259905] remoteproc remoteproc0: Failed to load program segments: -22
    
  • Your code and data are entirely mapped to TCM, using .elf files.

Why This Happens:

  • The Linux remoteproc loader expects program segments in the ELF file to be contiguous and fit within predefined memory regions (TCM in this case).
  • If a segment tries to load some part outside the available memory area or overlaps/extends it incorrectly according to remoteproc’s checks, the loader will return -EINVAL (-22) as you see.
  • U-Boot’s loader is less strict, so it may successfully load images that the Linux remoteproc will reject.

Some Recommendations:

  1. Check your linker script boundaries:

    • Ensure your code and data do not overflow configured TCM sizes (128KB ITCM, 128KB DTCM for M7).
    • Even if the sections appear within bounds, ensure the total of all used memory does not cross architectural or SoC-specific boundaries that remoteproc checks for.
  2. ELF File Segmentation:

    • The remoteproc loader parses program headers (phdrs). Large .elf files with many or non-aligned segments sometimes cause the loader to “see” a physical address for a segment that is out-of-range or disallowed.
    • Try limiting the number of program headers (combine sections if possible), or generate a “flat binary” (.bin) with only the code and data destined for TCM.
  3. Use stripped and size-limited debug builds:

    • If possible, strip unnecessary debug symbols or use -g1.
    • Sometimes changing debug build flags reduces .elf size and complexity.
  4. Workaround with U-Boot:
    Since U-Boot does not have the same segment restrictions, it remains an option for debug builds.

    • For TorizonOS and TorizonCore Builder, you typically cannot easily adjust U-Boot environment via TCB input parameters. The common workaround is to manually edit and inject the needed U-Boot variables in your provisioning pipeline.
    • This is not as streamlined, but ensures debug firmware can be loaded until the remoteproc issue is resolved in a future OS/kernel update.
  5. Test Across Versions:

    • Some users report better results (or at least different remoteproc behavior) on Torizon OS 6.x releases or with minimal kernel updates. Testing with an older OS or kernel, if practical, may help you determine if something has changed in the newer remoteproc driver.

Summary

  • This is a known constraint with the current kernel’s remoteproc loader.
  • Review TCM usage and program header segmentation.
  • Consider using stripped debug builds or fallback to U-Boot boot for larger/development firmware.
  • Watch for updates in future TorizonOS/Kernels for improved remoteproc handling.

Let me know if you need detailed guidance on linker script adjustments or remoteproc configuration!

Best regards,
Toradex AI Assistant.

Perhaps it has to do with some ELF misinterpretation in

drivers/remoteproc/remoteproc_elf_loader.c

Try whether these diffs help:

@@ -181,7 +181,7 @@ int rproc_elf_load_segments(struct rproc *rproc, const struct firmware *fw)
 		bool is_iomem = false;
 		void *ptr;
 
-		if (type != PT_LOAD || !memsz)
+		if (type != PT_LOAD || !filesz/*!memsz*/)
 			continue;
 
 		dev_dbg(dev, "phdr: type %d da 0x%llx memsz 0x%llx filesz 0x%llx\n",
@@ -209,10 +209,10 @@ int rproc_elf_load_segments(struct rproc *rproc, const struct firmware *fw)
 		}
 
 		/* grab the kernel address for this device address */
-		ptr = rproc_da_to_va(rproc, da, memsz, &is_iomem);
+		ptr = rproc_da_to_va(rproc, da, filesz/*memsz*/, &is_iomem);
 		if (!ptr) {
 			dev_err(dev, "bad phdr da 0x%llx mem 0x%llx\n", da,
-				memsz);
+				filesz/*memsz*/);
 			ret = -EINVAL;
 			break;
 		} 

I don’t recall exactly why, but in some situations the initialized part of a segment (filesz) was much shorter than the segment’s size in memory (memsz), which also led to a load error for me.
Also, if your C startup routine already initializes bss to zero, there is no need for the ELF loader to do it, so the following change might help as well. Again, this depends on your C startup routine.

@@ -225,6 +225,7 @@ int rproc_elf_load_segments(struct rproc *rproc, const struct firmware *fw)
 				memcpy(ptr, elf_data + offset, filesz);
 		}
 
+#if 0
 		/*
 		 * Zero out remaining memory for this segment.
 		 *
@@ -238,6 +239,7 @@ int rproc_elf_load_segments(struct rproc *rproc, const struct firmware *fw)
 			else
 				memset(ptr + filesz, 0, memsz - filesz);
 		}
+#endif
 	}
 
 	return ret; 
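
For readers less familiar with the two sizes: a PT_LOAD segment carries p_filesz bytes of initialized data in the ELF file and occupies p_memsz bytes in memory, and the difference is normally zero-filled (typically .bss). The following is a minimal user-space sketch of that load step, not the kernel function itself. `load_segment` and its parameters are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of loading one PT_LOAD segment.
 * dst/dst_size describe the destination memory region, src points at
 * the segment's bytes inside the ELF file.
 * Returns 0 on success, -1 if the segment is malformed or too large.
 */
static int load_segment(unsigned char *dst, size_t dst_size,
                        const unsigned char *src,
                        size_t filesz, size_t memsz)
{
    /* The in-memory size is what must fit, not just the file part. */
    if (filesz > memsz || memsz > dst_size)
        return -1;

    /* Copy the initialized part taken from the file. */
    memcpy(dst, src, filesz);

    /* Zero the remainder of the segment (e.g. .bss). */
    memset(dst + filesz, 0, memsz - filesz);
    return 0;
}
```

In terms of this model, the diffs above make the size check use filesz instead of memsz and skip the zero-fill, so the firmware's own startup code must clear .bss.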

Thank you for your reply.

Since we’re using Torizon OS, I don’t think we can make these kernel changes in our setup…

But you can try with Linux BSP.

Hello @Giona,

I will try to reproduce the problem.
Once there are further updates, I will send them here.

Best Regards,
Bruno

Hi Bruno

cool, thank you for taking a look at this issue.

For now, we switched to U-Boot booting, which comes with its own issues, since we need to take care of splitting the BIN file into TCM and DDR files and loading them separately… it works, but it’s a bit of “extra work”.

Let me know if you need the elf file which caused the issue. For us, building in Debug mode didn’t work from day 0, so possibly every example project would cause the error in remote proc.

Cheers
Giona

Hello @Giona,

I have been able to reproduce the issue.
It seems to be related to the ELF loader, but exactly what the problem is remains unclear to me.

The following thread shows the same issue: https://community.nxp.com/t5/i-MX-Processors/i-MX-8M-Plus-Loading-firmware-via-remoteproc-fails/td-p/1675105

In my tests, when the firmware size gets close to 128 KiB, the problem starts to manifest.
It does not matter whether data is in the text region or the data region, as long as it adds up to around 128 KiB or more I can see the same issue you see.
This could indicate that the driver is not correctly using the additional 128 KiB of DTCM which is available to the Cortex-M7 and trying to put all the data within the ITCM, but this is not clear yet.
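
As a rough sanity check of that threshold, the used sizes from the "Compilation output" in the first post can be added up. This assumes that output belongs to a build that loads correctly, which the thread implies but does not state. The names below are hypothetical.

```c
#include <stdint.h>

#define KIB 1024u

/* Used bytes as reported in the first post's compilation output. */
static const uint32_t used_interrupts = 680;
static const uint32_t used_text       = 66340;
static const uint32_t used_data       = 62448;

/* Sum of all TCM-resident bytes of the firmware image. */
static uint32_t total_tcm_bytes(void)
{
    return used_interrupts + used_text + used_data;
}

/* Bytes left before the total reaches 128 KiB. */
static uint32_t headroom_to_128k(void)
{
    return 128u * KIB - total_tcm_bytes();
}
```

The total is 129468 bytes, which leaves 1604 bytes below 128 KiB (131072 bytes), so that build sits just under the size at which the problem was observed.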

The driver is provided by NXP, so we will look into this and see whether we fix it or escalate the issue to NXP.

Best Regards,
Bruno

Hi Bruno

I’m glad you have been able to repro this!
I kind of wonder if we are (were?) almost the only ones seriously using remoteproc and the M7 :slight_smile:

Thank you for keeping us posted!

Best regards
Giona

Hello @Giona,

The attached patch to the remoteproc driver resolves this issue in our testing.
I understand that you are using Torizon OS, so you most likely don’t have a Yocto build where you can add this patch.

We are working to get such a solution integrated into our BSP and Torizon OS.
At this point it is unlikely that this will make the 7.3.0 release at the start of next quarter.
Therefore, I expect this to be fixed in a pre-release of 7.4.0 and in the subsequent 7.4.0 release.

Best Regards,
Bruno

cm7-imx8mp-TCML-U.patch (1.0 KB)