Run a simple "Hello world" M4 application in DDR

As part of our feasibility and performance tests on the Verdin iMX8MM, we are now trying to use DDR instead of TCM to run a simple ‘Hello world’ application on the Cortex-M side.
Unfortunately, we don’t know how to do it and we haven’t found much info on this topic.
We are able to launch the application from U-Boot or from the Linux side when using the TCM.

We tried to run an application compiled for DDR from the Linux side, without success.

Do you have a tutorial to help us?
Thank you for your help.

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.


Running Hello World on Verdin iMX8M Mini Cortex-M4 Using DDR

Memory Configuration for DDR Execution

For the Verdin iMX8M Mini, the Cortex-M4 core can access DDR memory in addition to the TCM regions(1). The memory layout shows that DDR is accessible by the M4 core at address range 0x4000_0000-0xBFFF_FFFF(1).

Prerequisites

You need to set up the SDK and toolchain as described in the Setting Up MCUXpresso SDK and Toolchain for Cortex-M development article(2).

Memory Areas Overview

The Verdin iMX8M Mini uses a modified Harvard architecture where the Cortex-M4 CPU has two buses connected to the main interconnect(1):

  • Code bus - for instruction fetching
  • System bus - for data fetching

For optimal performance, program code should be located in a region accessed through the code bus, while data should be in a region accessed through the system bus(1).

Memory Mapping for DDR

According to the memory areas documentation(1):

Region       Size  Cortex-A53               M4-0
DDR Address  2GB   0x0000_0000-0x3FFF_FFFF  0x4000_0000-0xBFFF_FFFF

The full DRAM range is 0x1_0000_0000 - 0x2_FFFF_FFFF, but only 2048MB of the DRAM is accessible by the M4 cores(1).

Development Steps

  1. Download the MCUXpresso SDK for Verdin iMX8M Mini(2):

    • Verdin iMX8M Mini DualLite: MIMX8MM3xxxLZ
    • Verdin iMX8M Mini Quad: MIMX8MM6xxxKZ
  2. Verify the source code structure(2):

$ cd <verdin-imx8mm-sdk>/boards/
$ tree -L 2
.
└── evkmimx8mm
    ├── cmsis_driver_examples
    ├── demo_apps
    ├── driver_examples
    ├── evkmimx8mm.png
  3. Modify the linker script to target DDR memory instead of TCM. You’ll need to configure the memory regions to use the DDR address space accessible by the M4 core (0x4000_0000-0xBFFF_FFFF)(1).

Important Considerations

  • Memory allocation in advance is required for the M4 core to use DDR memory(1)
  • The TCM regions provide zero wait-states and massively better performance than DRAM, even if cached(1)
  • When using DDR, ensure proper memory management between the Cortex-A53 cores running Linux and the Cortex-M4 core

Loading and Execution

After compiling your project for DDR execution, you’ll need to load it using U-Boot as described in the How to load compiled binaries into Cortex-M guide(3).

The key difference when using DDR is ensuring your linker script targets the correct DDR memory regions accessible by the M4 core and that memory allocation is properly coordinated with the Linux system running on the A53 cores.

Best regards, Toradex AI Assistant.

Hello @slou26,

Running the Cortex-M firmware from the DDR is not directly supported via remoteproc.
It is technically possible to do so, but a small amount of code needs to run from the TCM to make the Cortex-M jump to the DDR area.

Otherwise, running the firmware from the DDR can be done via U-Boot.
To do that, the steps on the following page are mostly valid: How to Load Compiled Binaries into Cortex-M | Toradex Developer Center
When building the binary, please make sure to build the one for DDR, using the corresponding build_ddr_release.sh script from the MCUXpresso SDK.
When setting up the firmware to be started from U-Boot, the following commands can be used:

setenv load_cm_image "${load_cmd} ${loadaddr} ${cm_image}"
setenv cm_boot "${load_cm_image}; cp.b ${loadaddr} 0x80000000 ${filesize}; dcache flush; bootaux 0x80000000"

The main differences are:

  • Change of base address to 0x80000000.
  • Use of ${filesize} instead of ${cm_image}.
    • Ideally, you could also check that filesize is always smaller than the area of reserved memory defined in your device-tree overlay.

This should be enough to get the firmware running from u-boot.
The default device tree overlay for HMP has a region defined as reserved at 0x80000000.
If you need a larger region for a larger firmware, this would need to be modified in the device tree overlay.

Another detail to keep in mind is that the 0x80000000 address is valid for modules with 2 GB of RAM.
For modules with 1 GB of RAM, a different address and adjustments to the linker script from the MCUXpresso SDK are needed.
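
The 1 GB restriction follows from simple address arithmetic. Below is a minimal sketch, assuming that DDR starts at 0x4000_0000 (the start of the M4 DDR window quoted earlier in this thread) and that the module's RAM is mapped contiguously from there. The names DDR_BASE and ddr_contains are illustrative only and do not come from any SDK.

```python
DDR_BASE = 0x4000_0000  # assumption: start of DDR as seen by the Cortex-M4
GIB = 1 << 30


def ddr_contains(addr, ram_bytes):
    """Return True if addr lies inside the populated DDR range."""
    return DDR_BASE <= addr < DDR_BASE + ram_bytes


print(ddr_contains(0x8000_0000, 2 * GIB))  # 2 GB module
print(ddr_contains(0x8000_0000, 1 * GIB))  # 1 GB module
print(ddr_contains(0x7700_0000, 1 * GIB))  # 1 GB module, lower address
```

Under these assumptions the three lines print True, False and True: on a 1 GB module the populated range ends just below 0x8000_0000, so the load address has to move lower.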

Running the firmware from the DDR has a performance impact, so I recommend checking whether it is really needed for your use case.

Best Regards,
Bruno

Thanks for your reply …

That’s exactly what I did but it doesn’t work. I may have forgotten something.
We have a Toradex Verdin iMX8M Mini DualLite 1GB board, so I modified the linker script by changing the address 0x80000000 to 0x70000000 …

Verdin iMX8MM # run cm_boot_ddr
14980 bytes read in 3 ms (4.8 MiB/s)
## No elf image at address 0x70000000
## Starting auxiliary core stack = 0x70400000, pc = 0x700002FD...

with

Verdin iMX8MM # printenv cm_boot_ddr
cm_boot_ddr=ext4load mmc 0:1 0x48200000 /ostree/deploy/torizon/var/rootdirs/home/torizon/m4.bin; cp.b 0x48200000 0x70000000 3c9; dcache flush; bootaux 0x70000000

If I use the ELF file instead of the binary file, only this message appears and no hello world is displayed:

Verdin iMX8MM # run cm_boot_ddr
153664 bytes read in 4 ms (36.6 MiB/s)
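
For reference, the "stack" and "pc" values in the bootaux line of the log above correspond to the first two 32-bit words of a raw Cortex-M image (initial stack pointer, then reset vector). The following is a minimal host-side sketch that reads those two words from a .bin file and checks that the reset vector points into the intended load region. The helper names and the 16 MB region size are assumptions made for illustration.

```python
import struct


def read_vector_head(image):
    """Return (initial_sp, reset_vector) from the start of a raw image."""
    sp, pc = struct.unpack_from("<II", image, 0)
    return sp, pc


def reset_vector_ok(image, base, size):
    """Check that the reset vector is a Thumb address inside [base, base+size)."""
    _, pc = read_vector_head(image)
    thumb_bit_set = (pc & 1) == 1
    return thumb_bit_set and base <= (pc & ~1) < base + size


# Stand-in for open("m4.bin", "rb").read(), built from the values in the log
image = struct.pack("<II", 0x70400000, 0x700002FD)
sp, pc = read_vector_head(image)
print(hex(sp), hex(pc), reset_vector_ok(image, 0x70000000, 0x01000000))
```

If the reset vector still points into the TCM range or to the wrong DDR base, the image was most likely linked with a different linker script than intended.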

Hello @slou26,

Sorry for the confusion.
I could have checked your previous threads.

For the Verdin iMX8M Mini DualLite 1GB, the U-Boot commands need to use 0x77000000 as the base address:

setenv load_cm_image "${load_cmd} ${loadaddr} ${cm_image}"
setenv cm_boot "${load_cm_image}; cp.b ${loadaddr} 0x77000000 \${filesize}; dcache flush; bootaux 0x77000000"

Please also note the addition of a \ before ${filesize} to make sure it is resolved at runtime.
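
The effect of a missing backslash is visible in the printenv output posted earlier in this thread: the cp.b length is stored as the literal 3c9, fixed when the variable was defined, rather than the size of the file that was just loaded. Here is a small sketch of the arithmetic, assuming the cp.b count is interpreted as hexadecimal (the U-Boot default for numeric arguments). The function name is illustrative.

```python
def bytes_copied(count_hex):
    """Convert a hexadecimal cp.b count, as stored in the env, to bytes."""
    return int(count_hex, 16)


stale = bytes_copied("3c9")  # literal length seen in printenv cm_boot_ddr
loaded = 14980               # "14980 bytes read" from the same boot log
print(stale, loaded, stale >= loaded)
```

This prints 969 14980 False: only the first 969 bytes of the 14980-byte binary would have been copied to the target address.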

In addition to these changes, you can use the following patch to change the linker script and board.h file:

imx8mm-1gb-m4-memory.patch (1.7 KB)

With the two changes above, the firmware should work from u-boot.

However, to allow the Linux system to start, additional changes are needed.
These changes are to the device tree overlay, which needs to reserve a different area of memory and also set up the rpmsg areas in a different place:

&{/} {
	imx8mm-cm4 {
		compatible = "fsl,imx8mm-cm4";
		rsc-da = <0x780ff000>; /* Updated to match new rsc_table */
		clocks = <&clk IMX8MM_CLK_M4_DIV>;
		mbox-names = "tx", "rx", "rxdb";
		mboxes = <&mu 0 1
			  &mu 1 1
			  &mu 3 1>;
		memory-region = <&vdevbuffer>, <&vdev0vring0>, <&vdev0vring1>, <&rsc_table>, <&m4_reserved>;
		syscon = <&src>;
		fsl,startup-delay-ms = <500>;
	};
};

&uart4 {
	status = "disabled";
};

&resmem {
	#address-cells = <2>;
	#size-cells = <2>;

	m4_reserved: m4@77000000 {
		no-map;
		reg = <0 0x77000000 0 0x01000000>; /* 16MB total */
	};

	vdev0vring0: vdev0vring0@78000000 {
		reg = <0 0x78000000 0 0x00008000>; /* 32KB */
		no-map;
	};

	vdev0vring1: vdev0vring1@78008000 {
		reg = <0 0x78008000 0 0x00008000>; /* 32KB */
		no-map;
	};

	rsc_table: rsc-table@780ff000 {
		reg = <0 0x780ff000 0 0x00001000>; /* 4KB */
		no-map;
	};
	vdevbuffer: vdevbuffer@78100000 {
		compatible = "shared-dma-pool";
		reg = <0 0x78100000 0 0x01000000>; /* 16MB */
		no-map;
	};
};
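
Before building the overlay, it can help to check on the host that the reserved regions do not overlap and that they stay inside the populated DDR. The sketch below is a minimal example using the addresses and sizes from the overlay above. It assumes a 1 GB module with DDR starting at 0x40000000, and the helper names are illustrative.

```python
# (base, size) pairs copied from the reg properties in the overlay
REGIONS = {
    "m4_reserved": (0x77000000, 0x01000000),
    "vdev0vring0": (0x78000000, 0x00008000),
    "vdev0vring1": (0x78008000, 0x00008000),
    "rsc_table":   (0x780FF000, 0x00001000),
    "vdevbuffer":  (0x78100000, 0x01000000),
}
DDR_BASE, DDR_SIZE = 0x40000000, 1 << 30  # assumption: 1 GB module


def find_overlaps(regions):
    """Return neighbouring region pairs (sorted by base) that overlap."""
    spans = sorted((b, b + s, n) for n, (b, s) in regions.items())
    return [(a[2], b[2]) for a, b in zip(spans, spans[1:]) if b[0] < a[1]]


def outside_ddr(regions):
    """Return names of regions that are not fully inside the DDR range."""
    end = DDR_BASE + DDR_SIZE
    return [n for n, (b, s) in regions.items() if b < DDR_BASE or b + s > end]


print(find_overlaps(REGIONS))
print(outside_ddr(REGIONS))
```

Both lists come out empty for the values above. Checking only neighbouring pairs is enough to detect whether any overlap exists, although it does not list every overlapping pair.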

Best Regards,
Bruno

Thank you @bruno.tx … It’s working with 0x77000000 as base address.
Why is it not working with 0x70000000 as the base address?
What is the maximum size I can allocate to the M4 core on the Verdin 1 GB?

It works with 0x70000000 as the base address too; it was just missing the '\' before ${filesize}.

Hello @slou26,

It is good to know that you got it working with 0x70000000 as well.
Any area on the DDR that is not used elsewhere should be usable for this.
The limit will depend on the DDR that needs to be allocated for the rest of the system.
For most use cases, not much storage should be needed for the firmware.
If large amounts of data need to be shared between Cortex-M and Cortex-A, this could be done in a separate region of reserved memory defined in the device tree overlay, not necessarily in a region which the firmware is loaded to.

Best Regards,
Bruno

Thanks @bruno.tx
It works fine when I launch the application from boot, I get a “hello world” message.
However, when I try to launch it from the Linux side, nothing is displayed and I don’t get any errors.
I only adapted the overlay and built the application with the DDR config:

/* Entry Point */
ENTRY(Reset_Handler)

HEAP_SIZE  = DEFINED(__heap_size__)  ? __heap_size__  : 0x0400;
STACK_SIZE = DEFINED(__stack_size__) ? __stack_size__ : 0x0400;

/* Specify the memory areas */
MEMORY
{
  m_interrupts          (RX)  : ORIGIN = 0x77000000, LENGTH = 0x00000240
  m_text                (RX)  : ORIGIN = 0x77000240, LENGTH = 0x001FFDC0
  m_data                (RW)  : ORIGIN = 0x77200000, LENGTH = 0x00200000
  m_data2               (RW)  : ORIGIN = 0x77400000, LENGTH = 0x00C00000
}
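
As a quick cross-check of the MEMORY block above, the sketch below verifies that the four regions are laid out back to back and that their total length matches the 16 MB m4_reserved area at 0x77000000. The values are copied from the linker script, and the helper name is illustrative.

```python
# (name, ORIGIN, LENGTH) copied from the MEMORY block
MEMORY = [
    ("m_interrupts", 0x77000000, 0x00000240),
    ("m_text",       0x77000240, 0x001FFDC0),
    ("m_data",       0x77200000, 0x00200000),
    ("m_data2",      0x77400000, 0x00C00000),
]
RESERVED_BASE, RESERVED_SIZE = 0x77000000, 0x01000000  # m4_reserved


def is_contiguous(regions):
    """True if each region starts exactly where the previous one ends."""
    return all(a[1] + a[2] == b[1] for a, b in zip(regions, regions[1:]))


total = sum(length for _, _, length in MEMORY)
print(is_contiguous(MEMORY), hex(total), total == RESERVED_SIZE)
```

This prints True 0x1000000 True, so the linker regions fill the reserved area exactly without running past it.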

// SPDX-License-Identifier: GPL-2.0-or-later OR MIT
/*
 * Copyright 2022 Toradex
 */

// Enable RPMSG and the M4 driver

/dts-v1/;
/plugin/;

#include <dt-bindings/clock/imx8mm-clock.h>

/ {
        compatible = "toradex,verdin-imx8mm";
};

&{/} {
        #address-cells = <2>;
        #size-cells = <2>;
        imx8mm-cm4 {
                compatible = "fsl,imx8mm-cm4";
                rsc-da = <0xb8000000>;
                clocks = <&clk IMX8MM_CLK_M4_DIV>;
                mbox-names = "tx", "rx", "rxdb";
                mboxes = <&mu 0 1
                          &mu 1 1
                          &mu 3 1>;
                memory-region = <&vdevbuffer>, <&vdev0vring0>, <&vdev0vring1>, <&rsc_table>, <&m4_reserved>;
                syscon = <&src>;
                fsl,startup-delay-ms = <500>;
        };
};

&uart4 {
        status = "disabled";
};

&resmem {
        #address-cells = <2>;
        #size-cells = <2>;

        m4_reserved: m4@77000000 {
            no-map;
            reg = <0 0x77000000 0 0x1000000>;
        };

        vdev0vring0: vdev0vring0@78000000 {
                reg = <0 0x78000000 0 0x8000>;
                no-map;
        };

        vdev0vring1: vdev0vring1@78008000 {
                reg = <0 0x78008000 0 0x8000>;
                no-map;
        };

        rsc_table: rsc-table@780ff000 {
                reg = <0 0x780ff000 0 0x1000>;
                no-map;
        };

        vdevbuffer: vdevbuffer@78400000 {
                compatible = "shared-dma-pool";
                reg = <0 0x78400000 0 0x100000>;
                no-map;
        };

};

&ecspi2 {
	status = "disabled";
};

Can you help us?
Can you try it on your side?

Thanks

Hello @slou26,

As I mentioned previously, unfortunately the remoteproc driver cannot start a firmware on the DDR.
This is a known limitation of the driver.
While the driver does correctly load the firmware to the correct region on the DDR, it only starts the firmware from the TCM.

What you could do is add an entry point to your firmware in the TCM that jumps to the address where your actual code resides in the DDR.
This would require some customization on the MCUXpresso SDK side.

Best Regards,
Bruno

OK, thanks. Do you have any examples or a tutorial for doing that?

Hello @slou26,

Unfortunately no.
The generally recommended approach would be to load the firmware from u-boot, if you plan to have it on the DDR.
Do you have a strong requirement to start the firmware with remoteproc?

Best Regards,
Bruno

Thank you for your reply.
No, we’re just in the feasibility phase.
So for now, we’ll just start it from U-Boot.

Thanks
