RPMsg on i.MX8MM: Hard Fault on rpmsg_lite_send_nocopy

VLengineering · February 10, 2025, 2:41pm

I’m trying to implement RPMsg on i.MX8MM. I followed the rpmsg_lite_str_echo_rtos example, using the rpmsg_queue_recv_nocopy and rpmsg_lite_send_nocopy functions.

On the A53 side, everything should be fine. When it starts, I see the link activating and the message "hello world!" appearing on the M4 console.

If I then run the following command from the A53:

echo -n "Hello M4" > /dev/ttyRPMSG30

I correctly see the message arriving.

The problem occurs when I execute rpmsg_lite_send_nocopy to send the received message back to the A53. I either get a Hard Fault, or I notice that in the following function:

int32_t virtqueue_add_consumed_buffer(struct virtqueue *vq, uint16_t head_idx, uint32_t len)
{
    if (head_idx > vq->vq_nentries)
    {
        return (ERROR_VRING_NO_BUFF);
    }

    VQUEUE_BUSY(vq, used_write);
    vq_ring_update_used(vq, head_idx, len);
    VQUEUE_IDLE(vq, used_write);

    return (VQUEUE_SUCCESS);
}

head_idx has an incorrect value, causing the ERROR_VRING_NO_BUFF error to be triggered.

Any suggestions?

Thanks

ToradexAI · February 10, 2025, 2:41pm

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

Hello,

It sounds like you’re encountering a challenging issue with the rpmsg_lite_send_nocopy function on your i.MX8MM platform, resulting in a hard fault or an ERROR_VRING_NO_BUFF error. This typically points to a problem with buffer management or memory allocation. Here are some steps to help you troubleshoot and resolve this issue:

Memory Allocation and Device Tree Configuration:
- Verify that the memory regions used by RPMsg are correctly defined in your device tree. Ensure that the reserved memory regions for RPMsg are properly allocated and do not overlap with other memory regions.
- Check the device tree overlays provided by Toradex for RPMsg and ensure they are correctly applied.
Buffer Management:
- The incorrect head_idx value suggests there might be an issue with buffer management. Ensure that buffer indices are correctly tracked and updated.
- Confirm that the buffer sizes and counts are correctly configured in your RPMsg setup. The number of buffers and their sizes should match between the A53 and M4 cores.
Synchronization and Timing:
- Ensure proper synchronization between the A53 and M4 cores. Timing issues can lead to incorrect buffer indices if one core is reading or writing to a buffer that the other core is not ready for.
Code Review and Debugging:
- Review your implementation of the RPMsg functions, especially around rpmsg_lite_send_nocopy and rpmsg_queue_recv_nocopy. Ensure all parameters and return values are handled correctly.
- Use debugging tools to step through the code and monitor variable values, particularly around buffer management functions.
Example Code Comparison:
- Compare your implementation with the rpmsg_lite_str_echo_rtos example to ensure all necessary steps and configurations are correctly followed.

By carefully reviewing these aspects, you should be able to identify and resolve the issue causing the hard fault or buffer error. If the problem persists, consider simplifying the setup to isolate the issue further.

I hope this helps you resolve the issue. If you have any more questions, feel free to ask.

Best regards,
Toradex AI Assistant

Please, also check if these links can help you solve your problem:

https://community.nxp.com/t5/i-MX-Processors/Where-is-the-RPMsg-document-for-IMX8MM/td-p/894428/?profile.language=zh-CN.

yago.tx · February 10, 2025, 6:52pm

Hi @VLengineering,

Is the hmp device tree overlay enabled? You may find the enabled overlays in /sysroot/boot/ostree/torizon-*/dtb/overlays.txt

Also, could you please send the output of the tdx-info command on your module?

Best regards,
Yago

VLengineering · February 11, 2025, 8:36am

Hi Yago,

my tdx-info output is:
Software summary

Bootloader: U-Boot
Kernel version: 5.15.148-6.7.0+git.bfdbfb2c85fb #1-TorizonCore SMP PREEMPT Thu Jun 20 15:59:41 UTC 2024
Kernel command line: root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.1/torizon/bc8150e4d70ac4642d17b5145c80286f23ef76241d52ed3655a9e2fd6054f963/0
Distro name: NAME="TorizonCore"
Distro version: VERSION_ID=6.7.0-build.18
Distro variant: VARIANT="Docker"
Hostname: verdin-imx8mm-06944461

Hardware info

HW model: Toradex Verdin iMX8M Mini WB on Verdin Development Board
Toradex version: 0060 V1.1B
Serial number: 06944461
Processor arch: aarch64

Hi Yago, as I wrote, it seems that everything is fine on the A53 side. As soon as it boots, I see the message ‘Hello World!’ on the M4 console. If I send data from A53, I can see it on the M4 console. The problem is when I write from M4 to A53.
The task running on the M4 is:

static struct rpmsg_lite_instance *volatile rpmsghndRpmsg;
static struct rpmsg_lite_endpoint *volatile rpmsghndEpt = NULL;
static volatile rpmsg_queue_handle rpmsghndQueue = NULL;
static char buf[512];
static uint32_t len;

void saldatrice_task(void *param)
{
    volatile uint32_t rpmsghndRemoteAddr;
    int32_t result;
    void *rx_buf;
    void *tx_buf;
    uint32_t size = RL_BUFFER_PAYLOAD_SIZE;

    PRINTF("Saldatrice Task started ...\n");

    // Init RPMSG
    PRINTF("\r\nRpmsgHnd init ...\r\n");
    rpmsghndRpmsg = rpmsg_lite_remote_init((void *)RPMSG_LITE_SHMEM_BASE, RPMSG_LITE_LINK_ID, RL_NO_FLAGS);
    // Wait for link
    rpmsg_lite_wait_for_link_up(rpmsghndRpmsg, RL_BLOCK);
    PRINTF("\r\nLink up...\r\n");
    // Create queue and endpoint
    rpmsghndQueue = rpmsg_queue_create(rpmsghndRpmsg);
    rpmsghndEpt   = rpmsg_lite_create_ept(rpmsghndRpmsg, LOCAL_EPT_ADDR, rpmsg_queue_rx_cb, rpmsghndQueue);
    // Announcement
    SDK_DelayAtLeastUs(2000000U, SDK_DEVICE_MAXIMUM_CPU_CLOCK_FREQUENCY);
    (void)rpmsg_ns_announce(rpmsghndRpmsg, rpmsghndEpt, RPMSG_LITE_NS_ANNOUNCE_STRING, RL_NS_CREATE);
    PRINTF("\r\nRpmsgHnd nameservice sent, ready for incoming messages...\r\n");

#ifdef RPMSG_LITE_MASTER_IS_LINUX
    (void)rpmsg_queue_recv(rpmsghndRpmsg, rpmsghndQueue, (uint32_t *)&rpmsghndRemoteAddr, buf, sizeof(buf), &len, RL_BLOCK);
    buf[len] = 0;
    PRINTF("\r\n%s...%ld\r\n", buf, len);
#endif

    while(1)
    {
        /* Get RPMsg rx buffer with message */
        result = rpmsg_queue_recv(rpmsghndRpmsg, rpmsghndQueue, (uint32_t *)&rpmsghndRemoteAddr, buf, sizeof(buf), &len, RL_BLOCK);
        if (result != 0)
            break;

        PRINTF("\r\n\r\nRpmsgHnd Get Message From Master Side:\r\n");
        for (size_t i = 0; i < len; i++) {
            PRINTF("%02X,", buf[i]);
        }
        buf[len] = 0;

        result = rpmsg_lite_send(rpmsghndRpmsg, rpmsghndEpt, rpmsghndRemoteAddr, buf, len, RL_BLOCK);
        if (result != 0)
            break;

        PRINTF("Saldatrice Task running ...\n");

        // rpmsghndReceiveNoCopy((char *)buf, &buflen);
        // vTaskDelay(10);
        // rpmsghndTransmitNoCopy((char *)buf, buflen);
    }
    vTaskDelay(100);
}

I tried both using the functions rpmsg_queue_recv_nocopy and rpmsg_lite_send_nocopy, as well as rpmsg_queue_recv and rpmsg_lite_send, copying exactly the example provided in the NXP SDK. The problem always lies with the send operations. Sometimes, some sends go through, but then it goes into a hard fault. Other times, if it doesn’t crash, the following condition is triggered:

int32_t virtqueue_add_consumed_buffer(struct virtqueue *vq, uint16_t head_idx, uint32_t len)
{
    if (head_idx > vq->vq_nentries)
    {
        return (ERROR_VRING_NO_BUFF);
    }
}

Rightly so, because head_idx has absurd values beyond the number of entries in the queue.
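
To make the failure mode concrete, here is a minimal host-side sketch of the same kind of bounds check. It is not RPMsg-Lite code: `toy_ring`, `toy_add_consumed` and the ring size are hypothetical names and values chosen for illustration, and the sketch uses a strict `>=` comparison as its own simplification. It only models the idea that a descriptor index is validated against the ring size before the used ring is updated, so a corrupted index is rejected instead of being published to the other core.

```c
#include <stdint.h>

#define TOY_RING_ENTRIES 8u /* hypothetical ring size for this sketch */

struct toy_used_elem {
    uint32_t id;  /* head index of the consumed descriptor chain */
    uint32_t len; /* number of bytes associated with that buffer */
};

struct toy_ring {
    uint16_t nentries;                           /* number of descriptors */
    uint16_t used_idx;                           /* free-running used index */
    struct toy_used_elem used[TOY_RING_ENTRIES]; /* used ring slots */
};

enum { TOY_OK = 0, TOY_ERR_BAD_INDEX = -1 };

static void toy_ring_init(struct toy_ring *r)
{
    r->nentries = TOY_RING_ENTRIES;
    r->used_idx = 0;
}

/* Publish a consumed buffer, refusing indices outside the ring. */
static int toy_add_consumed(struct toy_ring *r, uint16_t head_idx, uint32_t len)
{
    if (head_idx >= r->nentries) {
        return TOY_ERR_BAD_INDEX; /* a corrupted index ends up here */
    }
    struct toy_used_elem *slot = &r->used[r->used_idx % r->nentries];
    slot->id  = head_idx;
    slot->len = len;
    r->used_idx++;
    return TOY_OK;
}
```

In this model a value such as 0xBEEF for `head_idx` is rejected and the ring state stays untouched, which mirrors the symptom described above: the check itself works, and the open question is where the bad index comes from.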

I’m debugging on the M4 core on a Linux machine using VSCode and CMake.
My config.cmake is:

set(CONFIG_COMPILER gcc)
set(CONFIG_TOOLCHAIN armgcc)
set(CONFIG_USE_COMPONENT_CONFIGURATION false)
set(CONFIG_USE_middleware_multicore_rpmsg_lite_imx8mm_m4_freertos true)
set(CONFIG_USE_middleware_multicore_rpmsg_lite_freertos true)
set(CONFIG_USE_middleware_multicore_rpmsg_lite true)
set(CONFIG_USE_middleware_freertos-kernel_heap_4 true)
set(CONFIG_USE_driver_clock true)
set(CONFIG_USE_driver_mu true)
set(CONFIG_USE_middleware_freertos-kernel true)
set(CONFIG_USE_driver_common true)
set(CONFIG_USE_driver_rdc true)
set(CONFIG_USE_driver_igpio true)
set(CONFIG_USE_driver_ii2c true)
set(CONFIG_USE_driver_ipwm true)
set(CONFIG_USE_driver_gpt true)
set(CONFIG_USE_device_MIMX8MM6_CMSIS true)
set(CONFIG_USE_utility_debug_console true)
set(CONFIG_USE_component_iuart_adapter true)
set(CONFIG_USE_component_serial_manager_uart true)
set(CONFIG_USE_component_serial_manager true)
set(CONFIG_USE_driver_iuart true)
set(CONFIG_USE_component_lists true)
set(CONFIG_USE_device_MIMX8MM6_startup true)
set(CONFIG_USE_utility_assert true)
set(CONFIG_USE_utilities_misc_utilities true)
set(CONFIG_USE_middleware_freertos-kernel_template true)
set(CONFIG_USE_middleware_freertos-kernel_extension true)
set(CONFIG_USE_CMSIS_Include_core_cm true)
set(CONFIG_USE_device_MIMX8MM6_system true)
set(CONFIG_CORE cm4f)
set(CONFIG_DEVICE MIMX8MM6)
set(CONFIG_BOARD evkmimx8mm)
set(CONFIG_KIT evkmimx8mm)
set(CONFIG_DEVICE_ID MIMX8MM6xxxLZ)
set(CONFIG_FPU SP_FPU)
set(CONFIG_DSP NO_DSP)

Memory sections in my .ld file are:

/* Entry Point */
ENTRY(Reset_Handler)

HEAP_SIZE = DEFINED(heap_size) ? heap_size : 0x400;
STACK_SIZE = DEFINED(stack_size) ? stack_size : 0x400;

/* Specify the memory areas */
MEMORY
{
m_interrupts (RX) : ORIGIN = 0x1FFE0000, LENGTH = 0x00000240
m_text (RX) : ORIGIN = 0x1FFE0240, LENGTH = 0x0001FDC0
m_data (RW) : ORIGIN = 0x20000000, LENGTH = 0x00020000
m_data2 (RW) : ORIGIN = 0x80000000, LENGTH = 0x01000000
}

I really don’t know how to solve this. I tried running the M4 application both through the debugger and by loading it following your guide. In both cases, as soon as it tries to send, it crashes.
But the receive works perfectly.
If I remove the send from my M4 application, everything works.

Thanks

yago.tx · February 11, 2025, 3:14pm

Hi @VLengineering,

The reason I asked whether the hmp overlay is enabled is that it configures the vring that your software is not finding (the overlay source code). Your firmware looks correct: you are calling the required functions (remote_init, wait_for_link, queue_create, create_ept) and the parameters look correct. You also said that not even the SDK examples work (“[…] copying exactly the example provided in the NXP SDK […]”), which indicates that something in Linux may not be configured properly.

Can you send the contents of your /sysroot/boot/ostree/torizon-*/dtb/overlays.txt file?

Edit: Another thing: your CMake configuration refers to the MIMX8MM6, which is the SoC for the Verdin iMX8M Mini Quad, but your tdx-info output indicates that you are using the Verdin iMX8M Mini DualLite WB, which uses the MIMX8MM3xxxKZ SoC. This could be messing up the addresses in your firmware. You may find more information about the SDK setup in our MCUXpresso setup guide.

VLengineering · February 11, 2025, 3:53pm

Hi Yago

My overlays.txt contains verdin-imx8mm_hmp_overlay.dtbo, which was already present in the /sysroot/boot/ostree/torizon-*/dtb/overlays directory; I just added it to the overlays.txt file. However, I notice that in the *.dts file you provided, the base addresses are @55000000, whereas in my board.h I have:

/* Shared memory base for RPMsg communication. */
#define VDEV0_VRING_BASE (0xB8000000U)
#define RESOURCE_TABLE_OFFSET (0xFF000)

But I haven’t modified these files; maybe they were related to the Quad version for which the example was compiled?
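
When comparing addresses across board.h, the linker script and the overlay, one quick sanity check is whether two regions overlap. The sketch below is a generic helper, not part of the SDK; `mem_region` and `regions_overlap` are hypothetical names. The m_data2 range comes from the linker script above and the vring base from board.h, while the vring size used in the usage note is only an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* A physical address range [base, base + size). 64-bit fields avoid
 * overflow when a 32-bit region ends near the top of the address space. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Two half-open ranges overlap when each starts before the other ends. */
static bool regions_overlap(struct mem_region a, struct mem_region b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

For example, `regions_overlap((struct mem_region){0x80000000u, 0x01000000u}, (struct mem_region){0xB8000000u, 0x10000u})` evaluates to false, so m_data2 and a vring area at 0xB8000000 of that assumed size do not collide.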

Thanks

VLengineering · February 11, 2025, 4:53pm

Hi Yago, I tried with the overlay you suggested, but nothing works anymore, even after changing the base address of VDEV0_VRING_BASE to 55000000. I no longer see Link-up or the HelloWorld! message at kernel startup. At this point, I have even more doubts.

What is the correct overlay to use? As I mentioned, I added verdin-imx8mm_hmp_overlay.dtbo to overlays.txt, as it was already present. And with that, I need to use B8000000 as the VDEV0_VRING_BASE address in my M4 software.

Thanks

yago.tx · February 11, 2025, 9:38pm

Hi @VLengineering,

The addresses on the SDK should be correct (no need to replace with the ones on the overlay), and in theory the NXP examples should just work (requiring only enabling the HMP overlay and loading a kernel module), so tomorrow I’ll run the echo example from NXP to gather more information, and if everything works I’ll send you the exact steps I did to get there.

VLengineering · February 11, 2025, 9:58pm

Hi Yago

Thank you very much

VLengineering · February 12, 2025, 7:39pm

Hi Yago

Any news?

Thanks

yago.tx · February 12, 2025, 10:05pm

Hi @VLengineering, I tried the echo example on Torizon 6 (and 7) and found problems on Torizon’s rpmsg (even got a kernel panic) and on our documentation. I managed to get the echo example working without issues on our BSP (tested on BSP 6.8.1 reference minimal), so the problem seems related to Torizon.

I still need to run a few more tests and check with the Torizon team. I’m sorry for not having an answer today, and I’ll keep you updated with our findings.

VLengineering · February 12, 2025, 10:36pm

Hello Yago,

On my side, I downloaded the latest SDK version and ran the rpmsg_lite_str_echo_rtos application on the M4, but I encountered the same issue. The send function crashes the application on the M4.
I can confirm that I also experienced a kernel panic in these latest tests, which I had never seen before.

I was quite worried, thinking it was an issue on my side. I hope you manage to solve it soon. Communication between the two cores is essential for my application.

If needed, is there any other possible solution?

Thanks

VLengineering · February 14, 2025, 6:48pm

Hi Yago,

Any news?

Thanks

yago.tx · February 14, 2025, 10:39pm

Hi @VLengineering,

I had no luck trying to get the send functions to work, and this is not a known issue, so we’ll have to investigate more to understand why it is broken on Torizon.

For the workarounds, I know of two possible paths:

1. Using the Linux BSP (from a quick check, the echo example seems to work there, and it uses the send functions).
2. Using shared memory between the Cortex-A and Cortex-M cores.

To expand #2 a bit more:

You can map a memory region to be accessed by both Cortex A and Cortex M, then carefully write data to that region, and it will be visible by both cores. This, of course, is a lot harder to get right when compared to using a (functioning) rpmsg system.

It also involves rebuilding Torizon to allow processes to access global memory (or implementing part of your application as a driver in kernel space), and reserving the memory in the device tree.

A few months ago I used this direct memory writing/reading approach for transferring data between the cores in a project with the Verdin iMX8M Plus.

I would not recommend following this path under normal circumstances.

If you still want to know a bit more about the approach #2, I can write a short guide on how to do it.
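
As a rough illustration of the Linux side of approach #2, the sketch below maps a region of a device file into a process with `mmap`. It is a sketch under assumptions, not a tested recipe for Torizon: `map_shared_region` and `unmap_shared_region` are hypothetical helper names, the caller has to supply the physical base address reserved in the device tree, and access to `/dev/mem` depends on the kernel configuration and permissions mentioned above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of `dev_path` (for example "/dev/mem") starting at the
 * page-aligned offset `phys`. Returns NULL on failure. */
static volatile uint8_t *map_shared_region(const char *dev_path, off_t phys, size_t len)
{
    int fd = open(dev_path, O_RDWR | O_SYNC);
    if (fd < 0) {
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : (volatile uint8_t *)p;
}

/* Release a mapping obtained from map_shared_region(). Returns 0 on success. */
static int unmap_shared_region(volatile uint8_t *p, size_t len)
{
    return munmap((void *)p, len);
}
```

The pointer is declared `volatile` because the other core can change the contents at any time. Cache behaviour and ordering between the two cores still have to be handled separately, which is part of why this path is harder to get right than a working rpmsg setup.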

VLengineering · February 17, 2025, 8:44am

Hi Yago, I bypassed the issue using solution 2. I created an overlay to reserve a shared memory area, following the Toradex guide. My Python application on the A53 runs in a Docker container and accesses the shared memory using mmap. I implemented a protocol with a handshake to manage communication, in which the A53 is always the master and the M4 the slave. I didn’t have to make the container privileged, but I granted it access to /dev/mem using the --device option.
Keep in mind that I don’t need to exchange large amounts of data between the A53 and M4, nor do I need to do it frequently (every 200-500 ms). For now, I’m using this solution, which allows me to move forward. I’m waiting for you to fix the issue with RPMsg.
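
For readers wondering what such a handshake can look like, here is a minimal sketch of a single-slot mailbox in which the master always initiates. It is not the protocol used in this thread (that one is written in Python and was not posted); the layout, the state names and the functions `master_post`, `slave_poll` and `master_collect` are all hypothetical. The sketch is written in C and assumes the shared region is mapped so that both cores observe each other's writes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MB_PAYLOAD_MAX 64u /* hypothetical slot size */

enum { MB_IDLE = 0, MB_REQUEST = 1, MB_RESPONSE = 2 };

struct mailbox {
    volatile uint32_t state; /* who owns the slot right now */
    volatile uint32_t len;   /* valid bytes in payload */
    volatile uint8_t payload[MB_PAYLOAD_MAX];
};

/* Master: write a request, then hand the slot to the slave. */
static bool master_post(struct mailbox *mb, const uint8_t *data, uint32_t len)
{
    if (mb->state != MB_IDLE || len > MB_PAYLOAD_MAX) {
        return false;
    }
    for (uint32_t i = 0; i < len; i++) {
        mb->payload[i] = data[i];
    }
    mb->len = len;
    atomic_thread_fence(memory_order_release); /* payload before state */
    mb->state = MB_REQUEST;
    return true;
}

/* Slave: if a request is pending, answer it (here: echo it unchanged). */
static bool slave_poll(struct mailbox *mb)
{
    if (mb->state != MB_REQUEST) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);
    /* A real slave would process payload[0..len) here and write a reply. */
    atomic_thread_fence(memory_order_release);
    mb->state = MB_RESPONSE;
    return true;
}

/* Master: copy the response out and free the slot. Returns bytes copied, or -1. */
static int32_t master_collect(struct mailbox *mb, uint8_t *out, uint32_t out_max)
{
    if (mb->state != MB_RESPONSE) {
        return -1;
    }
    atomic_thread_fence(memory_order_acquire);
    uint32_t n = mb->len;
    if (n > MB_PAYLOAD_MAX) n = MB_PAYLOAD_MAX; /* never trust a shared length */
    if (n > out_max) n = out_max;
    for (uint32_t i = 0; i < n; i++) {
        out[i] = mb->payload[i];
    }
    atomic_thread_fence(memory_order_release);
    mb->state = MB_IDLE;
    return (int32_t)n;
}
```

Because only one side may write while the state gives it ownership, the two cores never write the payload at the same time. With exchanges every few hundred milliseconds, simple polling of `state` on the slave side is likely sufficient.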
