System crash of A7 in combination with M4

Hello

I have observed sporadic system crashes when starting my application after a fresh boot. The problem occurs on roughly every 100th boot. When the crash occurs, the A7 core stops completely, without a kernel panic or anything similar. In that state I cannot even access the system via the console (UART). Interestingly, the M4 core is still running.
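With a failure rate of about 1 in 100 boots, a short run of clean boots proves very little about whether a change fixed the crash. The sketch below estimates how many consecutive clean boots are needed before a configuration can reasonably be called crash-free. It assumes boots are independent and that the failure rate really is constant; `clean_boots_needed` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import math


def clean_boots_needed(failure_rate, confidence):
    """Smallest n such that n clean boots in a row would be seen with
    probability below (1 - confidence) if the bug were still present.

    Assumption: each boot fails independently with probability failure_rate.
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must be between 0 and 1")
    # P(n clean boots | bug present) = (1 - failure_rate) ** n
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    for conf in (0.95, 0.99):
        n = clean_boots_needed(0.01, conf)
        print(f"{conf:.0%} confidence: {n} clean boots")
```

For a 1-in-100 failure rate this gives 299 clean boots for 95% confidence and 459 for 99%, so an automated reboot loop is probably the practical way to compare configurations.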

Further investigation showed that the crash never occurs when I don’t load the M4 firmware. It also doesn’t occur when I disable the RPMSG interface. This led me to suspect a resource conflict between the A7 and the M4. Then I stumbled over the following issue: https://www.toradex.com/community/questions/26134/use-remoteproc-and-rpmsg-on-m4.html

  1. Do you think the problem might be related to the linked thread?

  2. Am I doing something wrong in my device tree? For the M4 I use MCIMX7D_M4_ocram.ld (see attachment). Do I have to reserve the OCRAM section used by M4 in the device tree (see attachment)? Or is this already done in imx7s.dtsi?

Attachment

Do you have any idea on how to narrow down the issue?

Best regards,
Michael

In the meantime I explicitly listed the memory areas mentioned in MCIMX7D_M4_ocram.ld as reserved-memory in the Linux device tree. Unfortunately, the error still occurs. I also enabled dynamic debugging in my kernel and enabled debug logs for the tty_rpmsg driver. However, I didn’t see any logs from this driver prior to the error.

Do you have any idea what might be going wrong on my system or do you have any ideas on how to debug this problem?

Regards,
Michael

Greetings @michaelg

This could be related to the issue mentioned on that thread, yes. Did you make sure to disable all interfaces used by the M4 core on the Linux device tree? Also, can you try running your code on the TCM instead of OCRAM? Can you please check if your issue persists? That way we can narrow down the causes to a probable conflict related to the OCRAM.

Hi @gustavo.tx

Thank you for the support. Yes, I made sure to disable all interfaces used by the M4 core in the Linux device tree. I’m about to strip down the M4 code so that it fits into TCM.

One question that arose during that process: Is my bootloader environment configured correctly?

I noticed that until now I started my M4 with:

loadaddr=0x80800000
m4boot=ubi read ${loadaddr} m4firmware1 && dcache flush && bootaux ${loadaddr}

ubi part ubi && run m4boot

I studied the memory map in the iMX7 reference manual: My loadaddr is a section in DDRAM. Shouldn’t this be the address of either TCM or OCRAM? I tried to change it to the corresponding address in TCM and OCRAM with the result that I can’t start my M4 from there. Am I misunderstanding something?

Best regards,
Michael

Update: I probably got it: DDRAM is just an intermediate buffer before the code is getting loaded into OCRAM or TCM. Is that correct?

Hi @gustavo.tx

In the meantime I managed to strip down the code (just for testing) so that it fits into TCM. I successfully ran the stripped-down code in OCRAM, but as soon as I put it in TCM it doesn’t work (I get no console output from the M4). Do I need to reconfigure something besides the changed ld file?

Please find my ELF files attached.

Best regards,
Michael

Hi @gustavo.tx

The reason the M4 doesn’t work with the linker script “MCIMX7D_M4_tcm.ld” is the m_data section, which is located in TCMU. If I put m_data in OCRAM_GP and m_text in TCML, it works. What do I need to change in the device tree if I want to use TCMU for m_data, as defined in MCIMX7D_M4_tcm.ld?

Hello @michaelg!

Can you try reverting your changes to the device tree and try to run your M4 application on the TCM? As far as we’ve tested this is the “default” behavior and should work out of the box with our default device tree.

@michaelg,

That is correct. loadaddr corresponds to an address in RAM that U-Boot uses as an intermediate buffer before loading the code into the desired memory region.
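To see which memory regions a firmware image actually targets, it can help to list the loadable segments recorded in the ELF file. The following is a minimal sketch, assuming the firmware is a 32-bit little-endian ELF and that the loader copies each PT_LOAD segment from the buffer at loadaddr to the address stored in its program header. Whether a given loader honours the virtual or the physical address field is something to verify, so both are reported. `load_segments` is a hypothetical helper name.

```python
import struct
import sys

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf_bytes):
    """Return (vaddr, paddr, filesz, memsz) for each PT_LOAD segment.

    Assumption: the image is ELF32, little-endian (typical for Cortex-M4).
    """
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if elf_bytes[4] != 1 or elf_bytes[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF")
    # ELF32 header: e_phoff at byte 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", elf_bytes, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 42)
    segments = []
    for i in range(e_phnum):
        # First six words of an ELF32 program header.
        p_type, _p_offset, p_vaddr, p_paddr, p_filesz, p_memsz = struct.unpack_from(
            "<6I", elf_bytes, e_phoff + i * e_phentsize
        )
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_paddr, p_filesz, p_memsz))
    return segments


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_segments.py <firmware.elf>  (file name is up to you)
    with open(sys.argv[1], "rb") as f:
        for vaddr, paddr, filesz, memsz in load_segments(f.read()):
            print(f"vaddr=0x{vaddr:08X} paddr=0x{paddr:08X} "
                  f"filesz=0x{filesz:X} memsz=0x{memsz:X}")
```

Comparing the printed addresses with the memory regions in the linker script and with the regions Linux is allowed to use shows whether text and data really end up where the linker script says they should.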

Hi @gustavo.tx

In the meantime I managed to run the firmware entirely in TCM (text and data sections). However, the error still occurred in this setup.

Best regards,
Michael