Rpmsg not running stable on m4

andriscewo · July 10, 2018, 3:15pm

It seems like the missing channel creation [ 0.109011] virtio_rpmsg_bus virtio0: creating channel rpmsg-openamp-demo-channel addr 0x0 is related to where the memory data section (.bss, .heap and .stack) is located. If the memory data section is located in the OCRAM m_text section there is a very little chance that the initialization fails. If the memory data section is located in the m_data section there is a pretty good chance that it fails (especially after a full power down and waiting for a small amount of time).

Ways to reproduce. Run the demo_apps/rpmsg/str_echo_freertos after applying the following change to the linker file Linker file patch. It will quite often fail already at startup. Also some other rpmsg crashes that happen later might be related to this and we experience quite often that the rpmsg doesn’t initialize correctly. This issue appeared just lately and got more and more severe (We just never had the means to reproduce it until now). I suspect that it is because our data section is now partially in the m_data section anyway. Based on the strange behavior and the fact that it happens sometimes even if it is in the m_text section I would suspect a caching issue.

Anyway, I would be very grateful if someone could look into this with better knowledge of the architecture as the rpmsg channel is a crucial part of our software!

stefan.tx · July 13, 2018, 1:08pm

As far as I can tell the linker patch is wrong: The .bss section is initalized in the startup code. See platform/devices/MCIMX7D/startup/gcc/startup_MCIMX7D_M4.S.

Do you see issues with str_echo_freertos with U-Boot boot and without the linker file changes?

stefan.tx · July 13, 2018, 1:21pm

Note that U-Boot also does some magic in ft_board_setup (see boards/toradex/colibri_imx7/colibri_imx7.c). When you use remoteproc, you need to make sure that memory is reserved as if the M4 is started in U-Boot… Note that @stefan_e.tx made some additional changes in the device tree. However, for now maybe we should focus on the bootaux boot case.

andriscewo · July 13, 2018, 2:16pm

Well if the linker patch is wrong then there is still some issue with the linker file. The asserts in the linker file for the size check of the m_text section do not include the .bss section.

No, without the changes there are no issues.

UPDATE: The changes in the linker file are valid. The .bss only gets cleared in the startup script and actually clearing it resolves part of the issues (add #define __STARTUP_CLEAR_BSS). What still remains is that if some code changes there sometimes we still have the issue that rpmsg isn’t opened. The fix hasn’t highest priority anymore as it seems like we need to move to the ddr until we have time to optimize our code.

andriscewo · July 13, 2018, 2:18pm

I agree. For now we can focus on the bootaux case. I will remove the remoteproc part in the question

jaski.tx · July 17, 2018, 10:44am

Is bootaux running stable with m4 or not?

andriscewo · July 18, 2018, 1:40pm

With bootaux and the OCRAM linker file we had the issue that the RPMsg often didn’t start once we had bigger elf files. Adding #define __STARTUP_CLEAR_BSS in the startup script helped a lot, but it didn’t resolve the issue altogether. For now it’s fine, but in the future we will move back to the OCRAM

jaski.tx · July 20, 2018, 3:25pm

Thanks for the feedback.

andriscewo · August 20, 2018, 8:45am

I had a breakthrough on this issue thanks to @dry (Thanks a lot!)

To recreate the error I used this simple modification in the linker file error patch. In theory it should have no effect whatsoever, but with it rpmsg nearly never initialized.

So I checked the boot loader code and found an issue. This is the part that loads the code into memory

		debug("Loading phdr %i to 0x%p (%i bytes)\n",
		      i, dst, phdr->p_filesz);
		if (phdr->p_filesz)
			memcpy(dst, src, phdr->p_filesz);
		if (phdr->p_filesz != phdr->p_memsz)
			memset(dst + phdr->p_filesz, 0x00,
			       phdr->p_memsz - phdr->p_filesz);
		flush_cache((unsigned long)dst & ~(CONFIG_SYS_CACHELINE_SIZE-1),
			    ALIGN(phdr->p_filesz, CONFIG_SYS_CACHELINE_SIZE));

It has 2 issues:

It zeros out the load address of the .bss, .heap and .stack instead of the execution address
Only the cache of the code region is flushed

Why this fails now is because the above patch modified the load address of the .bss, .heap and .stack to be at the place where afterwards the .data section will reside (this also happens if the code just gets larger because the m_text and m_data region are continuous).
This means the place where the .data section resides will be overwritten by 0 before the m4 actually starts. This shouldn’t be an issue at all, but because the cache isn’t flushed part of the .data section can be overwritten with 0 after the m4 booted.

To verify that this is really the problem I applied following patch to just flush the whole region that was modified check assumption. With it the problem doesn’t happen anymore.

To fix the issue on the other hand I would just remove the code that clears the load address (proposed fix), but there are still some open questions for me namely the following:

Why was the .bss, .heap and .stack region 0 out? Isn’t this supposed to be done by the c startup code?
We have the problem also in the DDR and our code is not yet so big that the same situation would happen. I suspect it to be also be a cache issue [edit: based on dry’s feedback and a simple check timing issues could be ruled out]
Could you look at some options to protect the memory a bit better where the m4 is running? For the OCRAM this could be with the RDC (see also “3.2.4.3 Memory Region Map” in nxp manual ). As for the DDR region, would it be possible to handle that with the linux kernel and maybe mark the region as readonly?

dry · August 20, 2018, 11:12am

@andriscewo , so this is u-boot code now?

It zeros out the load address of the .bss, .heap and .stack instead of the execution address
…
Why was the .bss, .heap and .stack region 0 out?

Where do you see this happens? here ? :

memset(dst + phdr->p_filesz, 0x00, phdr->p_memsz - phdr->p_filesz);

This clears the leftovers of the memory segment it loaded too.

the rpmsg initialization in linux is hardcoded (it starts at boot and only if the m4 is running).

Yes and no, I guess. The first hardcoded part would be shared memory where both sides expect to find things - messages to each other. But it does not need M4 running, or more specifically, RPMsg enabled M4 code running on the other side before/while Linux boots. You can delay M4 startup & firmware boot. And do your imx remote proc boot you busy with trying to make work .. (otherwise, whats the point).

As for the DDR region,
This must be already handled in device tree by simply disabling the area used for the M4 code.

For the OCRAM this could be with the RDC ..

The RTOS hw init code does RDC setup for OCRAM, TCM, and so likely also happens on Linux side.

dry · August 20, 2018, 12:51pm

.. t if you do it there would be a check necessary if it doesn’t overwrite the memory section.

No, I don’t see where it can overwrite it. The limits are all there.

I would rather have the linux aware of the memory ..
If you install 1GIG if physical RAM, do you want your Linux to be aware of 2? I’m of view that device tree sets ‘hard’ limits.

When you do move to latest & greatest imxproc, that system works differently [ within the context of discussing remote proc/booting fw on M4, and rpmsg ] , and assumes the management of whole memory, and may allocate where remote “procedure” or processor (whatever) will run it’s code from.

it would be still to late (as long as you don’t want to do it in assembler in the startup code) .

It is in the C startup code of M4, as part of hw init (at least there was in the bsp of freertos released by fsl ). Why is this late?

The memory protection has to be in place before the m4 is started else there is still room for violations

I don’t understand you: do you mean M4 should not be touching RDC, like which memory regions are for M4 only, or which it shares?
Or you want bootloader to partition it before A7/Linux and M4/FreeRTOS boot?

stefan_e.tx · August 20, 2018, 1:49pm

Hi @andriscewo

I didn’t follow the whole conversation. Are you again trying to use remoteproc or does your problem also appear with loading from u-boot? I was able to run remoteproc but the problem is that it requires some ugly hacks and specially one I really don’t like. We then decided to try everything again with the mainline version of Linux but there I’ve another problem. How important is remoteproc for you? I thought you can live with the u-boot start for now?

To make Linux ignore a specific memory region you can mark it as reserved:

diff --git a/arch/arm/boot/dts/imx7d-colibri-emmc-eval-v3.dts b/arch/arm/boot/dts/imx7d-colibri-emmc-eval-v3.dts
index a2164e979606..c06a1bc7f2bb 100644
--- a/arch/arm/boot/dts/imx7d-colibri-emmc-eval-v3.dts
+++ b/arch/arm/boot/dts/imx7d-colibri-emmc-eval-v3.dts
@@ -15,9 +15,37 @@
        model = "Toradex Colibri iMX7D 1GB on Colibri Evaluation Board V3";
        compatible = "toradex,colibri_imx7d_emmc-eval", "toradex,colibri_imx7d_emmc", \
                     "fsl,imx7d";
+
+       reserved-memory {
+               #address-cells = <1>;
+               #size-cells = <1>;
+               ranges;
+
+               m4_reserved_sysmem1: cm4@81000000 {
+                       no-map;
+                       reg = <0x00910000 0x10000>;
+               };
+
+               m4_reserved_sysmem2: cm4@82000000 {
+                       no-map;
+                       reg = <0x00920000 0x20000>;
+               };
+
+       };
+
+       imx7d-cm4 {
+               compatible      = "fsl,imx7d-cm4";
+               memory-region   = <&m4_reserved_sysmem1>, <&m4_reserved_sysmem2>;
+               syscon          = <&src>;
+               clocks          = <&clks IMX7D_ARM_M4_ROOT_CLK>;
+       };
 };

 &usbotg2 {
        vbus-supply = <&reg_usbh_vbus>;
        status = "okay";
 };
+
+&uart2 {
+       status = "disabled";
+};

If you do this, Linux will ignore the addresses specified in the devicetree.

Regards,
Stefan

andriscewo · August 20, 2018, 2:18pm

No, I don’t see where it can overwrite it. The limits are all there.

The m_data load address is inside the m_text memory region. So it would have to consider the memory boundaries of the m_text region and not the m_data region. It doesn’t do that!

If you install 1GIG if physical RAM, do you want your Linux to be aware of 2? …

If you have 2 GIG of physical RAM, but it is shared between processors. Do you want some hardware attached to it (cache) that isn’t managed by anyone and could potentially write something into your RAM? This isn’t about remoteproc or not. The memory is there, the cache has access to it and I would reason that it’s more dangerous to not let your linux know that it’s actually there.

It is in the C startup code of M4, as
part of hw init (at least there was in
the bsp of freertos released by fsl
).

I just took a look at the startup code of the M4 from NXP. The M4 is assigned to a resource domain, but there isn’t done any memory protection for the OCRAM (doesn’t also make sense in general, because it can be in the DDR or TCM too).

Why is this late?

Well, the .data is loaded during the assembler startup and even if you assign the OCRAM memory used for the M4 to the memory domain of the M4 there is still some time where your .data region can be modified.

I don’t understand you: do you mean M4
should not be touching RDC…

No, it should not assign a memory region as default, but based on the address you can easily check if your elf file will be loaded into the OCRAM region. If it is done one could protect the .bss, .heap and .stack by the RDC. If this was done there wouldn’t have been this issue in the first place.

PS. I have the feeling you believe I’m making all this up. If you don’t believe me could you please try it out yourself. All code is available on the toradex repository ( u-boot-toradex.git - U-Boot bootloader for Apalis and Colibri modules branch 2016.11-toradex commit aca804c9ddadbf34a8ff82779e5598ec5e319f23 ). It is about the toradex bootaux method they use to load the elf file

andriscewo · August 20, 2018, 2:37pm

Sorry for the confusion. I just realized I kind of messed up by answering instead of commenting… I corrected the last answer, but I couldn’t change it for the other one as I can’t edit dry’s comments.

The problem is specific to u-boot, namely the bootaux loading mechanism by toradex. Based on the questions from dry I’m not sure he realized it, maybe I should have made it more clear somewhere. Could you take another look at it and tell me what you think about it? Does it make sense to you?

Looks like that could be something I was looking for. Will have to check the manuals what it guarantees.

stefan_e.tx · August 20, 2018, 3:11pm

Hi @andriscewo

I think we should start a new discussion and close all old ones because it is somehow obfuscated now…

One thing I know is that your patch for the linker file is invalid. If you put bss to data_start it messes up everything. bss needs to be in m_data and should come after data. I know you wanted to fix a remoteproc issue with that but basically it was only hiding another problem. The linkerscript as is (without your patch) is okay. I was also able to use remoteproc without your modification.

You are right that there is no protection for the OCRAM region available. This could be improved, however this is what NXP provides.

I don’t see a problem in the bootaux command if you use the default linker script. U-Boot doesn’t do any blackmagic in the background if this is your concern. Therefore I don’t see where someone should write to a memory region of the M4 besides Linux. This is fixed with the devicetree change or by the u-boot modification stefan pointed out.

To make things easier if you open a new thread please upload the modifications of freetros to github so that we take the same version and we talk about the same problem. There are too many patch files around to keep an overview.

Regards,
Stefan

andriscewo · August 20, 2018, 3:26pm

Hi @stefan.tx

Yes, I came up with the linker file because I thought there was something strange with the linker script. I know now everything is fine and the issue was located in the remoteproc check.
I use the linker script modification here for another reason. It is to show that the load address of something that shouldn’t get loaded does actually modify something!
I’m kind of pissed now because I’ve spent way to much time on this problem and I’m not even taken serious

stefan_e.tx · August 21, 2018, 8:57am

Hi @andriscewo

Sorry if you have the feeling we don’t take you serious. The problem on my side is that I simply don’t understand the problem anymore. I’m in your area today, are you in the office around 16:00? Maybe it would be good to sit side by side so that I understand the problem.

Regards,
Stefan

andriscewo · August 21, 2018, 9:06am

It’s fine. I already know how I can make an example without the linker file change. Probably it’s easiest if you just have an example where you actually see the problem. I will open a new thread and post it, it’s really getting messy in here.

stefan_e.tx · August 21, 2018, 9:10am

perfect, thanks!

stefan.tx · August 21, 2018, 9:16am

Indeed very interesting findings!

To be honest, I do not remember the exact reason we added the zeroing. I agree, any proper initialization code should zero out the regions necessary, so the boot loader does not need to do it. However, if properly done it should not hurt… It might have been added for debug reasons: If everything is zeroed out, it is easier to look at memory through debuggers/check with md.l afterwards.
Not sure what you exactly mean. I would assume that as soon as .data is used, strange things can begin to happen, unrelated to the firmware size..
Ideally, yes. Using the processors functionality (MMU on the A7 side, MPU on the M4 side) has the downside that you need to configure the protection twice, and the two protection need to be synchronized/agreeing on the exact settings. This seems somewhat brittle. Therefor I recommend using the RDC memory protection functionality. Unfortunately NXP does not provide examples and we had not the time to investigate in that area. There are preprocessor defines available in the FreeRTOS BSP (e.g. the RDC_MRC() macro), so it should be fairly straight forward to enable protection from M4 side…

Thanks a lot for debugging and the patch! I will create a ticket for the cache flush issue.