I developed my application on VF61 and it uses RPMSG communication between A5 and M4 cores.
Basically it works as expected, but the communication is not 100% reliable and some of the packets are lost.
In my scenario it’s always the A5 core that sends a message and waits for an answer from M4 core.
A5 core sends a new message every second with Rpmsg_Write(), then waits on the handle hReceiveGlobal created for "DataAvailableEvent". The timeout for this event is quite long (0.1 s).
When the event is received, Rpmsg_Read() is called.
As far as I can see, one or two messages in every 1000 (more or less) get no answer from the M4 core and the wait on the handle times out.
So I started to debug inside the M4 core using a GPIO: the M4 core sets it high every time it receives a new message from A5 via rpmsg_rtos_recv_nocopy() and sets it low after it sends the answer and calls rpmsg_rtos_recv_nocopy_free().
I see that M4 core receives every message sent by A5 core and it calls rpmsg_rtos_recv_nocopy_free() every time after 4 or 5 ms.
So I suspect two possible reasons for the missing answers:
some errors inside M4 functions (but in this case the behavior should be similar when A5 runs Linux)
some race conditions or whatever in CE 6 RPMSG functions
Can someone provide some ideas on how to debug this scenario in more depth?
Is it possible that you send me your application (or of course a stripped-down extract) so I can debug into the rpmsg library to search for the problem? I would need:
The M4 application
(binary should be fine for the moment)
Source code of the WinCe application
(I prefer a complete VS project)
You can mark your reply as private if you don’t want the public to see it.
You already figured out that the communication M4 → A5 is the issue.
You could try to find out whether the problem is related to bidirectional communication, or whether it is still there if you only send messages from the M4 to the A5, on a timer basis.
I spent the last couple of days debugging the issue and I found something really strange.
When the firmware on the M4 core runs standalone (i.e. without the application on A5), everything works fine.
In this scenario no communication happens between the two cores (I haven’t found a way to debug the M4 through JTAG while the application on A5 is running and communicating with the M4).
When the application on A5 loads and starts the firmware on M4, communication happens between the two cores and the OCRAM region from 0x3F040000 to 0x3F070000 is unexpectedly written (a few bytes every 512 bytes).
Since the RPMSG buffers start at 0x3F070000, I wonder whether a bug in the RPMSG communication could be using this portion of OCRAM.
The same memory region is not written when no communication happens between the two cores.
Can someone from Toradex side help with this issue?
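One way to narrow down which bytes change in the OCRAM region mentioned above is to fill it with a known sentinel before communication starts and scan it afterwards. The following is only a minimal sketch: the helper names `watch_fill` and `watch_scan` and the sentinel value are hypothetical, and it assumes nothing else legitimately uses the watched region (which has to be checked against the M4 linker script before trying this on the target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FILL_PATTERN 0xA5u /* hypothetical sentinel; any fixed byte works */

/* Fill the watched region with the sentinel byte.
 * On the target, base would point at the start of the suspicious
 * region (0x3F040000 in the thread); here it is just a pointer. */
static void watch_fill(volatile uint8_t *base, size_t len)
{
    assert(base != NULL);
    for (size_t i = 0; i < len; i++)
        base[i] = FILL_PATTERN;
}

/* Print every run of bytes that no longer holds the sentinel, together
 * with its offset modulo `stride`, and return the number of modified
 * bytes. A byte overwritten with the sentinel value itself is missed. */
static size_t watch_scan(const volatile uint8_t *base, size_t len, size_t stride)
{
    size_t changed = 0;
    size_t i = 0;

    assert(base != NULL && stride != 0);
    while (i < len) {
        if (base[i] == FILL_PATTERN) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && base[i] != FILL_PATTERN)
            i++;
        changed += i - start;
        printf("modified: offset 0x%lx, %lu byte(s), offset mod %lu = %lu\n",
               (unsigned long)start, (unsigned long)(i - start),
               (unsigned long)stride, (unsigned long)(start % stride));
    }
    return changed;
}
```

Calling `watch_fill()` early in the M4 firmware and `watch_scan()` with a stride of 512 after a few exchanges would show whether the writes always land at the same offset inside each 512-byte block. A stride that matches the RPMsg buffer size configured in the M4 firmware would point toward the buffer or vring setup rather than toward a stray pointer in application code.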
The current implementation does not allow skipping the firmware loading. I will check how easy it would be to add this feature.
It’s not only about the actual firmware loading (which is simple to avoid). But you probably want the relevant clocks to be enabled by the library before loading the firmware, and possibly the M4 firmware needs to be up and running before the actual channel initialization.
The function Rpmsg_Open() will neither load nor start the M4 firmware, so you can use JTAG to do this before calling Rpmsg_Open().
The approach works, but I found that it is not always stable. For example, I observed that my JTAG environment (I’m using a SEGGER J-Link) stops both cores while downloading the firmware, which sometimes caused the Visual Studio debugger to lose the connection.
I’m going to test it in the next few days and I’ll let you know.
In the meantime I’ve been doing other tests on my side and I found that, after fixing some kind of heap and/or stack overflow in my code for M4, I can build the DEBUG version of the firmware for M4 and load and execute it from the A5 core (using the official library 2.3-20181011).
Then I can use JTAG to debug the M4 in “connect-only” mode (connection to M4 while it’s running).
After this progress I was able to see the unexpected memory corruption that I described earlier.
One more thing, that could be useful to Toradex engineers:
I leave my application running for at least a couple of hours, with the A5 core sending a request to M4 every second; M4 prints the messages received from and transmitted to A5 on the UART.
Suddenly Rpmsg_Read() on A5 core returns a wrong buffer.
Both the UART and the debugger connected to M4 confirm that the M4 core received the right request and wrote the expected (right) answer to memory.
Could it be that for some reason the A5 core reads from the wrong message buffer in the FIFO?
Could the Rpmsg library return the memory address of the message buffer it reads from? That way I could compare it with the address where M4 writes (which I send to the UART).
I’ve double checked and it’s not a matter of memory leak on A5 side.
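Independently of buffer addresses, a wrong or stale reply can also be detected from the payload itself if every request carries a sequence number that the M4 echoes back, plus a checksum over the data. The sketch below is only an illustration under stated assumptions: the 8-byte header layout and the names `tag_pack` and `tag_check` are hypothetical and not part of any Toradex or RPMsg API, and the checksum is a plain 16-bit byte sum, not a CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: seq (4 bytes LE), length (2 bytes LE), sum (2 bytes LE). */
#define TAG_HEADER_SIZE 8u

typedef enum { TAG_OK, TAG_TOO_SHORT, TAG_BAD_CHECKSUM, TAG_WRONG_SEQ } tag_status;

/* Plain 16-bit sum of all payload bytes (wraps around on overflow). */
static uint16_t byte_sum16(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + p[i]);
    return s;
}

/* Write header + payload into out. Returns the total size, or 0 if it does not fit. */
static size_t tag_pack(uint8_t *out, size_t out_size, uint32_t seq,
                       const uint8_t *payload, uint16_t len)
{
    assert(out != NULL && (payload != NULL || len == 0));
    if (out_size < TAG_HEADER_SIZE + (size_t)len)
        return 0;

    uint16_t sum = byte_sum16(payload, len);
    out[0] = (uint8_t)seq;
    out[1] = (uint8_t)(seq >> 8);
    out[2] = (uint8_t)(seq >> 16);
    out[3] = (uint8_t)(seq >> 24);
    out[4] = (uint8_t)len;
    out[5] = (uint8_t)(len >> 8);
    out[6] = (uint8_t)sum;
    out[7] = (uint8_t)(sum >> 8);
    if (len != 0)
        memcpy(out + TAG_HEADER_SIZE, payload, len);
    return TAG_HEADER_SIZE + (size_t)len;
}

/* Validate a received buffer against the sequence number we are waiting for. */
static tag_status tag_check(const uint8_t *in, size_t in_size, uint32_t expected_seq)
{
    assert(in != NULL);
    if (in_size < TAG_HEADER_SIZE)
        return TAG_TOO_SHORT;

    uint32_t seq = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                   ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    uint16_t len = (uint16_t)(in[4] | (in[5] << 8));
    uint16_t sum = (uint16_t)(in[6] | (in[7] << 8));

    if (in_size < TAG_HEADER_SIZE + (size_t)len)
        return TAG_TOO_SHORT;
    if (byte_sum16(in + TAG_HEADER_SIZE, len) != sum)
        return TAG_BAD_CHECKSUM;   /* content was damaged in transit or in memory */
    if (seq != expected_seq)
        return TAG_WRONG_SEQ;      /* intact message, but not the reply we expected */
    return TAG_OK;
}
```

With this framing, `TAG_WRONG_SEQ` on the A5 side would suggest that an intact but older (or newer) buffer was handed back, while `TAG_BAD_CHECKSUM` would suggest the buffer content itself was overwritten.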
Hi @vix
Unfortunately I didn’t find too much time to debug your issue.
One thing I could easily do is generate a version of the RpmsgLib which outputs more information on the debug serial port. This will slow down the message transfer, but I think this is not relevant for your application.
Regards, Andy
There’s an additional buffer where the WinCe-RpmsgLib stores the incoming messages before you read them from your application. Therefore you see two sets of addresses displayed on UARTA.
BTW: I implemented the additional debug messages in a way that lets me activate them by setting a preprocessor #define DBGBUFFERS. I will remove this #define again for any public version of the RpmsgLib. If you need it again in the future, let me know and I will build a temporary library version again.
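For readers unfamiliar with this pattern, a compile-time debug switch of this kind can look roughly like the sketch below. It is not the actual RpmsgLib source: the macro `DBG_BUF` and the function `copy_incoming` are hypothetical, and the copy into a local buffer only mirrors the idea of the intermediate buffer described above, which is why two addresses would be printed per message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build with DBGBUFFERS defined (e.g. -DDBGBUFFERS or /DDBGBUFFERS)
 * to enable the extra output; otherwise the macro compiles to nothing. */
#ifdef DBGBUFFERS
#define DBG_BUF(tag, addr, len) \
    printf("%s: buf=%p len=%lu\n", (tag), (const void *)(addr), (unsigned long)(len))
#else
#define DBG_BUF(tag, addr, len) ((void)0)
#endif

/* Hypothetical receive step: copy a message from the shared buffer into
 * a local intermediate buffer, logging both addresses when enabled.
 * Returns the number of bytes copied, or 0 if the message does not fit. */
static size_t copy_incoming(uint8_t *dst, size_t dst_size,
                            const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > dst_size)
        return 0;
    DBG_BUF("rx shared", src, len);  /* address inside the shared RPMsg memory */
    memcpy(dst, src, len);
    DBG_BUF("rx local", dst, len);   /* address of the intermediate copy */
    return len;
}
```

Because the disabled macro expands to `((void)0)`, a release build carries no extra serial output and no timing penalty from the logging.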