Hi @vix
You also need to enable debug messages in the bootloader config block. First thing you will notice is, that you see some driver loading messages from the image booting into WinCe.
Regards, Andy
Hi @vix
You also need to enable debug messages in the bootloader config block. First thing you will notice is, that you see some driver loading messages from the image booting into WinCe.
Regards, Andy
Hi @andy.tx
I’ve alread enabled debug messages in the bootloader and I see them.
No messages from Rpmsg at all
Hi @vix
Sorry for not doing a final test before - I turned out that the #define DBGBUFFERS
got lost in the process of creating the preliminary release package.
Here’s the updated libraries:
Regards, Andy
Hi @andy.tx
now the messages are printed out as expected.
First of all, one strange (but I can’t say unexpected thing): this is the first message
RpmsgLib can buffer 1024 incoming messages.
Rpmsg: Map sharedMem phys 0x3f060000 -> logic 0x00bd0000
Rpmsg: Map vring[0] phys 0x3f070000 -> logic 0x00be0000
Rpmsg: Map vring[1] phys 0x3f074000 -> logic 0x00bf0000
Does this mean that 0x3f060000 is used as shared memory between cores?
And so M4 should not use it (as I found during my debug)?
A couple of years ago I asked about strange memory overwriting at addresses above 0x3f060000
(see here).
The answer from Toradex side was not clear (at that time).
Can we say now that memory between 0x3f060000
and 0x3f064000
is used by rpmsg library?
And so M4 core can’t use it?
Is it possible to add this info to the documentation clearly?
Another minor thing: the firsl line is
RpmsgLib can buffer 1024 incoming messages
Is this right? Shouldn’t be 32 incoming messages (512 bytes each)?
Dear @vix
I analyzed the rpmsg behavior in more detail in the recent days, so I meanwhile have a better insight int what’s happening. For example there are multiple buffers working together, and which are not perfectly adjusted to each other in size. However, I didn’t want to change too much now.
Here’s a (still somewhat simplified) view:
0x3f060000
contains the actual message data (512 bytes each)You are correct: memory between 0x3f060000
and 0x3f064000
[there was a typo in your comment above] is used by rpmsg library, and therefore not available for anything else. The space reserved is used for 32 buffers, 512 bytes each.
I will add this to the documentation.
The 1024 messages reported in the log output are about the local buffer inside the RpMsgLib - the library reads the messages from the M4 in an interrupt routine. Your application could read these 1024 messages long after they arrived.
Regards, Andy
Hi @andy.tx
your investigation clarifies some points.
I see that the log messages from the library are like this:
Rpmsg: got M4 data[0] @shared 0x00bd0210, store @local buf 0x00525ec0
Rpmsg: App read local buf @0x00525ec0
Rpmsg: got M4 data[1] @shared 0x00bd0410, store @local buf 0x00df7880
Rpmsg: App read local buf @0x00df7880
Can you clarify a little bit hte meaning of the seveal addresses?
I think that @shared
points to the original buffer in the shared memory.
Otherwise store @local
and App read local
buf should be the memory addresses where rpmsg reads the message.
Is this right?
Which kind of checks should I do on this addresses?
I have a full log (more or less 2500 messages) which exhibits the issue and I like finding the reason.
Dear @vix
Your assumptions are correct.
You have to look at these addresses in context with the initialization messages in the beginning:
Rpmsg: Map sharedMem phys 0x3f060000 -> logic 0x000f0000
Rpmsg: Map vring[0] phys 0x3f070000 -> logic 0x00100000
Rpmsg: Map vring[1] phys 0x3f074000 -> logic 0x00110000
Rpmsg: got M4 data[0] @shared 0x000f0210, store @local buf 0x00043ee0
Rpmsg: App read local buf @0x00043ee0
Rpmsg: got M4 data[1] @shared 0x000f0410, store @local buf 0x00043ee0
Rpmsg: App read local buf @0x00043ee0
Please consider
Map sharedMem phys 0x3f060000 -> logic 0x000f0000
, so all accesses to the memory range 0x000f0000
further down do actually access the physical memory @ 0x3f060000
. The translated address can change, in your example above, it seems to be 0x00bd0000
.This means you can check all addresses @ 0x000f0000
, translate them and verify that your M4 software has written to the translated address just before.
Regards, Andy
Hi @andy.tx
translated addresses seems ok because at every message the address increments by 0x200
(i.e. 512 bytes) and it overlaps after 0x2000 (as expected).
Does the application (i.e. Rpmsg_Read
) read data from the translated addresse or from the local buffers?
Is it possible to print out the number of received bytes?
Rpmsg: got M4 data[0] xxx bytes @shared 0x000f0210, store @local buf 0x00043ee0
It would be useful for my investigation
Dear @vix
Rpmsg_Read()
itself reads only from the local buffer.got M4 data[0] ...
.It is possible to print the number of bytes, but it is probably not helpful: The message data from M4 to A5 always uses exactly 512 bytes. The number of bytes would just be an indication how many of the 512 bytes are useful data, and how many are unused. If I am correct these unused bytes are cleared to zero on the M4 side.
Regards, Andy
Hi @andy.tx
I modified something in my application and I added a messageID to every message sent from A5 to M4.
M4 copies this messageID into the answer message and sends to A5.
I found what I expected: randomly (sometimes after some seconds, sometimes after minutes) A5 core is one message back: it sends messageID = X
but it reads back messageID = X-1
.
The translated addresses seems ok.
How can I explain this behavior?
Do you mean copying inside ToradexCElibraries\libs\lib
folder (side by side to Toradex_CE600 (ARMv4I)Release
folder)?
Hi @andy.tx
my investigation is going on and I have more info (useful, I hope).
For some reasons (I can’t guess why) sometimes WaitForSingleObject
returns an error (I’m going to investigate which kind of error) even if M4 core received the rpmsg and answered in the expected way. Let’s suppose this message has ID = X
.
The following call to Rpmsg_Write() sends a new message to M4 core (ID = X+1
) but hte matching Rpms_Read() returns the previous message ID = X
(which has been received properly).
And so on. Tx and Rx are now misaligned…
I’m going to investigate on WaitForSingleObject
, but you’re welcome if you have ideas.
Dear @vix,
The most common error is a TIMEOUT. And as it seems that your application continues successfully after this error, you simply need to change your algorithm to repeat the WaitForSingleObject ()
if it was not successful (or use a longer timeout).
Regards, Andy
Yes, I confirm it’s a timeout (100 ms).
The strange thing (from my side) is that I used some GPIO to debug M4 core and I’m sure the answer takes less than 5 ms. And 100 ms seems longer enough to me.
Anyway I’m going to increse it and see what happens.
Dear @vix
If a WinCe thread with a higher priority than your application is busy for more than 100ms, your application is blocked for this time. Even if there are multiple busy threads of equal priority, they will alternate in a cycle of 100ms each.
You can either enlarge the timeout, or give your application a higher priority.
Regards, Andy
Since the topic is quite long, I summarize the final results.
The sporadic issues I saw came from two different issues:
0x3f060000
and 0x3f064000
is used by rpmsg library, and therefore not available for anything else. The space reserved is used for 32 buffers, 512 bytes each and must be left empty by the M4 firmware.Rpmsg_Read()
is sometimes too short. I increased it to 500 ms (that is ok for my needs)Fixing these two issues solved my problem.