I’ve just upgraded CE6 from release 1.6b4 to 1.7b2 without modifying my application (which uses Toradex CE Libraries 2.3-20181011).
I’ve noticed that the behavior of rpmsg communication changed under some rare conditions:
if I send packets from A5 to M4 at a high rate (i.e. one packet every 200 ms), an answer packet from M4 sometimes gets lost.
Sadly, it is a situation similar to the old one that I was able to fix one year ago.
Maybe this new release of CE6 needs a higher timeout, or there is some other hidden problem.
I need time to investigate, but any ideas in the meantime would be useful.
Thank you for contacting the community!
As you probably know, we release the Rpmsg_Demo application along with the Toradex CE Libraries 2.3-20181011 release. Could you test the standalone Rpmsg demo application and let us know if you see any difference between 1.6 and 1.7b2?
As you know, most of our changes were in the Flash filesystem and driver layer: https://developer.toradex.com/software/windows-embedded-compact/vf50-vf61-wec-software/release-details. If you are accessing the Flash filesystem or NAND block regions at the same time as Rpmsg, that may be the reason for the problem. Hence we ask you to test with the standalone Rpmsg demo application and give us your feedback.
If you modify our demo application to reproduce the issue, please share it with us. We will look into the issue using that.
Hello @raja.tx, I need some time to investigate, but I will let you know.
Based on the first investigations I suspect a kind of multithread issue.
@raja.tx can you confirm that the RPMsg library is not thread-safe (and so special care must be taken when it is used in a multithreaded application)?
I need your support to discuss a couple of ideas that came to my mind on the multi-thread usage of Toradex CE Libraries.
Yes, we can schedule a Skype meeting tomorrow morning at 10:00 CET.
Is this ok for you?
You can send me a private message to organize the call
I sent you a private message with some additional information and some questions.
I wrote a reply by private mail. You can reply to the private mail or here.
If you find the solution, could you update us? Then I will update this thread.
For sure I’m going to update you.
I need time because the issue is really difficult to debug, but I have some ideas.
Is it possible to compile my application in release (with VS2008) but link it with the debug version of ToradexCElibraries?
I mean the *.lib files in the
libs\lib\Toradex_CE600 (ARMv4I)Debug folder.
Of course, you can do this.
Delete the following line from the TdxExe.vsprops file, if you are using our library release demo:
"AdditionalLibraryDirectories="..\..\libs\lib\$(PlatformName)$(ConfigurationName)\; ..\..\libs\dll\$(PlatformName)$(ConfigurationName)\""
Then set this one under VS project properties → Linker → General:
AdditionalLibraryDirectories="..\TdxLibs\libs\lib\Toradex_CE600 (ARMv4I)Debug"
After that you can build the application with the debug library linked.
I need some time to narrow down the real origin of the issue, but in the meantime:
is the release version of ToradexCElibraries (especially rpmsg) built with some optimization (/O2, /O3) that can produce a violation of the “as-if” rule?
I wonder if a missing volatile modifier or something like that could be the reason…
From my test it seems that the rpmsg fails with the libraries built in release, and it doesn’t fail with the libraries built in debug.
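To make the volatile suspicion concrete, here is a minimal sketch of a ring buffer shared between two cores. It is only an illustration: shared_ring_t, ring_publish and ring_try_read are hypothetical names, the depth of 16 is an assumption, and nothing here is taken from the actual ToradexCELibraries implementation. Without volatile, an optimizing compiler is allowed to keep an index in a register and never re-read the shared memory, which would match a "works in debug, fails in release" pattern.

```c
#include <stdint.h>

#define RING_SLOTS 16u   /* hypothetical depth, not the library's real value */

/* Every field can be changed by the other core, so every access must
 * really go to memory: hence volatile on all of them. */
typedef struct {
    volatile uint32_t write_idx;          /* advanced by the producer only */
    volatile uint32_t read_idx;           /* advanced by the consumer only */
    volatile uint32_t slots[RING_SLOTS];  /* message payloads              */
} shared_ring_t;

/* Producer side: returns 1 on success, 0 when the ring is full. */
static int ring_publish(shared_ring_t *r, uint32_t msg)
{
    uint32_t w = r->write_idx;
    if (w - r->read_idx >= RING_SLOTS) {
        return 0;                         /* no free slot */
    }
    r->slots[w % RING_SLOTS] = msg;       /* 1) write the payload */
    /* On real hardware a memory barrier belongs here: volatile stops the
     * compiler from caching values, but it does not order the accesses
     * as seen by the other core. */
    r->write_idx = w + 1u;                /* 2) then publish the index */
    return 1;
}

/* Consumer side: returns 1 and stores the message in *out, or 0 if empty. */
static int ring_try_read(shared_ring_t *r, uint32_t *out)
{
    uint32_t rd = r->read_idx;
    if (rd == r->write_idx) {
        return 0;                         /* nothing new */
    }
    *out = r->slots[rd % RING_SLOTS];
    r->read_idx = rd + 1u;
    return 1;
}
```

If the payload write and the index update can be observed in the opposite order, the consumer reads a slot that still holds an older message, so volatile alone may not be sufficient.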
One more question: does the rpmsg implementation on the CE6 side use a FIFO for the messages whose length is either 16 or 32?
As you stated, this would be a release build compiler optimization problem.
Documentation link: https://docs.microsoft.com/en-us/cpp/build/reference/o1-o2-minimize-size-maximize-speed?view=vs-2017
Could you share a reproducible project with us? It would help me compare the assembly code generated for the release and debug builds, and I will let you know if I find something.
Are you asking about the library FIFO length? If so, the library message buffer can store 1024 messages.
It’s not so easy for me to give an answer, unfortunately.
I can say that I’ve never seen the issue when my application is built in Debug (both with “no optimization” and with /O2) and statically linked to the debug version of ToradexCELibraries.
I saw the issue once with my application built in Release with “no optimization” and statically linked to the release version of ToradexCELibraries.
It seems that the build configuration of my own code doesn’t make a difference, but the debug vs. release version of ToradexCELibraries does.
But as I wrote above, I cannot be sure, because the issue shows up only from time to time, so I need a lot of time after every change.
At the moment I can’t share a reproducible project, but I’m doing my best.
I asked about the FIFO length because I once saw a strange issue: M4 answers all the requests (I put a message ID in every message), and suddenly Rpmsg_Read() returned an old message (its ID was 16 less than the expected value).
I’m sure that ID had been received 16 messages before, so I suspected some error in the pointer to the FIFO.
This is all I can say at the moment.
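For reference, the check on the message ID can be sketched roughly as follows. The helper names reply_lag and lag_matches_depth are hypothetical and only illustrate the idea: compute how far behind the received ID is, then test whether that lag is a whole multiple of a suspected FIFO depth (16 or 32 here are guesses, not confirmed values).

```c
#include <stdint.h>

/* How many messages the reply is behind the request.
 * 0 means the reply matches. Unsigned subtraction keeps the result
 * correct even if the 32-bit ID counter wraps around. */
static uint32_t reply_lag(uint32_t tx_id, uint32_t rx_id)
{
    return tx_id - rx_id;
}

/* Nonzero when a non-zero lag is a whole multiple of the suspected depth,
 * which would point at a stale slot being read again. */
static int lag_matches_depth(uint32_t lag, uint32_t depth)
{
    return depth != 0u && lag != 0u && (lag % depth) == 0u;
}
```

A lag that always equals the same power of two would be more consistent with a reused buffer slot than with a randomly lost packet.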
I’m quite ready to share a project (not a small one, unfortunately) that reproduces the rpmsg issue after a couple of hours.
Should I send you a private message with all the files necessary?
You can run my application on the Colibri Evaluation Board, but you need two additional hardware devices:
- 4.3" display WQVGA 480*272 (I think you can find one of them)
- a matrix keypad connected to some specific IO pins (and I think this can be a problem)
What do you think?
Yes, I have a 4.3-inch display. I do have a matrix keypad, or I will try some hack to reproduce the issue with your application.
I will let you know if the matrix keypad really blocks me from reproducing the issue.
You are from Italy, right? I hope you are well. Be safe during this tough period!
I’ll take some more time to see if I can get an application that works without the matrix keypad.
It only needs two key presses: the first one to leave the main page and the second one to start the acquisition. Nothing else.
Yes, I’m from Italy. I’ve been doing my best.
I confirm the sporadic issue in getting the right answer message.
I send one message from A5 to M4 every second, and the ID in the message starts from 1.
Here is what I see on the debug console after some hours:
Rpmsg_Read() returned the wrong message id (tx : 10683 - rx : 10667)
Rpmsg_Read() returned the wrong message id (tx : 12462 - rx : 12446)
Rpmsg_Read() returned the wrong message id (tx : 20254 - rx : 20238)
You can see that in all of them the rx ID is 16 less than the tx ID.
All the other messages have been properly received.
Maybe this information can help you investigate inside the rpmsg library when optimization is on.
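A constant offset of 16 is what one would expect if a buffer ring with 16 slots were read before the new payload had landed in its slot. The sketch below is only a model of that hypothesis: the depth of 16 and the names ring_model_t, ring_store, ring_load and model_read are my assumptions, not something confirmed about the library internals.

```c
#include <stdint.h>

#define SLOTS 16u   /* hypothetical ring depth, chosen to match the observed offset */

typedef struct {
    uint32_t id[SLOTS];   /* each slot holds the ID of the last message written to it */
} ring_model_t;

/* Store a message in the slot its sequence number maps to. */
static void ring_store(ring_model_t *r, uint32_t msg_id)
{
    r->id[msg_id % SLOTS] = msg_id;
}

/* Read the slot where the reply for expected_id should be. */
static uint32_t ring_load(const ring_model_t *r, uint32_t expected_id)
{
    return r->id[expected_id % SLOTS];
}

/* Returns the ID the reader sees for tx_id.
 * payload_landed = 1: the payload write is visible before the read.
 * payload_landed = 0: the reader is notified too early and reads the slot
 *                     while it still contains the previous occupant. */
static uint32_t model_read(uint32_t tx_id, int payload_landed)
{
    ring_model_t r = {{0}};
    uint32_t i;

    for (i = 1u; i < tx_id; ++i) {
        ring_store(&r, i);            /* all earlier messages went through fine */
    }
    if (payload_landed) {
        ring_store(&r, tx_id);
    }
    return ring_load(&r, tx_id);
}
```

In this model, model_read(10683, 0) yields 10667, the same pair as in the first log line, while model_read(10683, 1) yields 10683.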
@raja.tx Maybe you can use a USB keyboard on the Evaluation Board.
It should work.
I will let you know as soon as possible.