Rare rpmsg issue on VF61 with CE6

Several years ago I struggled with rpmsg issues on Colibri VF modules running CE6.
I opened some topics at the time and worked together with Toradex engineers to investigate and patch the rpmsg library for CE6.
As far as I can tell, Windows Embedded CE 6.0 v 1.8 together with Toradex CE Libraries v 2.5 works well for the most part.
By that I mean we have produced several hundred devices without any problems.

  • in every device, the rpmsg communication runs continuously, with one message every second
  • the Cortex-A includes a counter in every message (the counter is incremented by 1 with each message)
  • the Cortex-M copies the counter from the request into its response, so the counter in the response matches the counter in the request
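
To make the counter scheme above concrete, here is a minimal C sketch of it. The message layout and all `demo_` names are hypothetical and only for illustration; they are not taken from the real firmware or from the Toradex rpmsg library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout; the real payload format is not shown in this post. */
typedef struct {
    uint32_t counter;     /* incremented by the Cortex-A for every request */
    uint8_t  payload[8];  /* placeholder for application data */
} demo_msg_t;

/* Cortex-A side: build the next request, incrementing the counter by 1. */
static demo_msg_t demo_make_request(uint32_t *counter)
{
    demo_msg_t req;
    memset(&req, 0, sizeof req);
    *counter += 1u;
    req.counter = *counter;
    return req;
}

/* Cortex-M side: copy the counter from the request into the response. */
static demo_msg_t demo_make_response(const demo_msg_t *req)
{
    demo_msg_t rsp;
    memset(&rsp, 0, sizeof rsp);
    rsp.counter = req->counter;
    return rsp;
}
```

With this scheme, a correct response always carries the same counter value as the request that triggered it.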

However, on a few devices (4 or 5) the rpmsg communication fails quite often (roughly once an hour) in the following way:

  • the hReceiveGlobal event, created for "DataAvailableEvent", is signaled as expected
  • but when Rpmsg_Read() is called, the counter in the response is 1 less than the counter in the request.
    This means that the rpmsg library returns the previous response a second time.
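
The check that exposes the failure can be sketched roughly as follows. This is a simplified C illustration with hypothetical names (`demo_check_response` and the `DEMO_RSP_` values are not part of the rpmsg library); it only compares the two counters after the response has been read.

```c
#include <stdint.h>

typedef enum {
    DEMO_RSP_OK,       /* response counter matches the request counter */
    DEMO_RSP_STALE,    /* response is exactly one behind: previous response again */
    DEMO_RSP_MISMATCH  /* any other difference */
} demo_rsp_status_t;

/* Compare the counter sent in the request with the one found in the response. */
static demo_rsp_status_t demo_check_response(uint32_t request_counter,
                                             uint32_t response_counter)
{
    if (response_counter == request_counter)
        return DEMO_RSP_OK;
    /* Unsigned subtraction keeps the check valid across counter wrap-around. */
    if ((uint32_t)(request_counter - response_counter) == 1u)
        return DEMO_RSP_STALE;
    return DEMO_RSP_MISMATCH;
}
```

On the affected devices, the failure described above corresponds to the `DEMO_RSP_STALE` case.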

I can confirm that if a SoM shows this issue, it shows it quite often.
Many SoMs never show it at all.
I suspect a clock, timing, or concurrency problem in the library itself, but it is almost impossible for me to debug this.

The customer who has one of these faulty devices is not happy and is complaining.
The only solution we have is to replace the device (free of charge).

Can someone from Toradex help investigate this?