Hi
This question is not directly related to the Colibri VF module, only in that my software runs on it.
I’m facing a very strange behavior that causes the system to go dark after some error, which makes it impossible to debug because there is no way to access anything.
The story goes like this:
Of 4 identical test systems, 3 “crash” after 4 to 10 hours, whereas the fourth runs for days without a problem. Sometimes all of them crash, but never at the same time, and the crash times are always spread over a broad window.
This “crash” somehow kills the network stack, and with it my only connection to the system, preventing me from acquiring any kind of runtime information.
If there is an exception message, I am unable to persist it. My log files do not show anything out of the ordinary.
I have tried various angles without any success. I reduced the stack to 1/10 of its original size, but the application does not crash earlier; doubling the stack size does not make it crash later either. Therefore, I don’t believe this is a stack overflow.
When I run a debug build, the software runs indefinitely without any problems, which crushes my hope of getting a useful core dump. The release build, on the other hand, never actually writes a core dump.
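On why the release build leaves no core file: according to core(5), the kernel skips the dump when the RLIMIT_CORE soft limit is 0 (a common default, equivalent to `ulimit -c 0`), when the process is not dumpable, or when the core_pattern target cannot be written. A minimal sketch, assuming Linux and a hypothetical enable_core_dumps() called at the start of main(), that raises the soft limit to the hard limit, marks the process dumpable, and prints core_pattern for inspection:

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit (RLIMIT_CORE,
 * i.e. "ulimit -c") to the hard limit and mark the process dumpable.
 * If the hard limit is 0, an unprivileged process cannot raise it, so
 * core dumps stay disabled even though this returns 0.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* A process that changed credentials (e.g. setuid) is not dumpable by default. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    return 0;
}

/* Print the kernel's core_pattern so the target path can be checked:
 * it must exist, be writable, and ideally survive a power cycle. */
void print_core_pattern(void)
{
    char line[256];
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL) {
        perror("fopen(core_pattern)");
        return;
    }
    if (fgets(line, sizeof line, f) != NULL)
        printf("core_pattern: %s", line);
    fclose(f);
}
```

If the hard limit is also 0, only a privileged process (or the start script, via `ulimit -c unlimited`) can raise it. If core_pattern points into a tmpfs (as /tmp often is), the file would not survive a power cycle, so a directory on persistent storage is the safer target.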
Debugging on another platform is nearly impossible, since the crash only seems to manifest itself when other hardware, such as a motor, is in use, and I cannot use that hardware with anything other than our own proprietary motherboard.
At this point, after nearly 2 months of trial and error, I don’t know what else to try. Without reliable access to the system, I can only speculate about the cause, and I can never really test my hypotheses.
I’m looking for any kind of input that could give me a new idea on how to tackle this issue. My money is on a SIGSEGV, i.e. some problem when accessing memory.
Please give me some insight into how you would handle a problem such as this.