Hello all,
We are porting our application from WinCE to embedded Linux and have encountered a crash that is proving extremely difficult to track down. It is reproducible, but because our application is multi-threaded, we have not been able to pinpoint where the crash originates.
We can debug remotely with Visual Studio, but whenever the crash happens, the stack trace does not give any useful information:
> libc.so.6![Unknown/Just-In-Time compiled code]
libc.so.6!raise
libc.so.6!abort
libc.so.6![Unknown/Just-In-Time compiled code]
We installed a signal handler in the application that uses backtrace() and backtrace_symbols_fd(), but it did not give us any more information than the debugger did:
malloc(): mismatching next->prev_size (unsorted)
Error: signal 6:
/var/iris/application/FireFox.out(_Z13signalHandleri+0x28)[0x13f9e90]
/lib/libc.so.6[0x4de1b260]
/lib/libc.so.6[0x4de0c1d6]
/lib/libc.so.6(gsignal+0x6b)[0x4de1a874]
/lib/libc.so.6(abort+0xa3)[0x4de0bc60]
The first line above is error output from libc (it is not from our code), while the second line is printed by our signal-handler function. The remaining lines come from backtrace_symbols_fd() and show essentially the same information as the debugger. The only new information is the frame for our signal-handler function.
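For readers unfamiliar with this approach, a handler of the kind described above might look roughly like the sketch below. It is an illustration that assumes glibc's <execinfo.h>, not our exact code: the helper name installHandler, the frame limit of 64, and the fixed message text are hypothetical.

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols_fd() (glibc-specific)
#include <signal.h>
#include <unistd.h>
#include <cstring>

// Illustrative handler: dumps the current call stack to stderr, then
// re-raises the signal with the default action so the process still aborts.
void signalHandler(int sig)
{
    void* frames[64];                       // hypothetical frame limit
    int count = backtrace(frames, 64);

    // write() is async-signal-safe; printf() is not.
    const char msg[] = "Error: caught fatal signal\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)written;

    // Writes one line per frame directly to the descriptor, without
    // calling malloc(). Note that backtrace() itself may still allocate
    // the first time it is called, so it is not strictly signal-safe.
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    // Restore the default disposition and re-raise the same signal.
    signal(sig, SIG_DFL);
    raise(sig);
}

// Hypothetical helper: registers signalHandler for one signal number.
// Returns true on success.
bool installHandler(int sig)
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = signalHandler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, nullptr) == 0;
}
```

In this sketch the handler would be registered once at startup, for example with installHandler(SIGABRT) and installHandler(SIGSEGV). Function names only appear in the output for symbols that are exported dynamically, which with GCC usually means linking with -rdynamic.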
The crash happens when the application is destroying an object that has several sub-objects running on their own threads. The error output from libc tells us that the crash is caused by a memory-management problem. However, when I set the environment variable MALLOC_CHECK_ to 1, 2, or 3, the crash doesn’t happen, but I also do not get any diagnostic or warning messages, so it doesn’t help us find the malloc/free issue.
Does anyone know how to get a full (or at least useful) backtrace from SIGABRT? I tried the same signal handler with SIGSEGV and there it works: it shows the stack frames from our application’s code and not just those inside libc.
Also, does anyone have an idea why setting the MALLOC_CHECK_ variable appears to “fix” the crash? I say “fix” because with MALLOC_CHECK_=2 or MALLOC_CHECK_=3, the program should still abort() if problems are found. But it did not, and we never saw warnings about any malloc() problems. It is also a bit strange that MALLOC_CHECK_=0 is supposed to silently ignore errors, yet in that case the crash (SIGABRT) still happens, the same as if MALLOC_CHECK_ were not defined.
Here is the version we are using:
GNU C Library (GNU libc) stable release version 2.30.
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 9.2.0 20190812 (iris-r1).
libc ABIs: UNIQUE ABSOLUTE
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
Thanks in advance,
Dennis Lauzon