No backtrace on SIGABRT caused by malloc error

Dennis · December 8, 2021, 3:17pm

Hello all,

We are in the process of porting our application from WinCE to embedded Linux, and we have encountered a crash that is proving extremely difficult to track down. It is reproducible, but because our application is multi-threaded, we have not been able to pinpoint where the crash occurs.

We are able to do remote debugging with Visual Studio, but whenever the crash happens, the stack trace does not give any useful information:

>	libc.so.6![Unknown/Just-In-Time compiled code]	
 	libc.so.6!raise	
 	libc.so.6!abort	
 	libc.so.6![Unknown/Just-In-Time compiled code]

We installed a signal handler in the application that uses backtrace() and backtrace_symbols_fd(), but it did not give us any more information than the debugger did:

malloc(): mismatching next->prev_size (unsorted)
Error: signal 6:
/var/iris/application/FireFox.out(_Z13signalHandleri+0x28)[0x13f9e90]
/lib/libc.so.6[0x4de1b260]
/lib/libc.so.6[0x4de0c1d6]
/lib/libc.so.6(gsignal+0x6b)[0x4de1a874]
/lib/libc.so.6(abort+0xa3)[0x4de0bc60]

The first line above is an error message printed by libc (not by our code), while the second line is printed by our signal handler. The remaining lines come from backtrace_symbols_fd() and show essentially the same information as the debugger; the only new frame is our signal handler itself.
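For anyone reproducing this setup, a handler of the kind described above might look roughly like the sketch below. It is not the poster's actual code: the names crash_handler, dump_backtrace, write_str, write_uint and install_crash_handler are hypothetical, and it assumes glibc's <execinfo.h>. It prints with write() and backtrace_symbols_fd() rather than printf() or backtrace_symbols(), since the heap may already be corrupted when malloc() aborts, and it calls backtrace() once at startup because the glibc backtrace(3) man page notes that the first call may load libgcc and allocate memory. On its own it will not get past the libc frames shown above.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_BT_FRAMES 64

/* Writes a NUL-terminated string with write(); avoids stdio inside the handler. */
static void write_str(int fd, const char *s)
{
    ssize_t r = write(fd, s, strlen(s));
    (void)r; /* nothing useful to do on failure inside a crash handler */
}

/* Writes an unsigned decimal number without printf(). */
static void write_uint(int fd, unsigned v)
{
    char buf[12];
    size_t i = sizeof buf;
    do {
        buf[--i] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    ssize_t r = write(fd, buf + i, sizeof buf - i);
    (void)r;
}

/* Writes the current call stack to fd and returns the number of frames captured.
   backtrace_symbols_fd() writes straight to fd instead of returning malloc'ed
   strings, which matters when the heap may already be corrupted. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_BT_FRAMES];
    int n = backtrace(frames, MAX_BT_FRAMES);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

static void crash_handler(int sig)
{
    write_str(STDERR_FILENO, "Error: signal ");
    write_uint(STDERR_FILENO, (unsigned)sig);
    write_str(STDERR_FILENO, ":\n");
    dump_backtrace(STDERR_FILENO);
    /* SA_RESETHAND already restored the default action and SA_NODEFER leaves the
       signal unblocked, so re-raising terminates the process with the same signal. */
    raise(sig);
}

/* Installs crash_handler for SIGABRT and SIGSEGV; returns 0 on success, -1 on error. */
static int install_crash_handler(void)
{
    /* The first backtrace() call may load libgcc and allocate memory,
       so do it here rather than inside the handler. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    if (sigaction(SIGABRT, &sa, NULL) != 0 || sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

install_crash_handler() would be called early in main(), before any worker threads start. Re-raising the signal after printing keeps the default behaviour (termination and a possible core dump), so the handler only adds output and does not change how the process dies.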

The crash happens when the application destroys an object that owns several sub-objects, each running on its own thread. The libc error message tells us that the crash is caused by a memory-management issue. However, when I set the environment variable MALLOC_CHECK_ to 1, 2, or 3, the crash does not happen, and I also get no diagnostic or warning messages, so it does not help me find the malloc/free issue.

Does anyone know how to get a full (or at least useful) backtrace from SIGABRT? I tried the same signal handler with SIGSEGV, and there it works: it can read the stack frames from our application’s code, not just those inside libc.

Also, does anyone have an idea why setting MALLOC_CHECK_ appears to “fix” the crash? I put “fix” in quotes because with MALLOC_CHECK_=2 or MALLOC_CHECK_=3 the program should still call abort() if problems are found. It did not, and we never saw warnings about any malloc() problems. It is also a bit strange that MALLOC_CHECK_=0 is supposed to silently ignore errors, yet in that case the crash (SIGABRT) happens, just as if MALLOC_CHECK_ were not set.

Here is the version we are using:

GNU C Library (GNU libc) stable release version 2.30.
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 9.2.0 20190812 (iris-r1).
libc ABIs: UNIQUE ABSOLUTE
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Thanks in advance,
Dennis Lauzon

ritesh.tx · January 27, 2022, 4:54am

Hi @Dennis ,

Have you already solved this issue? If not, please try building the toolchain with Yocto, compiling libc with -funwind-tables and -fasynchronous-unwind-tables, and see if that helps.

We don’t have much experience with the issue you mentioned, but it seems that these flags should be enough to get a full backtrace.
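The unwind-table flags matter because a backtrace is produced by walking the unwind information recorded for each frame. As an illustration only (a swapped-in technique, not something either poster used), the sketch below walks the stack directly with libgcc’s _Unwind_Backtrace() and _Unwind_GetIP() from <unwind.h>; glibc’s backtrace() relies on the same libgcc unwinder on many targets. The names trace_state, record_frame, collect_pcs and print_pcs are hypothetical. When the walk reaches code that has no unwind entry, it typically stops early, which would be consistent with the truncated traces in the original post.

```c
#include <inttypes.h>
#include <stdio.h>
#include <unwind.h>

#define MAX_PCS 64

/* Hypothetical bookkeeping passed through _Unwind_Backtrace(). */
struct trace_state {
    uintptr_t *pcs;
    int count;
    int max;
};

/* Called once per frame by the unwinder; records the frame's instruction pointer. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_state *st = arg;
    if (st->count >= st->max)
        return _URC_END_OF_STACK; /* any value other than _URC_NO_REASON stops the walk */
    st->pcs[st->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Fills pcs with up to max return addresses and returns how many were found.
   The walk ends early at the first frame the unwinder cannot describe. */
static int collect_pcs(uintptr_t *pcs, int max)
{
    struct trace_state st = { pcs, 0, max };
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}

/* Prints the collected addresses; uses stdio, so not for use inside a signal handler. */
static void print_pcs(FILE *out, const uintptr_t *pcs, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "#%d 0x%" PRIxPTR "\n", i, pcs[i]);
}
```

The raw addresses printed this way still need to be resolved offline against the binaries’ debug symbols, which is where deploying the debug libraries comes in.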

Let me know if you have any queries.

Best Regards
Ritesh Kumar

Dennis · January 27, 2022, 8:10am

Hello @Ritesh,

This issue has been resolved for us.

As background, I should add that the Yocto build is done by another team. During our discussions with them, the flags you mentioned (-funwind-tables and -fasynchronous-unwind-tables) were also suggested; however, I could not confirm whether they eventually compiled the toolchain with those flags.

What did solve our problem was their suggestion to also deploy the debug versions of the libraries (libc, etc.) to the device. I had assumed that remote debugging would use the debug symbols in the SDK stored on the PC where Visual Studio is running. That is not the case, and it seems that the glibc debug package is not included in our Yocto image, so we have to deploy it manually.

Greetings,
Dennis Lauzon
Freescale iMX6SOLO rev1.4