Bug 58232 - Occasional crashes during garbage collection
Summary: Occasional crashes during garbage collection
Status: NEEDINFO
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: 5.0 (2017-02)
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Reported: 2017-07-19 06:47 UTC by Uwe Laas
Modified: 2017-09-05 14:42 UTC

Description Uwe Laas 2017-07-19 06:47:16 UTC
We have a large web application running on mono 5.0.1.1 which crashes every few days. The logs always contain the following snippet:

Jul 13 11:34:01 app01a mono: 2017-07-13 11:34:01 Starting collection with heap size 449982952 bytes
Jul 13 11:34:01 app01a mono: Stacktrace:
Jul 13 11:34:01 app01a mono: at <unknown> <0xffffffff>
Jul 13 11:34:01 app01a mono: at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) [0x00000] in <3753d1715b8842d8bb13a30db0388b60>:0

Sometimes the trace is longer, but these lines are always at the top. What kind of information can we provide to help resolve this issue?
Comment 1 Uwe Laas 2017-07-19 06:50:00 UTC
I forgot to add the config params we are using. Here they are:

MONO_GC_PARAMS="major=marksweep,max-heap-size=8g,nursery-size=64m"
MONO_GC_DEBUG="print-allowance"
Comment 2 Ludovic Henry 2017-07-20 14:30:04 UTC
Hello, could you please provide us with a full stack trace of the crash (native + managed stack trace)? A stack trace of the native code would be most useful, and if you could also provide us with a repro case, that would be great. This seems like a GC crash, and these are hard to track down without a reliable repro case that we can stress test on our side. Thank you.
Comment 3 Uwe Laas 2017-07-21 10:45:26 UTC
Hello, we will provide a complete stack trace from the next crash and try to provide a self-contained test case.
Comment 4 Uwe Laas 2017-09-05 06:40:22 UTC
Hello,
this is the best we have to offer: obviously the native code deadlocks while trying to unwind the managed stack. :-(

(gdb) t a a bt 30

Thread 26 (Thread 0x7ffa94a57780 (LWP 14492)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24943 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005e199d in mono_os_cond_timedwait ()
#5  0x00000000005e3816 in mono_w32handle_timedwait_signal_handle.constprop.8 ()
#6  0x00000000005e3a74 in mono_w32handle_wait_one ()
#7  0x00000000005e4129 in mono_w32handle_wait_multiple ()
#8  0x00000000005c4386 in mono_wait_uninterrupted.isra.23 ()
#9  0x00000000005c720b in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#10 0x0000000040c74c98 in ?? ()
#11 0x00007ffe1a20ffd0 in ?? ()
#12 0x00007ffe1a20ff80 in ?? ()
#13 0x00000000011841f8 in ?? ()
#14 0x00007ffe1a20ff80 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7ffa0e0e3700 (LWP 29104)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7ffa0e7eb700 (LWP 29103)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7ffa0eb93700 (LWP 29102)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7ffa071db700 (LWP 13498)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffa93f22de6 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007ffa93f22cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000057790f in mono_loader_lock ()
#4  0x0000000000521e8f in mono_get_seq_points ()
#5  0x0000000000521f81 in mono_find_prev_seq_point_for_native_offset ()
#6  0x00000000004aba91 in mono_walk_stack_full ()
#7  0x00000000004abed1 in mono_walk_stack_with_state ()
#8  0x00000000004abfc8 in mono_walk_stack ()
#9  0x00000000004ae210 in mono_handle_native_crash ()
#10 0x0000000000513c86 in altstack_handle_and_restore ()
#11 0x0000000000658e70 in major_copy_or_mark_object_canonical ()
#12 0x000000000000001a in ?? ()
#13 0x00007ffa88000000 in ?? ()
#14 0x00007ffa00000000 in ?? ()
#15 0x00007ffa071d54a0 in ?? ()
#16 0x00007ffa2c000e80 in ?? ()
#17 0x00007ffa071d5790 in ?? ()
#18 0x00007ffa2c000e80 in ?? ()
#19 0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7ffa077f3700 (LWP 13497)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7ffa10fff700 (LWP 20934)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7ffa1adff700 (LWP 18294)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7ffa352ff700 (LWP 23084)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa939bd806 in _int_malloc () from /lib64/libc.so.6
#4  0x00007ffa939c0b64 in calloc () from /lib64/libc.so.6
#5  0x0000000000688bdb in g_calloc ()
#6  0x0000000000602d71 in reflection_methodbuilder_to_mono_method ()
#7  0x000000000060736d in ves_icall_DynamicMethod_create_dynamic_method ()
#8  0x0000000040c38f9e in ?? ()
#9  0x00007ffa8be8b8e0 in ?? ()
#10 0x00007ffa643befe8 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7ffa362fe700 (LWP 19641)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7ffa364ff700 (LWP 19640)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7ffa584fb700 (LWP 14568)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7ffa586fc700 (LWP 14567)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7ffa588fd700 (LWP 14566)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7ffa58afe700 (LWP 14565)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7ffa58cff700 (LWP 14564)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7ffa590cb700 (LWP 14563)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7ffa592cc700 (LWP 14562)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f2742b in __lll_lock_wait () from /lib64/libpthread.so.0
#4  0x00007ffa93f22dcb in _L_lock_812 () from /lib64/libpthread.so.0
#5  0x00007ffa93f22c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#6  0x0000000000642f87 in sgen_gc_lock ()
#7  0x000000000061e7d5 in mono_gc_set_skip_thread ()
#8  0x00000000005ce633 in poll_event_wait ()
#9  0x00000000005cf7fc in selector_thread ()
#10 0x00000000005c4e76 in start_wrapper ()
#11 0x000000000067f848 in inner_start_thread ()
#12 0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7ffa8478c700 (LWP 14506)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7ffa84991700 (LWP 14505)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ffa84b96700 (LWP 14504)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ffa84d9b700 (LWP 14503)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ffa84fa0700 (LWP 14502)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000000068026b in mono_thread_info_sleep ()
#5  0x00000000005cc8e5 in monitor_thread ()
#6  0x00000000005c4e76 in start_wrapper ()
#7  0x000000000067f848 in inner_start_thread ()
#8  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ffa862bc700 (LWP 14501)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005e194f in mono_os_cond_timedwait ()
#5  0x00000000005e3816 in mono_w32handle_timedwait_signal_handle.constprop.8 ()
#6  0x00000000005e3ac7 in mono_w32handle_wait_one ()
#7  0x00000000005e4129 in mono_w32handle_wait_multiple ()
#8  0x00000000005c4386 in mono_wait_uninterrupted.isra.23 ()
#9  0x00000000005c720b in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#10 0x0000000040c74c98 in ?? ()
#11 0x0000000000000038 in ?? ()
#12 0x0000000001943698 in ?? ()
#13 0x0000000000000031 in ?? ()
#14 0x00007ffa88352298 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffa86f17700 (LWP 14495)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffa93f22dcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007ffa93f22c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000642f87 in sgen_gc_lock ()
#4  0x000000000061e7d5 in mono_gc_set_skip_thread ()
#5  0x00000000005ed50a in finalizer_thread ()
#6  0x00000000005c4e76 in start_wrapper ()
#7  0x000000000067f848 in inner_start_thread ()
#8  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffa8d416700 (LWP 14494)):
#0  0x00007ffa93f24945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000066abbf in thread_func ()
#2  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffa93a3834d in clone () from /lib64/libc.so.6
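
Most threads in the dump above are parked in suspend_signal_handler, so the few that are not (here threads 22, 2 and 1) are the ones worth reading first. As a rough reading aid, and not something that was part of the original report, a small Python sketch along these lines could filter a saved gdb dump. The function names and the embedded SAMPLE excerpt are illustrative assumptions, and the regular expressions only cover the frame formats that appear in this dump.

```python
import re

# Frames that only indicate "this thread was parked by the suspend signal".
SUSPEND_FRAMES = {"sigsuspend", "suspend_signal_handler", "<signal handler called>"}

THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?)(?: \(|$)")

# Short excerpt of the dump above, used only to demonstrate the functions.
SAMPLE = """\
Thread 25 (Thread 0x7ffa0e0e3700 (LWP 29104)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()

Thread 22 (Thread 0x7ffa071db700 (LWP 13498)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffa93f22de6 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007ffa93f22cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000057790f in mono_loader_lock ()
"""


def parse_threads(dump):
    """Return {gdb thread number: [frame names, innermost first]}."""
    threads = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def first_interesting_frame(frames):
    """Return the first frame that is not part of the suspend handler, or None."""
    for name in frames:
        if name not in SUSPEND_FRAMES:
            return name
    return None


def not_suspended(threads):
    """Thread numbers whose innermost frame is not sigsuspend."""
    return sorted(t for t, frames in threads.items()
                  if frames and frames[0] != "sigsuspend")


if __name__ == "__main__":
    parsed = parse_threads(SAMPLE)
    for number in not_suspended(parsed):
        print(number, parsed[number][:4])
```

Run against the full dump, this would list only the threads that were not stopped by the suspend signal, which keeps the lock-related frames (mono_loader_lock, sgen_gc_lock) from getting lost among the idle worker threads.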
Comment 5 Ludovic Henry 2017-09-05 13:46:38 UTC
Thank you, but unfortunately the stack trace doesn't give us enough information to figure out what's going wrong.

@Vlad, what other information could we request to get a better idea of what's going wrong?
Comment 6 Vlad Brezae 2017-09-05 14:42:52 UTC
The stacktrace doesn't make much sense. major_copy_or_mark_object_canonical should always be called from known code. There might be some symbol problems going on.

It could be useful to run mono with MONO_DEBUG=suspend-on-sigsegv so you can attach with gdb once it's crashing. This would be especially useful if mono was built with compiler optimizations disabled (pass CFLAGS="-O0" to ./autogen.sh).

Having said that, it's very likely that this type of bug would need some sort of repro.
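
For anyone following the suspend-on-sigsegv suggestion, the setup could be sketched roughly as below. This is only an illustration under stated assumptions: the helper names build_env and gdb_attach_command are hypothetical, the launcher assumes mono and gdb are on the PATH, and it assumes MONO_DEBUG takes a comma-separated option list so an existing value is extended rather than replaced.

```python
import os
import shlex
import subprocess
import sys


def build_env(base_env):
    """Return a copy of base_env with suspend-on-sigsegv added to MONO_DEBUG."""
    env = dict(base_env)
    # Assumption: MONO_DEBUG is a comma-separated list, so keep existing options.
    options = [opt for opt in env.get("MONO_DEBUG", "").split(",") if opt]
    if "suspend-on-sigsegv" not in options:
        options.append("suspend-on-sigsegv")
    env["MONO_DEBUG"] = ",".join(options)
    return env


def gdb_attach_command(pid):
    """gdb command line that attaches to pid and dumps all native stacks."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    # Hypothetical usage: python launch_mono.py app.exe [args...]
    proc = subprocess.Popen(["mono"] + sys.argv[1:], env=build_env(os.environ))
    print("mono pid:", proc.pid)
    print("once it suspends on SIGSEGV, run:",
          shlex.join(gdb_attach_command(proc.pid)))
    proc.wait()
```

Other settings such as MONO_GC_PARAMS are inherited unchanged from the calling environment, so the crash is reproduced under the same GC configuration as before.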
