Bug 58232 - Occasional crashes during garbage collection
Summary: Occasional crashes during garbage collection
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: 5.0 (2017-02)
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
Reported: 2017-07-19 06:47 UTC by Uwe Laas
Modified: 2018-01-24 16:23 UTC
CC List: 4 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:

Description Uwe Laas 2017-07-19 06:47:16 UTC
We run a large web application on Mono 5.0.1.1, and it crashes every few days. The logs always contain the following snippet:

Jul 13 11:34:01 app01a mono: 2017-07-13 11:34:01 Starting collection with heap size 449982952 bytes
Jul 13 11:34:01 app01a mono: Stacktrace:
Jul 13 11:34:01 app01a mono: at <unknown> <0xffffffff>
Jul 13 11:34:01 app01a mono: at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) [0x00000] in <3753d1715b8842d8bb13a30db0388b60>:0

Sometimes the trace is longer, but these lines are always at the top. What kind of information can we provide to help resolve this issue?
Comment 1 Uwe Laas 2017-07-19 06:50:00 UTC
I forgot to add the config params we are using. Here they are:

MONO_GC_PARAMS="major=marksweep,max-heap-size=8g,nursery-size=64m"
MONO_GC_DEBUG="print-allowance"
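
To put the logged heap size in context, here is a small triage sketch that parses the MONO_GC_PARAMS string above and compares the "Starting collection with heap size ... bytes" figure from the log against the configured max-heap-size. It assumes that Mono's k/m/g size suffixes are 1024-based; the helpers parse_size, parse_gc_params and heap_size_from_log are hypothetical and not part of Mono.

```python
import re

# Assumption: Mono size suffixes k/m/g are binary multiples (1024-based).
SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '8g' or '64m' into bytes."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def parse_gc_params(value: str) -> dict:
    """Split a MONO_GC_PARAMS string into an {option: value} dict."""
    options = {}
    for item in value.split(","):
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


def heap_size_from_log(line: str):
    """Return the heap size in bytes from a 'Starting collection' log line, or None."""
    match = re.search(r"Starting collection with heap size (\d+) bytes", line)
    return int(match.group(1)) if match else None


params = parse_gc_params("major=marksweep,max-heap-size=8g,nursery-size=64m")
max_heap = parse_size(params["max-heap-size"])
heap = heap_size_from_log(
    "Jul 13 11:34:01 app01a mono: 2017-07-13 11:34:01 "
    "Starting collection with heap size 449982952 bytes"
)
print(f"heap {heap / 2**20:.1f} MiB of {max_heap / 2**30:.0f} GiB "
      f"({100 * heap / max_heap:.1f}% of max-heap-size)")
# prints: heap 429.1 MiB of 8 GiB (5.2% of max-heap-size)
```

With these numbers the heap was at roughly 5% of the configured limit when the collection started, so the crash does not appear to be caused by the heap reaching max-heap-size.
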
Comment 2 Ludovic Henry 2017-07-20 14:30:04 UTC
Hello, could you please provide us with a full stack trace of the crash (native + managed)? A stack trace of the native code would be most useful, and a repro case would be even better. This looks like a GC crash, and those are hard to track down without a reliable repro case that we can stress-test on our side. Thank you.
Comment 3 Uwe Laas 2017-07-21 10:45:26 UTC
Hello, we will provide a complete stack trace after the next crash and try to put together a self-contained test case.
Comment 4 Uwe Laas 2017-09-05 06:40:22 UTC
Hello,
this is the best we have to offer: obviously the native code deadlocks while trying to unwind the managed stack. :-(

(gdb) t a a bt 30

Thread 26 (Thread 0x7ffa94a57780 (LWP 14492)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24943 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005e199d in mono_os_cond_timedwait ()
#5  0x00000000005e3816 in mono_w32handle_timedwait_signal_handle.constprop.8 ()
#6  0x00000000005e3a74 in mono_w32handle_wait_one ()
#7  0x00000000005e4129 in mono_w32handle_wait_multiple ()
#8  0x00000000005c4386 in mono_wait_uninterrupted.isra.23 ()
#9  0x00000000005c720b in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#10 0x0000000040c74c98 in ?? ()
#11 0x00007ffe1a20ffd0 in ?? ()
#12 0x00007ffe1a20ff80 in ?? ()
#13 0x00000000011841f8 in ?? ()
#14 0x00007ffe1a20ff80 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7ffa0e0e3700 (LWP 29104)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7ffa0e7eb700 (LWP 29103)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7ffa0eb93700 (LWP 29102)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7ffa071db700 (LWP 13498)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffa93f22de6 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007ffa93f22cdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000057790f in mono_loader_lock ()
#4  0x0000000000521e8f in mono_get_seq_points ()
#5  0x0000000000521f81 in mono_find_prev_seq_point_for_native_offset ()
#6  0x00000000004aba91 in mono_walk_stack_full ()
#7  0x00000000004abed1 in mono_walk_stack_with_state ()
#8  0x00000000004abfc8 in mono_walk_stack ()
#9  0x00000000004ae210 in mono_handle_native_crash ()
#10 0x0000000000513c86 in altstack_handle_and_restore ()
#11 0x0000000000658e70 in major_copy_or_mark_object_canonical ()
#12 0x000000000000001a in ?? ()
#13 0x00007ffa88000000 in ?? ()
#14 0x00007ffa00000000 in ?? ()
#15 0x00007ffa071d54a0 in ?? ()
#16 0x00007ffa2c000e80 in ?? ()
#17 0x00007ffa071d5790 in ?? ()
#18 0x00007ffa2c000e80 in ?? ()
#19 0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7ffa077f3700 (LWP 13497)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7ffa10fff700 (LWP 20934)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7ffa1adff700 (LWP 18294)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7ffa352ff700 (LWP 23084)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa939bd806 in _int_malloc () from /lib64/libc.so.6
#4  0x00007ffa939c0b64 in calloc () from /lib64/libc.so.6
#5  0x0000000000688bdb in g_calloc ()
#6  0x0000000000602d71 in reflection_methodbuilder_to_mono_method ()
#7  0x000000000060736d in ves_icall_DynamicMethod_create_dynamic_method ()
#8  0x0000000040c38f9e in ?? ()
#9  0x00007ffa8be8b8e0 in ?? ()
#10 0x00007ffa643befe8 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7ffa362fe700 (LWP 19641)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7ffa364ff700 (LWP 19640)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7ffa584fb700 (LWP 14568)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7ffa586fc700 (LWP 14567)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7ffa588fd700 (LWP 14566)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7ffa58afe700 (LWP 14565)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7ffa58cff700 (LWP 14564)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7ffa590cb700 (LWP 14563)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7ffa592cc700 (LWP 14562)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f2742b in __lll_lock_wait () from /lib64/libpthread.so.0
#4  0x00007ffa93f22dcb in _L_lock_812 () from /lib64/libpthread.so.0
#5  0x00007ffa93f22c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#6  0x0000000000642f87 in sgen_gc_lock ()
#7  0x000000000061e7d5 in mono_gc_set_skip_thread ()
#8  0x00000000005ce633 in poll_event_wait ()
#9  0x00000000005cf7fc in selector_thread ()
#10 0x00000000005c4e76 in start_wrapper ()
#11 0x000000000067f848 in inner_start_thread ()
#12 0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7ffa8478c700 (LWP 14506)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7ffa84991700 (LWP 14505)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ffa84b96700 (LWP 14504)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ffa84d9b700 (LWP 14503)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005cba99 in worker_thread ()
#5  0x00000000005c4e76 in start_wrapper ()
#6  0x000000000067f848 in inner_start_thread ()
#7  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ffa84fa0700 (LWP 14502)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000000068026b in mono_thread_info_sleep ()
#5  0x00000000005cc8e5 in monitor_thread ()
#6  0x00000000005c4e76 in start_wrapper ()
#7  0x000000000067f848 in inner_start_thread ()
#8  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ffa862bc700 (LWP 14501)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#3  0x00007ffa93f24cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005e194f in mono_os_cond_timedwait ()
#5  0x00000000005e3816 in mono_w32handle_timedwait_signal_handle.constprop.8 ()
#6  0x00000000005e3ac7 in mono_w32handle_wait_one ()
#7  0x00000000005e4129 in mono_w32handle_wait_multiple ()
#8  0x00000000005c4386 in mono_wait_uninterrupted.isra.23 ()
#9  0x00000000005c720b in ves_icall_System_Threading_WaitHandle_WaitOne_internal ()
#10 0x0000000040c74c98 in ?? ()
#11 0x0000000000000038 in ?? ()
#12 0x0000000001943698 in ?? ()
#13 0x0000000000000031 in ?? ()
#14 0x00007ffa88352298 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffa86f17700 (LWP 14495)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffa93f22dcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007ffa93f22c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000642f87 in sgen_gc_lock ()
#4  0x000000000061e7d5 in mono_gc_set_skip_thread ()
#5  0x00000000005ed50a in finalizer_thread ()
#6  0x00000000005c4e76 in start_wrapper ()
#7  0x000000000067f848 in inner_start_thread ()
#8  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffa93a3834d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffa8d416700 (LWP 14494)):
#0  0x00007ffa93f24945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000066abbf in thread_func ()
#2  0x00007ffa93f20e25 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffa93a3834d in clone () from /lib64/libc.so.6
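
In a dump like this, most threads are parked identically in suspend_signal_handler (stopped by the runtime's thread-suspend signal), so the interesting ones are the threads that are not. The sketch below parses `thread apply all bt` output (the long form of the `t a a bt 30` command used above) and keeps only threads whose stacks do not contain suspend_signal_handler. The function names and the "suspended" heuristic are assumptions of this sketch, and SAMPLE is an abridged excerpt of the dump with some frames omitted.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Capture the function name after an optional "0x... in ", or gdb's signal marker.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(<signal handler called>|\S+)")


def split_threads(dump: str) -> dict:
    """Map thread number -> list of function names, innermost frame first."""
    threads, current = {}, None
    for line in dump.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
        elif current is not None:
            frame = FRAME_RE.match(line)
            if frame:
                threads[current].append(frame.group(1))
    return threads


def unsuspended(threads: dict) -> dict:
    """Keep only threads that are NOT parked in suspend_signal_handler."""
    return {n: f for n, f in threads.items() if "suspend_signal_handler" not in f}


SAMPLE = """\
Thread 22 (Thread 0x7ffa071db700 (LWP 13498)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#3  0x000000000057790f in mono_loader_lock ()
#9  0x00000000004ae210 in mono_handle_native_crash ()
#11 0x0000000000658e70 in major_copy_or_mark_object_canonical ()

Thread 21 (Thread 0x7ffa077f3700 (LWP 13497)):
#0  0x00007ffa93975572 in sigsuspend () from /lib64/libc.so.6
#1  0x00000000006819ee in suspend_signal_handler ()
#2  <signal handler called>
#4  0x00000000005cba99 in worker_thread ()

Thread 2 (Thread 0x7ffa86f17700 (LWP 14495)):
#0  0x00007ffa93f2742d in __lll_lock_wait () from /lib64/libpthread.so.0
#3  0x0000000000642f87 in sgen_gc_lock ()
#5  0x00000000005ed50a in finalizer_thread ()
"""

for number, frames in sorted(unsuspended(split_threads(SAMPLE)).items()):
    print(number, " <- ".join(frames))
# prints:
# 2 __lll_lock_wait <- sgen_gc_lock <- finalizer_thread
# 22 __lll_lock_wait <- mono_loader_lock <- mono_handle_native_crash <- major_copy_or_mark_object_canonical
```

Applied to the full dump above, the same filter leaves only threads 1, 2 and 22: the crash handler on thread 22 blocked in mono_loader_lock while walking the stack, the finalizer thread blocked in sgen_gc_lock, and thread 1 waiting in thread_func.
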
Comment 5 Ludovic Henry 2017-09-05 13:46:38 UTC
Thank you, but unfortunately the stack trace doesn't give us enough information to figure out what's going wrong.

@Vlad, what other information could we request to get a better idea of what's going wrong?
Comment 6 Vlad Brezae 2017-09-05 14:42:52 UTC
The stack trace doesn't make much sense: major_copy_or_mark_object_canonical should always be called from known code, so there may be a symbol problem here.

It could be useful to run Mono with MONO_DEBUG=suspend-on-sigsegv so that you can attach gdb once it crashes. This would be especially useful if Mono was built with compiler optimizations disabled (pass CFLAGS="-O0" to ./autogen.sh).
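
A minimal sketch of that workflow, assuming a Python wrapper around the application is acceptable: one helper adds suspend-on-sigsegv to any existing comma-separated MONO_DEBUG options, and another builds a non-interactive gdb command that attaches to the suspended process and prints every thread's backtrace. The helper names are hypothetical; `-p`, `-batch` and `-ex` are standard gdb options.

```python
import os


def mono_env(base=None):
    """Return a copy of the environment with suspend-on-sigsegv added to MONO_DEBUG."""
    env = dict(os.environ if base is None else base)
    # MONO_DEBUG takes comma-separated options; keep any that are already set.
    options = [o for o in env.get("MONO_DEBUG", "").split(",") if o]
    if "suspend-on-sigsegv" not in options:
        options.append("suspend-on-sigsegv")
    env["MONO_DEBUG"] = ",".join(options)
    return env


def gdb_backtrace_command(pid: int) -> list:
    """gdb invocation that attaches to pid, prints all thread backtraces, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]
```

For example, `subprocess.Popen(["mono", "app.exe"], env=mono_env())` would start the application (app.exe is a placeholder), and once the process has crashed and suspended itself, `subprocess.run(gdb_backtrace_command(pid))` would capture the native stacks.
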

Having said that, it's very likely that this type of bug would need some sort of repro.
Comment 7 Ludovic Henry 2018-01-24 16:23:52 UTC
Please provide a reproduction case or the information requested at https://bugzilla.xamarin.com/show_bug.cgi?id=58232#c6 so that this bug can be reopened. Thank you.