Bug 43921 - System.Threading.ThreadHelper.ThreadStart_Context tries to allocate, crashes
Summary: System.Threading.ThreadHelper.ThreadStart_Context tries to allocate, crashes
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC (show other bugs)
Version: master
Hardware: PC Mac OS
: --- normal
Target Milestone: ---
Assignee: Aleksey Kliger
URL:
Depends on:
Blocks:
 
Reported: 2016-08-31 21:58 UTC by Andi McClure
Modified: 2016-09-28 14:22 UTC (History)
4 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments

Description Andi McClure 2016-08-31 21:58:41 UTC
It is possible this was hiding inside the noise of bugzilla 43320, but I think this is something that has appeared in the last week.

We are currently seeing a relatively frequent SIGSEGV during tests which either concern domain unloading, or the "new-threads-dont-join-stw" tests (this concerns what happens when a thread is created while the GC is trying to enter an stw). The managed stack looks like:

                                                +++++++++++++++++++
                                                STACK TRACE:
                                                Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) <0x00012>
  at (wrapper alloc) object.AllocVector (intptr,intptr) <0x0010b>
  at Driver.AllocStuff () <0x0003b>
  at Driver.BackgroundNoise () <0x0000f>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0008b>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001b2>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x00033>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005a>
  at System.Threading.ThreadHelper.ThreadStart () <0x00039>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x00092>

It always concerns __icall_wrapper_mono_gc_alloc_vector being indirectly called by ThreadStart_Context-- what occurs between those two frames varies.

The native stack is never recovered.

Some examples:

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4747/testReport/MonoTests/sgen-regular-tests-ms-split-95/sgen_domain_unload_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4751/testReport/MonoTests/sgen-regular-tests-ms-split-clear-at-gc/sgen_domain_unload_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4742/testReport/MonoTests/sgen-regular-tests-ms-split-95/sgen_domain_unload_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4744/testReport/MonoTests/sgen-regular-tests-ms-split-clear-at-gc/sgen_new_threads_dont_join_stw_2_exe/

The issue seems(?) to only occur on mac.
Comment 1 Ludovic Henry 2016-09-01 10:44:01 UTC
Here is a more complete stacktrace:

> ludovic-laptop :: ~/Xamarin/mono ‹097c8e1› » for i in `seq 1 100`; do; MONO_PATH=mcs/class/lib/net_4_x mono/mini/mono mono/tests/sgen-domain-unload-2.exe || break; done                                                                                                  130 ↵
> .....................
> ....................
> ....................
> ....................
> ..........Stacktrace:
> 
>   at <unknown> <0xffffffff>
>   at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) <0x00065>
>   at (wrapper alloc) object.AllocVector (intptr,intptr) <0x00167>
>   at Driver.AllocStuff () <0x00051>
>   at Driver.BackgroundNoise () <0x00014>
>   at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x000a6>
>   at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x00233>
>   at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x00028>
>   at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x00061>
>   at System.Threading.ThreadHelper.ThreadStart () <0x00039>
>   at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000e0>
> 
> Native stacktrace:
> 
> 
> Debug info from gdb:
> 
> (lldb) command source -s 0 '/tmp/mono-gdb-commands.nUWn9e'
> Executing commands in '/tmp/mono-gdb-commands.nUWn9e'.
> (lldb) process attach --pid 18529
> Process 18529 stopped
> * thread #1: tid = 0x2a6c046, 0x0000000106f60af0 mono`mono_string_new_handle + 5 at handle.c:62, name = 'tid_50f', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>     frame #0: 0x0000000106f60af0 mono`mono_string_new_handle + 5 at handle.c:62
>    59
>    60  	retry:
>    61  		if (G_LIKELY (top->size < OBJECTS_PER_HANDLES_CHUNK)) {
> -> 62  			MonoObject **h = &top->objects [top->size++];
>    63  			*h = object;
>    64  			return h;
>    65  		}
> 
> Executable module set to "/Users/ludovic/Xamarin/mono/mono/mini/mono".
> Architecture set to: x86_64h-apple-macosx.
> (lldb) thread list
> Process 18529 stopped
> * thread #1: tid = 0x2a6c046, 0x0000000106f60af0 mono`mono_string_new_handle + 5 at handle.c:62, name = 'tid_50f', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   thread #2: tid = 0x2a6c05d, 0x00007fff93cfcdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'SGen worker'
>   thread #3: tid = 0x2a6c05e, 0x00007fff93cf6fae libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'Finalizer'
>   thread #4: tid = 0x2a6c060, 0x00007fff93cfdefa libsystem_kernel.dylib`kevent_qos + 10, queue = 'com.apple.libdispatch-manager'
>   thread #5: tid = 0x2a6c067, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_2e0b'
>   thread #6: tid = 0x2a6c068, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_230f'
>   thread #7: tid = 0x2a6c069, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_310b'
>   thread #8: tid = 0x2a6c06a, 0x00007fff93cfd582 libsystem_kernel.dylib`__wait4 + 10, name = 'tid_340b'
> (lldb) thread backtrace all
> warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
> * thread #1: tid = 0x2a6c046, 0x0000000106f60af0 mono`mono_string_new_handle + 5 at handle.c:62, name = 'tid_50f', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x0000000106f60af0 mono`mono_string_new_handle + 5 at handle.c:62
>     frame #1: 0x0000000106f60aeb mono`mono_string_new_handle(domain=<unavailable>, data=" 4\x1d}�, error=<unavailable>) + 43 at handle.c:162
>     frame #2: 0x0000000106edffff mono`ves_icall_System_RuntimeType_getFullName(object=<unavailable>, full_name=<unavailable>, assembly_qualified=<unavailable>, error=0x00007fff58e7b9c8) + 127 at icall.c:5222
>     frame #3: 0x00000001076c278d
>     frame #4: 0x00000001076ec710
>     frame #5: 0x00000001077a8d0b
>     frame #6: 0x00000001077a8c60
>     frame #7: 0x000000010775c95c
>     frame #8: 0x000000010775c577
>     frame #9: 0x000000010775bc3e
>     frame #10: 0x0000000107759eb5
>     frame #11: 0x000000010776f151
>     frame #12: 0x000000010776d886
> 
>   thread #2: tid = 0x2a6c05d, 0x00007fff93cfcdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'SGen worker'
>     frame #0: 0x00007fff93cfcdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
>     frame #1: 0x00007fff8b9b5728 libsystem_pthread.dylib`_pthread_cond_wait + 767
>     frame #2: 0x0000000106fdbff0 mono`thread_func [inlined] mono_os_cond_wait(mutex=0x00000001070edf60) + 15 at mono-os-mutex.h:150
>     frame #3: 0x0000000106fdbfe1 mono`thread_func(thread_data=0x0000000000000000) + 609 at sgen-thread-pool.c:110
>     frame #4: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #5: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #6: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> 
>   thread #3: tid = 0x2a6c05e, 0x00007fff93cf6fae libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'Finalizer'
>     frame #0: 0x00007fff93cf6fae libsystem_kernel.dylib`semaphore_wait_trap + 10
>     frame #1: 0x0000000106f6adc4 mono`finalizer_thread [inlined] mono_os_sem_wait(flags=MONO_SEM_FLAGS_ALERTABLE) + 11 at mono-os-semaphore.h:90
>     frame #2: 0x0000000106f6adb9 mono`finalizer_thread + 11 at mono-coop-semaphore.h:40
>     frame #3: 0x0000000106f6adae mono`finalizer_thread(unused=<unavailable>) + 174 at gc.c:907
>     frame #4: 0x0000000106f4332a mono`start_wrapper [inlined] start_wrapper_internal + 680 at threads.c:759
>     frame #5: 0x0000000106f43082 mono`start_wrapper(data=<unavailable>) + 50 at threads.c:821
>     frame #6: 0x0000000107003228 mono`inner_start_thread(arg=<unavailable>) + 488 at mono-threads-posix.c:154
>     frame #7: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #8: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #9: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> 
>   thread #4: tid = 0x2a6c060, 0x00007fff93cfdefa libsystem_kernel.dylib`kevent_qos + 10, queue = 'com.apple.libdispatch-manager'
>     frame #0: 0x00007fff93cfdefa libsystem_kernel.dylib`kevent_qos + 10
>     frame #1: 0x00007fff9a974165 libdispatch.dylib`_dispatch_mgr_invoke + 216
>     frame #2: 0x00007fff9a973dcd libdispatch.dylib`_dispatch_mgr_thread + 52
> 
>   thread #5: tid = 0x2a6c067, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_2e0b'
>     frame #0: 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10
>     frame #1: 0x00007fff8b9b4e4a libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
>     frame #2: 0x0000000106fb687a mono`sgen_gc_lock [inlined] mono_os_mutex_lock + 12 at mono-os-mutex.h:96
>     frame #3: 0x0000000106fb686e mono`sgen_gc_lock + 42 at mono-coop-mutex.h:53
>     frame #4: 0x0000000106fb6844 mono`sgen_gc_lock + 4 at sgen-gc.c:3165
>     frame #5: 0x0000000106fa84d0 mono`mono_gc_alloc_vector(vtable=0x00007fc60300c810, size=288, max_length=249) + 112 at sgen-mono.c:1743
>     frame #6: 0x0000000107531d16
>     frame #7: 0x00000001075351b2
>     frame #8: 0x0000000107534ad9
>     frame #9: 0x0000000106d91a57 mono`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x0000700000622e68, exc=<unavailable>, error=0x00000000000000f9) + 2215 at mini-runtime.c:2553
>     frame #10: 0x0000000106f726a8 mono`do_runtime_invoke(method=0x00007fc602000250, obj=0x0000000107803ad0, params=0x0000700000622e68, exc=0x0000000000000000, error=0x0000700000622e00) + 88 at object.c:2949
>     frame #11: 0x0000000106f43430 mono`start_wrapper + 942 at threads.c:765
>     frame #12: 0x0000000106f43082 mono`start_wrapper(data=<unavailable>) + 50 at threads.c:821
>     frame #13: 0x0000000107003228 mono`inner_start_thread(arg=<unavailable>) + 488 at mono-threads-posix.c:154
>     frame #14: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #15: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #16: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> 
>   thread #6: tid = 0x2a6c068, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_230f'
>     frame #0: 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10
>     frame #1: 0x00007fff8b9b4e4a libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
>     frame #2: 0x0000000106fb687a mono`sgen_gc_lock [inlined] mono_os_mutex_lock + 12 at mono-os-mutex.h:96
>     frame #3: 0x0000000106fb686e mono`sgen_gc_lock + 42 at mono-coop-mutex.h:53
>     frame #4: 0x0000000106fb6844 mono`sgen_gc_lock + 4 at sgen-gc.c:3165
>     frame #5: 0x0000000106fa84d0 mono`mono_gc_alloc_vector(vtable=0x00007fc60300c810, size=272, max_length=238) + 112 at sgen-mono.c:1743
>     frame #6: 0x0000000107531d16
>     frame #7: 0x00000001075351b2
>     frame #8: 0x0000000107534ad9
>     frame #9: 0x0000000106d91a57 mono`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x0000700000825e68, exc=<unavailable>, error=0x00000000000000ee) + 2215 at mini-runtime.c:2553
>     frame #10: 0x0000000106f726a8 mono`do_runtime_invoke(method=0x00007fc602000250, obj=0x0000000107803c50, params=0x0000700000825e68, exc=0x0000000000000000, error=0x0000700000825e00) + 88 at object.c:2949
>     frame #11: 0x0000000106f43430 mono`start_wrapper + 942 at threads.c:765
>     frame #12: 0x0000000106f43082 mono`start_wrapper(data=<unavailable>) + 50 at threads.c:821
>     frame #13: 0x0000000107003228 mono`inner_start_thread(arg=<unavailable>) + 488 at mono-threads-posix.c:154
>     frame #14: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #15: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #16: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> 
>   thread #7: tid = 0x2a6c069, 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'tid_310b'
>     frame #0: 0x00007fff93cfcde6 libsystem_kernel.dylib`__psynch_mutexwait + 10
>     frame #1: 0x00007fff8b9b4e4a libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
>     frame #2: 0x0000000106fb687a mono`sgen_gc_lock [inlined] mono_os_mutex_lock + 12 at mono-os-mutex.h:96
>     frame #3: 0x0000000106fb686e mono`sgen_gc_lock + 42 at mono-coop-mutex.h:53
>     frame #4: 0x0000000106fb6844 mono`sgen_gc_lock + 4 at sgen-gc.c:3165
>     frame #5: 0x0000000106fa84d0 mono`mono_gc_alloc_vector(vtable=0x00007fc60300c810, size=56, max_length=23) + 112 at sgen-mono.c:1743
>     frame #6: 0x0000000107531d16
>     frame #7: 0x00000001075351b2
>     frame #8: 0x0000000107534ad9
>     frame #9: 0x0000000106d91a57 mono`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x0000700000a28e68, exc=<unavailable>, error=0x0000000000000017) + 2215 at mini-runtime.c:2553
>     frame #10: 0x0000000106f726a8 mono`do_runtime_invoke(method=0x00007fc602000250, obj=0x0000000107803d88, params=0x0000700000a28e68, exc=0x0000000000000000, error=0x0000700000a28e00) + 88 at object.c:2949
>     frame #11: 0x0000000106f43430 mono`start_wrapper + 942 at threads.c:765
>     frame #12: 0x0000000106f43082 mono`start_wrapper(data=<unavailable>) + 50 at threads.c:821
>     frame #13: 0x0000000107003228 mono`inner_start_thread(arg=<unavailable>) + 488 at mono-threads-posix.c:154
>     frame #14: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #15: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #16: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> 
>   thread #8: tid = 0x2a6c06a, 0x00007fff93cfd582 libsystem_kernel.dylib`__wait4 + 10, name = 'tid_340b'
>     frame #0: 0x00007fff93cfd582 libsystem_kernel.dylib`__wait4 + 10
>     frame #1: 0x0000000106e42961 mono`mono_handle_native_sigsegv(signal=<unavailable>, ctx=<unavailable>, info=<unavailable>) + 433 at mini-exceptions.c:2424
>     frame #2: 0x0000000106e9ea1a mono`mono_arch_handle_altstack_exception(sigctx=0x000000010767bf48, siginfo=<unavailable>, fault_addr=<unavailable>, stack_ovf=0) + 90 at exceptions-amd64.c:795
>     frame #3: 0x0000000106d8ea08 mono`mono_sigsegv_signal_handler(_dummy=<unavailable>, _info=<unavailable>, context=<unavailable>) + 440 at mini-runtime.c:2873
>     frame #4: 0x00007fff9760952a libsystem_platform.dylib`_sigtramp + 26
>     frame #5: 0x0000000106fd6689 mono`copy_object_no_checks [inlined] sgen_vtable_get_descriptor(vtable=0x0000000000000000) + 1 at sgen-client-mono.h:28
>     frame #6: 0x0000000106fd6688 mono`copy_object_no_checks(obj=0x00000001078570d0, queue=0x0000700000c2b520) + 24 at sgen-copy-object.h:74
>     frame #7: 0x0000000106fd535b mono`simple_nursery_serial_copy_object(obj_slot=0x00007fc601004820, queue=<unavailable>) + 75 at sgen-minor-copy-object.h:110
>     frame #8: 0x0000000106f60a0e mono`mono_handle_stack_scan(stack=<unavailable>, func=(mono`simple_nursery_serial_copy_object at sgen-minor-copy-object.h:63), gc_data=0x0000700000c2b520) + 62 at handle.c:122
>     frame #9: 0x0000000106fa9ac8 mono`sgen_client_scan_thread_data(start_nursery=0x0000000107800000, end_nursery=0x0000000107c00000, precise=1, ctx=<unavailable>) + 280 at sgen-mono.c:2429
>     frame #10: 0x0000000106fdcbe3 mono`sgen_workers_enqueue_job(job=0x0000000107521050, enqueue=<unavailable>) + 35 at sgen-workers.c:124
>     frame #11: 0x0000000106fb920d mono`enqueue_scan_from_roots_jobs(gc_thread_gray_queue=<unavailable>, heap_start=<unavailable>, heap_end=<unavailable>, ops=<unavailable>, enqueue=0) + 189 at sgen-gc.c:1441
>     frame #12: 0x0000000106fb5d5e mono`collect_nursery(reason=<unavailable>, is_overflow=<unavailable>, unpin_queue=<unavailable>, finish_up_concurrent_mark=0) + 798 at sgen-gc.c:1578
>     frame #13: 0x0000000106fb55f4 mono`sgen_perform_collection(requested_size=4096, generation_to_collect=0, reason="Nursery full", wait_to_finish=0, stw=1) + 420 at sgen-gc.c:2260
>     frame #14: 0x0000000106faae7a mono`sgen_alloc_obj_nolock(vtable=0x00007fc60300c810, size=304) + 746 at sgen-alloc.c:262
>     frame #15: 0x0000000106fa84db mono`mono_gc_alloc_vector(vtable=0x00007fc60300c810, size=304, max_length=269) + 123 at sgen-mono.c:1745
>     frame #16: 0x0000000107531d16
>     frame #17: 0x00000001075351b2
>     frame #18: 0x0000000107534ad9
>     frame #19: 0x0000000106d91a57 mono`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x0000700000c2be68, exc=<unavailable>, error=0x000000000000010d) + 2215 at mini-runtime.c:2553
>     frame #20: 0x0000000106f726a8 mono`do_runtime_invoke(method=0x00007fc602000250, obj=0x0000000107838ec0, params=0x0000700000c2be68, exc=0x0000000000000000, error=0x0000700000c2be00) + 88 at object.c:2949
>     frame #21: 0x0000000106f43430 mono`start_wrapper + 942 at threads.c:765
>     frame #22: 0x0000000106f43082 mono`start_wrapper(data=<unavailable>) + 50 at threads.c:821
>     frame #23: 0x0000000107003228 mono`inner_start_thread(arg=<unavailable>) + 488 at mono-threads-posix.c:154
>     frame #24: 0x00007fff8b9b499d libsystem_pthread.dylib`_pthread_body + 131
>     frame #25: 0x00007fff8b9b491a libsystem_pthread.dylib`_pthread_start + 168
>     frame #26: 0x00007fff8b9b2351 libsystem_pthread.dylib`thread_start + 13
> (lldb) detach
> error: Detach failed: Sending disconnect packet failed.
Comment 2 Aleksey Kliger 2016-09-01 22:25:54 UTC
This is happening because there's a race between mono_handle_stack_scan and mono_handle_new because despite being "coop" handles, they have to work with non-coop, too.  So there need to be atomic operations here and more care about the order of operations.  (In this case, we went to scan a stale pointer because the increment of size happened before we put a valid value in there).
Comment 3 Aleksey Kliger 2016-09-28 14:22:16 UTC
Fixed on mono master https://github.com/mono/mono/commit/76b32f3ee38e7b29cdbaa657a4826b65579f4e93
and on the mono-4.8.0-branch https://github.com/mono/mono/commit/662513e62b3d4ff078294d7211a9e15bca5b54df

At least the particular case that has a stack trace involving coop handles (see Comment 1)

Note You need to log in before you can comment on or make changes to this bug.