Bug 50529

Summary: crash in thread-native-exit.exe
Product: [Mono] Runtime
Reporter: Zoltan Varga <vargaz>
Component: io-layer
Assignee: Zoltan Varga <vargaz>
Status: RESOLVED FIXED
Severity: normal
CC: bernhard.urban, kumpera, ludovic, mono-bugs+mono, mono-bugs+runtime
Priority: ---
Version: master
Target Milestone: ---
Hardware: PC
OS: Mac OS
Tags:
Is this bug a regression?: ---
Last known good build:
Attachments: c testcase

Description Zoltan Varga 2016-12-20 21:05:12 UTC
The above regression test sometimes crashes. To repro:

while true; do echo -n "."; ./mono --llvm thread-native-exit.exe || break; done

While the loop runs, generate load on the machine, e.g. by running make check in tests.

The stack trace is the following:

#16 0x000000010ca4cf31 in mono_handle_native_crash (signal=0x10d5dc241 "SIGSEGV", ctx=0x0, info=0x0) at mini-exceptions.c:2610
#17 0x000000010cb4987b in altstack_handle_and_restore (ctx=0x7fff532d7860, obj=0x0, stack_ovf=0) at exceptions-amd64.c:780
#18 0x00007fff981fbb0c in _pthread_create ()
#19 0x000000010cce475d in mono_gc_pthread_create (new_thread=0x7fff532d7b48, attr=0x7fff532d7b78, start_routine=0x10cd7b630 <inner_start_thread>, arg=0x7fd57f703100) at sgen-mono.c:2508
#20 0x000000010cd7f3bc in mono_threads_platform_create_thread (thread_fn=0x10cd7b630 <inner_start_thread>, thread_data=0x7fd57f703100, stack_size=0x7fff532d7c48, out_tid=0x7fff532d7c58) at mono-threads-posix.c:126
#21 0x000000010cd7b49a in mono_threads_create_thread (start=0x10cc4dc80 <start_wrapper>, arg=0x7fd57f700da0, stack_size=0x7fff532d7c48, out_tid=0x7fff532d7c58) at mono-threads.c:1188
#22 0x000000010cc459cd in create_thread (thread=0x10e000b18, internal=0x10dd04798, start_delegate=0x10e000b90, start_func=0, start_func_arg=0x0, threadpool_thread=0, stack_size=0, error=0x7fff532d7cf0) at threads.c:790
#23 0x000000010cc46b35 in ves_icall_System_Threading_Thread_Thread_internal (this_obj=0x10e000b18, start=0x10e000b90) at threads.c:1209

The crash is always at exactly this location:

(gdb) x/20i $pc
0x7fff981fbb0c <_pthread_create+383>:	mov    0x10(%rbx),%eax
0x7fff981fbb0f <_pthread_create+386>:	mov    %eax,%ecx
0x7fff981fbb11 <_pthread_create+388>:	or     $0x2,%ecx
0x7fff981fbb14 <_pthread_create+391>:	mov    %ecx,0x10(%rbx)

_pthread_create () contains the following:

	pthread_t t2;
	t2 = __bsdthread_create(start_routine, arg, stack, t, flags);
	if (t2 == (pthread_t)-1) {
		if (flags & PTHREAD_START_CUSTOM) {
			// free the thread and stack if we allocated it
			_pthread_deallocate(t);
		}
		return EAGAIN;
	}
	if (t == NULL) {
		t = t2;
	}

	__pthread_add_thread(t, true, from_mach_thread);

Here, the crash happens at the following line of the inlined __pthread_add_thread():
		t->parentcheck = 1;

So 'rbx' is supposed to hold 't', the newly created thread. rbx usually has a value like 0x700006e4d000, which looks like a plausible thread address.
Comment 1 Zoltan Varga 2016-12-20 22:29:35 UTC
What seems to happen is that mono_threads_add_joinable_thread () is called twice for the same tid, so we try to join the same tid twice. It's called from
sgen_client_thread_unregister (). It looks like there are two threadinfo structures for the same thread/tid.
Comment 2 Zoltan Varga 2016-12-21 00:40:19 UTC
Correction: this happens when tids are reused, i.e. a thread is created with a given tid, dies, and then a new thread is created with the same tid.
Comment 4 Zoltan Varga 2017-02-04 20:05:04 UTC
Created attachment 19724 [details]
c testcase
Comment 5 Zoltan Varga 2017-02-04 20:13:42 UTC
The attached C testcase can be used to reproduce this; it's a reduced version of the testcase in comment #3. The testcase looks correct to me, so this looks like an OS/libpthread bug.

To reproduce:
clang -O2 -g crash.c
while true; do echo -n "."; ./a.out || break; done

This will crash after a while with the stack trace above. It's reproducible on Sierra (10.12.3), Yosemite, and Mavericks (10.9.5).
Comment 6 Rodrigo Kumpera 2017-02-06 19:24:57 UTC
Hi Zoltan,

Please work on this issue.
Comment 7 Zoltan Varga 2017-02-14 03:50:38 UTC
Reported as apple radar #30506046.
Comment 8 Ludovic Henry 2017-09-06 18:08:33 UTC
Fixed with 38ceab15479475d54159de8d2c0d297c56e5f80b