Bug 19755

Summary: Root domain gets destroyed before unregister_thread() for the main thread is called
Product: [Mono] Runtime
Reporter: krystian.garlinski
Component: General
Assignee: Bugzilla <bugzilla>
Status: RESOLVED FIXED
Severity: normal
CC: mono-bugs+mono, mono-bugs+runtime, vargaz
Priority: ---    
Version: 3.4.0   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Tags:
Is this bug a regression?: ---
Last known good build:

Description krystian.garlinski 2014-05-14 06:25:52 UTC
When an application is run through mono, the root domain is cleaned up at line 2053 of mono_main() (mono/mini/driver.c). After the main thread finishes (the main() function returns), __nptl_deallocate_tsd() is called to free any TLS left for the current thread. It calls unregister_thread(), which tries to take the domain's lock at line 482 of mono/metadata/threads.c. This leads to either a deadlock or a failed assertion, because the mutex has already been freed (along with the domain). Shouldn't the thread be unregistered before the call to mini_cleanup(domain)?
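
To illustrate the ordering described above, here is a minimal sketch, not Mono's actual code. FakeDomain, tls_destructor and run_cleanup_before_unregister are hypothetical names: the flag stands in for the root domain being freed by mini_cleanup(domain), and the key destructor plays the role of unregister_thread(). A worker thread is used instead of the main thread so that the result can be observed after pthread_join().

```cpp
#include <pthread.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime's root domain; not Mono's real type.
struct FakeDomain {
    bool alive = true;
};

static pthread_key_t g_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;
static std::vector<std::string> g_events;

// Plays the role of unregister_thread(): a TLS destructor that inspects the domain.
static void tls_destructor(void *arg) {
    auto *domain = static_cast<FakeDomain *>(arg);
    g_events.push_back(domain->alive ? "destructor: domain alive"
                                     : "destructor: domain already destroyed");
}

static void make_key() {
    int rc = pthread_key_create(&g_key, tls_destructor);
    assert(rc == 0);
    (void)rc;
}

static void *thread_body(void *arg) {
    auto *domain = static_cast<FakeDomain *>(arg);
    pthread_once(&g_key_once, make_key);
    pthread_setspecific(g_key, domain);  // "register" the thread
    domain->alive = false;               // stands in for mini_cleanup(domain)
    g_events.push_back("cleanup done");
    return nullptr;                      // the TLS destructor runs during thread exit
}

// Runs the problematic order and returns the recorded events.
std::vector<std::string> run_cleanup_before_unregister() {
    g_events.clear();
    FakeDomain domain;                   // outlives the worker thread
    pthread_t t;
    int rc = pthread_create(&t, nullptr, thread_body, &domain);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);            // destructor has finished by the time this returns
    return g_events;
}
```

In this sketch the destructor only reads a flag. In the real runtime the equivalent step touches a mutex that was freed along with the domain, which matches the deadlock or failed assertion reported here.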
Comment 1 Zoltan Varga 2014-05-14 08:29:45 UTC
What mono version is this, i.e. what is the output of mono --version?
Comment 2 krystian.garlinski 2014-05-14 08:56:49 UTC
Mono JIT compiler version 3.4.0 (tarball śro, 14 maj 2014, 12:55:43 CEST)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug 
        LLVM:          supported, not enabled.
        GC:            sgen


Here is some more info on the issue:

If I run a simple example (just the basic Hello World as created by monodevelop), it seems to work correctly. However, if I call just one function from my shared library (P/Invoke using DllImport; the call may simply do nothing), it crashes at exit as described earlier. My library does not use TLS, but it has static destructors (run on library unload) and uses OpenCL (currently AMD fglrx, which might do something with TLS). In any case, the crash happens during unregister_thread() under the conditions described earlier. If I change line 316 of utils/mono-threads.c so that no destructor is registered, it runs fine, without the crash. I think removing the destructor is a terrible idea, though; instead, you should explicitly set the thread's TLS to NULL when destroying the root domain.
Comment 3 krystian.garlinski 2014-05-14 09:50:28 UTC
Even a simple Hello World app hangs at exit when I preload my library and run it (LD_PRELOAD=./libMyLib.so mono Test.exe), without calling any of the library's functions.
Comment 4 Zoltan Varga 2014-05-14 12:05:41 UTC
Can you post the whole call stack when this happens? threads.c:482 is get_current_thread_ptr_for_domain (), which shouldn't be called during shutdown.
Comment 5 krystian.garlinski 2014-05-15 02:52:58 UTC
#0  0x00007ffff711af79 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff711e388 in __GI_abort () at abort.c:89
#2  0x0000000000638575 in monoeg_g_logv (log_domain=log_domain@entry=0x0, 
    log_level=log_level@entry=G_LOG_LEVEL_ERROR, 
    format=format@entry=0x6410a8 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x7fffffffdf28) at goutput.c:175
#3  0x00000000006386b6 in monoeg_assertion_message (
    format=format@entry=0x6410a8 "* Assertion at %s:%d, condition `%s' not met\n") at goutput.c:195
#4  0x000000000058a140 in get_current_thread_ptr_for_domain (domain=0xc1f5c0, 
    thread=0x7fffefd00010) at threads.c:478
#5  0x000000000058b90c in mono_thread_current () at threads.c:1383
#6  0x00000000005d5583 in sgen_thread_detach (p=<optimized out>)
    at sgen-gc.c:4177
#7  0x0000000000633213 in unregister_thread (arg=0xb34760)
    at mono-threads.c:187
#8  0x00007ffff74b1f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#9  0x00007ffff7105ee9 in __libc_start_main (main=0x419960 <main>, argc=2, 
    argv=0x7fffffffe188, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe178) at libc-start.c:293
#10 0x0000000000419c1a in _start ()
Comment 6 krystian.garlinski 2014-05-15 03:01:47 UTC
Sorry, the call stack above was dumped from mono 3.2.8 (from the Ubuntu 14.04 repos). This is the call stack I get for 3.4.0:

#0  0x00007ffff711af79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff711e388 in __GI_abort () at abort.c:89
#2  0x0000000000636185 in monoeg_g_logv (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_ERROR, format=format@entry=0x6580a8 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x7fffffffd9e8) at goutput.c:175
#3  0x00000000006362c6 in monoeg_assertion_message (format=format@entry=0x6580a8 "* Assertion at %s:%d, condition `%s' not met\n") at goutput.c:195
#4  0x0000000000589e10 in get_current_thread_ptr_for_domain (domain=0xa20bc0, thread=0x7ffff14cc010) at threads.c:482
#5  0x000000000058b4ac in mono_thread_current () at threads.c:1234
#6  0x00000000005d4e33 in sgen_thread_detach (p=<optimized out>) at sgen-gc.c:4158
#7  0x0000000000630989 in unregister_thread (arg=0xa1f5c0) at mono-threads.c:187
#8  0x00007ffff74b1f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#9  0x00007ffff7105ee9 in __libc_start_main (main=0x419d10 <main>, argc=2, argv=0x7fffffffdc48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdc38) at libc-start.c:293
#10 0x0000000000419fca in _start ()
Comment 7 krystian.garlinski 2014-05-15 04:57:33 UTC
I've found the culprit behind this weird behavior. My library contained the following construct:

struct PosixQuit {
	~PosixQuit() {
		// ensures correct cleanup of thread local variables
		pthread_exit(nullptr);
	}
};

namespace {
	static PosixQuit _posixQuit;
}

This called the TLS destructor (unregister_thread()) on the main thread, which is legal in the Linux pthread implementation. I still think that you should explicitly set the TLS for the main thread to NULL (which would in turn call this destructor) BEFORE completely destroying the root domain.
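
The ordering proposed above could look roughly like the following sketch. This is an illustration under assumptions, not necessarily what the eventual fix does: FakeDomain, unregister_cb and run_unregister_before_cleanup are hypothetical names. Note that under POSIX, pthread_setspecific() does not itself invoke the key's destructor, so the sketch calls the cleanup routine explicitly and then stores NULL, which makes the destructor be skipped at thread exit. A worker thread stands in for the main thread so the outcome can be checked after pthread_join().

```cpp
#include <pthread.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime's root domain; not Mono's real type.
struct FakeDomain {
    bool alive = true;
};

static pthread_key_t g_fix_key;
static pthread_once_t g_fix_once = PTHREAD_ONCE_INIT;
static std::vector<std::string> g_fix_events;

// Plays the role of unregister_thread().
static void unregister_cb(void *arg) {
    auto *domain = static_cast<FakeDomain *>(arg);
    g_fix_events.push_back(domain->alive ? "unregister: domain alive"
                                         : "unregister: domain already destroyed");
}

static void make_fix_key() {
    int rc = pthread_key_create(&g_fix_key, unregister_cb);
    assert(rc == 0);
    (void)rc;
}

static void *fixed_thread_body(void *arg) {
    auto *domain = static_cast<FakeDomain *>(arg);
    pthread_once(&g_fix_once, make_fix_key);
    pthread_setspecific(g_fix_key, domain);       // "register" the thread

    // Unregister explicitly while the domain is still valid...
    unregister_cb(pthread_getspecific(g_fix_key));
    // ...then clear the slot: a NULL value means no destructor call at exit.
    pthread_setspecific(g_fix_key, nullptr);

    domain->alive = false;                        // stands in for mini_cleanup(domain)
    g_fix_events.push_back("cleanup done");
    return nullptr;
}

// Runs the proposed order and returns the recorded events.
std::vector<std::string> run_unregister_before_cleanup() {
    g_fix_events.clear();
    FakeDomain domain;                            // outlives the worker thread
    pthread_t t;
    int rc = pthread_create(&t, nullptr, fixed_thread_body, &domain);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);
    return g_fix_events;
}
```

With this order the unregister step sees a live domain, and nothing touches the domain after it has been torn down, even if something later forces the thread's TLS destructors to run.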
Comment 8 Zoltan Varga 2014-05-15 08:05:39 UTC
Should be fixed in mono master bf4c6fe29565a629a8a33744639d0946246aabb8.