Bug 24544 - Deadlock or assertion failed (threads != NULL) at shutdown
Summary: Deadlock or assertion failed (threads != NULL) at shutdown
Status: NEW
Alias: None
Product: Runtime
Classification: Mono
Component: General
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2014-11-14 15:23 UTC by Vlad Brezae
Modified: 2014-11-14 15:23 UTC

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Test for bug (485 bytes, text/x-csharp)
2014-11-14 15:23 UTC, Vlad Brezae

Description Vlad Brezae 2014-11-14 15:23:37 UTC
Created attachment 8747
Test for bug

The bug occurs on an Ubuntu 14.04 amd64 machine.

Deadlock case

(gdb) info threads
  Id   Target Id         Frame 
  5    Thread 0x7ffff43fe700 (LWP 9057) "Threadpool work" 0x00007ffff74b9b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7ffff7ef7700 (LWP 9056) "Threadpool moni" 0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
  3    Thread 0x7ffff45ff700 (LWP 9055) "Timer-Scheduler" 0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
  2    Thread 0x7ffff491c700 (LWP 9054) "Finalizer" 0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
* 1    Thread 0x7ffff7fdc800 (LWP 9050) "mono-sgen" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85

Thread 1

#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x000000000068a8c2 in mono_sem_wait (sem=sem@entry=0x7fffe8000928, alertable=alertable@entry=0) at mono-semaphore.c:101
#2  0x0000000000690e82 in mono_threads_core_resume (info=info@entry=0x7fffe80008e0) at mono-threads-posix.c:424
#3  0x000000000068f7f9 in mono_thread_info_core_resume (info=info@entry=0x7fffe80008e0) at mono-threads.c:507
#4  0x000000000068fff0 in mono_thread_info_finish_suspend_and_resume (info=info@entry=0x7fffe80008e0) at mono-threads.c:549
#5  0x00000000005cc6fd in abort_thread_internal (thread=<optimized out>, install_async_abort=1, can_raise_exception=1) at threads.c:4605
#6  0x00000000005cfe3c in remove_and_abort_threads (key=<optimized out>, value=0x7ffff7f6c2d0, user=0x7fffffffd9a0) at threads.c:2831
#7  0x00000000005f66d8 in mono_g_hash_table_foreach_remove (hash=0xaa6ec0, func=func@entry=0x5cfda0 <remove_and_abort_threads>, user_data=user_data@entry=0x7fffffffd9a0) at mono-hash.c:359
#8  0x00000000005d058d in mono_thread_manage () at threads.c:2944
#9  0x0000000000492894 in mono_main (argc=2, argv=<optimized out>) at driver.c:2023
#10 0x00007ffff7105ec5 in __libc_start_main (main=0x420720 <main>, argc=2, argv=0x7fffffffdfd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdfc8) at libc-start.c:287
#11 0x00000000004209c4 in _start ()


Thread 2

#0  0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1  __GI___sigsuspend (set=set@entry=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2  0x0000000000627f34 in suspend_thread (context=0x7ffff491b780, info=0x7ffff00008e0) at sgen-os-posix.c:126
#3  suspend_handler (_dummy=<optimized out>, _info=<optimized out>, context=0x7ffff491b780) at sgen-os-posix.c:153
#4  <signal handler called>
#5  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:84
#6  0x000000000068a8de in mono_sem_wait (sem=sem@entry=0x9e7d40 <finalizer_sem>, alertable=alertable@entry=1) at mono-semaphore.c:101
#7  0x00000000005f14ad in finalizer_thread (unused=unused@entry=0x0) at gc.c:1077
#8  0x00000000005d11da in start_wrapper_internal (data=<optimized out>) at threads.c:657
#9  start_wrapper (data=<optimized out>) at threads.c:704
#10 0x00000000006907d6 in inner_start_thread (arg=0x7fffffffdb30) at mono-threads-posix.c:88
#11 0x00007ffff74b2182 in start_thread (arg=0x7ffff491c700) at pthread_create.c:312
#12 0x00007ffff71defbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3

#0  0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1  __GI___sigsuspend (set=set@entry=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2  0x0000000000627f34 in suspend_thread (context=0x7ffff45fdb00, info=0x7fffe80008e0) at sgen-os-posix.c:126
#3  suspend_handler (_dummy=<optimized out>, _info=<optimized out>, context=0x7ffff45fdb00) at sgen-os-posix.c:153
#4  <signal handler called>
#5  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#6  0x000000000068a8c2 in mono_sem_wait (sem=sem@entry=0x7fffe8000948, alertable=alertable@entry=0) at mono-semaphore.c:101
#7  0x0000000000690992 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x7ffff45fe180) at mono-threads-posix.c:312
#8  <signal handler called>
#9  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#10 0x0000000000666a9b in _wapi_handle_timedwait_signal_handle (handle=handle@entry=0x409, timeout=timeout@entry=0x0, alertable=alertable@entry=1, poll=poll@entry=0) at handles.c:1614
#11 0x0000000000666b5b in _wapi_handle_wait_signal_handle (handle=handle@entry=0x409, alertable=alertable@entry=1) at handles.c:1559
#12 0x000000000067ae12 in WaitForSingleObjectEx (handle=handle@entry=0x409, timeout=timeout@entry=4294967295, alertable=alertable@entry=1) at wait.c:194
#13 0x00000000005cf147 in mono_wait_uninterrupted (multiple=0, numhandles=1, waitall=0, alertable=1, ms=-1, handles=<synthetic pointer>, thread=0x7ffff7f6c2d0) at threads.c:1357
#14 ves_icall_System_Threading_WaitHandle_WaitOne_internal (this=<optimized out>, handle=0x409, ms=-1, exitContext=<optimized out>) at threads.c:1490
#15 0x0000000040004500 in ?? ()
#16 0x0000000000000000 in ?? ()

Thread 4

#0  0x00007ffff711af52 in do_sigsuspend (set=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1  __GI___sigsuspend (set=set@entry=0x9e8560 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2  0x0000000000627f34 in suspend_thread (context=0x7ffff7ef66c0, info=0x7fffec0008e0) at sgen-os-posix.c:126
#3  suspend_handler (_dummy=<optimized out>, _info=<optimized out>, context=0x7ffff7ef66c0) at sgen-os-posix.c:153
#4  <signal handler called>
#5  __clock_nanosleep (clock_id=1, flags=1, req=0x7ffff7ef6cb0, rem=0xffffffffffffffff) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:49
#6  0x000000000067c208 in SleepEx (ms=ms@entry=500, alertable=alertable@entry=1) at wthreads.c:277
#7  0x00000000005d37a5 in monitor_thread (unused=unused@entry=0x0) at threadpool.c:909
#8  0x00000000005d11da in start_wrapper_internal (data=<optimized out>) at threads.c:657
#9  start_wrapper (data=<optimized out>) at threads.c:704
#10 0x00000000006907d6 in inner_start_thread (arg=0x7ffff45fe750) at mono-threads-posix.c:88
#11 0x00007ffff74b2182 in start_thread (arg=0x7ffff7ef7700) at pthread_create.c:312
#12 0x00007ffff71defbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Thread 5

#0  0x00007ffff74b9b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000069e198 in monoeg_g_usleep (microseconds=microseconds@entry=12770) at gdate-unix.c:53
#2  0x000000000065954d in restart_threads_until_none_in_managed_allocator () at sgen-stw.c:154
#3  sgen_stop_world (generation=generation@entry=1) at sgen-stw.c:229
#4  0x00000000006330c2 in sgen_perform_collection (requested_size=requested_size@entry=0, generation_to_collect=generation_to_collect@entry=1, reason=reason@entry=0x76d4f0 "user request", 
    wait_to_finish=wait_to_finish@entry=1) at sgen-gc.c:3181
#5  0x0000000000633fd8 in mono_gc_collect (generation=1) at sgen-gc.c:4344
#6  0x00000000400250e2 in ?? ()
#7  0x00007ffff5c00bd0 in ?? ()
#8  0x00007ffff5c0bc60 in ?? ()
....

The problem appears to be that, after Main returns, the main thread tries to abort the remaining threads. To do so it enters a critical region that the GC cannot interrupt, sends a signal to the target thread, and then waits to synchronize with it on the MonoThreadInfo resume_semaphore and finish_resume_semaphore. Meanwhile, a GC started from another thread suspends that target thread but cannot suspend the main thread, which is still inside the critical region. The main thread never leaves the critical region, because the thread that should signal it has been stopped by the GC.
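
The circular wait described above can be written down as a small wait-for graph. The following is only an illustrative sketch in Python, not Mono code: the thread ids are the gdb ids from the backtraces, the edge from thread 1 to thread 3 is read off the matching info=0x7fffe80008e0 pointer in both backtraces, and the edge from thread 5 back to thread 1 is an assumption based on the explanation above, not something the backtrace proves. The names WAITS_FOR and find_cycle are hypothetical.

```python
# Wait-for graph reconstructed from the gdb backtraces in this report.
# An entry A: B means "thread A cannot make progress until thread B acts".
WAITS_FOR = {
    1: 3,  # main: sem_wait in mono_threads_core_resume(info=0x7fffe80008e0), i.e. on Timer-Scheduler
    2: 5,  # Finalizer: parked in the sgen suspend_handler until the world is restarted
    3: 5,  # Timer-Scheduler: also parked in the sgen suspend_handler
    4: 5,  # Threadpool monitor: also parked in the sgen suspend_handler
    5: 1,  # GC thread: loops in restart_threads_until_none_in_managed_allocator
           # (ASSUMPTION: it is waiting for main to leave its critical region)
}


def find_cycle(graph, start):
    """Follow the single outgoing edge of each node, beginning at start.

    Returns the list of nodes forming a cycle, or None if the chain ends
    at a node that is not waiting on anything.
    """
    path = []
    position = {}  # node -> index in path where it was first visited
    node = start
    while node in graph:
        if node in position:
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = graph[node]
    return None
```

Under these assumptions, find_cycle(WAITS_FOR, 1) returns [1, 3, 5]: the main thread waits on Timer-Scheduler, which waits on the GC thread, which waits on the main thread. Threads 2 and 4 are not part of the cycle but stay blocked behind it.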


The bug is not always reproducible; sometimes the runtime crashes instead with:
 * Assertion at threadpool.c:1201, condition `threads != NULL' not met (within threadpool_start_thread).

Stacktrace:

Native stacktrace:

	./mono-sgen() [0x4c4f2b]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f27af8e9340]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f27af549bb9]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f27af54cfc8]
	./mono-sgen() [0x695e59]
	./mono-sgen() [0x69605e]
	./mono-sgen() [0x6961a6]
	./mono-sgen() [0x5d370e]
	./mono-sgen() [0x5d39df]
	./mono-sgen() [0x5d11da]
	./mono-sgen() [0x6907d6]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f27af8e1182]
	/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f27af60dfbd]

 I haven't been able to catch this assertion under gdb today.
