Bug 51219

Summary: ves_icall_System_Threading_ThreadPool_RequestWorkerThread called after threadpool cleanup
Product: [Mono] Runtime
Component: General
Status: RESOLVED FIXED
Severity: normal
Priority: ---
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Reporter: Neale Ferguson <neale>
Assignee: Ludovic Henry <ludovic>
CC: andi.mcclure, flutos, kumpera, mono-bugs+mono, mono-bugs+runtime
Is this bug a regression?: ---
Last known good build:

Description Neale Ferguson 2017-01-04 16:13:48 UTC
When an NUnit test runner shuts down, it crashes with the stack trace below. We start up an HTTP listener and make web requests to a controller, so presumably during the NUnit runner shutdown this HTTP listener is exiting and we see this crash.

mono_os_mutex_trylock: pthread_mutex_trylock failed with "Invalid argument" (22)
Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.ThreadPool.RequestWorkerThread () [0x00006] in <307229303d3a4bcc809209ed3a306b36>:0
  at System.Threading.ThreadPoolWorkQueue.EnsureThreadRequested () [0x0001f] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:620
  at System.Threading.ThreadPoolWorkQueue.Enqueue (System.Threading.IThreadPoolWorkItem,bool) [0x0006c] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:679
  at System.Threading.ThreadPool.QueueUserWorkItemHelper (System.Threading.WaitCallback,object,System.Threading.StackCrawlMark&,bool) [0x00016] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:1680
  at System.Threading.ThreadPool.UnsafeQueueUserWorkItem (System.Threading.WaitCallback,object) [0x00002] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:1652
  at System.Net.ListenerAsyncResult.Complete (System.Exception) [0x00075] in /build/mono/trunk/mcs/class/System/System.Net/ListenerAsyncResult.cs:69
  at System.Net.HttpListener.Cleanup (bool) [0x00176] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:314
  at System.Net.HttpListener.Close (bool) [0x0000c] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:280
  at System.Net.HttpListener.Stop () [0x0000d] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:405
  at Microsoft.Owin.Host.HttpListener.OwinHttpListener.Dispose () [0x00013] in <d87779ef62ca4c3698568e9d3e48e24e>:0
  at Microsoft.Owin.Hosting.Engine.HostingEngine/<>c__DisplayClass1.<Start>b__0 () [0x00006] in <f9c6a24237684643b127a582774aa6dc>:0
  at Microsoft.Owin.Hosting.Utilities.Disposable.Dispose () [0x00006] in <f9c6a24237684643b127a582774aa6dc>:0
  at Drms.Server.Service.Coordinator.WebApi.WebHost.HostWebThread (object) [0x000c7] in /build/COFLOW/Src/Server/Service/Coordinator/WebApi/WebHost.cs:95
  at System.Threading.ThreadHelper.ThreadStart_Context (object) [0x00025] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/thread.cs:72
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00071] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:957
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00000] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:904
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) [0x0002b] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:893
  at System.Threading.ThreadHelper.ThreadStart (object) [0x0000f] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/thread.cs:87
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void__this___object (object,intptr,intptr,intptr) [0x00053] in <307229303d3a4bcc809209ed3a306b36>:0

A display of the threadpool structure shows that the mutex has been destroyed, which indicates the cleanup() routine has been called:

(gdb) p *threadpool
$3 = {ref = {ref = 3566501632, destructor = 0x7f82dacd8868 <main_arena+264>}, domains = 0x39e1ef0, domains_lock = {m = {__data = {__lock = 0, __count = 0, __owner = 0,
        __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
      __align = 0}}, threads = 0x39da040, threads_lock = {m = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0,
          __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, threads_exit_cond = {c = {__data = {__lock = 1,
        __futex = 0, __total_seq = 18446744073709551615, __wakeup_seq = 0, __woken_seq = 49, __mutex = 0x7f82dacd87d8 <main_arena+120>, __nwaiters = 3670902744,
        __broadcast_seq = 32642},
      __size = "\001\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\061\000\000\000\000\000\000\000Ø\315Ú\177\000\000Ø\315Ú\177\000", __align = 1}}, counters = {_ = {starting = 0, working = 0}, as_gint32 = 0}, limit_io_min = 24, limit_io_max = 200, worker = 0x30}
Comment 1 Neale Ferguson 2017-01-06 21:00:47 UTC
What triggers the threadpool cleanup? Should we be raising an exception if a RequestWorkerThread is issued after the threadpool has been cleaned up, or are we cleaning up too early?
Comment 2 Neale Ferguson 2017-01-09 18:33:22 UTC
Looking at the reference source code, it seems to simply discard things at shutdown rather than trying to clean up neatly as the Mono code appears to do.
Comment 3 Neale Ferguson 2017-01-11 00:52:34 UTC
What I am seeing is that mono_try_runtime_shutdown() is called and performs the threadpool cleanup, then tries to suspend the other threads, etc. This seems to leave a window in which a still-running thread can request a worker thread, leading to the problem we are seeing.
Comment 4 Neale Ferguson 2017-01-11 22:55:23 UTC
Here is the state of the system at the time a Worker Thread Request is being made:

(gdb) br ves_icall_System_Threading_ThreadPool_RequestWorkerThread
Breakpoint 2 at 0x5b4d70: file threadpool.c, line 790.
(gdb) c
Continuing.
[Switching to Thread 0x7fffe6b3a700 (LWP 29377)]

Breakpoint 2, ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:790
790     {
(gdb) thr apply all bt

Thread 82 (Thread 0x7fffd7dff700 (LWP 29477)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 79 (Thread 0x7fffb45fe700 (LWP 29474)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 78 (Thread 0x7fffb4cff700 (LWP 29473)):
#0  0x00007ffff6fcabf7 in madvise () from /lib64/libc.so.6
#1  0x00007ffff74b8e71 in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 77 (Thread 0x7fffd477b700 (LWP 29472)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 75 (Thread 0x7fffb50a9700 (LWP 29470)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7fffd535b700 (LWP 29384)):
#0  0x00007ffff74bf4ad in accept () from /lib64/libpthread.so.0
#1  0x00000000006479c3 in _wapi_accept (fd=fd@entry=14, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0) at sockets.c:200
#2  0x000000000059d66a in ves_icall_System_Net_Sockets_Socket_Accept_internal (sock=14, werror=0x7fffd535a798, blocking=<optimized out>)
    at socket-io.c:724
#3  0x00000000406d7051 in ?? ()
#4  0x00007ffff07e7b30 in ?? ()
#5  0x00007ffff07e7ac0 in ?? ()
#6  0x00007ffff07e7ac0 in ?? ()
#7  0x00007ffff07e7ac0 in ?? ()
#8  0x00007ffff07e3460 in ?? ()
#9  0x00007fffc473c700 in ?? ()
#10 0x00007fffd535a670 in ?? ()
#11 0x00007fffd535a780 in ?? ()
#12 0x00007fffd535a6e0 in ?? ()
#13 0x00000000406d6f80 in ?? ()
#14 0x4200656b6f766e49 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fffe6b3a700 (LWP 29377)):
#0  ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:790
#1  0x0000000040454f6c in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fffe6d3b700 (LWP 29376)):
#0  0x00007ffff74bf4ad in accept () from /lib64/libpthread.so.0
#1  0x00000000006479c3 in _wapi_accept (fd=fd@entry=8, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0) at sockets.c:200
#2  0x000000000059d66a in ves_icall_System_Net_Sockets_Socket_Accept_internal (sock=8, werror=0x7fffe6d3a7a8, blocking=<optimized out>)
    at socket-io.c:724
#3  0x0000000040066711 in ?? ()
#4  0x00007ffff0525a08 in ?? ()
#5  0x00007ffff0525ba8 in ?? ()
#6  0x00007ffff0525998 in ?? ()
#7  0x00007ffff0525ac8 in ?? ()
#8  0x00007ffff0510b20 in ?? ()
#9  0x00007fffe0001d90 in ?? ()
#10 0x00007fffe6d3a680 in ?? ()
#11 0x00007fffe6d3a790 in ?? ()
#12 0x00007fffe6d3a6f0 in ?? ()
#13 0x0000000040065fe0 in ?? ()
#14 0x4200656b6f766e49 in ?? ()
#15 0x02ae2d02a92d02a4 in ?? ()
#16 0x942d02942b02942b in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x00007fffe6d3a7a8 in ?? ()
#19 0x00007ffff0525400 in ?? ()
#20 0x0000000040065f76 in ?? ()
#21 0x00000000005b04d0 in ?? () at threads.c:4476
#22 0x00007ffff0525998 in ?? ()
#23 0x0000000040065fa0 in ?? ()
#24 0x00007fffe6d3a880 in ?? ()
#25 0x0000000040065e38 in ?? ()
#26 0x00007ffff0510a00 in ?? ()
#27 0x00007fff00000000 in ?? ()
#28 0x0000000040065e00 in ?? ()
#29 0x0000000000e8c730 in ?? ()
#30 0x0000000040065ce4 in ?? ()
#31 0x00000000005b04d0 in ?? () at threads.c:4476
#32 0x0000000040065b82 in ?? ()
#33 0x0000000040065e00 in ?? ()
#34 0x00007fffe6d3a880 in ?? ()
#35 0x0000000040065cb4 in ?? ()
#36 0x00007ffff0525998 in ?? ()
#37 0x00000000400659e4 in ?? ()
#38 0x01ee1901e71901e3 in ?? ()
#39 0xd11901d11901c719 in ?? ()
#40 0x942d02942b02942b in ?? ()
#41 0x2502872501fe1901 in ?? ()
#42 0x00007ffff0510298 in ?? ()
#43 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fffedf8e700 (LWP 29375)):
#0  0x00007ffff74b9ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000065e01e in mono_native_thread_join (tid=tid@entry=140736219571968) at mono-threads-posix.c:266
#2  0x00000000005b1a79 in mono_threads_join_threads () at threads.c:4927
#3  0x00000000005d5cd5 in finalizer_thread (unused=unused@entry=0x0) at gc.c:931
#4  0x00000000005af656 in start_wrapper_internal (stack_ptr=0x7fffedf8ddf8, start_info=0x0) at threads.c:675
#5  start_wrapper (data=0xa30820) at threads.c:727
#6  0x000000000065c279 in inner_start_thread (data=0xa30880) at mono-threads.c:1163
#7  0x00007ffff74b8dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff03ff700 (LWP 29374)):
#0  0x00007ffff74bc6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000063e9bf in mono_os_cond_wait (mutex=0x9c3c60 <lock>, cond=0x9c3c20 <work_cond>) at ../../mono/utils/mono-os-mutex.h:146
#2  thread_func (thread_data=0x7ffff7ff2008) at sgen-thread-pool.c:110
#3  0x00007ffff74b8dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fd6780 (LWP 29370)):
#0  0x00007ffff74be790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000065b55c in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>) at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (sem=0x9c4740 <suspend_semaphore>, flags=MONO_SEM_FLAGS_NONE, timeout_ms=4294967295)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_threads_wait_pending_operations () at mono-threads.c:244
#4  0x000000000065c355 in suspend_sync (interrupt_kernel=1, tid=140736770455296) at mono-threads.c:913
#5  suspend_sync_nolock (interrupt_kernel=1, id=140736770455296) at mono-threads.c:930
#6  mono_thread_info_safe_suspend_and_run (id=140736770455296, interrupt_kernel=interrupt_kernel@entry=1,
    callback=callback@entry=0x5aa1b0 <async_abort_critical>, user_data=user_data@entry=0x7fffffffd3b0) at mono-threads.c:973
#7  0x00000000005aa31c in async_abort_internal (thread=0x7fffd539c508, install_async_abort=1) at threads.c:4735
#8  0x00000000005acb06 in remove_and_abort_threads (key=<optimized out>, value=0x7fffd539c508, user=0x7fffffffd490) at threads.c:3061
#9  0x00000000005da329 in mono_g_hash_table_foreach_remove (hash=0x9ff930, func=func@entry=0x5aca60 <remove_and_abort_threads>,
    user_data=user_data@entry=0x7fffffffd490) at mono-hash.c:353
#10 0x00000000005aee61 in mono_thread_manage () at threads.c:3174
#11 0x000000000047ac22 in mono_main (argc=13, argv=<optimized out>) at driver.c:2160
#12 0x0000000000420070 in mono_main_with_options (argv=0x7fffffffdbf8, argc=13) at main.c:45
#13 main (argc=13, argv=0x7fffffffdbf8) at main.c:338
Comment 5 Ludovic Henry 2017-01-18 16:43:27 UTC
Could you please check that https://github.com/mono/mono/pull/4263 fixes this issue? Thank you!
Comment 6 Ludovic Henry 2017-02-08 20:57:48 UTC
We witnessed equivalent bugs internally, so it should be fixed with https://github.com/mono/mono/pull/4348. Once it gets merged, could you please confirm whether you can still reproduce it? Thank you!
Comment 7 Ludovic Henry 2017-04-10 17:56:40 UTC
If you can still reproduce, please reopen the bug with more information. Thank you.