Bug 51219 - ves_icall_System_Threading_ThreadPool_RequestWorkerThread called after threadpool cleanup
Summary: ves_icall_System_Threading_ThreadPool_RequestWorkerThread called after thread...
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: General (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Ludovic Henry
URL:
Depends on:
Blocks:
 
Reported: 2017-01-04 16:13 UTC by Neale Ferguson
Modified: 2017-04-10 17:56 UTC (History)
5 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments

Description Neale Ferguson 2017-01-04 16:13:48 UTC
When an nunit test runner shuts down it will crash with the stack trace. We start up and http listener and do Web requests to a controller so presumably during the nunit runner shutdown this http listener is exiting and we see this crash.

mono_os_mutex_trylock: pthread_mutex_trylock failed with "Invalid argument" (22)
Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.ThreadPool.RequestWorkerThread () [0x00006] in <307229303d3a4bcc809209ed3a306b36>:0
  at System.Threading.ThreadPoolWorkQueue.EnsureThreadRequested () [0x0001f] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:620
  at System.Threading.ThreadPoolWorkQueue.Enqueue (System.Threading.IThreadPoolWorkItem,bool) [0x0006c] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:679
  at System.Threading.ThreadPool.QueueUserWorkItemHelper (System.Threading.WaitCallback,object,System.Threading.StackCrawlMark&,bool) [0x00016] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:1680
  at System.Threading.ThreadPool.UnsafeQueueUserWorkItem (System.Threading.WaitCallback,object) [0x00002] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs:1652
  at System.Net.ListenerAsyncResult.Complete (System.Exception) [0x00075] in /build/mono/trunk/mcs/class/System/System.Net/ListenerAsyncResult.cs:69
  at System.Net.HttpListener.Cleanup (bool) [0x00176] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:314
  at System.Net.HttpListener.Close (bool) [0x0000c] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:280
  at System.Net.HttpListener.Stop () [0x0000d] in /build/mono/trunk/mcs/class/System/System.Net/HttpListener.cs:405
  at Microsoft.Owin.Host.HttpListener.OwinHttpListener.Dispose () [0x00013] in <d87779ef62ca4c3698568e9d3e48e24e>:0
  at Microsoft.Owin.Hosting.Engine.HostingEngine/<>c__DisplayClass1.<Start>b__0 () [0x00006] in <f9c6a24237684643b127a582774aa6dc>:0
  at Microsoft.Owin.Hosting.Utilities.Disposable.Dispose () [0x00006] in <f9c6a24237684643b127a582774aa6dc>:0
  at Drms.Server.Service.Coordinator.WebApi.WebHost.HostWebThread (object) [0x000c7] in /build/COFLOW/Src/Server/Service/Coordinator/WebApi/WebHost.cs:95
  at System.Threading.ThreadHelper.ThreadStart_Context (object) [0x00025] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/thread.cs:72
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00071] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:957
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00000] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:904
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) [0x0002b] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/executioncontext.cs:893
  at System.Threading.ThreadHelper.ThreadStart (object) [0x0000f] in build/mono/trunk/mcs/class/referencesource/mscorlib/system/threading/thread.cs:87
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void__this___object (object,intptr,intptr,intptr) [0x00053] in <307229303d3a4bcc809209ed3a306b36>:0

A display of the threadpool structure shows that the mutex has been destroyed which indicates the cleanup() routine has been called:

 p *threadpool
$3 = {ref = {ref = 3566501632, destructor = 0x7f82dacd8868 <main_arena+264>}, domains = 0x39e1ef0, domains_lock = {m = {__data = {__lock = 0, __count = 0, __owner = 0,
        __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
      __align = 0}}, threads = 0x39da040, threads_lock = {m = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0,
          __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, threads_exit_cond = {c = {__data = {__lock = 1,
        __futex = 0, __total_seq = 18446744073709551615, __wakeup_seq = 0, __woken_seq = 49, __mutex = 0x7f82dacd87d8 <main_arena+120>, __nwaiters = 3670902744,
        __broadcast_seq = 32642},
      __size = "\001\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\061\000\000\000\000\000\000\000Ø\315Ú\177\000\000Ø\315Ú\177\000", __align = 1}}, counters = {_ = {starting = 0, working = 0}, as_gint32 = 0}, limit_io_min = 24, limit_io_max = 200, worker = 0x30}
Comment 1 Neale Ferguson 2017-01-06 21:00:47 UTC
What triggers the threadpool cleanup? Should we be triggering an exception if a RequestWorkerThread is issued after the threadpool has been cleaned up or are we cleaning up too early?
Comment 2 Neale Ferguson 2017-01-09 18:33:22 UTC
Looking at the reference code, it seems to simply discard things rather than trying to be neat as the mono code appears to do.
Comment 3 Neale Ferguson 2017-01-11 00:52:34 UTC
What I am seeing is that mono_try_runtime_shutdown() is being called and it will perform the threadpool cleanup. It will then try to suspend other threads etc. This would seem to leave a window for an existing thread to request a worker thread and thus lead to the problem we are seeing.
Comment 4 Neale Ferguson 2017-01-11 22:55:23 UTC
Here is the state of the system at the time a Worker Thread Request is being made:

(gdb) br ves_icall_System_Threading_ThreadPool_RequestWorkerThread
Breakpoint 2 at 0x5b4d70: file threadpool.c, line 790.
(gdb) c
Continuing.
[Switching to Thread 0x7fffe6b3a700 (LWP 29377)]

Breakpoint 2, ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:790
790     {
(gdb) thr apply all bt

Thread 82 (Thread 0x7fffd7dff700 (LWP 29477)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 79 (Thread 0x7fffb45fe700 (LWP 29474)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 78 (Thread 0x7fffb4cff700 (LWP 29473)):
#0  0x00007ffff6fcabf7 in madvise () from /lib64/libc.so.6
#1  0x00007ffff74b8e71 in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 77 (Thread 0x7fffd477b700 (LWP 29472)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 75 (Thread 0x7fffb50a9700 (LWP 29470)):
#0  0x00007ffff74b7c90 in __nptl_death_event () from /lib64/libpthread.so.0
#1  0x00007ffff74b8f1c in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7fffd535b700 (LWP 29384)):
#0  0x00007ffff74bf4ad in accept () from /lib64/libpthread.so.0
#1  0x00000000006479c3 in _wapi_accept (fd=fd@entry=14, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0) at sockets.c:200
#2  0x000000000059d66a in ves_icall_System_Net_Sockets_Socket_Accept_internal (sock=14, werror=0x7fffd535a798, blocking=<optimized out>)
    at socket-io.c:724
#3  0x00000000406d7051 in ?? ()
#4  0x00007ffff07e7b30 in ?? ()
#5  0x00007ffff07e7ac0 in ?? ()
#6  0x00007ffff07e7ac0 in ?? ()
#7  0x00007ffff07e7ac0 in ?? ()
#8  0x00007ffff07e3460 in ?? ()
#9  0x00007fffc473c700 in ?? ()
#10 0x00007fffd535a670 in ?? ()
#11 0x00007fffd535a780 in ?? ()
#12 0x00007fffd535a6e0 in ?? ()
#13 0x00000000406d6f80 in ?? ()
#14 0x4200656b6f766e49 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fffe6b3a700 (LWP 29377)):
#0  ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:790
#1  0x0000000040454f6c in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fffe6d3b700 (LWP 29376)):
#0  0x00007ffff74bf4ad in accept () from /lib64/libpthread.so.0
#1  0x00000000006479c3 in _wapi_accept (fd=fd@entry=8, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0) at sockets.c:200
#2  0x000000000059d66a in ves_icall_System_Net_Sockets_Socket_Accept_internal (sock=8, werror=0x7fffe6d3a7a8, blocking=<optimized out>)
    at socket-io.c:724
#3  0x0000000040066711 in ?? ()
#4  0x00007ffff0525a08 in ?? ()
#5  0x00007ffff0525ba8 in ?? ()
#6  0x00007ffff0525998 in ?? ()
#7  0x00007ffff0525ac8 in ?? ()
#8  0x00007ffff0510b20 in ?? ()
#9  0x00007fffe0001d90 in ?? ()
#10 0x00007fffe6d3a680 in ?? ()
#11 0x00007fffe6d3a790 in ?? ()
#12 0x00007fffe6d3a6f0 in ?? ()
#13 0x0000000040065fe0 in ?? ()
#14 0x4200656b6f766e49 in ?? ()
#15 0x02ae2d02a92d02a4 in ?? ()
#16 0x942d02942b02942b in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x00007fffe6d3a7a8 in ?? ()
#19 0x00007ffff0525400 in ?? ()
#20 0x0000000040065f76 in ?? ()
#21 0x00000000005b04d0 in ?? () at threads.c:4476
#22 0x00007ffff0525998 in ?? ()
#23 0x0000000040065fa0 in ?? ()
#24 0x00007fffe6d3a880 in ?? ()
#25 0x0000000040065e38 in ?? ()
#26 0x00007ffff0510a00 in ?? ()
#27 0x00007fff00000000 in ?? ()
#28 0x0000000040065e00 in ?? ()
#29 0x0000000000e8c730 in ?? ()
#30 0x0000000040065ce4 in ?? ()
#31 0x00000000005b04d0 in ?? () at threads.c:4476
#32 0x0000000040065b82 in ?? ()
#33 0x0000000040065e00 in ?? ()
#34 0x00007fffe6d3a880 in ?? ()
#35 0x0000000040065cb4 in ?? ()
#36 0x00007ffff0525998 in ?? ()
#37 0x00000000400659e4 in ?? ()
#38 0x01ee1901e71901e3 in ?? ()
#39 0xd11901d11901c719 in ?? ()
#40 0x942d02942b02942b in ?? ()
#41 0x2502872501fe1901 in ?? ()
#42 0x00007ffff0510298 in ?? ()
#43 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fffedf8e700 (LWP 29375)):
#0  0x00007ffff74b9ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000065e01e in mono_native_thread_join (tid=tid@entry=140736219571968) at mono-threads-posix.c:266
#2  0x00000000005b1a79 in mono_threads_join_threads () at threads.c:4927
#3  0x00000000005d5cd5 in finalizer_thread (unused=unused@entry=0x0) at gc.c:931
#4  0x00000000005af656 in start_wrapper_internal (stack_ptr=0x7fffedf8ddf8, start_info=0x0) at threads.c:675
#5  start_wrapper (data=0xa30820) at threads.c:727
#6  0x000000000065c279 in inner_start_thread (data=0xa30880) at mono-threads.c:1163
#7  0x00007ffff74b8dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff03ff700 (LWP 29374)):
#0  0x00007ffff74bc6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000063e9bf in mono_os_cond_wait (mutex=0x9c3c60 <lock>, cond=0x9c3c20 <work_cond>) at ../../mono/utils/mono-os-mutex.h:146
#2  thread_func (thread_data=0x7ffff7ff2008) at sgen-thread-pool.c:110
#3  0x00007ffff74b8dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff6fd021d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fd6780 (LWP 29370)):
#0  0x00007ffff74be790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000065b55c in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>) at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (sem=0x9c4740 <suspend_semaphore>, flags=MONO_SEM_FLAGS_NONE, timeout_ms=4294967295)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_threads_wait_pending_operations () at mono-threads.c:244
#4  0x000000000065c355 in suspend_sync (interrupt_kernel=1, tid=140736770455296) at mono-threads.c:913
#5  suspend_sync_nolock (interrupt_kernel=1, id=140736770455296) at mono-threads.c:930
#6  mono_thread_info_safe_suspend_and_run (id=140736770455296, interrupt_kernel=interrupt_kernel@entry=1,
    callback=callback@entry=0x5aa1b0 <async_abort_critical>, user_data=user_data@entry=0x7fffffffd3b0) at mono-threads.c:973
#7  0x00000000005aa31c in async_abort_internal (thread=0x7fffd539c508, install_async_abort=1) at threads.c:4735
#8  0x00000000005acb06 in remove_and_abort_threads (key=<optimized out>, value=0x7fffd539c508, user=0x7fffffffd490) at threads.c:3061
#9  0x00000000005da329 in mono_g_hash_table_foreach_remove (hash=0x9ff930, func=func@entry=0x5aca60 <remove_and_abort_threads>,
    user_data=user_data@entry=0x7fffffffd490) at mono-hash.c:353
#10 0x00000000005aee61 in mono_thread_manage () at threads.c:3174
#11 0x000000000047ac22 in mono_main (argc=13, argv=<optimized out>) at driver.c:2160
#12 0x0000000000420070 in mono_main_with_options (argv=0x7fffffffdbf8, argc=13) at main.c:45
#13 main (argc=13, argv=0x7fffffffdbf8) at main.c:338
Comment 5 Ludovic Henry 2017-01-18 16:43:27 UTC
Could you please check that https://github.com/mono/mono/pull/4263 fixes this issue? Thank you!
Comment 6 Ludovic Henry 2017-02-08 20:57:48 UTC
We witnessed equivalent bugs internally, so it should be fixed with https://github.com/mono/mono/pull/4348. Once it get merged, could you please confirm if you can still witness it? Thank you!
Comment 7 Ludovic Henry 2017-04-10 17:56:40 UTC
If you can still reproduce, please reopen the bug with more information. Thank you.

Note You need to log in before you can comment on or make changes to this bug.