Bug 33020 - Hang in System.Environment.Exit
Summary: Hang in System.Environment.Exit
Status: CONFIRMED
Alias: None
Product: Runtime
Classification: Mono
Component: GC (show other bugs)
Version: 4.8.0 (C9)
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2015-08-12 15:29 UTC by Neale Ferguson
Modified: 2017-09-06 15:34 UTC (History)
8 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
MonoThreadInfo dump and Thread Backtrace (32.07 KB, text/plain)
2016-10-26 17:27 UTC, Neale Ferguson
Details

Description Neale Ferguson 2015-08-12 15:29:41 UTC
Using the following script:

while [ 1 ]
do 
    MONO_PATH=$HOME/mono/mcs/class/lib/net_4_x  ../../runtime/mono-wrapper subthread-exit.exe >/dev/null
    echo -n "$? "
done

After anywhere between 20 and 800 iterations the process will hang. 

gdb shows the following:

Thread 4 (Thread 0x7f5ed4fff700 (LWP 16717)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000005f78bc in thread_func (thread_data=0x0) at sgen-thread-pool.c:118
#2  0x00007f5ed5bf3182 in start_thread (arg=0x7f5ed4fff700) at pthread_create.c:312
#3  0x00007f5ed592047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7f5ed2d15700 (LWP 16718)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x000000000061ac07 in mono_sem_wait (sem=sem@entry=0x949c00 <finalizer_sem>, alertable=alertable@entry=1)
    at mono-semaphore.c:107
#2  0x00000000005a13b6 in finalizer_thread (unused=<optimized out>) at gc.c:1096
#3  0x0000000000583e2b in start_wrapper_internal (data=<optimized out>) at threads.c:723
#4  start_wrapper (data=<optimized out>) at threads.c:770
#5  0x0000000000621746 in inner_start_thread (arg=0x7ffc03654980) at mono-threads-posix.c:97
#6  0x00007f5ed5bf3182 in start_thread (arg=0x7f5ed2d15700) at pthread_create.c:312
#7  0x00007f5ed592047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f5ed2b14700 (LWP 16719)):
#0  0x00007f5ed5bfab9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000633339 in monoeg_g_usleep (microseconds=60560) at gdate-unix.c:53
#2  0x000000000061f63b in suspend_sync_nolock (id=id@entry=140045301499840, interrupt_kernel=interrupt_kernel@entry=1)
    at mono-threads.c:869
#3  0x000000000061faa2 in mono_thread_info_safe_suspend_and_run (id=140045301499840, 
    interrupt_kernel=interrupt_kernel@entry=1, callback=callback@entry=0x580e90 <suspend_thread_critical>, 
    user_data=user_data@entry=0x7f5ed2b13430) at mono-threads.c:890
#4  0x0000000000581959 in suspend_thread_internal (thread=0x7f5ed667c130, interrupt=1) at threads.c:4532
#5  0x000000000058313f in mono_thread_suspend_all_other_threads () at threads.c:3152
#6  0x00000000005373e6 in ves_icall_System_Environment_Exit (result=0) at icall.c:5999
#7  0x00000000405376a2 in ?? ()
#8  0x00007f5ec4001910 in ?? ()
#9  0x00007f5ed5000620 in ?? ()
#10 0x00007f5ed5000448 in ?? ()
#11 0x00007f5ed5000448 in ?? ()
#12 0x00007f5ed5000560 in ?? ()
#13 0x00007f5ec4000e10 in ?? ()
#14 0x00000000405356f8 in ?? ()
#15 0x00007f5ed2b13a30 in ?? ()
#16 0x00007f5ed2b13910 in ?? ()
#17 0x00000000405356f3 in ?? ()
#18 0x00007f5ed2b13a30 in ?? ()
#19 0x00007f5ed2eb6fcb in System_Threading_ThreadHelper_ThreadStart_Context_object (state=0x0)
    at /root/mono/external/referencesource/mscorlib/system/threading/thread.cs:68
#20 0x00007f5ed2e8dc77 in System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=..., callback=..., state=0x7f5ed5000560, preserveSyncCtx=false)
    at /root/mono/external/referencesource/mscorlib/system/threading/executioncontext.cs:581
#21 0x00007f5ed2e8daa1 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=..., callback=..., state=0x7f5ed5000560, preserveSyncCtx=false)
    at /root/mono/external/referencesource/mscorlib/system/threading/executioncontext.cs:530
#22 0x00007f5ed2e8da2a in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object (executionContext=..., callback=..., state=0x7f5ed5000560)
    at /root/mono/external/referencesource/mscorlib/system/threading/executioncontext.cs:519
#23 0x00007f5ed2eb711f in System_Threading_ThreadHelper_ThreadStart (this=...)
    at /root/mono/external/referencesource/mscorlib/system/threading/thread.cs:105
#24 0x0000000040535669 in ?? ()
#25 0x000000000000001f in ?? ()
#26 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f5ed672c7c0 (LWP 16708)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x000000000061ac07 in mono_sem_wait (sem=sem@entry=0xd3a970, alertable=alertable@entry=0) at mono-semaphore.c:107
#2  0x000000000061f92a in mono_thread_info_wait_for_resume (info=<optimized out>) at mono-threads.c:110
#3  mono_thread_info_end_self_suspend () at mono-threads.c:691
#4  0x000000000058398c in self_suspend_internal (thread=0x7f5ed667c130) at threads.c:4547
#5  mono_thread_execute_interruption (thread=thread@entry=0x7f5ed667c130) at threads.c:4048
#6  0x0000000000584e39 in ves_icall_System_Threading_Thread_Sleep_internal (ms=1200) at threads.c:1204
#7  0x0000000040535542 in ?? ()
#8  0x0000000000da0960 in ?? ()
#9  0x00007ffc03654b70 in ?? ()
#10 0x0000000040534e40 in ?? ()
#11 0x0000000000d3fb10 in ?? ()
#12 0x0000000040534d50 in ?? ()
#13 0x0000000000d628e0 in ?? ()
#14 0x00007f5ed672c758 in ?? ()
#15 0x00007ffc03654880 in ?? ()
#16 0x00007ffc03654790 in ?? ()
#17 0x00007f5ed2eb79c7 in System_Threading_Thread_Sleep_int (millisecondsTimeout=1200)
    at /root/mono/external/referencesource/mscorlib/system/threading/thread.cs:715
#18 0x0000000040534de6 in ?? ()
#19 0x00007f5ed50004a8 in ?? ()
#20 0x00007f5ed5000448 in ?? ()
#21 0x0000000040534d50 in ?? ()
#22 0x0000000040534f28 in ?? ()
#23 0x000000000000001f in ?? ()
#24 0x0000000000000000 in ?? ()

Thread 2 is continually waking up and sleeping again, each time incrementing the microsecond count by 10.

Ubuntu 14.04.3 LTS - Linux ubuntu11-x64-01 3.16.0-45-generic #60~14.04.1-Ubuntu SMP Fri Jul 24 21:16:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Neale
Comment 1 Alexander Köplinger 2015-09-14 06:45:53 UTC
I think I see the same hang with a bunch of other runtime tests: https://bugzilla.xamarin.com/show_bug.cgi?id=33283
Comment 2 Mirco Bauer 2016-01-25 17:33:30 UTC
I can reproduce this issue on Mono 4.2.1. That test code hangs in 1 out of 10 runs for me.

mono -V
Mono JIT compiler version 4.2.1 (Debian 4.2.1.102+dfsg2-5)

This issue affects Smuxi, see https://smuxi.im/issues/show/1084
Comment 3 Mirco Bauer 2016-01-25 17:34:59 UTC
meebey@redhorse:~/Downloads$ mcs subthread-exit.cs
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting

^C
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting
meebey@redhorse:~/Downloads$ mono subthread-exit.exe
Thread running
Thread exiting
^C
meebey@redhorse:~/Downloads$
Comment 4 Neale Ferguson 2016-01-25 17:54:31 UTC
Hmm, this seemed to fix our problem: https://github.com/mono/mono/commit/23e34d77f2fd23ab5d61f6ba4aa905817f57668a

But that's from September and would pre-date 4.2.1?
Comment 5 Jo Shields 2016-01-25 18:19:08 UTC
It's in master from September, but isn't in any tagged release
Comment 6 Llewelyn Trahaearn 2016-01-29 23:38:51 UTC
I can confirm that this issue effects Smuxi on my 64-bit Arch Linux system based around an Intel Nehalem i7. The test application subthread-exit hangs after attempting to run it successively in loop under the current 4.2.2.30 release from the Arch repository.

Pulling the modified mono-context.h from https://github.com/mono/mono/commit/23e34d77f2fd23ab5d61f6ba4aa905817f57668a into a build of 4.2.2.30 resolves the issue. Smuxi exits as expected and the test application subthread-exit executed correctly over a thousand times before I finally interrupted the loop.

A copy of the PKGBUILD is available here for inspection: https://gist.github.com/WoefulDerelict/568619501509d7a75281
Comment 7 Mirco Bauer 2016-01-30 11:07:47 UTC
Thank you for testing the patch. So the fix should get cherry-picked for the other release branches.
Comment 8 Jo Shields 2016-02-01 11:09:39 UTC
This commit is already in the branch for the next major release

I don't know if we plan to make another minor release based on the current major - I need to wait for the east coast to wake up for a definitive answer. If yes, I'll ask for the required QA approval to cherry-pick to that branch. Meanwhile, I'd recommend to downstreams that they cherry-pick this fix at a distribution level to their 4.2.2 packages.
Comment 9 Neale Ferguson 2016-10-13 14:17:07 UTC
A similar problem exists with 4.8 that rears its head more often. From the person running the test suite:

- It happens every time but I have to run all the tests, if I just run the 1
suite of tests its hanging in then it will work . It's like something is
happening in the other tests that's building up to when it hits this point and
can not exit anymore.

- I did find the offending code in the Environment.Exit which was trying to do
threading shutdown and commented it out and everything worked ok but 
what are the consequences of not doing this cleanup

- I did notice too some fixme comments on this method mentioning there is some
hanging issues they are trying to figure out too.

Problem description - 

When running tests using mono-4.8, eventually exiting the nunitrunner will get stuck trying to shutdown. In past releases this problem would arise sporadically but is now happening consistently:

(gdb) thr apply all bt

Thread 13 (Thread 0x7f4b691f9700 (LWP 121322)):
#0  0x00007f4b6fd046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000072558f in mono_os_cond_wait (mutex=0xb21680 <lock>, cond=0xb21640 <work_cond>)
    at ../../mono/utils/mono-os-mutex.h:150
#2  thread_func (thread_data=0x0) at sgen-thread-pool.c:110
#3  0x00007f4b6fd00dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4b6f817ced in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f4b66ada700 (LWP 121323)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000005c69fc in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xb119e0 <finalizer_sem>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xb119e0 <finalizer_sem>)
    at ../../mono/utils/mono-coop-semaphore.h:40
#3  finalizer_thread (unused=unused@entry=0x0) at gc.c:905
#4  0x00000000005a3c6c in start_wrapper_internal (stack_ptr=0x7f4b66ad9de8, start_info=0x0) at threads.c:760
#5  start_wrapper (data=0x1dcd880) at threads.c:824
#6  0x000000000074d33b in inner_start_thread (data=0x1dcd8e0) at mono-threads.c:1181
#7  0x00007f4b6fd00dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f4b6f817ced in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f4b637ac700 (LWP 121324)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b4c000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b4c0008c0) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3b3c in mono_thread_interruption_checkpoint_request (
    bypass_abort_protection=bypass_abort_protection@entry=1) at threads.c:4544
#5  0x00000000005a498a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4569
---Type <return> to continue, or q <return> to quit---
#6  0x000000000047ec29 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#7  0x00000000004b6234 in mono_magic_trampoline (regs=<optimized out>, 
    code=0x416ae074 "L\213\360H\205\300u\336", <incomplete sequence \353\232>, arg=0x1e2f6b0, tramp=<optimized out>)
    at mini-trampolines.c:831
#8  0x0000000041df128c in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f4b635ab700 (LWP 121325)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b40000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b400008c0) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3701 in mono_wait_uninterrupted (numhandles=numhandles@entry=1, handles=handles@entry=0x7f4b635aa4a8, 
    waitall=waitall@entry=0, ms=ms@entry=9999, error=error@entry=0x7f4b635aa4b0, thread=<optimized out>) at threads.c:1592
#5  0x00000000005a5856 in ves_icall_System_Threading_WaitHandle_WaitOne_internal (handle=0x406, ms=9999) at threads.c:1722
#6  0x000000004042c338 in ?? ()
#7  0x0000000000000002 in ?? ()
#8  0x00007f4b501d5018 in ?? ()
#9  0x00007f4b501d4fe8 in ?? ()
#10 0x000000000000270f in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f4b3deff700 (LWP 121331)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b2c000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b2c0008c0) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3b3c in mono_thread_interruption_checkpoint_request (bypass_abort_protection=<optimized out>)
    at threads.c:4544
---Type <return> to continue, or q <return> to quit---
#5  0x00000000416dd81c in ?? ()
#6  0x00007f4b5cadae28 in ?? ()
#7  0x00007f4b5cadadb8 in ?? ()
#8  0x000000000000000d in ?? ()
#9  0x00007f4b3defe750 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f4b3dcfe700 (LWP 121362)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b200044b0)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b20004450) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a5343 in ves_icall_System_Threading_Thread_Sleep_internal (ms=10000) at threads.c:1218
#5  0x00000000405a2932 in ?? ()
#6  0x00007f4b58076ed8 in ?? ()
#7  0x00007f4b58076e68 in ?? ()
#8  0x00007f4b58076e68 in ?? ()
#9  0x00007f4b58076e68 in ?? ()
#10 0x0000000000002710 in ?? ()
#11 0x00007f4b2000d9a0 in ?? ()
#12 0x00007f4b3dcfd310 in ?? ()
#13 0x00007f4b3dcfd840 in ?? ()
#14 0x00007f4b3dcfd360 in ?? ()
#15 0x00000000405a2891 in ?? ()
#16 0x00007f4b58076f30 in ?? ()
#17 0x00007f4b58076f30 in ?? ()
#18 0x00007f4b3dcfd840 in ?? ()
#19 0x00000000408691aa in ?? ()
#20 0x0000000000000000 in ?? ()

---Type <return> to continue, or q <return> to quit---
Thread 7 (Thread 0x7f4b1bfff700 (LWP 121364)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b14000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b140008c0) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a5343 in ves_icall_System_Threading_Thread_Sleep_internal (ms=5000) at threads.c:1218
#5  0x00000000405a2932 in ?? ()
#6  0x00007f4b5b5cfb58 in ?? ()
#7  0x00007f4b5b5cfae8 in ?? ()
#8  0x00007f4b5b5cfae8 in ?? ()
#9  0x00007f4b5b5cfae8 in ?? ()
#10 0x0000000000001388 in ?? ()
#11 0x00007f4b14002200 in ?? ()
#12 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f4b1b3fb700 (LWP 121397)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4afc000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4afc0008c0) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a5343 in ves_icall_System_Threading_Thread_Sleep_internal (ms=3000) at threads.c:1218
#5  0x00000000405a2932 in ?? ()
#6  0x00007f4b5bf2dbd8 in ?? ()
#7  0x00007f4b1b3fa7e8 in ?? ()
#8  0x00007f4b5bf2dab0 in ?? ()
#9  0x00007f4b59e99868 in ?? ()
#10 0x0000000000000bb8 in ?? ()
#11 0x00007f4afc001d90 in ?? ()
#12 0x00007f4b1b3fa6d0 in ?? ()
---Type <return> to continue, or q <return> to quit---
#13 0x00007f4b1b3fa840 in ?? ()
#14 0x00007f4b1b3fa710 in ?? ()
#15 0x00000000405a2891 in ?? ()
#16 0x00007f4b59e99868 in ?? ()
#17 0x0000000040928c4c in ?? ()
#18 0x0000000040928c1c in ?? ()
#19 0x0000000040928a1a in ?? ()
#20 0x00007f4b5bf2dc30 in ?? ()
#21 0x0000000040429694 in ?? ()
#22 0x5f74656700656b6f in ?? ()
#23 0x00007f4b5bf2dab0 in ?? ()
#24 0x0000000002782c98 in ?? ()
#25 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f4b3f2d2700 (LWP 121944)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074ae82 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f4b0400caa0)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f4b0400ca40) at mono-threads.c:156
#3  0x00000000005a34ed in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3b3c in mono_thread_interruption_checkpoint_request (
    bypass_abort_protection=bypass_abort_protection@entry=1) at threads.c:4544
#5  0x00000000005a498a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4569
#6  0x000000000047ec29 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#7  0x00000000004b6234 in mono_magic_trampoline (regs=<optimized out>, 
    code=0x404baba0 "H\213\310D\211}\260H\211M\270H\205\300t\020H\213E\270H\213\370\350\004\200\223\001\306E\250", 
    arg=0x1e2f6b0, tramp=<optimized out>) at mini-trampolines.c:831
#8  0x0000000041df128c in ?? ()
#9  0x00007f4b3f2d160c in ?? ()
#10 0x00007f4b04009065 in ?? ()
#11 0x00000000404baba0 in ?? ()
---Type <return> to continue, or q <return> to quit---
#12 0x0000000000871a10 in ?? ()
#13 0x00007f4b3f2d16d0 in ?? ()
#14 0x00007f4b3f2d1588 in ?? ()
#15 0x00007f4b3f2d26c0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f4b1b7fd700 (LWP 121945)):
#0  0x00007f4b6f756972 in sigsuspend () from /lib64/libc.so.6
#1  0x000000000074e876 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x7f4b1b7fc040)
    at mono-threads-posix-signals.c:208
#2  <signal handler called>
#3  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#4  0x000000000041fc9a in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#5  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f4b00001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#6  mono_coop_sem_timedwait (sem=0x7f4b00001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#7  0x00000000005ca246 in mono_monitor_try_enter_inflated (id=72, allow_interruption=1, ms=4294967295, obj=0x7f4b3ca162a8)
    at monitor.c:886
#8  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f4b3ca162a8, ms=<optimized out>) at monitor.c:1310
#9  0x00000000404bab83 in ?? ()
#10 0x00007f4b00000008 in ?? ()
#11 0x00007f4b3ca162a8 in ?? ()
#12 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f4b1b5fc700 (LWP 121946)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fc9a in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
---Type <return> to continue, or q <return> to quit---
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f4b00001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f4b00001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005ca246 in mono_monitor_try_enter_inflated (id=78, allow_interruption=1, ms=4294967295, obj=0x7f4b3ca162a8)
    at monitor.c:886
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f4b3ca162a8, ms=<optimized out>) at monitor.c:1310
#6  0x00000000404bab83 in ?? ()
#7  0x00007f4b00000008 in ?? ()
#8  0x00007f4b3ca162a8 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f4b1b1fa700 (LWP 121948)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fc9a in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f4b00001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f4b00001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005ca246 in mono_monitor_try_enter_inflated (id=79, allow_interruption=1, ms=4294967295, obj=0x7f4b3ca162a8)
    at monitor.c:886
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f4b3ca162a8, ms=<optimized out>) at monitor.c:1310
#6  0x00000000404bab83 in ?? ()
#7  0x00007f4b00000008 in ?? ()
#8  0x00007f4b3ca162a8 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f4b70823780 (LWP 121321)):
#0  0x00007f4b6fd06790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074b87c in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (sem=0xb222e0 <suspend_semaphore>, flags=MONO_SEM_FLAGS_NONE, timeout_ms=4294967295)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_threads_wait_pending_operations () at mono-threads.c:241
#4  0x000000000074c55d in suspend_sync (interrupt_kernel=1, tid=139960560637696) at mono-threads.c:928
#5  suspend_sync_nolock (interrupt_kernel=1, id=139960560637696) at mono-threads.c:945
#6  mono_thread_info_safe_suspend_and_run (id=139960560637696, interrupt_kernel=1, 
    callback=callback@entry=0x59e5c0 <async_suspend_critical>, user_data=user_data@entry=0x7ffe116ac6d0)
    at mono-threads.c:988
#7  0x000000000059ee3c in async_suspend_internal (thread=0x7f4ab6d0bc30, interrupt=<optimized out>) at threads.c:4867
#8  0x00000000005a204c in mono_thread_suspend_all_other_threads () at threads.c:3385
#9  0x000000000054d8d6 in ves_icall_System_Environment_Exit (result=0) at icall.c:6772
#10 0x00000000416eabd2 in ?? ()
#11 0x00007f4b500007c0 in ?? ()
#12 0x00007f4b500c1620 in ?? ()
#13 0x00007f4b500c2420 in ?? ()
#14 0x0000000000000000 in ?? ()
Comment 10 Neale Ferguson 2016-10-13 17:29:33 UTC
The area in question is in Icall.c.ves_icall_System_environment_exit(). Note the big fixme note. I'm not sure which one it was but I commented out these 2 lines in the method: 

Mono_thread_suspend_app_other_thread(); 
Mono_runtime_quit(); 

With those commented out all our tests and integration tests all worked without a problem but Im not sure what kind of problems it has with .net. Regardless, something in there trying to suspend threads seems to be a problem.
Comment 11 Neale Ferguson 2016-10-21 15:56:26 UTC
The symptoms sometimes change. In this case there are no more messages after the last one listed here:

-------------- Start of Thread Suspension ---------------
2016-10-21 07:28:38 [113058] SUSPENDING tid 0x7fb448768700
2016-10-21 07:28:38 [113058] ------ BEGIN GLOBAL OP sp 0 rp 0 ap 0 wd 0 po 0 (sp + rp + ap == wd) (wd == po)
2016-10-21 07:28:38 [113058] [ASYNC_SUSPEND_REQUESTED][0x7fb448768700] RUNNING -> ASYNC_SUSPEND_REQUESTED (0 -> 1)
2016-10-21 07:28:38 [113058] sending signal 39 to 0x7fb42c0008c0[0x7fb448768700]
2016-10-21 07:28:38 [113058] added 0x7fb448768700 to pending suspend
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-COUNT] 1
2016-10-21 07:28:38 [113058] SIGNAL HANDLER FOR 0x7fb448768700 [(nil)]
2016-10-21 07:28:38 [113058] [FINISH_ASYNC_SUSPEND][0x7fb448768700] ASYNC_SUSPEND_REQUESTED -> ASYNC_SUSPENDED (1 -> 1)
2016-10-21 07:28:38 [113058] [INITIATOR-NOTIFY-SUSPEND] 0x7fb448768700
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-WAITING]
2016-10-21 07:28:38 [113058] Suspending 1 threads took 0 ms.
2016-10-21 07:28:38 [113058] interrupt prepare    tid 0x7fb448768700 token 0x7fb42c005a30
2016-10-21 07:28:38 [113058] [RESUME][0x7fb448768700] ASYNC_SUSPENDED -> RUNNING (1 -> 0)
2016-10-21 07:28:38 [113058] added 0x7fb448768700 to pending suspend
2016-10-21 07:28:38 [113058] sending signal 37 to 0x7fb42c0008c0[0x7fb448768700]
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-COUNT] 1
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-WAITING]
2016-10-21 07:28:38 [113058] [INITIATOR-NOTIFY-RESUME] 0x7fb448768700
2016-10-21 07:28:38 [113058] Suspending 1 threads took 0 ms.
2016-10-21 07:28:38 [113058] ------ END GLOBAL OP sp 1 rp 1 ap 0 wd 2 po 2
2016-10-21 07:28:38 [113058] interrupt finish     token 0x7fb42c005a30
2016-10-21 07:28:38 [113058] SUSPENDING tid 0x7fb448969700
2016-10-21 07:28:38 [113058] ------ BEGIN GLOBAL OP sp 1 rp 1 ap 0 wd 2 po 2 (sp + rp + ap == wd) (wd == po)
2016-10-21 07:28:38 [113058] [ASYNC_SUSPEND_REQUESTED][0x7fb448969700] RUNNING -> ASYNC_SUSPEND_REQUESTED (0 -> 1)
2016-10-21 07:28:38 [113058] sending signal 39 to 0x7fb4280008c0[0x7fb448969700]
2016-10-21 07:28:38 [113058] added 0x7fb448969700 to pending suspend
2016-10-21 07:28:38 [113058] interrupt uninstall  tid 0x7fb448768700 previous_token 0xffffffffffffffff interrupted TRUE
2016-10-21 07:28:38 [113058] interrupt clear self tid 0x7fb448768700 previous_token (nil)
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-COUNT] 1
2016-10-21 07:28:38 [113058] BEGIN SELF SUSPEND OF 0x7fb42c0008c0
2016-10-21 07:28:38 [113058] SIGNAL HANDLER FOR 0x7fb448969700 [(nil)]
2016-10-21 07:28:38 [113058] [SELF_SUSPEND_REQUEST][0x7fb448768700] RUNNING -> SELF_SUSPEND_REQUESTED (0 -> 1)
2016-10-21 07:28:38 [113058] FINISH SELF SUSPEND OF 0x7fb42c0008c0
2016-10-21 07:28:38 [113058] [STATE_POLL][0x7fb448768700] SELF_SUSPEND_REQUESTED -> SELF_SUSPENDED (1 -> 1)
2016-10-21 07:28:38 [113058] [FINISH_ASYNC_SUSPEND][0x7fb448969700] ASYNC_SUSPEND_REQUESTED -> ASYNC_SUSPENDED (1 -> 1)
2016-10-21 07:28:38 [113058] [INITIATOR-NOTIFY-SUSPEND] 0x7fb448969700
2016-10-21 07:28:38 [113058] **WAIT self-resume 0x7fb448768700
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-WAITING]
2016-10-21 07:28:38 [113058] Suspending 1 threads took 0 ms.
2016-10-21 07:28:38 [113058] interrupt prepare    tid 0x7fb448969700 token 0x7fb42800e900
2016-10-21 07:28:38 [113058] [RESUME][0x7fb448969700] ASYNC_SUSPENDED -> RUNNING (1 -> 0)
2016-10-21 07:28:38 [113058] added 0x7fb448969700 to pending suspend
2016-10-21 07:28:38 [113058] sending signal 37 to 0x7fb4280008c0[0x7fb448969700]
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-COUNT] 1
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-WAITING]
2016-10-21 07:28:38 [113058] [INITIATOR-NOTIFY-RESUME] 0x7fb448969700
2016-10-21 07:28:38 [113058] Suspending 1 threads took 0 ms.
2016-10-21 07:28:38 [113058] interrupt uninstall  tid 0x7fb448969700 previous_token 0xffffffffffffffff interrupted TRUE
2016-10-21 07:28:38 [113058] ------ END GLOBAL OP sp 2 rp 2 ap 0 wd 4 po 4
2016-10-21 07:28:38 [113058] interrupt finish     token 0x7fb42800e900
2016-10-21 07:28:38 [113058] ------ BEGIN GLOBAL OP sp 2 rp 2 ap 0 wd 4 po 4 (sp + rp + ap == wd) (wd == po)
2016-10-21 07:28:38 [113058] sending signal 39 to 0x7fb4280008c0[0x7fb448969700]
2016-10-21 07:28:38 [113058] added 0x7fb448969700 to pending suspend
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-COUNT] 1
2016-10-21 07:28:38 [113058] [INITIATOR-WAIT-WAITING]

The traceback looks like this:

(gdb) thread apply all bt

Thread 13 (Thread 0x7f18f68ac700 (LWP 8570)):
#0  0x00007f18fd3b76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000072564f in mono_os_cond_wait (mutex=0xb22680 <lock>, cond=0xb22640 <work_cond>)
    at ../../mono/utils/mono-os-mutex.h:150
#2  thread_func (thread_data=0x0) at sgen-thread-pool.c:110
#3  0x00007f18fd3b3dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f18fcecaced in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f18f418d700 (LWP 8571)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000005c6aac in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xb129e0 <finalizer_sem>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xb129e0 <finalizer_sem>)
    at ../../mono/utils/mono-coop-semaphore.h:40
#3  finalizer_thread (unused=unused@entry=0x0) at gc.c:905
#4  0x00000000005a3d1c in start_wrapper_internal (stack_ptr=0x7f18f418cde8, start_info=0x0) at threads.c:760
#5  start_wrapper (data=0x1f21890) at threads.c:824
#6  0x000000000074dedb in inner_start_thread (data=0x1f218f0) at mono-threads.c:1181
#7  0x00007f18fd3b3dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f18fcecaced in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f18f0e55700 (LWP 8572)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18dc000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18dc0008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3bec in mono_thread_interruption_checkpoint_request (
    bypass_abort_protection=bypass_abort_protection@entry=1) at threads.c:4544
#5  0x00000000005a4a3a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4569
#6  0x000000000047ecc9 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#7  0x00000000004b62c4 in mono_magic_trampoline (regs=<optimized out>,
    code=0x41e21f84 "L\213\360H\205\300u\336", <incomplete sequence \353\232>, arg=0x1f836c0, tramp=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at mini-trampolines.c:831
#8  0x000000004063028c in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f18f0c54700 (LWP 8573)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18d0000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18d00008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a37b1 in mono_wait_uninterrupted (numhandles=numhandles@entry=1, handles=handles@entry=0x7f18f0c534a8,
    waitall=waitall@entry=0, ms=ms@entry=4184, error=error@entry=0x7f18f0c534b0, thread=<optimized out>)
    at threads.c:1592
#5  0x00000000005a5906 in ves_icall_System_Threading_WaitHandle_WaitOne_internal (handle=0x406, ms=4184)
    at threads.c:1722
#6  0x0000000040f20338 in ?? ()
#7  0x0000000000000002 in ?? ()
#8  0x00007f18e01d52d0 in ?? ()
#9  0x00007f18e01d52a0 in ?? ()
#10 0x0000000000001058 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f18c35ff700 (LWP 8579)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18b0000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18b00008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3bec in mono_thread_interruption_checkpoint_request (bypass_abort_protection=<optimized out>)
    at threads.c:4544
#5  0x0000000041e5181c in ?? ()
#6  0x00007f18ecad9cf8 in ?? ()
#7  0x00007f18ecad9c88 in ?? ()
#8  0x000000000000000d in ?? ()
#9  0x00007f18c35fe750 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f18c33fe700 (LWP 8611)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18b40044b0)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18b4004450) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a53f3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=10000) at threads.c:1218
#5  0x0000000041096932 in ?? ()
#6  0x00007f18e8077500 in ?? ()
#7  0x00007f18e8077490 in ?? ()
#8  0x00007f18e8077490 in ?? ()
#9  0x00007f18e8077490 in ?? ()
#10 0x0000000000002710 in ?? ()
#11 0x00007f18b400cc80 in ?? ()
#12 0x00007f18c33fd310 in ?? ()
#13 0x00007f18c33fd840 in ?? ()
#14 0x00007f18c33fd360 in ?? ()
#15 0x0000000041096891 in ?? ()
#16 0x00007f18e8077558 in ?? ()
#17 0x00007f18e8077558 in ?? ()
#18 0x00007f18c33fd840 in ?? ()
#19 0x000000004135c95a in ?? ()
#20 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f18a96ff700 (LWP 8613)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18a4000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18a40008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a53f3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=5000) at threads.c:1218
#5  0x0000000041096932 in ?? ()
#6  0x00007f18eb5cfa00 in ?? ()
#7  0x00007f18eb5cf990 in ?? ()
#8  0x00007f18eb5cf990 in ?? ()
#9  0x00007f18eb5cf990 in ?? ()
#10 0x0000000000001388 in ?? ()
#11 0x00007f18a4002200 in ?? ()
#12 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f18a8afb700 (LWP 8654)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f188c000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f188c0008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a53f3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=3000) at threads.c:1218
#5  0x0000000041096932 in ?? ()
#6  0x00007f18ebf2c740 in ?? ()
#7  0x00007f18a8afa7e8 in ?? ()
#8  0x00007f18ebf2c618 in ?? ()
#9  0x00007f18e1e097a8 in ?? ()
#10 0x0000000000000bb8 in ?? ()
#11 0x00007f188c001d90 in ?? ()
#12 0x00007f18a8afa6d0 in ?? ()
#13 0x00007f18a8afa840 in ?? ()
#14 0x00007f18a8afa710 in ?? ()
#15 0x0000000041096891 in ?? ()
#16 0x00007f18e1e097a8 in ?? ()
#17 0x000000004141861c in ?? ()
#18 0x00000000414185ec in ?? ()
#19 0x00000000414183ea in ?? ()
#20 0x00007f18ebf2c798 in ?? ()
#21 0x0000000040f1d694 in ?? ()
#22 0x206f6e2700656b6f in ?? ()
#23 0x00007f18ebf2c618 in ?? ()
#24 0x00000000028d2a58 in ?? ()
#25 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f18a8cfc700 (LWP 9000)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074afc2 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x7f18c4000920)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_thread_info_wait_for_resume (info=0x7f18c40008c0) at mono-threads.c:156
#3  0x00000000005a359d in mono_thread_execute_interruption () at threads.c:4424
#4  0x00000000005a3bec in mono_thread_interruption_checkpoint_request (
    bypass_abort_protection=bypass_abort_protection@entry=1) at threads.c:4544
#5  0x00000000005a4a3a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4569
#6  0x000000000047ecc9 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#7  0x00000000004b62c4 in mono_magic_trampoline (regs=<optimized out>,
    code=0x40faeba0 "H\213\310D\211}\260H\211M\270H\205\300t\020H\213E\270H\213\370\350\004\060h\377\306E\250",
    arg=0x1f836c0, tramp=<optimized out>) at mini-trampolines.c:831
#8  0x000000004063028c in ?? ()
#9  0x00007f18a8cfb60c in ?? ()
#10 0x00007f18c4001d95 in ?? ()
#11 0x0000000040faeba0 in ?? ()
#12 0x0000000000872a10 in ?? ()
#13 0x00007f18a8cfb6d0 in ?? ()
#14 0x00007f18a8cfb588 in ?? ()
#15 0x00007f18a8cfc6c0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f18a88fa700 (LWP 9002)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fcda in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1890001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f1890001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005ca2f6 in mono_monitor_try_enter_inflated (id=84, allow_interruption=1, ms=4294967295,
    obj=0x7f18c240ecf0) at monitor.c:886
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f18c240ecf0, ms=<optimized out>) at monitor.c:1310
#6  0x0000000040faeb83 in ?? ()
#7  0x00007f1800000008 in ?? ()
#8  0x00007f18c240ecf0 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f18a90fe700 (LWP 9003)):
#0  0x00007f18fce09972 in sigsuspend () from /lib64/libc.so.6
#1  0x000000000074f416 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x7f18a90fd040)
    at mono-threads-posix-signals.c:208
#2  <signal handler called>
#3  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#4  0x000000000041fcda in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#5  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1890001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#6  mono_coop_sem_timedwait (sem=0x7f1890001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#7  0x00000000005ca2f6 in mono_monitor_try_enter_inflated (id=85, allow_interruption=1, ms=4294967295,
    obj=0x7f18c240ecf0) at monitor.c:886
#8  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f18c240ecf0, ms=<optimized out>) at monitor.c:1310
#9  0x0000000040faeb83 in ?? ()
#10 0x00007f1800000008 in ?? ()
#11 0x00007f18c240ecf0 in ?? ()
#12 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f18a8efd700 (LWP 9005)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fcda in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1890001dc0)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f1890001dc0, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005ca2f6 in mono_monitor_try_enter_inflated (id=86, allow_interruption=1, ms=4294967295,
    obj=0x7f18c240ecf0) at monitor.c:886
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f18c240ecf0, ms=<optimized out>) at monitor.c:1310
#6  0x0000000040faeb83 in ?? ()
#7  0x00007f1800000008 in ?? ()
#8  0x00007f18c240ecf0 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f18fded6780 (LWP 8569)):
#0  0x00007f18fd3b9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000074c33c in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (sem=0xb232e0 <suspend_semaphore>, flags=MONO_SEM_FLAGS_NONE, timeout_ms=4294967295)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_threads_wait_pending_operations () at mono-threads.c:241
#4  0x000000000074d0fd in suspend_sync (interrupt_kernel=1, tid=139743892334336) at mono-threads.c:928
#5  suspend_sync_nolock (interrupt_kernel=1, id=139743892334336) at mono-threads.c:945
#6  mono_thread_info_safe_suspend_and_run (id=139743892334336, interrupt_kernel=1,
    callback=callback@entry=0x59e670 <async_suspend_critical>, user_data=user_data@entry=0x7ffcfc2a8b70)
    at mono-threads.c:988
#7  0x000000000059eeec in async_suspend_internal (thread=0x7f1865bed430, interrupt=<optimized out>) at threads.c:4867
#8  0x00000000005a20fc in mono_thread_suspend_all_other_threads () at threads.c:3385
#9  0x000000000054d97c in ves_icall_System_Environment_Exit (result=0) at icall.c:6774
#10 0x0000000041e5ebd2 in ?? ()
#11 0x00007f18e00007c0 in ?? ()
#12 0x00007f18e00c1620 in ?? ()
#13 0x00007f18e00c2420 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb)
Comment 12 Neale Ferguson 2016-10-26 17:27:08 UTC
Created attachment 18236 [details]
MonoThreadInfo dump and Thread Backtrace

When the hang was detected we used gdb to examine the threads. Attached are the MonoThreadInfo contents for all of the threads in the thread_list. At the bottom of that output is the output of gdb's thr apply all bt command. I am not sure how easy it would be to correlate the two lots of output.
Comment 13 Neale Ferguson 2016-11-15 22:51:52 UTC
After applying the recent sets of patches related to thread suspend/resume, the problem has not gone away. It's interesting with the patch it patched mono-threads-posix.c  in mono_threads_suspend_begin_async_resume but it seems that it was hanging in mono_threads_suspend_begin_async_suspend doing mono_threads_pthread_kill  so we are in the area but not in the resume but rather in the suspend. Here is the full stack:
 
(gdb) thread apply all bt
 
Thread 13 (Thread 0x7f15b93dc700 (LWP 12174)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000006372bf in mono_os_cond_wait (mutex=0x9c5c60 <lock>, cond=0x9c5c20 <work_cond>) at ../../mono/utils/mono-os-mutex.h:146
#2  thread_func (thread_data=0x7f15c0a1f008) at sgen-thread-pool.c:110
#3  0x00007f15bfee3dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f15bf9faced in clone () from /lib64/libc.so.6
 
Thread 12 (Thread 0x7f15b6c25700 (LWP 12175)):
#0  0x00007f15bfee9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000005cfe7c in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0x9b6900 <finalizer_sem>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0x9b6900 <finalizer_sem>) at ../../mono/utils/mono-coop-semaphore.h:40
#3  finalizer_thread (unused=unused@entry=0x0) at gc.c:907
#4  0x00000000005aa996 in start_wrapper_internal (stack_ptr=0x7f15b6c24de8, start_info=0x0) at threads.c:851
#5  start_wrapper (data=0xd16570) at threads.c:902
#6  0x0000000000658d3b in inner_start_thread (data=0xd165d0) at mono-threads.c:1133
#7  0x00007f15bfee3dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f15bf9faced in clone () from /lib64/libc.so.6
 
Thread 11 (Thread 0x7f15b38e0700 (LWP 12177)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f15b38df320) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f15b38df398, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x115dac0, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005aa50c in mono_thread_interruption_checkpoint_request (bypass_abort_protection=bypass_abort_protection@entry=1)
    at threads.c:4440
#7  0x00000000005ab81a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4465
#8  0x000000000047fe39 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#9  0x00000000004b76d4 in mono_magic_trampoline (regs=<optimized out>,
    code=0x41a25404 "L\213\360H\205\300u\336", <incomplete sequence \353\232>, arg=0xda1cf8, tramp=<optimized out>)
    at mini-trampolines.c:831
#10 0x0000000041c9b38c in ?? ()
#11 0x0000000000000000 in ?? ()
 
Thread 10 (Thread 0x7f15b36df700 (LWP 12178)):
---Type <return> to continue, or q <return> to quit---   
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f15b36de320) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f15b36de398, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x145f1f0, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005a9f21 in mono_wait_uninterrupted (numhandles=numhandles@entry=1, handles=handles@entry=0x7f15b36de4a8,
    waitall=waitall@entry=0, ms=ms@entry=9660, error=error@entry=0x7f15b36de4b0, thread=<optimized out>) at threads.c:1690
#7  0x00000000005ac856 in ves_icall_System_Threading_WaitHandle_WaitOne_internal (handle=0x402, ms=9660) at threads.c:1810
#8  0x0000000041eb79d8 in ?? ()
#9  0x0000000000000002 in ?? ()
#10 0x00007f15a01d8b98 in ?? ()
#11 0x00007f15a01d8b68 in ?? ()
#12 0x00000000000025bc in ?? ()
#13 0x0000000000000000 in ?? ()
 
Thread 9 (Thread 0x7f156bfff700 (LWP 12192)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f156bffe500) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f156bffe578, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x30c9830, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005aa50c in mono_thread_interruption_checkpoint_request (bypass_abort_protection=<optimized out>) at threads.c:4440
#7  0x0000000041a538ec in ?? ()
#8  0x00007f15aff53db0 in ?? ()
#9  0x00007f15aff53d40 in ?? ()
#10 0x000000000000000d in ?? ()
#11 0x00007f156bffe740 in ?? ()
#12 0x0000000000000000 in ?? ()
 
Thread 8 (Thread 0x7f156bdfe700 (LWP 12223)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f156bdfd230) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f156bdfd2a8, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x3d68900, timeout=timeout@entry=4294967295) at os-event-unix.c:93
---Type <return> to continue, or q <return> to quit---
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005ac3a3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=10000) at threads.c:1318
#7  0x000000004203ffd2 in ?? ()
#8  0x00007f15ab606780 in ?? ()
#9  0x00007f15ab606710 in ?? ()
#10 0x00007f15ab606710 in ?? ()
#11 0x00007f15ab606710 in ?? ()
#12 0x0000000000002710 in ?? ()
#13 0x00007f155c00c580 in ?? ()
#14 0x00007f156bdfd320 in ?? ()
#15 0x00007f156bdfd840 in ?? ()
#16 0x00007f156bdfd370 in ?? ()
#17 0x000000004203ff31 in ?? ()
#18 0x00007f15ab6067d0 in ?? ()
#19 0x00007f15ab6067d0 in ?? ()
#20 0x00007f156bdfd840 in ?? ()
#21 0x00000000422ea1ee in ?? ()
#22 0x00007f156bdfd5b0 in ?? ()
#23 0x00000000005d7238 in mono_class_vtable (domain=<optimized out>, klass=<optimized out>) at object.c:1765
#24 0x00007f15c09a6708 in ?? ()
#25 0x00000000011f85e0 in ?? ()
#26 0x00000000422e9eb0 in ?? ()
#27 0x00000000011f85e0 in ?? ()
#28 0x0000000000000000 in ?? ()
 
Thread 7 (Thread 0x7f15586ff700 (LWP 12225)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f15586fe570) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f15586fe5e8, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x445bb60, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005ac3a3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=5000) at threads.c:1318
#7  0x000000004203ffd2 in ?? ()
#8  0x00007f15aeab0a60 in ?? ()
#9  0x00007f15aeab09f0 in ?? ()
#10 0x00007f15aeab09f0 in ?? ()
---Type <return> to continue, or q <return> to quit---
#11 0x00007f15aeab09f0 in ?? ()
#12 0x0000000000001388 in ?? ()
#13 0x00007f1554001dd0 in ?? ()
#14 0x0000000000000000 in ?? ()
 
Thread 6 (Thread 0x7f15537fb700 (LWP 12259)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f15537fa5d0) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f15537fa648, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x7f153800dca0, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005ac3a3 in ves_icall_System_Threading_Thread_Sleep_internal (ms=3000) at threads.c:1318
#7  0x000000004203ffd2 in ?? ()
#8  0x00007f15af41e9f8 in ?? ()
#9  0x00007f15537fa7e8 in ?? ()
#10 0x00007f15af41e8d0 in ?? ()
#11 0x00007f151b382a20 in ?? ()
#12 0x0000000000000bb8 in ?? ()
#13 0x00007f153c001d90 in ?? ()
#14 0x00007f15537fa6d0 in ?? ()
#15 0x00007f15537fa840 in ?? ()
#16 0x00007f15537fa710 in ?? ()
#17 0x000000004203ff31 in ?? ()
#18 0x00007f151b382a20 in ?? ()
#19 0x00000000423a8abb in ?? ()
#20 0x00000000423a8a8b in ?? ()
#21 0x00000000423a885a in ?? ()
#22 0x00007f15af41ea48 in ?? ()
#23 0x0000000041eb4f44 in ?? ()
#24 0x00ff000000000000 in ?? ()
#25 0x00007f15af41e8d0 in ?? ()
#26 0x00000000016ebb88 in ?? ()
#27 0x0000000000000000 in ?? ()
 
Thread 5 (Thread 0x7f15539fc700 (LWP 13541)):
#0  0x00007f15bf939972 in sigsuspend () from /lib64/libc.so.6
#1  0x000000000065a8a6 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x7f15539fb040)
---Type <return> to continue, or q <return> to quit---
    at mono-threads-posix-signals.c:164
#2  <signal handler called>
#3  0x00007f15bfee9790 in sem_wait () from /lib64/libpthread.so.0
#4  0x000000000041fa01 in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>) at ../../mono/utils/mono-os-semaphore.h:209
#5  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1548009e10)
    at ../../mono/utils/mono-os-semaphore.h:242
#6  mono_coop_sem_timedwait (sem=0x7f1548009e10, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#7  0x00000000005d38d0 in mono_monitor_try_enter_inflated (id=79, allow_interruption=1, ms=4294967295, obj=0x7f15697bb4d0) at monitor.c:905
#8  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f15697bb4d0, ms=<optimized out>) at monitor.c:1358
#9  0x0000000041f3dce3 in ?? ()
#10 0x00007f1500000008 in ?? ()
#11 0x00007f15697bb4d0 in ?? ()
#12 0x0000000000000000 in ?? ()
 
Thread 4 (Thread 0x7f15535fa700 (LWP 13542)):
#0  0x00007f15bfee76d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000646d7f in mono_os_cond_wait (mutex=0x9c5dc0 <signal_mutex>, cond=0x7f15535f9360) at mono-os-mutex.h:146
#2  mono_os_event_wait_multiple (events=events@entry=0x7f15535f93d8, nevents=nevents@entry=1, waitall=waitall@entry=1,
    timeout=timeout@entry=4294967295) at os-event-unix.c:186
#3  0x000000000064704f in mono_os_event_wait_one (event=event@entry=0x7f1564003f70, timeout=timeout@entry=4294967295) at os-event-unix.c:93
#4  0x00000000005a571f in self_suspend_internal () at threads.c:4837
#5  0x00000000005a9d0d in mono_thread_execute_interruption () at threads.c:4320
#6  0x00000000005aa50c in mono_thread_interruption_checkpoint_request (bypass_abort_protection=bypass_abort_protection@entry=1)
    at threads.c:4440
#7  0x00000000005ab81a in mono_thread_force_interruption_checkpoint_noraise () at threads.c:4465
#8  0x000000000047fe39 in mono_interruption_checkpoint_from_trampoline () at jit-icalls.c:1909
#9  0x00000000004b76d4 in mono_magic_trampoline (regs=<optimized out>,
    code=0x41f3dd00 "H\213\310D\211}\260H\211M\270H\205\300t\020H\213E\270H\213\370\350\244\357\325\377\306E\250", arg=0xda1cf8,
    tramp=<optimized out>) at mini-trampolines.c:831
#10 0x0000000041c9b38c in ?? ()
#11 0x00007f15535f9608 in ?? ()
#12 0x00007f154002f275 in ?? ()
#13 0x0000000041f3dd00 in ?? ()
#14 0x0000000000708ab0 in ?? ()
#15 0x00007f15535f96d0 in ?? ()
#16 0x00007f15535f9588 in ?? ()
#17 0x00007f15535fa6c0 in ?? ()
#18 0x0000000000000002 in ?? ()
---Type <return> to continue, or q <return> to quit---
#19 0x0000000000000000 in ?? ()
 
Thread 3 (Thread 0x7f1553dfe700 (LWP 13543)):
#0  0x00007f15bfee9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fa01 in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>) at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1548009e10)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f1548009e10, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005d38d0 in mono_monitor_try_enter_inflated (id=83, allow_interruption=1, ms=4294967295, obj=0x7f15697bb4d0) at monitor.c:905
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f15697bb4d0, ms=<optimized out>) at monitor.c:1358
#6  0x0000000041f3dce3 in ?? ()
#7  0x00007f1500000008 in ?? ()
#8  0x00007f15697bb4d0 in ?? ()
#9  0x0000000000000000 in ?? ()
 
Thread 2 (Thread 0x7f1553fff700 (LWP 13545)):
#0  0x00007f15bfee9790 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000041fa01 in mono_os_sem_wait (flags=<optimized out>, sem=<optimized out>) at ../../mono/utils/mono-os-semaphore.h:209
#2  mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=4294967295, sem=0x7f1548009e10)
    at ../../mono/utils/mono-os-semaphore.h:242
#3  mono_coop_sem_timedwait (sem=0x7f1548009e10, timeout_ms=timeout_ms@entry=4294967295, flags=MONO_SEM_FLAGS_ALERTABLE)
    at ../../mono/utils/mono-coop-semaphore.h:54
#4  0x00000000005d38d0 in mono_monitor_try_enter_inflated (id=84, allow_interruption=1, ms=4294967295, obj=0x7f15697bb4d0) at monitor.c:905
#5  ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x7f15697bb4d0, ms=<optimized out>) at monitor.c:1358
#6  0x0000000041f3dce3 in ?? ()
#7  0x00007f1500000008 in ?? ()
#8  0x00007f15697bb4d0 in ?? ()
#9  0x0000000000000000 in ?? ()
 
Thread 1 (Thread 0x7f15c0a06780 (LWP 12173)):
#0  0x00007f15bfee8741 in pthread_kill () from /lib64/libpthread.so.0
#1  0x000000000065a6c8 in mono_threads_pthread_kill (signum=<optimized out>, info=0x7f15380512f0) at mono-threads-posix.c:180
#2  mono_threads_suspend_begin_async_suspend (info=0x7f15380512f0, interrupt_kernel=<optimized out>) at mono-threads-posix.c:267
#3  0x000000000065876b in suspend_sync (interrupt_kernel=1, tid=139729574020864) at mono-threads.c:875
#4  suspend_sync_nolock (interrupt_kernel=1, id=139729574020864) at mono-threads.c:907
#5  mono_thread_info_safe_suspend_and_run (id=139729574020864, interrupt_kernel=1,
    callback=callback@entry=0x5a4da0 <async_suspend_critical>, user_data=user_data@entry=0x7ffdcc2404b0) at mono-threads.c:950
#6  0x00000000005a5656 in async_suspend_internal (thread=0x7f1521eeacc0, interrupt=<optimized out>) at threads.c:4808
---Type <return> to continue, or q <return> to quit---
#7  0x00000000005a8896 in mono_thread_suspend_all_other_threads () at threads.c:3281
#8  0x0000000000554ca6 in ves_icall_System_Environment_Exit (result=0) at icall.c:6706
#9  0x0000000041a611a2 in ?? ()
#10 0x00007f15a00007c0 in ?? ()
#11 0x00007f15a00c23f8 in ?? ()
#12 0x00007f15a00c31f0 in ?? ()
#13 0x0000000000000000 in ?? ()
 
2.       For the Reflection.Emit crashing  in the Castle Dynamic proxy , its still hanging in the same spot and the lost log I gave you is the same. Its interesting that the type that  dynamic proxy is bad. Im wondering if some breaking changes to Reflection is tripping up the Dynamic proxy Generator which in turn is giving something bad to the TypeBuilder. Maybe like what we found where Reflection wasn’t giving all the field information related to events. For me these are breaking changes since its changing original functionality  but I was wondering if its possible to find all breaking changes like this since it might help track down this problem?   this was the error anyhow:
This was the error message you wanted me to print:
 
klass: TypeBuilderInstantiation error: Cannot bind generic parameters
>of a non-generic type assembly:DynamicProxyGenAssembly2
>type:InvocationGetService_74 member:<none>* Assertion at sre.c:1557,
>condition `res' not met
 
This was the original stack :
 
Assertion at sre.c:1553, condition `res' not met
Stacktrace:
 
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Reflection.Emit.TypeBuilder.create_runtime_class (System.Reflection.Emit.TypeBuilder) <IL 0x00013, 0x0005a>
  at System.Reflection.Emit.TypeBuilder.CreateTypeInfo () [0x0039c] in /build/monoTrunk/mcs/class/corlib/System.Reflection.Emit/TypeBuilder.cs:818
  at System.Reflection.Emit.TypeBuilder.CreateType () [0x00000] in /build/monoTrunk/mcs/class/corlib/System.Reflection.Emit/TypeBuilder.cs:743
  at Castle.DynamicProxy.Generators.Emitters.AbstractTypeEmitter.BuildType () <IL 0x0000e, 0x00049>
  at Castle.DynamicProxy.Generators.InterfaceProxyWithTargetGenerator.GenerateCode (System.Type,System.Type[],Castle.DynamicProxy.ProxyGenerationOptions) <IL 0x007bf, 0x01c9f>
  at Castle.DynamicProxy.DefaultProxyBuilder.CreateInterfaceProxyTypeWithoutTarget (System.Type,System.Type[],Castle.DynamicProxy.ProxyGenerationOptions) <IL 0x0002b, 0x0008f>
  at Castle.DynamicProxy.ProxyGenerator.CreateInterfaceProxyTypeWithoutTarget (System.Type,System.Type[],Castle.DynamicProxy.ProxyGenerationOptions) <IL 0x0000a, 0x0003e>
  at Castle.DynamicProxy.ProxyGenerator.CreateInterfaceProxyWithoutTarget (System.Type,System.Type[],Castle.DynamicProxy.ProxyGenerationOptions,Castle.Core.Interceptor.IInterceptor[]) <IL 0x00072, 0x000d7>
  at Castle.DynamicProxy.ProxyGenerator.CreateInterfaceProxyWithoutTarget (System.Type,System.Type[],Castle.Core.Interceptor.IInterceptor[]) <IL 0x0000a, 0x00037>
  at Rhino.Mocks.MockRepository.MockInterface (Rhino.Mocks.MockRepository/CreateMockState,System.Type,System.Type[]) <IL 0x00069, 0x0025f>
  at Rhino.Mocks.MockRepository.CreateMockObject (System.Type,Rhino.Mocks.MockRepository/CreateMockState,System.Type[],object[]) <IL 0x00081, 0x00107>
  at Rhino.Mocks.MockRepository.CreateMock<T_REF> (object[]) <IL 0x00060, 0x001bf>
  at Drms.Server.ClientFacing.Services.Tests.CaseBrowsingServiceTests.TestGetScenarioManagersForCurrentStudy () [0x00006] in /build/COFLOW/Src/Server/ClientFacing/Services.Tests/CaseBrowsingServiceTests.cs:403
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <IL 0x0004f, 0x000b8>
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Reflection.MonoMethod.InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&) <IL 0x00016, 0x0006c>
  at System.Reflection.MonoMethod.Invoke (object,System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) [0x00038] in /build/monoTrunk/mcs/class/cor
Comment 14 Neale Ferguson 2016-11-24 10:43:01 UTC
Just another observation about the finalizing,  I noticed this:
 
/* Try to abort the thread, in the hope that it is running managed code */
                                                                mono_thread_internal_stop (gc_thread);
 
during the finalize there is a very good chance it's in non managed code since during the shutdown finalize we do an interop call into c++ to do some memory cleanup and shutdown. Not sure if that also could have some relationship but Ill throw it out there anyhow.
Comment 15 Neale Ferguson 2016-11-28 01:12:59 UTC
In gc.c at line 816 its setting the amount of time it waits for the finalizer thread to finish to 2 seconds because as they said Microsoft seems to do that. If I bump it to 20 seconds we no longer get the finalizer crashing so it looks like bumping it up is causing the finalizers to finish properly. The question is first if after the 2 seconds  if the thread isn’t done then it will try and abort the thread which doesn’t seem to cause it to abort.  Given that is there any way to more forcefully kill the thread.  The second question is if there is any harm to increasing the time to wait for it to shutdown and if not could this be increased or configurable given if we know that for our finalizers we will need more time to exit  and no form of aborting will kill it so we need to give it the time it needs to finish?
Comment 16 Zoltan Varga 2016-11-30 21:47:28 UTC
There is no configuration variable currently to do that, finalizers are expected to finish quickly, they shouldn't take more than 2 seconds to run.
Comment 17 Neale Ferguson 2016-12-01 15:42:30 UTC
Is this 2 seconds for all finalizers or 2 seconds per finalizer? This application has around 6 million of them. It appears to need about 6 seconds to run them all. Under .NET it works fine, post mono 4.4 we crash. Similarly, the comment that "/* Try to abort the thread, in the hope that it is running managed code */" may be overly optimistic.

Also, some additional findings:

In mono_gc_cleanup where we were causing the assert regaurding the finalizer thread not exiting Im a little confused by the logic. It will first wait 2 seconds for the finalizer to exit because as they said that’s what Microsoft does. If it hasn’t finished then it will do this:
 
/* Wait for it to stop */
ret = guarded_wait (gc_thread->handle, 100, TRUE);
 
We wait for the period then if this return comes back with a timeout it will hit this code:
 
if (ret == WAIT_TIMEOUT) {
printf("guarded wait timeout failed!!");                               
/*
 * The finalizer thread refused to die. There is not much we
 * can do here, since the runtime is shutting down so the
 * state the finalizer thread depends on will vanish.
*/
g_warning ("Shutting down finalizer thread timed out.");
timed_out = TRUE;
}
if (!timed_out) {
int ret;
                printf("infinate wait for exit.");                 
                /* Wait for the thread to actually exit */
                ret = guarded_wait (gc_thread->handle, INFINITE, TRUE);
                g_assert (ret == WAIT_OBJECT_0);
                mono_thread_join (GUINT_TO_POINTER (gc_thread->tid));
}
 
From what I gather if there is NOT a timeout then  it will do another guarded wait indefinitely. My question would be wouldn’t it be if there was a timeout from the previous wait then what again indefinitely?  In my test that was failing and removed the NOT and my error went away .
 
Actually in my case I found it was taking a total of 6 seconds to finish the finalizers which was more than the 2 seconds it was waiting. It's interesting it was exiting finalizers for WeakReference objects and lots of them, probably around 100,000 of them. I'm still not sure why its taking so long in these finalizers and what its doing but that’s where it is anyhow.
 
Also, this is all coming from our current version of mono of 4.4.2, I was looking in the trunk and it's quite a bit different so I'm wondering if someone noticed strange things and changed it. I noticed now its looking for a return of “MONO_THREAD_INFO_WAIT_RET_TIMEOUT”. Is there something fishy about this code in mono 4.4.2?
Comment 18 Zoltan Varga 2016-12-01 21:32:19 UTC
The app really shouldn't have 6m finalizable objects.
About the 4.4 vs master, there were lots of changes, the master code should be better. The 4.4 code didn't actually abort the finalizer thread, it called mono_thread_stop () which does something else. So I'm not sure how that actually worked.
Comment 19 Neale Ferguson 2016-12-05 17:28:50 UTC
I am not sure I am reading the code correctly. Is the 2 seconds for ALL finalizers or EACH finalizer? .NET appears to allow 2 seconds per finalizer to a maximum of 40 seconds for all finalizers.
Comment 20 Zoltan Varga 2016-12-05 18:09:37 UTC
Those values were correct when the code was written I think, which was a long time ago, maybe .net changed them since.
Comment 21 Neale Ferguson 2016-12-06 16:36:44 UTC
So, to confirm. The 2 seconds mono uses is the time for all finalizers not each finalizer?
Comment 22 Zoltan Varga 2016-12-06 23:18:16 UTC
Yes
Comment 23 Neale Ferguson 2016-12-06 23:29:16 UTC
I am trying to look for the canonical reference for the 2 second / 40 second specification, but there are plenty of references in stackoverflow, MSDN fora, Microsoft blogs, programming tutorials, and books such as ".NET Essential Basics". I will look at the code to see how hard it would be to enforce a 2 second per finalizer and 40 seconds overall. In the interim, would just increasing the existing 2 seconds to 40 be viable?
Comment 24 Zoltan Varga 2016-12-06 23:39:46 UTC
Yes, it would probably be okay if MS does it.
Comment 25 Neale Ferguson 2016-12-07 17:26:02 UTC
Changing the 2 to 40 is easy. That will cover the ALL finalizer case. To implement the 2 second per finalizer appears a little more complex. I think it needs to be implemented in mono_gc_run_finalize() and, in particular, ensuring whatever is backing runtime_invoke() in this sequence:

        if (log_finalizers)
                g_log ("mono-gc-finalizers", G_LOG_LEVEL_MESSAGE, "<%s at %p> Calling finalizer.", o->vtable->klass->name, o);

        mono_profiler_gc_finalize_object_begin (o);

        runtime_invoke (o, NULL, &exc, NULL);

        mono_profiler_gc_finalize_object_end (o);

        if (log_finalizers)
                g_log ("mono-gc-finalizers", G_LOG_LEVEL_MESSAGE, "<%s at %p> Returned from finalizer.", o->vtable->klass->name, o);

Gets restricted to 2 seconds. Running it on a thread and then using a mutex/timeout to constrain it to the time limit?

Note You need to log in before you can comment on or make changes to this bug.