Bug 36809 - crash in System.Net.Sockets.Socket.cancel_blocking_socket_operation
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: io-layer
Version: 4.2.0 (C6)
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Reported: 2015-12-10 08:33 UTC by paul firth
Modified: 2017-10-11 17:14 UTC

Description paul firth 2015-12-10 08:33:59 UTC
Got a crash in the managed-to-native socket code that was so severe that even gdb crashed afterwards. I don't have a repro case at the moment.

OS is CentOS 6.

Here is the callstack:

STATE CUE CARD: (? means a positive number, usually 1 or 2)
        0x0     - starting (GOOD, unless the thread is running managed code)
        0x1     - running (BAD, unless it's the gc thread)
        0x2     - detached (GOOD, unless the thread is running managed code)
        0x?03   - async suspended (GOOD)
        0x?04   - self suspended (GOOD)
        0x?05   - async suspend requested (BAD)
        0x?06   - self suspend requested (BAD)
        0x07    - blocking (GOOD)
        0x?08   - blocking with pending suspend (GOOD)
--thread 0x7f02ac0008e0 id 0x7f02c3dfe700 [(nil)] state 1
--thread 0x7f02b40008e0 id 0x7f02c3fff700 [(nil)] state 1  GC INITIATOR
--thread 0x7f02bc0008e0 id 0x7f02dc24e700 [(nil)] state 1
--thread 0x7f02c80008e0 id 0x7f02dcf32700 [(nil)] state 1
--thread 0x7f02c40008e0 id 0x7f02dd133700 [(nil)] state 1
--thread 0x7f02d00008e0 id 0x7f02df9ff700 [(nil)] state 1
--thread 0x7f02cc0008e0 id 0x7f02dfdfe700 [(nil)] state 1
--thread 0x7f02d40008e0 id 0x7f02dffff700 [(nil)] state 1
--thread 0x7f02d80008e0 id 0x7f030504b700 [(nil)] state 1
--thread 0x7f02e40008e0 id 0x7f030524c700 [(nil)] state 1
--thread 0x7f02e00008e0 id 0x7f030544d700 [(nil)] state 1
--thread 0x7f02ec0008e0 id 0x7f030564e700 [(nil)] state 1
--thread 0x7f02e80008e0 id 0x7f030584f700 [(nil)] state 1
--thread 0x7f02f40008e0 id 0x7f0305a50700 [(nil)] state 1
--thread 0x7f02f0000fe0 id 0x7f0305a91700 [(nil)] state 1
--thread 0x7f02fc0008e0 id 0x7f0305ad2700 [(nil)] state 1
--thread 0x7f02f80008e0 id 0x7f0305ee1700 [(nil)] state 1
--thread 0x7f03000008e0 id 0x7f0306f5b700 [(nil)] state 1
--thread 0x1b9ec20 id 0x7f030a914760 [(nil)] state 1
WAITING for 1 threads, got 0 suspended
suspend_thread suspend took 0 ms, which is more than the allowed 200 ms
Stacktrace:
at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Net.Sockets.Socket.cancel_blocking_socket_operation (System.Threading.Thread) <IL 0x0000d, 0xffffffff>
  at System.Net.Sockets.SafeSocketHandle.AbortRegisteredThreads () [0x00036] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:113
  at System.Net.Sockets.SafeSocketHandle.ReleaseHandle () [0x00094] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:65
  at System.Runtime.InteropServices.SafeHandle.DangerousReleaseInternal (bool) [0x000b2] in /usr/local/src/mono-4.2.1/mcs/class/corlib/System.Runtime.InteropServices/SafeHandle.cs:215
  at System.Runtime.InteropServices.SafeHandle.DisposeInternal () [0x00000] in /usr/local/src/mono-4.2.1/mcs/class/corlib/System.Runtime.InteropServices/SafeHandle.cs:160
  at System.Runtime.InteropServices.SafeHandle.InternalDispose () [0x00011] in /usr/local/src/mono-4.2.1/mcs/class/corlib/System.Runtime.InteropServices/SafeHandle.cs:149
  at System.Runtime.InteropServices.SafeHandle.Dispose (bool) [0x00006] in /usr/local/src/mono-4.2.1/external/referencesource/mscorlib/system/runtime/interopservices/safehandle.cs:272
  at System.Runtime.InteropServices.SafeHandle.Dispose () [0x00000] in /usr/local/src/mono-4.2.1/external/referencesource/mscorlib/system/runtime/interopservices/safehandle.cs:264
  at System.Net.Sockets.Socket.Dispose (bool) [0x00047] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net.Sockets/Socket.cs:3006
  at System.Net.Sockets.Socket.Dispose () [0x00000] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net.Sockets/Socket.cs:3012
  at System.Net.Sockets.Socket.Close () [0x00007] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net.Sockets/Socket.cs:2941
  at System.Net.HttpConnection.CloseSocket () [0x0000c] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net/HttpConnection.cs:436
  at System.Net.HttpConnection.OnTimeout (object) [0x00000] in /usr/local/src/mono-4.2.1/mcs/class/System/System.Net/HttpConnection.cs:164
  at System.Threading.Timer/Scheduler.TimerCB (object) [0x00007] in /usr/local/src/mono-4.2.1/mcs/class/corlib/System.Threading/Timer.cs:327
  at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x00019] in /usr/local/src/mono-4.2.1/external/referencesource/mscorlib/system/threading/threadpool.cs:1264
  at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00096] in /usr/local/src/mono-4.2.1/external/referencesource/mscorlib/system/threading/threadpool.cs:859
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in /usr/local/src/mono-4.2.1/external/referencesource/mscorlib/system/threading/threadpool.cs:1196
  at (wrapper runtime-invoke) <Module>.runtime_invoke_bool (object,intptr,intptr,intptr) <IL 0x00060, 0xffffffff>

  
  Native stacktrace:

        mono() [0x4a1038]
        /lib64/libpthread.so.0(+0xf500) [0x7f0309e6a500]
        /lib64/libc.so.6(gsignal+0x35) [0x7f0309afa8a5]
        /lib64/libc.so.6(abort+0x175) [0x7f0309afc085]
        mono() [0x628796]
        mono() [0x628583]
        mono() [0x628639]
        mono() [0x61ec14]
        mono() [0x61f374]
        [0x41836795]

Debug info from gdb:

Mono support loaded.
[New LWP 10578]
[New LWP 7559]
[New LWP 7558]
[New LWP 7560]
[New LWP 7452]
[New LWP 7450]
[New LWP 7409]
[New LWP 7408]
[New LWP 7406]
[New LWP 7375]
[New LWP 7374]
[New LWP 7373]
[New LWP 7372]
[New LWP 7371]
[New LWP 7370]
[New LWP 7368]
[New LWP 7366]
[New LWP 7365]
[New LWP 7364]
[Thread debugging using libthread_db enabled]
0x00007f0309e6954d in read () from /lib64/libpthread.so.0
  20 Thread 0x7f03093ff700 (LWP 7364)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19 Thread 0x7f0306f5b700 (LWP 7365)  0x00007f0309e68720 in sem_wait () from /lib64/libpthread.so.0
  18 Thread 0x7f0305ee1700 (LWP 7366)  0x00007f0309e667bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17 Thread 0x7f0305ad2700 (LWP 7368)  0x00007f0309ba7253 in poll () from /lib64/libc.so.6
  16 Thread 0x7f0305a50700 (LWP 7370)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 0x7f030584f700 (LWP 7371)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14 Thread 0x7f030564e700 (LWP 7372)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f030544d700 (LWP 7373)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7f030524c700 (LWP 7374)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7f030504b700 (LWP 7375)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f02dffff700 (LWP 7406)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7f02dfdfe700 (LWP 7408)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f02df9ff700 (LWP 7409)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f02dd133700 (LWP 7450)  0x00007f030a27ff94 in clock_nanosleep () from /lib64/librt.so.1
  6 Thread 0x7f02dcf32700 (LWP 7452)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f02c3dfe700 (LWP 7560)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f02c3fff700 (LWP 7558)  0x00007f0309e6a09d in waitpid () from /lib64/libpthread.so.0
  3 Thread 0x7f02dc24e700 (LWP 7559)  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f0305a91700 (LWP 10578)  0x00007f030a27ff94 in clock_nanosleep () from /lib64/librt.so.1
* 1 Thread 0x7f030a914760 (LWP 7363)  0x00007f0309e6954d in read () from /lib64/libpthread.so.0

Thread 20 (Thread 0x7f03093ff700 (LWP 7364)):
#0  0x00007f0309e6643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005f72bc in thread_func (thread_data=0x0) at sgen-thread-pool.c:118
#2  0x00007f0309e62851 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0309bb090d in clone () from /lib64/libc.so.6
Thread 19 (Thread 0x7f0306f5b700 (LWP 7365)):
#0  0x00007f0309e68720 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000061a437 in mono_sem_wait (sem=0x952740, alertable=1) at mono-semaphore.c:107
#2  0x00000000005a11a6 in finalizer_thread (unused=Unhandled dwarf expression opcode 0xf3
) at gc.c:1096
#3  0x0000000000583cfb in start_wrapper_internal (data=Unhandled dwarf expression opcode 0xf3
) at threads.c:723
#4  start_wrapper (data=Unhandled dwarf expression opcode 0xf3
) at threads.c:770
#5  0x0000000000620a26 in inner_start_thread (arg=0x7fff7e980110) at mono-threads-posix.c:97
#6  0x00007f0309e62851 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f0309bb090d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f0305ee1700 (LWP 7366)):
#0  0x00007f0309e667bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005fc397 in _wapi_handle_timedwait_signal_handle (handle=0x408, timeout=0x7f0305ee0670, alertable=1, poll=Unhandled dwarf expression opcode 0xf3
) at handles.c:1607
#2  0x000000000060daf2 in wapi_WaitForSingleObjectEx (handle=0x408, timeout=3565, alertable=1) at wait.c:196
#3  0x0000000000583b7f in mono_wait_uninterrupted (thread=0x7f030a86c330, multiple=0, numhandles=1, handles=0x7f0305ee0718, waitall=0, ms=3565, alertable=1) at threads.c:1447
#4  0x00000000005850b9 in ves_icall_System_Threading_WaitHandle_WaitOne_internal (this=Unhandled dwarf expression opcode 0xf3
) at threads.c:1581
#5  0x000000004169da7d in ?? ()
#6  0x0000000000000038 in ?? ()
#7  0x0000000002129d78 in ?? ()
#8  0x0000000000000ded in ?? ()
#9  0x00007f0309534ab0 in ?? ()
#10 0x0000000000000ded in ?? ()
#11 0x00007f02f8000e00 in ?? ()
#12 0x00007f0309534ab0 in ?? ()
#13 0x00007f0305ee07d0 in ?? ()
#14 0x00007f0305ee0740 in ?? ()
#15 0x00007f030729ae8a in System.Threading.WaitHandle:WaitOne (this=../../gdb/dwarf2-frame.c:694: internal-error: Unknown CFI encountered.
A problem internal to GDB has been detected,
Comment 1 Ludovic Henry 2015-12-15 21:05:53 UTC
The gdb crash is due to its inability to unwind the stack. Unfortunately, that prevents us from seeing the stacks of the remaining 17 threads.

Otherwise, can you please provide a minimal test case that reproduces this crash? Thank you!
Comment 2 paul firth 2015-12-15 22:08:28 UTC
Briefly looking into the problem, it appears that the timed wait on the semaphore (the one used to wait for the native threads to terminate) fails right away instead of timing out. That is why it prints "suspend_thread suspend took 0 ms, which is more than the allowed 200 ms": 0 ms is the amount of time it actually waited on the semaphore.

I don't know if that helps at all; it might just mean the threads are stuck due to some unrelated failure.
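
As a rough illustration of the distinction described above, here is a minimal C sketch of a timed semaphore wait that reports a genuine timeout separately from an immediate failure. It is not Mono's actual suspend code: `timed_sem_wait` and the `wait_result` enum are hypothetical names made up for this sketch, and only the POSIX calls `clock_gettime` and `sem_timedwait` are real APIs. It assumes a Linux/POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical result codes for this sketch; not Mono's own names. */
typedef enum { WAIT_OK, WAIT_TIMED_OUT, WAIT_FAILED } wait_result;

/* Wait on sem for at most timeout_ms milliseconds.
 * Retries on EINTR so that a signal is not mistaken for a timeout.
 * On WAIT_FAILED, *err_out holds the errno of the failing call. */
static wait_result timed_sem_wait(sem_t *sem, long timeout_ms, int *err_out)
{
    struct timespec deadline;

    *err_out = 0;
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        *err_out = errno;
        return WAIT_FAILED;
    }

    /* Convert the relative timeout into an absolute deadline. */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        if (sem_timedwait(sem, &deadline) == 0)
            return WAIT_OK;            /* semaphore acquired */
        if (errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (errno == ETIMEDOUT)
            return WAIT_TIMED_OUT;     /* the deadline really passed */
        *err_out = errno;              /* e.g. EINVAL: failed immediately */
        return WAIT_FAILED;
    }
}
```

With a helper along these lines, a caller could log the errno on `WAIT_FAILED` instead of reporting a 0 ms wait as an exceeded timeout.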

Creating a repro is very hard because this only happens after my site has been running for a day or so.
Comment 3 Rodrigo Kumpera 2017-10-11 17:14:19 UTC
We have not received the requested information. If you are still experiencing this issue please provide all the requested information and reopen the bug report.

Thank you!
