Bug 29177

Summary: Mono-hosted OWIN server crashes under load.
Product: [Mono] Runtime Reporter: Shawn Burke <shawn>
Component: DebuggerAssignee: Zoltan Varga <vargaz>
Status: RESOLVED FIXED    
Severity: normal CC: adrian, mark, miguel, mono-bugs+mono, mono-bugs+runtime
Priority: ---    
Version: 3.12.0   
Target Milestone: ---   
Hardware: Other   
OS: Linux   
Tags: Is this bug a regression?: ---
Last known good build:
Attachments: Simple OWIN web-service that will respond to any path.
Thread.cs with Principal Workaround
Timer.cs with ManualResetEvent Native Handler Workaround

Description Shawn Burke 2015-04-17 04:07:20 UTC
Created attachment 10797 [details]
Simple OWIN web-service that will respond to any path.

When trying to run a OWIN-based webservice code to run on Linux, however mono crashes under concurrent load.

Unfortunately, under any concurrent load on Linux the process faults.  It seems much more stable on OSX.

Repro steps:

* Unzip the attached file on Linux
* Build
* Execute "mono OwinSimple.exe"
  - Note the "listening on http://+:9001, you can then call curl http://localhost:9001
* Run AB against the server to generate load:
  -  ab -n 10000 -c 100 http://localhost:9001/foo

Result:

@@@@@@@@ Usually I get:

Listening on 
	http://*:9001
CTRL+C to quit.
Stacktrace:


Native stacktrace:

	mono() [0x4accac]
	mono() [0x50451f]
	mono() [0x42a7c7]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f4d35bb80a0]

Debug info from gdb:


=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================


@@@@@@ Occasionally I get an exception from SortedList:


Listening on 
	http://*:9001
CTRL+C to quit.

Unhandled Exception:
System.InvalidOperationException: Operation is not valid due to the current state of the object
  at System.Collections.SortedList.IndexOfKey (System.Object key) [0x00000] in <filename unknown>:0 
  at System.Threading.Timer+Scheduler.InternalRemove (System.Threading.Timer timer) [0x00000] in <filename unknown>:0 
  at System.Threading.Timer+Scheduler.Change (System.Threading.Timer timer, Int64 new_next_run) [0x00000] in <filename unknown>:0 
  at System.Threading.Timer.Change (Int64 dueTime, Int64 period, Boolean first) [0x00000] in <filename unknown>:0 
  at System.Threading.Timer.Change (Int32 dueTime, Int32 period) [0x00000] in <filename unknown>:0 
  at (wrapper remoting-invoke-with-check) System.Threading.Timer:Change (int,int)
  at System.Net.HttpConnection.BeginReadRequest () [0x00000] in <filename unknown>:0 
  at System.Net.EndPointListener.OnAccept (System.Object sender, System.EventArgs e) [0x00000] in <filename unknown>:0 
  at System.Net.Sockets.SocketAsyncEventArgs.OnCompleted (System.Net.Sockets.SocketAsyncEventArgs e) [0x00000] in <filename unknown>:0 
  at System.Net.Sockets.SocketAsyncEventArgs.AcceptCallback (IAsyncResult ares) [0x00000] in <filename unknown>:0 
  at System.Net.Sockets.SocketAsyncEventArgs.DispatcherCB (IAsyncResult ares) [0x00000] in <filename unknown>:0 
vagrant@vagrant-ubuntu-trusty-64:~/OwinSimple/Docker$ sudo docker run -p 8080:9001 owinsimple



@@@@@@@@@@@  I don't know if that's the same issue

Also, it's easy to repro this with the Mono Docker container 


1) Copy the following text into a Dockerfile, then build and place the bin/Debug contents into a subfolder called files:

   FROM mono:3.12.0

   COPY files /owin
   WORKDIR /owin

   EXPOSE 9001
   ENTRYPOINT ["mono", "--debug",  "OwinSimple.exe"]


2) Build the container: docker build -t owinsimple .
3) Run the container: sudo docker run -p 8088:9001 owinsimple
4) Run AB against it: ab -n 10000 -c 100 http://localhost:8088/foo
Comment 1 Miguel de Icaza [MSFT] 2015-04-17 11:58:03 UTC
On Linux, on a VM running Ubuntu 14.04.2 LTS I am unable to reproduce this issue with Mono 3.12.1:

Mono JIT compiler version 3.12.1 (tarball Fri Mar  6 19:12:47 UTC 2015)

I ran the described process multiple times without being able to reproduce.   Then I also tried the server with over a million requests, with concurrency set to 600 without any errors either.

I tried Mono 4.0.0 and Mono/master (as of yesterday), and could not reproduce.   It might have been a bug in Mono 3.12.0 that we fixed later.

I am also unable to reproduce this with a Mac with Mono 4.0.
Comment 2 Shawn Burke 2015-04-17 13:59:34 UTC
Thanks for looking at this so quickly.  Still seeing it.  I built these binaries using OSX, not sure if that matters.

==> I'm on a Virtual Box (Vagrant) Ubuntu 14.04.2 LTX VM as well:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"
NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"


==> I upgraded to 3.12.1 via apt-get mono-complete:


vagrant@vagrant-ubuntu-trusty-64:~/OwinSimple/OwinSimple/bin/Debug$ mono --version
Mono JIT compiler version 3.12.1 (tarball Fri Mar  6 19:12:47 UTC 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen
vagrant@vagrant-ubuntu-trusty-64:~/OwinSimple/OwinSimple/bin/Debug$ mono OwinSimple.exe -p 8081
Listening on 
	http://*:8081
CTRL+C to quit.


vagrant@vagrant-ubuntu-trusty-64:~/OwinSimple/OwinSimple/bin/Debug$ mono OwinSimple.exe -p 8081
Listening on 
	http://*:8081
CTRL+C to quit.

==>  ab -n 10000 -c 100  http://localhost:8081/1418

Stacktrace:


Native stacktrace:

	mono() [0x4accac]
	mono() [0x50451f]
	mono() [0x42a7c7]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fd5a3e29340]

Debug info from gdb:

Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No threads.

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Aborted (core dumped)
vagrant@vagrant-ubuntu-trusty-64:~/OwinSimple/OwinSimple/bin/Debug$ 

==> Sometimes I get...

 ==> ab -n 10000 -c 100  http://localhost:8081/1418

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
  at System.Net.HttpConnection.Close (Boolean force_close) [0x00000] in <filename unknown>:0 
  at System.Net.HttpListenerResponse.Close (Boolean force) [0x00000] in <filename unknown>:0 
  at System.Net.HttpListenerResponse.Abort () [0x00000] in <filename unknown>:0 
  at Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerResponse.End () [0x00000] in <filename unknown>:0 
  at Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerContext.End () [0x00000] in <filename unknown>:0 
  at Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerContext.End (System.Exception ex) [0x00000] in <filename unknown>:0 
  at Microsoft.Owin.Host.HttpListener.OwinHttpListener+<ProcessRequestAsync>d__5.MoveNext () [0x00000] in <filename unknown>:0 
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw () [0x00000] in <filename unknown>:0 
  at System.Runtime.CompilerServices.TaskAwaiter.GetResult () [0x00000] in <filename unknown>:0 
  at Microsoft.Owin.Host.HttpListener.OwinHttpListener+<ProcessRequestsAsync>d__0.MoveNext () [0x00000] in <filename unknown>:0
Comment 3 Miguel de Icaza [MSFT] 2015-04-17 14:42:47 UTC
Using Docker on the same Ubuntu configuration, with the 3.12.0 package does not exhibit this problem either.

I started the test at 1:24pm, and the process continues running without a hitch at 2:41pm.

Using this script:

while true; do ab -n 10000 -c 100 http://localhost:8088/foo; done
Comment 4 Miguel de Icaza [MSFT] 2015-04-17 14:51:35 UTC
Binaries you meant he "Owin.exe" app, or the Mono runtime itself?

The exe file should not matter where it was built.    The Mono runtime cross-compiled might be tricky.
Comment 5 Miguel de Icaza [MSFT] 2015-04-17 16:45:40 UTC
With Mono/Master (last update, from 072ab0db8356347fcceb5e41c80ab9441e3b6605 + a one line build fix), on Mac/64, I now get this:

Thread 14 (process 97552):
#0  0x00007fff87608582 in semaphore_timedwait_trap ()
#1  0x000000010783b872 in mono_sem_timedwait (sem=0x1079178c8, timeout_ms=<value temporarily unavailable, due to optimizations>, alertable=1) at mono-semaphore.c:64
#2  0x0000000107794dde in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:1665
#3  0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#4  0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#5  0x00007fff861972fc in _pthread_body ()
#6  0x00007fff86197279 in _pthread_start ()
#7  0x00007fff861954b1 in thread_start ()

Thread 13 (process 97552):
#0  0x00007fff8760d162 in __psynch_mutexwait ()
#1  0x00007fff8619581e in _pthread_mutex_lock ()
#2  0x000000010780a482 in mono_gc_alloc_obj (vtable=0x7ff41b11da18, size=80) at sgen-alloc.c:503
#3  0x0000000107bb41c0 in ?? ()
#4  0x00000001094db890 in System_Threading_Tasks_Task_ContinueWith_System_Action_1_System_Threading_Tasks_Task_System_Threading_Tasks_TaskContinuationOptions (this=@0x107ff8990, continuationAction=@0x10cf6a930, continuationOptions=OnlyOnFaulted) at Task.cs:3865
#5  0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x50) at AsyncMethodBuilder.cs:1076
#6  0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107e306a0, callback=@0x10c8de6b0, state=0x107c5bca8) at ExecutionContext.cs:229
#7  0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x7ff41b11da18, allowInlining=true, currentTask=140686460464792) at TaskContinuation.cs:835
#8  0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107e30550) at Task.cs:3650
#9  0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107e30550) at Task.cs:2387
#10 0x00000001096fb5a8 in System_Threading_Tasks_TaskFactory_1__FromAsyncImplc__AnonStorey1__m__0_System_IAsyncResult (this=@0x107e305a0, iar=@0x107e30640) at FutureFactory.cs:860
#11 0x0000000107bb1c43 in ?? ()
#12 0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x107ff8928, exc=0x107e30550) at mini-runtime.c:2369
#13 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41b120878, obj=0x10c8df530, params=0x107ff8928, exc=0x10fb00e00) at object.c:2836
#14 0x00000001077c4ff2 in mono_async_result_invoke (ares=0x107ff8918, exc=0x10fb00e00) at object.c:6025
#15 0x0000000107794fb6 in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:591
#16 0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#17 0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#18 0x00007fff861972fc in _pthread_body ()
#19 0x00007fff86197279 in _pthread_start ()
#20 0x00007fff861954b1 in thread_start ()

Thread 12 (process 97552):
#0  0x00007fff8760d162 in __psynch_mutexwait ()
#1  0x00007fff8619581e in _pthread_mutex_lock ()
#2  0x000000010780aeea in mono_gc_alloc_vector (vtable=0x7ff41d037938, size=48, max_length=14) at sgen-alloc.c:532
#3  0x0000000107bb2d36 in ?? ()
#4  0x0000000109551475 in Mono_Globalization_Unicode_SortKeyBuffer_Initialize_System_Globalization_CompareOptions_int_string_bool (this=@0x107ff59a0, options=IgnoreCase, lcid=486766904, s=0x30, frenchSort=false) at SortKeyBuffer.cs:86
#5  0x000000010c7de183 in ?? ()

Thread 11 (process 97552):
#0  0x00007fff8760d8fe in __wait4 ()
#1  0x000000010769f351 in mono_handle_native_sigsegv (signal=<value temporarily unavailable, due to optimizations>, ctx=0x10caa515c, info=0x0) at mini-exceptions.c:2362
#2  0x00000001076faeaa in mono_arch_handle_altstack_exception (sigctx=0x10caa5f48, siginfo=<value temporarily unavailable, due to optimizations>, fault_addr=0x0, stack_ovf=0) at exceptions-amd64.c:856
#3  0x0000000107600054 in mono_sigsegv_signal_handler (_dummy=<value temporarily unavailable, due to optimizations>, _info=<value temporarily unavailable, due to optimizations>, context=<value temporarily unavailable, due to optimizations>) at mini-runtime.c:2496
#4  <signal handler called>
#5  0x0000000000000000 in ?? ()
#6  0x000000010c7dd308 in ?? ()
#7  0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x107ffd400) at AsyncMethodBuilder.cs:1076
#8  0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107c10eb8, callback=@0x10c8de6b0, state=0x107c13000) at ExecutionContext.cs:229
#9  0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x107ffd308, allowInlining=true, currentTask=140686437138584) at TaskContinuation.cs:835
#10 0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107c10e28) at Task.cs:3650
#11 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107c10e28) at Task.cs:2387
#12 0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x107c10e28) at AsyncMethodBuilder.cs:1076
#13 0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107c10d70, callback=@0x10c8de6b0, state=0x107c10e78) at ExecutionContext.cs:229
#14 0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x0, allowInlining=true, currentTask=140686437138584) at TaskContinuation.cs:835
#15 0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107c10cc0) at Task.cs:3650
#16 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107c10cc0) at Task.cs:2387
#17 0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x107c10cc0) at AsyncMethodBuilder.cs:1076
#18 0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107c10c08, callback=@0x10c8de6b0, state=0x107c10d10) at ExecutionContext.cs:229
#19 0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x0, allowInlining=true, currentTask=140686437138584) at TaskContinuation.cs:835
#20 0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107c02a58) at Task.cs:3650
#21 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107c02a58) at Task.cs:2387
#22 0x000000010973b107 in System_Threading_Tasks_TaskFactory_1_FromAsyncTrimPromise_1_System_Threading_Tasks_VoidTaskResult_System_IO_Stream_CompleteFromAsyncResult_System_IAsyncResult (asyncResult=@0x107c02a58) at FutureFactory.cs:1433
#23 0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x107c1a088, exc=0x107c02a58) at mini-runtime.c:2369
#24 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41d1ace30, obj=0x10c8dea30, params=0x107c1a088, exc=0x10f000e00) at object.c:2836
#25 0x00000001077c4ff2 in mono_async_result_invoke (ares=0x107c1a078, exc=0x10f000e00) at object.c:6025
#26 0x0000000107794fb6 in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:591
#27 0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#28 0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#29 0x00007fff861972fc in _pthread_body ()
#30 0x00007fff86197279 in _pthread_start ()
#31 0x00007fff861954b1 in thread_start ()

Thread 10 (process 97552):
#0  0x00007fff8760d486 in __semwait_signal ()
#1  0x00007fff8b082f7d in nanosleep ()
#2  0x0000000107854855 in monoeg_g_usleep (microseconds=<value temporarily unavailable, due to optimizations>) at gdate-unix.c:53
#3  0x0000000107816cac in sgen_stop_world (generation=0) at sgen-stw.c:161
#4  0x00000001077e7200 in sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0x10787ccd1 "Nursery full", wait_to_finish=0) at sgen-gc.c:3087
#5  0x000000010780ac00 in mono_gc_alloc_obj_nolock (vtable=0x7ff41d037938, size=48) at sgen-alloc.c:329
#6  0x000000010780aef5 in mono_gc_alloc_vector (vtable=0x7ff41d037938, size=48, max_length=16) at sgen-alloc.c:534
#7  0x0000000107bb2d36 in ?? ()
#8  0x0000000109551434 in Mono_Globalization_Unicode_SortKeyBuffer_Initialize_System_Globalization_CompareOptions_int_string_bool (this=@0x107c8e0e8, options=IgnoreCase, lcid=486766904, s=0x30, frenchSort=false) at SortKeyBuffer.cs:84
#9  0x000000010c2df88a in ?? ()
#10 0x000000010c2def63 in ?? ()
#11 0x000000010c2ded5f in ?? ()
#12 0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x107ff2698, exc=0x10) at mini-runtime.c:2369
#13 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41d1ace30, obj=0x10d31c1b0, params=0x107ff2698, exc=0x10d300e00) at object.c:2836
#14 0x00000001077c4ff2 in mono_async_result_invoke (ares=0x107ff2688, exc=0x10d300e00) at object.c:6025
#15 0x0000000107794fb6 in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:591
#16 0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#17 0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#18 0x00007fff861972fc in _pthread_body ()
#19 0x00007fff86197279 in _pthread_start ()
#20 0x00007fff861954b1 in thread_start ()

Thread 9 (process 97552):
#0  0x00007fff8760d132 in __psynch_cvwait ()
#1  0x00007fff86197ea0 in _pthread_cond_wait ()
#2  0x000000010781d595 in timedwait_signal_poll_cond [inlined] () at /cvs/mono/mono/io-layer/handles.c:1514
#3  0x000000010781d595 in _wapi_handle_timedwait_signal_handle (handle=0xa12, timeout=0xb2a00, alertable=1, poll=<value temporarily unavailable, due to optimizations>) at handles.c:1604
#4  0x000000010782d083 in wapi_WaitForSingleObjectEx (handle=<value temporarily unavailable, due to optimizations>, timeout=89999, alertable=1) at wait.c:196
#5  0x000000010778f550 in mono_wait_uninterrupted [inlined] () at /cvs/mono/mono/metadata/threads.c:1409
#6  0x000000010778f550 in ves_icall_System_Threading_WaitHandle_WaitOne_internal (this=<value temporarily unavailable, due to optimizations>, handle=0xa12, ms=Cannot access memory at address 0xffffffffffffffff
) at threads.c:1543
#7  0x000000010c2dca74 in ?? ()
#8  0x00000001096b935c in System_Threading_WaitHandle_WaitOne_int (this=@0x107cf7848, millisecondsTimeout=89999) at WaitHandle.cs:388
#9  0x00000001096b45ee in System_Threading_Thread_StartInternal (this=@0x0) at Thread.cs:702
#10 0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x10c7bee68, exc=0x0) at mini-runtime.c:2369
#11 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41d806860, obj=0x107cf79c0, params=0x10c7bee68, exc=0x0) at object.c:2836
#12 0x0000000107793f3f in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:689
#13 0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#14 0x00007fff861972fc in _pthread_body ()
#15 0x00007fff86197279 in _pthread_start ()
#16 0x00007fff861954b1 in thread_start ()

Thread 8 (process 97552):
#0  0x00007fff8760e9de in shutdown ()
#1  0x000000010782a166 in _wapi_shutdown (fd=11, how=2) at sockets.c:771
#2  0x0000000107782794 in ves_icall_System_Net_Sockets_Socket_Shutdown_internal (sock=11, how=2, error=0x10c5b2dd8) at socket-io.c:2192
#3  0x000000010c7e1c2c in ?? ()
#4  0x000000010c7e1a40 in ?? ()
#5  0x000000010c7e152b in ?? ()
#6  0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x10c5b2dd8) at AsyncMethodBuilder.cs:1076
#7  0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107ffc9d0, callback=@0x10c8de6b0, state=0x107ffcad8) at ExecutionContext.cs:229
#8  0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x2, allowInlining=true, currentTask=140686403130520) at TaskContinuation.cs:835
#9  0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107ffc940) at Task.cs:3650
#10 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107ffc940) at Task.cs:2387
#11 0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x107ffc940) at AsyncMethodBuilder.cs:1076
#12 0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107ffe960, callback=@0x10c8de6b0, state=0x107ffc990) at ExecutionContext.cs:229
#13 0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x0, allowInlining=true, currentTask=140686403130520) at TaskContinuation.cs:835
#14 0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107ffe8b0) at Task.cs:3650
#15 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107ffe8b0) at Task.cs:2387
#16 0x0000000109461590 in System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object (stateMachine=0x107ffe8b0) at AsyncMethodBuilder.cs:1076
#17 0x000000010957d637 in System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool (executionContext=@0x107ffe7f8, callback=@0x10c8de6b0, state=0x107ffe900) at ExecutionContext.cs:229
#18 0x00000001094e2fe2 in System_Threading_Tasks_AwaitTaskContinuation_RunOrScheduleAction_System_Action_bool_System_Threading_Tasks_Task_ (action=@0x0, allowInlining=true, currentTask=140686403130520) at TaskContinuation.cs:835
#19 0x00000001094db28a in System_Threading_Tasks_Task_FinishContinuations (this=@0x107fe9d48) at Task.cs:3650
#20 0x00000001094d9964 in System_Threading_Tasks_Task_FinishStageThree (this=@0x107fe9d48) at Task.cs:2387
#21 0x000000010973b107 in System_Threading_Tasks_TaskFactory_1_FromAsyncTrimPromise_1_System_Threading_Tasks_VoidTaskResult_System_IO_Stream_CompleteFromAsyncResult_System_IAsyncResult (asyncResult=@0x107fe9d48) at FutureFactory.cs:1433
#22 0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x107ff1848, exc=0x107fe9d48) at mini-runtime.c:2369
#23 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41d1ace30, obj=0x10c8dea30, params=0x107ff1848, exc=0x10c5b3e00) at object.c:2836
#24 0x00000001077c4ff2 in mono_async_result_invoke (ares=0x107ff1838, exc=0x10c5b3e00) at object.c:6025
#25 0x0000000107794fb6 in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:591
#26 0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#27 0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#28 0x00007fff861972fc in _pthread_body ()
#29 0x00007fff86197279 in _pthread_start ()
#30 0x00007fff861954b1 in thread_start ()

Thread 7 (process 97552):
#0  0x00007fff8760d486 in __semwait_signal ()
#1  0x00007fff8b082f7d in nanosleep ()
#2  0x000000010782ed5a in wapi_SleepEx (ms=<value temporarily unavailable, due to optimizations>, alertable=1) at wthreads.c:289
#3  0x00000001077967d4 in monitor_thread (unused=<value temporarily unavailable, due to optimizations>) at threadpool.c:837
#4  0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#5  0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#6  0x00007fff861972fc in _pthread_body ()
#7  0x00007fff86197279 in _pthread_start ()
#8  0x00007fff861954b1 in thread_start ()

Thread 6 (process 97552):
#0  0x00007fff87608582 in semaphore_timedwait_trap ()
#1  0x000000010783b872 in mono_sem_timedwait (sem=0x1079178c8, timeout_ms=<value temporarily unavailable, due to optimizations>, alertable=1) at mono-semaphore.c:64
#2  0x0000000107794dde in async_invoke_thread (data=<value temporarily unavailable, due to optimizations>) at threadpool.c:1665
#3  0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#4  0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#5  0x00007fff861972fc in _pthread_body ()
#6  0x00007fff86197279 in _pthread_start ()
#7  0x00007fff861954b1 in thread_start ()

Thread 5 (process 97552):
#0  0x00007fff8760d162 in __psynch_mutexwait ()
#1  0x00007fff8619581e in _pthread_mutex_lock ()
#2  0x00000001077ed044 in mono_gc_set_skip_thread (skip=0) at sgen-gc.c:5242
#3  0x000000010779c18b in tp_poll_wait (p=0x107917840) at tpool-poll.c:200
#4  0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#5  0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#6  0x00007fff861972fc in _pthread_body ()
#7  0x00007fff86197279 in _pthread_start ()
#8  0x00007fff861954b1 in thread_start ()

Thread 4 (process 97552):
#0  0x00007fff8760e22e in kevent64 ()
#1  0x00007fff87b20d91 in _dispatch_mgr_invoke ()
#2  0x00007fff87b20a6a in _dispatch_mgr_thread ()

Thread 3 (process 97552):
#0  0x00007fff8760856a in semaphore_wait_trap ()
#1  0x000000010783b8d7 in mono_sem_wait (sem=0x107917d18, alertable=1) at mono-semaphore.c:103
#2  0x00000001077ba802 in finalizer_thread (unused=0x1703) at gc.c:1105
#3  0x0000000107793eab in start_wrapper (data=<value temporarily unavailable, due to optimizations>) at threads.c:683
#4  0x000000010784308e in inner_start_thread (arg=<value temporarily unavailable, due to optimizations>) at mono-threads-posix.c:96
#5  0x00007fff861972fc in _pthread_body ()
#6  0x00007fff86197279 in _pthread_start ()
#7  0x00007fff861954b1 in thread_start ()

Thread 2 (process 97552):
#0  0x00007fff8760d132 in __psynch_cvwait ()
#1  0x00007fff86197ea0 in _pthread_cond_wait ()
#2  0x00000001078189ff in continue_idle_job [inlined] () at /cvs/mono/mono/metadata/sgen-thread-pool.c:116
#3  0x00000001078189ff in thread_func (thread_data=0x0) at sgen-thread-pool.c:107
#4  0x00007fff861972fc in _pthread_body ()
#5  0x00007fff86197279 in _pthread_start ()
#6  0x00007fff861954b1 in thread_start ()

Thread 1 (process 97552):
#0  0x00007fff8760d132 in __psynch_cvwait ()
#1  0x00007fff86197ea0 in _pthread_cond_wait ()
#2  0x000000010781d5a2 in _wapi_handle_timedwait_signal_handle (handle=<value temporarily unavailable, due to optimizations>, timeout=<value temporarily unavailable, due to optimizations>, alertable=<value temporarily unavailable, due to optimizations>, poll=<value temporarily unavailable, due to optimizations>) at handles.c:1609
#3  0x000000010782d090 in wapi_WaitForSingleObjectEx (handle=<value temporarily unavailable, due to optimizations>, timeout=4294967295, alertable=<value temporarily unavailable, due to optimizations>) at wait.c:194
#4  0x00000001077bb6c6 in ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x107cf92a8, ms=4294967295) at monitor.c:1092
#5  0x000000010c2d87cf in ?? ()
#6  0x00000001094dac1b in System_Threading_Tasks_Task_SpinThenBlockingWait_int_System_Threading_CancellationToken (this=@0x107cf59d0, millisecondsTimeout=-1, cancellationToken=0) at Task.cs:3356
#7  0x00000001094dab43 in System_Threading_Tasks_Task_InternalWait_int_System_Threading_CancellationToken (this=@0x107cf59d0, millisecondsTimeout=-1, cancellationToken=0) at Task.cs:3295
#8  0x0000000107bb2471 in ?? ()
#9  0x0000000107602609 in mono_jit_runtime_invoke (method=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, params=0x7fff5860a858, exc=0x10a3e8130) at mini-runtime.c:2369
#10 0x00000001077bc87e in mono_runtime_invoke (method=0x7ff41c8038a0, obj=0x0, params=0x7fff5860a858, exc=0x0) at object.c:2836
#11 0x00000001077c203b in mono_runtime_exec_main (method=0x7ff41c8038a0, args=<value temporarily unavailable, due to optimizations>, exc=0x0) at object.c:4093
#12 0x000000010766ecc7 in mono_main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at driver.c:1066
#13 0x00007fff869dd5c9 in start ()

I have a couple of observations here: the segfault handler seems to be treating what looks like a managed NullReferenceException as a native exception (This is Thread 11, frames 5 and 6) which correlate with some managed crashes that I managed to get, and they look like this:

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
  at System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run () <0x1084e43b0 + 0x000ec> in <filename unknown>:0 
  at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction (System.Action action, Boolean allowInlining, System.Threading.Tasks.Task& currentTask) <0x108565f70 + 0x00071> in <filename unknown>:0 
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw () <0x1084e6a50 + 0x00029> in <filename unknown>:0 
  at System.Threading.Tasks.AwaitTaskContinuation.<ThrowAsyncIfNecessary>m__0 (System.Object s) <0x108566200 + 0x00041> in <filename unknown>:0 
mac$ /tmp/mono-gdb-commands.Rak10l:1: Error in sourced command file:
Unable to access task for process-id 97527: (os/kern) failure.

mac$ xMONO_LOG_LEVEL=debug /mono/bin/mono --debug OwinSimple.exe
Listening on 
	http://*:9001
CTRL+C to quit.

Unhandled Exception:
System.InvalidOperationException: Cannot be changed after headers are sent.
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw () [0x0000c] in /cvs/mono/external/referencesource/mscorlib/system/runtime/exceptionservices/exceptionservicescommon.cs:137 
  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Threading.Tasks.Task task) [0x00047] in /cvs/mono/external/referencesource/mscorlib/system/runtime/compilerservices/TaskAwaiter.cs:201 
  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Threading.Tasks.Task task) [0x0002e] in /cvs/mono/external/referencesource/mscorlib/system/runtime/compilerservices/TaskAwaiter.cs:170 
  at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Threading.Tasks.Task task) [0x0000b] in /cvs/mono/external/referencesource/mscorlib/system/runtime/compilerservices/TaskAwaiter.cs:142 
  at System.Runtime.CompilerServices.TaskAwaiter.GetResult () [0x00000] in /cvs/mono/external/referencesource/mscorlib/system/runtime/compilerservices/TaskAwaiter.cs:124 
  at Microsoft.Owin.Host.HttpListener.OwinHttpListener+<ProcessRequestsAsync>d__0.MoveNext () <0x108094d40 + 0x006db> in <filename unknown>:0 
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw () [0x0000c] in /cvs/mono/external/referencesource/mscorlib/system/runtime/exceptionservices/exceptionservicescommon.cs:137 
  at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>m__1 (System.Object state) [0x00000] in /cvs/mono/external/referencesource/mscorlib/system/runtime/compilerservices/AsyncMethodBuilder.cs:1022
Comment 6 Miguel de Icaza [MSFT] 2015-04-17 18:20:22 UTC
With master, if I apply this patch, the unmanaged crashes go away.

We probably should get this patch anyways, since when you assign null to Thread.CurrentPrincipal in .NET, we get back a GenericPrincipal (which btw, is different than the original one):

diff --git a/mcs/class/corlib/System.Threading/Thread.cs b/mcs/class/corlib/System.Threading/Thread.cs
index 3774614..02ae161 100644
--- a/mcs/class/corlib/System.Threading/Thread.cs
+++ b/mcs/class/corlib/System.Threading/Thread.cs
@@ -291,6 +291,8 @@ namespace System.Threading {
                        set {
                                Thread th = CurrentThread;
 
+                               if (value == null)
+                                       value = new GenericPrincipal (new GenericIdentity ("", ""), new string [1] { "" });
                                if (value != GetDomain ().DefaultPrincipal) {
                                        ++th.Internal._serialized_principal_version;
                                        try {
Comment 7 Miguel de Icaza [MSFT] 2015-04-17 18:21:13 UTC
With the above patch, on Mac/master I was able to keep the process running for at least an hour, before I had to go home.
Comment 8 Shawn Burke 2015-04-17 18:25:28 UTC
OK great thanks so much!  I'll see if I get the same results.
Comment 9 Zoltan Varga 2015-04-21 16:16:04 UTC
The crash seems to happen in Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerResponse:NotifyOnSendingHeaders ()

at the get_Count () call:

    IL_0000:  ldarg.0
    IL_0001:  ldflda     class [mscorlib]System.Collections.Generic.IList`1<class [mscorlib]System.Tuple`2<class [mscorlib]System.Action`1<object>,object>> Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerResponse::_onSendingHeadersActions
    IL_0006:  ldnull
    IL_0007:  call       !!0 [mscorlib]System.Threading.Interlocked::Exchange<class [mscorlib]System.Collections.Generic.IList`1<class [mscorlib]System.Tuple`2<class [mscorlib]System.Action`1<object>,object>>>(!!0&,
                                                                                                                                                                                                                  !!0)
    IL_000c:  stloc.0
    IL_000d:  ldloc.0
    IL_000e:  callvirt   instance int32 class [mscorlib]System.Collections.Generic.ICollection`1<class [mscorlib]System.Tuple`2<class [mscorlib]System.Action`1<object>,object>>::get_Count()

Looking at the assembly, it looks like we replace the Exchange() call with an intrinsics, but we don't emit a write barrier.
Comment 10 Zoltan Varga 2015-04-21 17:11:08 UTC
It looks like we do emit a wbarrier, but the object we call at IL_000e seems to be bogus.
Comment 11 Adrian 2015-04-23 12:03:26 UTC
thank you shawn for posting this and miguel for your supper fast feedback.

we currently integrate a high traffic owin/web api based solution at a customer's cloud foundry deployment and see the the same and other issues on 3.12.1 on ubuntu 14.04 LTS.

we tested miguel's patch, but still saw strange errors initiated in the System.Threading.Timer. as far as we understand, owin (we use Nowin as the Server Factory, because Mono's HttpListener crashes when cancelling a TCP connection client-side) executes some time jobs (maybe cleanups), and during that, references disappeared magically: example was the changed mutex in the Timer while(true) loop. if check for that, errors appeared in the mutex-base-class (null exceptions)...

we worked around by inserting a try/catch in the while(true) loop of the Timer Class (ScheduleThread) and break the loop in the catch. stability is way better now.

let me know how I can help solving this issue cleanly.
Comment 12 Adrian 2015-04-23 15:01:37 UTC
Okay this was little stupid :) We changed:

As a workaround
- We removed the ManualResetEvent completely; because we have a Null Point Reference to the Native Handler

- We changed the changed.WaitOne to Thread.Sleep(5);
Comment 13 Shawn Burke 2015-04-23 15:21:30 UTC
Adrian - this is great.  

Just to clarify, you decided the try/catch wasn't the best solution and instead made the two changes in Comment 12, in addition to Miguel's fix?  If so, is there a PR I could look at/merge?

Zoltan, is there an additional fix coming from you as well?

trying to figure out how to get this all into one branch.

Thanks,

Shawn
Comment 14 Adrian 2015-04-23 15:26:24 UTC
Created attachment 10884 [details]
Thread.cs with Principal Workaround
Comment 15 Adrian 2015-04-23 15:26:55 UTC
Created attachment 10885 [details]
Timer.cs with ManualResetEvent Native Handler Workaround
Comment 16 Adrian 2015-04-23 15:29:33 UTC
Ciao Shawn,

I put two attachments (sorry to not provide a propper diff - was a very long day with debugging)

As you see, we decided to modify the Timer.cs with a Thread.Sleep instead of the failing ManualResetEvent. As far as we analysed, this works pretty good. CPU and Memory is fine.

Cheers
Comment 17 Zoltan Varga 2015-04-24 16:25:11 UTC
To continue comment 9:
The crash in comment 9 happens on amd64 with mono master, the crashing code is always

IP 0x1079514c8 at offset 0x78 of method Microsoft.Owin.Host.HttpListener.RequestProcessing.OwinHttpListenerResponse:NotifyOnSendingHeaders () (0x107951450 0x107951550)[domain 0x7fd928602f90 - OwinSimple.exe]

That method looks like the following:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
0x107951450:	sub    $0x28,%rsp
0x107951454:	mov    %r13,(%rsp)
0x107951458:	mov    %r14,0x8(%rsp)
0x10795145d:	mov    %r15,0x10(%rsp)
0x107951462:	mov    %rdi,%r15
0x107951465:	test   %r15,%r15
0x107951468:	je     0x10795152e
x10795146e:	lea    0x28(%r15),%rax
0x107951472:	xor    %ecx,%ecx
0x107951474:	mov    %rcx,%r11
0x107951477:	xchg   %r11,(%rax)
0x10795147a:	mov    %r11,%rdx
0x10795147d:	mov    %rdx,0x18(%rsp)
0x107951482:	mov    %rcx,%rdx
0x107951485:	shr    $0x16,%rdx
0x107951489:	cmp    $0x40b,%rdx
0x107951490:	jne    0x1079514aa
0x107951492:	mov    %rax,%rdx
0x107951495:	shr    $0x9,%rdx
0x107951499:	and    $0x7fffff,%rdx
0x1079514a0:	add    0xa1(%rip),%rdx        # 0x107951548
0x1079514a7:	movb   $0x1,(%rdx)
0x1079514aa:	mov    0x18(%rsp),%rax
0x1079514af:	mov    %rax,%r15
0x1079514b2:	mov    %rax,%rdi
0x1079514b5:	mov    (%rax),%rax
0x1079514b8:	mov    $0x7fd92863e1e0,%r10
0x1079514c2:	callq  *-0x98(%rax)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The crash happens at the last call, which branches to 0. In the signal handler, the relevant registers are:
rax - random value
rdi/r15 - points to the heap to empty memory, interior to some object etc.

Based on the code above, rax should contain the vtable pointer of the object at rdi/r15, but it doesn't, which probably means that the memory pointed to by rdi/r15 was changed by another thread after it was loaded into rax.

There is always another thread doing a gc:

Thread 8 (process 21552):
#0  0x00007fff84c37a3a in __semwait_signal ()
#1  0x00007fff8530ddc0 in nanosleep ()
#2  0x000000010855dc83 in monoeg_g_usleep (microseconds=12380) at gdate-unix.c:53
#3  0x00000001084f8c0f in restart_threads_until_none_in_managed_allocator () at sgen-stw.c:161
#4  0x00000001084f7f0c in sgen_stop_world (generation=0) at sgen-stw.c:242
Comment 18 Zoltan Varga 2015-04-24 18:29:22 UTC
So running MONO_GC_DEBUG=verify-before-collections mono OwinSimple/bin/Debug/OwinSimple.exe

on mono master on amd64 leads to this:

Listening on 
	http://*:9001
CTRL+C to quit.
Oldspace->newspace reference 0x1030d34f8 at offset 264 in object 0x109730e00 (Microsoft.Owin.Host.HttpListener.RequestProcessing.CallEnvironment) not found in remsets.
Oldspace->newspace reference 0x1030d34f8 at offset 56 in object 0x1077c61d0 (.<ProcessRequestsAsync>d__0) not found in remsets.
Oldspace->newspace reference 0x1030d34f8 at offset 56 in object 0x1077c6230 (.<ProcessRequestsAsync>d__0) not found in remsets.
Oldspace->newspace reference 0x1030d34f8 at offset 56 in object 0x1077c6290 (.<ProcessRequestsAsync>d__0) not found in remsets.

When generating load using: ab -n 100000 -c 100 http://localhost:9001/foo
Comment 19 Zoltan Varga 2015-04-24 18:40:46 UTC
The field in the first message is _OwinHttpListener. Offset 264 is pretty big, so this might be be an object scanning problem.
Comment 20 Mark Probst 2015-04-28 18:03:32 UTC
Fixed on master at 23372959cfe004fb1d47d25307871ece82c3f396.