Bug 30148 - Threadpool is segfaulting
Summary: Threadpool is segfaulting
Alias: None
Product: Runtime
Classification: Mono
Component: General
Version: 4.0.0
Hardware: PC Linux
Importance: Normal normal
Target Milestone: Future Cycle (TBD)
Assignee: Bugzilla
Duplicates: 32719
Depends on:
Reported: 2015-05-15 16:35 UTC by xamarin
Modified: 2017-10-11 17:25 UTC

Is this bug a regression?: ---
Last known good build:

stacktrace (12.41 KB, text/plain)
2015-05-15 16:35 UTC, xamarin
--debug (5.86 KB, application/octet-stream)
2015-05-19 08:48 UTC, xamarin

Description xamarin 2015-05-15 16:35:26 UTC
Created attachment 11221

This happens on 4.0.1, 4.0.0, and 3.12.0. It seems to happen whenever there is a certain number of threads in the thread pool. I'm on Debian Jessie.

Stacktraces on different mono versions:
Comment 1 Ludovic Henry 2015-05-18 14:05:32 UTC

Do you have any sample to reproduce these crashes?

Also, I can't seem to figure out the thread that is causing the SIGSEGV as nearly all of them call the signal handler, or GDB cannot unwind the stack and does not resolve the symbols for the mono runtime executable.

Thank you,
Comment 2 xamarin 2015-05-18 14:07:28 UTC
Sadly, I do not have a very simple test case, as it happens in one of our backend applications, which isn't trivial to set up: https://github.com/SteamDatabase/SteamDatabaseBackend

I'll see if I can come up with something simple.
Comment 3 Ludovic Henry 2015-05-19 08:43:54 UTC
If you can just get a stacktrace with mono symbols, that would already be pretty awesome. Thank you!
Comment 4 xamarin 2015-05-19 08:48:03 UTC
Created attachment 11242
Comment 5 xamarin 2015-05-19 08:48:25 UTC
I ran it with mono --debug, and mono-dbg is installed; I'm not sure if that's enough.
Comment 6 xamarin 2015-05-25 08:19:30 UTC
Sometimes instead of a segfault I get this exception:
16:41:18 [ERROR] Unhandled Exception: Object reference not set to an instance of an object
  at System.Threading.EventWaitHandle.Reset () [0x00000] in <filename
  at (wrapper remoting-invoke-with-check) System.Threading.EventWaitHandle:Reset ()
  at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename
  at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0

This again happens in Timer+Scheduler, as seen in the segfault stacktrace.

That makes me think the issue is somewhere around here: https://github.com/mono/mono/blob/master/mcs/class/corlib/System.Threading/Timer.cs#L338
Comment 7 xamarin 2015-09-02 14:00:51 UTC
Ludovic, any news on this?
Comment 8 Ludovic Henry 2015-09-03 04:54:25 UTC
No news on that, as I could never reproduce it. It looks like a memory corruption bug, as the NullReferenceException suggests. Can you still trigger it nowadays?
Comment 9 xamarin 2015-09-03 04:56:22 UTC
Yep I can, even on mono 4.2.0.
Comment 10 xamarin 2015-09-03 05:49:56 UTC
I tried different versions, and now a different GC, and it turns out it does not crash when using Boehm.

Bugs #23714, #30010 and #23401 seem directly related to the issue.
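
For anyone wanting to repeat the collector comparison, here is a minimal sketch. It assumes the runtime's `--gc=sgen` and `--gc=boehm` switches are available in the installed build (Boehm support depends on how mono was compiled). `MONO_BIN` and `app.exe` are hypothetical placeholders, not names from this report. The script only prints the two command lines; it does not run them.

```shell
#!/bin/sh
# Sketch only: print the command line for running one assembly under each GC.
# MONO_BIN and APP are hypothetical placeholders, not taken from this report.
MONO_BIN="mono"
APP="app.exe"

# Build the invocation for a given collector name ("sgen" or "boehm").
gc_command() {
    printf '%s --gc=%s --debug %s\n' "$MONO_BIN" "$1" "$APP"
}

# Print both variants so each can be run by hand and the results compared.
for gc in sgen boehm; do
    gc_command "$gc"
done
```

Running each printed command under the same load and noting which one crashes should show whether the failure is specific to one collector.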
Comment 11 Ludovic Henry 2015-09-03 06:47:22 UTC
Reassigning to Mark, as it looks like an SGen issue.
Comment 12 TheGreatCO 2015-10-16 10:16:31 UTC
*** Bug 32719 has been marked as a duplicate of this bug. ***
Comment 13 Christian Hüning 2016-03-11 14:17:21 UTC
Are there any updates on this bug? I am running into the same issues with SGen.
--gc=boehm seems to work just fine, but as I understand the documentation and the current state of mono, SGen is meant to be the default and faster GC.

I am running Mono with a very high-load application that uses a lot of threads and about 6 million objects (around 47 to 50 GB of RAM consumption), so the GC is quite busy.
Comment 14 Christian Hüning 2016-03-11 14:39:46 UTC
Actually, in my case it also crashes with mono-boehm:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_object_new_fast (intptr) <0xffffffff>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.GrowTable (System.Collections.Concurrent.ConcurrentDictionary`2/Tables<System.Guid, SpatialAPI.Entities.ISpatialEntity>,System.Collections.Generic.IEqualityComparer`1<System.Guid>,bool,int) <0x005c7>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.TryAddInternal (System.Guid,SpatialAPI.Entities.ISpatialEntity,bool,bool,SpatialAPI.Entities.ISpatialEntity&) <0x00417>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.TryAdd (System.Guid,SpatialAPI.Entities.ISpatialEntity) <0x0006f>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.System.Collections.Generic.IDictionary<TKey,TValue>.Add (System.Guid,SpatialAPI.Entities.ISpatialEntity) <0x00027>
  at EnvironmentServiceComponent.Implementation.NoCollisionESC.Add (SpatialAPI.Entities.ISpatialEntity,SpatialAPI.Entities.Transformation.Vector3,SpatialAPI.Entities.Transformation.Direction) <0x00117>
  at DalskiAgent.Agents.SpatialAgent..ctor (LifeAPI.Layer.ILayer,LifeAPI.Layer.RegisterAgent,LifeAPI.Layer.UnregisterAgent,SpatialAPI.Environment.IEnvironment,System.Guid,SpatialAPI.Shape.IShape,SpatialAPI.Entities.Transformation.Vector3,SpatialAPI.Entities.Transformation.Vector3,System.Enum) <0x00369>
  at MarulaLayer.Agents.MarulaTree..ctor (MarulaLayer.Layers.IKNPMarulaLayer,LifeAPI.Layer.RegisterAgent,LifeAPI.Layer.UnregisterAgent,SpatialAPI.Environment.IEnvironment,MarulaLayer.Layers.ITemperatureTimeSeriesLayer,System.Guid,double,double,double,int,double,int) <0x00307>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void__this___object_object_object_object_object_Guid_double_double_double_int_double_int (object,intptr,intptr,intptr) <0xffffffff>
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Reflection.MonoCMethod.InternalInvoke (System.Reflection.MonoCMethod,object,object[],System.Exception&) <0xffffffff>
  at System.Reflection.MonoCMethod.InternalInvoke (object,object[]) <0x0003f>
  at System.Reflection.MonoCMethod.DoInvoke (object,System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) <0x000d3>
  at System.Reflection.MonoCMethod.Invoke (System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) <0x00037>
  at System.Reflection.ConstructorInfo.Invoke (object[]) <0x00056>
  at AgentManagerService.Implementation.AgentManager`1/<GetAgentsByAgentInitConfig>c__AnonStorey0<T_REF>.<>m__1 (int) <0x00c07>
  at System.Threading.Tasks.Parallel/<ForWorker>c__AnonStorey3`1<T_REF>.<>m__1 () <0x0038a>
  at System.Threading.Tasks.Task.InnerInvoke () <0x0004a>
  at System.Threading.Tasks.Task.InnerInvokeWithArg (System.Threading.Tasks.Task) <0x00013>
  at System.Threading.Tasks.Task/<ExecuteSelfReplicating>c__AnonStorey0.<>m__0 (object) <0x001a7>
  at System.Threading.Tasks.Task.InnerInvoke () <0x00076>
  at System.Threading.Tasks.Task.Execute () <0x0005b>
  at System.Threading.Tasks.Task.ExecutionContextCallback (object) <0x0004f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001b1>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x00023>
  at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task&) <0x00117>
  at System.Threading.Tasks.Task.ExecuteEntry (bool) <0x000bf>
  at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () <0x0000f>
  at System.Threading.ThreadPoolWorkQueue.Dispatch () <0x001f0>
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () <0x0000b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_bool (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

	mono-boehm() [0x4a4afc]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7fc6b4f8f0a0]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fc6b4c25125]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7fc6b4c283a0]
	mono-boehm() [0x60b489]
	mono-boehm(GC_add_to_heap+0xca) [0x60340a]
	mono-boehm(GC_expand_hp_inner+0xd4) [0x603614]
	mono-boehm(GC_collect_or_expand+0x10a) [0x60387a]
	mono-boehm(GC_allocobj+0xd9) [0x603a09]
	mono-boehm(GC_generic_malloc_inner+0xb7) [0x606c37]
	mono-boehm(GC_generic_malloc_many+0x3d7) [0x607a47]
	mono-boehm(GC_local_gcj_malloc+0xec) [0x60fccc]
	mono-boehm() [0x5cfc94]

Debug info from gdb:

Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
Comment 15 Alex Rønne Petersen 2016-08-29 06:54:04 UTC

Without a repro, it is next to impossible to debug a GC issue like this. If anyone could provide a repro, we'd be happy to look into it.

@Christian: FWIW, that Boehm crash is unrelated. It's just a limitation in Boehm.
Comment 16 TheGreatCO 2016-08-29 12:19:30 UTC
I'll try and whip up a repro this week.
Comment 17 Rodrigo Kumpera 2017-10-11 17:25:48 UTC
We have not received the requested information. If you are still experiencing this issue, please provide all the requested information and reopen the bug report.

Thank you!