Bug 30148 - Threadpool is segfaulting
Summary: Threadpool is segfaulting
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: General
Version: 4.0.0
Hardware: PC Linux
Importance: Normal normal
Target Milestone: Future Cycle (TBD)
Assignee: Bugzilla
URL:
Duplicates: 32719
Depends on:
Blocks:
 
Reported: 2015-05-15 16:35 UTC by xamarin
Modified: 2017-10-11 17:25 UTC
CC List: 8 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
stacktrace (12.41 KB, text/plain)
2015-05-15 16:35 UTC, xamarin
--debug (5.86 KB, application/octet-stream)
2015-05-19 08:48 UTC, xamarin

Description xamarin 2015-05-15 16:35:26 UTC
Created attachment 11221
stacktrace

This happens on 4.0.1, 4.0.0, and 3.12.0. It seems to happen whenever there is a certain number of threads in the threadpool. I'm on Debian Jessie.

Stacktraces on different mono versions:
https://gist.github.com/xPaw/0b818cd3693084e72f21
https://gist.github.com/xPaw/64d2e1e4ec3d81368d02
https://gist.github.com/xPaw/d222d66cbd87abf14aa4
https://gist.github.com/xPaw/b62bec9f576d0008fdb5
Comment 1 Ludovic Henry 2015-05-18 14:05:32 UTC
Hi,

Do you have any sample to reproduce these crashes?

Also, I can't seem to figure out the thread that is causing the SIGSEGV as nearly all of them call the signal handler, or GDB cannot unwind the stack and does not resolve the symbols for the mono runtime executable.

Thank you,
Ludovic
Comment 2 xamarin 2015-05-18 14:07:28 UTC
Sadly, I do not have a very simple test case, as it happens in one of our backend applications, which isn't trivial to set up: https://github.com/SteamDatabase/SteamDatabaseBackend

I'll see if I can come up with something simple.
Comment 3 Ludovic Henry 2015-05-19 08:43:54 UTC
If you can just get a stacktrace with mono symbols, that would already be pretty awesome. Thank you!
Comment 4 xamarin 2015-05-19 08:48:03 UTC
Created attachment 11242
--debug
Comment 5 xamarin 2015-05-19 08:48:25 UTC
I ran it with mono --debug, and mono-dbg is installed; I'm not sure if that's enough.
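
For readers trying to capture the kind of native stack trace requested above, a minimal sketch follows. It only builds and prints a gdb command line; nothing in it comes from the reporter's setup. The application name is a placeholder, and the `handle` line is the commonly recommended way to stop gdb from halting on signals Mono uses internally. Note that gdb also stops on SIGSEGV, which Mono raises and handles itself for ordinary null-reference checks, so the first stop is not necessarily the crash being chased.

```shell
#!/bin/sh
# Sketch: build a gdb command line that runs a Mono app under gdb and dumps
# backtraces for all threads once the process stops.
# APP is a placeholder (assumption), not the reporter's actual binary.
APP="${APP:-YourApp.exe}"

# Signals Mono uses internally; tell gdb not to stop on or report them.
HANDLE='handle SIGXCPU SIG33 SIG35 SIGPWR nostop noprint'

gdb_cmd() {
    # $1: path to the managed executable to run under `mono --debug`
    printf "gdb -batch -ex '%s' -ex run -ex 'thread apply all bt' --args mono --debug %s\n" \
        "$HANDLE" "$1"
}

# Print the command for review instead of running it directly.
gdb_cmd "$APP"
```

The printed command can be reviewed and then run by hand. Whether the resulting trace resolves runtime symbols depends on the debug symbol package for the installed Mono build being present.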
Comment 6 xamarin 2015-05-25 08:19:30 UTC
Sometimes instead of a segfault I get this exception:
16:41:18 [ERROR] Unhandled Exception: Object reference not set to an instance of an object
  at System.Threading.EventWaitHandle.Reset () [0x00000] in <filename unknown>:0
  at (wrapper remoting-invoke-with-check) System.Threading.EventWaitHandle:Reset ()
  at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
  at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0

This again happens in Timer+Scheduler, as seen in the segfault stacktrace.

That makes me think the issue is somewhere around here: https://github.com/mono/mono/blob/master/mcs/class/corlib/System.Threading/Timer.cs#L338
Comment 7 xamarin 2015-09-02 14:00:51 UTC
Ludovic, any news on this?
Comment 8 Ludovic Henry 2015-09-03 04:54:25 UTC
No news on that, as I could never reproduce it. It looks like a memory corruption bug, as the NullReferenceException suggests. Can you still trigger it nowadays?
Comment 9 xamarin 2015-09-03 04:56:22 UTC
Yep, I can, even on mono 4.2.0.
Comment 10 xamarin 2015-09-03 05:49:56 UTC
So I tried different versions, and now a different GC, and it turns out it does not crash when using Boehm.

Bugs #23714, #30010 and #23401 seem directly related to the issue.
Comment 11 Ludovic Henry 2015-09-03 06:47:22 UTC
Reassigning to Mark, as it looks like an SGen issue.
Comment 12 TheGreatCO 2015-10-16 10:16:31 UTC
*** Bug 32719 has been marked as a duplicate of this bug. ***
Comment 13 Christian Hüning 2016-03-11 14:17:21 UTC
Are there any updates on this bug? I am running into the same issues with SGen.
--gc=boehm seems to work just fine, but as I understand the documentation and the current state of mono, SGen is meant to be the default and faster GC.

I am running Mono 4.2.2.30 and have a very high-load application utilizing a lot of threads and about 6 million objects (around 47 to 50 GB of RAM consumption). So the GC is quite busy with it.
Comment 14 Christian Hüning 2016-03-11 14:39:46 UTC
Actually, in my case it also crashes with mono-boehm:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_object_new_fast (intptr) <0xffffffff>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.GrowTable (System.Collections.Concurrent.ConcurrentDictionary`2/Tables<System.Guid, SpatialAPI.Entities.ISpatialEntity>,System.Collections.Generic.IEqualityComparer`1<System.Guid>,bool,int) <0x005c7>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.TryAddInternal (System.Guid,SpatialAPI.Entities.ISpatialEntity,bool,bool,SpatialAPI.Entities.ISpatialEntity&) <0x00417>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.TryAdd (System.Guid,SpatialAPI.Entities.ISpatialEntity) <0x0006f>
  at System.Collections.Concurrent.ConcurrentDictionary`2<System.Guid, SpatialAPI.Entities.ISpatialEntity>.System.Collections.Generic.IDictionary<TKey,TValue>.Add (System.Guid,SpatialAPI.Entities.ISpatialEntity) <0x00027>
  at EnvironmentServiceComponent.Implementation.NoCollisionESC.Add (SpatialAPI.Entities.ISpatialEntity,SpatialAPI.Entities.Transformation.Vector3,SpatialAPI.Entities.Transformation.Direction) <0x00117>
  at DalskiAgent.Agents.SpatialAgent..ctor (LifeAPI.Layer.ILayer,LifeAPI.Layer.RegisterAgent,LifeAPI.Layer.UnregisterAgent,SpatialAPI.Environment.IEnvironment,System.Guid,SpatialAPI.Shape.IShape,SpatialAPI.Entities.Transformation.Vector3,SpatialAPI.Entities.Transformation.Vector3,System.Enum) <0x00369>
  at MarulaLayer.Agents.MarulaTree..ctor (MarulaLayer.Layers.IKNPMarulaLayer,LifeAPI.Layer.RegisterAgent,LifeAPI.Layer.UnregisterAgent,SpatialAPI.Environment.IEnvironment,MarulaLayer.Layers.ITemperatureTimeSeriesLayer,System.Guid,double,double,double,int,double,int) <0x00307>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void__this___object_object_object_object_object_Guid_double_double_double_int_double_int (object,intptr,intptr,intptr) <0xffffffff>
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Reflection.MonoCMethod.InternalInvoke (System.Reflection.MonoCMethod,object,object[],System.Exception&) <0xffffffff>
  at System.Reflection.MonoCMethod.InternalInvoke (object,object[]) <0x0003f>
  at System.Reflection.MonoCMethod.DoInvoke (object,System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) <0x000d3>
  at System.Reflection.MonoCMethod.Invoke (System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) <0x00037>
  at System.Reflection.ConstructorInfo.Invoke (object[]) <0x00056>
  at AgentManagerService.Implementation.AgentManager`1/<GetAgentsByAgentInitConfig>c__AnonStorey0<T_REF>.<>m__1 (int) <0x00c07>
  at System.Threading.Tasks.Parallel/<ForWorker>c__AnonStorey3`1<T_REF>.<>m__1 () <0x0038a>
  at System.Threading.Tasks.Task.InnerInvoke () <0x0004a>
  at System.Threading.Tasks.Task.InnerInvokeWithArg (System.Threading.Tasks.Task) <0x00013>
  at System.Threading.Tasks.Task/<ExecuteSelfReplicating>c__AnonStorey0.<>m__0 (object) <0x001a7>
  at System.Threading.Tasks.Task.InnerInvoke () <0x00076>
  at System.Threading.Tasks.Task.Execute () <0x0005b>
  at System.Threading.Tasks.Task.ExecutionContextCallback (object) <0x0004f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001b1>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x00023>
  at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task&) <0x00117>
  at System.Threading.Tasks.Task.ExecuteEntry (bool) <0x000bf>
  at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () <0x0000f>
  at System.Threading.ThreadPoolWorkQueue.Dispatch () <0x001f0>
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () <0x0000b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_bool (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

	mono-boehm() [0x4a4afc]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7fc6b4f8f0a0]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fc6b4c25125]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7fc6b4c283a0]
	mono-boehm() [0x60b489]
	mono-boehm(GC_add_to_heap+0xca) [0x60340a]
	mono-boehm(GC_expand_hp_inner+0xd4) [0x603614]
	mono-boehm(GC_collect_or_expand+0x10a) [0x60387a]
	mono-boehm(GC_allocobj+0xd9) [0x603a09]
	mono-boehm(GC_generic_malloc_inner+0xb7) [0x606c37]
	mono-boehm(GC_generic_malloc_many+0x3d7) [0x607a47]
	mono-boehm(GC_local_gcj_malloc+0xec) [0x60fccc]
	mono-boehm() [0x5cfc94]
	[0x40a81fc2]

Debug info from gdb:


=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================
Comment 15 Alex Rønne Petersen 2016-08-29 06:54:04 UTC
Hi,

Without a repro, it is next to impossible to debug a GC issue like this. If anyone could provide a repro, we'd be happy to look into it.

@Christian: FWIW, that Boehm crash is unrelated. It's just a limitation in Boehm.
Comment 16 TheGreatCO 2016-08-29 12:19:30 UTC
I'll try and whip up a repro this week.
Comment 17 Rodrigo Kumpera 2017-10-11 17:25:48 UTC
We have not received the requested information. If you are still experiencing this issue please provide all the requested information and reopen the bug report.

Thank you!
