Bug 19367 - mono sgen hang/deadlock on mono_sem_wait/sgen_ensure_free_space during nursery full collection and when large nursery size used it hangs on mono_sem_wait/mono_domain_finalize with fsharp deep recursions
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: 3.2.x
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Depends on:
Reported: 2014-04-29 08:08 UTC by KaptOc6obnuCac0Bluc+bugzilla.xamarin.com
Modified: 2014-05-22 10:46 UTC (History)

Is this bug a regression?: ---
Last known good build:

mono-sgen-hang-nursery-full-default-nursery-size.txt (41.29 KB, text/plain)
2014-04-29 08:08 UTC, KaptOc6obnuCac0Bluc+bugzilla.xamarin.com
mono-sgen-hang-domain-finalize-1024m-nursery-size.txt (27.24 KB, text/plain)
2014-04-29 08:10 UTC, KaptOc6obnuCac0Bluc+bugzilla.xamarin.com

Description KaptOc6obnuCac0Bluc+bugzilla.xamarin.com 2014-04-29 08:08:54 UTC
Created attachment 6674 [details]

Environment: Ubuntu 12.04 x86_64
Mono version: 3.4.0 
I rebuilt the package from tpokorra's deb source package without optimization, with debug symbols, and without LLVM support (http://software.opensuse.org/download/package?project=home:tpokorra:mono&package=mono-opt).

Mono JIT compiler version 3.4.0 (tarball Tue Apr 29 11:32:44 CEST 2014)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen

It seems that a deeply recursive F# function hangs during memory allocation. The software is a compiler that uses the LLVM 3.3 shared library (through the llvm-fs binding) and is compiled to target the .NET 4.0 API. Please see the attached mono-sgen-hang-nursery-full-default-nursery-size.txt.

When I set MONO_GC_PARAMS="nursery-size=1024m", the software runs to completion but then hangs in the final GC collection at shutdown: mini_cleanup->mono_domain_finalize->mono_gc_collect. Please see mono-sgen-hang-domain-finalize-1024m-nursery-size.txt.

I also tested it with the patch created for bug #15695 ( https://github.com/mono/mono/commit/71ad74dc11c5fa4bc8c178a5457d2cab732fdb01 ), without success.

I managed to test it on 64-bit Windows Vista, but only as a 32-bit process (!), because the 64-bit LLVM 3.3 DLL is missing. The same binary in 32-bit mode works fine with both Microsoft .NET and Mono 3.2.3 (the default Windows download from mono-project.com). I'll update the ticket once I have also tested it on 32-bit Linux.
Comment 1 KaptOc6obnuCac0Bluc+bugzilla.xamarin.com 2014-04-29 08:10:21 UTC
Created attachment 6675 [details]
Comment 2 Rodrigo Kumpera 2014-05-20 23:32:22 UTC
One very common problem for people embedding LLVM is that it hijacks signals and doesn't chain them.

Could you check whether LLVM is doing that?

And could you attach a full dump of all threads? With just one backtrace it's not possible to see the deadlock.
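
To illustrate what "chaining" means here, the following is a minimal POSIX sketch, not Mono or LLVM code. All function and variable names are hypothetical, and SIGUSR1 in the usage note merely stands in for whichever signal a runtime relies on. A library that chains saves the previously installed action when it registers its own handler and forwards the signal to it afterwards; a library that does not chain silently drops the earlier handler.

```cpp
#include <signal.h>
#include <cassert>

// Hypothetical names throughout; this is an illustration, not Mono or LLVM code.
static struct sigaction g_previous;               // action installed before the library's
static volatile sig_atomic_t g_runtime_hits = 0;  // times the "runtime" handler ran
static volatile sig_atomic_t g_library_hits = 0;  // times the "library" handler ran

// Stands in for a handler that a managed runtime installed first.
void runtime_handler(int) { g_runtime_hits = g_runtime_hits + 1; }

// A well-behaved library handler: do our own work, then forward to the old one.
void library_handler(int sig, siginfo_t *info, void *ctx) {
    g_library_hits = g_library_hits + 1;
    if (g_previous.sa_flags & SA_SIGINFO) {
        if (g_previous.sa_sigaction)
            g_previous.sa_sigaction(sig, info, ctx);
    } else if (g_previous.sa_handler != SIG_DFL &&
               g_previous.sa_handler != SIG_IGN) {
        g_previous.sa_handler(sig);
    }
}

bool install_runtime_handler(int sig) {
    struct sigaction sa = {};
    sa.sa_handler = runtime_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, nullptr) == 0;
}

bool install_chained_handler(int sig) {
    struct sigaction sa = {};
    sa.sa_sigaction = library_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    // The third argument receives the old action so it can be forwarded to later.
    return sigaction(sig, &sa, &g_previous) == 0;
}
```

Calling install_runtime_handler(SIGUSR1), then install_chained_handler(SIGUSR1), then raise(SIGUSR1) runs both handlers. If the library passed nullptr as the third argument to sigaction and never forwarded, only its own handler would run.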
Comment 3 KaptOc6obnuCac0Bluc+bugzilla.xamarin.com 2014-05-22 04:45:12 UTC
The attached backtrace contains information about every thread. Please search for "info threads" around the middle of the file. I also printed the arguments and local variables of every stack frame.

You are right. LLVM does install signal handlers for its PrettyStackTrace feature. It looks like this interfered with the Mono GC.
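
One possible way to confirm this kind of interference from inside the process is to snapshot the installed signal actions before a library is initialized and compare them afterwards. The sketch below assumes a POSIX system; the helper names (HandlerSnapshot, snapshot_handlers, changed_signals) are hypothetical and are not part of Mono or LLVM, and the caller chooses which signals to watch.

```cpp
#include <signal.h>
#include <cassert>
#include <vector>

// Hypothetical helper types and functions; not part of Mono or LLVM.
struct HandlerSnapshot {
    int signo;
    struct sigaction action;
};

// Record the currently installed action for each signal of interest.
std::vector<HandlerSnapshot> snapshot_handlers(const std::vector<int> &signals) {
    std::vector<HandlerSnapshot> out;
    for (int signo : signals) {
        struct sigaction current = {};
        // A null second argument queries the action without changing it.
        if (sigaction(signo, nullptr, &current) == 0)
            out.push_back({signo, current});
    }
    return out;
}

// Two actions match if they use the same handler style and the same function.
bool same_handler(const struct sigaction &a, const struct sigaction &b) {
    bool a_info = (a.sa_flags & SA_SIGINFO) != 0;
    bool b_info = (b.sa_flags & SA_SIGINFO) != 0;
    if (a_info != b_info)
        return false;
    return a_info ? a.sa_sigaction == b.sa_sigaction
                  : a.sa_handler == b.sa_handler;
}

// Return the signals whose handler differs from the recorded snapshot.
std::vector<int> changed_signals(const std::vector<HandlerSnapshot> &before) {
    std::vector<int> changed;
    for (const HandlerSnapshot &old : before) {
        struct sigaction now = {};
        if (sigaction(old.signo, nullptr, &now) == 0 &&
            !same_handler(old.action, now))
            changed.push_back(old.signo);
    }
    return changed;
}
```

Taking a snapshot before the first call into the library and calling changed_signals right after it lists the signals whose handlers were replaced in between.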

Unfortunately the llvm-fs binding does not yet expose this functionality, so I created a small shared library whose function is called at program start. The Mono GC now works properly.

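For the record, the following function disables the default LLVM signal handlers (the function name did not survive in this comment, so the one below is a placeholder):
extern "C" void disable_llvm_pretty_stack_trace() {
	llvm::DisablePrettyStackTrace = true;	// declared in llvm/Support/PrettyStackTrace.h
}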

Thanks for your help!
Comment 4 Rodrigo Kumpera 2014-05-22 10:46:13 UTC
Yes, you're right, the full backtrace was inside.

I'm happy that you managed to fix it.