Bug 59371 - Cannot build Mono on Linux/Fedora: SIGSEGV
Summary: Cannot build Mono on Linux/Fedora: SIGSEGV
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Depends on:
Reported: 2017-09-09 10:13 UTC by Armin
Modified: 2017-09-21 15:43 UTC

Is this bug a regression?: ---
Last known good build:

stack trace (14.38 KB, text/plain)
2017-09-09 10:13 UTC, Armin

Description Armin 2017-09-09 10:13:07 UTC
Created attachment 24668
stack trace

The line at https://github.com/mono/mono/blob/d96de323da8d7d9561ecbb5c2ebcdeb23ee1ee2e/mono/metadata/threads.c#L3040 (introduced in d96de323da8d7d9561ecbb5c2ebcdeb23ee1ee2e) is somehow connected to a `SIGSEGV` that occurs on my Linux/Fedora machine when building Mono (stack trace attached). It happens every time (100% reproducible). Removing that line lets Mono compile as expected. d96de323da8d7d9561ecbb5c2ebcdeb23ee1ee2e is labeled as a revert of 7db0fb0c886f5157066e26c2e2ae2d39c338cf6b; however, this line was not part of that earlier commit. I suggest removing #L3040 for now.
Comment 1 Armin 2017-09-09 11:22:27 UTC
Dug around a bit: the SIGSEGV is definitely connected to https://github.com/luhenry/mono/blob/6928fda088c549648b0483e4e88c1d3dcdfdda30/mono/metadata/gc.c#L998. On my machine `gc_thread->tid` is always `0` at that point (I haven't looked into why yet), which ultimately results in a call to `pthread_join(0, ...)`. Checking for `thread != 0` in `mono_threads_add_joinable_thread ()` or `mono_threads_join_threads ()` seems to be a better option than removing the line.
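The `thread != 0` check suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: `add_joinable_thread_checked`, `joinable_tids`, and `MAX_JOINABLE` are hypothetical stand-ins and not Mono's actual functions or data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's table of threads waiting to be
 * joined. The real Mono implementation is different. */
#define MAX_JOINABLE 16
static uintptr_t joinable_tids[MAX_JOINABLE];
static size_t joinable_count = 0;

/* Records a thread id for a later join. Returns false when the id is
 * rejected, so a zeroed tid never reaches pthread_join (). */
static bool
add_joinable_thread_checked (uintptr_t tid)
{
    assert (joinable_count <= MAX_JOINABLE);

    if (tid == 0)
        return false; /* tid was already reset; there is nothing to join */
    if (joinable_count == MAX_JOINABLE)
        return false; /* table is full in this simplified sketch */

    joinable_tids[joinable_count++] = tid;
    return true;
}
```

With a guard like this, a call such as `add_joinable_thread_checked (0)` would be dropped instead of being queued for a later `pthread_join ()`.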
Comment 2 Armin 2017-09-09 13:08:00 UTC
Just found out why `gc_thread->tid` is `0` in #L998: `finalizer_thread ()` is started by `mono_thread_create_internal ()`, and via `create_thread ()` and `start_wrapper ()` it ends up in `start_wrapper_internal ()`, which sets `internal->tid = 0;` at https://github.com/luhenry/mono/blob/6928fda088c549648b0483e4e88c1d3dcdfdda30/mono/metadata/threads.c#L1026

I suppose that something like `pthread_join(0, ...)` is called on every OS, but it does not seem to blow up anywhere else the way it does on Fedora (26)?

Anyway: the `mono_threads_add_joinable_thread (GUINT_TO_POINTER (gc_thread->tid));` lines seem obsolete (and potentially dangerous) to me.
Comment 3 Armin 2017-09-09 16:22:18 UTC
One small addition: `pthread_join ()` seems to segfault with any invalid thread ID, not just with 0 (NULL).
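If any invalid ID can crash, a zero check alone may not be enough. One possible way around that, sketched below, is to keep an explicit flag next to the handle and only join while the flag is set. The `tracked_thread` type and the `tracked_start` / `tracked_join` helpers are hypothetical names used for illustration; this is not what Mono does and not necessarily how the issue was fixed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: pairs a pthread_t with a flag that says whether
 * the handle may still be passed to pthread_join (). */
typedef struct {
    pthread_t handle;
    bool joinable;
} tracked_thread;

/* Example thread body: increments the int that arg points to. */
static void *
worker (void *arg)
{
    int *counter = arg;
    *counter += 1;
    return NULL;
}

/* Starts a thread and marks it joinable only if creation succeeded. */
static int
tracked_start (tracked_thread *t, void *(*fn) (void *), void *arg)
{
    assert (t != NULL);
    t->joinable = false;
    int rc = pthread_create (&t->handle, NULL, fn, arg);
    if (rc == 0)
        t->joinable = true;
    return rc;
}

/* Joins the thread at most once. Returns -1 without calling
 * pthread_join () when there is no valid handle to join. */
static int
tracked_join (tracked_thread *t)
{
    assert (t != NULL);
    if (!t->joinable)
        return -1;
    int rc = pthread_join (t->handle, NULL);
    t->joinable = false;
    return rc;
}
```

The flag is the only thing consulted before joining, so the value stored in `handle` never has to be interpreted as valid or invalid.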
Comment 4 Bernhard Urban 2017-09-21 15:43:31 UTC
Fixed by https://github.com/mono/mono/pull/5615