Bug 43099 - [watchOS] Cannot enter GC safe region if the thread is not attached
Summary: [watchOS] Cannot enter GC safe region if the thread is not attached
Alias: None
Product: iOS
Classification: Xamarin
Component: Mono runtime / AOT compiler ()
Version: master
Hardware: PC Mac OS
: --- normal
Target Milestone: 10.0.0 (C8)
Assignee: Aleksey Kliger
Depends on:
Reported: 2016-08-05 16:50 UTC by Rolf Bjarne Kvinge [MSFT]
Modified: 2016-08-22 10:08 UTC (History)
5 users (show)

Is this bug a regression?: ---
Last known good build:

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.

Please create a new report on Developer Community or GitHub with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:

Description Rolf Bjarne Kvinge [MSFT] 2016-08-05 16:50:22 UTC
This is an assertion we're randomly hitting in our tests, so there is no repro (for now).

> Thread 9 Crashed:: tid_6e23
> 0   libsystem_c.dylib             	0x05890724 __abort + 230
> 1   libsystem_c.dylib             	0x0589063e abort + 173
> 2   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x003875f8 log_callback(char const*, char const*, char const*, int, void*) + 88 (runtime.m:1112)
> 3   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035922a log_adapter + 154 (mono-logger.c:259)
> 4   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0036ba67 monoeg_g_log + 103 (goutput.c:114)
> 5   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c9c0 check_info + 96 (mono-threads-coop.c:93)
> 6   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c6fd mono_threads_enter_gc_safe_region_unbalanced_with_info + 45 (mono-threads-coop.c:226)
> 7   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c63a mono_threads_enter_gc_safe_region_with_info + 26 (mono-threads-coop.c:200)
> 8   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c60b mono_threads_enter_gc_safe_region + 27 (mono-threads-coop.c:189)
> 9   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x00311a5f sgen_deregister_root + 47 (mono-coop-mutex.h:53)
> 10  com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x002d2d0c sgen_thread_detach + 44 (sgen-mono.c:2324)
> 11  com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x00360492 unregister_thread + 114 (mono-threads.c:430)
> 12  libsystem_pthread.dylib       	0x05b0129d _pthread_tsd_cleanup + 537
> 13  libsystem_pthread.dylib       	0x05b00e22 _pthread_exit + 108
> 14  libsystem_pthread.dylib       	0x05aff354 _pthread_wqthread + 1298
> 15  libsystem_pthread.dylib       	0x05afcf56 start_wqthread + 34

Complete crash report: https://gist.github.com/rolfbjarne/a109afe32bc0157ab34724669ae1efda

This is from jenkins: https://jenkins.mono-project.com/job/xamarin-macios-pr-builder/884/ (monotouch-test/watchOS), on top of https://github.com/xamarin/xamarin-macios/commit/c3748631732fc716584132ce24a6d4be7897db98.
Comment 1 Rodrigo Kumpera 2016-08-05 19:17:54 UTC
Hey Aleksey,

This looks like a coop bug.
Comment 2 Aleksey Kliger 2016-08-08 19:44:09 UTC
So this is pretty interesting. unregister_thread can be called from two different places:
  (1) explicitly when we call mono_thread_info_detach
  (2) implicitly when the thread_info tls key is destroyed if it's non-NULL (posix)
In (1) we first call unregister_thread and then set the tls key to NULL
In (2) the way posix works is it first sets the key to NULL and then passes the old value to unregister_thread.

So anything function called by unregister_thread shouldn't use mono_thread_info_current and mono_thread_info_current_unchecked.

(The immediate cause of the crash is  MONO_GC_ENTER_SAFE  which ought to be a MONO_GC_ENTER_SAFE_WITH_INFO(info); it's a bit hard to hit because the gc_mutex has to be contested and you have to be one of these native threads that get indirectly attached because they call a delegate)

I'm going to stick some garbage value in the thread_info tls key in mono_thread_info_detach and see how much crashy code I get in unregister_thread.


A separate question is whether the GC mutex needs to be a MonoCoopMutex - we crash because we're trying to change the coop state of a thread that's being detached; in fact we want to unregister it as a root.

Maybe sgen_gc_lock/unlock calls should be wrapped in MONO_GC_ENTER/EXIT_SAFE calls, not their internals.  (And then the detach code could take the mutex without changing the thred state)
Comment 3 Aleksey Kliger 2016-08-11 16:27:22 UTC
Fixed on mono master a232eb3b84104dfc5cb04481285fa9458db67761.
Fixed on mono-4.6.0-branch e571108acf8c56bb28a4c6169abfdc10abb1fc51.

Interestingly, without the fix my test crashes reliably on the 4.6 branch, but not on master; I wonder if we hold the locks longer and are more likely to contest.
Comment 4 Sebastien Pouliot 2016-08-16 14:05:30 UTC
PR https://github.com/xamarin/xamarin-macios/pull/609 was merged. Closing for QA verification
Comment 6 Rolf Bjarne Kvinge [MSFT] 2016-08-22 10:08:26 UTC
There's no known repro, since the issue is random, so I'll just mark as verified and then I'll reopen if I see the issue again.