This is Xamarin's bug tracking system. For product support, please use the support links listed in your Xamarin Account.
Bug 43099 - [watchOS] Cannot enter GC safe region if the thread is not attached
Summary: [watchOS] Cannot enter GC safe region if the thread is not attached
Status: VERIFIED FIXED
Alias: None
Product: iOS
Classification: Xamarin
Component: Mono runtime / AOT compiler (show other bugs)
Version: master
Hardware: PC Mac OS
: --- normal
Target Milestone: 10.0.0 (C8)
Assignee: Aleksey Kliger
URL:
Depends on:
Blocks:
 
Reported: 2016-08-05 16:50 UTC by Rolf Bjarne Kvinge
Modified: 2016-08-22 10:08 UTC (History)
5 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments

Description Rolf Bjarne Kvinge 2016-08-05 16:50:22 UTC
This is an assertion we're randomly hitting in our tests, so there is no repro (for now).

> Thread 9 Crashed:: tid_6e23
> 0   libsystem_c.dylib             	0x05890724 __abort + 230
> 1   libsystem_c.dylib             	0x0589063e abort + 173
> 2   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x003875f8 log_callback(char const*, char const*, char const*, int, void*) + 88 (runtime.m:1112)
> 3   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035922a log_adapter + 154 (mono-logger.c:259)
> 4   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0036ba67 monoeg_g_log + 103 (goutput.c:114)
> 5   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c9c0 check_info + 96 (mono-threads-coop.c:93)
> 6   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c6fd mono_threads_enter_gc_safe_region_unbalanced_with_info + 45 (mono-threads-coop.c:226)
> 7   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c63a mono_threads_enter_gc_safe_region_with_info + 26 (mono-threads-coop.c:200)
> 8   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x0035c60b mono_threads_enter_gc_safe_region + 27 (mono-threads-coop.c:189)
> 9   com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x00311a5f sgen_deregister_root + 47 (mono-coop-mutex.h:53)
> 10  com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x002d2d0c sgen_thread_detach + 44 (sgen-mono.c:2324)
> 11  com.xamarin.monotouch-test.watchkitapp.watchkitextension	0x00360492 unregister_thread + 114 (mono-threads.c:430)
> 12  libsystem_pthread.dylib       	0x05b0129d _pthread_tsd_cleanup + 537
> 13  libsystem_pthread.dylib       	0x05b00e22 _pthread_exit + 108
> 14  libsystem_pthread.dylib       	0x05aff354 _pthread_wqthread + 1298
> 15  libsystem_pthread.dylib       	0x05afcf56 start_wqthread + 34

Complete crash report: https://gist.github.com/rolfbjarne/a109afe32bc0157ab34724669ae1efda

This is from jenkins: https://jenkins.mono-project.com/job/xamarin-macios-pr-builder/884/ (monotouch-test/watchOS), on top of https://github.com/xamarin/xamarin-macios/commit/c3748631732fc716584132ce24a6d4be7897db98.
Comment 1 Rodrigo Kumpera 2016-08-05 19:17:54 UTC
Hey Aleksey,

This looks like a coop bug.
Comment 2 Aleksey Kliger 2016-08-08 19:44:09 UTC
So this is pretty interesting. unregister_thread can be called from two different places:
  (1) explicitly when we call mono_thread_info_detach
  (2) implicitly when the thread_info tls key is destroyed if it's non-NULL (posix)
In (1) we first call unregister_thread and then set the tls key to NULL
In (2) the way posix works is it first sets the key to NULL and then passes the old value to unregister_thread.

So anything function called by unregister_thread shouldn't use mono_thread_info_current and mono_thread_info_current_unchecked.

(The immediate cause of the crash is  MONO_GC_ENTER_SAFE  which ought to be a MONO_GC_ENTER_SAFE_WITH_INFO(info); it's a bit hard to hit because the gc_mutex has to be contested and you have to be one of these native threads that get indirectly attached because they call a delegate)

I'm going to stick some garbage value in the thread_info tls key in mono_thread_info_detach and see how much crashy code I get in unregister_thread.


----

A separate question is whether the GC mutex needs to be a MonoCoopMutex - we crash because we're trying to change the coop state of a thread that's being detached; in fact we want to unregister it as a root.

Maybe sgen_gc_lock/unlock calls should be wrapped in MONO_GC_ENTER/EXIT_SAFE calls, not their internals.  (And then the detach code could take the mutex without changing the thred state)
Comment 3 Aleksey Kliger 2016-08-11 16:27:22 UTC
Fixed on mono master a232eb3b84104dfc5cb04481285fa9458db67761.
Fixed on mono-4.6.0-branch e571108acf8c56bb28a4c6169abfdc10abb1fc51.

Interestingly, without the fix my test crashes reliably on the 4.6 branch, but not on master; I wonder if we hold the locks longer and are more likely to contest.
Comment 4 Sebastien Pouliot 2016-08-16 14:05:30 UTC
PR https://github.com/xamarin/xamarin-macios/pull/609 was merged. Closing for QA verification
Comment 6 Rolf Bjarne Kvinge 2016-08-22 10:08:26 UTC
There's no known repro, since the issue is random, so I'll just mark as verified and then I'll reopen if I see the issue again.

Note You need to log in before you can comment on or make changes to this bug.