Bug 7013 - Make libmonosgen-2.0.so a "mixed-ABI" native library
Summary: Make libmonosgen-2.0.so a "mixed-ABI" native library
Status: ASSIGNED
Alias: None
Product: Android
Classification: Xamarin
Component: Mono runtime / AOT Compiler
Version: 4.2.x
Hardware: PC Mac OS
Importance: Lowest minor
Target Milestone: master
Assignee: Marek Habersack
URL:
Duplicates: 7167
Depends on:
Blocks:
 
Reported: 2012-09-08 20:13 UTC by Alex Corrado
Modified: 2016-09-21 19:45 UTC

See Also:
Tags: XATriaged
Is this bug a regression?: ---
Last known good build:


Attachments
Testcase project (27.73 KB, application/zip)
2012-09-10 15:01 UTC, Alex Corrado

Description Alex Corrado 2012-09-08 20:13:30 UTC
On a Samsung S3 (only), I am seeing some weird exceptions trying to post to the main thread from a dispatch thread. The code for the dispatch thread is at https://github.com/nirvanai/cirrus/blob/master/src/client/DispatchThread.cs

For the OnDispatch event handler, I've tried both a simple Android.OS.Handler.Post and Android.App.Activity.RunOnUiThread, both times supplying an Action delegate parameter.

Here are the exceptions I'm seeing:

* Assertion at /Users/builder/data/lanes/monodroid-mac-monodroid-4.2.5-branch/adcec4e0/source/mono/mono/metadata/monitor.c:528, condition `mon->nest == 1' not met
Stacktrace:

  at System.Threading.Monitor.TryEnter (object,int,bool&) <0x00053>
  at System.Threading.Monitor.Enter (object,bool&) <0x0001f>
  at Java.Lang.Object.GetObject (intptr,Android.Runtime.JniHandleOwnership,System.Type) <0x00083>
  at Java.Lang.Object._GetObject<T> (intptr,Android.Runtime.JniHandleOwnership) <0x0004f>
  at Java.Lang.Object.GetObject<T> (intptr,Android.Runtime.JniHandleOwnership) <0x0002b>
  at Java.Lang.IRunnableInvoker.n_Run (intptr,intptr) <0x00023>
  at (wrapper dynamic-method) object.18c9fc12-0a3f-4800-a5f2-1e6fee31555f (intptr,intptr) <0x0002b>
  at (wrapper native-to-managed) object.18c9fc12-0a3f-4800-a5f2-1e6fee31555f (intptr,intptr) <0xffffffff>
UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
at System.Threading.Monitor.TryEnter (object,int,bool&) <0x00053>
at System.Threading.Monitor.Enter (object,bool&) <0x0001f>
at Java.Lang.Object.GetObject (intptr,Android.Runtime.JniHandleOwnership,System.Type) <0x00083>
at Java.Lang.Object._GetObject<Java.Lang.IRunnable> (intptr,Android.Runtime.JniHandleOwnership) <0x0004f>
at Java.Lang.Object.GetObject<Java.Lang.IRunnable> (intptr,Android.Runtime.JniHandleOwnership) <0x0002b>
at Java.Lang.IRunnableInvoker.n_Run (intptr,intptr) <0x00023>
at (wrapper dynamic-method) object.18c9fc12-0a3f-4800-a5f2-1e6fee31555f (intptr,intptr) <0x0002b>


And:


Error destroying handle 0x448 mutex due to 16
_wapi_handle_unref_full: Attempting to unref unused handle 0x448
Stacktrace:

  at System.Threading.Monitor.TryEnter (object,int,bool&) <0x00053>
  at System.Threading.Monitor.Enter (object,bool&) <0x0001f>
  at Java.Lang.Object.GetObject (intptr,Android.Runtime.JniHandleOwnership,System.Type) <0x00083>
  at Java.Lang.Object._GetObject<T> (intptr,Android.Runtime.JniHandleOwnership) <0x0004f>
  at Java.Lang.Object.GetObject<T> (intptr,Android.Runtime.JniHandleOwnership) <0x0002b>
  at Java.Lang.IRunnableInvoker.n_Run (intptr,intptr) <0x00023>
  at (wrapper dynamic-method) object.fa97596d-5765-4327-8d1e-f4cfac0f4890 (intptr,intptr) <0x0002b>
  at (wrapper native-to-managed) object.fa97596d-5765-4327-8d1e-f4cfac0f4890 (intptr,intptr) <0xffffffff>
UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
at System.Threading.Monitor.TryEnter (object,int,bool&) <0x00053>
at System.Threading.Monitor.Enter (object,bool&) <0x0001f>
at Java.Lang.Object.GetObject (intptr,Android.Runtime.JniHandleOwnership,System.Type) <0x00083>
at Java.Lang.Object._GetObject<Java.Lang.IRunnable> (intptr,Android.Runtime.JniHandleOwnership) <0x0004f>
at Java.Lang.Object.GetObject<Java.Lang.IRunnable> (intptr,Android.Runtime.JniHandleOwnership) <0x0002b>
at Java.Lang.IRunnableInvoker.n_Run (intptr,intptr) <0x00023>
at (wrapper dynamic-method) object.fa97596d-5765-4327-8d1e-f4cfac0f4890 (intptr,intptr) <0x0002b>

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
  at System.Threading.Monitor.TryEnter (System.Object obj, Int32 millisecondsTimeout, System.Boolean& lockTaken) [0x00000] in <filename unknown>:0 
  at System.Threading.Monitor.Enter (System.Object obj, System.Boolean& lockTaken) [0x00000] in <filename unknown>:0 
  at Java.Lang.Object.GetObject (IntPtr handle, JniHandleOwnership transfer, System.Type type) [0x00000] in <filename unknown>:0 
  at Java.Lang.Object._GetObject[IRunnable] (IntPtr handle, JniHandleOwnership transfer) [0x00000] in <filename unknown>:0 
  at Java.Lang.IRunnableInvoker.n_Run (IntPtr jnienv, IntPtr native__this) [0x00000] in <filename unknown>:0 
  at (wrapper dynamic-method) object:fa97596d-5765-4327-8d1e-f4cfac0f4890 (intptr,intptr)
Comment 1 Alex Corrado 2012-09-09 04:17:42 UTC
Oh I forgot a stack trace that also looks related:

09-06 22:29:39.340 E/mono    (11130): Unhandled Exception: System.IndexOutOfRangeException: Array index is out of range.
09-06 22:29:39.340 E/mono    (11130):   at System.Collections.Generic.Dictionary`2[System.IntPtr,System.WeakReference].set_Item (IntPtr key, System.WeakReference value) [0x00000] in <filename unknown>:0 
09-06 22:29:39.340 E/mono    (11130):   at Java.Lang.Object.RegisterInstance (IJavaObject instance, IntPtr value, JniHandleOwnership transfer) [0x00000] in <filename unknown>:0 
09-06 22:29:39.340 E/mono    (11130):   at Java.Lang.Object.SetHandle (IntPtr value, JniHandleOwnership transfer) [0x00000] in <filename unknown>:0 
09-06 22:29:39.340 E/mono    (11130):   at Java.Lang.Object..ctor (IntPtr handle, JniHandleOwnership transfer) [0x00000] in <filename unknown>:0 
09-06 22:29:39.340 E/mono    (11130):   at Java.Lang.Thread+RunnableImplementor..ctor (System.Action handler, Boolean removable) [0x00000] in <filename unknown>:0 
09-06 22:29:39.340 E/mono    (11130):   at Android.OS.Handler.Post (System.Action action) [0x00000] in <filename unknown>:0
Comment 2 Jonathan Pryor 2012-09-09 15:17:14 UTC
Is this a Debug or a Release build? If a Release build, are you using the armeabi or armeabi-v7a runtimes?
Comment 3 Alex Corrado 2012-09-09 22:24:37 UTC
It was a release build targeting armeabi for API level 8.
Comment 4 Alex Corrado 2012-09-10 15:01:10 UTC
Created attachment 2495
Testcase project

Leave this running for a bit
Comment 5 Alex Corrado 2012-09-10 15:02:28 UTC
Witnessed this output running the above testcase on a Xoom (so I guess it isn't just the S3, unless this exception is unrelated?)

WARNING: The runtime version supported by this application is unavailable.
Using default runtime: v2.0.50727
GREF GC Threshold: 46800
_wapi_handle_unref_full: Attempting to unref unused handle 0x633
_wapi_handle_unref_full: Attempting to unref unused handle 0x408
_wapi_handle_unref_full: Attempting to unref unused handle 0x408

Unhandled Exception:
System.ObjectDisposedException: The object was used after being disposed.
  at System.Runtime.InteropServices.SafeHandle.DangerousAddRef (System.Boolean& success) [0x00000] in <filename unknown>:0
  at System.Threading.WaitHandle.WaitOne () [0x00000] in <filename unknown>:0
  at Cirrus.DispatchThread.RunDispatch () [0x00000] in <filename unknown>:0
  at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
Comment 6 Jonathan Pryor 2012-09-10 21:45:04 UTC
The problem is that the fix for #6654 only calls sched_setaffinity() from one place, for one thread, on the (mistaken) assumption that:

    int mask = 1;
    sched_setaffinity (getpid(), sizeof (mask), &mask);

would alter every thread in the process. To be fair, the Linux docs aren't entirely clear:

    http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setaffinity.2.html

> If the process specified by pid is not currently running on one of the CPUs
> specified in mask, then that process is migrated to one of the CPUs specified
> in mask.

While this seems straightforward, it's apparently also wrong, if stackoverflow is to be believed:

    http://stackoverflow.com/a/5673275/83444

> A call to sched_setaffinity() affects only a single thread. ...
> 
> This means that if you change the affinity of the current thread after
> creating other threads, their affinity will remain the default;

Oops.

To verify, run attachment #2495. While the screen is flashing, things are fine. Once the screen stops flashing, things are NOT fine. Either way, while running we have a process:

    # Verification of #6654
    $ adb shell cat /proc/14474/status | grep Cpus_allowed_list
    Cpus_allowed_list:	0

Looks good, right? But:

    $ adb shell cat '/proc/14474/task/*/status' | grep Cpus_allowed_list
    Cpus_allowed_list:	0
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0-1
    Cpus_allowed_list:	0
    Cpus_allowed_list:	0
    Cpus_allowed_list:	0
    Cpus_allowed_list:	0
    Cpus_allowed_list:	0

Oops. Some of those threads aren't tied to a single core, including the "GC" task, the "Signal Catcher" task, the Compiler task, and others.
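
The same per-thread check can also be made from inside the process. The following is only a sketch under assumptions, not code from Mono for Android: it walks /proc/self/task and prints each task's Cpus_allowed_list line, which is roughly the in-process equivalent of the adb command above. The name print_task_affinities is made up for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Prints the Cpus_allowed_list line of every task (thread) in the calling
 * process. Returns the number of tasks inspected, or -1 if /proc/self/task
 * cannot be opened. Hypothetical helper, for illustration only. */
int print_task_affinities(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int tasks = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                       /* skip "." and ".." */

        char path[288];
        int n = snprintf(path, sizeof path, "/proc/self/task/%s/status",
                         entry->d_name);
        assert(n > 0 && (size_t)n < sizeof path);

        FILE *status = fopen(path, "r");
        if (status == NULL)
            continue;                       /* the thread may have exited */

        char line[256];
        while (fgets(line, sizeof line, status) != NULL) {
            /* 18 is the length of the "Cpus_allowed_list:" prefix */
            if (strncmp(line, "Cpus_allowed_list:", 18) == 0) {
                printf("tid %s %s", entry->d_name, line);
                break;
            }
        }
        fclose(status);
        tasks++;
    }
    closedir(dir);
    return tasks;
}
```

Calling this once from any thread lists every task, so a mix of "0" and "0-1" entries would show the same inconsistency as the output above.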

Thus, the problem is that a single sched_setaffinity() call is inadequate; we also need to call it on every thread that enters managed code!
Comment 7 Jonathan Pryor 2012-09-11 11:30:52 UTC
This may very well be unfixable. :-(

Experiment: Move the sched_setaffinity() call from Runtime.init() into JNI_OnLoad() (i.e. earlier in the process' lifetime, though not significantly earlier), then add sched_setaffinity() calls to the Mono thread creation and Java thread Attach codepaths.

The idea being that we'll "force/ensure" sched_setaffinity() on every thread that enters Mono.
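
A minimal sketch of what "sched_setaffinity() on every thread" could look like, assuming Linux with glibc-style headers. Passing 0 as the pid applies the call to the calling thread only, so each thread would have to run this itself when it is created or attached. The function names are hypothetical and this is not the code used in the experiment. It pins to the lowest-numbered CPU the thread is already allowed on, rather than hard-coding CPU 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restricts the calling thread (pid 0 means "the caller") to the
 * lowest-numbered CPU it is currently allowed to run on.
 * Returns that CPU number, or -1 on failure. Hypothetical helper. */
int pin_calling_thread_to_one_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;

        cpu_set_t single;
        CPU_ZERO(&single);
        CPU_SET(cpu, &single);
        assert(CPU_COUNT(&single) == 1);    /* exactly one CPU in the mask */
        return sched_setaffinity(0, sizeof single, &single) == 0 ? cpu : -1;
    }
    return -1;                              /* no allowed CPU found */
}

/* Number of CPUs the calling thread may run on, or -1 on failure. */
int allowed_cpu_count(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    return CPU_COUNT(&allowed);
}
```

Threads created after the call inherit the caller's mask, but threads that already exist keep theirs, which is why the call would be needed on each thread's own entry path.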

The problem? It still suffers from "undesirable behavior" (the screen stops flashing and/or crashes with inconsistent stack traces).

Using sched_setaffinity() + sched_yield() is no better.
Comment 9 Jonathan Pryor 2012-09-14 10:53:39 UTC
*** Bug 7167 has been marked as a duplicate of this bug. ***
Comment 10 Jason Steele 2012-09-20 09:48:16 UTC
Hi, we are also having problems with the S3. We are using Mono for Android 4.2.4. There seems to be a general instability; following is the log for one of the more frequent exceptions:

09-20 11:27:51.695 E/mono    (5072): 
09-20 11:27:51.695 E/mono    (5072): Unhandled Exception: System.ObjectDisposedException: The object was used after being disposed.
09-20 11:27:51.695 E/mono    (5072):   at System.Runtime.InteropServices.SafeHandle.DangerousGetHandle () [0x00000] in <filename unknown>:0 
09-20 11:27:51.695 E/mono    (5072):   at System.Threading.WaitHandle.get_Handle () [0x00000] in <filename unknown>:0 
09-20 11:27:51.695 E/mono    (5072):   at System.Threading.EventWaitHandle.Reset () [0x00000] in <filename unknown>:0 
09-20 11:27:51.695 E/mono    (5072):   at (wrapper remoting-invoke-with-check) System.Threading.EventWaitHandle:Reset ()
09-20 11:27:51.695 E/mono    (5072):   at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0 
09-20 11:27:51.695 E/mono    (5072):   at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0 

Is there an understanding now of what is causing the problem?
Is it known when there is likely to be a fix?

Thanks
Comment 11 Jonathan Pryor 2012-09-20 11:41:45 UTC
> Is there an understanding now of what is causing the problem?

Yes: things are...complicated.

As far as I can determine, it is (nearly) IMPOSSIBLE to SAFELY use an armeabi library on a SMP armeabi-v7a device. This is because armeabi lacks the CPU instructions necessary to safely lock data on SMP devices, so if the armeabi library contains data that must be protected against access from multiple threads, it's busted, and libmonodroid.so is such a library. This may be fixable by creating a libmonodroid.so which dynamically determines the runtime CPU, allowing it to use either armeabi or armeabi-v7a lock instructions accordingly, but this has not been done yet, and the implementation timeframe is unknown.
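
To illustrate what "dynamically determines the runtime CPU" might involve, here is a rough sketch that reads the "CPU architecture" field that ARM Linux kernels expose in /proc/cpuinfo. It is an assumption about one possible approach, not what libmonodroid.so does, and both function names are hypothetical. Some kernels are known to report this field inaccurately, so real code would need extra checks (the NDK's cpufeatures library is one existing option).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the leading number of a "CPU architecture: N" line, or 0 if the
 * line is some other /proc/cpuinfo field. Hypothetical helper. */
int parse_cpu_architecture(const char *line)
{
    static const char key[] = "CPU architecture";
    assert(line != NULL);

    if (strncmp(line, key, sizeof key - 1) != 0)
        return 0;

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;

    /* strtol skips whitespace and stops at the first non-digit,
     * so "7" yields 7 and "5TEJ" yields 5. */
    return (int)strtol(colon + 1, NULL, 10);
}

/* Scans /proc/cpuinfo for the field. Returns 0 when it is absent
 * (for example on x86) or when the file cannot be read. */
int detect_cpu_architecture(void)
{
    FILE *cpuinfo = fopen("/proc/cpuinfo", "r");
    if (cpuinfo == NULL)
        return 0;

    char line[512];
    int arch = 0;
    while (arch == 0 && fgets(line, sizeof line, cpuinfo) != NULL)
        arch = parse_cpu_architecture(line);

    fclose(cpuinfo);
    return arch;
}
```

A result of 7 or higher would suggest that the armeabi-v7a locking instructions are available. Anything lower, including 0, would mean staying on the armeabi code path.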

Thus, if your app will be running on SMP hardware, you should include the armeabi-v7a runtime with your app. This can be done in the Project Options dialog.

Of course, things can't be THAT simple: if your app includes both the armeabi and armeabi-v7a runtimes, AND your app is installed on a device running Android 4.0-4.0.3, Android will install the armeabi runtime, not the armeabi-v7a runtime:

http://code.google.com/p/android/issues/detail?id=25321

Fortunately (for you), the Galaxy S3 is running at least Android 4.0.4, which has a fix for the above Android bug. Unfortunately (for everyone), Google's Android Platform Versions dashboard doesn't differentiate between 4.0.3 and 4.0.4, so I have no idea how many users are on a "weird" (broken) Android:

http://developer.android.com/about/dashboards/index.html

Furthermore, if you're including additional native libraries, you need to keep in mind that Android prior to 4.0 will only extract native libraries for a SINGLE ABI:

http://code.google.com/p/android/issues/detail?id=9089
Comment 12 Jason Steele 2012-09-20 12:13:40 UTC
Hi Jonathan, thanks for your prompt reply...

> Fortunately (for you), the Galaxy S3 is running at least Android 4.0.4, which
> has a fix for the above Android bug

Presumably this means that as long as the armeabi-v7a box is ticked the build should work on an S3 without these problems?

> Of course, things can't be THAT simple: if your app includes both the armeabi
> and armeabi-v7a runtimes, AND your app is installed on a device running Android
> 4.0-4.0.3, Android will install the armeabi runtime, not the armeabi-v7a
> runtime:

So it might be safer to just include armeabi-v7a and exclude armeabi. Do you have any idea of which devices this would exclude?

Thanks, Jason
Comment 13 Jonathan Pryor 2012-09-20 15:25:35 UTC
> Presumably this means that as long as the armeabi-v7a box is ticked the build
> should work on an S3 without these problems?

Correct.

> So it might be safer to just include armeabi-v7a and exclude armeabi. Do you
> have any idea of which devices this would exclude?

I agree, that would be safer. Unfortunately I don't know how many devices this would exclude. However:

1. The NDK r4b released in 2010 added support for armeabi-v7a
2. http://mashable.com/2012/05/16/android-fragmentation-graphic

With (2) we could start getting a feel for which devices are present, and then figure out which CPU architectures they support:

GT-i9100: Exynos/OMAP4=armeabi-v7a
GT-i9000: Exynos=armeabi-v7a
GT-S5830: "800 MHz ARM 11" == armv6=armeabi
Desire HD: "Qualcomm 8255" = armeabi-v7a
HTC Desire: "Qualcomm QSD8250" = armeabi-v7a
GT N7000: Exynos = armeabi-v7a

That could take forever, so I focused on just the largest phone models in (2), making up ~25% of the market, and only one of them appears to have an armeabi core.

Unfortunately I can't find a larger version of the graphic in (2), so finding additional model numbers would be difficult, and there are ~4000+ models...

All that said, I'd be inclined to just require armeabi-v7a, and only worry about armeabi if you get enough demand (at which point it might be worthwhile to investigate Google's multi-.apk support...)
Comment 14 Jason Steele 2012-09-24 14:30:56 UTC
Hi Jonathan,

I started working through the next 25% with the help of http://pdadb.net/ to find what ARM version each device uses, and found that there were still quite a few v6s out there.

So we went for a combined armeabi and armeabi-v7a APK and released this to two S3s, which got a thorough workout over the weekend. No crashes at all, so very grateful for your steer on this issue. :)

So as I see it, now we can either go for the combined APK or release separate APKs (one armeabi and one armeabi-v7a) to Google Play under the same listing.

The latter choice reduces the size by 2MB but is obviously a bit more fiddly. Is it worth doing? Do you know whether the correct version gets downloaded for each version of Android? 

Many thanks,
Jason
Comment 15 Jonathan Pryor 2012-09-24 14:44:18 UTC
WORKAROUND for the Android 4.0 bug mentioned in Comment #11:

    http://code.google.com/p/android/issues/detail?id=25321

If you want/need to include both armeabi and armeabi-v7a support in your .apk, you need to ensure that the armeabi-v7a runtime is added to the .apk BEFORE the armeabi one.

How do you do that?

By manually editing your .csproj, and changing:

    <AndroidSupportedAbis>armeabi,armeabi-v7a</AndroidSupportedAbis>

to:

    <AndroidSupportedAbis>armeabi-v7a,armeabi</AndroidSupportedAbis>

i.e. ensure that armeabi-v7a comes before armeabi.

We will be updating Mono for Android 4.2.7 so that .apk creation will add the armeabi-v7a runtime before the armeabi runtime, but the above fix will be required for Mono for Android <= 4.2.6.
Comment 16 Jonathan Pryor 2012-09-25 18:25:38 UTC
Comment #15 workaround added in master/727c6bdb and monodroid-4.2-series/75342f9d.
Comment 17 Jonathan Pryor 2012-11-02 16:35:57 UTC
Update summary so it's useful.

The long-term fix is to provide an armeabi libmonosgen-2.0.so native library which detects which architecture it's running on and uses armeabi-v7a-aware instructions for thread-safety when running on armeabi-v7a hardware.
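
As a rough illustration of that idea, the sketch below picks a memory-barrier routine once at startup and calls it through a function pointer. Everything here is an assumption made for illustration: the names are hypothetical, the "legacy" routine is only a compiler barrier standing in for whatever the armeabi build does today, and the compiler builtin __sync_synchronize() stands in for the ARMv7 barrier sequence. It is not the planned libmonosgen-2.0.so implementation.

```c
#include <assert.h>

typedef void (*barrier_fn)(void);

/* Stand-in for the existing armeabi path: only stops the compiler from
 * reordering memory accesses across this point. */
static void barrier_legacy(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Stand-in for the ARMv7 path: a full hardware memory barrier. */
static void barrier_v7(void)
{
    __sync_synchronize();
}

/* Default to the stronger barrier until the CPU has been identified. */
static barrier_fn memory_barrier = barrier_v7;

/* Hypothetical init hook: cpu_architecture is the detected ARM architecture
 * version (for example 5, 6 or 7). Returns 1 if the v7 barrier was chosen. */
int select_memory_barrier(int cpu_architecture)
{
    assert(cpu_architecture >= 0);
    int use_v7 = cpu_architecture >= 7;
    memory_barrier = use_v7 ? barrier_v7 : barrier_legacy;
    return use_v7;
}

/* Example caller: publish a value, then raise a flag that another thread
 * polls. The barrier keeps the two stores ordered. */
void publish(int *slot, volatile int *ready, int value)
{
    *slot = value;
    memory_barrier();
    *ready = 1;
}
```

The indirect call costs a little on every lock operation, but it would let a single armeabi binary behave correctly on both kinds of hardware.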
