Bug 23987

Summary: Unable to debug on Lollipop
Product: Android Reporter: Dylan <dylanvdm>
Component: DebuggerAssignee: Zoltan Varga <vargaz>
Status: VERIFIED FIXED    
Severity: normal CC: arpitj, jeremie.laval, mohitk, mono-bugs+monodroid, peter.collins, pj.beaman, tj, vargaz
Priority: Normal    
Version: 4.18.0   
Target Milestone: 4.20   
Hardware: PC   
OS: Windows   
Tags: Is this bug a regression?: ---
Last known good build:
Attachments: device logging output
log1
log2
log3

Description Dylan 2014-10-22 10:35:27 UTC
I seem to be unable to debug apps through Visual Studio on my Nexus 5 running the latest Android 5 (Lollipop) developer release. Running the app that was installed without the debugger attaching works perfectly. 

Note that this occurs in both Visual Studio and XS. Debugger works as expected on Android 4.4 devices.

The app builds and copies onto the device, but crashes almost instantly with very little written to the log file:
10-22 11:32:59.545 F/libc    (30972): Fatal signal 6 (SIGABRT), code -6 in tid 30972 (m.Dealership.UI)

Sometimes I manage to get this in the log:
10-22 11:37:53.464 E/mono-rt (32100): 
10-22 11:37:53.464 E/mono-rt (32100): =================================================================
10-22 11:37:53.464 E/mono-rt (32100): Got a SIGSEGV while executing native code. This usually indicates
10-22 11:37:53.464 E/mono-rt (32100): a fatal error in the mono runtime or one of the native libraries 
10-22 11:37:53.464 E/mono-rt (32100): used by your application.
10-22 11:37:53.464 E/mono-rt (32100): =================================================================
10-22 11:37:53.464 E/mono-rt (32100): 
10-22 11:37:53.465 F/libc    (32100): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 32198 (RenderThread)


I have attached what was output to the device logging window from when the app was installed to just after it crashed.


Xamarin.Android 4.18
Comment 1 Dylan 2014-10-22 10:36:06 UTC
Created attachment 8469 [details]
device logging output
Comment 2 Dylan 2014-10-26 08:43:39 UTC
A new fatal signal I just spotted...


10-26 14:22:42.097 E/mono-rt (19539): =================================================================
10-26 14:22:42.097 E/mono-rt (19539): Got a SIGSEGV while executing native code. This usually indicates
10-26 14:22:42.097 E/mono-rt (19539): a fatal error in the mono runtime or one of the native libraries 
10-26 14:22:42.097 E/mono-rt (19539): used by your application.
10-26 14:22:42.097 E/mono-rt (19539): =================================================================
10-26 14:22:42.097 E/mono-rt (19539): 
10-26 14:22:42.097 F/libc    (19539): Fatal signal 11 (SIGSEGV), code 2, fault addr 0x12c814e0 in tid 19624 (RenderThread)
Comment 3 T.J. Purtell 2014-11-05 00:19:54 UTC
My team also experiences this issue.
Comment 4 T.J. Purtell 2014-11-05 00:45:22 UTC
I should mention that, for a simple test project, it does work fine.  I debugged one today.  I'll follow up with my regular L developer to see what's going on in a proper app.
Comment 5 Dylan 2014-11-05 06:44:19 UTC
I'll try on final Lollipop build when the factory image for Nexus 5 is available. 

Any news from Xamarin on this at all? Any idea what the issue is?
Comment 6 Jérémie Laval 2014-11-05 08:59:11 UTC
Hey folks,

Yes we are aware of the issue (we also see it) and are investigating.

The issue (as you also see in your stacktraces) seems to come from the new RenderThread mechanism in Lollipop. It could be some race to access uninitialized state, a standard thread-safety problem or anything else due to that new thread being thrown in the mix.
Comment 7 Zoltan Varga 2014-11-05 13:52:40 UTC
I can't reproduce this with a simple app.
Comment 8 Jérémie Laval 2014-11-05 14:13:10 UTC
For me it doesn't happen 100%, you need to launch debugging a few times for it to fail.
Comment 9 T.J. Purtell 2014-11-05 15:08:18 UTC
FYI, even with a full big app, it happens ~1/3 times for us.

> 11:12:39.286: I/mono-stdout(24711): Creating a new database connection.  This will take some time.
> 11:12:39.308: E/mono-rt(24711): =================================================================
> 11:12:39.308: E/mono-rt(24711): Got a SIGSEGV while executing native code. This usually indicates
> 11:12:39.308: E/mono-rt(24711): a fatal error in the mono runtime or one of the native libraries 
> 11:12:39.308: E/mono-rt(24711): used by your application.
> 11:12:39.308: E/mono-rt(24711): =================================================================
> 11:12:39.308: A/libc(24711): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 24788 (RenderThread)
> 11:12:39.336: I/art(24711): Explicit concurrent mark sweep GC freed 6697(321KB) AllocSpace objects, 1(12KB) LOS objects, 40% free, 16MB/27MB, paused 567us total 15.819ms
> 11:12:39.413: I/DEBUG(27306): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> 11:12:39.413: I/DEBUG(27306): Build fingerprint: 'google/hammerhead/hammerhead:5.0/LPX13D/1519572:user/release-keys'
> 11:12:39.413: I/DEBUG(27306): Revision: '11'
> 11:12:39.413: I/DEBUG(27306): ABI: 'arm'
> 11:12:39.414: I/DEBUG(27306): pid: 24711, tid: 24788, name: RenderThread  >>> mobisocial.omlet <<<
> 11:12:39.414: I/DEBUG(27306): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
> 11:12:39.439: I/DEBUG(27306):     r0 00000000  r1 00000001  r2 00000044  r3 b6709ab8
> 11:12:39.439: I/DEBUG(27306):     r4 beed7eb8  r5 ffffffff  r6 9bd412a0  r7 00000001
> 11:12:39.439: I/DEBUG(27306):     r8 9bd412ac  r9 9b8efd08  sl b6efa2c5  fp 9b8efdb0
> 11:12:39.439: I/DEBUG(27306):     ip b670b5d4  sp 9b8efcd0  lr b66fe893  pc b6efa0aa  cpsr 000b0030
> 11:12:39.440: I/DEBUG(27306): backtrace:
> 11:12:39.440: I/DEBUG(27306):     #00 pc 000160aa  /system/lib/libc.so (__pthread_cond_pulse(pthread_cond_t*, int)+5)
> 11:12:39.440: I/DEBUG(27306):     #01 pc 0003d88f  /system/lib/libhwui.so
> 11:12:39.440: I/DEBUG(27306):     #02 pc 0003e27b  /system/lib/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+66)
> 11:12:39.440: I/DEBUG(27306):     #03 pc 000104d5  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+112)
> 11:12:39.440: I/DEBUG(27306):     #04 pc 0005df4d  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+72)
> 11:12:39.440: I/DEBUG(27306):     #05 pc 00010045  /system/lib/libutils.so
> 11:12:39.440: I/DEBUG(27306):     #06 pc 000162e3  /system/lib/libc.so (__pthread_start(void*)+30)
> 11:12:39.440: I/DEBUG(27306):     #07 pc 000142d3  /system/lib/libc.so (__start_thread+6)
Comment 10 T.J. Purtell 2014-11-05 15:24:19 UTC
I think that this EINTR problem is also the cause of some Lollipop issues

Android's mutex implementation can be interrupted by signals (see lock which returns an error code)

https://android.googlesource.com/platform/system/core/+/android-5.0.0_r2/include/utils/Mutex.h

Android's Looper implementation (used by RenderThread) does not check the result of this lock operation, so it proceeds to 

https://android.googlesource.com/platform/system/core/+/android-5.0.0_r2/libutils/Looper.cpp  (see line 229)

Although I originally saw that Qualcomm's driver had this issue where adding new signals to the normal lifetime of Android apps caused unexpected behavior, it now seems that Android itself has this problem.  

The Looper is not new code in Android and it seems to have never expected to be interrupted.  I suspect there are more instances of this across the entire Android code base, so its critical to work around with SA_RESTART for any new signals that Xamarin introduces.
Comment 11 T.J. Purtell 2014-11-05 17:27:26 UTC
Actually, pthread_mutex_lock never returns EINTR apparently, so this proposed explanation is wrong atleast for the exact mechanism. :(  Maybe there is some other android code in the render thread that doesn't handle EINTR well though.
Comment 12 T.J. Purtell 2014-11-05 22:05:23 UTC
Although the specific of pthread_mutex_lock returning EINTR makes no sense according to the POSIX spec.  It does appear that patching all the signal handlers to have the SA_RESTART flag make me able to always launch and debug my full android app without SIGSEGV/SIGABRT/SIGBUS/etc.

I had an old hacky library for patching signal handlers I modified to add the SA_RESTART flag to all registered handlers.

https://bitbucket.org/tpurtell/wrapsigs-restart/src/5e70a38390054a5660ab30522b3e41cee5c31a98/jni/wrapsigs.cpp?at=master

Then I ran the same process of starting our app for first sign in, trying to login, and a few basic events.  Without the signal handler patched to SA_RESTART, I continuously get random crashes.  With it patched, I do not get these random crashes. 

Unfortunately, I don't have a concrete explanation for the part of Android that is getting messed up by the extra signals.  It looks like the render thread does one part of its processing whenever it gets woken up.  See line 275 (https://android.googlesource.com/platform/frameworks/base/+/android-5.0.0_r2/libs/hwui/renderthread/RenderThread.cpp).  I think this means that the EINTR makes it run this code in a different order than it would otherwise.  But the details of what scheduling is going on is hard to understand.  The Looper itself definitely handles EINTR, but because it is precisely scheduling, it looks like it would never receive a spurious wake normally.  Android itself may be depending on the precise ordering of events wrapped around that Looper. 

FYI, I also tried just doing it for SIGINT, but Ithat apparently was not enough to eliminate all of the crashes, so some of the other ones may be important.
Comment 13 Zoltan Varga 2014-11-05 22:10:12 UTC
Thanks for the investigation. Unfortunately, parts of the runtime depends on EINTR to detect interruptions, so implementing this is going to be a bit more complicated.
Comment 14 Zoltan Varga 2014-11-07 14:07:42 UTC
This will hopefully be fixed in xamarin.android 4.20.
Comment 15 Zoltan Varga 2014-11-07 16:06:09 UTC
Fixed.
Comment 16 T.J. Purtell 2014-11-07 19:32:45 UTC
Hooray! Thanks.
Comment 17 Dylan 2014-11-08 05:08:59 UTC
Awesome news! :) Thank you.
Comment 18 Mohit Kheterpal 2014-11-13 10:16:58 UTC
@Peter, Could you please verify this issue and close it ? 

Thanks
Comment 19 T.J. Purtell 2014-11-13 14:10:06 UTC
I still have debugging problems with the Beta channel.  Occasional crashes while starting up.  EINTR returned on some calls.  And the random crashes with libhwui as well.

The log on the other EINTR ticket seemed to shows an error code of -4 (-EINTR).

Version Info

> Xamarin.Android   4.20.0.24 (812ca1351e7d7fecd7708f7831742c32047e5ab3)
> Visual Studio plugin to enable development for Xamarin.Android.

I note that the release notes for 4.20 say this issue is not resolved, so maybe the fix did not make it into the beta?

http://developer.xamarin.com/releases/android/xamarin.android_4/xamarin.android_4.20/


Logs for reference... EINTR on network call
> java.net.UnknownHostException: Unable to resolve host "omlet.me": No address associated with hostname
> 	at java.net.InetAddress.lookupHostByName(InetAddress.java:457)
> 	at java.net.InetAddress.getAllByNameImpl(InetAddress.java:252)
> 	at java.net.InetAddress.getAllByName(InetAddress.java:215)
> 	at com.squareup.okhttp.internal.Dns$1.getAllByName(Dns.java:28)
> 	at com.squareup.okhttp.internal.http.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:216)
> 	at com.squareup.okhttp.internal.http.RouteSelector.next(RouteSelector.java:122)
> 	at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:292)
> 	at com.squareup.okhttp.internal.http.HttpEngine.sendSocketRequest(HttpEngine.java:255)
> 	at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:206)
> 	at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:351)
> 	at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:90)
> 	at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.getOutputStream(HttpURLConnectionImpl.java:198)
> 	at com.squareup.okhttp.internal.http.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:254)
> Caused by: android.system.GaiException: android_getaddrinfo failed: EAI_NODATA (No address associated with hostname)
> 	at libcore.io.Posix.android_getaddrinfo(Native Method)
> 	at libcore.io.ForwardingOs.android_getaddrinfo(ForwardingOs.java:55)
> 	at java.net.InetAddress.lookupHostByName(InetAddress.java:438)
> 	... 12 more
> Caused by: android.system.ErrnoException: android_getaddrinfo failed: EINTR (Interrupted system call)
> 	... 15 more

Logs for reference... RANDOM crashes in libhwui
> 11-13 14:06:53.101: E/mono-rt(8157): =================================================================
> 11-13 14:06:53.101: E/mono-rt(8157): Got a SIGSEGV while executing native code. This usually indicates
> 11-13 14:06:53.101: E/mono-rt(8157): a fatal error in the mono runtime or one of the native libraries 
> 11-13 14:06:53.101: E/mono-rt(8157): used by your application.
> 11-13 14:06:53.101: E/mono-rt(8157): =================================================================
> 11-13 14:06:53.101: A/libc(8157): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x438 in tid 8268 (RenderThread)
> 11-13 14:06:53.194: E/JHEAD(8157): Can't write back - didn't read all
> 11-13 14:06:53.211: I/DEBUG(5057): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> 11-13 14:06:53.211: I/DEBUG(5057): Build fingerprint: 'google/hammerhead/hammerhead:5.0/LPX13D/1519572:user/release-keys'
> 11-13 14:06:53.211: I/DEBUG(5057): Revision: '11'
> 11-13 14:06:53.211: I/DEBUG(5057): ABI: 'arm'
> 11-13 14:06:53.211: I/DEBUG(5057): pid: 8157, tid: 8268, name: RenderThread  >>> mobisocial.omlet <<<
> 11-13 14:06:53.211: I/DEBUG(5057): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x438
> 11-13 14:06:53.227: I/DEBUG(5057):     r0 00000438  r1 000000ec  r2 0000004a  r3 b677cab8
> 11-13 14:06:53.227: I/DEBUG(5057):     r4 00000438  r5 ffffffff  r6 9b9b40c0  r7 00000001
> 11-13 14:06:53.227: I/DEBUG(5057):     r8 9b9b40cc  r9 9b4ffd08  sl b6f6d2c5  fp 9b4ffdb0
> 11-13 14:06:53.227: I/DEBUG(5057):     ip b677e5c4  sp 9b4ffcd8  lr b677188d  pc b6f6ddb6  cpsr 80010030
> 11-13 14:06:53.228: I/DEBUG(5057): backtrace:
> 11-13 14:06:53.228: I/DEBUG(5057):     #00 pc 00016db6  /system/lib/libc.so (pthread_mutex_lock+7)
> 11-13 14:06:53.228: I/DEBUG(5057):     #01 pc 0003d889  /system/lib/libhwui.so
> 11-13 14:06:53.228: I/DEBUG(5057):     #02 pc 0003e27b  /system/lib/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+66)
> 11-13 14:06:53.228: I/DEBUG(5057):     #03 pc 000104d5  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+112)
> 11-13 14:06:53.228: I/DEBUG(5057):     #04 pc 0005df4d  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+72)
> 11-13 14:06:53.228: I/DEBUG(5057):     #05 pc 00010045  /system/lib/libutils.so
> 11-13 14:06:53.228: I/DEBUG(5057):     #06 pc 000162e3  /system/lib/libc.so (__pthread_start(void*)+30)
> 11-13 14:06:53.228: I/DEBUG(5057):     #07 pc 000142d3  /system/lib/libc.so (__start_thread+6)
> 11-13 14:06:54.440: I/DEBUG(5057): Tombstone written to: /data/tombstones/tombstone_02
> 11-13 14:06:54.448: I/BootReceiver(744): Copying /data/tombstones/tombstone_02 to DropBox (SYSTEM_TOMBSTONE)
Comment 20 Peter Collins 2014-11-13 14:24:50 UTC
The fix attempt from Zoltan did go out as part of the current Beta release (4.20-24), reopening based on comment #19.
Comment 21 T.J. Purtell 2014-11-13 17:43:23 UTC
Release note mention a base mono commit of https://github.com/mono/mono/commits/f28698eaf362e5e0033e19e251dd699e4f09668a  which seems to lack Zoltan's signal fixes...?
Comment 22 PJ 2014-11-13 17:51:38 UTC
Current 4.20.24 Beta release is based on https://github.com/mono/mono/commits/771f43fe6c9ef5670729aacae9b2c2d82b2f5236.
Comment 23 Peter Collins 2014-11-14 18:57:21 UTC
Hey T.J.

I think I've managed to find a repro here, but I'm not certain if the latest fix attempt has resolved the issue. I'd love for you to give it a whirl and see if you're still hitting these crashes. We should hopefully have some more information Monday, please let me know if you have any luck with the builds below:

OSX: https://files.xamarin.com/~pcollins/mono-android-4.20.0-25.pkg
Win: https://files.xamarin.com/~pcollins/Xamarin.VisualStudio_Setup-3.8.141.msi
Comment 24 T.J. Purtell 2014-11-14 21:50:22 UTC
:-( was able to get several crashes.

> Thread started: <Thread Pool> #33
> 11-14 21:16:43.627 E/mono-rt (18590): 
> 11-14 21:16:43.627 E/mono-rt (18590): =================================================================
> 11-14 21:16:43.627 E/mono-rt (18590): Got a SIGSEGV while executing native code. This usually indicates
> 11-14 21:16:43.627 E/mono-rt (18590): a fatal error in the mono runtime or one of the native libraries 
> 11-14 21:16:43.627 E/mono-rt (18590): used by your application.
> 11-14 21:16:43.627 E/mono-rt (18590): =================================================================
> 11-14 21:16:43.627 E/mono-rt (18590): 
> 11-14 21:16:43.627 F/libc    (18590): Fatal signal 7 (SIGBUS), code 1, fault addr 0xb6bc1dc9 in tid 18710 (RenderThread)

I modified my signal wrapper hack to log the signals instead of wrap them with SA_RESTART.  I will attach 3 example logs... they always crash after a batch of signal 31's

I also double checked again with the wrapper on restart and the logging and I didn't see the crashes.
Comment 25 T.J. Purtell 2014-11-14 21:52:19 UTC
Created attachment 8751 [details]
log1
Comment 26 T.J. Purtell 2014-11-14 21:52:38 UTC
Created attachment 8752 [details]
log2
Comment 27 T.J. Purtell 2014-11-14 21:53:00 UTC
Created attachment 8753 [details]
log3
Comment 28 T.J. Purtell 2014-11-15 02:08:19 UTC
According to this signal.h

http://osxr.org/android/source/bionic/libc/kernel/arch-arm/asm/signal.h

SIGUNUSED is 31
SIGUSR2 is 12

which is used as the "abort" signal here

https://github.com/mono/mono/blob/master/mono/metadata/threads.c#L1964

which is registered here

https://github.com/mono/mono/blob/master/mono/utils/mono-threads-posix.c#L362

I guess it purposely is not having SA_RESTART and there is some attempt to propagate a "should really interrupt kernel" flag in the new patches... but as you can see from the trace logs... The SIGUSR2 no interrupt handler is never registered, and it looks like that is supposed to be the no interrupt handler...

log of the registered signal handlers...

> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 3
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 4
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 5
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 6
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 7
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 8
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 11
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 13
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 16
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 24
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 30
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 31
> 11-14 21:35:07.295 D/LOG_TAG (23067): wrapped signal 34

I was thinking that mono_thread_info_disable_new_interrupt (true) could have been called in the private android internal setup code, but the SIGUSR handler should still be registered ...

Maybe the build scripts/system is not defining the PLATFORM_ANDROID flag? 

Or there is some override in the private Xamarin.Android code?
Comment 29 T.J. Purtell 2014-11-15 02:20:48 UTC
Since it is getting down to the level of a potential build system issue... 

This is my version info after installing the package provided
> Xamarin.Android   4.20.0.25 (359d2714d8ca13d4fcad5501de3226f3b6d208d7)
> Visual Studio plugin to enable development for Xamarin.Android.

I also check the date/time stamp on the Mono.Android.DebugRuntime-* files and it is consistent with having been updated by the installer.  11/14 5:30pm.

Additionally, before running these tests, I manually uninstall the mono shared runtime and android API 19 apps on the test device (Nexus 5).

So, I think that means it is not likely to be some weird installer issue.
Comment 30 Zoltan Varga 2014-11-15 19:21:24 UTC
Third time the charm, please try this build:
https://files.xamarin.com/~vargaz/mono-android-4.20.0-26.pkg

It seems to fix the crashes with the test case I was using:
https://github.com/garuma/Moyeu
Comment 31 T.J. Purtell 2014-11-15 23:52:02 UTC
Ok... just installed this on my mac, did a superclean, and . . .

Third time is the charm... I could not reliably crash any install so far... and after several initial installs I have not seen an instacrash on start under the debugger.  I didn't run the signal logging code since all my NDK things are on my Windows machine.

Thanks Zoltan :)
Comment 32 Zoltan Varga 2014-11-16 00:26:41 UTC
Thanks for investigating the problem.
Comment 33 Zoltan Varga 2014-11-16 00:27:06 UTC
-> FIXED.
Comment 35 Zoltan Varga 2014-11-16 00:32:23 UTC
It was the SA_RESTART problem, the previous fixes had bugs in them so we were still using signals without the SA_RESTART flag.
Comment 36 Peter Collins 2014-11-17 14:21:24 UTC
I'm also no longer able to reproduce using 4.20-26 and the Moyeu solution, I think this can now be VERIFIED based on my result and Comment #31.