Bug 33767 - Got a SIGSEGV while garbage collection
Status: REOPENED
Alias: None
Product: Android
Classification: Xamarin
Component: Mono runtime / AOT Compiler
Version: unspecified
Hardware: PC Windows
Importance: Highest normal
Target Milestone: ---
Assignee: Bugzilla
Reported: 2015-09-08 06:18 UTC by andreypavlov007
Modified: 2017-06-01 13:40 UTC (History)

Is this bug a regression?: ---


Attachments
Logs with the error from LGE Nexus 5 and TCL 6014X (33.50 KB, application/zip)
2015-09-08 06:18 UTC, andreypavlov007
Nexus 4 crash (5.48 KB, text/plain)
2015-11-24 10:19 UTC, Seifer

Description andreypavlov007 2015-09-08 06:18:45 UTC
Created attachment 12808
Logs with the error from LGE Nexus 5 and TCL 6014X

## Overview

Under heavy memory use, the application sometimes crashes with:

Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1)
Abort message: 'monodroid-glue.c:1033: gc_cleanup_after_java_collection: assertion "!sccs [i]->is_alive" failed'

In our case, it happens when the user quickly switches between different fragments and activities many times.

The bug was found on the following devices: Lenovo S820 (API 19), TCL 6014X (API 18) (Alcatel One Touch), LGE Nexus 5 (API 22).
The problem most likely occurs on other devices as well.

## Error

Part of the error is below:
[ F/libc ] monodroid-glue.c:1033: gc_cleanup_after_java_collection: assertion "!sccs [i]->is_alive" failed
[ E/mono-rt ] Stacktrace:
[ E/mono-rt ] at <unknown> <0xffffffff>
[ E/mono-rt ] at (wrapper managed-to-native) object.__icall_wrapper_mono_array_new_specific (intptr,int) <IL 0x0002a, 0xffffffff>
[ E/mono-rt ] at System.Collections.Generic.List`1.set_Capacity (int) <IL 0x0002b, 0x0010b>
[ E/mono-rt ] at System.Collections.Generic.List`1.EnsureCapacity (int) <IL 0x00048, 0x00177>
[ E/mono-rt ] at System.Collections.Generic.List`1.Add (T) <IL 0x0001c, 0x00083>
[ E/mono-rt ] at System.Collections.Generic.List`1..ctor (System.Collections.Generic.IEnumerable`1<T>) <IL 0x00086, 0x004cb>
[ E/mono-rt ] at System.Linq.Enumerable.ToList<TSource> (System.Collections.Generic.IEnumerable`1<TSource>) <IL 0x00007, 0x00093>
. . .
[ E/mono-rt ] at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () <IL 0x00002, 0x00047>
[ E/mono-rt ] at System.Threading.ThreadPool.<UnsafeQueueCustomWorkItem>m__0 (object) <IL 0x00006, 0x000ff>
[ E/mono-rt ] at (wrapper runtime-invoke) <Module>.runtime_invoke_void__this___object (object,intptr,intptr,intptr) <IL 0x00062, 0xffffffff>
. . .
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================
[ F/libc ] Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 20862 (.arview.fraisys)
[ I/DEBUG ] *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[ I/DEBUG ] Build fingerprint: 'TCL/6014X/Miata_3G:4.3/JLS36C/vFCL-0:user/release-keys'
[ I/DEBUG ] Revision: '0'
[ I/DEBUG ] pid: 19916, tid: 20862, name: .arview.fraisys  >>> com.arview.fraisys <<<
[ I/DEBUG ] signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
[ I/DEBUG ] Abort message: 'monodroid-glue.c:1033: gc_cleanup_after_java_collection: assertion "!sccs [i]->is_alive" failed'

## Steps to reproduce

According to the logs, the error can be caused by different sequences of function calls (each log shows a different sequence).
The common factor is intensive memory use that triggers garbage collection.
In our application, the easiest way to reproduce the bug is to move quickly from one screen to another many times.
Comment 2 Jonathan Pryor 2015-09-08 16:55:04 UTC
@andreypavlov007: Would it be possible to get a repro project or a copy of your app and repro instructions?
Comment 4 David Barrett 2015-10-12 12:32:03 UTC
What's the status on this?  I am currently seeing this bug also and can reliably repro this using our application.
Comment 6 Mark Probst 2015-10-13 13:50:31 UTC
`is_alive` is set to `FALSE` by the bridge and expected to be set by the bridge callback (in monodroid) for objects to be kept alive.  What seems to happen here is that, after linking up the Java objects of each strongly connected component (SCC) and running a Java GC, we discover that within a single SCC only some objects are alive. Barring memory corruption or other anything-goes bugs, this means that either they weren't linked up correctly in the first place, or something interfered with the links in the meantime, or, much less likely I think, the Java GC is buggy.
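The invariant described above can be illustrated with a minimal Python sketch. This is a simplified model, not the monodroid code: `BridgeObject`, `java_peer_alive`, and `find_inconsistent_sccs` are hypothetical names introduced only for illustration. It assumes each SCC is a plain list of objects, each recording whether its Java peer survived the Java GC, and it flags any SCC in which only some objects survived, which is the situation the failed assertion appears to detect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BridgeObject:
    """Hypothetical stand-in for a managed object with a Java peer."""
    name: str
    java_peer_alive: bool  # did the Java peer survive the Java GC?


def find_inconsistent_sccs(sccs: List[List[BridgeObject]]) -> List[int]:
    """Return the indices of SCCs in which only some objects survived.

    In a correctly linked SCC, every object is reachable from every other
    one, so after a Java GC either all peers are alive or none are.
    """
    inconsistent = []
    for index, scc in enumerate(sccs):
        alive_flags = {obj.java_peer_alive for obj in scc}
        if len(alive_flags) > 1:  # a mix of alive and dead peers
            inconsistent.append(index)
    return inconsistent


sccs = [
    [BridgeObject("a", True), BridgeObject("b", True)],    # all alive
    [BridgeObject("c", False), BridgeObject("d", False)],  # all collected
    [BridgeObject("e", True), BridgeObject("f", False)],   # mixed: violation
]
print(find_inconsistent_sccs(sccs))  # prints [2]
```

A mixed SCC like the third one is what would point to incorrect linking, interference with the links, or a GC bug, as listed above.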
Comment 11 John Miller [MSFT] 2015-10-27 14:47:55 UTC
With Ludovic's suggestion, I tried using the Tarjan bridge. 

MONO_GC_PARAMS=bridge-implementation=tarjan

Unfortunately, the app still crashes with the same reproduction steps. However, the logcat shows a possibly new issue: 

[libc] Fatal signal 5 (SIGTRAP), code -6 in tid 27215 (OkHttp Dispatch)

I attached the full logcat from this.
Comment 13 Ludovic Henry 2015-11-02 16:18:37 UTC
This is a bug in both the user code and Xamarin.Android. The latter is at fault for failing to warn the user of errors in their code.

*** This bug has been marked as a duplicate of bug 35471 ***
Comment 14 Seifer 2015-11-20 15:06:33 UTC
Since upgrading to Xamarin 4, I very often get a crash with only a single line of error output:

[libc] Fatal signal 5 (SIGTRAP), code -6 in tid 28636 (OkHttp Dispatch)

I have absolutely no idea how to reproduce it or why it is happening.
To be honest, I don't think anything related to OkHttp is being processed at the moment of the crash (though I am using it).

The device: Nexus 4
Configuration: Debug
Comment 15 Jonathan Pryor 2015-11-23 17:16:44 UTC
Let's break that message down:

> [libc]

This is the "tag" provided to android.util.Log.

> Fatal signal 5 (SIGTRAP)

What went wrong: SIGTRAP.

What's SIGTRAP?

http://man7.org/linux/man-pages/man7/signal.7.html
http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2008-10/msg00107.html

> code -6

No idea what the "code" is for.

> in tid 28636 (OkHttp Dispatch)

The ID (28636) and name ("OkHttp Dispatch") of the thread that generated the signal.

Putting it backwards, the "OkHttp Dispatch" thread triggered a SIGTRAP signal, which caused the process to abort.
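The field-by-field breakdown above can be sketched as a small Python parser. This is only an illustration: `FATAL_SIGNAL` and `parse_fatal_signal` are hypothetical names, and the pattern assumes the exact "code N in tid N (name)" layout quoted in this comment. The older "at 0x... (code=N), thread N (name)" layout seen elsewhere in this bug is not handled and yields `None`.

```python
import re
from typing import Dict, Optional, Union

# Matches lines such as:
#   [libc] Fatal signal 5 (SIGTRAP), code -6 in tid 28636 (OkHttp Dispatch)
FATAL_SIGNAL = re.compile(
    r"\[\s*(?P<tag>[^\]]+?)\s*\]\s+"
    r"Fatal signal (?P<signo>\d+) \((?P<signame>[^)]*)\), "
    r"code (?P<code>-?\d+) in tid (?P<tid>\d+) \((?P<thread>[^)]*)\)"
)


def parse_fatal_signal(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a 'Fatal signal' logcat line into its fields, or return None."""
    match = FATAL_SIGNAL.search(line)
    if match is None:
        return None
    return {
        "tag": match.group("tag"),                 # log tag, e.g. "libc"
        "signal_number": int(match.group("signo")),
        "signal_name": match.group("signame"),     # e.g. "SIGTRAP"
        "code": int(match.group("code")),          # reported as-is
        "thread_id": int(match.group("tid")),
        "thread_name": match.group("thread"),      # e.g. "OkHttp Dispatch"
    }


line = "[libc] Fatal signal 5 (SIGTRAP), code -6 in tid 28636 (OkHttp Dispatch)"
print(parse_fatal_signal(line))
```

Running this over a full logcat dump and grouping by `thread_name` is one way to see whether the crash always comes from the same thread.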

What doesn't make sense is *why* it's triggering a SIGTRAP signal...

@Seifer: Does this happen if you run your app *without* debugging, or run a Release app?
Comment 16 Seifer 2015-11-24 10:18:25 UTC
@Jonathan, thank you for the explanation and response.

I've noticed that the crash does not happen on the emulator (though the emulator crashes a lot with a weird stack trace, but that is another story).

It seems that the same Debug build launched *without* debugging does not crash on the device, but it does crash when the debugger is connected.

Also, it usually crashes on the same screen (but not always).

The attachment contains more information about the crash from the actual logcat (for some reason, Xamarin Studio does not show this information).

Please, let me know if I can provide more information.
Comment 17 Seifer 2015-11-24 10:19:01 UTC
Created attachment 13972
Nexus 4 crash
Comment 18 Seifer 2015-11-24 10:20:58 UTC
According to the new log, the app crashed from another thread: (Filter)

But it crashes more often with (OkHttp Dispatch)
Comment 19 alex 2016-01-08 20:32:51 UTC
Reproduced on the latest Xamarin.Android version, 6.0.0.34:
[libc] Fatal signal 5 (???) at 0x00007a26 (code=-6), thread 31493 (OkHttp Dispatch)
Comment 21 Cody Beyer (MSFT) 2016-02-23 02:55:15 UTC
Attached is a new sample, which seems to reproduce the issue but does not seem to be a duplicate of bug 35471.
Comment 23 Denny 2017-06-01 13:40:19 UTC
Still got the problem.
