Bug 40239

Summary: SIGSEGV 27408 repro
Product: Android Reporter: Jonathan Pryor <jonp>
Component: Mono runtime / AOT Compiler Assignee: Rodrigo Kumpera <kumpera>
Status: VERIFIED FIXED    
Severity: normal CC: mono-bugs+monodroid, peter.collins, vargaz
Priority: ---    
Version: unspecified   
Target Milestone: 6.1 (C7)   
Hardware: PC   
OS: Mac OS   
Tags: Is this bug a regression?: ---
Last known good build:

Description Jonathan Pryor 2016-04-08 15:19:08 UTC
Xamarin.Android 6.1.0 (Cycle 7) uses Mono 4.4.0, and the repro from Bug #27408 is now causing a SIGSEGV.

Repro:

> curl -o Test.zip 'https://bugzilla.xamarin.com/attachment.cgi?id=10026'
> unzip Test.zip
> cd Test
> xbuild /t:Install
> xbuild /t:_Run
# -or-
> xbuild /t:_Gdb
...

Let the app run for ~10 minutes, and the app will crash. I have three separate gdb stack traces:

> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 21736]
> 0xb37232f8 in ?? () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libmonosgen-32bit-2.0.so
> (gdb) bt
> #0  0xb37232f8 in ?? () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libmonosgen-32bit-2.0.so
> #1  <signal handler called>
> #2  0xb6c866c0 in __memcpy_base () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libc.so
> #3  0xb6cbb884 in je_arena_ralloc () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libc.so
> #4  0xb6cc6a10 in je_realloc () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libc.so
> #5  0xb38cbaa0 in ?? () from /Volumes/Seagate4TB/work/bxc-27408/Test/gdb-symbols/libmonosgen-32bit-2.0.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 23475]
> 0xb37232f8 in mono_create_jump_trampoline (Cannot access memory at address 0x0
> domain=0xb6cedeac <usual>, method=0x1a, add_sync_wrapper=-1667502368)
>     at /Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c:1436
> 1436	/Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c: No such file or directory.
> (gdb) bt
> #0  0xb37232f8 in mono_create_jump_trampoline (domain=0xb6cedeac <usual>, method=0x1a, add_sync_wrapper=-1667502368)
>     at /Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c:1436
> #1  0xa97b1300 in ?? ()
> Cannot access memory at address 0x0

> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 25276]
> 0xb37232f8 in mono_create_jump_trampoline (domain=0x9a4fefdc, domain@entry=<error reading variable: Cannot access memory at address 0xfffffff2>, method=0x12, 
>     method@entry=<error reading variable: Cannot access memory at address 0xfffffff2>, add_sync_wrapper=-1706039560, 
>     add_sync_wrapper@entry=<error reading variable: Cannot access memory at address 0xfffffff2>)
>     at /Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c:1436
> 1436	/Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c: No such file or directory.
> (gdb) bt
> #0  0xb37232f8 in mono_create_jump_trampoline (domain=0x9a4fefdc, domain@entry=<error reading variable: Cannot access memory at address 0xfffffff2>, method=0x12, 
>     method@entry=<error reading variable: Cannot access memory at address 0xfffffff2>, add_sync_wrapper=-1706039560, 
>     add_sync_wrapper@entry=<error reading variable: Cannot access memory at address 0xfffffff2>)
>     at /Users/builder/data/lanes/1196/65564e92/source/mono/mono/mini/mini-trampolines.c:1436
> Cannot access memory at address 0xfffffff2
Comment 1 Jonathan Pryor 2016-04-08 15:24:50 UTC
Ideally, once the SIGSEGV crash is fixed we can use the Bug #27408 repro to help track down Bug #40136...

We can be that lucky, right?
Comment 2 Jonathan Pryor 2016-04-08 19:24:07 UTC
Update: PeterC isn't able to repro this on Cycle7, but he *is* able to repro this with monodroid/master 392c71cdc, which uses Mono 4.4, on a Nexus 5.
Comment 3 Zoltan Varga 2016-04-12 00:41:10 UTC
I can reproduce a crash with this monodroid/master build:
https://wrench.internalx.com/Wrench/ViewLane.aspx?lane_id=1196&host_id=163&revision_id=746758

It takes a lot of time to happen.
Comment 4 Zoltan Varga 2016-04-12 03:09:11 UTC
Some findings:
- The crash seems to happen on Thread 4, which is one of the threads started by the app.
- The stacktrace is the following:
Thread 4 (Thread 30401):
Cannot access memory at address 0x1
#0  mono_create_jump_trampoline (domain=0x1, method=0x9b4ff154, add_sync_wrapper=-1689260648)
    at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/mini/mini-trampolines.c:1436
#1  0x9c820ac0 in ?? ()
Cannot access memory at address 0x1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The crash happens because some code seems to jump into the middle of this function:
=> 0xa95172f0 <mono_create_jump_trampoline+308>:	ldr	r0, [sp, #12]
   0xa95172f4 <mono_create_jump_trampoline+312>:	ldr	r1, [sp, #8]
   0xa95172f8 <mono_create_jump_trampoline+316>:	str	r1, [r0, #16]
   0xa95172fc <mono_create_jump_trampoline+320>:	ldr	r0, [sp, #12]

which means the arguments etc. are bogus, so the crash happens at +316. lr is also bogus, so it's hard to determine what code jumps here.
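
As a cross-check on the addresses above, here is a minimal Python sketch of the offset arithmetic. The helper names (function_base, page_offset) are made up for illustration, and the 4 KiB page size and page-aligned library load address are assumptions. It recovers the function's start address from the disassembly and compares the faulting instruction's offset within its page against the pc 0xb37232f8 reported in the description.

```python
def function_base(pc, offset):
    """Start address of a function, given a pc and its <symbol+offset> value."""
    return pc - offset


def page_offset(addr, page_size=4096):
    """Offset of an address within its page (4 KiB pages are an assumption)."""
    return addr % page_size


# Addresses copied from the gdb output in this bug.
store_pc = 0xA95172F8      # str r1, [r0, #16] at mono_create_jump_trampoline+316
first_insn_pc = 0xA95172F0  # ldr r0, [sp, #12] at mono_create_jump_trampoline+308
reporter_pc = 0xB37232F8   # faulting pc in the three traces from the description

# Both disassembly lines must agree on where the function starts.
print(hex(function_base(store_pc, 316)))       # 0xa95171bc
print(hex(function_base(first_insn_pc, 308)))  # 0xa95171bc

# Same offset within the page in both sessions.
print(page_offset(store_pc) == page_offset(reporter_pc))  # True
```

Both pcs end in 0x2f8, which is consistent with the description's crashes hitting the same store at +316 under a different load address. This is only suggestive: the two sessions use different builds (65564e92 vs. 6550f72a), so the code layout is not guaranteed to match.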

There always seems to be a thread doing a gc:

Thread 5 (Thread 30402):
#0  0xb6cd45e4 in syscall () from /Users/vargaz/Projects/40239/Test/gdb-symbols/libc.so
#1  0xb6cd956c in sem_wait () from /Users/vargaz/Projects/40239/Test/gdb-symbols/libc.so
#2  0xa963d740 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=<optimized out>) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/utils/mono-os-semaphore.h:163
#3  sgen_wait_for_suspend_ack (count=2) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/metadata/sgen-os-posix.c:188
#4  0xa963d8dc in sgen_thread_handshake (suspend=<optimized out>) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/metadata/sgen-os-posix.c:223
#5  0xa9649ec8 in sgen_client_stop_world (generation=0) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/metadata/sgen-stw.c:233
#6  0xa965c3d0 in sgen_stop_world (generation=0) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/sgen/sgen-gc.c:3198
#7  0xa965bc74 in sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0xa96f0a03 "Nursery full", wait_to_finish=0)
    at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/sgen/sgen-gc.c:2218
#8  0xa9650258 in sgen_alloc_obj_nolock (vtable=0xb36b7070, size=32) at /Users/builder/data/lanes/1196/6550f72a/source/mono/mono/sgen/sgen-alloc.c:291
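
For readers unfamiliar with the frames above: Thread 5 is blocked in sem_wait inside sgen_wait_for_suspend_ack (count=2), i.e. the collector has asked two threads to suspend and is waiting for each one to acknowledge. The following is a minimal Python sketch of that kind of stop-the-world handshake, not Mono's actual code; the function name stop_world_handshake and all variable names are hypothetical, and ordinary threads with events stand in for the signal-based suspension a runtime would use.

```python
import threading


def stop_world_handshake(worker_count):
    """Ask worker_count threads to suspend, then wait for one ack per thread.

    Returns the sorted ids of the workers that acknowledged the suspend.
    """
    ack_sem = threading.Semaphore(0)        # posted once per suspended worker
    suspend_requested = threading.Event()   # stands in for the suspend signal
    resume = threading.Event()              # stands in for the restart signal
    acked = []
    acked_lock = threading.Lock()

    def worker(worker_id):
        suspend_requested.wait()
        with acked_lock:
            acked.append(worker_id)
        ack_sem.release()   # acknowledge the suspend request
        resume.wait()       # stay parked until the world is restarted

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(worker_count)]
    for t in threads:
        t.start()

    suspend_requested.set()
    for _ in range(worker_count):
        ack_sem.acquire()   # the collector blocks here until every ack arrives

    stopped = sorted(acked)  # the world is stopped; a collection would run here

    resume.set()
    for t in threads:
        t.join()
    return stopped


print(stop_world_handshake(2))  # [0, 1]
```

In this pattern the collector cannot make progress until every thread has acknowledged, which is why a thread that never reaches its ack leaves the collecting thread parked in the semaphore wait.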
Comment 5 Zoltan Varga 2016-04-12 03:14:46 UTC
mono e3b4f547ec75ceb4113f07ef15428737c304deea could be related; it's not in mono 4.4.
Comment 6 Jonathan Pryor 2016-04-12 18:53:09 UTC
> its not in mono 4.4.

That part confuses me, as monodroid/master is using mono-4.4.0-branch/1025cb85, not some mono/master commit.

Whatever is causing the problem is *something* in mono-4.4.0-branch.
Comment 7 Zoltan Varga 2016-04-12 21:25:01 UTC
It's in 4.4, it was a mistake.

We tracked it down; it's a problem with that patch.
Comment 9 Zoltan Varga 2016-04-13 20:30:48 UTC
Fixed in mono-extensions 1f8fb48dac9c349ba150e8c55f52e3150f29ca08.
Comment 10 Peter Collins 2016-04-24 18:17:27 UTC
I was no longer able to reproduce this after letting the same test case from before run for a little over 30 minutes using monodroid/master/3e9342611b8d5ba19b256f4fe543abbed29ef79c