Bug 59182 - Crash/SIGSEGV with interop callback with struct parameter from native to managed on Xamarin.Android on armv7
Summary: Crash/SIGSEGV with interop callback with struct parameter from native to mana...
Status: IN_PROGRESS
Alias: None
Product: Runtime
Classification: Mono
Component: Interop (show other bugs)
Version: 5.2 (2017-04)
Hardware: PC Windows
: --- normal
Target Milestone: ---
Assignee: Zoltan Varga
URL:
Depends on:
Blocks:
 
Reported: 2017-09-01 16:24 UTC by floriang
Modified: 2017-10-05 12:49 UTC (History)
5 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Sample to reproduce the crash (163.41 KB, application/zip)
2017-09-01 16:24 UTC, floriang
Details
Sample; including windows sample project that shows this kind of interop works correctly in windows. (196.56 KB, application/zip)
2017-09-04 14:19 UTC, floriang
Details

Description floriang 2017-09-01 16:24:24 UTC
Created attachment 24525 [details]
Sample to reproduce the crash

Hello,

I get a crash with SIGSEGV in a very specific callback case of a native library doing a callback to a managed delegate and I could reproduce it in the attached sample.

There are other simpler cases in which it works even with a structure, but with one specific case it just crashes.

Please read carefully the explanations below.

The sample is very simple:
There is a native library (libcallback.so), for which source code can be found in callback.c.

This library implements three callbacks with different parameters as described below:

"callback": this is the one that crashes, it uses a complex structure with const sized array fields.
"other_callback": this is another example of callback with a structure pointer parameter. This one is a little simpler and it works just fine.
"func": this is a callback with a very simple parameter.

Then the C# code creates 3 button (CALLBACK, OTHER CB and FUNC CB).
Each button click will raise an indirect (asynchrone) call to the callback corresponding to its name.

So when a button is pressed a call to the native library is made.
The native library creates a new thread and the thread makes a callback to the managed code.
When callbacked the managed code then outputs a log message corresponding to the callback that was made.

When using this sample, the "other_callback" and "func" callbacks work fine.
However "callback" will crash (tested in Xamarin.Android 7.2 and 7.4).

Important: this same project WORKS FINE when compiled with Xamarin 6.0, or on Windows (c Code must be adapted to export the functions and not use pthread on windows but if we do this it works).

I know that the callback argument doesn't fully correspond to the C structure, but even though I consider that it should work because:
1) The C structure is larger so even if mono makes an internal copy it should be able to truncate the data it needs without access violation.
2) This project works fine if build using Xamarin.Android 6.0.
3) This project works fine on Windows.

I also noted that if I comment out the last 4 fields of the EST_E_DATA structure in C#, then Callback will be called without crash (see example below).

So questions:
- Why does it crash when built with Xamarin 7.2 and 7.4?
- Why doesn't it crash if I remove the array declarations?
- Why doesn't it crash on Windows or when built with Xamarin 6.0?

Below is the example of removing the arrays declaration for EST_E_DATA structure:
        public struct EST_E_DATA
        {
            public Int16 s;
            public Int16 v;
            public UInt32 p;
            public UInt32 e;
            public Int32 l;
            public Int32 ll;
            public UInt16 h;
            public Int16 r;
            public Int16 pp;
            public Int32 hh;
            public Int32 bn;
            public Int32 dn;
            public Int32 dr;
            public Int32 sh;
            public Int32 ra;
        }

Thank you in advance and best regards.
Comment 1 floriang 2017-09-04 07:47:26 UTC
Sorry I forgot to add the error message. 
It is not really helpful but crash is as follow:

================================================================= 01-01 14:33:35.343: E/mono-rt(4886): Got a SIGSEGV while executing native code. This usually indicates 01-01 14:33:35.343: E/mono-rt(4886): a fatal error in the mono runtime or one of the native libraries 01-01 14:33:35.343: E/mono-rt(4886): used by your application. 01-01 14:33:35.343: E/mono-rt(4886): ================================================================= 01-01 14:33:35.343: A/libc(4886): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 4914 (ec.callbacktest)

01-01 14:33:35.443: I/DEBUG(2161): Revision: '405522' 01-01 14:33:35.443: I/DEBUG(2161): pid: 4886, tid: 4914, name: ec.callbacktest >>> fec.callbacktest <<< 01-01 14:33:35.443: I/DEBUG(2161): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000 01-01 14:33:35.583: I/DEBUG(2161): r0 00000000 r1 00001d50 r2 0000000f r3 00000012 01-01 14:33:35.583: I/DEBUG(2161): r4 76affd08 r5 59b20188 r6 63c884a4 r7 63cbe454 01-01 14:33:35.583: I/DEBUG(2161): r8 63cbaf05 r9 00100000 sl 00000000 fp 76affdd8 01-01 14:33:35.583: I/DEBUG(2161): ip 68925364 sp 76affd08 lr 40cbf084 pc 40cbf088 cpsr 80000010
Comment 2 floriang 2017-09-04 14:19:45 UTC
Created attachment 24551 [details]
Sample; including windows sample project that shows this kind of interop works correctly in windows.
Comment 3 floriang 2017-09-05 13:30:00 UTC
Hello again,

I noted something strange/interesting that may help understanding what is going on:

In the declaration of EST_E_DATA, if the total size of all fields above the array is less than 28 bytes the program won't crash.
Then if there is more than 28 bytes it will crash.

Example of declaration that works:
        public struct EST_E_DATA
        {
            public Int32 s;
            public Int32 v;
            public Int32 p;
            public Int32 e;
            public Int32 l;
            public Int32 ll;
            public Int32 h;

            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 1024)]
            public byte[] echo;
        }

Example of declaration that crashes:
        public struct EST_E_DATA
        {
            public Int32 s;
            public Int32 v;
            public Int32 p;
            public Int32 e;
            public Int32 l;
            public Int32 ll;
            public Int32 h;
            public Int32 hh;

            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 1024)]
            public byte[] echo;
        }

Crazy isn't it???
Comment 4 Ludovic Henry 2017-09-21 23:36:55 UTC
I can reproduce on armv7 simulator with XA 7.5.0.15 which is Mono 5.2 (2017-04/54108f9522e)
Comment 5 Zoltan Varga 2017-09-26 16:14:32 UTC
Couldn't reproduce on linux/arm with a similar testcase seems to be an android only problem.
Comment 6 Zoltan Varga 2017-10-01 23:20:33 UTC
This happens because we emit a call to memset in the method prolog before we attach to the runtime, and the call is done on an unattached thread.
Comment 7 Zoltan Varga 2017-10-02 18:39:41 UTC
https://github.com/mono/mono/pull/5678
Comment 8 Zoltan Varga 2017-10-05 12:17:44 UTC
Should be fixed by mono master 012f8610237dfe44034c9d8daea0b90fac75ca9f. Thanks for the testcase.
Comment 9 floriang 2017-10-05 12:49:09 UTC
You're welcome.
Good to see it's going to be fixed (but too bad for us we already implemented a workaround).

Note You need to log in before you can comment on or make changes to this bug.