Bug 15524 - segfault when embedding in java and using java threads
Summary: segfault when embedding in java and using java threads
Alias: None
Product: Runtime
Classification: Mono
Component: Interop
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Depends on:
Reported: 2013-10-20 17:12 UTC by Chris
Modified: 2017-07-08 03:18 UTC

Is this bug a regression?: ---
Last known good build:

actor.c and actor.cs (3.12 KB, application/x-zip-compressed)
2013-10-20 17:12 UTC, Chris

Description Chris 2013-10-20 17:12:31 UTC
Created attachment 5186
actor.c and actor.cs

I can reproduce this on 2.10 and master.

I'm delaying adding a complete test case until I know it will actually be useful.

The test environment is JRuby running Akka. C function calls go through the JRuby FFI.

I am calling into mono from various threads in java. The application uses Akka, so this originally became an issue with java threads running under the fork/join framework. I tried various thread dispatchers in Akka, including pinned dispatchers, but that didn't make a difference. I then tried running the code under java thread pools and executors I created myself, and it still failed. Interestingly enough, the only multithreaded version that works is when I create a group of threads manually, with one mono object being accessed from each thread. The other version that works is when there is just one thread and one mono object.

Lowering the concurrency makes it take longer to trigger.  It usually takes at least a few thousand calls before I get a segfault.  

I've used several techniques for doing this while trying to narrow down the issue.  I started with calling into C from java and getting a gchandle, then calling methods on the object the gchandle references.  Then I tried not using gchandles, but sticking the mono object into an array in C# so it wouldn't get collected.  Same errors. Then I moved to just calling mono via static methods and letting mono handle the object instantiation.  All of these variations produce the same errors.

So on to the code.  Forgive how messy it is, I've gone through so many iterations trying to track this down that it's pretty bad.

The ReceiveMessage static method in actor.cs is the latest test code I'm using. The first argument to ReceiveMessage is a string containing the java thread id concatenated with the actor name. This is how I am pinning java threads to mono object instances. The calling code currently attaches the thread to mono on every call. I've tried being strict about calling attach just once per thread, and being more liberal; neither seems to make a difference.
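
For reference, the "attach exactly once per native thread" variant can be sketched roughly as below. This is only a sketch under assumptions: runtime_attach_current_thread is a hypothetical stand-in (in a real host it would presumably wrap mono_thread_attach on the root domain), on_receive_sketch stands in for on_receive2, and run_attach_demo is just a small harness. None of these names come from actor.c.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the real attach call. In an embedding host this
 * would presumably be mono_thread_attach(mono_get_root_domain()). Here it
 * only counts how often it is called. */
static atomic_int attach_calls;

static void runtime_attach_current_thread(void)
{
    atomic_fetch_add(&attach_calls, 1);
}

/* One flag per native thread; zero-initialized for every new thread. */
static _Thread_local int attached;

/* Call at the top of every entry point reached from a foreign thread. */
static void ensure_attached(void)
{
    if (!attached) {
        runtime_attach_current_thread();
        attached = 1;
    }
}

/* Hypothetical entry point, standing in for on_receive2 in actor.c. */
static void on_receive_sketch(void)
{
    ensure_attached();
    /* ... invoke the managed method here ... */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        on_receive_sketch();
    return NULL;
}

/* Runs up to 16 worker threads and returns the number of attach calls. */
static int run_attach_demo(int nthreads)
{
    pthread_t threads[16];
    if (nthreads > 16)
        nthreads = 16;
    atomic_store(&attach_calls, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&attach_calls);
}
```

With this shape, every thread attaches on its first call and skips the attach on the remaining 999 calls, so the attach count equals the number of threads regardless of which pool or dispatcher created them.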

The C method that calls this is on_receive2 in actor.c.

Most often I get a segfault when the MemoryStream is reading bytes.  I see the following in the stack trace:
Thread 5 (Thread 0x7fdd3adfc700 (LWP 32405)):
#0  0x00007fddc48cb4b7 in __libc_waitpid (pid=pid@entry=32409, stat_loc=stat_loc@entry=0x7fdd3adf9cec, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00007fddabb1667c in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0x7fdd3adfa780) at mini-exceptions.c:2377
#2  0x00007fddaba80e87 in mono_sigsegv_signal_handler (_dummy=11, info=0x7fdd3adfa8b0, context=0x7fdd3adfa780) at mini.c:6640
#3  0x00007fddc3c0b167 in call_chained_handler (context=0x7fdd3adfa780, siginfo=0x7fdd3adfa8b0, sig=11, actp=<optimized out>) at /build/buildd/openjdk-7-7u25-2.3.12/build/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:3791
#4  os::Linux::chained_handler (sig=sig@entry=11, siginfo=siginfo@entry=0x7fdd3adfa8b0, context=context@entry=0x7fdd3adfa780) at /build/buildd/openjdk-7-7u25-2.3.12/build/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:3809
#5  0x00007fddc3c0e807 in JVM_handle_linux_signal (sig=11, info=0x7fdd3adfa8b0, ucVoid=0x7fdd3adfa780, abort_if_unrecognized=<optimized out>) at /build/buildd/openjdk-7-7u25-2.3.12/build/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:508
#6  <signal handler called>
#7  mono_array_get_byte_length (array=0x7fddb001e550) at icall.c:6115
#8  ves_icall_System_Buffer_BlockCopyInternal (src=0x7fddb001e550, src_offset=<optimized out>, dest=<optimized out>, dest_offset=<optimized out>, count=<optimized out>) at icall.c:6186

However, I think this is just because it's the hot path.  The underlying issue looks like memory getting stomped on somehow.  In some tests I caught the call Buffer.ByteLength (bytes) succeeding, but a few lines later it segfaulted with the above stack trace while calling the same method via protocol buffer decoding, which seems to imply the memory changed in between.  Other times it fails on Buffer.ByteLength (bytes) at the top of ReceiveMessage, with the same stack trace.

This is the most common top part of the stack trace; it's what I see 90% of the time:
at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Buffer.BlockCopyInternal (System.Array,int,System.Array,int,int) <0xffffffff>
  at System.Buffer.BlockCopy (System.Array,int,System.Array,int,int) <0x0006b>
  at ProtoBuf.Helpers.BlockCopy (byte[],int,byte[],int,int) <0x00023>
  at ProtoBuf.BufferPool.ResizeAndFlushLeft (byte[]&,int,int,int) <0x0006b>
  at ProtoBuf.ProtoReader.Ensure (int,bool) <0x0007b>
  at ProtoBuf.ProtoReader.TryReadUInt32VariantWithoutMoving (bool,uint&) <0x00043>
  at ProtoBuf.ProtoReader.ReadUInt32Variant (bool) <0x0002b>
  at ProtoBuf.ProtoReader.ReadString () <0x00037>
  at (wrapper dynamic-method) com.game_machine.entity_system.generated.Entity.proto_2 (object,ProtoBuf.ProtoReader) <0x00696>
  at ProtoBuf.Serializers.CompiledSerializer.ProtoBuf.Serializers.IProtoSerializer.Read (object,ProtoBuf.ProtoReader) <0x0003f>
  at ProtoBuf.Meta.RuntimeTypeModel.Deserialize (int,object,ProtoBuf.ProtoReader) <0x00150>
  at ProtoBuf.Meta.TypeModel.DeserializeCore (ProtoBuf.ProtoReader,System.Type,object,bool) <0x00064>
  at ProtoBuf.Meta.TypeModel.Deserialize (System.IO.Stream,object,System.Type,ProtoBuf.SerializationContext) <0x0009b>
  at ProtoBuf.Meta.TypeModel.Deserialize (System.IO.Stream,object,System.Type) <0x0001f>
  at ProtoBuf.Serializer.Deserialize<T> (System.IO.Stream) <0x00043>
  at GameMachine.Actor.ByteArrayToEntity (byte[]) <0x00047>
  at GameMachine.TestActor.OnReceive (object) <0x0006b>
  at GameMachine.Actor.ReceiveMessage (string,string,string,byte[]) <0x000a5>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object_object_object_object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

	/home2/chris/mono_local/lib/libmonoboehm-2.0.so.1(+0xd4557) [0x7fddabb16557]
	/home2/chris/mono_local/lib/libmonoboehm-2.0.so.1(+0x3ee87) [0x7fddaba80e87]
	/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x70f167) [0x7fddc3c0b167]
	/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x177) [0x7fddc3c0e807]
	/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0) [0x7fddc4840ff0]
	/home2/chris/mono_local/lib/libmonoboehm-2.0.so.1(+0x16b464) [0x7fddabbad464]

Right now I'm looking for what's most useful to submit to track this down.  It's going to take some time to extract everything out into a clean test case, so I want to make sure if I do that, it's going to be useful information.

Comment 1 Chris 2013-10-20 17:52:41 UTC
Hmm, I think this has something to do with how I am creating the mono byte array in C.

If I have this at the top of ReceiveMessage, it fails; it just takes a while.

Buffer.ByteLength (bytes);

If I remove the byte[] argument to ReceiveMessage and don't pass the byte array, and have the following at the start of the method, it works.

byte[] b1 = System.Text.Encoding.UTF8.GetBytes ("TEST^&%$#");
Buffer.ByteLength (b1);

Comment 2 Chris 2013-10-20 20:12:15 UTC
No, it's not how I'm setting the array values.  I get the same errors with just a newly created MonoArray with no values set.

It keeps failing here at line 6115 in icall.c:

klass = array->obj.vtable->klass;

Comment 3 Rodrigo Kumpera 2014-01-08 21:37:19 UTC
Can you provide a test case that doesn't require a JVM?

Comment 4 Ludovic Henry 2017-07-08 03:18:58 UTC
Can you still reproduce with the latest mono? If so, feel free to reopen, and please provide a repro case. Thank you.