Bug 28398

Summary: * Assertion: should not be reached at class.c:6405
Product: [Mono] Runtime
Reporter: Rolf Bjarne Kvinge [MSFT] <rolf>
Component: General
Assignee: Rolf Bjarne Kvinge [MSFT] <rolf>
Status: RESOLVED FIXED
Severity: major
CC: adrian.murphy, aed, ashley.gazich, brendan.zagaeski, chris.hamons, cody.beyer, gouri.kumari, jon.goldberger, kumpera, miguel, mono-bugs+mono, mono-bugs+runtime, prchol, rolf, tchristeaan
Priority: High
Version: 3.12.0
Target Milestone: ---
Hardware: PC
OS: Mac OS
Tags:
Is this bug a regression?: ---
Last known good build:

Description Rolf Bjarne Kvinge [MSFT] 2015-03-25 04:40:03 UTC
Test case:
* Check out and build github.com/xamarin/maccore
* cd maccore/tests
* while true; do git clean -xfd; make -C activation -j10 dependencies-for-tests; done

Let it run until it crashes; it usually takes a couple of minutes on my machine.

Terminal crash output: https://gist.github.com/rolfbjarne/6fe3b8dee78f619ff0e0
Crash report: https://gist.github.com/rolfbjarne/6690f8b7c514e70cecf2

$ mono --version
Mono JIT compiler version 3.12.1 ((detached/b7764aa Fri Mar  6 15:32:47 EST 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/5486eb2)
	GC:            sgen
Comment 1 Rodrigo Kumpera 2015-04-02 12:09:41 UTC
I can't repro that on 4.0 after running it for an hour.
Comment 2 Rolf Bjarne Kvinge [MSFT] 2015-04-29 05:01:49 UTC
I just ran into this again, with 4.0:

> mono --version
Mono JIT compiler version 4.0.0 ((detached/d136b79 Mon Apr 13 14:40:59 EDT 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
Comment 3 Rolf Bjarne Kvinge [MSFT] 2015-04-29 05:02:13 UTC
*** Bug 28140 has been marked as a duplicate of this bug. ***
Comment 4 Rolf Bjarne Kvinge [MSFT] 2015-04-29 05:02:57 UTC
I'll see if I can come up with a better test case.
Comment 5 Rolf Bjarne Kvinge [MSFT] 2015-05-11 10:01:15 UTC
The problem is here:

https://github.com/mono/mono/blob/mono-4.0.0-branch/mono/metadata/class.c#L3619

> 		class->interfaces_packed = mono_class_alloc (class, sizeof (MonoClass*) * interface_offsets_count);

This is called twice for the same class, and there don't seem to be any locks around (at least) the second call. This means code in other threads may read this array before it's been filled in with the correct values.
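
The pattern described above can be sketched in isolation. This is a minimal illustration only, not the actual Mono code: ToyClass, toy_setup_interface_offsets and the use of calloc are hypothetical stand-ins for MonoClass, the setup function and mono_class_alloc. It shows that each call publishes a freshly allocated array in the class before the slots are filled in, so a second call replaces a valid array with one that is briefly empty.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MonoClass; only the fields needed here. */
typedef struct {
    void **interfaces_packed;
    int interface_offsets_count;
} ToyClass;

/* Mimics the reported pattern: every call allocates a new array and
 * stores it in the class BEFORE the slots are filled in. The previous
 * array is never freed here, so a second call always yields a
 * different pointer. Assumes count > 0. */
static void
toy_setup_interface_offsets (ToyClass *klass, void **ifaces, int count)
{
    klass->interface_offsets_count = count;
    klass->interfaces_packed = calloc (count, sizeof (void *));
    assert (klass->interfaces_packed != NULL);

    /* Window: another thread reading interfaces_packed at this point
     * would see an array of NULL slots. */
    for (int i = 0; i < count; i++)
        klass->interfaces_packed [i] = ifaces [i];
}
```

Calling toy_setup_interface_offsets twice on the same ToyClass leaves the contents identical but changes the interfaces_packed pointer, which is the kind of change a concurrent reader could trip over.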

A bit of debug spew will show this easily:

1. Apply https://gist.github.com/rolfbjarne/42a290f0d31806e70a71 (I'm using mono-4.0.0-branch)
2. Build & run https://gist.github.com/rolfbjarne/3d1efa519eba2b6f8cf7

Output:

> Mono.Security.Protocol.Tls.Handshake.HandshakeType
> LOG (pid: 54888 tid: 0xa0bd31d4): setup_interface_offsets HandshakeType interface_offsets_count: 3 interfaces_packed: 0x0 START
> LOG (pid: 54888 tid: 0xa0bd31d4): setup_interface_offsets HandshakeType interface_offsets_count: 3 #1: 0x7b980f80 #2: 0x7b981028 #3: 0x7b9814c0 interfaces_packed: 0x7c962758 DONE
> LOG (pid: 54888 tid: 0xa0bd31d4): setup_interface_offsets HandshakeType interface_offsets_count: 3 interfaces_packed: 0x7c962758 START
> LOG (pid: 54888 tid: 0xa0bd31d4): setup_interface_offsets HandshakeType interface_offsets_count: 3 #1: 0x7b980f80 #2: 0x7b981028 #3: 0x7b9814c0 interfaces_packed: 0x7c962778 DONE
Comment 9 Miguel de Icaza [MSFT] 2015-05-13 21:19:11 UTC
Zoltan, can you take a look at this?

Rolf has a good theory of what the race condition is.
Comment 10 Zoltan Varga 2015-05-13 21:39:15 UTC
@rolf: What class does this fail for? The comment for the function says it needs to be called with the loader lock held; which code path doesn't take it?
Comment 11 Rodrigo Kumpera 2015-05-14 21:20:47 UTC
It's not a race condition, as I can repro it under the same lock.

It's bad state checking.
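
For illustration, a state check that makes a setup routine safe to call more than once might look like the sketch below. This is an assumption about the general technique, not a description of what the actual fix changed: ToyClass, the interface_offsets_ready flag and toy_setup_once are hypothetical names, and calloc stands in for the runtime's allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MonoClass with an explicit "done" flag. */
typedef struct {
    void **interfaces_packed;
    int interface_offsets_count;
    int interface_offsets_ready;   /* hypothetical state flag */
} ToyClass;

/* Returns 1 if this call performed the setup, 0 if the class was
 * already set up and nothing was touched. Assumes count > 0. */
static int
toy_setup_once (ToyClass *klass, void **ifaces, int count)
{
    if (klass->interface_offsets_ready)
        return 0;                  /* state check: do not reallocate */

    void **packed = calloc (count, sizeof (void *));
    assert (packed != NULL);
    for (int i = 0; i < count; i++)
        packed [i] = ifaces [i];

    /* Store the array in the class only after it is fully filled in. */
    klass->interfaces_packed = packed;
    klass->interface_offsets_count = count;
    klass->interface_offsets_ready = 1;
    return 1;
}
```

With a check like this, a repeated call leaves interfaces_packed untouched, so the pointer stays stable after the first setup.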
Comment 12 Rodrigo Kumpera 2015-05-14 21:26:45 UTC
Fixed. Testing it locally.
Comment 13 Rodrigo Kumpera 2015-05-14 22:04:01 UTC
PR made: https://github.com/mono/mono/pull/1809
Comment 14 Rolf Bjarne Kvinge [MSFT] 2015-06-17 02:26:58 UTC
*** Bug 30859 has been marked as a duplicate of this bug. ***
Comment 15 Rolf Bjarne Kvinge [MSFT] 2015-06-17 02:27:49 UTC
Looks like this is still happening as of Mono 4.0.1 (detached/ed1d3ec) - see bug 30859 comment 3.

I'll try to get a new test case.
Comment 16 Rodrigo Kumpera 2015-06-19 18:48:41 UTC
The above PR is master-only AFAICT; maybe we should add it to C5/C6.
Comment 17 Rolf Bjarne Kvinge [MSFT] 2015-06-22 03:42:23 UTC
You're right, the fix is not in mono/ed1d3ec (I had actually checked, but doing the same check now shows it's not there, so I must have done something wrong).
Comment 18 GouriKumari 2015-06-25 11:34:44 UTC
@rolf: Is this fix now included with mono in maccore/master?
Comment 19 Rolf Bjarne Kvinge [MSFT] 2015-06-26 07:49:34 UTC
@Gouri: this is not a problem in maccore, it's a problem with the system mono. AFAIK only Mono 4.2 has this fix (Mono 4.0 in particular does not).
Comment 20 Rolf Bjarne Kvinge [MSFT] 2015-08-06 05:51:27 UTC
*** Bug 30487 has been marked as a duplicate of this bug. ***
Comment 21 Rolf Bjarne Kvinge [MSFT] 2015-10-13 05:58:15 UTC
*** Bug 34797 has been marked as a duplicate of this bug. ***
Comment 22 Rolf Bjarne Kvinge [MSFT] 2015-11-11 08:08:27 UTC
*** Bug 35669 has been marked as a duplicate of this bug. ***
Comment 23 Rolf Bjarne Kvinge [MSFT] 2015-11-11 08:08:43 UTC
*** Bug 35768 has been marked as a duplicate of this bug. ***