Bug 19379

Summary: iPhoneOSGameView deadlocks when Apple's Instruments is used
Product: iOS
Reporter: manuel
Component: Mono runtime / AOT compiler
Assignee: Zoltan Varga <vargaz>
Status: CONFIRMED
Severity: normal
CC: aemond, alex.soto, ccarlile, chris.hamons, cody.beyer, cyril, ddunkin, dj_technohead, ggirard, gouri.kumari, horn, instriker, kd, kumpera, lexas, matt, mono-bugs+monodevelop, mono-bugs+monotouch, rolf, sebastien, stephan, todd.diehl, udhams
Priority: Normal
Version: 7.2.3
Target Milestone: Future Cycle (TBD)
Hardware: Macintosh
OS: Mac OS
Tags:
Is this bug a regression?: ---
Last known good build:
Attachments: solution to reproduce the issue

Description manuel 2014-04-29 12:38:22 UTC
Description of Problem:

In our application we use iPhoneOSGameView. The application seems to have a memory leak that we are not able to find. We tried Apple's Instruments to assess memory, but it crashes the application as soon as we load "EAGLView" from the "OpenGL Application" template. This prevents us from using Instruments and leaves us unsure whether the leak is ours or MonoTouch's.
Similar behaviour (or at least it seems so) can be seen in the attached project.

Note: this was initially raised in

http://forums.xamarin.com/discussion/15802/iphoneosgameview-and-memory#latest



Steps to reproduce the problem:
1. Compile attached project
2. Execute the app in simulator
3. Link Apple's Instruments
4. Press button "Load"




How often does this happen? 

Always
Comment 1 manuel 2014-04-29 12:44:07 UTC
Created attachment 6676 [details]
solution to reproduce the issue
Comment 2 Sebastien Pouliot 2014-04-29 16:00:29 UTC
It does not fail for me (using master).

Can you give us all the version information, including the software you use [1] and the iOS version (simulator and/or devices) ?

Also what do you mean by "link" in:

> 3. Link Apple's Instruments

and which "instruments" did you select?

I used Leaks (which loads Allocations and Leaks instruments) and attached to the simulator (iOS 7.1) `teste2` process and then clicked Load.


[1] The easiest way to get exact version information is to use the "Xamarin Studio" menu, "About Xamarin Studio" item, "Show Details" button and copy/paste the version information (you can use the "Copy Information" button).
Comment 3 manuel 2014-04-30 05:27:01 UTC
That is great news!!

I forgot to mention that after attaching Instruments to the running process we pressed "Record", and it happens with all three templates: "Allocations", "Leaks" and "Zombies".


iOS simulator
Version 7.1 (463.9.41)

=== Xamarin Studio ===

Version 4.2.3 (build 60)
Installation UUID: 5f421f3f-40c4-4caa-8ae3-df7a8094edf4
Runtime:
	Mono 3.2.6 ((no/9b58377)
	GTK+ 2.24.23 theme: Raleigh
	GTK# (2.12.0.0)
	Package version: 302060000

=== Apple Developer Tools ===

Xcode 5.1.1 (5085)
Build 5B1008

=== Xamarin.Mac ===

Xamarin.Mac: Not Installed

=== Xamarin.Android ===

Version: 4.12.3 (Starter Edition)
Android SDK: /Users/nativolabs/Library/Developer/Xamarin/android-sdk-mac_x86
	Supported Android versions:
		2.1   (API level 7)
		2.2   (API level 8)
		2.3   (API level 10)
		3.1   (API level 12)
		4.0   (API level 14)
		4.0.3 (API level 15)
Java SDK: /usr
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

=== Xamarin.iOS ===

Version: 7.2.0.2 (Business Edition)
Hash: 58c3efa
Branch: 
Build date: 2014-10-03 18:02:26-0400

=== Build Information ===

Release ID: 402030060
Git revision: 30c4afc300c2a39ec5300851357ce02e49dd217e
Build date: 2014-03-05 22:09:33+0000
Xamarin addins: f8a9589b57c2bfab2ccd73c880e7ad81e3ecf044

=== Operating System ===

Mac OS X 10.9.2
Darwin nativos-mbp.home 13.1.0 Darwin Kernel Version 13.1.0
    Wed Apr  2 23:52:02 PDT 2014
    root:xnu-2422.92.1~2/RELEASE_X86_64 x86_64
Comment 4 manuel 2014-04-30 06:37:32 UTC
We've just noticed that a new update was available, so we updated Xamarin.iOS.


Xamarin.iOS
Version: 7.2.1.42 (Business Edition)
Hash: 773c77c
Branch: 
Build date: 2014-04-18 15:39:16-0400

Now the sample works, which is a big step forward. Our app also shows different behaviour. Now it loads iPhoneOSGameView but it hangs afterwards. In fact, execution blocks as a consequence of some user action, such as clicking a button that triggers a view (iPhoneOSGameView and others).
It does not always block in the same place, though, which makes it very difficult to find a chain of events that reproduces this.
Comment 5 Sebastien Pouliot 2014-04-30 08:30:05 UTC
I'm glad to hear Instruments now works correctly. Now we'll need more information or a test case to diagnose this new problem. E.g.

1. does it happen only on the simulator or on devices too ?

2. once hung can you attach `lldb` to the process (the process ID is printed in the application output) and request a backtrace of every thread ?
Comment 6 manuel 2014-04-30 09:39:51 UTC
1. It also happens on devices
2. Sebastien, if possible, can you suggest a tutorial? I have never used lldb before.

Another note:
We tried to follow http://docs.xamarin.com/guides/ios/deployment,_testing,_and_metrics/walkthrough_Apples_instrument/
but we could not map the stack traces to C# code, either on the device or in the simulator.
Comment 7 Sebastien Pouliot 2014-04-30 10:34:32 UTC
When you start an application on the simulator the "Application Output" pad will show something like:

Starting iPhone Retina (4-inch) simulator 7.1
Launching application
Application launched. PID = 37692
…

That tells you the process id (PID) of your application running in the simulator. Now use your application until it hangs.


Next, from a terminal window, type:

    lldb -p 37692

That will launch `lldb` and will attach to your process.

Enter the command: 

    bt all

That will show the backtrace of every thread in execution. Attach this output to this bug report.
Comment 9 Sebastien Pouliot 2014-05-01 20:41:43 UTC
At this stage we'll need some help from you to explain the stack traces, i.e. you should be able to map some threads to your code.

IOW you have a snapshot of the hung application but, unlike a crash, there's no single point that draws attention. Most threads are waiting (a bit expected) but one (or some) of them should not.

Did you notice anything unexpected ?


Other questions...

1. I might have misunderstood your previous comment, i.e.

> Now it loads iPhoneOSGameView but it hangs afterwards.

Do you mean it _only_ hangs in Instruments ? or 

Does it hang _without_ instruments ? (i.e. does updating to 7.2.1 make it always hang)


2. You're using the sgen GC. Out of curiosity did you try with the Boehm GC ?


3. Did you try the NRC (NewRefCount) option that was added in 7.2.1 ? 

That will solve two common issues related to memory retention (see release notes). Even if it does not solve (all) issue(s) it could help reduce the memory requirement.
Comment 10 manuel 2014-05-02 10:06:04 UTC
1 - Yes, only with Instruments. It is not restricted to iPhoneOSGameView, though.
2/3 - We tried them all and it happens with all possible checkboxes for GC.

The thread that seems to hang is the main one. The only thing we noticed is that the hang is more likely when UIViewController.BeginInvokeOnMainThread is called.
We will continue testing and get back to you as soon as something new appears.
Comment 11 manuel 2014-05-02 10:18:51 UTC
Note

If we add this for debugging:

			// Requires: using System.Diagnostics; using System.Threading;
			int countPing = 0;

			// Heartbeat: a managed background thread that logs every 5 seconds.
			var t = new Thread (new ThreadStart (delegate {
				while (true) {
					Thread.Sleep (5000);
					Debug.WriteLine ("Ping = " + countPing++);
				}
			}));

			t.IsBackground = true;

			t.Start ();

it also stops writing to system.log as soon as the interface locks up. We are not sure whether that is expected behaviour.
Comment 12 manuel 2014-05-02 10:34:30 UTC
Note 2

"native threads" do not seem to hang. We use a UIImageView with an image that pans all the time. When the application "stops" on such a screen, these animations keep running.
Comment 13 manuel 2014-05-02 13:36:24 UTC
Note 3

Running in Release mode with the following configuration drastically reduced the frequency of "hanging".

General
SDK Version - Default
Linker behaviour - Don't link
Enable Debugging = false

Advanced
Use LLVM optimising compiler = false
Enable generic value type sharing = true
Use SGen generational garbage collector = false
Comment 14 Sebastien Pouliot 2014-05-02 13:45:59 UTC
> "native threads" do not seem to hang.

It's hard to say. Some of CoreAnimation's work is done out-of-process. It's smoother this way but it also means that you can hang your app and still see some animations.

> Running in Release mode 

In general (90%) you'll want to profile (using Instruments, HeapShot or other tools) in the Release configuration.

The debug configuration adds a lot of code (see the application size) that affects the speed and memory requirements of the application. Those numbers will not reflect real-world usage of your application. IOW take Debug numbers with a grain of salt...
Comment 15 manuel 2014-05-02 14:03:44 UTC
"CoreAnimation work is done out-of-process"

ok, didn't know it

"Release mode"

Sebastien, as you suggested, something is happening with the GC and/or the linker behaviour. As soon as we enable them, either together or individually, the hangs come back. The same happens when debugging is enabled.
For now we'll debug memory with the compiler options described above. However, something is happening that probably needs to be analysed. Any suggestions for deeper testing?

cheers
Comment 16 Rolf Bjarne Kvinge [MSFT] 2014-05-05 11:01:21 UTC
This looks like a bug/race condition in Instruments.

Threads 5, 8, 9, 13 and 14 are all stopped (waiting for a lock) inside injected code from Instruments.
Comment 17 manuel 2014-05-05 11:33:45 UTC
Any idea why setting
 * Linker behaviour to No Link,
 * GC to Boehm GC
 * and debug to false

"solves" the issue? Note: confidence level far below 100%.
For us the most important thing is debugging unmanaged memory, which we are now able to do. We are not using the compiler options we want, though.
Comment 18 Sebastien Pouliot 2014-05-05 12:00:33 UTC
@Manuel, it's hard to say. Each of those options will affect timings, so they could (individually or as a group) avoid a race condition (either in XI or in Instruments).

We'll likely need to get (or find) a test case to duplicate this ourselves to be sure about it.
Comment 19 Rolf Bjarne Kvinge [MSFT] 2014-05-05 12:05:09 UTC
Race conditions are unpredictable, so it might be just that the favourable conditions for the race condition to occur aren't as probable with those settings.

I'll try to see if it's really Instruments' fault or if it's our own, but it may take some time (it's quite difficult to track down race conditions in the first place, and not having the source doesn't make it easier) - I can reproduce this fairly easily in the simulator, so at least that's something.
Comment 20 manuel 2014-05-05 19:54:32 UTC
One more note. We were able to reproduce it with No Link, Boehm GC and no debug. It is simply less likely to happen.
If we find something new that seems relevant we will keep you informed.
cheers
Comment 21 manuel 2014-05-06 19:24:23 UTC
Guys, I'm not sure if it is relevant for this issue, but I forgot to mention that we get several messages in the console saying

failed to suspend thread xxxx, hopefully it is dead

This has been happening for a long time and I simply assumed it was "natural behaviour". This morning my brain was able to read it again.
Comment 22 manuel 2014-06-06 19:25:17 UTC
ping
Comment 23 Rolf Bjarne Kvinge [MSFT] 2014-07-24 10:05:48 UTC
I still have no proof, but the deadlock seems to be related to the GC somehow: in several other stack traces I've seen the GC trying to stop the world when the deadlock occurs.
Comment 24 manuel 2014-09-23 05:33:28 UTC
Rolf

We were just trying Instruments (Xcode 6.0.1) with our app running, and now it seems we are not able to do anything. As soon as we attach it we get a black screen.
Comment 26 Jon Goldberger [MSFT] 2015-03-11 18:58:07 UTC
Bug 27931 might be a dupe of this.

https://bugzilla.xamarin.com/show_bug.cgi?id=27931
Comment 27 Rolf Bjarne Kvinge [MSFT] 2015-03-24 13:39:59 UTC
*** Bug 27931 has been marked as a duplicate of this bug. ***
Comment 28 Rolf Bjarne Kvinge [MSFT] 2015-03-24 13:44:29 UTC
This clearly looks like a bad interaction between the GC and Instruments: https://gist.github.com/rolfbjarne/c916c5fa25b9d201d298

Rodrigo, can you have a look?

A few notes:
* I've had most luck reproducing with the sample project in bug #27931 (in a Debug|Simulator configuration).
* I can't attach lldb/gdb to the profiled process, the attach times out. The stack trace from above was taken using Activity Monitor's "Sample" feature (double-click the process; in the window that shows up there's a "Sample" button in the lower left corner).
Comment 29 Rodrigo Kumpera 2015-03-24 14:01:54 UTC
Yup, nothing to do, restart and try again.

Mono is not POSIX.1 compliant and expects mmap to work in this scenario, which might not happen.
Comment 30 Kostub Deshmukh 2015-08-31 03:02:10 UTC
Any update on this? We can't use Instruments to find memory issues with the app since it consistently deadlocks the app.
Comment 31 Rodrigo Kumpera 2015-08-31 11:30:30 UTC
It's on our roadmap to eventually fix this.
Comment 32 Kostub Deshmukh 2015-08-31 14:55:59 UTC
Can you explain why this is not a priority?

It is impossible to do any memory profiling on a moderately complex Xamarin app, because Instruments deadlocks as soon as you start profiling, and this directly contradicts the documentation on the Xamarin website: 

https://developer.xamarin.com/guides/ios/deployment,_testing,_and_metrics/using_instruments_to_detect_native_leaks_using_markheap/

Given that Xamarin has a fairly complex memory management as it uses both garbage collection and reference counting, it is imperative we have tools to diagnose memory issues.

So please either fix it so that we can use Instruments or provide a work around. If you'd like I can provide you with full stack traces when our app deadlocks with Instruments.
Comment 33 Rodrigo Kumpera 2015-08-31 15:10:51 UTC
Marcos,

Could you look into wrapping the calls to mmap & friends in suspend critical regions?
It would solve this issue until coop lands.
Comment 34 Chase 2015-08-31 17:46:08 UTC
Good to know this is on the roadmap.  We're unable to profile our app because of this issue as well.
Comment 35 manuel 2015-08-31 19:53:01 UTC
Guys

We had this issue for quite some time. Now it seems it is not happening anymore; we only realised this a week ago.

A lot has changed in our project since we first reported this issue and the code is much more complex now. Here is a list of topics that may help us all find a pattern:

1 - OS Version

We did the tests on iOS 8.4.1 (iPhone 6) and iOS 9 beta 4 (iPhone 6+)

2 - Code Reorganization including Application's

We made many changes to the code organization, with a lot of classes moving between projects, and we also renamed our main project, the one containing the iOS application.
Although it does not seem to have anything to do with this topic, we observed some interesting behaviour.

As a first step we reorganized all projects shared between iOS and Android. We use binary serialization to save some user data, and we changed our namespaces without touching the data structures being serialized.

After this reorganization we started to have problems, on iOS 7 only (our minimum target), deserializing data saved with the old code organization. As time was getting short we decided to move on, and after renaming the application the serialization issue simply disappeared.

Given that we suspected we had issues with unmanaged memory (and we had), we decided to try Instruments again and, voilà, it worked!!

By the way, the memory issue was related to PhotoKit accesses in a thread we created ourselves without any NSAutoreleasePool management. Calls like

PHAsset asset;

...

asset.LocalIdentifier

were leaking like hell!! :)
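
For readers hitting the same leak, here is a minimal sketch of the usual remedy: giving a hand-created thread its own autorelease pool. It assumes the Xamarin.iOS Unified API (`Foundation`, `Photos` namespaces); `AssetWorker`, `ReadIdentifiers` and the `onIdentifier` callback are hypothetical names for illustration, not code from the reporter's project.

```csharp
using System;
using System.Threading;
using Foundation;
using Photos;

// Hypothetical helper, for illustration only.
static class AssetWorker
{
    // Reads LocalIdentifier for each asset on a thread we create ourselves.
    public static Thread ReadIdentifiers(PHAsset[] assets, Action<string> onIdentifier)
    {
        var t = new Thread(() =>
        {
            // A thread created by hand has no autorelease pool of its own,
            // so autoreleased native objects pile up unless we provide one.
            // Disposing the pool drains it.
            using (var pool = new NSAutoreleasePool())
            {
                foreach (var asset in assets)
                    onIdentifier(asset.LocalIdentifier);
            }
        });
        t.IsBackground = true;
        t.Start();
        return t;
    }
}
```

For a long-running loop it may be preferable to create the pool inside the loop body, so that native objects are released after each iteration rather than only when the thread finishes.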
Comment 36 Rodrigo Kumpera 2015-09-10 14:56:44 UTC
Fixed in mono/master da47e00abd3509a4a248cfcf8004f0921108238e and mono/4.2 c02307527f76d3f6edb49f6e89b1356287d730d5.
Comment 37 manuel 2015-09-10 16:36:15 UTC
Rodrigo, we all love you! A super virtual cheers!! 

What was the damn threading issue causing this and probably a few other strange errors?
Comment 38 Rodrigo Kumpera 2015-09-11 09:24:40 UTC
The problem is due to Instruments using library interposition to intercept some low-level Unix functions. It then takes a lock in the interception code.

Because Mono's GC suspends threads at arbitrary points in time, it can suspend a thread that holds the Instruments lock and then itself try to call those functions, leading to a deadlock.
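
The mechanism described above can be illustrated with a toy model in plain C#. This is only a sketch under stated assumptions: `interposeLock` stands in for the lock inside the Instruments interposer, thread suspension is simulated by parking the holder on an event, and the timeout exists only so the model terminates. None of these names come from Mono or Instruments.

```csharp
using System;
using System.Threading;

// Toy model of the deadlock; all names here are hypothetical.
static class DeadlockModel
{
    // Returns true if the "collector" could take the interposer lock,
    // false if it timed out (where the real runtime would block forever).
    public static bool CollectorCanTakeLock(bool holderSuspended)
    {
        var interposeLock = new object();
        using (var holderInside = new ManualResetEventSlim(false))
        using (var resume = new ManualResetEventSlim(false))
        {
            var holder = new Thread(() =>
            {
                lock (interposeLock)
                {
                    holderInside.Set();
                    // Simulated suspension: a suspended thread keeps
                    // every lock it already holds.
                    if (holderSuspended)
                        resume.Wait();
                }
            });
            holder.IsBackground = true;
            holder.Start();
            holderInside.Wait();
            if (!holderSuspended)
                holder.Join(); // holder ran to completion and released the lock

            bool taken = false;
            try
            {
                // The collector calls an intercepted function and needs the lock.
                Monitor.TryEnter(interposeLock, TimeSpan.FromMilliseconds(200), ref taken);
            }
            finally
            {
                if (taken)
                    Monitor.Exit(interposeLock);
            }

            resume.Set(); // "resume the world" so the holder can exit
            holder.Join();
            return taken;
        }
    }

    static void Main()
    {
        Console.WriteLine("holder running:   " + CollectorCanTakeLock(false));
        Console.WriteLine("holder suspended: " + CollectorCanTakeLock(true));
    }
}
```

In this model the collector gets the lock when the holder is allowed to run, and fails to get it when the holder is parked while still inside the lock. In the real runtime there is no timeout, and the thread that would resume the holder is the one that is stuck, hence the hang.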
Comment 39 Kostub Deshmukh 2015-09-14 13:50:11 UTC
This is awesome! Thanks for getting this fixed so quickly. Do you know what version of Xamarin.iOS the bug fix will land in?
Comment 40 Sebastien Pouliot 2015-09-14 13:54:36 UTC
This will be part of our C6 milestone, which should be in XI 9.2 (unless a new iOS release requires us to bump the version number earlier).
Comment 41 manuel 2015-10-14 07:23:12 UTC
Hi guys

Our app is again in a state where it is impossible to use Instruments. We tried 9.2.0.84 (alpha channel) and it did not solve the issue. Is this version supposed to include the fix?
Comment 42 dj_technohead 2015-10-16 17:59:25 UTC
I'd just like to point out that I am also seeing this problem with Xcode 7.

=== Xamarin Studio ===

Version 5.9.7 (build 22)
Installation UUID: 7809c2ac-2888-46d7-9014-2f80f46b7620
Runtime:
	Mono 4.0.4 ((detached/d481017)
	GTK+ 2.24.23 (Raleigh theme)

	Package version: 400040004

=== Apple Developer Tools ===

Xcode 7.0.1 (8228)
Build 7A1001

=== Xamarin.iOS ===

Version: 9.0.1.29 (Business Edition)
Hash: 1d27ac2
Branch: master
Build date: 2015-09-25 18:08:44-0400

=== Xamarin.Android ===

Version: 5.1.7.12 (Business Edition)
Android SDK: /Users/dennis/Library/Developer/Xamarin/android-sdk-macosx
	Supported Android versions:
		2.3   (API level 10)
		4.0.3 (API level 15)
		4.4   (API level 19)
		5.0   (API level 21)
Java SDK: /usr
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

=== Build Information ===

Release ID: 509070022
Git revision: 6bd1f169df44ca96addf8a035316c535a4fa46fa
Build date: 2015-09-30 12:30:15-04
Xamarin addins: 1c3e5c0859bdfec0ecd481a57ad6c03bc22f5536

=== Operating System ===

Mac OS X 10.10.5
Darwin Dennis-MacBook-Pro.local 14.5.0 Darwin Kernel Version 14.5.0
    Wed Jul 29 02:26:53 PDT 2015
    root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64
Comment 43 Cyril Cathala 2015-10-28 06:46:16 UTC
Hi,

Same problem here, I've tested stable (Xamarin.iOS 9.0.1.29) and alpha (Xamarin.iOS 9.2.0.84).
We have a release coming really soon and we need to track some memory leaks, this bug is really a blocker.
Comment 44 Rolf Bjarne Kvinge [MSFT] 2015-11-10 08:51:03 UTC
@Rodrigo, I can still reproduce with maccore/master.

Stack traces: https://gist.github.com/rolfbjarne/d8338d760c044c9a4eeb

$ mtouch --version
mtouch 9.3.2.102 (master: eaa5589)
Comment 45 Rodrigo Kumpera 2015-11-10 13:40:33 UTC
You won't always find a thread owning the lock.

This happens due to a kernel limitation on OS X where lock ordering is strictly FIFO, even if the next thread in line is suspended.

There's no way to work around this limitation without changing Instruments internals.
Comment 46 Alexandre Emond 2015-11-13 08:43:11 UTC
We also have this issue with one of our apps, and the mono profiler is no longer working for us, and the Xamarin profiler never worked. So we are blind and it's a real concern for our app.

And we never got this kind of issue with our pure iOS app. 

I guess there must be something that can be done so we can work around it...
Comment 47 GouriKumari 2016-01-04 21:02:36 UTC
A fix for this issue is currently not available and is a work in progress, hence moving it off the milestone.
Comment 48 Matt Jones 2017-04-12 15:48:01 UTC
If I pay for the Xamarin Profiler, am I going to get the same problem? Does it use Instruments?

This is a bit of a nightmare for us too.
Comment 49 Rolf Bjarne Kvinge [MSFT] 2017-04-17 10:03:40 UTC
@Matt Jones, the Xamarin Profiler does not have the same problem, it's completely different from Instruments.
Comment 50 Sebastien Pouliot 2017-07-26 12:55:09 UTC
@Kumpera can you (or did you) file a radar with Apple for this ? [1]

Once done we can close it with UPSTREAM until the time that Apple fix/adjust their code. Thanks!

[1] internal tracker https://trello.com/b/ZXs89x7A/apple-bug-reports-radar
Comment 51 Rodrigo Kumpera 2017-07-30 05:44:11 UTC
It's not a bug on Instruments, why would I file a bug for it?
Comment 52 Manuel de la Peña 2017-09-06 10:03:16 UTC
@Sebastien can you please clarify why we should be filing a radar?
Comment 53 Alex Soto [MSFT] 2017-10-09 00:50:46 UTC
> It's not a bug on Instruments, why would I file a bug for it?

Is this fixable on our side then?
Comment 54 Alex Soto [MSFT] 2017-10-09 00:51:18 UTC
@Rodrigo ^
Comment 55 Rodrigo Kumpera 2017-10-09 17:16:57 UTC
It's fixable on our side, just not something we decided to do.
Comment 56 Alex Soto [MSFT] 2017-10-09 17:44:13 UTC
(In reply to Rodrigo Kumpera from comment #55)
> It's fixable on our side, just not something we decided to do.

Closing this as NOT_ON_ROADMAP as per comment #55
Comment 57 Alex Soto [MSFT] 2017-10-09 21:10:26 UTC
Reopening this based on internal discussion in #runtime
Comment 58 Chris Hamons 2017-10-09 21:13:49 UTC
Apologies for the spam and confusion everyone.

We all agree that this needs fixing, it just hasn't been scheduled as of yet.