Xamarin.iOS Threadpool randomly halts when app is suspended and doesn't recover when resumed

I've been witnessing this issue for months and smashing my face against a wall trying to debug it for the past few weeks.

The high-level symptom is that all TPL and ThreadPool-based operations may stop functioning after a Xamarin.iOS app is 'suspended'.

In our application's case, since we use background location, geofencing, background fetch and Bluetooth, there are a number of conditions where iOS can suspend our application, wake it to handle an event, and then suspend it again. This significantly increases the likelihood of reproducing the issue.

Bringing the application into the foreground doesn't repair the ThreadPool. In this state the main thread happily executes, as do many of the normal iOS threads, but nothing queued to the ThreadPool ever runs (this includes the vast majority of TPL usage: async/await, Task.Run, Task.Factory.StartNew, etc.). The only thing that does work is Task.Factory.StartNew(..., TaskCreationOptions.LongRunning).
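For anyone hitting the same thing, the fact that LongRunning still works suggests a stopgap. As far as I understand it, the default scheduler gives LongRunning tasks a dedicated thread instead of queuing them to the ThreadPool. Below is a minimal sketch of a wrapper built on that assumption. DedicatedThreadRunner is a hypothetical name of my own, not part of Xamarin.iOS or the BCL, and this is a workaround sketch, not a fix for the underlying problem.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not a BCL or Xamarin.iOS type): runs synchronous work
// as a LongRunning task, which the default scheduler is assumed to execute
// on its own dedicated thread rather than on a ThreadPool worker.
public static class DedicatedThreadRunner
{
    public static Task Run(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        return Task.Factory.StartNew(
            work,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    public static Task<T> Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        return Task.Factory.StartNew(
            work,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }
}
```

Note that this only helps for synchronous delegates. Anything that awaits inside the delegate will normally resume its continuation on the ThreadPool again (when no SynchronizationContext is captured), so it would stall in the same way.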

I've used the following code to monitor the state continuously:

    // Requires: Foundation, System, System.Threading, System.Threading.Tasks.
    // Tracer and _terminating come from our application (logging helper and a will-terminate flag).
    var tplCheckTimer = NSTimer.CreateRepeatingScheduledTimer(
        TimeSpan.FromMinutes(1),
        _ =>
        {
            // Snapshot the pool's limits and current availability.
            ThreadPool.GetAvailableThreads(out var availableWorkerThreads, out var availablePortThreads);
            ThreadPool.GetMinThreads(out var minWorkerThreads, out var minPortThreads);
            ThreadPool.GetMaxThreads(out var maxWorkerThreads, out var maxPortThreads);

            // Each probe schedules a no-op and waits up to one second for it to complete.
            bool taskRun = Task.Run(() => { }).Wait(TimeSpan.FromSeconds(1));
            bool taskStartNew = Task.Factory.StartNew(() => { }).Wait(TimeSpan.FromSeconds(1));
            bool taskStartNewLongRunning = Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning).Wait(TimeSpan.FromSeconds(1));

            // Same probe through the raw ThreadPool APIs: "queued" vs. "actually ran".
            ManualResetEventSlim queueUserWorkItemResetEvent = new ManualResetEventSlim();
            bool queueUserWorkItem = ThreadPool.QueueUserWorkItem(__ => queueUserWorkItemResetEvent.Set());
            bool queueUserWorkItemWaited = queueUserWorkItemResetEvent.Wait(TimeSpan.FromSeconds(1));
            ManualResetEventSlim unsafeQueueUserWorkItemResetEvent = new ManualResetEventSlim();
            bool unsafeQueueUserWorkItem = ThreadPool.UnsafeQueueUserWorkItem(__ => unsafeQueueUserWorkItemResetEvent.Set(), null);
            bool unsafeQueueUserWorkItemWaited = unsafeQueueUserWorkItemResetEvent.Wait(TimeSpan.FromSeconds(1));

            // Thread counts are logged as min/available/max.
            Tracer.For(this).Verbose($"UnsafeQueueUserWorkItem: {unsafeQueueUserWorkItem}, UnsafeQueueUserWorkItemWaited: {unsafeQueueUserWorkItemWaited}, QueueUserWorkItem: {queueUserWorkItem}, QueueUserWorkItemWaited: {queueUserWorkItemWaited}, TaskRun: {taskRun}, TaskStartNew: {taskStartNew}, TaskStartNewLongRunning: {taskStartNewLongRunning}, HasShutdownStarted: {Environment.HasShutdownStarted}, WillTerminate: {_terminating}, Worker Threads: {minWorkerThreads}/{availableWorkerThreads}/{maxWorkerThreads}, Port Threads: {minPortThreads}/{availablePortThreads}/{maxPortThreads}");
        });
    NSRunLoop.Main.AddTimer(tplCheckTimer, NSRunLoopMode.Default);


During normal operation, we see log entries like this:

Aug 7 19:12:23 iPhone SafeZone[86676] <Notice>: 2017-08-07 09:12:23.647Z 00.0000000 * 0 Verbose UnsafeQueueUserWorkItem: True, UnsafeQueueUserWorkItemWaited: True, QueueUserWorkItem: True, QueueUserWorkItemWaited: True, TaskRun: True, TaskStartNew: True, TaskStartNewLongRunning: True, HasShutdownStarted: False, WillTerminate: False, Worker Threads: 2/200/200, Port Threads: 2/200/200

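Once the issue is reproduced, we see log entries like this (and it NEVER recovers). The work items are still accepted (QueueUserWorkItem: True) but never execute, and only the LongRunning task completes: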

Aug 7 21:31:43 iPhone SafeZone[86676] <Notice>: 2017-08-07 11:31:43.654Z 00.0000000 * 0 Verbose UnsafeQueueUserWorkItem: True, UnsafeQueueUserWorkItemWaited: False, QueueUserWorkItem: True, QueueUserWorkItemWaited: False, TaskRun: False, TaskStartNew: False, TaskStartNewLongRunning: True, HasShutdownStarted: False, WillTerminate: False, Worker Threads: 2/200/200, Port Threads: 2/200/200

However, despite knowing all this, I have been completely unable to create a small reproduction program for it.

DUE TO THE SERIOUSNESS OF THIS ISSUE I BELIEVE THIS REQUIRES INVESTIGATION BY SOMEONE WITH INTIMATE KNOWLEDGE OF BOTH THE MONO THREADPOOL AND IOS APP LIFECYCLE.

I have attached the log of a full reproduction of the issue from a device, with Mono debug-level logging enabled for the threadpool and io-threadpool.

=== Visual Studio Enterprise 2017 for Mac ===
Version 7.0.1 (build 24)
Installation UUID: f6d24073-2aea-4b5b-a922-10d46d88aae9
Runtime:
    Mono 5.0.1.1 (2017-02/5077205) (64-bit)
    GTK+ 2.24.23 (Raleigh theme)
    Package version: 500010001

=== NuGet ===
Version: 4.0.0.2323
=== .NET Core ===
Runtime: Not installed
SDK: Not installed
MSBuild SDKs: /Library/Frameworks/Mono.framework/Versions/5.0.1/lib/mono/msbuild/15.0/bin/Sdks
=== Xamarin.Profiler ===
Version: 1.5.4
Location: /Applications/Xamarin Profiler.app/Contents/MacOS/Xamarin Profiler
=== Apple Developer Tools ===
Xcode 8.3.2 (12175)
Build 8E2002
=== Xamarin.Mac ===
Version: 3.4.0.36 (Visual Studio Enterprise)
=== Xamarin.iOS ===
Version: 10.10.0.36 (Visual Studio Enterprise)
Hash: d2270eec
Branch: d15-2
Build date: 2017-05-22 16:30:53-0400
=== Xamarin.Android ===
Not Installed
=== Xamarin Inspector ===
Version: 1.2.2
Hash: b71b035
Branch: d15-1
Build date: Fri, 21 Apr 2017 17:57:12 GMT
=== Build Information ===
Release ID: 700010024
Git revision: 7ab1ca2ced6f584e56b7a0d4d321d00775cd95c9
Build date: 2017-05-19 05:44:51-04
Xamarin addins: 08d17158f3365beee5e60f67999e607cce4b3f93
Build lane: monodevelop-lion-d15-2
=== Operating System ===
Mac OS X 10.12.5
Darwin 16.6.0 Darwin Kernel Version 16.6.0
    Fri Apr 14 16:21:16 PDT 2017
    root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Reference: https://bugzilla.xamarin.com/show_bug.cgi?id=58633