Bug 19823

Summary: InvalidOperationException in ServicePoint.RemoveConnectionGroup
Product: [Mono] Class Libraries Reporter: Tom Philpot <tom.philpot>
Component: SystemAssignee: marcos.henrich
Status: RESOLVED FIXED    
Severity: major CC: alekseev.pg, alex, chrisntr, cody.beyer, heavycrag, joe.feinour, lukaslipka, marcos.henrich, martin.potter, miguel, mono-bugs+mono
Priority: High    
Version: 3.4.0   
Target Milestone: Untriaged   
Hardware: Macintosh   
OS: Mac OS   
Tags: Is this bug a regression?: ---
Last known good build:

Description Tom Philpot 2014-05-15 12:17:35 UTC
It's possible for the service point to try and remove a connection group which is not in the list of connections.

[ERROR] FATAL UNHANDLED EXCEPTION: System.InvalidOperationException: Operation is not valid due to the current state of the object
  at System.Net.ServicePoint.RemoveConnectionGroup (System.Net.WebConnectionGroup group) [0x0008a] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:295
  at System.Net.ServicePoint.CheckAvailableForRecycling (System.DateTime& outIdleSince) [0x00115] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:345
  at System.Net.ServicePointManager.RecycleServicePoints () [0x0003b] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/ServicePointManager.cs:392
  at System.Net.ServicePointManager.FindServicePoint (System.Uri address, IWebProxy proxy) [0x00028] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/ServicePointManager.cs:333
  at System.Net.HttpWebRequest.GetServicePoint () [0x00027] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/HttpWebRequest.cs:680
  at System.Net.HttpWebRequest.get_ServicePoint () [0x00000] in /Users/martin.potter/External/mono-source/mcs/class/System/System.Net/HttpWebRequest.cs:591
  at (wrapper remoting-invoke-with-check) System.Net.HttpWebRequest:get_ServicePoint ()


I believe the regression was introduced here: https://github.com/mono/mono/commit/c6a70a9586971d9d2561933fab47e30cca997e76

If I can find a reliable repro, I'll attach a test case here.
Comment 1 Martin Potter 2014-06-09 13:50:12 UTC
This appears to be the result of a race condition between ServicePointManager.RecycleServicePoints() which calls ServicePoint.CheckAvailableForRecycling(out DateTime outIdleSince) and the idle timer firing and calling ServicePoint.CheckAvailableForRecycling(out DateTime outIdleSince).

In ServicePoint.CheckAvailableForRecycling(out DateTime outIdleSince):
Thread 1: lock (this). Assign groupList.
Thread 2: Wait on lock (this)
Thread 1: Exit lock
Thread 2: lock (this). Assign groupList (this will be the same list as Thread 1 stored).
Thread 1 & 2: Generate the list of groups to remove, which will should be the same since they are iterating over the same list of groups
Thread 1: lock (this). Remove the groups in removeList. If all the groups are removed, set groups to null, dispose idleTimer, set idleTimer to null and exit lock.
Thread 2: lock (this). Remove the groups in removeList (same groups as Thread 1 just removed). RemoveConnectionGroup (WebConnectionGroup group) will throw an exception since groups was set to null by Thread 1.
Comment 2 Martin Baulig 2014-06-10 13:22:39 UTC
Many, many thanks for investigating and reporting this :-)

ServicePointManager.RecycleServicePoints() was a left-over from the old times and it's not needed anymore.  Fixed in mono/master commit f462bcc9447abf3f0f2a336cbf5d94756311cf57.
Comment 3 Tom Philpot 2014-07-01 19:47:11 UTC
I just got this crash again with the patch you mentioned above:

07/01/2014 16:25:18 | ERROR | 44 |          OurApp | Unhandled exception: System.InvalidOperationException: Operation is not valid due to the current state of the object
  at System.Net.ServicePoint.RemoveConnectionGroup (System.Net.WebConnectionGroup group) [0x00021] in /Users/tom.philpot/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:275 
  at System.Net.ServicePoint.CheckAvailableForRecycling (System.DateTime& outIdleSince) [0x0010b] in /Users/tom.philpot/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:317 
  at System.Net.ServicePoint.IdleTimerCallback (System.Object obj) [0x00000] in /Users/tom.philpot/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:338 
  at System.Threading.Timer+Scheduler.TimerCB (System.Object o) [0x00007] in /Users/tom.philpot/External/mono-source/mcs/class/corlib/System.Threading/Timer.cs:317 

It looks like the race condition Martin mentioned above still exists, it's just less likely to occur now that there are fewer code paths for checking for recycling service points.
Comment 4 Tom Philpot 2014-07-02 09:10:29 UTC
We've seen this occur if the app is left paused at a breakpoint. Likely when I resumed, two threads raced because the callback timers had expired, causing this error. Granted that's an unlikely scenario in the wild, so if that's the only place we get this error, it's fine, but not ideal, to have the runtime crash if you've left the debugger paused for too long.
Comment 5 Tom Philpot 2014-10-29 02:07:46 UTC
We had a live instance of this crash in our app today:

2014-10-29 03:31:43.1040 4 Error ApplicationUtility Error building request (POST https://auth.logos.com/…). ~WebException: The request timed out
2014-10-29 04:26:06.1025 4 Error WebServiceClient There was an error while signing in: Error building request. Request: POST https://auth.logos.com/…
2014-10-29 04:35:10.6670 4 Info WebServiceClient (1h 9m 59s) Signing in
2014-10-29 05:02:26.9779 12 Error OurApp Unhandled exception: System.InvalidOperationException: Operation is not valid due to the current state of the object
at System.Net.ServicePoint.RemoveConnectionGroup (System.Net.WebConnectionGroup group) [0x00021] in /Users/ccservice/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:275 
at System.Net.ServicePoint.CheckAvailableForRecycling (System.DateTime& outIdleSince) [0x0010b] in /Users/ccservice/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:317 
at System.Net.ServicePoint.IdleTimerCallback (System.Object obj) [0x00000] in /Users/ccservice/External/mono-source/mcs/class/System/System.Net/ServicePoint.cs:338 
at System.Threading.Timer+Scheduler.TimerCB (System.Object o) [0x00007] in /Users/ccservice/External/mono-source/mcs/class/corlib/System.Threading/Timer.cs:317

Obviously, the problem that the web service took 1 hour 10 minutes to respond is troublesome, but it verifies that the issue is still exists since ServicePoint.cs hasn't changed meaningfully since https://github.com/mono/mono/commit/f7f1bb819cfd82da83034332549a9cab346ee037#diff-19d89ff61d1fe3649353c6d22d3a7803
Comment 6 alex 2015-01-23 16:14:03 UTC
We're experiencing this bug on Xamarin for iOS.
We have:
Mono 3.12.0 ((detached/a813491)

Xamarin Studio
Version 5.7.1 (build 14)

Xamarin.iOS
Version: 8.6.0.51 (Business Edition)
Hash: dfb682f
Branch: 
Build date: 2015-01-08 13:39:32-0500

Please advise.
Comment 7 Lukas Lipka 2015-01-28 05:09:40 UTC
Can confirm that we are also seeing this issue, only happens on iOS 7.1, 7.1.1 and 7.1.2 so far.

Terminating app due to uncaught exception 'System.InvalidOperationException: Operation is not valid due to the current state of the object', reason: 'System.InvalidOperationException: Operation is not valid due to the current state of the object at System.Net.ServicePoint.RemoveConnectionGroup (System.Net.WebConnectionGroup group) [0x00000] in <filename unknown>:0 at System.Net.ServicePoint.CheckAvailableForRecycling (System.DateTime& outIdleSince) [0x00000] in <filename unknown>:0 at System.Net.ServicePoint.IdleTimerCallback (System.Object obj) [0x00000] in <filename unknown>:0 at System.Threading.Timer+Scheduler.TimerCB (System.Object o) [0x00000] in <filename unknown>:0 '
Comment 8 alex 2015-02-09 12:46:00 UTC
Hi!
Do we have any news on this bug?
Comment 9 Nate Cook 2015-03-11 12:53:32 UTC
We are getting this on iOS as well. (iOS 8.1.3 by the way)

Info: *** Terminating app due to uncaught exception 'System.InvalidOperationException: Operation is not valid due to the current state of the object', reason: 'System.InvalidOperationException: Operation is not valid due to the current state of the object at System.Net.ServicePoint.RemoveConnectionGroup (System.Net.WebConnectionGroup group) [0x00000] in <filename unknown>:0 at System.Net.ServicePoint.CheckAvailableForRecycling (System.DateTime& outIdleSince) [0x00000] in <filename unknown>:0 at System.Net.ServicePoint.IdleTimerCallback (System.Object obj) [0x00000] in <filename unknown>:0 at System.Threading.Timer+Scheduler.TimerCB (System.Object o) [0x00000] in <filename unknown>:0 '
Comment 10 Pavel Alekseev 2015-03-16 10:50:35 UTC
Is there some workarounds? Like try/catch smth?
Comment 11 marcos.henrich 2015-03-18 11:28:00 UTC
Hi Tom, Alex, Lukas, Nate, Pavel

Thank you for reporting this bug.
The pull request for this issue can be found in the link below.
https://github.com/mono/mono/pull/1638

I also included in the pull request description a possible workaround to avoid your app termination.
Comment 12 alex 2015-03-18 11:31:32 UTC
@Marcos, thank you for the update.
I am not sure that subscribing to AppDomain.UnhandledException event stops my app from crashing.
We have that event subscription (to submit errors to Raygun), however, app still crashes.

Would you be kind as to clarify it?
Comment 13 marcos.henrich 2015-03-18 12:50:16 UTC
@alex Using UnhandledException as an workaround was a bad idea.

The crashes should not occur for a period of at least ServicePointManager.MaxServicePointIdleTime which by default is 100 secs.

ServicePointManager.MaxServicePointIdleTime = 3600 * 24;

This workaround gives the app at least one day without crashes.
Comment 14 marcos.henrich 2015-03-18 13:00:52 UTC
ServicePointManager.MaxServicePointIdleTime is in milliseconds, for one day it should be:

ServicePointManager.MaxServicePointIdleTime = 1000 * 3600 * 24;
Comment 15 marcos.henrich 2015-03-24 12:46:22 UTC
Fixed in master e82dc02e27ea872f3f1bd19d8a6c9770ae12716d.
https://github.com/mono/mono/commit/e82dc02e27ea872f3f1bd19d8a6c9770ae12716d
Comment 16 marcos.henrich 2015-03-24 15:22:30 UTC
Fixed in mono-4.0.0-branch ec5f493016153bf2cd88a926c1c6aade48eaa2db.
https://github.com/mono/mono/commit/ec5f493016153bf2cd88a926c1c6aade48eaa2db