Bug 33286 - Sockets not properly returning from unmanaged threadpool
Status: NEW
Alias: None
Product: Runtime
Classification: Mono
Component: io-layer
Version: 4.2.0 (C6)
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2015-08-20 21:34 UTC by Jim Borden
Modified: 2015-08-27 21:36 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
The sync gateway binary needed to run the unit test (59 bytes, text/plain)
2015-08-20 21:38 UTC, Jim Borden


Description Jim Borden 2015-08-20 21:34:46 UTC
I tagged this as 4.2.0, but it affects every version I have tried (3.2.8, 3.12.1, 4.0.3-20, 4.2, and master).  This problem happens ONLY on Linux (both x86 and x64, on Debian 8 and Ubuntu 15), while Xamarin.iOS, Xamarin.Android, Windows .NET 3.5 / 4.5, and OS X .NET 3.5 / 4.5 do not exhibit this behavior.

**BACKGROUND**

I am developing a product which makes use of several concurrent HttpClient objects, especially in unit tests.  The unit test HttpClients have no special setup, but the library HttpClient objects are set up with two layers of message handling (one to process HTTP 401 responses, and the other to retry requests after transient network errors).  For what it's worth, I've tried combining some of these objects into one, but that didn't entirely resolve the problem.  A replication in this library consists of several HTTP requests and responses (GET, DELETE, PUT, and POST are used).
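
For readers unfamiliar with this kind of setup, a two-layer handler chain like the one described above can be sketched roughly as follows.  This is NOT the actual Couchbase Lite code: the class names (UnauthorizedHandler, RetryHandler, ClientFactory), the retry count, and the order of the two layers are all assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical layer 1: inspects responses for HTTP 401 (here it only logs).
public sealed class UnauthorizedHandler : DelegatingHandler
{
    public UnauthorizedHandler(HttpMessageHandler inner) : base(inner) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        if (response.StatusCode == HttpStatusCode.Unauthorized)
        {
            // A real handler would attach credentials and resend; omitted in this sketch.
            Console.Error.WriteLine("401 from {0}", request.RequestUri);
        }
        return response;
    }
}

// Hypothetical layer 2: retries when the send itself throws HttpRequestException.
public sealed class RetryHandler : DelegatingHandler
{
    private readonly int maxAttempts;

    public RetryHandler(HttpMessageHandler inner, int maxAttempts) : base(inner)
    {
        this.maxAttempts = maxAttempts;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
            }
            catch (HttpRequestException)
            {
                // Give up and rethrow once the attempt budget is spent.
                if (attempt >= maxAttempts) throw;
            }
        }
    }
}

public static class ClientFactory
{
    public static HttpClient Create()
    {
        // Outermost handler sees the request first: 401 handling -> retry -> socket I/O.
        // The ordering and the retry count of 3 are assumptions.
        var chain = new UnauthorizedHandler(new RetryHandler(new HttpClientHandler(), 3));
        return new HttpClient(chain);
    }
}
```

Every request made through such a client still ends up in the same HttpClientHandler and, on Mono, in the same socket layer, so the handlers add layers above the point where the hang is observed rather than changing it.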

**PROBLEM**

HttpClient.SendAsync() never completes after a certain number of requests.  Drilling down, this is what I found.  The call to GetResponseAsync() that is eventually reached inside Mono never returns.  According to Wireshark, the request gets stuck and never goes out over the wire.  After many hours of examination, I finally tracked the problem down to the transition between managed and unmanaged code at socket_pool_queue.  Once the failing connection goes in there, it never returns.  A successful request should cause a callback to DispatcherCB in SocketAsyncWorker.cs, but the callbacks go silent following the request in question.

Pausing the Mono debugger while the freeze is happening reveals nothing in particular.  All thread pool threads are asleep waiting for work, and the two threads spawned by my program (the main thread and a database I/O thread) are both waiting.  The main thread is waiting for the replication sequence between the client and server to finish (by observing its event callback, which gets called every time the replication status changes), and the database I/O thread is waiting for new work (it is the consumer in a producer-consumer model).  I'm not sure if this is relevant, but the request that hangs is almost always a POST.
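
One way to pin down which individual request stalls is to race each SendAsync() call against a timer.  The following is only a minimal sketch under assumptions: HangProbe and SendWithWatchdogAsync are hypothetical names, not part of Mono or of the library under test, and the timeout value is up to the caller.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class HangProbe
{
    // Hypothetical helper: returns the response, or null if none arrived within the timeout.
    public static async Task<HttpResponseMessage> SendWithWatchdogAsync(
        HttpClient client, HttpRequestMessage request, TimeSpan timeout)
    {
        Task<HttpResponseMessage> send = client.SendAsync(request);

        // Task.WhenAny completes as soon as either the send or the timer completes.
        Task finished = await Task.WhenAny(send, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != send)
        {
            // The send is still pending; record which request it was.
            Console.Error.WriteLine("{0:o} no response after {1}: {2} {3}",
                DateTime.UtcNow, timeout, request.Method, request.RequestUri);
            return null;
        }

        // The send finished first; awaiting it again rethrows any exception it produced.
        return await send.ConfigureAwait(false);
    }
}
```

The timer does not cancel the pending send, so a request that completes late can still be observed afterwards.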

**ADDITIONAL INFO**

The unit test waits 60 seconds for the replication to finish before declaring it hung and unsuccessful.  However, more often than not, once this wait is over the callbacks for the socket operations start firing again.  Everything about this points to a problem in my code, but if I were deadlocking I should be able to see it in the managed stack traces.  I'd be happy to try to run a debugger on the runtime itself, but my limited experience with that has made the task a bit daunting.

**SOURCE**

The source for the unit test I described above is located at https://github.com/couchbase/couchbase-lite-net/blob/release/1.1.1/src/Couchbase.Lite.Tests.Shared/ReplicationTest.cs#L529.  To run this test, you must first clone the repo https://github.com/couchbase/couchbase-lite-net and open the Couchbase.Lite.Net45.sln solution in the src directory.  Create a local-test.properties file by copying the test.properties file.  Copy the executable I attached to this ticket into the Tools directory (it is for x64; if you need x86, let me know) and run the sync_gateway script found in the root directory.  After that, just run the TestPusher test using either MonoDevelop or nunit-console, and you will observe that the program hangs after some lines that look like this (on 4.0.3-20):

Replication: NotifyChangeListeners (0/0, state=Running (batch=0, net=1))
    Thread Name: Threadpool worker
    Date Time:   8/21/2015 9:55:56 AM
ReplicationObserver: Couchbase.Lite.ReplicationChangeEventArgs changed: 0 / 0
    Thread Name: Threadpool worker
    Date Time:   8/21/2015 9:55:56 AM
ReplicationObserver: ReplicationFinishedObserver.changed called, but replicator still running, so ignore it
    Thread Name: Threadpool worker
    Date Time:   8/21/2015 9:55:56 AM

The master branch makes it slightly further in the process but still exhibits the same symptoms.
Comment 1 Jim Borden 2015-08-20 21:38:50 UTC
Created attachment 12594
The sync gateway binary needed to run the unit test
Comment 2 Jim Borden 2015-08-27 21:36:58 UTC
For what it is worth, with the latest (as of August 27, 2015) release of Mono into the apt-get repo (4.2.0.179) this problem doesn't happen as much (if at all) anymore.