Bug 39924 - System.Data.SqlClient connection pool unusable after network outage
Summary: System.Data.SqlClient connection pool unusable after network outage
Status: NEW
Alias: None
Product: Class Libraries
Classification: Mono
Component: System.Data
Version: 4.2.0 (C6)
Hardware: PC Linux
: --- normal
Target Milestone: Untriaged
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2016-03-26 13:09 UTC by Erwin
Modified: 2017-09-15 23:40 UTC
2 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments

Description Erwin 2016-03-26 13:09:24 UTC
This has been tested on Windows with Mono version 4.2.3, and the bug does not occur there.

I build with Visual Studio against the .NET 4.5 framework, then run it under Linux (latest Raspbian Jessie) with the same Mono version.

To reproduce:

Make an application with a continuous loop that would:

- try Open a database connection with a new SqlConnection instance
- retry to connect on network failure
- try Query on the database
- retry to open the database connection on query failure (and close the connection as well)
- Close the connection
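
A minimal sketch of such a loop might look like the following. The connection string, the SELECT 1 query and the one-second delay are placeholders for illustration and are not taken from the actual application.

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

class PoolRepro
{
    // Hypothetical connection string; point it at a real SQL Server instance.
    const string ConnectionString =
        "Server=dbhost;Database=test;User Id=user;Password=secret;Connect Timeout=30";

    static void Main()
    {
        while (true)
        {
            try
            {
                // A new SqlConnection instance on every pass; pooling is on by default.
                using (var connection = new SqlConnection(ConnectionString))
                {
                    connection.Open();

                    // Placeholder query; any query against the database will do.
                    using (var command = new SqlCommand("SELECT 1", connection))
                    {
                        Console.WriteLine(command.ExecuteScalar());
                    }
                } // Dispose closes the connection and hands it back to the pool.
            }
            catch (Exception ex)
            {
                // Connect or query failure: log it and retry on the next pass.
                Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
            }

            Thread.Sleep(1000); // assumed delay between iterations
        }
    }
}
```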

Start it and keep it running until it has made a few queries to the database, then simulate a network outage (by unplugging your LAN cable). Keep the outage longer than the timeout period set in the connection (30 seconds or so). To play it safe, keep the outage going for 1 minute.

The application keeps trying to reconnect to the database while it has no network connectivity (the application is still running), and continues with the loop.

After the network connectivity is back, it throws an exception at the 'query on the database' part of our loop: "System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." (again: this works fine with Mono on Windows! No exceptions there)

It does not hang on SqlConnection.Open(); it opens the connection, but that connection (again: a new instance of SqlConnection) is unable to query the database. This keeps happening for as long as I keep the application running after the network outage. So my conclusion was to call SqlConnection.ClearPool after the above exception. This fixes the problem, but it is not the expected behavior at all (imo).
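
The ClearPool workaround can be sketched roughly as below. The helper name QueryWithPoolReset and its parameters are made up for illustration; only SqlConnection.ClearPool is the actual API call, and the sketch assumes that clearing the pool on any SqlException is acceptable for the application.

```csharp
using System;
using System.Data.SqlClient;

static class PoolWorkaround
{
    // Hypothetical helper: runs a scalar query and, if it fails with a
    // SqlException (such as the timeout above), clears the pool that belongs
    // to this connection so later Open() calls get a fresh physical connection.
    public static object QueryWithPoolReset(string connectionString, string sql)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            try
            {
                connection.Open();
                using (var command = new SqlCommand(sql, connection))
                {
                    return command.ExecuteScalar();
                }
            }
            catch (SqlException)
            {
                // Discard the pooled connections for this connection string,
                // then let the caller decide whether to retry.
                SqlConnection.ClearPool(connection);
                throw;
            }
        }
    }
}
```

The exception is rethrown on purpose, so the surrounding loop keeps its own retry logic and only the pool cleanup is added.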

It seems that the SQL connection pool gets corrupted, as far as I can see.
Comment 1 evengard 2017-09-15 11:35:39 UTC
Having a similar problem, but with a different cause for this "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." error. Sometimes my app really does start a VERY long query, which fails with the aforementioned message. After that I get a similar corruption, with messages like
System.IndexOutOfRangeException
Index was outside the bounds of the array.
at System.Data.SqlClient.SqlDataReader.GetOrdinal (System.String name)

(which doesn't make sense on its own, but does make sense if we assume the connection gets somewhat corrupted, yet still goes back to the pool and fails when it is taken out of the pool for another job)
etc.

But anyway, the cause of such behaviour is always
System.Data.SqlClient.SqlException
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

No matter whether it is a network error or just the server taking too long to fetch the data, the connection still gets corrupted: the corrupted connection gets reused and fails again, when it should probably be destroyed and recreated by the pool instead.
Comment 2 evengard 2017-09-15 23:40:00 UTC
Seems like this chunk of code is the culprit here:


try {
// ...
} catch (TdsTimeoutException ex) {
	// If it is a timeout exception there can be many reasons:
	// 1) Network is down/server is down/not reachable
	// 2) Somebody has an exclusive lock on Table/DB
	// In any of these cases, don't close the connection. Let the user do it
	Connection.Tds.Reset ();
	throw SqlException.FromTdsInternalException ((TdsInternalException) ex);
}
// ...

Lines 425-432 in System.Data.SqlClient/SqlCommand.cs (and similar ones).

Either Connection.Tds.Reset (); works incorrectly, or we do need to close the connection here after all. Either way, the connection seems to become unstable from this point on.
