System.Data.SqlDataReader returns dataset of previous query after a database timeout happens
Some details from my reproduction case:
On the .NET Framework version 4
There is a possible concurrency issue in the SQL Server ADO.NET implementation that manifests when queries are cancelled or time out on the client, using the SqlCommand.ExecuteReader api.
To reproduce the issue seen in the field:
I run start 3 new timed tasks concurrently every second that run 3 - 5 relatively small queries that complete and return (all using SqlCommand.ExecuteReader), this runs as expected.
Then I add a long running query to the test run, set to execute every 65 seconds but cancel after 60 seconds. It would take longer than 60 seconds for the query to complete so gets cancelled every time (using SqlCommand.Cancel()).
After running for several minutes, suddenly most of the attempts to iterate the SqlDataReader returned error because the expected fields are not present on the returned rows, so when the data layer tries to access them by name, there is an exception.
Adding logging code to print the fields on the row indicate that they are from another query that is being run as part of the test, so one that is either running concurrently or very recently.
Once this problem occurs for one query, it happens very frequently indeed, in fact most queries fail. In the field, even services that were only trying to service 5 or so queries a minute were returning the wrong recordsets back for most queries.
Restarting the process fixes the problem.
1. A new connection, command and reader object are instantiated per query, and are used withing their own 'using' blocks.
2. Default connection pooling for ADO.NET is being used
3. Most connections are to the same DB, there is a seperate connection made to another DB on another server once per task run but this is always completed successfully.
4. The code is mature and in use in production on Windows .NET framework systems without issue, and running the same tests on Windows .NET framework cannot reproduce the problem, so it is unlikely to be an issue across both platforms
Is there any update or workaround for this issue? It is a show stopper for a critical project at my company
I've additionally encountered this bug while writing a fairly routine CRUD api - this seems like a critical bug that needs to be patched asap. The workaround (setting Pooling=false in the connection string) measurably increases the overhead when the application is under high load. More importantly, the bug often only shows up during high load testing and is difficult to track down if the same query is being run many times in parallel and the returned schema is correct but the values are from a different query. Is there a timeline for fixing this?