Bug 19752 - Suspected regression in mono 3.2.8: OracleClient library throws FormatException "No digits found"
Status: NEW
Alias: None
Product: Class Libraries
Classification: Mono
Component: System.Data
Version: 3.2.x
Hardware: PC Linux
Importance: --- normal
Target Milestone: Untriaged
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2014-05-14 03:15 UTC by Peder Chr. Nørgaard
Modified: 2015-12-04 08:53 UTC
5 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Prototype patch to fix OCI call problems (19.15 KB, patch)
2014-08-29 11:25 UTC, Neale Ferguson

Description Peder Chr. Nørgaard 2014-05-14 03:15:53 UTC
We are moving a medium-sized (~300,000 lines) web application that uses OracleClient for Oracle database access from Ubuntu 13.10 to 14.04, and thus from Mono 2.10.8.1 to Mono 3.2.8. While testing it, we found that the application has started to throw FormatExceptions with the message "No digits found". They occur randomly and rarely, but often enough to render the application useless for practical purposes.

The exception is thrown in OciDefineHandle.GetValue, from the call of Decimal.Parse:

	tmp = Marshal.PtrToStringAnsi (Value, Size);
	if (tmp != null)
		return Decimal.Parse (String.Copy ((string) tmp), formatProvider);

Investigation has shown that when the exception is thrown the variable tmp is the empty string, so Decimal.Parse is actually right in throwing the exception.  Value looks like a regular pointer, and Size is zero.
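The observation can be reproduced in isolation: a perfectly valid native buffer read with a length of zero yields the empty string, and Decimal.Parse rejects that with a FormatException (Mono's message is "No digits found"). The small C# sketch below only mirrors the two quoted lines; `ZeroLengthParse` and `ParseThrows` are hypothetical names, and the buffer content is arbitrary.

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;

public static class ZeroLengthParse
{
    // Copies "length" ANSI characters from a valid native buffer and parses them,
    // mirroring the PtrToStringAnsi + Decimal.Parse lines quoted above.
    // Returns true if Decimal.Parse rejects the resulting string.
    public static bool ParseThrows(string nativeContent, int length)
    {
        // A genuine unmanaged buffer, so "Value" is a regular pointer.
        IntPtr value = Marshal.StringToHGlobalAnsi(nativeContent);
        try
        {
            // With length == 0 this returns "", regardless of the buffer content.
            string tmp = Marshal.PtrToStringAnsi(value, length);
            try
            {
                Decimal.Parse(tmp, CultureInfo.InvariantCulture);
                return false;
            }
            catch (FormatException)
            {
                return true;
            }
        }
        finally
        {
            Marshal.FreeHGlobal(value);
        }
    }
}
```

Here `ParseThrows("538194", 0)` returns true while `ParseThrows("538194", 6)` returns false, which matches the report: the pointer is fine, and only the length is wrong.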

As mentioned, the bug occurs randomly and unpredictably, and we have not been able to design a small demo application that reproduces the error.

Certain observations, however:

- the bug is observed against both Oracle server versions 9 and 10
- the bug is observed using both Oracle client versions 10 and 11
- the bug is tied to the Mono version, not the Ubuntu version; it has also been observed on earlier Ubuntu versions with Mono 3.2.8 installed by hand
- the bug is only observed on columns of type NUMBER, NOT NULL.  In all cases, the table actually contains valid values
- the bug is only observed when we use OracleDataAdapter.Fill, but that may be because we do not use the regular Execute methods much in the application.
- the bug occurs in production, but it actually occurs more often when we run the application in MonoDevelop using NUnit and a mock web setup.  That has been a big help in finding these observations.
- the bug is occasionally stable, in the sense that two runs in MonoDevelop sometimes make the bug happen in the same place. If anything is changed, however (such as adding a try/catch block to retrieve more information), the bug moves to a different place.
- because of the peculiar code in the System.Data.Common.DataAdapter.FillTable method, where the retrieval loop tries to compensate for an exception from the call of dataTable.LoadDataRow by calling dataReader.GetValues, the exception is actually thrown twice when it occurs: once in the call stack below LoadDataRow and then again in the call stack below GetValues. It is the latter that is thrown from OracleDataAdapter.Fill.

We have no idea what can cause a bug like this.  It smells of a classical C pointer bug, but our understanding of the OCI library and the C/C# interaction in the System.Data.OracleClient.Oci directory is not deep enough to directly follow up on that theory.

We are very eager to have the problem fixed, however, and can and will assist in getting more information if the team so wishes.

best regards

Peder Chr. Nørgaard, Scandiatransplant, Århus, Denmark
Comment 1 Peder Chr. Nørgaard 2014-05-15 03:01:56 UTC
We have tried a very simple thing: moving the source code of System.Data.OracleClient and System.Data.OracleClient.Oci from 2.10.8.1 into the 3.2.8 source tree. There are a few changes between the two versions, all of which look reasonable (and relatively harmless).

No joy, however; the bug remains.  So the bug is not introduced by the small code changes in OracleClient.
Comment 2 Peder Chr. Nørgaard 2014-06-30 03:28:57 UTC
I have a few additional observations that might help pin down this bug.

- running the mono runtime with --gc=boehm significantly reduces the frequency of the bug. The bug is not gone; it can still be provoked when using NUnit testing, but so far I have not actually observed it in production. In my view, this observation fits well with my theory that the bug is some kind of pointer bug.

- I have tried installing our application on a "debian testing" system, which currently uses mono version 3.0.6 (called 3.0.6+dfsg2-12 in the distro). I have not seen the bug there, no matter which garbage collector I use. Of course this is only an indicator; the bug may be there, just not manifesting itself for whatever reason. But it IS an indicator that the bug may have been introduced between 3.0 and 3.2.
Comment 3 Neale Ferguson 2014-08-29 11:25:27 UTC
Created attachment 7872
Prototype patch to fix OCI call problems

The problem arises from the use of "ref" fields in OCI calls, namely OCIDefineByPos[Ptr] and OCIBindxxxxx. 

In the former, the indicator and rlenp objects (used to return the status of the column and the length of the data in the column) are defined as "short". Using ref passes their address to the unmanaged Oracle client routine, but the runtime only keeps that address fixed for the duration of the call itself. If a GC occurs afterwards, those "short" fields may be moved, so the address that Oracle recorded, and uses later when the data is fetched, will be incorrect, and the OracleClient code will experience errors.

In the latter the same is true of the indicator object used by OracleParameter.cs. 

To get around this, the "ref" objects are replaced by IntPtr objects, and get/set methods are created to enable access to the unmanaged values.

The attached patch appears to work but it may not be complete.
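The IntPtr approach described above can be sketched in C# as follows. This is a minimal illustration, not the code in the attached patch: `DefineBuffers` and its members are hypothetical names. Memory obtained from Marshal.AllocHGlobal lives outside the managed heap, so the garbage collector never relocates it and the addresses handed to Oracle stay valid until they are explicitly freed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: owns two 16-bit slots in unmanaged memory for the
// indicator and returned-length values that the Oracle client writes during
// a later fetch. The GC never moves this memory.
public sealed class DefineBuffers : IDisposable
{
    // These addresses would be passed to the native define/bind call instead of
    // "ref short", i.e. the corresponding P/Invoke parameters become IntPtr.
    public IntPtr Indicator { get; private set; }
    public IntPtr ReturnedLength { get; private set; }

    public DefineBuffers()
    {
        Indicator = Marshal.AllocHGlobal(sizeof(short));
        ReturnedLength = Marshal.AllocHGlobal(sizeof(short));
        // AllocHGlobal does not zero the memory, so start from known values.
        Marshal.WriteInt16(Indicator, 0);
        Marshal.WriteInt16(ReturnedLength, 0);
    }

    // Get/set accessors that read and write the unmanaged values.
    public short IndicatorValue
    {
        get { return Marshal.ReadInt16(Indicator); }
        set { Marshal.WriteInt16(Indicator, value); }
    }

    public short ReturnedLengthValue
    {
        get { return Marshal.ReadInt16(ReturnedLength); }
        set { Marshal.WriteInt16(ReturnedLength, value); }
    }

    // Call only after the statement that uses these addresses is finished.
    public void Dispose()
    {
        if (Indicator != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Indicator);
            Indicator = IntPtr.Zero;
        }
        if (ReturnedLength != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(ReturnedLength);
            ReturnedLength = IntPtr.Zero;
        }
    }
}
```

The trade-off is that lifetime becomes explicit: the buffers must outlive the last fetch on the statement, and Dispose is the single place that releases them. Pinning a managed array with GCHandle.Alloc(..., GCHandleType.Pinned) would be an alternative, but the sketch follows the IntPtr route the patch describes.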
Comment 4 Oscar 2015-08-03 14:50:59 UTC
Hi Neale,
did you upload the corrected files anywhere? I ask because if you have already made the corrections, that would be better than building new files from the diff.

Thanks.
Comment 5 tkobalas 2015-09-30 02:52:10 UTC
Hi All, 

After upgrading the distribution from Ubuntu 12.04 to 14.04, I've started to get this exception. Is there any update on the patch?

I am saving a date as an integer value in a table. The next query formats that integer value as a date for the last position. At that moment the exception is thrown.

System.FormatException: No digits found
  at System.Decimal.stripStyles (System.String s, NumberStyles style, System.Globalization.NumberFormatInfo nfi, System.Int32& decPos, System.Boolean& isNegative, System.Boolean& expFlag, System.Int32& exp, Boolean throwex) [0x00000] in <filename unknown>:0
  at System.Decimal.PerformParse (System.String s, NumberStyles style, IFormatProvider provider, System.Decimal& res, Boolean throwex) [0x00000] in <filename unknown>:0
  at System.Decimal.Parse (System.String s, NumberStyles style, IFormatProvider provider) [0x00000] in <filename unknown>:0
  at System.Decimal.Parse (System.String s, IFormatProvider provider) [0x00000] in <filename unknown>:0
  at System.Data.OracleClient.Oci.OciDefineHandle.GetValue (IFormatProvider formatProvider, System.Data.OracleClient.OracleConnection conn) [0x00000] in <filename unknown>:0
  at System.Data.OracleClient.OracleDataReader.GetValue (Int32 i) [0x00000] in <filename unknown>:0
  at System.Data.OracleClient.OracleDataReader.GetValues (System.Object[] values) [0x00000] in <filename unknown>:0
  at System.Data.Common.DataAdapter.FillTable (System.Data.DataTable dataTable, IDataReader dataReader, Int32 startRecord, Int32 maxRecords, System.Int32& counter) [0x00000] in <filename unknown>:0
  at System.Data.Common.DataAdapter.FillInternal (System.Data.DataSet dataSet, System.String srcTable, IDataReader dataReader, Int32 startRecord, Int32 maxRecords) [0x00000] in <filename unknown>:0
  at System.Data.Common.DataAdapter.Fill (System.Data.DataSet dataSet, System.String srcTable, IDataReader dataReader, Int32 startRecord, Int32 maxRecords) [0x00000] in <filename unknown>:0
  at System.Data.Common.DbDataAdapter.Fill (System.Data.DataSet dataSet, Int32 startRecord, Int32 maxRecords, System.String srcTable, IDbCommand command, CommandBehavior behavior) [0x00000] in <filename unknown>:0
  at System.Data.Common.DbDataAdapter.Fill (System.Data.DataSet dataSet, System.String srcTable) [0x00000] in <filename unknown>:0
  at (wrapper remoting-invoke-with-check) System.Data.Common.DbDataAdapter:Fill (System.Data.DataSet,string)
  at acCryptoLog.DBHelper.ExecuteQueryTimedOut (System.String position, System.Data.Common.DbConnection conn, System.String sql, Int32 timeOut, System.Object[] parameters) [0x00000] in <filename unknown>:0
Comment 6 Remigiusz 2015-12-04 08:53:15 UTC
Hi,

I encountered the same bug. It appears when you invoke the same query in a loop. I prepared code in which you can observe this issue:


The loop runs the query 1000 times and writes a digit after every 100th call:

            for (int j = 0; j < 10; j++)
            {
               for (int i = 0; i < 100; i++)
               {
                  OracleTest();
               }
               Console.Write(j);
            }

Table TASKS information:
 * TK_ID is PRIMARY KEY, NUMBER, NOT NULL
 * there is a row in table TASKS with TK_ID = 538194

The invoked query loads only one column (it behaves the same if you fetch other columns from the table too):

      public static DataTable OracleTest()
      {
         string sql = "SELECT TK_ID FROM TASKS WHERE TK_ID = :TK_ID";
         var cs = "User ID=***;Password=***;Data Source=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=VTS)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=TM_TEST)))";

         var provider = DbProviderFactories.GetFactory("System.Data.OracleClient");
         using (var connection = provider.CreateConnection())
         {
            connection.ConnectionString = cs; // cs is already the connection string
            connection.Open();
            var cmd = provider.CreateCommand();
            cmd.Connection = connection;
            cmd.CommandText = sql;
            var dp = provider.CreateParameter();
            dp.ParameterName = "TK_ID";
            dp.Value = 538194;
            cmd.Parameters.Add(dp);
            DataTable dt = null;
            for (int i = 0; i < 3; i++)
            {
               try
               {
                  var da = provider.CreateDataAdapter();
                  da.SelectCommand = cmd;
                  dt = new DataTable();
                  da.Fill(dt);
                  return dt;
               }
               catch (Exception)
               {
                  if (i == 0)
                     Console.Write("A");
                  if (i == 1)
                     Console.Write("B");
                  if (i == 2)
                     Console.Write("C");
               }
            }
            throw new InvalidOperationException("Cannot execute");
         }
      }

The result is "0A1A23A45A6AA7A8AA9", which means:
* 1000 queries were executed
* 9 exceptions ("A" letters) appeared
* every exception occurred only once in a row (invoking the same query again immediately always worked correctly)

Hope this helps you to find this bug :-)
