Bug 27258

Summary: Two build problems on s390x
Product: [Mono] Class Libraries Reporter: Neale Ferguson <neale>
Component: mscorlibAssignee: Atsushi Eno <atsushi>
Status: RESOLVED FIXED    
Severity: normal CC: masafa, mono-bugs+mono
Priority: ---    
Version: master   
Target Milestone: Untriaged   
Hardware: Other   
OS: Linux   
Tags: Is this bug a regression?: ---
Last known good build:

Description Neale Ferguson 2015-02-20 11:21:25 UTC
There are two build errors being encountered with head as of 2015-02-17:

(1) When I use 3.6.1 to build I get:

MCS     [basic] mscorlib.dll
warning CS2002: Source file
`../../../external/referencesource/mscorlib/system/globalization/bidicatego
ry.cs' specified multiple times
warning CS2002: Source file
`../../../external/referencesource/mscorlib/system/globalization/charunicod
einfo.cs' specified multiple times
warning CS2002: Source file
`../../../external/referencesource/mscorlib/system/globalization/globalizat
ionassembly.cs' specified multiple times
../../../external/referencesource/mscorlib/system/collections/compatiblecom
parer.cs(23,25): warning CS1635: Cannot restore warning `CS0618' because
it was disabled globally
../../../external/referencesource/mscorlib/system/collections/compatiblecom
parer.cs(67,25): warning CS1635: Cannot restore warning `CS0618' because
it was disabled globally
../../../external/referencesource/mscorlib/system/collections/hashtable.cs(
1206,25): warning CS1635: Cannot restore warning `CS0618' because it was
disabled globally
../../../external/referencesource/mscorlib/system/collections/hashtable.cs(
1244,25): warning CS1635: Cannot restore warning `CS0618' because it was
disabled globally
../../../external/referencesource/mscorlib/system/collections/hashtable.cs(
1270,25): warning CS1635: Cannot restore warning `CS0618' because it was
disabled globally
../../../external/referencesource/mscorlib/system/threading/Tasks/TaskSched
uler.cs(50,25): warning CS1635: Cannot restore warning `CS0618' because it
was disabled globally
System/TimeZoneInfo.cs(1414,30): warning CS0219: The variable
`ambiguousStartModified' is assigned but its value is never used
System/TimeZoneInfo.cs(1415,30): warning CS0219: The variable
`ambiguousEndModified' is assigned but its value is never used
../../../external/referencesource/mscorlib/system/text/asciiencoding.cs(502
,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/text/asciiencoding.cs(683
,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/text/unicodeencoding.cs(4
93,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/text/unicodeencoding.cs(4
93,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/text/unicodeencoding.cs(7
77,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/text/unicodeencoding.cs(7
77,21): error CS0165: Use of unassigned local variable `ch'
../../../external/referencesource/mscorlib/system/threading/Tasks/Parallel.
cs(125,21): warning CS0472: The result of comparing value type
`System.Threading.CancellationToken' with null is always `false'
../../../external/referencesource/mscorlib/system/threading/Tasks/Parallel.
cs(126,21): warning CS0162: Unreachable code detected
Compilation failed: 6 error(s), 10 warnings

The code in question looks like this:

            // Go ahead and do it, including the fallback.
            char ch;
            while ((ch = (fallbackBuffer == null) ? '\0' : fallbackBuffer.InternalGetNextChar()) != 0 ||
                    chars < charEnd)
            {

                // First unwind any fallback
                if (ch == 0)
                {

The complaint is for the use of ch in the "if" statement.

I tried initializing the variable ch but that results in a different
error being reported.

(2) When I use monolite-latest things go bizarre and appear endian-related:

/home/neale/Mono/mono/mcs/class/lib/monolite/basic.exe:
/home/neale/Mono/mono/mcs/class/lib/monolite/basic.exe: cannot execute
binary file
make[6]: *** [build/deps/basic-profile-check.exe] Error 126
*** The compiler '/home/neale/Mono/mono/mcs/class/lib/monolite/basic.exe'
doesn't appear to be usable.
*** Trying the 'monolite' directory.
.............miudlc/moom/nabis-crpfoli-ehcce.ksc1(0,:)e rrroC 1S25:5U
enpxceet dysbmlo` suinSy'
'''''''''''''miudlc/moom/nabis-crpfoli-ehcce.ksc2(,535:)e rrroC 1S10:0N
weileni  nocsnattn
nnnnnnnnnnnnnmiudlc/moom/nabis-crpfoli-ehcce.ksc2(,632:)e rrroC 1S10:0N
weileni  nocsnattn
nnnnnnnnnnnnnmiudlc/moom/nabis-crpfoli-ehcce.ksc3(,9)4 :reor rSC5098
:nIetnrlac moipel rreor rudirgnp raisgnyStsmeF.roamEtcxpeitno :nUnkwo
nhcra
a  tyStsmeD.uolb.eaPsr eS(syet.mtSirgns  ,uNbmretSlyses ytel
,FIroamPtorived rrpvodire )0[0x0000 ]ni< ifelanemu knonnw:> 0
a  toMonC.hSra.poTekinez.rdaujtsr_ae lT(pyCedo e,tL cotaoi nol)c[
x00000]0i  nf<linema enunkwo>n0:
   taM no.oSCahprT.konezirei._sunbmre( nI3t 2,cB ooelnad toeLda )0[0x0000
]ni< ifelanemu knonnw:> 0
a  toMonC.hSra.poTekinez.rtxkone(  )0[0x0000 ]ni< ifelanemu knonnw:> 0
a  toMonC.hSra.poTekinez.rotek n)([ x00000]0i  nf<linema enunkwo>n0:
   taM no.oSCahprC.hSraPpraes.ryyapsr ey(Iypntuy Lyxe )0[0x0000 ]ni<
ifelanemu knonnw:> 0
a  toMonC.hSra.pSCahpraPsrrep.raes(  )0[0x0000 ]ni< ifelanemu knonnw:> 0
Copmlitaoi naflide : 4reor(r)s , 0awnrings

Neale
Comment 1 Marek Safar 2015-02-23 05:33:12 UTC
Issue #1: It looks like the latest reference sources code requires bug fixes not available in mono 3.6. Bumped the bootstrap to 3.10.

Issue #2: This looks like related to source code import eno did. Microsoft code does not play well with big endian? eno could you have at that?
Comment 2 Atsushi Eno 2015-02-23 07:25:57 UTC
Should be fixed in master [45d731a] but I don't have s390 or any bigendian target device. So please try by yourself.
Comment 3 Neale Ferguson 2015-02-23 11:24:42 UTC
Looks much better. Gets through a great deal of the build but now craps out at a different place. Not sure if it has any relation to the endianess issue or something new:

MCS     [net_4_5] Microsoft.Build.Framework.dll
Microsoft.Build.Framework/IBuildEngine4.cs(10,43): error CS0589: Internal compiler error during parsingSystem.NullReferenceException: Object reference not set to an instance of an object
  at System.Globalization.CharUnicodeInfo.InternalGetCategoryValue (Int32 ch, Int32 offset) [0x00000] in <filename unknown>:0 
  at System.Globalization.CharUnicodeInfo.InternalGetUnicodeCategory (Int32 ch) [0x00000] in <filename unknown>:0 
  at System.Globalization.CharUnicodeInfo.GetUnicodeCategory (Char ch) [0x00000] in <filename unknown>:0 
  at System.Char.IsLetter (Char c) [0x00000] in <filename unknown>:0 
  at Mono.CSharp.Tokenizer.is_identifier_start_character (Int32 c) [0x00000] in <filename unknown>:0 
  at Mono.CSharp.Tokenizer.xtoken () [0x00000] in <filename unknown>:0 
  at Mono.CSharp.Tokenizer.token () [0x00000] in <filename unknown>:0 
  at Mono.CSharp.CSharpParser.yyparse (yyInput yyLex) [0x00000] in <filename unknown>:0 
  at Mono.CSharp.CSharpParser.parse () [0x00000] in <filename unknown>:0 
Compilation failed: 1 error(s), 0 warnings
make[8]: *** [../../class/lib/net_4_5/Microsoft.Build.Framework.dll] Error 1
Comment 4 Atsushi Eno 2015-02-23 12:46:32 UTC
Microsoft's CharUnicodeInfo likely has big-endian issue too. This #if WIN64 smells.
https://github.com/mono/referencesource/blob/mono/mscorlib/system/globalization/charunicodeinfo.cs#L264

Unlike previous fix, there is no simple code switch that Microsoft was already aware, so we need someone who has big endian (and 64bit) device to fix this and verify if it works...
Comment 5 Neale Ferguson 2015-02-23 13:10:11 UTC
I have both and if you can provide guidance I'll attempt a fix and test.
Comment 6 Atsushi Eno 2015-02-23 13:32:59 UTC
That'd be great if you can try something!

For the previous fix, there were some code in the corresponding classes (UTF8Encoding and UnicodeEncoding) in Microsoft referencesource:
https://github.com/mono/referencesource/commit/a302cc81f9f4387f39f685f386bb3818eb2e7eb3

They use compilation definition (#if BIGENDIAN) to provide different code paths for each platform. We don't do that but use BitConveter.IsLittleEndian as the run-time switch.

The problematic type, CharUnicodeInfo, has several pointer-based operations that could bring in endianness issues.
The problematic method this time is InternalGetCategoryValue():
https://github.com/mono/referencesource/blob/mono/mscorlib/system/globalization/charunicodeinfo.cs#L450

it retrieves ushort value directly from the pointer buffer. The data buffer is from some generated resource on little endian environment, so it likely needs to be swapped for big endian environment.

Once you make changes under external/referencesource, build mcs/class/corlib and then try mcs/class/Microsoft.Build.Framework. If the crash above doesn't happen, then it's likely fixed (either the build passes or you get other kind of errors).
Comment 7 Neale Ferguson 2015-02-23 14:42:08 UTC
Should these lines have been changed as well in unicodeencoding or is having it encoded as little endian but then dealing with its use at runtime the way to do things?

                                    // If they happen to be high/low/high/low, we may as well continue.  Check the next
                                    // bit to see if its set (low) or not (high) in the right pattern
#if BIGENDIAN
                                    if (((0xfc00fc00fc00fc00 & *longChars) ^ 0xd800dc00d800dc00) != 0)
#else
                                    if (((0xfc00fc00fc00fc00 & *longChars) ^ 0xdc00d800dc00d800) != 0)
#endif

Around line 2073.
Comment 8 Atsushi Eno 2015-02-23 23:57:48 UTC
Yes, thanks, fixed them too. [mono master 9f5a54b] [referencesource mono 839c219]
Comment 9 Neale Ferguson 2015-02-24 12:52:12 UTC
I'm not sure if the problem is endian-related or not. The crash occurs when referencing the s_pCategoryLevel1Index array:

        ushort index = s_pCategoryLevel1Index[ch >> 8];

So I placed a couple of display statements in the initialization method for this array:

        unsafe static bool InitTable() {

            // Go to native side and get pointer to the native table
            byte * pDataTable = GlobalizationAssembly.GetGlobalizationResourceBytePtr(typeof(CharUnicodeInfo).Assembly, UNICODE_INFO_FILE_NAME);

            UnicodeDataHeader* mainHeader = (UnicodeDataHeader*)pDataTable;

            // Set up the native pointer to different part of the tables.
            s_pCategoryLevel1Index = (ushort*) (pDataTable + mainHeader->OffsetToCategoriesIndex);
Console.WriteLine("{0}",((UIntPtr)s_pCategoryLevel1Index));
Console.WriteLine("{0}",s_pCategoryLevel1Index[0]);
            s_pCategoriesValue = (byte*) (pDataTable + mainHeader->OffsetToCategoriesValue);
            s_pNumericLevel1Index = (ushort*) (pDataTable + mainHeader->OffsetToNumbericIndex);
            s_pNumericValues = (byte*) (pDataTable + mainHeader->OffsetToNumbericValue);
            s_pDigitValues = (DigitValues*) (pDataTable + mainHeader->OffsetToDigitValue);

            return true;
        }

When run it I get:

1037262340
Microsoft.Build.Framework/IBuildEngine4.cs(10,43): error CS0589: Internal compiler error during parsingSystem.TypeInitializationException: An exception was thrown by the type initializer for System.Globalization.CharUnicodeInfo ---> System.NullReferenceException: Object reference not set to an instance of an object
Comment 10 Neale Ferguson 2015-02-24 15:16:19 UTC
I have a feeling it's an endian issue with mcs/class/corlib/resources/charinfo.nlp. This file gets read in and the offset values contained within it are used to set the values in the InitTable method. Displaying the contents shows the mainHeader->OffsetToCategoriesIndex is 0x3C000000. I assume it should be 0x0000003c.
Comment 11 Atsushi Eno 2015-02-25 02:24:13 UTC
Yes, charinfo.nlp is little endian based. It is generated within coreclr which is compatible with referencesource that we have been importing, and the format cannot be changed. The code that retrieves or uses the resulting array needs to reorder them.
Comment 12 Neale Ferguson 2015-02-25 09:11:14 UTC
Other than those offset values are there other fields that will require byte-swapping? What is the preferred method for doing the swaps?
Comment 13 Neale Ferguson 2015-02-25 15:42:32 UTC
This appears to solve the problem (I get a clean build at least). I'm sure it's suboptimal but it illustrates what needs doing:

--- a/mscorlib/system/globalization/charunicodeinfo.cs
+++ b/mscorlib/system/globalization/charunicodeinfo.cs
@@ -109,6 +109,54 @@ namespace System.Globalization {
             internal sbyte digit;
         }
 
+		unsafe private static int EndianSwap(int value)
+		{
+			if (!BitConverter.IsLittleEndian) {
+				byte *ptr = (byte *) &value;
+				int res;
+				byte *buf = (byte *) &res;
+				int t = sizeof(int) - 1;
+
+				for (int i = 0; i < sizeof(int); i++) 
+					buf[t-i] = ptr[i];
+				
+				return(res);
+			} else
+				return(value);
+		}
+
+		unsafe private static uint EndianSwap(uint value)
+		{
+			if (!BitConverter.IsLittleEndian) {
+				byte *ptr = (byte *) &value;
+				uint res;
+				byte *buf = (byte *) &res;
+				uint t = sizeof(uint) - 1;
+
+				for (uint i = 0; i < sizeof(uint); i++) 
+					buf[t-i] = ptr[i];
+				
+				return(res);
+			} else
+				return(value);
+		}
+
+		unsafe private static ushort EndianSwap(ushort value)
+		{
+			if (!BitConverter.IsLittleEndian) {
+				byte *ptr = (byte *) &value;
+				ushort res;
+				byte *buf = (byte *) &res;
+				ushort t = sizeof(ushort) - 1;
+
+				for (ushort i = 0; i < sizeof(ushort); i++) 
+					buf[t-i] = ptr[i];
+				
+				return(res);
+			} else
+				return(value);
+		}
+
 
         //We need to allocate the underlying table that provides us with the information that we
         //use.  We allocate this once in the class initializer and then we don't need to worry
@@ -125,11 +173,11 @@ namespace System.Globalization {
             UnicodeDataHeader* mainHeader = (UnicodeDataHeader*)pDataTable;
 
             // Set up the native pointer to different part of the tables.
-            s_pCategoryLevel1Index = (ushort*) (pDataTable + mainHeader->OffsetToCategoriesIndex);
-            s_pCategoriesValue = (byte*) (pDataTable + mainHeader->OffsetToCategoriesValue);
-            s_pNumericLevel1Index = (ushort*) (pDataTable + mainHeader->OffsetToNumbericIndex);
-            s_pNumericValues = (byte*) (pDataTable + mainHeader->OffsetToNumbericValue);
-            s_pDigitValues = (DigitValues*) (pDataTable + mainHeader->OffsetToDigitValue);
+            s_pCategoryLevel1Index = (ushort*) (pDataTable + EndianSwap(mainHeader->OffsetToCategoriesIndex));
+            s_pCategoriesValue = (byte*) (pDataTable + EndianSwap(mainHeader->OffsetToCategoriesValue));
+            s_pNumericLevel1Index = (ushort*) (pDataTable + EndianSwap(mainHeader->OffsetToNumbericIndex));
+            s_pNumericValues = (byte*) (pDataTable + EndianSwap(mainHeader->OffsetToNumbericValue));
+            s_pDigitValues = (DigitValues*) (pDataTable + EndianSwap(mainHeader->OffsetToDigitValue));
 
             return true;
         }
@@ -254,11 +302,11 @@ namespace System.Globalization {
         internal unsafe static double InternalGetNumericValue(int ch) {
             Contract.Assert(ch >= 0 && ch <= 0x10ffff, "ch is not in valid Unicode range.");
             // Get the level 2 item from the highest 12 bit (8 - 19) of ch.
-            ushort index = s_pNumericLevel1Index[ch >> 8];
+            ushort index = EndianSwap(s_pNumericLevel1Index[ch >> 8]);
             // Get the level 2 WORD offset from the 4 - 7 bit of ch.  This provides the base offset of the level 3 table.
             // The offset is referred to an float item in m_pNumericFloatData.
             // Note that & has the lower precedence than addition, so don't forget the parathesis.
-            index = s_pNumericLevel1Index[index + ((ch >> 4) & 0x000f)];
+            index = EndianSwap(s_pNumericLevel1Index[index + ((ch >> 4) & 0x000f)]);
             byte* pBytePtr = (byte*)&(s_pNumericLevel1Index[index]);
             // Get the result from the 0 -3 bit of ch.
 #if WIN64
@@ -448,12 +496,13 @@ namespace System.Globalization {
 
         [System.Security.SecuritySafeCritical]  // auto-generated
         internal unsafe static byte InternalGetCategoryValue(int ch, int offset) {
+
             Contract.Assert(ch >= 0 && ch <= 0x10ffff, "ch is not in valid Unicode range.");
             // Get the level 2 item from the highest 12 bit (8 - 19) of ch.
-            ushort index = s_pCategoryLevel1Index[ch >> 8];
+            ushort index = EndianSwap(s_pCategoryLevel1Index[ch >> 8]);
             // Get the level 2 WORD offset from the 4 - 7 bit of ch.  This provides the base offset of the level 3 table.
             // Note that & has the lower precedence than addition, so don't forget the parathesis.
-            index = s_pCategoryLevel1Index[index + ((ch >> 4) & 0x000f)];
+            index = EndianSwap(s_pCategoryLevel1Index[index + ((ch >> 4) & 0x000f)]);
             byte* pBytePtr = (byte*)&(s_pCategoryLevel1Index[index]);
             // Get the result from the 0 -3 bit of ch.
             byte valueIndex = pBytePtr[(ch & 0x000f)];
Comment 14 Neale Ferguson 2015-02-27 10:46:12 UTC
Has anyone had a chance to review the above fix?
Comment 15 Atsushi Eno 2015-03-03 15:04:17 UTC
Sorry, it dropped from my task list...! Let me verify on x86 and apply the changes tomorrow.
Comment 16 Atsushi Eno 2015-03-04 07:29:58 UTC
Applied the patch in referencesource and bumped submodule in mono master. Thanks!

[referencesource mono 41a6d3d] [mono master 9dc2beb]