Bug 37583

Summary: MAJOR performance decrease between 4.0.5.1 and 4.2.1
Product: [Mono] Class Libraries Reporter: Jeffrey Stedfast <jeff>
Component: mscorlibAssignee: Marek Safar <masafa>
Status: RESOLVED FIXED    
Severity: normal CC: masafa, mono-bugs+mono, mono-bugs+runtime, vargaz
Priority: ---    
Version: 4.2.0 (C6)   
Target Milestone: Untriaged   
Hardware: PC   
OS: Mac OS   
Tags: Is this bug a regression?: ---
Last known good build:
Attachments: mbox-parser.cs

Description Jeffrey Stedfast 2016-01-11 18:57:50 UTC
Created attachment 14539 [details]
mbox-parser.cs

I was testing performance of MimeKit's parser this weekend and I noticed a *major* decrease in performance since the last time I ran my benchmarks a year or so ago.

To narrow down the problem, I reverted my MimeKit checkout to various points between now and 1.0 and noticed that the was very little difference. At this point I was began to suspect Mono, so I downloaded Mono 4.0.5.1 to check the performance of MimeKit's parser using that.

Here are the results (Mac OS X Yosemite 10.10.5):

Mono JIT compiler version 4.0.5 ((detached/1d8d582 Tue Oct 20 15:15:33 EDT 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost playground]$ mono --llvm mbox-parser.exe larger.mbox 
Parsed 14896 messages in 1.78 seconds.
[fejj@localhost playground]$ mono mbox-parser.exe larger.mbox 
Parsed 14896 messages in 2.37 seconds.



Here are the results for the current stable release of Mono:

Mono JIT compiler version 4.2.1 (explicit/6dd2d0d Fri Nov  6 12:25:19 EST 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost playground]$ mono --llvm mbox-parser.exe larger.mbox 
Parsed 14896 messages in 5.11 seconds.
[fejj@localhost playground]$ mono mbox-parser.exe larger.mbox 
Parsed 14896 messages in 5.45 seconds.


That's a pretty significant decrease in performance.

Attached you'll find my mbox-parser.cs test program.

To get MimeKit:
git clone https://github.com/jstedfast/MimeKit.git

I can't attach the sample mbox that I was using (because it's over a gigabyte), but you can easily recreate it by concatenating multiple copies of MimeKit/UnitTests/TestData/mbox/jwz.mbox.txt into a single file (approx. 100 times). That should give you a large enough input to rule out statistical noise.
Comment 1 Zoltan Varga 2016-01-11 21:35:01 UTC
Here is what the profiler outputs:

  200089        0          1 (wrapper runtime-invoke) <Module>:runtime_invoke_void_object (object,intptr,intptr,intptr)
  200089       11          1 MboxParser.Program:Main (string[])
  199843        7      12160 MimeKit.MimeParser:ParseMessage (System.Threading.CancellationToken)
  199836       33      12160 MimeKit.MimeParser:ParseMessage (byte*)
  161829       21      11440 MimeKit.MimeParser:ConstructMultipart (MimeKit.Multipart,byte*)
  160000       59      11440 MimeKit.MimeParser:MultipartScanSubparts (MimeKit.Multipart,byte*)
  148425      417    1216769 System.OrdinalComparer:GetHashCode (string)
  148007      423    1216766 System.Globalization.TextInfo:GetHashCodeOrdinalIgnoreCase (string)
  147584      602    1216766 System.Globalization.TextInfo:GetHashCodeOrdinalIgnoreCase (string,bool,long)
  146812      430    1217122 System.Globalization.TextInfo:GetCaseInsensitiveHashCode (string,bool,long)
  146381      452    1217122 System.CultureAwareComparer:GetHashCode (string)
  145928      423    1217122 System.Globalization.CompareInfo:GetHashCodeOfString (string,System.Globalization.CompareOptions)
  145504      596    1217122 System.Globalization.CompareInfo:GetHashCodeOfString (string,System.Globalization.CompareOptions,bool,long)
  144631      445    1217122 System.Globalization.CompareInfo:GetSortKey (string,System.Globalization.CompareOptions)
  144186      421    1217122 System.Globalization.CompareInfo:CreateSortKey (string,System.Globalization.CompareOptions)
  143765      775    1217122 System.Globalization.CompareInfo:CreateSortKeyCore (string,System.Globalization.CompareOptions)
  142538      407    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,System.Globalization.CompareOptions)
  142130     1027    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,int,int,System.Globalization.CompareOptions)
  127447    11392    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,int,int,Mono.Globalization.Unicode.SortKeyBuffer,System.Globalization.CompareOptions)
   98029      361     278400 MimeKit.HeaderList:Add (MimeKit.Header)
   83428      391      21920 MimeKit.MimeMessage:.ctor (MimeKit.ParserOptions,System.Collections.Generic.IEnumerable`1<MimeKit.Header>)
   68671    14597   11582357 Mono.Globalization.Unicode.SimpleCollator:FillSortKeyRaw (int,Mono.Globalization.Unicode.SimpleCollator/ExtenderType,Mono.Globalization.Unicode.SortKeyBuffer,System.Globalization.CompareOptions)
   46289       34      76242 MimeKit.MimeParser:Step (byte*)
   46227     1309      64080 MimeKit.MimeParser:StepHeaders (byte*)
   43171       23       9760 MimeKit.MimeParser:ConstructMessagePart (MimeKit.MessagePart,byte*)
   42115      218     343680 MimeKit.MimeParser:ParseAndAppendHeader ()
Comment 2 Jeffrey Stedfast 2016-01-11 21:52:55 UTC
I guess the problem is that the code ends up here?

        [System.Security.SecuritySafeCritical]  // auto-generated
        internal unsafe int GetCaseInsensitiveHashCode(String str, bool forceRandomizedHashing, long additionalEntropy)
        {
            // Validate inputs
            if (str==null) 
            {
                 throw new ArgumentNullException("str");
            }
            Contract.EndContractBlock();

#if MONO
            return StringComparer.CurrentCultureIgnoreCase.GetHashCode (str);
#else
            // Return our result
            return (InternalGetCaseInsHash(this.m_dataHandle, this.m_handleOrigin, this.m_textInfoName, str, forceRandomizedHashing, additionalEntropy));
#endif
        }


I'm betting that the StringComparer.CurrentCultureIgnoreCase.GetHashCode() is insanely expensive.

FWIW, GetCaseInsensitiveHashCode() is effectively called from here:

        [ResourceExposure(ResourceScope.None)]
        [ResourceConsumption(ResourceScope.Process, ResourceScope.Process)]
        internal static int GetHashCodeOrdinalIgnoreCase(String s, bool forceRandomizedHashing, long additionalEntropy)
        {
            // This is the same as an case insensitive hash for Invariant
            // (not necessarily true for sorting, but OK for casing & then we apply normal hash code rules)
            return (Invariant.GetCaseInsensitiveHashCode(s, forceRandomizedHashing, additionalEntropy));
        }


Notice that the comment says that it expects to use the InvariantCulture, not the CurrentCulture. This may make a difference. Of course, I'm not sure why they switch to Invariant seeing as how Ordinal should still be cheaper than Invariant?
Comment 3 Jeffrey Stedfast 2016-01-11 21:54:06 UTC
D'oh, never mind. I misread the code.
Comment 4 Marek Safar 2016-01-12 08:17:46 UTC
The issue should be addressed by https://github.com/mono/referencesource/commit/8382c4605d622d56354e7340bf872caaf36b972a

jeff, could you try your test with the latest mono or mono 4.3.2
Comment 5 Jeffrey Stedfast 2016-01-12 13:26:17 UTC
Unfortunately that change didn't seem to help much:

Mono JIT compiler version 4.3.2 (mono-4.3.2-branch/83e0157 Mon Jan 11 18:26:16 EST 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-master/a173357)
	GC:            sgen
[fejj@localhost perf]$ mono --llvm ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 10.819862s.
[fejj@localhost perf]$ mono ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 10.593343s.
[fejj@localhost perf]$ mono --version
Mono JIT compiler version 4.0.5 ((detached/1d8d582 Tue Oct 20 15:15:33 EDT 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost perf]$ mono --llvm ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 5.737068s.
[fejj@localhost perf]$ mono ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 6.912308s.


(testing this on my laptop which is slower, so posted timings of both 4.3.2 and 4.0.5 for comparison)
Comment 6 Marek Safar 2016-01-14 13:41:42 UTC
Fixed in master