Bug 46940 - The first invocation of a method on a generic interface is 1060x slower than the second
Status: REOPENED
Alias: None
Product: iOS
Classification: Xamarin
Component: Mono runtime / AOT compiler
Version: XI 10.2 (iOS 10.1)
Hardware: PC Windows
Importance: Normal enhancement
Target Milestone: Future Cycle (TBD)
Assignee: Zoltan Varga
Reported: 2016-11-14 17:57 UTC by Jerome Laban
Modified: 2017-02-08 02:05 UTC
CC List: 2 users



Attachments
Repro (856.02 KB, application/x-zip-compressed)
2016-11-14 17:57 UTC, Jerome Laban

Description Jerome Laban 2016-11-14 17:57:00 UTC
Created attachment 18455
Repro

On iOS, the first invocation of a method on a generic interface is extremely slow.

Using the attached sample, on an iPhone 5 (ARMv7), the first invocation is 1060 times slower than the second. On an iPhone 7 (ARM64), it is 340 times slower.

The code in the sample is a repetition of:

   ((IEnumerable<ItemXXX>)new ItemXXX[0]).GetEnumerator();

where XXX counts up to 5000.
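A minimal C# sketch for timing the first versus the second call for one element type, assuming an ordinary .NET or Xamarin.iOS project. Item001, Item002, FirstCallTiming and Measure are hypothetical names for illustration, not part of the attached repro. The first timed call pays for any lazy type and interface setup (and, on a JIT runtime, for compilation as well). Routing the call through a generic helper may let the AOT compiler use shared generic code, so the repro's one explicit line per type remains the more faithful test.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-ins for the generated ItemXXX types in the repro.
public class Item001 { }
public class Item002 { }

public static class FirstCallTiming
{
    // Times two consecutive GetEnumerator() calls made through IEnumerable<T>
    // on an empty T[], returning the elapsed Stopwatch ticks for each call.
    public static (long FirstTicks, long SecondTicks) Measure<T>()
    {
        var sw = Stopwatch.StartNew();
        ((IEnumerable<T>)new T[0]).GetEnumerator();   // first use for this T
        long first = sw.ElapsedTicks;

        sw.Restart();
        ((IEnumerable<T>)new T[0]).GetEnumerator();   // same call, already set up
        long second = sw.ElapsedTicks;

        return (first, second);
    }
}
```

Calling `FirstCallTiming.Measure<Item001>()` once per type and logging both values gives a per-type view of the first-call gap; the exact ratio will vary by device, architecture and runtime.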

This severely impacts the startup time of large applications.
Comment 1 Rodrigo Kumpera 2016-11-14 20:34:15 UTC
This happens because we lazily resolve interface methods; it's an expected cost.

We're constantly working to optimize startup, and given how fast interface dispatch is once resolved, such a huge difference is expected.

Do you have a specific app where interface method resolution is disproportionately affecting your startup time?
Comment 2 Jerome Laban 2016-11-14 21:22:34 UTC
I understand this is an expected cost, yet it's very high, and it's even worse when multithreading is involved (because of mono_loader_lock). 

I picked this sample specifically, but the same happens when using async methods, with the AsyncTaskMethodBuilder.
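Async methods hit the same first-use cost because the C# compiler turns each async method into its own generated state machine type; a method returning Task<int> drives that type through AsyncTaskMethodBuilder<int>. A small C# sketch (AsyncTypes, GetValueAsync and StateMachineOf are hypothetical names) that uses the real AsyncStateMachineAttribute to locate the generated type:

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class AsyncTypes
{
    // The compiler emits a separate state machine type for this method,
    // driven by an AsyncTaskMethodBuilder<int>.
    public static async Task<int> GetValueAsync()
    {
        await Task.Yield();
        return 42;
    }

    // Returns the compiler-generated state machine type recorded on an
    // async method via AsyncStateMachineAttribute, or null if there is none.
    public static Type StateMachineOf(string methodName)
    {
        MethodInfo method = typeof(AsyncTypes).GetMethod(methodName);
        var attr = method.GetCustomAttribute<AsyncStateMachineAttribute>();
        return attr?.StateMachineType;
    }
}
```

Every distinct async method therefore adds at least one more type that the runtime must load the first time the method runs.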

Most apps we create exhibit this specific issue, and most F# apps, which rely heavily on generics, are likely to be impacted in the same way.
Comment 3 Zoltan Varga 2016-11-15 02:07:17 UTC
It is possible to do something about this, but instantiating thousands of types is never going to be fast: the runtime is designed to do the same thing many times, not many different things once.
Keeping the bug open.
Comment 4 Jerome Laban 2016-11-15 02:33:17 UTC
@zoltan, thanks for the update. That's the thing, though: the startup of an application is all about the first use of a lot of types and methods, either from the BCL or from the app itself. Capturing (and, to a certain extent, non-capturing) C# lambdas in generic types, as well as async methods, also all involve new compiler-generated types such as display classes.
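A C# sketch of the display-class point, assuming the Roslyn compiler: a lambda that captures a local or parameter becomes a method on a generated display class, and inside a generic type that class is nested in the generic type, so Holder<string> and Holder<Uri> each get a distinct closure type to load on first use. Non-capturing lambdas are cached in a single generated class per containing type, which is why they are affected only to a certain extent. Holder and ClosureTypes are hypothetical names.

```csharp
using System;

// Hypothetical generic type standing in for app code that uses lambdas.
public class Holder<T>
{
    // Captures the parameter x, so the compiler emits a display class nested
    // in Holder<T>; each instantiation of Holder<T> gets its own closure type.
    public Func<int> Capturing(int x) => () => x;

    // Captures nothing; Roslyn caches this delegate in a shared generated class.
    public Func<int> NonCapturing() => () => 1;
}

public static class ClosureTypes
{
    // Returns the runtime type of the object a delegate's method is bound to.
    public static Type ClosureTypeOf(Delegate d) => d.Target?.GetType();
}
```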

I've also noticed that on an iPhone 7, with large apps, most leaves of the stack traces shown by Instruments are mutex locks or unlocks. Could some cases benefit from rwlocks instead of mutexes? Perhaps not the loader lock, which is used in a very large number of unrelated locations, but locks like the domain lock or the image lock.
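The rwlock suggestion targets read-mostly structures: concurrent lookups can proceed in parallel, and only insertions are serialized. The runtime itself is written in C, where the analogue would be pthread_rwlock_t; this sketch stays in C# and uses ReaderWriterLockSlim for a hypothetical cache (ReadMostlyCache and GetOrAdd are illustrative names, not Mono APIs).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical read-mostly cache guarded by a reader/writer lock.
public sealed class ReadMostlyCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Note: create runs under the write lock, so it must not call back into this cache.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> create)
    {
        // Fast path: any number of readers may hold the read lock at once.
        _lock.EnterReadLock();
        try
        {
            if (_map.TryGetValue(key, out TValue existing))
                return existing;
        }
        finally { _lock.ExitReadLock(); }

        // Slow path: take the exclusive lock and re-check, since another
        // thread may have inserted the key after the read lock was released.
        _lock.EnterWriteLock();
        try
        {
            if (!_map.TryGetValue(key, out TValue value))
            {
                value = create(key);
                _map.Add(key, value);
            }
            return value;
        }
        finally { _lock.ExitWriteLock(); }
    }
}
```

Whether this pays off depends on the read/write ratio: a reader/writer lock costs more than a plain mutex when contention is low, so it mainly helps when many threads look up entries that already exist.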
Comment 5 Rodrigo Kumpera 2017-02-07 22:38:19 UTC
Hi Jerome,

I have some news: not great, but good. We're in the process of landing some loading scalability improvements to the runtime.

This won't remove the fixed initialization cost, but it will behave much better in multi-threaded setups like yours.
Comment 6 Rodrigo Kumpera 2017-02-07 22:47:06 UTC
Funny story: I was just told that work I did for the current cycle might actually help here. I changed how we handle interfaces on arrays to make it lazier.

This should reduce the type loading cost somewhat.
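The general idea of handling array interfaces more lazily can be sketched as a table whose slots are resolved only on first use instead of all at construction time. This is a hypothetical C# model (LazyVTable, Get and ResolvedCount are illustrative names), not Mono's vtable code, and it is not thread-safe as written.

```csharp
using System;

// Hypothetical vtable model: slots start empty and are filled on first access.
public sealed class LazyVTable<TSlot> where TSlot : class
{
    private readonly TSlot[] _slots;              // null until resolved
    private readonly Func<int, TSlot> _resolve;   // expensive per-slot resolution

    public LazyVTable(int slotCount, Func<int, TSlot> resolve)
    {
        _slots = new TSlot[slotCount];
        _resolve = resolve;
    }

    // Number of slots resolved so far.
    public int ResolvedCount { get; private set; }

    // Returns the slot, resolving and caching it the first time it is requested.
    public TSlot Get(int index)
    {
        TSlot slot = _slots[index];
        if (slot == null)
        {
            slot = _resolve(index);
            _slots[index] = slot;
            ResolvedCount++;
        }
        return slot;
    }
}
```

With eager construction every slot would be resolved up front; here, calling `Get(3)` twice on a 25-slot table resolves exactly one slot.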

We must re-evaluate this issue once a Xamarin.iOS release with the above changes ships.
Comment 7 Zoltan Varga 2017-02-08 02:05:24 UTC
@kumpera, a few reasons why this is still slow:

- we create the full vtables for the array classes even if only one method is required, which
  requires inflating 25 generic methods per class.
- the inflated methods are kept in a single hash table, which grows to 125k entries with the
  test case.
- due to the creation of all these interfaces, we allocate 80 MB of memory to hold
  klass->interface_bitmap.
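The first two figures agree with each other: 5000 array classes with 25 inflated methods each give exactly the 125k hash entries reported. For the bitmap, a rough model suggests why memory grows so fast. It assumes that each class's interface_bitmap holds one bit per interface id up to the highest id the class implements, and that ids are handed out in creation order; c (new interface ids introduced per element type) is an assumed parameter, not a measured one.

```latex
N_{\text{inflated}} = 5000 \times 25 = 125\,000

M_{\text{bitmap}} \approx \sum_{k=1}^{N} \frac{c\,k}{8}\ \text{bytes} = \frac{c\,N(N+1)}{16}\ \text{bytes} = O(N^2)
```

Under this model the bitmap cost is quadratic in the number of element types N, while the method-inflation cost is linear.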
