Summary: Xamarin.iOS startup performance issue
Product: iOS
Reporter: Jerome Laban <jerome.laban>
Component: XI runtime
Assignee: Bugzilla <bugzilla>
Severity: normal
CC: danisha, lupus, miguel, mohitk, mono-bugs+monotouch, sebastien, vargaz
Tags:
Is this bug a regression?: ---
Last known good build:
Attachments: A synthetic benchmark
Description Jerome Laban 2014-04-08 21:29:32 UTC
Created attachment 6534: A synthetic benchmark

The cold start of an application takes a significant amount of time on lower-end devices, such as the iPhone 4 and iPad 2. Profiling sessions show that the time is spent in generic_trampoline_delegate, a method that builds the trampolines for generic method calls.

I've attached a synthetic benchmark that reproduces the issue. It contains thousands of generic methods, mirroring the way our application uses generic methods. On an iPad 2, the first call of the chain takes about 1150ms, whereas the second call takes about 4ms (four). The caching mechanism of the mini runtime is working properly, since the time drops significantly. However, during the first calls, the time taken to resolve the methods is significant, and it seems to increase linearly with the number of generic types present in the application domain.

Parallelizing, tried as a possible improvement, does not seem to help: when the same code is called on two threads, the first call takes 2330ms, whereas on the second call both threads take 4ms (four).

Note that the ratio between cold and warm time is *very* different on the latest Apple devices, such as the iPhone 5S, where the cold duration drops by a factor of 4. This has a great impact on the app's perceived performance for consumers, even though the performance is great once the app is warmed up.
Comment 1 Zoltan Varga 2014-04-08 22:00:55 UTC
Thanks for the test case. Checked in a fix to mono master (078dc0321d53f9e161957656550fd10cc41db618) and mono-3.4.0 (0081c27e0d6473a83cc856abf67c4a42dc21b53d). It improves the first run of the benchmark from 1.1s to 0.4s for me.
Comment 2 Jerome Laban 2014-04-09 11:10:33 UTC
Thanks Zoltan, that's quite an improvement :) Do you know whether it also improves performance in multi-threaded scenarios? Thank you,
Comment 3 Zoltan Varga 2014-04-09 12:27:19 UTC
It probably does.
Comment 4 Jerome Laban 2014-04-09 14:56:28 UTC
I'm asking because of this: https://github.com/mono/mono/blob/078dc0321d53f9e161957656550fd10cc41db618/mono/metadata/metadata.c#L2808 There is lock contention there when resolving generic methods, and the work done inside the lock is pretty significant...
Comment 5 Miguel de Icaza [MSFT] 2014-04-09 15:54:25 UTC
This patch introduced a regression: Mono no longer bootstraps. See: https://github.com/mono/mono/commit/078dc0321d53f9e161957656550fd10cc41db618#commitcomment-5953321
Comment 6 Zoltan Varga 2014-04-09 15:59:28 UTC
The changes were reverted from master/3.4.0 for now. @Jerome: Will look at reducing the work done inside the lock.
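The general idea of reducing the work done inside a lock can be sketched as below. This is a minimal, hypothetical Python illustration of the pattern, not the code in metadata.c and not the actual Mono fix: `InflatedMethodCache` and the `build` callback are invented names, and `build` merely stands in for whatever costly resolution step would otherwise run while the lock is held.

```python
import threading
from typing import Callable, Dict, Generic, Hashable, TypeVar

V = TypeVar("V")


class InflatedMethodCache(Generic[V]):
    """Hypothetical cache that keeps expensive work outside the lock.

    `build` stands in for a costly resolution step (for example, building
    an inflated generic method). It is an illustrative assumption, not a
    Mono runtime API.
    """

    def __init__(self, build: Callable[[Hashable], V]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, V] = {}

    def get(self, key: Hashable) -> V:
        # Fast path: hold the lock only long enough for a dictionary lookup.
        with self._lock:
            if key in self._entries:
                return self._entries[key]

        # Slow path: run the expensive step without holding the lock, so
        # threads resolving other keys are not blocked behind it.
        candidate = self._build(key)

        with self._lock:
            # Another thread may have stored the same key in the meantime;
            # keep the first value so all callers share one instance.
            return self._entries.setdefault(key, candidate)
```

For example, two calls to `cache.get("List<int>.Add")` run `build` once and return the same object. The trade-off is occasional duplicate work: two threads that miss on the same key may both run `build`, and one result is discarded. In exchange, lookups of unrelated keys are no longer serialized behind one long critical section.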
Comment 7 Zoltan Varga 2014-04-11 23:17:10 UTC
Committed a corrected fix to mono master (ea490c5486af6e1ce6ce8b1a117f1d99cf988df0). It will be in a future MonoTouch (mt) release after some testing.
Comment 8 Zoltan Varga 2014-04-17 12:28:14 UTC
The corresponding change on the 3.4.0 branch is 28145e01f42317e685ad1020a47ba746f164c28b.
Comment 9 Jerome Laban 2014-04-17 14:41:55 UTC
Using the same PoC on the same hardware, the run time is down from 1150ms to 268ms. Great improvement Zoltan, thanks! Note that the multi-threaded behavior is vastly better (2330ms down to 380ms), but still slower than the single-CPU test.
Comment 10 Sebastien Pouliot 2014-06-18 21:14:04 UTC
This fix is part of the 7.2.6 release (in the alpha channel right now).