Created attachment 19834 [details]
Chart of build times doubling and 2 build logs (before and after)
After we upgraded to the latest stable channel of Xamarin our build times have doubled. See attached image of our build times on TeamCity.
I have also attached build logs of before and after. I do not remember what version of Xamarin we upgraded from but we do know that on this latest version our build times went from averaging 20 minutes to averages 40 minutes.
We tried what is currently on the beta channel and that did not fix the problem.
From what I can tell looking at the logs, it looks like its the llvm optimization/linking that doubled in time.
Complete version information:
Xamarin Studio Community
Version 6.1.5 (build 0)
Installation UUID: ad25229e-3520-4078-a6f7-33a544d0dbba
Mono 4.6.2 (mono-4.6.0-branch/ac9e222) (64-bit)
GTK+ 2.24.23 (Raleigh theme)
Package version: 406020016
Apple Developer Tools
Xcode 8.2.1 (11766.1)
Version: 188.8.131.52 (Xamarin Studio Community)
Android SDK: /Users/xcpobuild/Library/Developer/Xamarin/android-sdk-macosx
Supported Android versions:
2.3 (API level 10)
4.0.3 (API level 15)
4.3 (API level 18)
4.4 (API level 19)
5.0 (API level 21)
5.1 (API level 22)
6.0 (API level 23)
7.0 (API level 24)
SDK Tools Version: 25.2.3
SDK Platform Tools Version: 25.0.1
SDK Build Tools Version: 25.0.1
Java SDK: /usr
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
Android Designer EPL code available here:
Xamarin Android Player
Version: 10.3.1.8 (Xamarin Studio Community)
Build date: 2016-12-20 02:58:14-0500
Release ID: 601050000
Git revision: 7494718e127af9eaec45a3bd6282d3da927488bd
Build date: 2017-01-17 10:31:01-05
Xamarin addins: c92d0626d347aaa02839689eaac2961d24c9f446
Build lane: monodevelop-lion-cycle8
Mac OS X 10.11.6
Darwin xcpobuild-7.xactware.com 15.6.0 Darwin Kernel Version 15.6.0
Thu Jun 23 18:25:34 PDT 2016
> I do not remember what version of Xamarin we upgraded from
It's part of the logs :)
- [Building Xactimate.iOS] Xamarin.iOS 10.0.1.10 (cycle8-sr0-xi: ad1cd42) using framework: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.0.sdk
+ [Building Xactimate.iOS] Xamarin.iOS 10.3.1.8 (cycle8-xi: 7beaef4) using framework: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.2.sdk
Release builds are not our normal optimization target, since it's executed less often that debug builds, but this is quite unexpected - even more since this is all from C8 (so the amount of changes is not huge).
- [Building Xactimate.iOS.sln] Building Xactimate.iOS (7m:36s)
+ [Building Xactimate.iOS.sln] Building Xactimate.iOS (30m:26s)
From a quick look all the arguments appears to be identical. Parts of the build is ran in parallel so analyzing the logs will take some time.
Now there's something important that our current* logging does not show, it's the size of the native dependencies. A few people ran into issues rebuilding (or upgrading) native libraries that now includes bitcode and that can make them 10x as large. Can you check if your dependencies size has changed during that period ?
* we'll be logging this information in the future
mtouch is called on Xactimate.iOS at
> grep "abi=arm" *log.log
> Last sub 20 minute log.log:[13:40:09]
> Most recent log.log:[09:44:56]
Managed Linking starts (twice for 32/64 bits) at
> grep "[Building Xactimate.iOS]" *log.log | grep "Linking /"
> Last sub 20 minute log.log:[13:40:17] +8 sec
> Last sub 20 minute log.log:[13:41:01] +44 sec
> Most recent log.log:[09:45:05] +9 sec
> Most recent log.log:[09:45:49] +44 sec
Native framework detection (first of two groups for 32/64 bits) at
> grep "[Building Xactimate.iOS]" *log.log | grep "Linking with"
> Last sub 20 minute log.log:[13:41:49] +48 sec
> Last sub 20 minute log.log:[13:44:00] +2m11s
> Most recent log.log:[09:46:35] +46 sec
> Most recent log.log:[09:55:50] +9m15s <--- at this stage only one arch was built
So the issue seems related to the native build.
I'll drill it down further tomorrow and start to compare the changes between the 2 versions.
@Zoltan do you recall any major change wrt LLVM in C8 service releases ?
Enabling llvm might cause long build times, but 20/40 minutes seem extreme.
The fix for https://bugzilla.xamarin.com/show_bug.cgi?id=52475 might help here, as it disables llvm on large methods.
To respond to what you asked about native dependencies changing: we tried the same exact code with the same xcode but the two different versions of xamarin and we got double build time on the latest xamarin stable. So the only difference seems to be Xamarin version. I attached new build logs with verbosity turned on so you can see.
Created attachment 20048 [details]
latest logs showing same code, different versions of Xamarin, with verbose logging enabled.
We tried again after the stable channel update (Xamarin.iOS 10.4.0.123) and there was no change.
I could not find remarkable time differences on build time using an app that uses Xamarin Forms plus + other dependencies + LLVM enabled. (Build logs: https://gist.github.com/dalexsoto/8528fc898d1bceb78c4d99c16f0c52c0).
Time Elapsed 00:06:47.2079750
Time Elapsed 00:07:04.5010380
@Bill if the final 10.4 did not help then it seems unrelated to the bug Zoltan mentioned in comment #3.
It's also not something we see if normal tests or what @Alex tried, so you might be running into a corner case.
It's been a while but iirc we had access, at some point, to your application. Do you know if that's still the case ? If so we might be able to do a binary search to pinpoint the change where this started.
Also do you know if other builds (e.g. local debug build) also suffers from this same slowdown ? I assumed not initially, and though it could be related to LLVM, but if it's a corner case then the issue might be inside the mono (not LLVM) AOT backend (we do know the time difference happens when doing native compilation).
@Sebastien you guys still have a copy of our repo. I will update it and then let you know via a comment on this bug.
Our debug builds are not affected.
I have updated our code on your repo. To build we run this command: ./xactimate.ios/Xactimate.iOS/build.sh "enterprise-build" which ends up running some FAKE scripts.
FYI we also tried MSBuild over xBuild to see if it would make a difference. It didn't.
I am able to reproduce this issue
C9 current stable XI:
Time Elapsed 00:25:50.5371740
Time Elapsed 00:08:06.1501160
Some results of running mtouch using the mono profiler using
mono64 --profile=log:nocalls,output=log,heapshot=100gc <mtouch args>
GC resizes: 0
Max heap size: 0
Object moves: 0
Gen0 collections: 355, max time: 11243us, total time: 2016881us, average: 5681us
Gen1 collections: 488, max time: 1035228us, total time: 303848084us, average: 622639us
GC handles weak: created: 4, destroyed: 4, max: 3
GC handles weaktrack: created: 41, destroyed: 37, max: 17
-> 303 seconds (5 minutes) are spent doing major GCs.
Heap shot 3 at 324.910 secs: size: 1357481056, object count: 14100062, class count: 560, roots: 0
Bytes Count Average Class name
339926752 11965 28410 System.Byte (bytes: +8464, count: +5)
329505344 5884024 56 Mono.Cecil.Cil.Instruction (bytes: +0, count: +0)
87706272 292751 299 Mono.Cecil.Cil.Instruction (bytes: +0, count: +0)
51408320 321302 160 Mono.Cecil.MethodDefinition (bytes: +0, count: +0)
48059712 711082 67 System.String (bytes: -1322280, count: -8831)
38624000 482800 80 Mono.Cecil.ParameterDefinition (bytes: +0, count: +0)
31383680 245185 128 Mono.Cecil.GenericInstanceType (bytes: +0, count: +0)
23094000 288675 80 Mono.Cecil.Cil.MethodBody (bytes: +0, count: +0)
20689208 569 36360 System.Collections.Generic.Dictionary.Entry<System.UInt32,System.String> (bytes: +0, count: +0)
20636360 515909 40 Mono.Cecil.MethodReturnType (bytes: +0, count: +0)
18510720 144615 128 Mono.Cecil.FieldDefinition (bytes: +0, count: +0)
17788664 270415 65 Mono.Cecil.TypeReference (bytes: +0, count: +0)
15573616 2 7786808 System.Collections.Generic.HashSet.Slot<Mono.Cecil.IMetadataTokenProvider> (bytes: +0, count: +0)
-> about 1.5GB of persistent cecil related data is kept around by mtouch, which needs to be scanned on each major gc.
* during the assembly phase, the cpu loads seems very low. Is this done in parallel ?
* ditto for the strip phase.
* stripping is done by invoking mono-cil-strip.exe. That spends about 0.5s per invocation just
for JITting cecil etc.
* the stripper loads assemblies using:
which will load all IL code, which is discarded right after.
* We should probably upgrade mono-cil-strip to the current cecil. Why is it using its own private copy ?
This is caused by this:
and similar code in multiple files.
If we want to keep this, we should do it every 10-20 process spawns, since mtouch has a 1gb heap due to cecil data structures.
This is the commit where this was introduced https://github.com/xamarin/xamarin-macios/commit/350be8882124966e6072b12cd6e895c06dab7177
@Zoltan thanks for helping Alex investigate the issue!
It's a big project, the same did not show up when measuring other, smaller projects. It might be possible to move stuff around and close (reclaim memory) of the loaded assemblies before starting the AOT compilation.
> * during the assembly phase, the cpu loads seems very low. Is this done in parallel ?
Is not the assembly phase run by the AOT compiler ? which is called in parallel;
> * ditto for the strip phase.
Stripping is done in parallel. However I remember this did not help a lot small projects. It's likely that small assemblies are processed a lot faster than calling the tool itself.
> * the stripper loads assemblies using:
which will load all IL code, which is discarded right after.
@JB is that needed ?
> * We should probably upgrade mono-cil-strip to the current cecil. Why is it using its own private copy ?
mono-cil-strip uses Cecil 0.6.x (not 0.9 or newer 1.0 betas) since it's the last version that could be use without modifying the metadata tokens, which the AOT compiler requires to remain constant. That's the reason it's stuck in the past with it's own copy of Cecil.
This is something I discussed with JB (ETL meeting). Cecil is not design to retain the original order/tokens, so the best option seems to investigate using System.Reflection.Metadata. However it means re-loading the same assemblies again into memory (or again hit the spawn cost).
Q: Zoltan, are you aware of the progress wrt https://bugzilla.xamarin.com/show_bug.cgi?id=43462 ? removing the calls to Collect would be better and safer than guessing the frequency required to avoid the bug.
So the low cpu usage was caused by the GC.Collect () calls.
(In reply to Zoltan Varga from comment #15)
> This is caused by this:
> and similar code in multiple files.
> If we want to keep this, we should do it every 10-20 process spawns, since
> mtouch has a 1gb heap due to cecil data structures.
We might not need the cecil data structures when these GC.Collect calls occur, so we might be able to just null those out and collect them all.
Still, doing 1 gc per process spawn is going to be slow.
Going through the cil-strip and old-cecil code, I think the FullLoad is not necessary. At least it's worth trying to remove.
An intermediate solution would be to have 1 process doing the cil-strip for all assemblies. We can modify cil-strip to take a list of assemblies to process.
I'll think about what we can do with Cecil 0.10 for this use case.
> At least it's worth trying to remove.
> An intermediate solution would be to have 1 process doing the cil-strip
and strip in parallel from inside the tool ?
If not this might affect the build performance for smaller projects. Again worth adding, just not sure if it will be useful (or harmful) generally... but we have the choice to use it (or not).
> I'll think about what we can do with Cecil 0.10 for this use case.
Our discussion ended with the fact that strip is called on assemblies that might not have been re-written by Cecil - which makes it more than a "deterministic output" issue.
It would still be _ideal_ to have Cecil be able to _respect_ the original metadata tokens. In such case we could strip from mtouch itself (and the data already in memory) - no more spawning processes, no more reloading assembly data...
Since the original fix was a workaround for bug #43462 and this was fixed in C9  then I think we can revert the GC.Collect calls in `master` and, time builds are confirmed, in `d15-2`.
@Alex please go ahead with a PR for master. We'll merge it unless other concerns are raised.
Also please move the 2 other suggestions (strip and collect cecil data once not needed) to different bug reports (with links back here). Thanks!
Project built with current master + PR 1976 patch
Time Elapsed 00:08:25.08
Fixed (master): https://github.com/xamarin/xamarin-macios/commit/2a85ec1674da63b2fd808b93721aafd9cbc3faed
Backported to dev15-2 https://github.com/xamarin/xamarin-macios/pull/1979
*** Bug 55115 has been marked as a duplicate of this bug. ***
@Bill Hi, as this is a development related bug. kindly, verify it on your end or provide me steps to verify this.
@Aman I will verify it once it reaches at least the alpha channel. Has the fix made it there?
@Bill yes, it's in our latest alpha. Anything with 10.10.0.6+ will have it. Again thanks for your help tracking this :)
@Aman @Sebastien The latest alpha improved the build times back to about where they were. Thanks for your help!