Bug 26842

Summary: Support reproducible builds
Product: [Mono] Runtime
Reporter: henrik
Component: packaging
Assignee: Jo Shields <jo.shields>
Status: RESOLVED FIXED
Severity: normal
CC: alex.koeplinger, dkg, masafa, mono-bugs+mono, mono-bugs+runtime
Priority: ---
Version: unspecified
Target Milestone: ---
Hardware: All
OS: All
Tags:
Is this bug a regression?: ---
Last known good build:
Attachments: first pass at getting a pre-defined timestamp into the PE module

Description henrik 2015-02-08 09:49:27 UTC
Currently there's no guidance available on how to create reproducible builds.

There's an official build from Mono, which lacks documentation on how it was produced (as far as I can find).

There's the 'make && make install' way of doing things, which lacks several components, among them the PCL libraries (see https://bugzilla.xamarin.com/show_bug.cgi?id=26493 too; this is not a duplicate, btw).

Besides that, it differs substantially from the 'official installation':

$ diff --brief -r /usr/local/Cellar/mono/3.12.0/ /Library/Frameworks/Mono.framework/Versions/3.12.0 | wc -l
=> 2228

This makes it very hard to be certain that the Homebrew installation is complete.

Also, the official release contains F#, but `brew edit fsharp` doesn't correspond to the same settings. How to build 32-bit F# is not documented anywhere.

There's also the more general question of being able to reproduce builds in order to ensure that an installation has not been tainted, which Debian's and Fedora's reproducible-builds work is addressing; but for Mono it seems we're not there yet: http://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20141103/000545.html. See also the wiki: https://wiki.debian.org/ReproducibleBuilds.

I would like Xamarin and the community to ensure that we can easily script a development environment and be certain that it works. Also see https://github.com/Homebrew/homebrew/issues/36279 and https://github.com/fsharp/fsharp/issues/328
Comment 1 Jo Shields 2015-02-10 06:11:28 UTC
This is really multiple questions in one, I'll try as best I can to answer it.

> There's an official one from mono, which is lacks documentation on how it was
> produced (from my searching).

I assume you mean the "MDK" Mac .pkg here? You're right, it's not documented: the scripts that generate those files are not currently public (they live in the same repository as the build scripts for our proprietary iOS/Android products). I'll see what I'm allowed to share; there shouldn't be anything secret in those Mac-specific files, but I need to confirm.

> There's the 'make && make install' way of doing things, which lacks things,
> amongst others the PCL libraries

Yep. "make install" in the Mono source tree installs only the runtime and class libraries. The MDK also contains (off the top of my head) the PCL libs, F#, IronPython, IronRuby, Gtk#, Boo, XSP, NuGet and VBNC. These are all open source, but their individual build systems may vary.

> Also, the official release contains F#, but `brew edit fsharp` doesn't
> correspond to those same settings. How to build 32 bits F# is not documented
> anywhere.

F# is entirely managed code, so there's no distinction between 32-bit and 64-bit binaries. (The build systems for some of the Microsoft-sourced components, such as IronPython, do have some Windowsisms; generating both "ipy" and "ipy64" is one example. There's no difference on Mono, so these differences can be ignored.)

> There's also the more global question of being able to reproduce builds from a perspective of ensuring that the installation has not been tainted, which is being addressed through Debian's and Fedora's ReproducibleBuild; but for mono it seems we're not there yet

And it's even more complicated than that. As well as a build timestamp in the PE/COFF header, there's an optional auto-generated, timestamp-based version number in some assemblies (I don't think many of the Mono assemblies use this, but it's a general concern for .NET code) and, more problematically, a GUID baked into assemblies, which our AOT runtime uses to ensure a match between assemblies and AOT caches. I don't know offhand whether those AOT caches include any kind of stamps or markers other than the matching GUID.
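The PE/COFF build timestamp is easy to inspect directly, which helps when comparing two builds. The following is a minimal C# sketch, not Mono code: `ReadTimeDateStamp` is a hypothetical helper, and it assumes a well-formed PE image. The offsets follow the PE/COFF specification: the DOS header stores the offset of the `PE\0\0` signature at 0x3C, and the COFF header's `Machine` and `NumberOfSections` fields precede `TimeDateStamp`.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper (not part of Mono): returns the COFF TimeDateStamp,
// i.e. seconds since 1970-01-01 UTC, of a PE image. Assumes a well-formed file.
static uint ReadTimeDateStamp(byte[] image)
{
    // e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    int peOffset = BinaryPrimitives.ReadInt32LittleEndian(image.AsSpan(0x3C));

    // The signature must be "PE\0\0".
    if (image[peOffset] != (byte)'P' || image[peOffset + 1] != (byte)'E' ||
        image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
        throw new InvalidDataException("PE signature not found");

    // COFF header after the signature: Machine (2 bytes),
    // NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    return BinaryPrimitives.ReadUInt32LittleEndian(image.AsSpan(peOffset + 4 + 2 + 2));
}

// Usage: print the stamp of an assembly passed on the command line.
if (args.Length == 1)
{
    uint stamp = ReadTimeDateStamp(File.ReadAllBytes(args[0]));
    Console.WriteLine($"{stamp} ({DateTimeOffset.FromUnixTimeSeconds(stamp):u})");
}
```

Two otherwise identical builds made at different times will differ in this field unless the compiler writes a fixed value.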
Comment 2 henrik 2015-02-13 03:32:49 UTC
Hello Jo,

I think making those scripts public would be a good step. In particular, they could be made able to take an 'extra component' and build it; this would make it a lot easier to fit them together with mono + F# + PCL, or in your case mono + Gtk# + ... + F# + PCL. You could keep building your proprietary components privately, but the open-source components needed for actual professional development on Mono would be easy to construct.

I'm cross-linking a discussion I'm having on encoding the build as a Homebrew formula: https://github.com/Homebrew/homebrew/pull/36764#issuecomment-74214177

Perhaps collecting all public repos, as in https://github.com/mono/monodevelop/tree/master/main/external, could be part of a 'componentised' way of building?

As for the timestamps, they only become a concern once the build can be reliably reproduced for the open components. However, the Debian mailing-list post linked above contained a link to a binary parser capable of setting the timestamps correctly. Would it be possible for Xamarin to take part in that discussion (http://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20141103/000545.html) and help the Debian folks take their reproducible-builds project forward?
Comment 3 Daniel Kahn Gillmor 2015-02-16 04:25:48 UTC
This does seem like a lot of questions in one (or maybe I'm just hampered by my lack of understanding of the Homebrew and F# toolchains).

I'm definitely interested in what we can do to make sure that mono produces reproducible assemblies.  For embedded timestamps in other toolchains, the usual approach is to be able to supply the desired timestamp from outside the build process (e.g. via an environment variable or explicit argument).

Is that doable within the Mono toolchain? Solving that concrete problem would help us narrow down the remaining causes of non-determinism in the build process.
Comment 4 Jo Shields 2015-02-16 05:59:08 UTC
@Daniel

I don't believe our existing toolchain supports this.

The line that embeds the stamp is https://github.com/mono/ikvm-fork/blob/master/reflect/Writer/PEWriter.cs#L161, if you want to propose the change on the mono-devel mailing list.
Comment 5 Daniel Kahn Gillmor 2015-02-16 12:55:41 UTC
Created attachment 9859
first pass at getting a pre-defined timestamp into the PE module

Thanks for the pointer, Jo. I'm afraid I don't know enough about the syntax or the toolchain here to be confident in any specific contributions, but I've attached an (untested) patch that suggests a way to pass a preferred timestamp into ModuleWriter.WriteModule().

This doesn't expose the functionality all the way to the outside world yet (I don't know how you'd prefer to expose it to the process invoking the toolchain), but I wanted to see whether this is the sort of approach that would be acceptable.
Comment 6 Jo Shields 2015-02-16 14:14:52 UTC
A ternary operator could glue the lot together into a one-liner. Replace:

		public DWORD TimeDateStamp = (uint)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalSeconds;

with

		public DWORD TimeDateStamp = Environment.GetEnvironmentVariable("IKVM_WRITER_TIMESTAMP_EPOCH") != null ? uint.Parse(Environment.GetEnvironmentVariable("IKVM_WRITER_TIMESTAMP_EPOCH")) : (uint)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalSeconds;

However, at this point I'd DEFINITELY recommend taking it to the mailing list, and maybe CC Jeroen Frijters, since IKVM.Reflection is his. Note also that this doesn't solve the secondary issue of the assembly GUID.
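The same override logic can be written less densely. The sketch below reads the environment variable only once and falls back to the clock when the variable is absent or not a plain unsigned integer (the one-liner above would throw on malformed input via `uint.Parse`). It is an illustration only: `IKVM_WRITER_TIMESTAMP_EPOCH` is the name proposed in this comment, not a variable IKVM.Reflection is known to read, and `ResolveTimeDateStamp` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: choose the PE TimeDateStamp, preferring a
// caller-supplied value (seconds since the Unix epoch) over the clock.
static uint ResolveTimeDateStamp(string? overrideValue, DateTime utcNow)
{
    // Accept only plain decimal digits; anything else falls back to the
    // clock instead of throwing, as uint.Parse would.
    if (overrideValue != null &&
        uint.TryParse(overrideValue, NumberStyles.None, CultureInfo.InvariantCulture, out uint fixedStamp))
    {
        return fixedStamp;
    }

    // Default behaviour of the original initializer: current UTC time.
    var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    return (uint)(utcNow - epoch).TotalSeconds;
}

// Usage: read the (proposed) variable once, then resolve the stamp.
uint timeDateStamp = ResolveTimeDateStamp(
    Environment.GetEnvironmentVariable("IKVM_WRITER_TIMESTAMP_EPOCH"),
    DateTime.UtcNow);
Console.WriteLine(timeDateStamp);
```

Passing `utcNow` in as a parameter keeps the helper free of hidden clock reads, which makes the fallback path easy to test.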
Comment 7 Daniel Kahn Gillmor 2015-02-17 16:36:45 UTC
This appears to have been settled on the mailing list:

http://lists.ximian.com/pipermail/mono-devel-list/2015-February/042776.html

Alexander Köplinger says:

> FYI, roslyn made the switch to being deterministic by default in April
> last year:
> https://github.com/dotnet/roslyn/commit/04462c44e30dfa91267581abdb029f3102796486
> 
> Quoting from the commit:
> "(1) The timestamp in the header is replaced with 0 (which is
> specifically allowed by the spec)
>  (2) The module version ID Guid (Mvid) is computed by hashing the
> contents of the generated assembly (with zero where the Mvid will go
> for the purposes of computing the hash)
> 
> The name of the "private implementation details" class no longer
> includes the Mvid."
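Point (2) of the quoted commit message can be sketched as follows. This is only an illustration of the idea, not Roslyn's or IKVM.Reflection's code: the helper name `ComputeDeterministicMvid`, the `mvidOffset` parameter (where the 16-byte MVID sits in the image), and the choice of SHA-256 truncated to 16 bytes are all assumptions.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper illustrating point (2): derive the MVID from the
// image contents, with the 16-byte MVID slot zeroed while hashing.
// mvidOffset (where the MVID is stored in the image) is assumed to be known.
static Guid ComputeDeterministicMvid(byte[] image, int mvidOffset)
{
    // Hash a copy in which the MVID slot is all zeros, so the result does
    // not depend on whatever was previously stored there.
    byte[] copy = (byte[])image.Clone();
    Array.Clear(copy, mvidOffset, 16);

    byte[] digest;
    using (SHA256 sha = SHA256.Create())
    {
        digest = sha.ComputeHash(copy);
    }

    // Use the first 16 bytes of the digest as the GUID.
    byte[] mvidBytes = new byte[16];
    Array.Copy(digest, mvidBytes, 16);
    Guid mvid = new Guid(mvidBytes);

    // Store the result in the real image.
    Array.Copy(mvidBytes, 0, image, mvidOffset, 16);
    return mvid;
}

// Usage: the same input bytes always yield the same MVID.
byte[] demo = new byte[64];
Console.WriteLine(ComputeDeterministicMvid(demo, 16));
```

Because the hash covers the whole image, the header timestamp has to be fixed first (point 1); otherwise the MVID would still change on every build.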

And Jeroen Frijters says:

> By setting the UniverseOptions.DeterministicOutput flag, 
> IKVM.Reflection will now do the same as Roslyn. This is
> currently not compatible with PDB file generation (because
> the PDB file generates another random GUID in the debug
> directory), but for Mono that is not an issue.
>
> Marek, if mcs sets this flag it won't need to do anything
> else and given that Roslyn behaves the same, it is unlikely
> to cause problems (although for large files there is a
> small perf impact of hashing the output file).

This is excellent news!  Is there anything else that needs to be done to get these changes available all the way up the stack?
Comment 8 Marek Safar 2015-02-18 14:04:09 UTC
mcs part implemented in master
Comment 9 Alexander Köplinger 2015-02-18 14:36:49 UTC
To correct my comment above: the Roslyn devs have since made this opt-in (https://github.com/dotnet/roslyn/commit/798942020da183159cec4d7ab0187116ba2b5313). I asked on the Gitter chat for some insight:

> @akoeplinger: deterministic compiling is now opt-in right? did you run into problems or why is it not the default?
> @gafter: First, it isn't done. There are two IL tables that are not deterministic.
> @gafter: The PDBs are generated using an API that produces its own GUID for inclusion in the assembly, and we have no way to make that deterministic.
> @gafter: In order to make the timestamp deterministic, there are a lot of consumers that need to be prepared to handle the weird timestamps we would generate.
> @gafter: So it does not work (today).
> @gafter: I think we need to revise the PDB format before we can complete it.
> @gafter: I think someone is thinking about that.

AFAIK the PDB problem shouldn't affect mcs, right?
Comment 10 Daniel Kahn Gillmor 2015-07-18 18:16:26 UTC
On 2015-04-04, Jo Shields marked this bug as RESOLVED FIXED, but I don't see where the fix is, or how I can test it. Can you point me to the fix?
Comment 11 Jo Shields 2015-07-20 03:06:09 UTC
(In reply to comment #10)
> on 2015-04-04, Jo Shields marked this bug as RESOLVED FIXED, but i don't see
> where the fix is, or how i can test it.  can you point me to the fix?

Commit a803d17038c0fcc8b40b12744801a87ceddb15ba: just use a Mono that includes that commit for deterministic builds.
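One way to check this is to compile the same sources twice with such a Mono (for example, two `mcs` runs writing to different `-out:` paths) and compare the outputs byte for byte. A minimal C# sketch; `IsReproducible` is a hypothetical helper, and it only reports whether the two outputs match, not why they differ.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: true when two build outputs are byte-for-byte identical.
static bool IsReproducible(string firstBuild, string secondBuild) =>
    File.ReadAllBytes(firstBuild).SequenceEqual(File.ReadAllBytes(secondBuild));

// Usage: pass the two assemblies produced by separate compiler runs.
if (args.Length == 2)
{
    Console.WriteLine(IsReproducible(args[0], args[1]) ? "identical" : "different");
}
```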