Bug 39042 - appdomain-unload.exe sometimes hangs in CI
Summary: appdomain-unload.exe sometimes hangs in CI
Alias: None
Product: Runtime
Classification: Mono
Component: General
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Vlad Brezae
Depends on:
Reported: 2016-02-23 14:20 UTC by Alexander Köplinger [MSFT]
Modified: 2016-03-11 12:15 UTC
CC List: 2 users

Is this bug a regression?: ---
Last known good build:

Repro app and gdb output (2.43 KB, application/x-zip-compressed)
2016-02-23 14:20 UTC, Alexander Köplinger [MSFT]

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Description Alexander Köplinger [MSFT] 2016-02-23 14:20:03 UTC
Created attachment 15149
Repro app and gdb output

This is the last of the runtime tests that are sometimes flaky on Jenkins. It'd be great to have this finally fixed.

I reduced the test case to the attached sample.

Run it in a loop; after a while it hangs. I've attached the gdb output.
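
A minimal Python sketch of such a loop, for anyone reproducing this locally. It assumes the repro is launched as `mono appdomain-unload.exe` and treats any run still alive after a fixed timeout as a hang; both the command line and the 60-second default are assumptions, not taken from the attachment. The hung process is deliberately left running so gdb can be attached to it.

```python
import subprocess
import sys


def run_until_hang(cmd, timeout_s=60.0, max_runs=1000):
    """Launch cmd repeatedly. Return the still-running Popen of the first run
    that does not exit within timeout_s, or None if every run finishes."""
    for i in range(1, max_runs + 1):
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Do not kill it: this is the hang we want to inspect with gdb.
            print(f"run {i} still running after {timeout_s}s, pid {proc.pid}",
                  file=sys.stderr)
            return proc
    return None
```

Usage, under the same assumptions: `proc = run_until_hang(["mono", "appdomain-unload.exe"])`. If it returns a process, attach with `gdb -p <pid>` using the pid it prints, and kill the process once the backtraces are collected.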

By running with exception tracing enabled, I found that when the hang happens, no ThreadAbortException is raised in the Timer scheduler thread, so it keeps spinning in https://github.com/mono/mono/blob/9f6238791f8b9e26bf952add917a93045cf07817/mcs/class/corlib/System.Threading/Timer.cs#L334

Comment from Ludovic from a while back:
> I wouldn't be surprised if it's a race in Thread.Abort,
> like the one @bernhard.urban fixed in Thread.Suspend last week.
> The call to `abort_thread_internal` outside of a `LOCK_THREAD(thread)` / `UNLOCK_THREAD(thread)` pair
> seems suspicious to me (that's just a guess, though).
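
To make the "outside the lock" concern concrete, here is a generic Python analogy. It is hypothetical and does not model Mono's actual abort mechanism; `AbortableWorker` and its methods are illustrative names only. It shows the pattern the lock pair protects: the abort request and the wake-up happen under the same lock the target thread holds while checking the request, so the request cannot slip in between the check and the block.

```python
import threading


class AbortableWorker:
    """Hypothetical analogy, not Mono code: a worker thread that blocks
    waiting for work and must notice an abort request. The flag is only
    read or written while holding the condition's lock, so a request cannot
    land between the worker's check and its wait (a lost-wakeup race)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._abort_requested = False
        # daemon=True so a stuck worker never keeps the interpreter alive.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        with self._cond:
            # Re-check in a loop: wait() may return without a real request.
            while not self._abort_requested:
                # wait() releases the lock while blocked and reacquires it
                # before returning, so check-then-wait is atomic w.r.t. abort().
                self._cond.wait()

    def abort(self, join_timeout=5.0):
        """Request an abort; return True if the worker exited in time."""
        # Set the flag and wake the worker under the lock it checks with.
        with self._cond:
            self._abort_requested = True
            self._cond.notify_all()
        self._thread.join(join_timeout)
        return not self._thread.is_alive()
```

If the flag were set outside the lock and the wake-up issued separately, the wake-up could land after the worker read `False` but before it blocked, leaving it waiting forever. That is the general shape of race Ludovic suspects; whether Mono's abort path actually has it is what needs to be verified.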

Environment: Ubuntu 14.04 x64 (didn't try on other configs)

Vlad: since you said yesterday you're happy to take runtime bugs, I thought I'd give you a present and assign this to you :)

Comment 1 Alexander Köplinger [MSFT] 2016-02-29 15:25:08 UTC
It does seem to happen a lot more frequently with LLVM Mono: if you look at the runtime-llvm step in https://wrench.internalx.com/Wrench/ViewTable.aspx?lane_id=2457&host_id=148, almost all of the recent failures are due to appdomain-unload.exe timing out.