This happens randomly during the build:
* Assertion at ..\mono\utils\mono-threads.c:707, condition `info' not met
make: *** [../../build/library.make:315: ../../class/lib/net_4_x-win32/PEAPI.dll] Error 127
We don't have a reliable reproduction case, but it's recurrent on CI.
I have reproduced this issue in the debugger. It is a race between the main thread shutting down in mono_thread_manage and threads running mono_thread_detach_internal during shutdown. The problem is the removal from the threads list, since mono_thread_manage picks the threads to wait upon from that list. Once a thread removes itself from the list in mono_thread_detach_internal, it races against the runtime shutdown, which terminates everything including the GC and so invalidates the MonoInternalThread pointer used in mono_thread_detach_internal. That can trigger the assert seen above, or other undefined behavior, depending on where the threads resume execution after the main thread has shut down the runtime but not yet terminated the process.
Steps to reproduce (in debugger):
* Run one of the failing compilation tests from above.
* Set a breakpoint before the runtime returns from the C main method.
* Set a breakpoint at the end of mono_thread_detach_internal on the call to mono_thread_info_unset_internal_thread_gchandle.
* Set a breakpoint at threads.c mono_thread_manage:3334, on the call to mono_threads_lock.
* Run the test under the debugger. When the main thread hits the breakpoint, freeze it and continue execution.
* When a worker thread hits the breakpoint in mono_thread_detach_internal, freeze it. Repeat for each worker thread until no more are active in the process.
* Switch back and resume main thread.
* NOTE, when finalizer thread hits breakpoint in mono_thread_detach_internal, just continue execution of the finalizer thread.
* When main thread hits breakpoint just before returning from main method, observe that all worker threads are still around.
* Freeze main thread (to prevent process from terminating).
* Pick one worker thread and resume execution.
* Worker thread fails with above assertion.
The fix is to make sure that threads accessing info from MonoInternalThread in mono_thread_detach_internal after they have removed themselves from the threads list are still waited upon before the runtime terminates the GC. For externally attached threads there is no way to know that, so it must be the integrator's responsibility to make sure they have completed before initiating runtime shutdown. For internal threads, like the thread pool threads causing the problems in this case, we could add them to the joinable thread list once they are removed from the threads list (but while still holding the threads list lock). That will make sure the runtime won’t finalize the shutdown until all joinable threads have completed. This feature is currently not implemented on Windows, but it probably should be, and it could then be used to solve the problem using a mechanism already in use by the runtime.
If (for some reason) we can’t use the joinable wait list, we need to add a separate list for threads shutting down and make sure we wait for them to terminate during runtime shutdown.
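To make the joinable list idea concrete, here is a minimal sketch using plain pthreads. It is not Mono's code: `threads_lock`, `threads_in_list`, `joinable`, `runtime_alive`, `detach_self` and `run_shutdown_demo` are all hypothetical names invented for illustration, and the real runtime bookkeeping is more involved. The point it shows is that a thread leaves the threads list and enters the joinable list inside the same critical section, so the shutdown path can never observe it in neither list.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical stand-ins for runtime bookkeeping; none of these are Mono names. */
static pthread_mutex_t threads_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t threads_changed = PTHREAD_COND_INITIALIZER;
static int threads_in_list;              /* size of the "threads list" */
static pthread_t joinable[MAX_THREADS];  /* the "joinable thread list" */
static size_t joinable_count;
static atomic_bool runtime_alive;        /* stands in for GC/runtime state */
static atomic_int live_accesses;         /* late accesses that saw a live runtime */

static void detach_self(void)
{
    /* Leave the threads list and join the joinable list under one lock hold. */
    pthread_mutex_lock(&threads_lock);
    threads_in_list--;
    assert(joinable_count < MAX_THREADS);
    joinable[joinable_count++] = pthread_self();
    pthread_cond_signal(&threads_changed);
    pthread_mutex_unlock(&threads_lock);

    /* Late detach work that still touches runtime state. */
    sched_yield();
    if (atomic_load(&runtime_alive))
        atomic_fetch_add(&live_accesses, 1);
}

static void *worker(void *arg)
{
    (void)arg;
    detach_self();
    return NULL;
}

/* Starts n workers, shuts down, and returns how many late accesses were safe. */
int run_shutdown_demo(int n)
{
    pthread_t to_join[MAX_THREADS];
    size_t count;

    assert(n >= 0 && n <= MAX_THREADS);
    pthread_mutex_lock(&threads_lock);
    threads_in_list = n;                 /* register workers before they start */
    joinable_count = 0;
    pthread_mutex_unlock(&threads_lock);
    atomic_store(&runtime_alive, true);
    atomic_store(&live_accesses, 0);

    for (int i = 0; i < n; i++) {
        pthread_t tid;
        int rc = pthread_create(&tid, NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }

    /* Shutdown step 1: wait until the threads list is empty. */
    pthread_mutex_lock(&threads_lock);
    while (threads_in_list > 0)
        pthread_cond_wait(&threads_changed, &threads_lock);
    count = joinable_count;
    for (size_t i = 0; i < count; i++)
        to_join[i] = joinable[i];
    joinable_count = 0;
    pthread_mutex_unlock(&threads_lock);

    /* Shutdown step 2: join every thread that is still finishing its detach. */
    for (size_t i = 0; i < count; i++)
        pthread_join(to_join[i], NULL);

    /* Only now is it safe to tear down shared runtime state. */
    atomic_store(&runtime_alive, false);
    return atomic_load(&live_accesses);
}
```

Built with `-pthread`, `run_shutdown_demo(4)` returns 4, because the teardown cannot start until every worker has been joined. Dropping step 2 reintroduces the race described above: a worker could then run its late detach work after `runtime_alive` has been cleared.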
I will make a prototype of the joinable thread solution and see how far that takes us toward fixing this shutdown race condition.
Proposed fix in https://github.com/mono/mono/pull/5599.
Fixed by https://github.com/mono/mono/commit/ed1884bb9b43ddd69daba52302494ad2d02bbbd9.