Bug 56322

Summary: Running nunit tests with domain isolation crashes Mono
Product: [Mono] Runtime Reporter: Aleksey Kliger <aleksey>
Component: Reflection Assignee: Aleksey Kliger <aleksey>
Status: VERIFIED FIXED    
Severity: major CC: joncham, luis.aguilera, masinha, mono-bugs+runtime
Priority: ---    
Version: 5.0   
Target Milestone: 15.2.2   
Hardware: PC   
OS: Mac OS   
Tags: Is this bug a regression?: Yes
Last known good build: mono-4.8.0-branch
Attachments: leaked coop handles

Description Aleksey Kliger 2017-05-12 23:45:48 UTC
Created attachment 22153
leaked coop handles

Running NUnit tests with domain isolation leads to a crash in the GC.

[19:15:04] 1>* Assertion: should not be reached at ./sgen-scan-object.h:90
[19:15:04] 1>
[19:15:04] 1>Stacktrace:
[19:15:04] 1>
[19:15:04] 1>
[19:15:04] 1>Native stacktrace:
[19:15:04] 1>
[19:15:04] 1>    0 mono 0x000000010bdd6561 mono_handle_native_crash + 277
[19:15:04] 1>    1 libsystem_platform.dylib 0x00007fff8968ff1a _sigtramp + 26
[19:15:04] 1>    2 ??? 0x0000000000000004 0x0 + 4
[19:15:04] 1>    3 libsystem_c.dylib 0x00007fff8b1729b3 abort + 129
[19:15:04] 1>    4 mono 0x000000010bf6a530 mono_log_write_logfile + 360
[19:15:04] 1>    5 mono 0x000000010bf7e5d8 monoeg_g_logv + 83
[19:15:04] 1>    6 mono 0x000000010bf7e77d monoeg_assertion_message + 143
[19:15:04] 1>    7 mono 0x000000010bf46ad0 drain_gray_stack + 8056
[19:15:04] 1>    8 mono 0x000000010bf3bdc3 finish_gray_stack + 117
[19:15:04] 1>    9 mono 0x000000010bf3c519 major_finish_collection + 125
[19:15:04] 1>    10 mono 0x000000010bf390c6 major_do_collection + 154
[19:15:04] 1>    11 mono 0x000000010bf38656 sgen_perform_collection + 687
[19:15:04] 1>    12 mono 0x000000010bf39a4d sgen_gc_collect + 50
[19:15:04] 1>    13 mono 0x000000010bef5162 unload_thread_main + 813
[19:15:04] 1>    14 mono 0x000000010bf756d6 inner_start_thread + 128
[19:15:04] 1>    15 libsystem_pthread.dylib 0x00007fff87a9b05a _pthread_body + 131
[19:15:04] 1>    16 libsystem_pthread.dylib 0x00007fff87a9afd7 _pthread_body + 0
[19:15:04] 1>    17 libsystem_pthread.dylib 0x00007fff87a983ed thread_start + 13

See the attachment for the state of the main thread's coop handle stack.  This is a coop handle leak: the main thread is blocked on a wait at the time of the crash, and on investigation the objects in its coop handle stack turned out to belong to an unloaded domain.
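
As a rough illustration of the failure mode described above, here is a toy model of a per-thread handle stack. It is not Mono's implementation: every name in it (ToyObject, ToyHandleStack, mark, pop_to, stale) is hypothetical and chosen only for this sketch. It assumes a handle stays on the stack until the code that created it restores an earlier mark, so a code path that never restores its mark leaves a handle pointing at an object from a domain that may later be unloaded.

```python
# Toy model only: none of these names are Mono APIs.
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    name: str
    domain: str  # the domain that owns this object


@dataclass
class ToyHandleStack:
    slots: list = field(default_factory=list)

    def mark(self) -> int:
        # Remember the current depth so the caller can restore it later.
        return len(self.slots)

    def push(self, obj: ToyObject) -> None:
        # A handle keeps its object reachable from this thread's stack.
        self.slots.append(obj)

    def pop_to(self, mark: int) -> None:
        # Release every handle created since the matching mark().
        del self.slots[mark:]

    def stale(self, unloaded_domain: str) -> list:
        # Handles that still point into a domain that has been unloaded.
        return [o for o in self.slots if o.domain == unloaded_domain]


def well_behaved(stack: ToyHandleStack, domain: str) -> None:
    mark = stack.mark()
    stack.push(ToyObject("tmp", domain))
    stack.pop_to(mark)  # handle released before returning


def leaky(stack: ToyHandleStack, domain: str) -> None:
    stack.push(ToyObject("leaked", domain))  # mark never restored


stack = ToyHandleStack()
well_behaved(stack, "test-domain")
leaky(stack, "test-domain")

# Pretend "test-domain" has now been unloaded and inspect what is left.
leaked = stack.stale("test-domain")
print([o.name for o in leaked])  # prints ['leaked']
```

In this model the leaked slot is exactly the kind of entry shown in the attachment: a handle that outlives the domain owning its object. How the real collector treats such a slot is not modelled here.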
Comment 1 Aleksey Kliger 2017-05-12 23:46:56 UTC
We have a PR against master that fixes the issue: https://github.com/mono/mono/pull/4852

This needs to be cherry-picked to 2017-04 and 2017-02.
Comment 2 Aleksey Kliger 2017-05-13 01:25:37 UTC
One way to reproduce the problem in a mono checkout:

1. Compile mono.  Ensure that the "corlib" and "System" tests are compiled, too:
   make -C mcs/class/corlib check
   make -C mcs/class/System check
2. Run the following:
  MONO_PATH="./mcs/class/lib/net_4_x:$MONO_PATH" runtime/mono-wrapper --debug ./mcs/class/lib/net_4_x/nunit-console.exe  -domain=Multiple mcs/class/System/net_4_x_System_test.dll mcs/class/corlib/net_4_x_corlib_test.dll -exclude=NotOnMac,MacNotWorking,NotWorking,ValueAdd,CAS,InetAccess -nothread

(You need to run more than one test assembly via nunit-console.exe, and both should be fairly large, so that a fair number of handles are leaked.)

Expected output: some number of passed or failed tests.
Actual output: Mono hits the assertion in sgen-scan-object.h and crashes.
Comment 3 Aleksey Kliger 2017-05-15 19:05:02 UTC
Fixed on mono master with commits https://github.com/mono/mono/commit/d321424cabda97947e66b02242f66881e27ab744 (and 7eafb61cf17393f67f847d7ad79182df0f6f6e61)

Fixed on mono 2017-04 with https://github.com/mono/mono/commit/82f5fb6d0bbc806724e51052766cbed0f75e7ad3 (and af2b7b62d0bfac790ae76efa13ce79b24e6a6052)

Fixed on mono 2017-02 with https://github.com/mono/mono/commit/25ac18a9b7176b6c5995113dbcc8afd880bfb633 (and d95d7d30fe89bd373af74cf08d7d2fff197b36c4)