Bug 28187

Summary: Calling System.Environment.Exit() leads to SIGABRT: Assertion at mini-exceptions.c:834, condition `domain' not met
Product: [Mono] Runtime
Reporter: delcypher
Component: General
Assignee: Bugzilla <bugzilla>
Status: RESOLVED FIXED
Severity: normal
CC: alexander.kyte, mono-bugs+mono, mono-bugs+runtime, pr0vieh, vargaz
Priority: ---
Version: 3.12.0
Target Milestone: ---
Hardware: PC
OS: Linux
Tags:
Is this bug a regression?: ---
Last known good build:
Attachments: A patch that works around the issue

Description delcypher 2015-03-19 06:51:50 UTC
Hi,

I'm using Mono 3.12.1 (built from [1]) on Arch Linux. I'm running into an issue where calling System.Environment.Exit() frequently causes the Mono runtime to hit an assertion and abort.

This seems to occur when I repeatedly run multiple copies of my application in parallel. I can't reproduce it easily when I run just a single instance.
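
For anyone trying to observe this, a parallel stress run can be sketched roughly as below. This is a minimal sketch under assumptions, not the script used for this report: `count_aborts` is a hypothetical helper name, and it assumes the crash surfaces as the process being terminated by SIGABRT.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start `copies` instances of the command in argv at
 * the same time, wait for all of them, and return how many were terminated
 * by SIGABRT. Returns -1 if the pid table cannot be allocated.
 */
int
count_aborts (char *const argv[], int copies)
{
    int started = 0, aborted = 0;
    pid_t *pids;

    if (copies <= 0)
        return 0;
    pids = malloc (sizeof (pid_t) * (size_t) copies);
    if (pids == NULL)
        return -1;

    for (int i = 0; i < copies; i++) {
        pid_t pid = fork ();
        if (pid < 0)
            break;                  /* could not start more; reap what we have */
        if (pid == 0) {
            execvp (argv[0], argv);
            _exit (127);            /* only reached if exec failed */
        }
        pids[started++] = pid;
    }

    for (int i = 0; i < started; i++) {
        int status = 0;
        if (waitpid (pids[i], &status, 0) == pids[i]
            && WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT)
            aborted++;
    }

    free (pids);
    return aborted;
}
```

Calling this in a loop with an argv such as `{ "mono", "App.exe", NULL }` (placeholder names) and stopping at the first non-zero result gives a rough idea of how often the abort occurs.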

Here's the full stacktrace.

```
* Assertion at mini-exceptions.c:834, condition `domain' not met

Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Environment.Exit (int) <0xffffffff>
  at Microsoft.Boogie.GPUVerifyBoogieDriver.Main (string[]) <0x0043b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

        /usr/lib/libmonosgen-2.0.so.1(+0xcf1da) [0x7fe3088de1da]
        /usr/lib/libpthread.so.0(+0x10210) [0x7fe308603210]
        /usr/lib/libc.so.6(gsignal+0x37) [0x7fe308283a97]
        /usr/lib/libc.so.6(abort+0x16a) [0x7fe308284e6a]
        /usr/lib/libmonosgen-2.0.so.1(+0x25a5e9) [0x7fe308a695e9]
        /usr/lib/libmonosgen-2.0.so.1(+0x25a85c) [0x7fe308a6985c]
        /usr/lib/libmonosgen-2.0.so.1(+0x25a9f3) [0x7fe308a699f3]
        /usr/lib/libmonosgen-2.0.so.1(+0xce052) [0x7fe3088dd052]
        /usr/lib/libmonosgen-2.0.so.1(+0xce449) [0x7fe3088dd449]
        /usr/lib/libmonosgen-2.0.so.1(+0x1b45a0) [0x7fe3089c35a0]
        /usr/lib/libmonosgen-2.0.so.1(+0x1b8f6e) [0x7fe3089c7f6e]
        /usr/lib/libmonosgen-2.0.so.1(+0x1b9555) [0x7fe3089c8555]
        /usr/lib/libmonosgen-2.0.so.1(+0x164956) [0x7fe308973956]
        [0x40f1fc02]

Debug info from gdb:

Mono support loaded.
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[New LWP 18293]
[New LWP 18292]
[New LWP 18264]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007fe308602dfb in waitpid () from /usr/lib/libpthread.so.0
  Id   Target Id         Frame
  4    Thread 0x7fe30611b700 (LWP 18264) "Finalizer" 0x00007fe308601920 in sem_wait () from /usr/lib/libpthread.so.0
  3    Thread 0x7fe3059bd700 (LWP 18292) "mono" 0x00007fe308338853 in epoll_wait () from /usr/lib/libc.so.6
  2    Thread 0x7fe30597c700 (LWP 18293) "IO Threadpool w" 0x00007fe308601920 in sem_wait () from /usr/lib/libpthread.so.0
* 1    Thread 0x7fe308fc3780 (LWP 18262) "mono" 0x00007fe308602dfb in waitpid () from /usr/lib/libpthread.so.0

Thread 4 (Thread 0x7fe30611b700 (LWP 18264)):
#0  0x00007fe308601920 in sem_wait () from /usr/lib/libpthread.so.0
#1  0x00007fe308a5d766 in mono_sem_wait (sem=sem@entry=0x7fe308db0780 <finalizer_sem>, alertable=alertable@entry=1) at mono-semaphore.c:101
#2  0x00007fe3089e3ab9 in finalizer_thread (unused=<optimized out>) at gc.c:1077
#3  0x00007fe3089c6d07 in start_wrapper_internal (data=<optimized out>) at threads.c:663
#4  start_wrapper (data=<optimized out>) at threads.c:710
#5  0x00007fe308a63195 in inner_start_thread (arg=0x7fffe53d3170) at mono-threads-posix.c:88
#6  0x00007fe3085fa314 in start_thread () from /usr/lib/libpthread.so.0
#7  0x00007fe30833824d in clone () from /usr/lib/libc.so.6

Thread 3 (Thread 0x7fe3059bd700 (LWP 18292)):
#0  0x00007fe308338853 in epoll_wait () from /usr/lib/libc.so.6
#1  0x00007fe3089ca8c9 in tp_epoll_wait (p=0x7fe308db04c0 <socket_io_data>) at ../../mono/metadata/tpool-epoll.c:118
#2  0x00007fe3089c6d07 in start_wrapper_internal (data=<optimized out>) at threads.c:663
#3  start_wrapper (data=<optimized out>) at threads.c:710
#4  0x00007fe308a63195 in inner_start_thread (arg=0x7fffe53d1e90) at mono-threads-posix.c:88
#5  0x00007fe3085fa314 in start_thread () from /usr/lib/libpthread.so.0
#6  0x00007fe30833824d in clone () from /usr/lib/libc.so.6

Thread 2 (Thread 0x7fe30597c700 (LWP 18293)):
#0  0x00007fe308601920 in sem_wait () from /usr/lib/libpthread.so.0
#1  0x00007fe308a5d766 in mono_sem_wait (sem=sem@entry=0x7fe2fc000950, alertable=alertable@entry=0) at mono-semaphore.c:101
#2  0x00007fe308a6338a in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x7fe30597b7c0) at mono-threads-posix.c:312
#3  <signal handler called>
#4  0x00007fe308334107 in munmap () from /usr/lib/libc.so.6
#5  0x00007fe308a593c5 in mono_vfree (addr=<optimized out>, length=<optimized out>) at mono-mmap.c:342
#6  0x00007fe3088dde82 in mono_free_altstack (tls=tls@entry=0x7fe2fc002290) at mini-exceptions.c:2006
#7  0x00007fe30884de81 in free_jit_tls_data (jit_tls=0x7fe2fc002290) at mini.c:2883
#8  0x00007fe30884defa in mini_thread_cleanup (thread=0x7fe3078c4430) at mini.c:2947
#9  0x00007fe3089c5e79 in thread_cleanup (thread=0x7fe3078c4430) at threads.c:461
#10 0x00007fe3089c6d0f in start_wrapper_internal (data=<optimized out>) at threads.c:686
#11 start_wrapper (data=<optimized out>) at threads.c:710
#12 0x00007fe308a63195 in inner_start_thread (arg=0x7fffe53d1e70) at mono-threads-posix.c:88
#13 0x00007fe3085fa314 in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007fe30833824d in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7fe308fc3780 (LWP 18262)):
#0  0x00007fe308602dfb in waitpid () from /usr/lib/libpthread.so.0
#1  0x00007fe3088de270 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>) at mini-exceptions.c:2323
#2  <signal handler called>
#3  0x00007fe308283a97 in raise () from /usr/lib/libc.so.6
#4  0x00007fe308284e6a in abort () from /usr/lib/libc.so.6
#5  0x00007fe308a695e9 in monoeg_log_default_handler (log_domain=<optimized out>, log_level=G_LOG_LEVEL_ERROR, message=<optimized out>, unused_data=<optimized out>) at goutput.c:232
#6  0x00007fe308a6985c in monoeg_g_logv (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_ERROR, format=format@entry=0x7fe308a72d30 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x7fffe53d24d0) at goutput.c:113
#7  0x00007fe308a699f3 in monoeg_assertion_message (format=format@entry=0x7fe308a72d30 "* Assertion at %s:%d, condition `%s' not met\n") at goutput.c:133
#8  0x00007fe3088dd052 in mono_walk_stack_full (func=0x7fe3089c30c0 <last_managed>, start_ctx=0x7fe2fc000998, domain=0x0, jit_tls=0x7fe2fc002290, lmf=0x0, unwind_options=MONO_UNWIND_NONE, user_data=0x7fffe53d29a0) at mini-exceptions.c:834
#9  0x00007fe3088dd449 in mono_walk_stack_with_state (func=0x7fe3089c30c0 <last_managed>, state=0x4756, unwind_options=150747008, user_data=0x7fffe53d29a0) at mini-exceptions.c:789
#10 0x00007fe3089c35a0 in mono_thread_info_get_last_managed (info=info@entry=0x7fe2fc0008e0) at threads.c:4539
#11 0x00007fe3089c7f6e in suspend_thread_internal (thread=thread@entry=0x7fe3078c4430, interrupt=interrupt@entry=1) at threads.c:4658
#12 0x00007fe3089c8555 in mono_thread_suspend_all_other_threads () at threads.c:3116
#13 0x00007fe308973956 in ves_icall_System_Environment_Exit (result=0) at icall.c:6534
#14 0x0000000040f1fc02 in ?? ()
#15 0x00007fe307401000 in ?? ()
#16 0x00007fe307427788 in ?? ()
#17 0x00007fe307401000 in ?? ()
#18 0x00007fe307427818 in ?? ()
#19 0x000000004026bda0 in ?? ()
#20 0x0000000001f4c560 in ?? ()
#21 0x000000004026c2b8 in ?? ()
#22 0x00007fffe53d3020 in ?? ()
#23 0x00007fffe53d2eb0 in ?? ()
#24 0x000000004026c1dc in ?? ()
#25 0x0000000100000000 in ?? ()
#26 0x0000000000000000 in ?? ()

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
```

I tried looking at the code at ``mini-exceptions.c:834``, but I'm confused.

We have...

```
/**
 * mono_walk_stack_full:
 * @func: callback to call for each stack frame
 * @domain: starting appdomain, can be NULL to use the current domain
 * @unwind_options: what extra information the unwinder should gather
 * @start_ctx: starting state of the stack walk, can be NULL.
 * @thread: the thread whose stack to walk, can be NULL to use the current thread
 * @lmf: the LMF of @thread, can be NULL to use the LMF of the current thread
 * @user_data: data passed to the callback
 *
 * This function walks the stack of a thread, starting from the state
 * represented by start_ctx. For each frame the callback
 * function is called with the relevant info. The walk ends when no more
 * managed stack frames are found or when the callback returns a TRUE value.
 */
static void
mono_walk_stack_full (MonoJitStackWalk func, MonoContext *start_ctx, MonoDomain *domain, MonoJitTlsData *jit_tls, MonoLMF *lmf, MonoUnwindOptions unwind_options, gpointer user_data)
{
    gint il_offset, i;
    MonoContext ctx, new_ctx;
    StackFrameInfo frame;
    gboolean res;
    mgreg_t *reg_locations [MONO_MAX_IREGS];
    mgreg_t *new_reg_locations [MONO_MAX_IREGS];
    gboolean get_reg_locations = unwind_options & MONO_UNWIND_REG_LOCATIONS;
    gboolean async = mono_thread_info_is_async_context ();

    g_assert (start_ctx);
    g_assert (domain);
```

The comment for the function says that domain can be NULL, but the code has ``g_assert (domain)``, which asserts
that the domain is not NULL.
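
If the documented behaviour were implemented literally, the start of the function would fall back to the current domain instead of asserting. The following is only a sketch of that idea, using hypothetical stand-in types (`Domain`, `Context`) and a hypothetical `current_domain ()` helper in place of the runtime's `mono_domain_get ()`. It is not the attached patch and not necessarily the right fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's types; not Mono's definitions. */
typedef struct { int id; } Domain;
typedef struct { int unused; } Context;

static Domain root_domain = { 1 };

/* Stand-in for mono_domain_get (): the domain of the calling thread. */
Domain *
current_domain (void)
{
    return &root_domain;
}

/*
 * Resolve the domain the way the doc comment describes: a NULL argument
 * means "use the current domain" rather than tripping an assertion.
 */
Domain *
resolve_walk_domain (const Context *start_ctx, Domain *domain)
{
    assert (start_ctx != NULL);     /* still required, as in the original */
    if (domain == NULL)
        domain = current_domain ();
    return domain;
}
```

Note that a fallback like this only makes the code match its comment. It does not explain why a NULL domain reaches the function in the first place.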

[1] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/mono
Comment 1 delcypher 2015-03-19 08:13:00 UTC
Created attachment 10409
A patch that works around the issue

This patch fixes the issue for me.
Comment 2 pr0vieh 2015-03-21 15:04:31 UTC
Confirmed!

I have the same issue.

Can you create a PR with your fix?
Comment 3 delcypher 2015-03-21 15:44:23 UTC
I could, but I was hoping someone from the Mono team could take a look at my patch. I'm not convinced it's the right solution.
Comment 4 delcypher 2015-03-21 20:24:06 UTC
Pull request submitted:

https://github.com/mono/mono/pull/1649
Comment 5 Zoltan Varga 2015-03-22 02:15:09 UTC
The patch looks harmless, but this situation shouldn't happen; 'domain' should be equal to mono_domain_get ().
Comment 6 Zoltan Varga 2015-03-22 02:25:41 UTC
Could you try adding this line to mono_thread_info_get_last_managed () in metadata/threads.c, just before the call to mono_walk_stack_with_state ():

	g_assert (info->suspend_state.valid);

and see if the assert fails in this case ?
Comment 7 Zoltan Varga 2015-03-22 02:27:44 UTC
Nm, mono_walk_stack_with_state () already contains that assert.
Comment 8 delcypher 2015-03-22 07:49:16 UTC
I guess discussion of this should continue on the pull request.
Comment 9 Alexander Kyte 2015-04-20 17:39:51 UTC
I've yet to be able to reproduce this myself. Are there any steps beyond simply calling Environment.Exit on Linux? I put an app that does that in a while-true loop in a bash script and ran 4 of them in parallel, and didn't see anything.

- Alex
Comment 10 delcypher 2015-04-22 14:43:35 UTC
We have our application in a Docker container where it is possible to reproduce the issue. You could try running that...

Unfortunately, our application is rather complicated, so even if you reproduce the issue, tracking it down may be difficult. This might at least let you observe it, though.

$ docker pull delcypher/gpuverify-docker
$ docker run --rm -ti --entrypoint=/bin/bash delcypher/gpuverify-docker

Now, inside the container:

$ cd gpuverify
$ ./gvtester.py -l ERROR testsuite/


This Python script runs our application in parallel on various test cases. The ``-l ERROR`` command line option suppresses most output.

Almost every time I run the python script I will see a crash like this...

```
Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Environment.Exit (int) <0xffffffff>
  at Microsoft.Boogie.GPUVerifyBoogieDriver.Main (string[]) <0x0043b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

        mono() [0x4accac]
        /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f7c88950340]
        /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f7c885aecc9]
        /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f7c885b20d8]
        mono() [0x6232f9]
        mono() [0x623507]
        mono() [0x623656]
        mono() [0x4abac2]
        mono() [0x4abb1c]
        mono() [0x57e6b1]
        mono() [0x5827ab]
        mono() [0x582c5d]
        mono() [0x531d26]
        [0x4127dd22]

Debug info from gdb:


=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

Traceback (most recent call last):
  File "/home/gv/gpuverify/GPUVerify.py", line 878, in <module>
    rc = main(args, sys.stdout, sys.stderr)
  File "/home/gv/gpuverify/GPUVerify.py", line 861, in main
    returnCode = gv_instance.invoke()
  File "/home/gv/gpuverify/GPUVerify.py", line 602, in invoke
    return self.interpretBoogieDriverCrucherExitCode(success)
  File "/home/gv/gpuverify/GPUVerify.py", line 537, in interpretBoogieDriverCrucherExitCode
    assert False
AssertionError

GPUVerify kernel analyser finished with 1 verified, 0 errors
* Assertion at mini-exceptions.c:834, condition `domain' not met

```

Strangely enough, I've noticed:

* On Linux 3.19.3 (Arch Linux) with Docker 1.6.0 I can't reproduce the crashes
* On Linux 3.13.0-45-generic (Ubuntu 14.04) with Docker 1.6.0 I can reproduce the crashes

Let me know if you'd like any help using our Docker container.
Comment 11 Zoltan Varga 2015-05-26 11:17:28 UTC
Hopefully fixed by this patch:
https://github.com/mono/mono/commit/58064152f2f1dff813390e97ce9e8306079a4b54
Comment 12 delcypher 2015-05-26 13:12:15 UTC
@Zoltan Varga

Thanks. I'll try applying this on top of 3.12.1 and see if it fixes the issues.

Do you plan to incorporate my fix to mono_walk_stack_full() as well [1]? Even if you don't, the comments for that function really need fixing because they don't match the implementation.


[1] https://github.com/delcypher/mono/commit/3348cd21b3a58f8a9111dc7f1f53afa3b209258a
Comment 13 Zoltan Varga 2015-05-30 08:55:58 UTC
Did the fix work ?
Comment 14 delcypher 2015-05-31 12:32:33 UTC
@Zoltan

Sorry for the delay. It's been quite a while since I've looked at this, so I needed to recreate the old environment I had.

Applying your patch to Mono 3.12.1 appears to fix the issues I was seeing with ``System.Environment.Exit()``.

I have not tried applying it to master or to Mono 4.0.1, although I can confirm the issue is present in Mono 4.0.1.
Comment 15 delcypher 2015-06-04 02:22:10 UTC
I can also confirm that the patch fixes the issue in mono 4.0.1
Comment 16 Zoltan Varga 2015-06-04 09:39:05 UTC
-> FIXED.