Bug 30553 - Timing/race conditions in single core tests
Summary: Timing/race conditions in single core tests
Status: NEW
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: 4.0.0
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2015-05-28 14:54 UTC by Neale Ferguson
Modified: 2015-05-28 14:54 UTC
2 users

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Backtrace of execution of sleep.exe showing where thread 1 is attempting to compile the call to stopwatch (7.68 KB, application/octet-stream)
2015-05-28 14:54 UTC, Neale Ferguson



Description Neale Ferguson 2015-05-28 14:54:05 UTC
Created attachment 11376
Backtrace of execution of sleep.exe showing where thread 1 is attempting to compile the call to stopwatch

I have been experiencing some failures with the tests in mono/tests,
particularly in a single-core configuration.

Firstly, the sleep test: when the delegated thread is started, the main
thread goes on to call the Stopwatch Start method, which requires JITting.
This involves GC interaction as objects are allocated. However, the
delegated thread wakes up and starts issuing GC.Collect() calls, which
end up occurring every 50 microseconds. This means the main thread never
gets a chance to get out of the allocation phase, so it never gets to
execute the Stopwatch start, thread sleep etc., so the thread never ends.
In a multi-core configuration this is not a problem and the test passes.
I found that inserting a Thread.Yield() as the first call in the
delegated thread eliminates the problem [1]. Perhaps additional regions
need to be defined as "critical" so that no attempt is made to suspend
the thread while it is inside them.

Secondly, the xxxxx-exit (e.g. thread-exit) tests will occasionally fail
with an abort due to "suspend_thread suspend took xxx ms, which is more
than the allowed 200 ms", where xxx exceeds 200. This seems to be due to
the exiting thread sometimes not getting to the stage of setting
thread->state to ThreadState_Stopped in the
ves_icall_System_Environment_Exit() processing within the 200 ms time
period. Again, with multiple cores this is not a problem (or the problem
is much rarer). I found that inserting a mono_thread_info_yield() prior
to the suspend_thread_internal() call in
mono_thread_suspend_all_other_threads() fixes the problem [2]. I am not
sure this is the best option, and it is still theoretically possible for
the problem to occur depending on how heavily the system is loaded. I was
wondering whether the setting of the state to ThreadState_Stopped could
be moved earlier in the process rather than being done in
thread_cleanup(), or whether there is another alternative.
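
For illustration only, the second circumvention can be modelled outside
Mono as a bounded wait that gives up the CPU before checking whether the
exiting thread has reached its stopped state. The following is a
standalone sketch using POSIX threads, not Mono code: worker_state,
exiting_worker, wait_for_stopped and run_demo are hypothetical names, and
only the 200 ms limit is taken from the abort message above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the runtime's thread state (not Mono's type). */
enum { STATE_RUNNING = 0, STATE_STOPPED = 1 };

static atomic_int worker_state = STATE_RUNNING;

/* Hypothetical exiting thread: marks itself stopped as its last step. */
static void *exiting_worker(void *arg)
{
    (void)arg;
    atomic_store(&worker_state, STATE_STOPPED);
    return NULL;
}

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Returns 1 if the worker reached STATE_STOPPED within limit_ms, else 0. */
static int wait_for_stopped(long limit_ms)
{
    struct timespec start;

    assert(limit_ms > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Yield first so the exiting thread can run even on a single core. */
    sched_yield();

    while (atomic_load(&worker_state) != STATE_STOPPED) {
        if (elapsed_ms(&start) > limit_ms)
            return 0;       /* the case that aborts in the report */
        sched_yield();      /* keep handing the CPU to the worker */
    }
    return 1;
}

/* Starts the worker, waits up to 200 ms, and returns 1 on success,
 * 0 on timeout, or -1 if the thread could not be created. */
int run_demo(void)
{
    pthread_t t;
    int ok;

    atomic_store(&worker_state, STATE_RUNNING);
    if (pthread_create(&t, NULL, exiting_worker, NULL) != 0)
        return -1;
    ok = wait_for_stopped(200);
    pthread_join(t, NULL);
    return ok;
}
```

run_demo() returns 1 when the worker was seen as stopped within the
limit. A yield only makes the timeout less likely; on a heavily loaded
system the wait can still expire, which is why moving the state change
earlier may be the more robust option.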

While occasional failures have been experienced with some of the more
pathological tests, the trouble is that these failures happen nearly 100%
of the time on a single-core virtual machine and less often on a 2-core
one. In a virtual machine environment where there may be hundreds of
virtual machines competing for the real cores, the probability of failure
increases. In addition, tests in the main test suite have also failed for
the reason described in the second case.

Neale

[1] Circumvention for case 1 -

--- a/mono/tests/sleep.cs
+++ b/mono/tests/sleep.cs
@@ -13,6 +13,7 @@ public class Tests
        public static int test_0_time_drift () {
                // Test the Thread.Sleep () is able to deal with time drifting due to interrupts
                Thread t = new Thread (delegate () {
+                               Thread.Yield();
                                while (!finished)
                                        GC.Collect ();
                        });

[2] Circumvention for case 2 -

--- a/mono/metadata/threads.c
+++ b/mono/metadata/threads.c

@@ -3132,6 +3147,8 @@ void mono_thread_suspend_all_other_threads (void)
                        UNLOCK_THREAD (thread);
+                       mono_thread_info_yield ();
+
                        /* Signal the thread to suspend */
                        suspend_thread_internal (thre