Bug 59875 - Data races in sgen-fin-weak-hash.c
Summary: Data races in sgen-fin-weak-hash.c
Status: NEW
Alias: None
Product: Runtime
Classification: Mono
Component: GC (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2017-09-30 01:08 UTC by Armin
Modified: 2017-10-17 20:22 UTC (History)
3 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Stack traces (47.48 KB, text/plain)
2017-09-30 01:08 UTC, Armin
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report for Bug 59875 on GitHub or Developer Community if you have new information to add and do not yet see a matching new report.

If the latest results still closely match this report, you can use the original description:

  • Export the original title and description: GitHub Markdown or Developer Community HTML
  • Copy the title and description into the new report. Adjust them to be up-to-date if needed.
  • Add your new information.

In special cases on GitHub you might also want the comments: GitHub Markdown with public comments

Related Links:
Status:
NEW

Description Armin 2017-09-30 01:08:40 UTC
Created attachment 25045 [details]
Stack traces

I can pinpoint some (possibly all) races that happen in 3741d16503a973a99c724bdab9a255a5f07a3637 in combination with `add_stage_entry ()` and `process_stage_entries ()`. I will discuss the races here, please find the corresponding reports and stack traces attached.

To get these results, I stopped the GC from collecting the entries (immediately returned from `sgen_finalize_if ()` without executing anything) and lowered the number of entries to 100. Due to that, the draining process was started by `sgen_object_register_for_finalization ()` in a very predictable manner; and this is the root of all the reported data races: while `process_stage_entries ()` is wrapped into `LOCK_GC ... UNLOCK_GC`, `add_stage_entry ()` does not know about that (by design).

One thing that can occur is that two threads pick the same `next_entry`. One pauses here (let's call it "T1"), the other one stores its data successfully. More stuff happens, then the entries get drained due to an overflow and "T1" finally continues. At this point, there is no synchronisation between "T1" (checking `entries [index].state != STAGE_ENTRY_FREE`) and the thread that drains the entries (putting `entries [i].state = STAGE_ENTRY_FREE`) which results in #3 (and, vice versa, in a delayed way, in #5). This should not be a problem since `state` is `gint32` and these values are read and written within one instruction on "casual" platforms.

Of course, #4 occurs, when `process_stage_entries ()` sets `*next_entry = 0`. This happens without synchronisation between `process_stage_entries ()` and `add_stage_entry ()` (which reads this value quite often). This is a race (by design) but it is `gint32` again, and can therefore be considered to be harmless.

The other reports (#01, #02, #06, #07, #08, #09) look like false positives to me as TSan seems to have issues with understanding the data flow between `process_stage_entries ()` and `add_stage_entry ()´. With #06 - #09, the message "As if synchronized via sleep" seems to be spot on. #01 and #02 seem to be connected to the unsynchronised read of `state` in `process_stage_entries ()` (technicaly, that's another `gint32` data race): interlocking it (`state = InterlockedRead (&entries [i].state)`) mute these reports.

As far as I can tell, all memory barriers are in place, and I am unable to find a way in which any of these reports could be connected to a harmful execution order (besides having the described `gint32` races).