Bug 59875 - Data races in sgen-fin-weak-hash.c
Summary: Data races in sgen-fin-weak-hash.c
Status: NEW
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2017-09-30 01:08 UTC by Armin
Modified: 2017-10-17 20:22 UTC (History)
3 users

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Stack traces (47.48 KB, text/plain)
2017-09-30 01:08 UTC, Armin

Description Armin 2017-09-30 01:08:40 UTC
Created attachment 25045 [details]
Stack traces

I can pinpoint some (possibly all) of the races that happen in 3741d16503a973a99c724bdab9a255a5f07a3637 in connection with `add_stage_entry ()` and `process_stage_entries ()`. I will discuss the races here; please find the corresponding reports and stack traces attached.

To get these results, I stopped the GC from collecting the entries (returned immediately from `sgen_finalize_if ()` without executing anything) and lowered the number of entries to 100. Because of that, the draining process was started by `sgen_object_register_for_finalization ()` in a very predictable manner, and this is the root of all the reported data races: while `process_stage_entries ()` is wrapped in `LOCK_GC ... UNLOCK_GC`, `add_stage_entry ()` does not know about that lock (by design).

One thing that can occur is that two threads pick the same `next_entry`. One of them pauses (let's call it "T1") while the other one stores its data successfully. More work happens, then the entries get drained due to an overflow, and "T1" finally continues. At this point there is no synchronisation between "T1" (checking `entries [index].state != STAGE_ENTRY_FREE`) and the thread that drains the entries (setting `entries [i].state = STAGE_ENTRY_FREE`), which results in #3 (and, vice versa, in a delayed way, in #5). This should not be a problem since `state` is a `gint32`: on common platforms an aligned 32-bit value is read and written in a single instruction, so the racing accesses cannot tear.

Of course, #4 occurs when `process_stage_entries ()` sets `*next_entry = 0`. This happens without synchronisation between `process_stage_entries ()` and `add_stage_entry ()` (which reads this value quite often). It is a race (by design), but again on a `gint32`, and can therefore be considered harmless.

The other reports (#01, #02, #06, #07, #08, #09) look like false positives to me, as TSan seems to have trouble following the data flow between `process_stage_entries ()` and `add_stage_entry ()`. For #06 - #09, the message "As if synchronized via sleep" seems to be spot on. #01 and #02 seem to be connected to the unsynchronised read of `state` in `process_stage_entries ()` (technically, that is another `gint32` data race): making it an interlocked read (`state = InterlockedRead (&entries [i].state)`) mutes these reports.

As far as I can tell, all memory barriers are in place, and I am unable to find a way in which any of these reports could be connected to a harmful execution order (besides having the described `gint32` races).
