Bug 58032 - Race conditions of threadpool-worker-default.c
Summary: Race conditions of threadpool-worker-default.c
Alias: None
Product: Runtime
Classification: Mono
Component: General (show other bugs)
Version: master
Hardware: PC Linux
: --- normal
Target Milestone: Future Cycle (TBD)
Assignee: Ludovic Henry
Depends on:
Reported: 2017-07-08 20:06 UTC by Armin
Modified: 2017-07-10 15:27 UTC (History)
3 users (show)

See Also:
Is this bug a regression?: ---
Last known good build:

Stack traces (29.17 KB, text/plain)
2017-07-08 20:06 UTC, Armin

Description Armin 2017-07-08 20:06:50 UTC
Created attachment 23422 [details]
Stack traces

In the course of my work with clang's ThreadSanitizer (my GSoC project) I have found three (connected) races within threadpool-worker-default.c (commit 57239e923f3f5942d62e058d7fe4dc01f67c0735) and I am not entirely sure if they are all harmless. I have attached the three corresponding stack traces and I will also try to reconstruct what I think is going on.
It all starts here:

static ThreadPoolWorker worker;

static void heuristic_notify_work_completed (void) {
  // ...
  worker.heuristic_last_dequeue = mono_msec_ticks ();
  if (heuristic_should_adjust ())
    heuristic_adjust ();

`worker.heuristic_last_dequeue` gets updated from at least two different threads concurrently without synchronisation. While this might be dismissible, I would at least document it to a minimal extent, as the value looks monotonic, which might not be true.

Next, there is the call of `heuristic_should_adjust ()`:

static gboolean heuristic_should_adjust (void) {
  if (worker.heuristic_last_dequeue > worker.heuristic_last_adjustment + worker.heuristic_adjustment_interval) {
    // ...
  // ...

`worker.heuristic_last_dequeue` is read and might have just been reset to an unexpected (higher / lower) number by another thread. However, interestingly, I have not found a warning about this particular write/read by tsan yet (I will keep looking).

In addition, `worker.heuristic_last_adjustment` and `worker.heuristic_adjustment_interval` are read and might have been (independently) set by another thread that runs the last piece of this puzzle - `heuristic_adjust ()` and `hill_climbing_update ()`:

static void heuristic_adjust (void) {
  if (mono_coop_mutex_trylock (&worker.heuristic_lock) == 0) {
    // ...
    new_thread_count = hill_climbing_update (counter._.max_working, sample_duration, completions, &worker.heuristic_adjustment_interval);
    // ...
    worker.heuristic_last_adjustment = mono_msec_ticks ();
    // ...
    mono_coop_mutex_unlock (&worker.heuristic_lock);

static gint16 hill_climbing_update (gint16 current_thread_count, guint32 sample_duration, gint32 completions, gint64 *adjustment_interval) {
  // ...
  *adjustment_interval = hc->current_sample_interval;
  // ...

`heuristic_adjust ()` uses a mutex; however, this does not prevent `heuristic_should_adjust ()` from reading (partly) updated values of `worker.heuristic_last_adjustment` and `worker.heuristic_adjustment_interval`.

It looks to me as if this whole situation could use a little more synchronisation.

Note You need to log in before you can comment on or make changes to this bug.