Bug 57936 - Race conditions of mempool.c
Summary: Race conditions of mempool.c
Status: RESOLVED ANSWERED
Alias: None
Product: Runtime
Classification: Mono
Component: General (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- normal
Target Milestone: Future Cycle (TBD)
Assignee: Zoltan Varga
URL:
Depends on:
Blocks:
 
Reported: 2017-07-05 10:58 UTC by Armin
Modified: 2017-07-05 20:54 UTC (History)
5 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Stack traces (16.06 KB, text/plain)
2017-07-05 10:58 UTC, Armin
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED ANSWERED

Description Armin 2017-07-05 10:58:29 UTC
Created attachment 23297 [details]
Stack traces

As part of my Google Summer of Code project, I introduce clang's tsan (https://clang.llvm.org/docs/ThreadSanitizer.html) to Mono. While playing with tsan (with the latest version of the code in the master branch), I found an issue within mempool.c. As I understand, the main issue consists of multiple instructions that manipulate total_bytes_allocated and pool->d.allocated concurrently (without even a fixed execution order). To keep this report clear, I attached the two relevant stack traces.

gpointer mono_mempool_alloc (MonoMemPool *pool, guint size) {
  // ...
  guint new_size = SIZEOF_MEM_POOL + size;
  // ...
  pool->d.allocated += new_size;
  total_bytes_allocated += new_size;
  // ...
}

MonoMemPool *mono_mempool_new_size (int initial_size) {
  MonoMemPool *pool;
  initial_size = SIZEOF_MEM_POOL;
  pool = (MonoMemPool *)g_malloc (initial_size);
  pool->d.allocated = pool->size = initial_size;
  total_bytes_allocated += initial_size;
  // ...
}

void mono_mempool_destroy (MonoMemPool *pool) {
  MonoMemPool *p, *n;
  total_bytes_allocated -= pool->d.allocated;
  // ...
}

In addition, even the assignment within mono_mempool_new_size consists of multiple assignments, when compiling it with gcc or clang -O0:

MonoMemPool *mono_mempool_new_size (int initial_size) {
  // ...
  total_bytes_allocated += initial_size;
  // ...
}

gcc 6.3.1:

movq total_bytes_allocated(%rip), %rax
addq %rdx, %rax
movq %rax, total_bytes_allocated(%rip)

clang 3.9.1:

addq total_bytes_allocated, %rax
movq %rax, total_bytes_allocated
Comment 1 Ludovic Henry 2017-07-05 14:53:31 UTC
Hello,

Which version are you on?

`total_bytes_allocated` and `pool->d.allocated` manipulation indeed seems wrong, but it is mostly harmless, since these values are only used for some debugging, or some reporting.

Thank you,
Ludovic
Comment 2 Armin 2017-07-05 15:16:38 UTC
Hi,

sorry, I forgot to include that - this occurred with commit 13aeb8836a5c036a7b8918a2e53f58a10475069e on Fedora 25.

I already talked to Bernhard Urban about it as well and, as mutexes seem to be a rather unwanted trade-off here, we could annotate / whitelist these functions for tsan and add a short comment that explains why this race is acceptable?

Have a great day,
Armin
Comment 3 Zoltan Varga 2017-07-05 15:28:51 UTC
The runtime contains a number of places where we increase global counters without synchronization. Some of these can be changed to use an InterlockedAdd () if they're not called very often. The ones called often like the mempool counter probably need to stay as they are, and the tools need to be taught to ignore them using whitelisting or such.
Comment 4 Armin 2017-07-05 20:32:14 UTC
Can I annotate and comment the functions in the code or should we "stealthily" tell tsan to ignore it whenever we run it (in that case we should probably list all known + accepted races on the website)?
Comment 5 Ludovic Henry 2017-07-05 20:54:22 UTC
You should be as explicit as possible, so please annotate and comment the functions. Thank you!