Bug 57936 - Race conditions of mempool.c
Summary: Race conditions of mempool.c
Status: RESOLVED ANSWERED
Alias: None
Product: Runtime
Classification: Mono
Component: General (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- normal
Target Milestone: Future Cycle (TBD)
Assignee: Zoltan Varga
URL:
Depends on:
Blocks:
 
Reported: 2017-07-05 10:58 UTC by Armin
Modified: 2017-07-05 20:54 UTC (History)
5 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Stack traces (16.06 KB, text/plain)
2017-07-05 10:58 UTC, Armin
Details

Description Armin 2017-07-05 10:58:29 UTC
Created attachment 23297 [details]
Stack traces

As part of my Google Summer of Code project, I introduce clang's tsan (https://clang.llvm.org/docs/ThreadSanitizer.html) to Mono. While playing with tsan (with the latest version of the code in the master branch), I found an issue within mempool.c. As I understand, the main issue consists of multiple instructions that manipulate total_bytes_allocated and pool->d.allocated concurrently (without even a fixed execution order). To keep this report clear, I attached the two relevant stack traces.

gpointer mono_mempool_alloc (MonoMemPool *pool, guint size) {
  // ...
  guint new_size = SIZEOF_MEM_POOL + size;
  // ...
  pool->d.allocated += new_size;
  total_bytes_allocated += new_size;
  // ...
}

MonoMemPool *mono_mempool_new_size (int initial_size) {
  MonoMemPool *pool;
  initial_size = SIZEOF_MEM_POOL;
  pool = (MonoMemPool *)g_malloc (initial_size);
  pool->d.allocated = pool->size = initial_size;
  total_bytes_allocated += initial_size;
  // ...
}

void mono_mempool_destroy (MonoMemPool *pool) {
  MonoMemPool *p, *n;
  total_bytes_allocated -= pool->d.allocated;
  // ...
}

In addition, even the assignment within mono_mempool_new_size consists of multiple assignments, when compiling it with gcc or clang -O0:

MonoMemPool *mono_mempool_new_size (int initial_size) {
  // ...
  total_bytes_allocated += initial_size;
  // ...
}

gcc 6.3.1:

movq total_bytes_allocated(%rip), %rax
addq %rdx, %rax
movq %rax, total_bytes_allocated(%rip)

clang 3.9.1:

addq total_bytes_allocated, %rax
movq %rax, total_bytes_allocated
Comment 1 Ludovic Henry 2017-07-05 14:53:31 UTC
Hello,

Which version are you on?

`total_bytes_allocated` and `pool->d.allocated` manipulation indeed seems wrong, but it is mostly harmless, since these values are only used for some debugging, or some reporting.

Thank you,
Ludovic
Comment 2 Armin 2017-07-05 15:16:38 UTC
Hi,

sorry, I forgot to include that - this occurred with commit 13aeb8836a5c036a7b8918a2e53f58a10475069e on Fedora 25.

I already talked to Bernhard Urban about it as well and, as mutexes seem to be a rather unwanted trade-off here, we could annotate / whitelist these functions for tsan and add a short comment that explains why this race is acceptable?

Have a great day,
Armin
Comment 3 Zoltan Varga 2017-07-05 15:28:51 UTC
The runtime contains a number of places where we increase global counters without synchronization. Some of these can be changed to use an InterlockedAdd () if they're not called very often. The ones called often like the mempool counter probably need to stay as they are, and the tools need to be taught to ignore them using whitelisting or such.
Comment 4 Armin 2017-07-05 20:32:14 UTC
Can I annotate and comment the functions in the code or should we "stealthily" tell tsan to ignore it whenever we run it (in that case we should probably list all known + accepted races on the website)?
Comment 5 Ludovic Henry 2017-07-05 20:54:22 UTC
You should be as explicit as possible, so please annotate and comment the functions. Thank you!

Note You need to log in before you can comment on or make changes to this bug.