Bug 60422 - Native crash due to unhandled errno values when accessing files on networked filesystems
Summary: Native crash due to unhandled errno values when accessing files on networked ...
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: General (show other bugs)
Version: 5.2 (2017-04)
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2017-10-27 20:34 UTC by Brian Luczkiewicz
Modified: 2017-11-15 19:05 UTC (History)
5 users (show)

See Also:
Tags:
Is this bug a regression?: ---
Last known good build:


Attachments

Description Brian Luczkiewicz 2017-10-27 20:34:37 UTC
This is very similar to https://bugzilla.xamarin.com/show_bug.cgi?id=58738 but we are hitting another case.

We have a mono-based process that calls File.Exists() on a path located on a samba share every few seconds, and sometimes performs other interactions (open, read) periodically with that file based on the results.

If the samba server is powered off while our process is running, the process crashes. All of the crashes occur in mono/metadata/w32error-unix.c in the mono_w32error_unix_to_win32 function because of unhandled errno values in that switch statement.

This bug affects Mac + Linux for us. We are seeing it on mono 5.2 and 5.4, and based on the current code checked into master, we expect that this issue is open for newer builds as well.

---[ Linux ]------------------------------------------------

This is what we see:

Stacktrace:

w32error-unix.c: unknown error (112) "Host is down"
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.IO.MonoIO.GetFileStat (string,System.IO.MonoIOStat&,System.IO.MonoIOError&) [0x00009] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.IO.File.FillAttributeInfo (string,System.IO.MonoIOStat&,bool,bool) [0x00009] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.IO.FileSystemInfo.Refresh () [0x00000] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.IO.DirectoryInfo.get_Exists () [0x00009] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at Roon.DirectoryWatcher._CheckAvailable () [0x00091] in /home/ben/roon/FileSystem/DirectoryWatcher.cs:161
  at Roon.DirectoryWatcher.<.ctor>b__16_0 (object) [0x00001] in /home/ben/roon/FileSystem/DirectoryWatcher.cs:68
  at System.Threading.Timer/Scheduler.TimerCB (object) [0x00007] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x00008] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00074] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in <9790d962aaad40deb63d33029ba0d2f6>:0
  at (wrapper runtime-invoke) <Module>.runtime_invoke_bool (object,intptr,intptr,intptr) [0x0001f] in <9790d962aaad40deb63d33029ba0d2f6>:0

This is the result of stat or lstat returning EHOSTDOWN, which is indeed left unhandled in w32error-unix.c


---[ OS X ]------------------------------------------------

On OS X we also sometimes get crashes in mono_w32error_unix_to_win32 in other calls like Open:

	0   roon                                0x0000000108ecfc91 mono_handle_native_crash + 257
	1   roon                                0x0000000108f35af6 altstack_handle_and_restore + 70
	2   libsystem_c.dylib                   0x00007fff7e4c3452 strlen + 18
	3   libsystem_c.dylib                   0x00007fff7e507c55 __vfprintf + 4711
	4   libsystem_c.dylib                   0x00007fff7e52f0a9 __v2printf + 473
	5   libsystem_c.dylib                   0x00007fff7e506600 _vasprintf + 370
	6   roon                                0x00000001090be325 monoeg_g_log + 181
	7   roon                                0x0000000108f517dd mono_w32error_unix_to_win32 + 141
	8   roon                                0x0000000108f4ad6d _wapi_set_last_path_error_from_errno + 61
	9   roon                                0x0000000108f4a6c0 mono_w32file_create + 912
	10  roon                                0x0000000108f7f22c ves_icall_System_IO_MonoIO_Open + 364
	11  ???                                 0x0000000109f4dce5 0x0 + 4462009573

and Read:

	0   roon                                0x0000000105e24c91 mono_handle_native_crash + 257
	1   roon                                0x0000000105e8aaf6 altstack_handle_and_restore + 70
	2   libsystem_c.dylib                   0x00007fff7e4c3452 strlen + 18
	3   libsystem_c.dylib                   0x00007fff7e507c55 __vfprintf + 4711
	4   libsystem_c.dylib                   0x00007fff7e52f0a9 __v2printf + 473
	5   libsystem_c.dylib                   0x00007fff7e506600 _vasprintf + 370
	6   roon                                0x0000000106013325 monoeg_g_log + 181
	7   roon                                0x0000000105ea67dd mono_w32error_unix_to_win32 + 141
	8   roon                                0x0000000105ea4a02 file_read + 258
	9   roon                                0x0000000105ed432d ves_icall_System_IO_MonoIO_Read + 125
	10  ???                                 0x000000010880ca05 0x0 + 4437625349

We have seen these associated with ENOTCONN

-----------------------------------------

Comments:

It is clear that this mapping function was written without consideration for networked filesystems and their failure modes.
Based on the ticket linked above, it seems that the bug-fixing tactic in this area recently has been to add error codes as one-offs as failure cases are reported. I think it is worth reconsidering this approach and implementing a more thorough fix to this class of problem. 

We do not think that the framework should be taking down the entire process because of a networking failure. Networking failures are expected behavior in all network systems, and it should be the job of the application, not the framework, to decide how best to handle them. Blowing up the whole process is extreme overkill for something as mundane and common as ENOTCONN or EHOSTDOWN.


Recommendations:

(1) At minimum, map ENOTCONN and EHOSTDOWN to win32 error codes. I propose ERROR_DEV_NOT_EXIST 
(2) Perform an audit of errno values on Linux and OS X and ensure that as many as possible are being mapped. Other candidates that I could see being a problem: ECONNREFUSED, EHOSTUNREACH, ETIMEDOUT, ESHUTDOWN, ECONNRESET, ECONNABORTED, EADDRNOTAVAILABLE, ENETDOWN, ENETUNREACH, ENETRESET. This list isn't meant to be comprehensive.
(3) Marshal unanticipated error codes to ERROR_CAN_NOT_COMPLETE or something similarly broad, while logging a warning instead of blowing up the world with g_error.
Comment 1 Zoltan Varga 2017-11-02 03:03:42 UTC
https://github.com/mono/mono/pull/5927

Note You need to log in before you can comment on or make changes to this bug.