Bug 31635

Summary: UnixMarshal.PtrToString fails with UTF32Encoding
Product: [Mono] Class Libraries Reporter: Ondřej Hošek <ondra.hosek>
Component: Mono.POSIX Assignee: Bugzilla <bugzilla>
Severity: normal CC: meebey, mono-bugs+mono
Priority: ---    
Version: master   
Target Milestone: Untriaged   
Hardware: PC   
OS: Linux   
Tags: Is this bug a regression?: ---
Last known good build:
Attachments: patch special-casing UTF32Encoding as UTF8Encoding, UnicodeEncoding, etc. are
test program for unmarshaling UTF-32 strings

Description Ondřej Hošek 2015-07-03 14:14:21 UTC
Created attachment 11854 [details]
patch special-casing UTF32Encoding as UTF8Encoding, UnicodeEncoding, etc. are

UnixMarshal.PtrToString(IntPtr, Encoding) does not work correctly if the Encoding argument is an instance of UTF32Encoding.

PtrToString calls GetStringByteLength, which doesn't special-case UTF32Encoding and thus calls GetRandomBufferLength. However, encoding.GetMaxByteCount(1) returns 8 for instances of UTF32Encoding (apparently because some specific input of one char can return two UTF-32 units, i.e. 8 bytes). Thus, GetRandomBufferLength reads until encountering 8 NUL bytes instead of the requisite 4.
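
To make the failure mode concrete, here is a minimal Python model of the two terminator scans. It is not Mono's code: the function names, the sample buffer and the garbage bytes are made up for illustration, and the value 8 is simply the GetMaxByteCount(1) result reported above. It only shows why looking for 8 NUL bytes overshoots a 4-byte UTF-32 terminator.

```python
# Illustrative model only; these helpers are hypothetical, not Mono APIs.

def length_by_nul_run(buf: bytes, run: int) -> int:
    """Byte-wise scan: number of bytes before the first run of `run` consecutive NULs."""
    seen = 0
    for i, b in enumerate(buf):
        seen = seen + 1 if b == 0 else 0
        if seen == run:
            return i + 1 - run
    raise ValueError("no terminator found")


def length_by_code_unit(buf: bytes, width: int = 4) -> int:
    """Aligned scan: number of bytes before the first all-zero code unit of `width` bytes."""
    for off in range(0, len(buf) - width + 1, width):
        if buf[off:off + width] == bytes(width):
            return off
    raise ValueError("no terminator found")


text = "hi".encode("utf-32-le")        # 8 bytes: two UTF-32 code units
buf = (
    text
    + b"\x00" * 4                      # the real 4-byte UTF-32 terminator
    + b"\xde\xad\xbe\xef"              # unrelated memory after the string (made up)
    + b"\x00" * 8                      # first place 8 consecutive NULs actually occur
)

MAX_BYTE_COUNT_1 = 8                   # value reported for UTF32Encoding.GetMaxByteCount(1)

too_long = length_by_nul_run(buf, MAX_BYTE_COUNT_1)  # runs past the real terminator
correct = length_by_code_unit(buf, 4)                # stops at the 4-byte terminator

print(too_long, correct)
print(buf[:correct].decode("utf-32-le"))
```

In this model the 8-NUL scan skips the real terminator because only 7 consecutive NUL bytes occur there (3 trailing bytes of "i" plus the 4 terminator bytes), so the returned length also covers the garbage that follows.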

Since GetStringByteLength contains special cases for UTF-8, UTF-7, UTF-16, ASCII and UnixEncoding, it probably makes sense to special-case UTF-32 too, as I have done in the attached patch.

The attached BrokenUtf32DecodingTestProgram.cs is a test application that marshals a byte array containing (among some garbage) a UTF-32 string, then tries to unmarshal it using PtrToString.
Comment 1 Ondřej Hošek 2015-07-03 14:15:03 UTC
Created attachment 11855 [details]
test program for unmarshaling UTF-32 strings
Comment 2 Ondřej Hošek 2015-07-03 17:32:57 UTC
Opened a pull request on GitHub: https://github.com/mono/mono/pull/1913
Comment 3 Ondřej Hošek 2015-07-06 06:41:12 UTC
Merged in f7816d844aa482f3682efc3111f9001f0994b2d7.