Bug 31635 - UnixMarshal.PtrToString fails with UTF32Encoding
Summary: UnixMarshal.PtrToString fails with UTF32Encoding
Alias: None
Product: Class Libraries
Classification: Mono
Component: Mono.POSIX
Version: master
Hardware: PC Linux
: --- normal
Target Milestone: Untriaged
Assignee: Bugzilla
Depends on:
Reported: 2015-07-03 14:14 UTC by Ondřej Hošek
Modified: 2015-07-06 06:41 UTC
2 users

Is this bug a regression?: ---
Last known good build:

patch special-casing UTF32Encoding as UTF8Encoding, UnicodeEncoding, etc. are (771 bytes, patch)
2015-07-03 14:14 UTC, Ondřej Hošek
test program for unmarshaling UTF-32 strings (1.79 KB, text/plain)
2015-07-03 14:15 UTC, Ondřej Hošek

Description Ondřej Hošek 2015-07-03 14:14:21 UTC
Created attachment 11854
patch special-casing UTF32Encoding as UTF8Encoding, UnicodeEncoding, etc. are

UnixMarshal.PtrToString(IntPtr, Encoding) does not work correctly if the Encoding argument is an instance of UTF32Encoding.

PtrToString calls GetStringByteLength. That method does not special-case UTF32Encoding, so it falls back to GetRandomBufferLength and passes encoding.GetMaxByteCount(1) as the terminator length. For instances of UTF32Encoding, GetMaxByteCount(1) returns 8. This is apparently because, in some cases, a single input char can produce two UTF-32 code units, which is 8 bytes. As a result, GetRandomBufferLength keeps reading until it has seen 8 NUL bytes. A UTF-32 terminator needs only 4.
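The following byte-level sketch in Python (not Mono's C# code) shows why waiting for eight NUL bytes overshoots. It models the fallback as a simple count of consecutive zero bytes that stops once `null_length` of them have been seen. `random_buffer_length` is a hypothetical stand-in for GetRandomBufferLength, and Mono's actual loop may differ in detail.

```python
# Hypothetical, simplified model of the fallback path: scan byte by byte
# until `null_length` consecutive zero bytes have been seen.
def random_buffer_length(buf: bytes, null_length: int) -> int:
    """Return the number of bytes consumed, including the zero run."""
    seen = 0
    for i, b in enumerate(buf):
        seen = seen + 1 if b == 0 else 0
        if seen == null_length:
            return i + 1
    # Native code would keep reading past the allocation instead of stopping.
    raise ValueError("terminator not found within buffer")


text = "Hi".encode("utf-32-le")      # 8 bytes: 48 00 00 00 69 00 00 00
terminator = b"\x00" * 4             # one 32-bit NUL code unit
garbage = b"\xaa\xbb" + b"\x00" * 8 + b"\xcc"
buf = text + terminator + garbage

# The three high zero bytes of the ASCII char 'i' plus the terminator form
# a run of only 7 zeros, so a scan for 8 zeros runs on into the garbage.
overshoot = random_buffer_length(buf, 8)
print(overshoot)  # 22, although the terminator ends at offset 12
```

The reported length ends inside the trailing garbage. Unmarshaling would therefore decode bytes that do not belong to the string, or read beyond the end of the buffer.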

Since GetStringByteLength contains special cases for UTF-8, UTF-7, UTF-16, ASCII and UnixEncoding, it probably makes sense to special-case UTF-32 too, as I have done in the attached patch.
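Here is a minimal Python sketch of the special-casing idea. It assumes that each encoding family maps to a fixed terminator width: 1 byte for UTF-8, UTF-7 and ASCII-style encodings (such as UnixEncoding), 2 for UTF-16 and 4 for UTF-32. It also assumes the terminator is matched as a whole, aligned code unit. The table and function names are hypothetical illustrations; the actual patch changes GetStringByteLength in C#.

```python
# Hypothetical table: NUL-terminator width in bytes for each codec.
TERMINATOR_WIDTH = {
    "utf-8": 1,
    "utf-7": 1,
    "ascii": 1,
    "utf-16-le": 2,
    "utf-32-le": 4,
}


def string_byte_length(buf: bytes, width: int) -> int:
    """Length in bytes of the string before the first all-zero code unit.

    The scan steps through whole `width`-byte units from the start, so zero
    bytes that straddle two code units are never mistaken for a terminator.
    """
    zero = b"\x00" * width
    for offset in range(0, len(buf) - width + 1, width):
        if buf[offset:offset + width] == zero:
            return offset
    raise ValueError("terminator not found within buffer")


def ptr_to_string(buf: bytes, codec: str) -> str:
    """Decode a NUL-terminated string from `buf`, a stand-in for an IntPtr."""
    length = string_byte_length(buf, TERMINATOR_WIDTH[codec])
    return buf[:length].decode(codec)


# U+0100 encodes as 00 01 00 00, so "H\u0100" contains four consecutive
# zero bytes across a unit boundary; the aligned scan is not fooled by them.
data = "H\u0100".encode("utf-32-le") + b"\x00" * 4 + b"\xaa\xbb\xcc\xdd"
print(ptr_to_string(data, "utf-32-le") == "H\u0100")  # True
```

Alignment matters here. Counting any four consecutive zero bytes would stop inside "H\u0100". Checking whole 32-bit units finds the real terminator at offset 8.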

The attached BrokenUtf32DecodingTestProgram.cs is a test application that marshals a byte array containing (among some garbage) a UTF-32 string, then tries to unmarshal it using PtrToString.
Comment 1 Ondřej Hošek 2015-07-03 14:15:03 UTC
Created attachment 11855
test program for unmarshaling UTF-32 strings
Comment 2 Ondřej Hošek 2015-07-03 17:32:57 UTC
Opened a pull request on GitHub: https://github.com/mono/mono/pull/1913
Comment 3 Ondřej Hošek 2015-07-06 06:41:12 UTC
Merged in f7816d844aa482f3682efc3111f9001f0994b2d7.