Bug 23771 - UTF8 Decoder's Convert does not keep internal state between calls when 'flush' parameter is false
Status: VERIFIED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: mscorlib
Version: unspecified
Hardware: Other Other
Importance: --- normal
Target Milestone: 4.2.0 (C6)
Assignee: Bugzilla
 
Reported: 2014-10-13 01:10 UTC by Yusuke Fujiwara
Modified: 2016-04-13 08:16 UTC


Description Yusuke Fujiwara 2014-10-13 01:10:34 UTC
Version: 3.10.0 (not listed in the dropdown)
Hardware: iPad, Genymotion
OS: Android, iOS

There are three problems in the UTF-8 Decoder.Convert that prevent streaming decoding and non-ASCII character support in Mono 3.10.0 (as shipped in the latest Xamarin.Android/Xamarin.iOS).

1) The bytesUsed and charsUsed out parameters should return the counts actually used, not the counts that could potentially be used. Callers use them to advance the byte/char offsets for the next call.
2) The completed out parameter should indicate whether the contents of the bytes parameter were fully consumed.
3) Decoding state must be preserved in the decoder between calls to enable streaming decoding. This is essential for decoding multi-byte characters, i.e. the non-ASCII range of UTF-8, when their bytes are split across calls.

The following test code reproduces it (note that this code passes on the desktop CLR and on previous Mono versions):

// Requires: using System.Diagnostics; using System.Text; using NUnit.Framework;
[Test]
public void ReproDecoderIssue()
{
    var input = "\u733F"; // 'mono' in Japanese; 3 bytes in UTF-8.
    var encoded = Encoding.UTF8.GetBytes(input);
    var decoder = Encoding.UTF8.GetDecoder();
    var chars = new char[ 10 ]; // Just enough space to decode.
    var result = new StringBuilder();
    var bytes = new byte[ 1 ]; // Simulates chunked input bytes.
    // Feed the encoded bytes one at a time.
    foreach ( var b in encoded )
    {
        bytes[ 0 ] = b;
        int bytesUsed, charsUsed;
        bool completed;
        decoder.Convert( bytes, 0, bytes.Length, chars, 0, chars.Length, false, out bytesUsed, out charsUsed, out completed );
        result.Append( chars, 0, charsUsed );
        // The expected output is listed at the bottom.
        Debug.Print( "bytesUsed:{0}, charsUsed:{1}, completed:{2}, result:'{3}'", bytesUsed, charsUsed, completed, result );
    }

    // Expected: NO assertion error.
    Assert.That( result.ToString(), Is.EqualTo( input ) );

    /*
     * Expected Debug output:
     * bytesUsed:1, charsUsed:0, completed:True, result:''
     * bytesUsed:1, charsUsed:0, completed:True, result:''
     * bytesUsed:1, charsUsed:1, completed:True, result:'猿'
     * 
     * -- Note: '猿' is U+733F (1 char in UTF-16)
     * 
     * Actual Debug output:
     * bytesUsed:3, charsUsed:1, completed:False, result:'�'
     * bytesUsed:3, charsUsed:1, completed:False, result:'��'
     * bytesUsed:3, charsUsed:1, completed:False, result:'���'
     * 
     * None of the output parameters match.
     * -- Note: '�' is decoder fallback char (U+FFFD)
     */
}
// end of test code
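
For reference, the calling pattern that the three points above are meant to support can be sketched as follows. This is only an illustration, not code from Mono or the BCL: the class ChunkedDecodeSketch, the method DecodeInChunks, the 16-char output buffer and the final flush call are all assumptions made for the example. It relies on bytesUsed to advance the input offset, on completed to decide when a chunk is finished, and on the decoder keeping partial sequences between calls.

```csharp
using System;
using System.Text;

static class ChunkedDecodeSketch
{
    // Hypothetical helper (not part of Mono or the BCL): decodes UTF-8 input
    // that arrives in chunks of at most chunkSize bytes.
    public static string DecodeInChunks(byte[] encoded, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        Decoder decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[16]; // Arbitrary output buffer size for the sketch.
        var result = new StringBuilder();
        int bytesUsed, charsUsed;
        bool completed;

        for (int start = 0; start < encoded.Length; start += chunkSize)
        {
            int offset = start;
            int remaining = Math.Min(chunkSize, encoded.Length - start);
            do
            {
                // flush: false, so an incomplete multi-byte sequence at the end
                // of the chunk should stay buffered inside the decoder.
                decoder.Convert(encoded, offset, remaining, chars, 0, chars.Length,
                                false, out bytesUsed, out charsUsed, out completed);
                result.Append(chars, 0, charsUsed);
                // Advance by the bytes actually consumed (point 1).
                offset += bytesUsed;
                remaining -= bytesUsed;
            } while (!completed); // Repeat until the chunk is fully consumed (point 2).
        }

        // Final call with flush: true and no new bytes, so that a dangling
        // partial sequence is handed to the decoder fallback.
        decoder.Convert(new byte[0], 0, 0, chars, 0, chars.Length,
                        true, out bytesUsed, out charsUsed, out completed);
        result.Append(chars, 0, charsUsed);
        return result.ToString();
    }
}
```

On a runtime where Convert keeps its state between calls, DecodeInChunks( Encoding.UTF8.GetBytes( "\u733F" ), 1 ) should return the original one-character string. On the affected Mono version the same loop would be expected to append fallback characters instead, as in the actual output above.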

This issue might be related to bug #10692, but I'm not sure.

Comment 1 Atsushi Eno 2015-02-10 14:57:14 UTC
Once we bring in the referencesource UTF8Encoding, this will be fixed. I just verified that with my ongoing attempt to do so.
https://github.com/atsushieno/mono/tree/import-text-encoding

(Several fixes are still needed to get it working.)

Comment 2 Atsushi Eno 2015-02-16 02:45:32 UTC
UTF8Encoding and co. are now based on referencesource, and this is fixed. Thanks for the report.

[master 90b11244]

Comment 3 Shruti 2016-04-13 08:16:01 UTC
I have checked this issue with the C7 SKU alignment builds and I am getting the expected behaviour given in comment 0:

bytesUsed:1, charsUsed:0, completed:True, result:''
bytesUsed:1, charsUsed:0, completed:True, result:''
bytesUsed:1, charsUsed:1, completed:True, result:'猿'

Screencast: http://www.screencast.com/t/jOx6BBfxGZ
Environment Info: https://gist.github.com/Shruti360/950884068499e51af12288616ad6c43c

Hence, closing this issue.