Bug 5363 - ManagedCollation problems with certain unicode characters
Summary: ManagedCollation problems with certain unicode characters
Alias: None
Product: Class Libraries
Classification: Mono
Component: mscorlib
Version: 2.10.x
Hardware: PC Linux
Importance: --- normal
Target Milestone: Future Release
Assignee: Bugzilla
Depends on:
Reported: 2012-05-27 19:47 UTC by Vidar
Modified: 2017-10-12 13:35 UTC
CC List: 3 users

See Also:
Is this bug a regression?: ---
Last known good build:

Short program demonstrating bug. (10.56 KB, application/x-gzip)
2012-05-27 19:47 UTC, Vidar
Entire output when program crashes (1.39 KB, text/plain)
2012-05-27 19:51 UTC, Vidar
Entire output when program runs as intended (4.18 KB, text/plain)
2012-05-27 19:55 UTC, Vidar

Description Vidar 2012-05-27 19:47:56 UTC
Created attachment 1971
Short program demonstrating bug.

Ubuntu 12.04 with mono/gmcs. Also reproduced with older versions of Ubuntu and mono/gmcs, back to 8.10 and 1, respectively (see https://bugzilla.novell.com/show_bug.cgi?id=485888).

The program in the test case (see attachment) adds words from a text file as keys in a SortedList, with an integer as value. When running a foreach loop to print out the values from the list, a KeyNotFoundException is thrown.

Reproducible: Always

Steps to Reproduce:
1. Put both files from the attachment in the same dir
2. Compile the test program (gmcs test.cs)
3. Run it (mono test.exe)

Actual Results:  
The program prints some of the values in the SortedList, one per line, then crashes with a KeyNotFoundException.

Expected Results:  
The program should print all 1312 integer values (as counted by "wc -l") in the SortedList and exit without crashing.

The environment variable LANG is currently set to "en_DK.utf8". Setting it to "en_US.utf8" or "dk_DK.utf8" does not help; however, setting it to "nb_NO.utf8" or "nn_NO.utf8" allows the program to run without crashing.

Dispensing with the input file and entering only the key that causes the crash directly into the source code, e.g. words.Add("fåe", 5), also works.

(On Windows, the program runs without issues in VS2008; newer versions have not been tried.)
Comment 1 Vidar 2012-05-27 19:51:20 UTC
Created attachment 1974
Entire output when program crashes
Comment 2 Vidar 2012-05-27 19:55:07 UTC
Created attachment 1975
Entire output when program runs as intended
Comment 3 Zoltan Varga 2012-05-29 07:53:45 UTC
This seems like a string collation problem:

using System;
using System.Collections.Generic;
using System.IO;

public class Tests {
	public static void Main (String[] args) {
		SortedList<String, int> words = new SortedList<String, int>();

		string s1 = "fær";
		string s2 = "fåe";

		// Culture-sensitive comparison in both directions; the signs should be opposite.
		Console.WriteLine (s1.CompareTo (s2));
		Console.WriteLine (s2.CompareTo (s1));
		words.Add (s1, 0);
		words.Add (s2, 0);
		string last_w = null;
		// Each key must compare strictly less than the key that follows it.
		foreach (string w in words.Keys) {
			if (last_w != null) {
				Console.WriteLine (last_w.CompareTo (w));
				if (last_w.CompareTo (w) >= 0)
					throw new Exception (w);
			}
			last_w = w;
		}
	}
}

Notice that the comparison of s1 and s2 returns 1 both ways, which doesn't seem right. This confuses SortedList.
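
The property being violated can be stated as a small check: for a consistent comparison, the sign of Compare(a, b) must be the opposite of the sign of Compare(b, a). The following is only a sketch for narrowing the problem down; the class name CollationCheck and the helper IsAntisymmetric are made up for illustration and are not part of Mono or of the attached test case.

```csharp
using System;

public static class CollationCheck
{
	// Hypothetical helper: true when the two comparison directions agree,
	// i.e. sign(a vs b) == -sign(b vs a).
	public static bool IsAntisymmetric (string a, string b, StringComparison mode)
	{
		int ab = Math.Sign (string.Compare (a, b, mode));
		int ba = Math.Sign (string.Compare (b, a, mode));
		return ab == -ba;
	}

	public static void Main ()
	{
		string s1 = "fær";
		string s2 = "fåe";

		// Culture-sensitive comparison, the same kind String.CompareTo uses.
		Console.WriteLine (IsAntisymmetric (s1, s2, StringComparison.CurrentCulture));
		// Ordinal comparison, for contrast; it only looks at UTF-16 code units.
		Console.WriteLine (IsAntisymmetric (s1, s2, StringComparison.Ordinal));
	}
}
```

If the first line prints False while the second prints True, the inconsistency is confined to the culture-sensitive collation path rather than to SortedList itself.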
Comment 4 Rodrigo Kumpera 2013-01-11 16:52:18 UTC
SortedList requires you to provide monotonically comparable objects. If your collation makes string comparison not work this way, there's nothing to be done.
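
For anyone hitting this in the meantime, one possible workaround (not proposed by anyone in this report, and only a sketch) is to give the SortedList an explicit comparer that does not depend on the culture's collation, such as StringComparer.Ordinal. The class name OrdinalSortedListSketch and the method Build are hypothetical. Note that ordinal order is by UTF-16 code unit, so the resulting key order is not the linguistic order a Norwegian or Danish reader would expect.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSortedListSketch
{
	// Hypothetical helper: builds a SortedList whose keys are ordered by
	// ordinal comparison instead of the current culture's collation.
	public static SortedList<string, int> Build (IEnumerable<string> keys)
	{
		var words = new SortedList<string, int> (StringComparer.Ordinal);
		int i = 0;
		foreach (string k in keys)
			words[k] = i++;	// the indexer overwrites duplicates instead of throwing
		return words;
	}

	public static void Main ()
	{
		var words = Build (new[] { "fær", "fåe" });
		foreach (KeyValuePair<string, int> kv in words)
			Console.WriteLine (kv.Key + " " + kv.Value);
	}
}
```

This sidesteps the inconsistent comparison but does not fix it; code that needs culture-aware ordering still depends on the collation returning consistent results.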
Comment 5 Vidar 2013-01-14 02:29:00 UTC
I just now installed the latest versions of MS .NET, MonoDevelop and Mono in Windows 7. Using MonoDevelop and MS .NET the test program supplied by Zoltan Varga above ran fine. Switching to the Mono runtime, the program crashed. Clearly something can be done, because the MS runtime produced the expected result.
Comment 6 Marek Safar 2017-10-12 13:35:48 UTC
This also works with .NET Core on Mac.
