Bug 24968 - All characters are allowed as Unicode escape sequences within identifiers
Status: RESOLVED FIXED
Alias: None
Product: Compilers
Classification: Mono
Component: C#
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: Marek Safar
Reported: 2014-12-01 17:28 UTC by Jon Skeet
Modified: 2015-05-18 08:05 UTC

Description Jon Skeet 2014-12-01 17:28:15 UTC
Description of Problem: The spec allows Unicode escape sequences within identifiers, but only if the escaped character is itself a valid character within the identifier. mcs allows *any* character.


Steps to reproduce the problem:
1. Compile this line of code within a method:

    string x\u0020y = "";

Actual Results: It compiles, declaring a variable with the identifier xy.


Expected Results: Compile-time error.


How often does this happen? Always.


Additional Information:

Validated against version 3.3.0.0 (not listed in the version list above!) and against head.

Piece of code in error:
https://github.com/mono/mono/blob/effa4c07ba850bedbe1ff54b2a5df281c058ebcb/mcs/mcs/cs-tokenizer.cs#L3068

(If the escaped character isn't a valid identifier character, the tokenizer should put it back and break out of the loop.)
Comment 1 Marcin Kolny 2015-02-25 16:22:07 UTC
Right. According to the documentation (https://msdn.microsoft.com/en-us/library/aa664670.aspx), an identifier may contain any Unicode character from the following character classes: Lu, Ll, Lt, Lm, Lo, Nl, Mn, Mc, Nd, Pc, and Cf. The exception is the first character, which must be an underscore or a character from one of the classes Lu, Ll, Lt, Lm, or Lo.
I'm working on a pull request now.
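
For illustration only, the rule above can be sketched roughly as follows. This is not the code from cs-tokenizer.cs or from any patch; the names IdentifierScanSketch, IsIdentifierPart, TryDecodeEscape, and ScanIdentifier are hypothetical. The sketch assumes only the four-digit \uXXXX escape form, does not apply the stricter first-character rule, and simply stops scanning when a decoded character is not in one of the listed classes instead of reporting an error.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class IdentifierScanSketch
{
    // True if c belongs to one of the classes listed above for
    // characters inside an identifier.
    public static bool IsIdentifierPart(char c)
    {
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:      // Lu
            case UnicodeCategory.LowercaseLetter:      // Ll
            case UnicodeCategory.TitlecaseLetter:      // Lt
            case UnicodeCategory.ModifierLetter:       // Lm
            case UnicodeCategory.OtherLetter:          // Lo
            case UnicodeCategory.LetterNumber:         // Nl
            case UnicodeCategory.NonSpacingMark:       // Mn
            case UnicodeCategory.SpacingCombiningMark: // Mc
            case UnicodeCategory.DecimalDigitNumber:   // Nd
            case UnicodeCategory.ConnectorPunctuation: // Pc (includes '_')
            case UnicodeCategory.Format:               // Cf
                return true;
            default:
                return false;
        }
    }

    // Decodes a \uXXXX escape starting at index i, if one is present.
    static bool TryDecodeEscape(string s, int i, out char decoded)
    {
        decoded = '\0';
        if (i + 6 > s.Length || s[i] != '\\' || s[i + 1] != 'u')
            return false;
        ushort code;
        if (!ushort.TryParse(s.Substring(i + 2, 4), NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture, out code))
            return false;
        decoded = (char)code;
        return true;
    }

    // Scans identifier characters from 'start' and returns the decoded text.
    // An escape is accepted only if the character it denotes is itself valid.
    public static string ScanIdentifier(string source, int start)
    {
        var sb = new StringBuilder();
        int i = start;
        while (i < source.Length)
        {
            char c = source[i];
            int width = 1;
            char decoded;
            if (c == '\\' && TryDecodeEscape(source, i, out decoded))
            {
                c = decoded;
                width = 6;
            }
            if (!IsIdentifierPart(c))
                break; // leave the character (or the whole escape) unconsumed
            sb.Append(c);
            i += width;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ScanIdentifier("x\\u0061y", 0)); // prints "xay"
        Console.WriteLine(ScanIdentifier("x\\u0020y", 0)); // prints "x"
    }
}
```

With this check, the escaped space in the reported test case ends the identifier after x, while an escaped letter such as \u0061 is still accepted.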
Comment 2 Marcin Kolny 2015-02-25 16:34:22 UTC
I also noticed that there is a problem with the first character of an identifier. My patch will fix that too.
Comment 3 Marcin Kolny 2015-02-25 17:40:29 UTC
Now I see that Nl is also a letter-character class, but IsLetter doesn't treat these characters as letters, so Nl should be checked explicitly in the is_identifier_start_character method.
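
A minimal sketch of that explicit check, for illustration only. IdentifierStartSketch and IsIdentifierStart are hypothetical names, not the actual is_identifier_start_character implementation in mcs. U+2160 (ROMAN NUMERAL ONE) is used as a sample Nl character.

```csharp
using System;
using System.Globalization;

public static class IdentifierStartSketch
{
    // An identifier may start with '_', a letter (Lu, Ll, Lt, Lm, Lo),
    // or a letter-number (Nl). char.IsLetter does not cover Nl, so that
    // category is tested separately.
    public static bool IsIdentifierStart(char c)
    {
        return c == '_'
            || char.IsLetter(c)
            || char.GetUnicodeCategory(c) == UnicodeCategory.LetterNumber;
    }

    public static void Main()
    {
        char romanOne = '\u2160'; // ROMAN NUMERAL ONE, category Nl
        Console.WriteLine(char.IsLetter(romanOne));     // prints "False"
        Console.WriteLine(IsIdentifierStart(romanOne)); // prints "True"
        Console.WriteLine(IsIdentifierStart('1'));      // prints "False"
    }
}
```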
Comment 4 Marcin Kolny 2015-02-25 19:13:16 UTC
My pull request, which solves the problem: https://github.com/mono/mono/pull/1601
Comment 5 Marek Safar 2015-05-18 08:05:18 UTC
Fixed in master