Some Unicode characters are improperly accepted or dropped #12428

Open
opened 2026-01-31 03:15:22 +00:00 by claunia (Owner) · 0 comments

Originally created by @chitoku-k on GitHub (Feb 5, 2021).

Originally assigned to: @DHowett on GitHub.

Environment

Windows build number: 10.0.19042.0
Windows Terminal version (if applicable): 1.5.x and 1.6.x
PowerShell: 5.1.19041.610 and 7.1.1

Steps to reproduce

  1. Run a Windows Terminal build that includes https://github.com/microsoft/terminal/pull/8035.
  2. Input ０１２３４５６７８９ (FULLWIDTH DIGIT ZERO through FULLWIDTH DIGIT NINE, U+FF10 to U+FF19) by any method, such as right-click paste, Ctrl + V, or the keyboard.

Expected behavior

  • ０１２３４５６７８９ is input.

Actual behavior

  • Only the first character (０ in this case) is input.

Detailed Explanation

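The implementation of GetQuickCharWidth was changed in https://github.com/microsoft/terminal/pull/8035, which affected the following call sites:

  • to determine which width class a character has for rendering: CodepointWidthDetector::GetWidth(const std::wstring_view) (https://github.com/microsoft/terminal/blob/1df3182865fb089bd653763cd0abbea811545365/src/types/CodepointWidthDetector.cpp#L340-L376)
  • to convert character(s) into a queue of key events for input: CharToKeyEvents(const wchar_t, const unsigned int) (https://github.com/microsoft/terminal/blob/1df3182865fb089bd653763cd0abbea811545365/src/types/convert.cpp#L142-L175)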

The first call site is unaffected, because CodepointWidthDetector later falls back to looking the character up in its Unicode table; the second, however, was broken by that PR.

When the scanned key for a character is invalid, CharToKeyEvents calls GetQuickCharWidth to decide whether the character should still be sent as ordinary keyboard events: that happens if the character is classified as C3_ALPHA or if its width is CodepointWidth::Wide. Since that PR, however, GetQuickCharWidth never returns CodepointWidth::Wide; it returns CodepointWidth::Invalid for everything outside printable ASCII, so SynthesizeNumpadEvents ends up being called for ０１２３４５６７８９. Because that function emits each character as if it had been typed with Alt + numpad, the result depends on the receiving application: cmd.exe, for example, handles the input correctly, while powershell.exe accepts only the first character.

https://github.com/microsoft/terminal/blob/1df3182865fb089bd653763cd0abbea811545365/src/types/convert.cpp#L156

One option is to return a new value, such as CodepointWidth::Unknown, when GetQuickCharWidth cannot determine the character width immediately,

  CodepointWidth GetQuickCharWidth(const wchar_t wch) noexcept
  {
      if (0x20 <= wch && wch <= 0x7e)
      {
          /* ASCII */
          return CodepointWidth::Narrow;
      }
+     else if (wch < 0xffff)
+     {
+         return CodepointWidth::Unknown;
+     }
      return CodepointWidth::Invalid;
  }

because CodepointWidth::Invalid is defined as meaning "not a valid Unicode codepoint".

https://github.com/microsoft/terminal/blob/1df3182865fb089bd653763cd0abbea811545365/src/types/inc/convert.hpp#L23-L29

In CharToKeyEvents, the expression should be corrected to:

- if (WI_IsFlagSet(CharType, C3_ALPHA) || GetQuickCharWidth(wch) == CodepointWidth::Wide)
+ if (WI_IsFlagSet(CharType, C3_ALPHA) || GetQuickCharWidth(wch) == CodepointWidth::Unknown)

and CodepointWidthDetector::GetWidth() has to handle the new value:

-         // If it's invalid, the quick width had no opinion, so go to the lookup table.
-         if (width == CodepointWidth::Invalid)
+         // If it's unknown or invalid, the quick width had no opinion, so go to the lookup table.
+         if (width == CodepointWidth::Unknown || width == CodepointWidth::Invalid)

I can open a PR if the suggested fix is acceptable. Thanks in advance.

Appendix

Note that this issue is not related to the issue in PSReadLine, because it also reproduces in applications launched from PowerShell.

claunia added the Product-Conhost, Issue-Bug, Area-Input, Needs-Tag-Fix, Product-Terminal labels 2026-01-31 03:15:23 +00:00
Reference: starred/terminal#12428