Decomposed unicode characters take space of two characters #13859

Closed
opened 2026-01-31 03:54:09 +00:00 by claunia · 4 comments

Originally created by @samuliasmala on GitHub (May 20, 2021).

Windows Terminal version (or Windows build number)

1.7.1033.0

Other Software

No response

Steps to reproduce

Run the following command in Ubuntu bash:

echo $'"\u00E4"\n"\u0061\u0308"'

Expected Behavior

Because U+00E4 and U+0061 U+0308 are equivalent Unicode forms, I expected the following output:
image
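The equivalence claimed here can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative and not part of the original report.

```python
import unicodedata

precomposed = "\u00E4"       # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "\u0061\u0308"  # LATIN SMALL LETTER A + COMBINING DIAERESIS

# As raw code point sequences the two strings are different...
different_raw = precomposed != decomposed

# ...but each normalizes to the other: NFC composes, NFD decomposes.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(different_raw, same_nfc, same_nfd)
```

Both forms are canonically equivalent, so a renderer would be expected to show the same single glyph for each.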

Actual Behavior

The actual output I got was the following:
image

The second form, which takes up the space of two characters, is a combination of Latin Small Letter A and Combining Diaeresis, and it should be displayed as a single character. Instead it is displayed as two characters, which causes two problems:

  • Readability of such text suffers, since sentences have half-spaces around the character
  • Commands such as less calculate the row length using single-character spacing, which causes overlapping text and disappearing lines when the line length is exceeded because of the Combining Diaeresis or a similar character
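The mismatch in the second bullet can be sketched as follows. This is a simplified model, not the actual logic of less or of the terminal: it assumes the tool counts a combining mark as zero columns while the terminal spends one cell per code point, and it ignores wide characters and control characters. All function names are hypothetical.

```python
import unicodedata

def columns_combining_aware(text):
    """Columns a tool would count if combining marks take no space (assumption)."""
    return sum(0 if unicodedata.combining(ch) else 1 for ch in text)

def columns_one_cell_each(text):
    """Cells used when every code point gets its own cell, as in this report."""
    return len(text)

def overflow(text, width):
    """Cells by which the rendered line exceeds `width` although the tool thinks it fits."""
    if columns_combining_aware(text) > width:
        return 0  # the tool would wrap the line itself
    return max(0, columns_one_cell_each(text) - width)

line = "a\u0308" + "b" * 9  # decomposed ä followed by nine more letters
print(columns_combining_aware(line), columns_one_cell_each(line), overflow(line, 10))
```

In this model the tool believes the line fits in 10 columns, but the terminal needs 11 cells, which is the kind of overrun that would produce overlapping or disappearing text.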
claunia added the Area-Output, Resolution-Duplicate labels 2026-01-31 03:54:09 +00:00

@KalleOlaviNiemitalo commented on GitHub (May 20, 2021):

For a moment, I thought displaying the combining character in the same character cell would be problematic for ReadConsoleOutputW and related functions. But it's not really worse than supplementary characters.

@DHowett commented on GitHub (Jul 6, 2021):

Unfortunately, yeah... this is a longstanding issue with the console infrastructure. We're tracking resolution over in #1472.

In short: we measure each code unit much like a poor implementation of wcwidth would; we then allocate that much space for rendering, but pass the entire string to DirectWrite. It ends up "correct", but in a space too big for the glyph.
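A rough illustration of that per-code-unit measurement, under stated assumptions: the sketch below treats text as UTF-16 and gives every code unit one cell, which is a deliberately naive stand-in and not the console's actual code. The function names are hypothetical.

```python
def utf16_code_units(text):
    """Split text into its UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def naive_cells(text):
    """Assumed naive rule: allocate one cell per code unit, combining marks included."""
    return len(utf16_code_units(text))

# The precomposed form is measured as one cell, the decomposed form as two,
# even though a text shaper given the whole string would draw a single glyph.
print(naive_cells("\u00E4"), naive_cells("a\u0308"))
```

Under this model the glyph for the decomposed form is drawn correctly but sits in a two-cell slot, which matches the extra space seen in the screenshot.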

Combining characters and terminal emulation have not historically mixed very well. We have to come up with a better story, but there's going to be a very long tail of applications that are simply going to throw up their arms and scream/shout/run about or break. There's no such thing (historically!) (i'm not saying there shouldn't be 😄) as a character that takes up no space on the screen--a character for which wcwidth would be 0--to the terminals we're emulating.

For now, /dup #1472.

@ghost commented on GitHub (Jul 6, 2021):

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!

@DHowett commented on GitHub (Jul 6, 2021):

Additional information in #8000!

Reference: starred/terminal#13859