Emoji width wrong and varying with U+FE0F and U+FE0E ignored. #12325

Closed
opened 2026-01-31 03:12:27 +00:00 by claunia · 11 comments
Owner

Originally created by @christianparpart on GitHub (Jan 30, 2021).

Try this in Windows Terminal (I am using version 1.4, btw):

echo -ne "M\U0001F600M\nM\U0001F600\uFE0FM\nM\U0001F600\uFE0EM\n"

This is how it looks:
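![image](https://user-images.githubusercontent.com/56763/106359324-5dd91100-6312-11eb-822a-484accf2b51b.png)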

  1. U+FE0E modifier is ignored.
  2. U+FE0F modifier generates a glyph with a different width than without the modifier.

The default emoji presentation mode should be respected (here emoji, but there are others that are text).
And the width for emoji presentation should be 2 (as per the Unicode emoji spec), whereas emoji text presentation (including the U+FE0E override) should be rendered as text with a cell width of 1.
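
A minimal sketch of the rule proposed here, in Python. The function name `proposed_width` is hypothetical, and the fallback for a character without a selector uses East Asian Width as a stand-in for the real default-presentation data, which Python's standard library does not expose. It also does not check whether the selector is actually valid for the base character, so treat it as an illustration only.

```python
import unicodedata

VS15 = "\ufe0e"  # text presentation selector
VS16 = "\ufe0f"  # emoji presentation selector


def proposed_width(seq: str) -> int:
    """Cell width of one base character plus an optional variation selector.

    Hypothetical helper: implements the rule proposed in this report,
    not the behaviour of any particular terminal.
    """
    base, selector = seq[0], seq[1:2]
    if selector == VS15:
        return 1  # forced text presentation: narrow
    if selector == VS16:
        return 2  # forced emoji presentation: wide
    # No selector: approximate the default presentation via East Asian Width.
    return 2 if unicodedata.east_asian_width(base) in ("W", "F") else 1


for s in ("\U0001F600", "\U0001F600\ufe0f", "\U0001F600\ufe0e"):
    print(" ".join(f"U+{ord(c):04X}" for c in s), "->", proposed_width(s))
```

Under this rule the three lines of the test command would occupy 4, 4, and 3 cells respectively, counting the surrounding `M` characters.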


@j4james commented on GitHub (Jan 31, 2021):

> emoji text presentation (including U+FE0E override) should be text and cell width 1.

I agree that the current behaviour is wrong, but I don't think what you're claiming here is correct. I think in all three cases the emoji should be 2 cells wide. You shouldn't be able to change the width of a character in a terminal with a variation selector.

I've just checked a bunch of terminals (XTerm, Gnome Terminal, Rxvt, st, Konsole, Alacritty, Mlterm, Mintty), and none of them alter the emoji width. Gnome Terminal was the only one that seemed to treat the color and text representations differently; the others always used the text representation rather than a color glyph.


@j4james commented on GitHub (Jan 31, 2021):

For reference, here's a screenshot showing the output from the various terminals I tested:
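![image](https://user-images.githubusercontent.com/4181424/106372257-df0dc380-6365-11eb-81dc-4a7e2061cada.png)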


@christianparpart commented on GitHub (Jan 31, 2021):

Ah, hey @j4james ;-)
I'm not saying every terminal does that at its best ;-)

It was almost a year ago that I was about to literally eat the Unicode specs in order to implement emoji and especially multi-codepoint grapheme support. I don't remember offhand where I read about that one problematic claim, and I may well be wrong about the width of emoji text presentation. I still think I am right though (I am trying to prove myself... wrong? :-D)

Let's try that:

echo -ne "ABC\n\u00a9\n\u00a9\ufe0f\n\U0001F600\ufe0e\nABC\n"

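![image](https://user-images.githubusercontent.com/56763/106386643-39466d00-63d6-11eb-895b-13bc12b443fc.png)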
(screenshot from Kitty)

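This will show the copyright sign. I chose it because its default presentation style is indeed text and its [width is one](https://www.unicode.org/Public/13.0.0/ucd/extracted/DerivedEastAsianWidth.txt) (East Asian Width = N, i.e. neutral, which terminals treat as narrow).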
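
The East Asian Width values referred to here can be checked with Python's standard `unicodedata` module. This is only a lookup of the Unicode Character Database properties bundled with the interpreter (the exact Unicode version depends on the Python release); it says nothing about how a given terminal renders the characters.

```python
import unicodedata

# Code points used in the test commands in this thread.
samples = {
    "COPYRIGHT SIGN": "\u00a9",
    "GRINNING FACE": "\U0001F600",
    "VS15 (text selector)": "\ufe0e",
    "VS16 (emoji selector)": "\ufe0f",
}

for label, ch in samples.items():
    eaw = unicodedata.east_asian_width(ch)  # e.g. "N", "Na", "W", "A"
    cat = unicodedata.category(ch)          # general category, e.g. "So", "Mn"
    print(f"U+{ord(ch):04X} {label}: East_Asian_Width={eaw}, category={cat}")
```

The copyright sign reports `N` and the grinning face reports `W`, which is the property difference the argument above relies on.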

If you try that in Kitty, it actually behaves the way I think is right (except that I think there is a bug in Kitty with "\U0001F600\uFE0F" showing only a white bubble, at least on my screen; /cc @kovidgoyal).

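![image](https://user-images.githubusercontent.com/56763/106386601-12883680-63d6-11eb-977c-31bf6bdc51d7.png)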

I am trying to find my way through Unicode [UTS #51](https://unicode.org/reports/tr51/) again, though.

When I did the research early last year, I also looked at how browsers render emoji (with a monospace font surrounding them), and that seems to match my understanding too. Whether or not that's correct remains to be verified, but I would say it is exactly what users expect to see. It should not be rendered differently in the terminal just because every emoji (regardless of VS15 or default presentation mode) supposedly has to be 2 cells wide.

I am aware that emoji and Unicode (especially multi-codepoint grapheme clusters) are a sensitive topic in terminal land, but from the user's standpoint I think the above should be right. What do you think?


@kovidgoyal commented on GitHub (Jan 31, 2021):

@christianparpart you want to see https://github.com/kovidgoyal/kitty/issues/3211 for the bug you found in kitty.


@kovidgoyal commented on GitHub (Jan 31, 2021):

As for the actual issue, as can be seen from kitty's behavior, I agree with @christianparpart.

As per the Unicode standard, variation selectors change the presentation base of the preceding character, if it has multiple presentation bases defined. The width of a character depends on its presentation. Ergo, variation selectors *must* change width. This is one of the (many) reasons one cannot use wcwidth() to calculate the width of text in a terminal; one must use wcswidth().
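
The difference between a per-code-point and a string-level calculation can be sketched in Python. Both functions below are hypothetical stand-ins written for illustration (they are not the C library's `wcwidth()`/`wcswidth()`), and the string-level one applies the selector override to any preceding base character without consulting the list of valid emoji variation sequences.

```python
import unicodedata

VS15, VS16 = "\ufe0e", "\ufe0f"


def char_width(ch: str) -> int:
    """Context-free width of a single code point (wcwidth-like)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text: str) -> int:
    """Sum of per-code-point widths; cannot see variation selectors."""
    return sum(char_width(ch) for ch in text)


def string_width(text: str) -> int:
    """String-level width: VS15/VS16 override the preceding base character."""
    total = 0
    last = 0  # width contributed by the most recent base character
    for ch in text:
        if ch in (VS15, VS16) and last:
            new = 1 if ch == VS15 else 2
            total += new - last  # replace the base character's width
            last = new
        else:
            w = char_width(ch)
            total += w
            if w:
                last = w
    return total


line = "M\U0001F600\ufe0eM"
print(naive_width(line), string_width(line))
```

For `M\U0001F600\uFE0EM` the per-code-point sum is 4 cells, while the string-level calculation yields 3, because only the latter can let the selector affect the character before it.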


@j4james commented on GitHub (Jan 31, 2021):

This subject has been discussed at length many times before, and I don't think anyone has ever proposed a sensible way to make it work. If you think you do have a solution, though, I suggest you continue the discussion in [issue 9 of terminal-wg](https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/9). But if you can't convince the terminal-wg community that you have a workable solution (and I didn't see any indication that you had), then I don't see why Windows Terminal would want to diverge from the standard that almost everyone else is currently following.


@kovidgoyal commented on GitHub (Jan 31, 2021):

On Sun, Jan 31, 2021 at 07:43:18AM -0800, James Holderness wrote:

> This subject has been discussed at length many times before, and I don't think anyone has ever proposed a sensible way to make it work. If you think you do have a solution, though, I suggest you continue the discussion in issue 9 of terminal-wg.

Yeah, use wcswidth, based on an actual standard, the one published by the
Unicode Consortium.

> But if you can't convince the terminal-wg community that you have a workable solution (and I didn't see any indication that you had), then I don't see why Windows Terminal would want to diverge from the standard that almost everyone else is currently following.

Thanks, but I am not wasting any more of my life on terminal-wg. And
just FYI, there is no "standard" that everyone else follows. They just
use a broken hack (system wcwidth), and are too lazy to change. If you
want your terminal to follow the herd, feel free. Although given that
your terminal runs on Windows, which doesn't even have wcwidth, I have no idea
what standard you follow.


@christianparpart commented on GitHub (Jan 31, 2021):

Just a quick FYI: I have a draft lying around in which I attempted to formalize:

  • grapheme cluster handling (continuous text writes, including cursor positioning, ...)
  • character width (including VS15/VS16 overrides, emoji default presentation)
  • feature detection, mode switch, future extensibility.

I am not publishing it yet, and it will surely take some more months (I first have to finish another proposal), especially since I know how things can sometimes be discussed to death in TWG, ending in an exodus. I will publish it once I feel I've addressed what I think is needed for "Terminal Unicode Core" support, and I'll let you know then.


@zadjii-msft commented on GitHub (Feb 1, 2021):

Alright so there's an easy bug here, and a hard discussion.

The easy bug is the spacing with U+FE0E/U+FE0F. That one can be fixed more easily, to match the behavior of other terminals.

The hard discussion - should the variation selector change the width - I'm gonna leave for another place. This thread isn't really the place to build that consensus. If consensus is found, I'm happy to match the consensus of the terminal emulator space.


@DHowett commented on GitHub (Feb 4, 2021):

The spacing issue is another form of /dup #1472. We don't handle 0-width combining codepoints well.
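
To illustrate what handling zero-width combining codepoints involves, here is a small Python sketch that attaches such code points to the preceding cell instead of giving them a cell of their own. The helper names `is_zero_width` and `to_cells` are hypothetical, the zero-width test is a simplification based on general category, and nothing here reflects how Windows Terminal's text buffer is actually implemented.

```python
import unicodedata


def is_zero_width(ch: str) -> bool:
    """Simplified test: nonspacing/enclosing marks and format characters."""
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def to_cells(text: str) -> list[str]:
    """Group code points so zero-width ones share the previous cell."""
    cells: list[str] = []
    for ch in text:
        if is_zero_width(ch) and cells:
            cells[-1] += ch  # attach to the preceding base character
        else:
            cells.append(ch)
    return cells


for sample in ("M\U0001F600\ufe0fM", "e\u0301"):
    print([" ".join(f"U+{ord(c):04X}" for c in cell) for cell in to_cells(sample)])
```

With this grouping, `M\U0001F600\uFE0FM` yields three cells rather than four, so the selector no longer takes up space of its own.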


@ghost commented on GitHub (Feb 4, 2021):

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!

Reference: starred/terminal#12325