Spurious spaces appear when printing some character from Unicode Private Use Area #19618

Closed
opened 2026-01-31 06:48:49 +00:00 by claunia · 12 comments
Owner

Originally created by @romkatv on GitHub (Apr 3, 2023).

Windows Terminal version

1.16.10261.0

Windows build number

10.0.19045.0

Other Software

WSL

Steps to reproduce

  1. Open bash or zsh in WSL in Windows Terminal.
  2. Run this command: `printf '\UF0737\033[41mx\033[0m\n'`

Expected Behavior

The output of the command should occupy two columns. The content of the first column is unspecified (it depends on your font). The second column should contain `x`.

image

Actual Behavior

The output occupies 3 columns: there is an extra space in the middle.

image

It may appear that the space is a part of the first character. This, however, is not the case, as can be demonstrated by running `printf '\UF0737x\033[41my\033[0m\n'`.

image

Not all characters from Unicode Private Use Area exhibit this issue. For example, `printf '\UE617\033[41mx\033[0m\n'` works as intended.

claunia added the Issue-Bug and Resolution-Duplicate labels 2026-01-31 06:48:49 +00:00

@romkatv commented on GitHub (Apr 3, 2023):

As I mentioned above, font doesn't matter. To avoid confusion, here's a screenshot of all commands with Consolas:

image

The output of the last command is as expected. The output of the first two commands is incorrect (the space in the middle should not be there).


@romkatv commented on GitHub (Apr 3, 2023):

Conhost.exe also suffers from this issue but differently.

image

The output of the second command is different from Windows Terminal but also incorrect.


@lhecker commented on GitHub (Apr 3, 2023):

This is a well-known issue that is very, very difficult to resolve, because it requires undoing like 2 decades of code built on UCS2 assumptions. In other words, this happens, because your code points are surrogate pairs and this code base assumes that each UTF-16 character is at least 1 column wide. A surrogate pair can thus not be narrower than 2 columns. I'm actively working on this issue however. It's a duplicate of #3546.
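
To make the surrogate-pair point concrete, here is a minimal Python sketch (not from the thread) that lists the UTF-16 code units of the two code points used in the report. The helper name `utf16_code_units` is made up for illustration. U+F0737 lies outside the Basic Multilingual Plane and so needs two code units, while U+E617 fits in one.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as ints) that encode one code point."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0xF0737, 0xE617):
    units = utf16_code_units(chr(cp))
    print(f"U+{cp:04X}: {len(units)} code unit(s):", " ".join(f"{u:04X}" for u in units))

# U+F0737: 2 code unit(s): DB81 DF37
# U+E617: 1 code unit(s): E617
```

If each code unit is given at least one column, the first character cannot be stored in fewer than two columns, whereas the second one can.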


@romkatv commented on GitHub (Apr 3, 2023):

Thanks for the link. This explains the output of `printf '\UF0737x\033[41my\033[0m\n'` in conhost.exe. However, the output in Windows Terminal is different, which suggests that it's doing something special. Could you give a hint that would explain the output of this command in Windows Terminal?


@237dmitry commented on GitHub (Apr 3, 2023):

This depends on font and perhaps on Atlas Engine (enabled or not):

Screenshot 2023-04-03 204154


@lhecker commented on GitHub (Apr 3, 2023):

To be honest, I'm not 100% sure where the different behavior is coming from, and I don't think it's easy to determine. Your Windows 10 version uses a much, much older version of the text processing code than Windows Terminal 1.16, so there's a huge number of places that might be responsible for this.

I've just tested your repro on Windows Terminal Preview (1.17) by the way and it appears it doesn't reproduce anymore:
image

It doesn't matter whether I have AtlasEngine enabled or not. I'm pretty sure it was fixed by PR #14640, because it closes a suspiciously similar issue: #6162.

Since #6162 is so similar I'll close this issue as a duplicate. /dup #6162


@microsoft-github-policy-service[bot] commented on GitHub (Apr 3, 2023):

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!


@lhecker commented on GitHub (Apr 3, 2023):

BTW I should add that you'll find many more similar issues around our Unicode support, because what I said previously unfortunately still applies. It's one of my top priorities to address this. If you find any other Unicode issues, please do feel free to file more issues on us however!


@romkatv commented on GitHub (Apr 3, 2023):

> This depends on font and perhaps on Atlas Engine

As I mentioned above, this does not depend on font. I didn't mention Atlas Engine but the answer is the same: it does not depend on it.

> I've just tested your repro on Windows Terminal Preview (1.17) by the way and it appears it doesn't reproduce anymore

That's great to hear, and it makes a lot more sense than "this code base assumes that each UTF-16 character is at least 1 column wide", which contradicted my observations.


@DHowett commented on GitHub (Apr 3, 2023):

> "this code base assumes that each UTF-16 character is at least 1 column wide"

You know, this is pretty close to the truth today.

Up until Windows Terminal 1.17, the text buffer assumed that each UTF-16 **code unit**¹ was at least one column wide.

Beyond 1.17, the text buffer assumes that each UTF-16 **code point** is at least one column wide. That is, we don't support zero-width characters or grapheme clusters composed of multiple code points.

¹ This is, of course, where "surrogate pairs require at least two columns" comes from. 🙂
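
The difference between the two assumptions can be sketched as a lower bound on the number of columns a string needs. This is a toy Python model, not Windows Terminal code: the function names are invented for illustration, and it ignores wide (two-column) characters entirely.

```python
def utf16_len(ch):
    """Number of UTF-16 code units needed for one code point."""
    return len(ch.encode("utf-16-le")) // 2


def min_columns_per_code_unit(text):
    """Older assumption: every UTF-16 code unit claims at least one column."""
    return sum(utf16_len(ch) for ch in text)


def min_columns_per_code_point(text):
    """Newer assumption: every code point claims at least one column."""
    return len(text)  # a Python str is a sequence of code points


sample = "\U000F0737x"  # the PUA character from the report, followed by "x"
print(min_columns_per_code_unit(sample))   # 3
print(min_columns_per_code_point(sample))  # 2
```

Under the older rule the sample cannot fit in fewer than three columns, which matches the three-column output reported for 1.16. Under the newer rule two columns are enough.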


@romkatv commented on GitHub (Apr 5, 2023):

This doesn't sound like the full story. Here's what I'm seeing in Windows Terminal 1.16.10261.0.

image

As you can see, U+F0737 takes just one column.

Anyway, I'm glad that this issue is fixed in the upcoming version. I'll eagerly wait until my PC picks it up.


@DHowett commented on GitHub (Apr 5, 2023):

Good observation!

Now, for the real secret. The rendering engine in 1.16 hasn't been informed about which columns to put which characters in, so it renders everything of the same color in a single run that gets compressed down to the advance width of every glyph included in that run.

If you add another color, it suddenly snaps that new run of text to the correct position:

image

This results in a couple of fun things:

_an emoji composed of a number of joiners takes up 5, 7, or 9 columns_

image

_a line that contains mis-measured characters wraps at the wrong width_

image

image

(this has another bug in it, from some 100-codeunit buffer we have _also_ gotten rid of recently; plus, I realize that I broke the `\U` escape)
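
The run-based rendering described in this comment can be modelled with a small Python sketch. It is a simplified illustration, not the actual renderer, and it rests on two assumptions: the buffer reserves one column per UTF-16 code unit (as stated earlier in the thread for 1.16), and every glyph has an advance width of exactly one cell. All function names are hypothetical.

```python
import unicodedata


def utf16_len(ch):
    """Number of UTF-16 code units needed for one code point."""
    return len(ch.encode("utf-16-le")) // 2


def layout_buffer(cells):
    """Assign a buffer column to each (char, color) pair.

    Assumption: the buffer reserves one column per UTF-16 code unit.
    """
    col = 0
    placed = []
    for ch, color in cells:
        placed.append((ch, color, col))
        col += utf16_len(ch)
    return placed


def render(cells, advance=1):
    """Return {visual column: char}.

    Each same-color run starts at the buffer column of its first character;
    inside a run, glyphs are packed back to back by their advance width.
    """
    drawn = {}
    prev_color = object()  # sentinel that never equals a real color
    x = 0
    for ch, color, col in layout_buffer(cells):
        if color != prev_color:
            x = col  # a new run snaps to its buffer position
            prev_color = color
        drawn[x] = ch
        x += advance
    return drawn


def show(cells):
    """Render to a string, printing '?' for private-use characters."""
    drawn = render(cells)
    return "".join(
        "?" if unicodedata.category(drawn.get(i, " ")) == "Co" else drawn.get(i, " ")
        for i in range(max(drawn) + 1)
    )


SUPP = chr(0xF0737)  # supplementary-plane PUA, two UTF-16 code units
BMP = chr(0xE617)    # BMP PUA, one UTF-16 code unit

print(repr(show([(SUPP, None), ("x", "red")])))               # '? x'
print(repr(show([(SUPP, None), ("x", None), ("y", "red")])))  # '?x y'
print(repr(show([(BMP, None), ("x", "red")])))                # '?x'
```

Under these assumptions the model gives a gap before the red character in the first two cases and none in the third, which is consistent with what the report describes for the three commands.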

Reference: starred/terminal#19618