Problem switching in and out of UTF-8 mode #21055

Open
opened 2026-01-31 07:31:52 +00:00 by claunia · 5 comments

Originally created by @KazDragon on GitHub (Jan 5, 2024).

Windows Terminal version

1.18.3181.0

Windows build number

10.0.22621.0

Other Software

No response

Steps to reproduce

Output the character sequence twice using a tool of your choice:
Esc % G E2 A3 BF Esc % @

Expected Behavior

Each time, the Unicode character U+28FF ⣿ (Braille Pattern Dots-12345678) will be printed.

Actual Behavior

The correct character is printed the first time, the second time (and any subsequent times) prints "⣿".

From my reading of the code, this is because Esc % @ (Select Default Character Set) correctly disengages UTF-8 by calling DesignateCodingSystem in ActionCsiDispatch. However, its companion sequence Esc % G (Select UTF-8 Character Set) is not handled and thus the sequence is never again rendered in Unicode.
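The garbled output is consistent with the three UTF-8 bytes being decoded one byte per character. A minimal sketch of that, assuming the 8-bit fallback behaves like Latin-1 (the actual code page in use is not stated in this report):

```python
# Bytes from the reproduction steps: the UTF-8 encoding of U+28FF.
payload = bytes([0xE2, 0xA3, 0xBF])

# Decoded as UTF-8, the three bytes form a single character.
as_utf8 = payload.decode("utf-8")

# Decoded one byte per character (Latin-1 is an assumption here),
# each byte becomes its own character.
as_8bit = payload.decode("latin-1")

print(f"UTF-8:   {as_utf8!r} (U+{ord(as_utf8):04X})")
print(f"Latin-1: {as_8bit!r}")
```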

claunia added the Help Wanted, Product-Conhost, Issue-Bug, Area-VT labels 2026-01-31 07:31:53 +00:00

@j4james commented on GitHub (Jan 5, 2024):

Unfortunately this is just a limitation of the way these DOCS sequences work in Windows Terminal. When you change the character set like this, you need to give it some time before you output any additional characters. For example, in a WSL bash shell, something like this should be more likely to work:

printf "\e%%G"; printf "\xe2\xa3\xbf\e%%@"

And depending on your use case, you may want to include a short sleep in there as well, to allow for potential packet merging if you're on a network connection. You may also need to flush the output at that point if your i/o is buffered.
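For a program that writes its own output, the same idea could look roughly like the sketch below. The helper names `docs_chunks` and `write_with_pause` and the 50 ms pause are hypothetical choices for illustration; nothing here guarantees the terminal receives the chunks in separate buffers.

```python
import sys
import time

ESC = b"\x1b"
SELECT_UTF8 = ESC + b"%G"      # DOCS: select UTF-8
SELECT_DEFAULT = ESC + b"%@"   # DOCS: select the default coding system


def docs_chunks(payload: bytes) -> list[bytes]:
    """Split output so the UTF-8 switch is written separately from what follows.

    Mirrors the two printf calls above: the switch first, then the payload
    together with the switch back.
    """
    return [SELECT_UTF8, payload + SELECT_DEFAULT]


def write_with_pause(chunks, out=None, pause=0.05):
    """Write each chunk, flush, then sleep briefly (pause length is a guess)."""
    out = out if out is not None else sys.stdout.buffer
    for chunk in chunks:
        out.write(chunk)
        out.flush()        # make sure buffered i/o does not merge the chunks
        time.sleep(pause)  # leave time for the terminal to process the switch
```

A call such as `write_with_pause(docs_chunks(b"\xe2\xa3\xbf"))` would then emit the sequence from the report in two separate writes. Any output that follows the final Esc % @ would need the same treatment.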

The reason for this is that the terminal receives your output in a buffer which it immediately decodes in the current character set, before attempting to execute it. By the time it executes the sequence telling it to change the character set, it will have already decoded the remainder of the buffer using the original character set.
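To illustrate the decode-then-execute order, here is a toy model in Python. It is not Windows Terminal's code: the `ToyTerminal` class, the initial UTF-8 state, and the Latin-1 fallback are all assumptions made for the sketch.

```python
ESC = "\x1b"


class ToyTerminal:
    """Toy model only: decodes each incoming buffer up front, then executes it."""

    def __init__(self):
        self.utf8 = True   # assumed starting state: UTF-8 mode
        self.screen = ""

    def write(self, data: bytes) -> None:
        # The whole buffer is decoded with the coding system active on arrival.
        codec = "utf-8" if self.utf8 else "latin-1"
        text = data.decode(codec, errors="replace")
        i = 0
        while i < len(text):
            if text.startswith(ESC + "%G", i):
                self.utf8 = True    # too late for the rest of this buffer
                i += 3
            elif text.startswith(ESC + "%@", i):
                self.utf8 = False
                i += 3
            else:
                self.screen += text[i]
                i += 1


sequence = b"\x1b%G\xe2\xa3\xbf\x1b%@"
term = ToyTerminal()
term.write(sequence)   # first pass: buffer decoded as UTF-8
term.write(sequence)   # second pass: buffer already decoded as 8-bit
print(term.screen)
```

In this model the first write renders the Braille character and leaves the terminal in 8-bit mode, so the second write is decoded byte by byte before Esc % G is executed. Sending Esc % G in its own buffer avoids that, which is what the split printf is aiming for.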


@KazDragon commented on GitHub (Jan 6, 2024):

That's unfortunate, and the workaround would not be suitable for my use case - I'm not using the terminal strictly for text, but also as part of a graphical display (https://github.com/KazDragon/textray is my tech demo for checking out whether a terminal has any special needs), although most of the work I do involves interspersing text with control commands to move about and draw the screen and using box drawing characters to frame things.

I might be able to devise a different workaround (essentially either use unicode for everything or unicode for nothing depending on the terminal), but it would still be nice if the terminal did what it was told when it was told ;)

Thanks for the fast feedback!


@j4james commented on GitHub (Jan 6, 2024):

> I might be able to devise a different workaround (essentially either use unicode for everything or unicode for nothing depending on the terminal), but it would still be nice if the terminal did what it was told when it was told ;)

I definitely agree it would be nice if this worked without the delay, but I suspect it might be quite complicated to fix, so I wouldn't want to bet on that happening anytime soon. If you can figure out a workaround that doesn't involve a lot of DOCS switching, I'd recommend you do that.

I should also note this isn't a problem for a regular character set change, e.g. an SCS sequence, or locking shift, to switch between say ASCII and a DEC graphics set. It's just the DOCS (coding system) change that is complicated, where you're switching from 8-bit to multi-byte (or vice versa).


@lhecker commented on GitHub (Jan 8, 2024):

> I definitely agree it would be nice if this worked without the delay, but I suspect it might be quite complicated to fix, so I wouldn't want to bet on that happening anytime soon.

The amount of data that was consumed is already an out parameter for DoWriteConsole and WriteConsoleWImplHelper. So WriteConsoleAImpl would only need to retry the write calls. The iteration position is available in StateMachine and ProcessString currently returns void, so it can be modified to return a size_t instead. This would go hand-in-hand with avoiding heap allocations during the narrow -> wide character translation (e.g. a 1GB ASCII write temporarily consuming 2GB of additional memory) by using, for example, a static 2MB buffer on the stack and doing chunked translations.
To be honest though, I don't think this will be a high priority for us to implement ourselves right now, due to the large-ish number of highly impactful bugs on our backlog.
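The retry idea can be sketched in a toy Python model, under loud assumptions: `ToyTerminal.process` stands in for a ProcessString that reports how much input it consumed, `write` stands in for the retrying caller, and scanning the raw bytes for Esc % G / Esc % @ replaces the real state machine. None of this reflects the actual conhost code.

```python
ESC = 0x1B


class ToyTerminal:
    """Toy model: stop at each coding-system switch and report bytes consumed."""

    def __init__(self):
        self.utf8 = False   # assumed starting state: 8-bit mode
        self.screen = ""

    def process(self, data: bytes) -> int:
        """Handle data up to and including the first DOCS switch.

        Returns the number of bytes consumed so the caller can retry the rest.
        """
        cut = len(data)
        new_mode = None
        for i in range(len(data) - 2):
            if data[i] == ESC and data[i + 1] == ord("%") and data[i + 2] in b"G@":
                cut = i + 3
                new_mode = data[i + 2] == ord("G")
                break
        # Only the bytes before the switch are decoded in the current mode.
        end = cut - 3 if new_mode is not None else cut
        codec = "utf-8" if self.utf8 else "latin-1"
        self.screen += data[:end].decode(codec, errors="replace")
        if new_mode is not None:
            self.utf8 = new_mode
        return cut


def write(term: ToyTerminal, data: bytes) -> None:
    """Hypothetical caller: retry with the unconsumed remainder until done."""
    offset = 0
    while offset < len(data):
        offset += term.process(data[offset:])


sequence = b"\x1b%G\xe2\xa3\xbf\x1b%@"
term = ToyTerminal()
write(term, sequence)
write(term, sequence)
print(term.screen)
```

Because the remainder is only decoded after the switch has taken effect, both writes render the Braille character in this model, with no pause or split writes needed on the sending side.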


@zadjii-msft commented on GitHub (Jan 9, 2024):

I'll leave it to the team to vote during triage, but generally I'm a proponent of "if this is a valuable fix we would accept if someone did it, then leave it open (even if we'll never get to it)".

Thanks for the discussion here folks!

Reference: starred/terminal#21055