REP of SP characters produces incorrect results #19910

Open
opened 2026-01-31 06:57:11 +00:00 by claunia · 2 comments

Originally created by @jdebp on GitHub (May 20, 2023).

Windows Terminal version

1.17.1023

Details

1.16.10261 had significant problems with REP of Unicode block graphics and other such characters. That's mostly fixed in 1.17.1023, with one exception: REP of Supplementary Plane characters still produces incorrect results, with the characters repeated in the wrong places on the line or sometimes not at all.

I've experienced this with MouseText characters like U+0001FB81 in particular, and have not (of course!) tested every SP code point.

[I did some screenshots.](https://tty0.social/@JdeBP/110307055376583641)
claunia added the Help Wanted, Area-Output, Issue-Bug, Product-Terminal labels 2026-01-31 06:57:12 +00:00

@j4james commented on GitHub (May 20, 2023):

As currently implemented, REP just repeats the last wchar_t that was output (which on Windows is 16-bit), so for anything outside the BMP, you're just going to be repeating the second half of a surrogate pair.
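
To make the failure concrete, here is a minimal sketch of how a supplementary-plane code point splits into two UTF-16 code units, and what repeating only the last unit produces. It uses `char16_t` as a portable stand-in for the 16-bit Windows `wchar_t`; `encodeUtf16` and `naiveRep` are hypothetical helper names for illustration, not functions from the Terminal code base.

```cpp
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (not a surrogate, at most 0x10FFFF).
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000)
    {
        // BMP code points fit in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high (leading) surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low (trailing) surrogate
    }
    return out;
}

// Hypothetical model of "repeat the last 16-bit unit that was output".
// This only illustrates the described behaviour; it is not the real REP code.
std::u16string naiveRep(const std::u16string& printed, std::size_t count)
{
    if (printed.empty())
    {
        return {};
    }
    return std::u16string(count, printed.back());
}
```

For U+1FB81 the two units are 0xD83E and 0xDF81, so `naiveRep(encodeUtf16(0x1FB81), 3)` yields three copies of the lone low surrogate 0xDF81, which is not valid UTF-16 text.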

If we don't want to support supplementary planes, maybe we should filter out anything in the surrogate range, rather than writing out garbage. That way apps could at least detect whether it was supported or not.
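
One way the filtering idea could look, as a rough sketch only: remember the last printed unit only when it is not a surrogate, so that a later REP has nothing to repeat after a supplementary-plane character. `LastPrintable` and its members are hypothetical names, and `char16_t` again stands in for the 16-bit `wchar_t`; the real parser is structured differently.

```cpp
#include <optional>

// True for any UTF-16 surrogate code unit (high or low).
constexpr bool isSurrogate(char16_t ch)
{
    return ch >= 0xD800 && ch <= 0xDFFF;
}

// Hypothetical tracker for the "last graphical character" used by REP.
class LastPrintable
{
public:
    // Call for every printed code unit.
    void onPrint(char16_t ch)
    {
        if (isSurrogate(ch))
        {
            // Part of a supplementary-plane character: forget the previous
            // value so REP becomes a no-op instead of emitting half a pair.
            _last.reset();
        }
        else
        {
            _last = ch;
        }
    }

    // The unit REP may repeat, or nullopt if there is nothing safe to repeat.
    std::optional<char16_t> repeatable() const
    {
        return _last;
    }

private:
    std::optional<char16_t> _last;
};
```

With this approach an application that sends REP after a supplementary-plane character sees no repetition at all, which it can detect, rather than stray replacement glyphs.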


@j4james commented on GitHub (Jul 24, 2023):

@lhecker Regarding your comment here: https://github.com/microsoft/terminal/issues/15751#issuecomment-1647897705

> I'm not entirely sure which part in our code base doesn't handle surrogate pairs, etc., but I suspect it can be discovered by setting a breakpoint on AdaptDispatch::_FillRect.

This issue isn't related to FillRect - it's a limitation of the StateMachine implementation, which works with 16-bit wchar_t elements. In the case of the REP operation, it's repeating the last such character. See here:

https://github.com/microsoft/terminal/blob/5daf4983d2192b7be870bf5ae6a3afbb5945a9ce/src/terminal/parser/OutputStateMachineEngine.cpp#L570-L581

If that happens to be the second half of a surrogate pair, it's not going to work correctly.
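
For contrast, a sketch of what repeating a whole code point would involve when the data is a sequence of 16-bit units: look back one extra unit when the last one is a low surrogate, and repeat the pair together. `lastCodePoint` and `repeatLast` are hypothetical helpers operating on a plain `std::u16string`, not on the actual state machine's buffers.

```cpp
#include <cstddef>
#include <string>

constexpr bool isHighSurrogate(char16_t ch) { return ch >= 0xD800 && ch <= 0xDBFF; }
constexpr bool isLowSurrogate(char16_t ch) { return ch >= 0xDC00 && ch <= 0xDFFF; }

// Return the last complete code point in `printed` as one or two code units.
// Returns an empty string if there is none or the text ends in a lone surrogate.
std::u16string lastCodePoint(const std::u16string& printed)
{
    if (printed.empty())
    {
        return {};
    }
    const std::size_t n = printed.size();
    const char16_t last = printed[n - 1];
    if (isLowSurrogate(last))
    {
        // A low surrogate is only meaningful together with the high surrogate before it.
        if (n >= 2 && isHighSurrogate(printed[n - 2]))
        {
            return printed.substr(n - 2, 2);
        }
        return {};
    }
    if (isHighSurrogate(last))
    {
        return {}; // dangling first half of a pair
    }
    return std::u16string(1, last);
}

// Repeat the last complete code point `count` times.
std::u16string repeatLast(const std::u16string& printed, std::size_t count)
{
    const std::u16string unit = lastCodePoint(printed);
    std::u16string out;
    out.reserve(unit.size() * count);
    for (std::size_t i = 0; i < count; ++i)
    {
        out += unit;
    }
    return out;
}
```

Called as `repeatLast(u"x\U0001FB81", 3)`, this returns six code units: the pair 0xD83E 0xDF81 three times over. Note that this still treats each code point in isolation and ignores combining sequences, which a real fix might also need to consider.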

But this is just one symptom of the problem. For another example, compare the difference between these two statements, which use an emoji as the final character of a CSI sequence:

printf '[\e[✅]\n'
printf '[\e[💓]\n'

The first outputs [], while the second shows an error glyph [�]. Admittedly this is a ridiculous edge case, but if you are working on a rewrite of the state machine code, it would be nice if we could fix the underlying problem, and not just hack in a workaround for REP. Not essential though.

Reference: starred/terminal#19910