[PR #11690] Disable the acceptance of C1 control codes by default #28710

Open
opened 2026-01-31 09:30:17 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/microsoft/terminal/pull/11690

State: closed
Merged: Yes


There are some code pages with "unmapped" code points in the C1 range,
which results in them being translated into Unicode C1 control codes,
even though that is not their intended use. To avoid having these
characters triggering unintentional escape sequences, this PR now
disables C1 controls by default.

Switching to ISO-2022 encoding will re-enable them, though, since that
is the most likely scenario in which they would be required. They can
also be explicitly enabled, even in UTF-8 mode, with the DECAC1 escape
sequence.

What I've done is add a new mode to the StateMachine class that
controls whether C1 code points are interpreted as control characters or
not. When disabled, these code points are simply dropped from the
output, similar to the way a NUL is interpreted.

This isn't exactly the way they were handled in the v1 console (which I
think replaces them with the font notdef glyph), but it matches the
XTerm behavior, which seems more appropriate considering this is in VT
mode. And it's worth noting that Windows Explorer seems to work the same
way.

As mentioned above, the mode can be enabled by designating the ISO-2022
coding system with a DOCS sequence, and it will be disabled again when
UTF-8 is designated. You can also enable it explicitly with a DECAC1
sequence (originally this was actually a DEC printer sequence, but it
doesn't seem unreasonable to use it in a terminal).

I've also extended the operations that save and restore "cursor state"
(e.g. DECSC and DECRC) to include the state of the C1 parser mode,
since it's closely tied to the code page and character sets which are
also saved there. Similarly, when a DECSTR sequence resets the code
page and character sets, I've now made it reset the C1 mode as well.

I should note that the new StateMachine mode is controlled via a
generic SetParserMode method (with a matching API in the ConGetSet
interface) to allow for easier addition of other modes in the future.
And I've reimplemented the existing ANSI/VT52 mode in terms of these
generic methods instead of it having to have its own separate APIs.

Validation Steps Performed

Some of the unit tests for OSC sequences were using a C1 0x9C for the
string terminator, which doesn't work by default anymore. Since that's
not a good practice anyway, I thought it best to change those to a
standard 7-bit terminator. However, in tests that were explicitly
validating the C1 controls, I've just enabled the C1 parser mode at the
start of the tests in order to get them working again.

There were also some ANSI mode adapter tests that had to be updated to
account for the fact that it has now been reimplemented in terms of the
SetParserMode API.

I've added a new state machine test to validate the changes in behavior
when the C1 parser mode is enabled or disabled. And I've added an
adapter test to verify that the DesignateCodingSystems and
AcceptC1Controls methods toggle the C1 parser mode as expected.

I've manually verified the test cases in #10069 and #10310 to confirm
that they're no longer triggering control sequences by default.
Although, as I explained above, the C1 code points are completely
dropped from the output rather than displayed as notdef glyphs. I
think this is a reasonable compromise though.

Closes #10069
Closes #10310

**Original Pull Request:** https://github.com/microsoft/terminal/pull/11690 **State:** closed **Merged:** Yes --- There are some code pages with "unmapped" code points in the C1 range, which results in them being translated into Unicode C1 control codes, even though that is not their intended use. To avoid having these characters triggering unintentional escape sequences, this PR now disables C1 controls by default. Switching to ISO-2022 encoding will re-enable them, though, since that is the most likely scenario in which they would be required. They can also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape sequence. What I've done is add a new mode to the `StateMachine` class that controls whether C1 code points are interpreted as control characters or not. When disabled, these code points are simply dropped from the output, similar to the way a `NUL` is interpreted. This isn't exactly the way they were handled in the v1 console (which I think replaces them with the font _notdef_ glyph), but it matches the XTerm behavior, which seems more appropriate considering this is in VT mode. And it's worth noting that Windows Explorer seems to work the same way. As mentioned above, the mode can be enabled by designating the ISO-2022 coding system with a `DOCS` sequence, and it will be disabled again when UTF-8 is designated. You can also enable it explicitly with a `DECAC1` sequence (originally this was actually a DEC printer sequence, but it doesn't seem unreasonable to use it in a terminal). I've also extended the operations that save and restore "cursor state" (e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode, since it's closely tied to the code page and character sets which are also saved there. Similarly, when a `DECSTR` sequence resets the code page and character sets, I've now made it reset the C1 mode as well. I should note that the new `StateMachine` mode is controlled via a generic `SetParserMode` method (with a matching API in the `ConGetSet` interface) to allow for easier addition of other modes in the future. And I've reimplemented the existing ANSI/VT52 mode in terms of these generic methods instead of it having to have its own separate APIs. ## Validation Steps Performed Some of the unit tests for OSC sequences were using a C1 `0x9C` for the string terminator, which doesn't work by default anymore. Since that's not a good practice anyway, I thought it best to change those to a standard 7-bit terminator. However, in tests that were explicitly validating the C1 controls, I've just enabled the C1 parser mode at the start of the tests in order to get them working again. There were also some ANSI mode adapter tests that had to be updated to account for the fact that it has now been reimplemented in terms of the `SetParserMode` API. I've added a new state machine test to validate the changes in behavior when the C1 parser mode is enabled or disabled. And I've added an adapter test to verify that the `DesignateCodingSystems` and `AcceptC1Controls` methods toggle the C1 parser mode as expected. I've manually verified the test cases in #10069 and #10310 to confirm that they're no longer triggering control sequences by default. Although, as I explained above, the C1 code points are completely dropped from the output rather than displayed as _notdef_ glyphs. I think this is a reasonable compromise though. Closes #10069 Closes #10310
claunia added the pull-request label 2026-01-31 09:30:17 +00:00
Sign in to join this conversation.
No Label pull-request
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#28710