Improve support for VT character sets #4723

Closed
opened 2026-01-30 23:55:00 +00:00 by claunia · 1 comment

Originally created by @j4james on GitHub (Oct 30, 2019).

Description of the new feature/enhancement

The current implementation only supports selecting into the G0 set. And even if other g-sets were supported, there's no mechanism for switching between them - none of the shift controls are supported.

For minimal VT100 support we need the following:

  • the ability to select into both the G0 and G1 sets
  • the ability to invoke these sets using the SI and SO control characters (AKA locking shifts)
  • support for 5 character sets: ASCII, Special Graphics, UK, and the two alternate ROM aliases (we currently only support ASCII and Special Graphics)
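The VT100-level bookkeeping is small. Here is a rough C++ sketch. All names (Charset, CharsetFromFinal, Vt100CharsetState) are hypothetical and not existing Terminal code. It maps the five SCS final characters (B = ASCII, A = UK, 0 = Special Graphics, 1 and 2 = the alternate ROM sets) to set identifiers, and it tracks which of G0/G1 is locked into GL by SI and SO. It assumes both G-sets start out as ASCII.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical identifiers for the five VT100 character sets, keyed by the
// final character of the SCS escape sequence (e.g. ESC ( 0).
enum class Charset : std::uint8_t
{
    Ascii,           // final 'B'
    Uk,              // final 'A'
    SpecialGraphics, // final '0'
    AltRomStandard,  // final '1'
    AltRomSpecial    // final '2'
};

// Map an SCS final character to a charset; returns false if unrecognised.
constexpr bool CharsetFromFinal(const char finalChar, Charset& out) noexcept
{
    switch (finalChar)
    {
    case 'B': out = Charset::Ascii; return true;
    case 'A': out = Charset::Uk; return true;
    case '0': out = Charset::SpecialGraphics; return true;
    case '1': out = Charset::AltRomStandard; return true;
    case '2': out = Charset::AltRomSpecial; return true;
    default: return false;
    }
}

// Minimal VT100-level state: G0 and G1, plus which of them is invoked into GL.
struct Vt100CharsetState
{
    std::array<Charset, 2> gsets{ Charset::Ascii, Charset::Ascii }; // assumed defaults
    std::size_t glSet = 0; // 0 = G0, 1 = G1

    // ESC ( F designates into G0; ESC ) F designates into G1.
    bool Designate(const std::size_t gset, const char finalChar) noexcept
    {
        Charset cs{};
        if (gset > 1 || !CharsetFromFinal(finalChar, cs))
        {
            return false; // leave the current designation untouched
        }
        gsets[gset] = cs;
        return true;
    }

    // Locking shifts: SO (0x0E) invokes G1 into GL, SI (0x0F) invokes G0.
    void ExecuteControl(const char c) noexcept
    {
        if (c == '\x0E') glSet = 1;
        else if (c == '\x0F') glSet = 0;
    }

    Charset ActiveGL() const noexcept { return gsets[glSet]; }
};
```

This only covers the state tracking. The translation tables for each set would still be needed on top of it.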

For VT102 compatibility we'd also need to support the single shift sequences, SS2 and SS3, although they aren't of much practical value until VT200.
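A single shift differs from a locking shift in one way: it applies to the next printable character only. Below is a minimal sketch of that behaviour. The ShiftState name and its members are hypothetical. SS2 (ESC N) and SS3 (ESC O) record a pending G2/G3, which is consumed by the next character. After that, GL reverts to whatever set is locked in.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical single-shift state: SS2/SS3 invoke G2/G3 into GL for the
// next printable character only.
struct ShiftState
{
    std::size_t gl = 0;                     // G-set locked into GL (0..3)
    std::optional<std::size_t> singleShift; // pending SS2 (2) or SS3 (3)

    void SingleShift(const std::size_t gset) noexcept { singleShift = gset; }

    // Returns the G-set to use for one printable character, then clears any
    // pending single shift so following characters use the locked GL set.
    std::size_t ConsumeSetForPrint() noexcept
    {
        const std::size_t result = singleShift.value_or(gl);
        singleShift.reset();
        return result;
    }
};
```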

For VT200+ we'd need:

  • the ability to select into the G2 and G3 sets
  • a way to invoke these new sets with the LS2 and LS3 locking shifts
  • also support for invoking into the GR block with the LSxR locking shifts
  • an additional 12 character sets

Later levels mostly just add more character sets, but they also include three new select sequences for use with 96-character sets (most sets have only 94 characters).
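The SCS intermediate determines both the target G-set and the set size. The 94-character designators are ( ) * + for G0 to G3. The three 96-character designators are - . / for G1 to G3, and there is no 96-character designation for G0. A 96-character set also defines the two positions (0x20 and 0x7F, or 0xA0 and 0xFF in GR) that a 94-character set leaves as space and DEL. The sketch below shows this lookup. DesignateTarget and TargetFromIntermediate are hypothetical names.

```cpp
#include <optional>

// Hypothetical description of where an SCS sequence designates to.
struct DesignateTarget
{
    int gset;    // 0..3 for G0..G3
    int setSize; // 94 or 96
};

// Map the first intermediate of an SCS escape sequence to its target.
constexpr std::optional<DesignateTarget> TargetFromIntermediate(const char intermediate) noexcept
{
    switch (intermediate)
    {
    case '(': return DesignateTarget{ 0, 94 };
    case ')': return DesignateTarget{ 1, 94 };
    case '*': return DesignateTarget{ 2, 94 };
    case '+': return DesignateTarget{ 3, 94 };
    case '-': return DesignateTarget{ 1, 96 }; // no 96-character G0 designator
    case '.': return DesignateTarget{ 2, 96 };
    case '/': return DesignateTarget{ 3, 96 };
    default: return std::nullopt;
    }
}
```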

I realise there is probably not a lot of demand for VT character sets, considering everyone just uses Unicode nowadays. But if our goal is to provide the same level of functionality as XTerm, and to be able to serve as an actual VT terminal emulator, then this is an area I think could be worth improving.

Proposed technical implementation details (optional)

I've been doing quite a lot of research on this subject, and I think I have a reasonable idea now of what needs to be done.

  • The TerminalOutput class needs additional fields to track the four different g-sets, the active sets in the GL and GR blocks, and potentially a temporary single shift for the GL block.
  • The DesignateCharset method needs additional parameters to specify the target g-set, and potentially more than just a wchar_t to identify the character sets for the higher VT levels.
  • The TerminalOutput class will also need one or more additional methods for invoking a particular set into the GL and GR blocks, i.e. implementing the locking shifts and single shifts.
  • I also think it'd be a good idea to create a constexpr helper class for building the character set mapping tables, instead of having to specify a full 96-character array every time. The majority of these sets are based on ASCII or Latin-1 with just a few replacements.
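Here is one possible shape for that constexpr helper, using free functions instead of a class for brevity. AsciiBase and WithReplacements are hypothetical names. It builds a 96-entry table for positions 0x20 to 0x7F from a base set and overrides only the listed code points. The UK set is used as the example, since it differs from ASCII only at 0x23, where '#' becomes the pound sign.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical 96-entry table covering positions 0x20..0x7F.
using CharsetTable = std::array<wchar_t, 96>;

// Identity mapping: every position maps to the same ASCII code point.
constexpr CharsetTable AsciiBase() noexcept
{
    CharsetTable t{};
    for (std::size_t i = 0; i < t.size(); ++i)
    {
        t[i] = static_cast<wchar_t>(0x20 + i);
    }
    return t;
}

// Start from a base table and override only the listed positions.
// Each "from" value must lie in 0x20..0x7F.
template <std::size_t N>
constexpr CharsetTable WithReplacements(CharsetTable base,
                                        const std::array<std::pair<wchar_t, wchar_t>, N>& replacements) noexcept
{
    for (const auto& r : replacements)
    {
        base[static_cast<std::size_t>(r.first) - 0x20] = r.second;
    }
    return base;
}

// UK set: identical to ASCII except '#' is shown as U+00A3 POUND SIGN.
constexpr auto UkTable = WithReplacements(
    AsciiBase(), std::array<std::pair<wchar_t, wchar_t>, 1>{ { { L'#', L'\u00A3' } } });

static_assert(UkTable[L'#' - 0x20] == L'\u00A3', "UK set replaces '#'");
```

Because the tables are constexpr, they can be evaluated at compile time and placed in read-only data, much like the hand-written arrays they would replace.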

Then at the dispatch level we need the following:

  • The ITermDispatch interface needs to be extended with methods to link the OutputStateMachineEngine through to the new methods in the TerminalOutput class.
  • In the OutputStateMachineEngine I'd recommend dropping the _GetDesignateType method, and just processing the various cases directly in the ActionEscDispatch switch statement. As it's currently implemented, it can't correctly handle the 96-character sets and is unnecessarily complicated.
  • If we wanted to support all of the higher level character sets, the ActionEscDispatch method would also need to support multiple intermediates, but that can perhaps be left for a later update, as part of a wider refactoring of the dispatch code.
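For the higher levels, some sets are identified by an intermediate plus a final rather than a single final. For example, ESC ( % 5 designates DEC Supplemental Graphic and ESC ( % 6 designates Portuguese, so a single wchar_t is no longer enough. A minimal sketch of one option is to pack up to four characters into an integer that can be used directly as a case label in a switch like ActionEscDispatch. CharsetId and Lookup are hypothetical names.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical packed identifier: intermediates (excluding the G-set
// selector) followed by the final, up to four characters.
class CharsetId
{
public:
    constexpr explicit CharsetId(const std::string_view chars) noexcept
    {
        for (const char c : chars)
        {
            _value = (_value << 8) | static_cast<unsigned char>(c);
        }
    }
    constexpr bool operator==(const CharsetId other) const noexcept { return _value == other._value; }
    constexpr std::uint32_t Value() const noexcept { return _value; }

private:
    std::uint32_t _value = 0;
};

enum class Charset { Ascii, DecSupplemental, Portuguese, Unknown };

// Because Value() is constexpr, packed ids work as switch case labels.
constexpr Charset Lookup(const CharsetId id) noexcept
{
    switch (id.Value())
    {
    case CharsetId("B").Value(): return Charset::Ascii;
    case CharsetId("%5").Value(): return Charset::DecSupplemental;
    case CharsetId("%6").Value(): return Charset::Portuguese;
    default: return Charset::Unknown;
    }
}
```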

When it comes to supporting the single shift escapes, there's an additional complication at the state machine level. In order to handle certain key sequences, the InputStateMachine requires that the SS3 escape be processed as a control sequence introducer. However, the OutputStateMachine needs the SS3 to be dispatched immediately, like most other C1 escapes.

I think the way this sort of thing is usually handled is with a method in the IStateMachineEngine interface, e.g. something like ParseControlSequenceAfterSs3, where the input engine returns true, and the output engine returns false. When the StateMachine::_EventEscape implementation encounters an SS3 indicator, it can then use that method to decide whether to enter the Ss3Entry state or to dispatch immediately.
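Here is a heavily simplified sketch of that hook. Only the ParseControlSequenceAfterSs3 name comes from the proposal. InputEngine, OutputEngine, State, Action and OnEscapeChar are hypothetical stand-ins for the real engines and for the 'O' branch of _EventEscape, and all other escapes are ignored.

```cpp
// Hypothetical engine interface carrying the proposed hook.
struct IStateMachineEngine
{
    virtual ~IStateMachineEngine() = default;
    // true: SS3 (ESC O) starts a sequence, as in input keys like ESC O A.
    // false: SS3 is dispatched immediately as a single shift.
    virtual bool ParseControlSequenceAfterSs3() const noexcept = 0;
};

struct InputEngine final : IStateMachineEngine
{
    bool ParseControlSequenceAfterSs3() const noexcept override { return true; }
};

struct OutputEngine final : IStateMachineEngine
{
    bool ParseControlSequenceAfterSs3() const noexcept override { return false; }
};

enum class State { Ground, Escape, Ss3Entry };
enum class Action { None, EnterSs3Entry, DispatchSs3 };

// Simplified stand-in for the SS3 branch of the escape-state handler.
Action OnEscapeChar(const IStateMachineEngine& engine, const wchar_t ch, State& state) noexcept
{
    if (ch != L'O')
    {
        return Action::None; // other escapes are not modelled here
    }
    if (engine.ParseControlSequenceAfterSs3())
    {
        state = State::Ss3Entry;
        return Action::EnterSs3Entry;
    }
    state = State::Ground;
    return Action::DispatchSs3;
}
```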

One other thing I've glossed over so far is how to handle the GR block. When the codepage is UTF-8 (which is the default in WSL), we don't get the raw input bytes in the 0x80+ range, so there's no way for us to map those values to the active GR character set. I think the simplest way to deal with this would be to automatically switch the codepage to 1252 (which passes bytes in the 0xA0-0xFF range through cleanly) whenever an LSxR sequence is executed.
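Once the codepage is 1252, the GR bytes 0xA0 to 0xFF arrive as U+00A0 to U+00FF. GR translation then uses the same 96-position table layout as GL, shifted up by 0x80. The Translate function below is a minimal, hypothetical sketch of that final mapping step. It assumes the tables for the sets currently invoked into GL and GR are available.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 96-entry table for positions 0x20..0x7F.
using CharsetTable = std::array<wchar_t, 96>;

// GL covers 0x20..0x7F; GR covers the same positions plus 0x80, i.e.
// 0xA0..0xFF, assuming codepage 1252 delivers those bytes unchanged.
constexpr wchar_t Translate(const wchar_t ch, const CharsetTable& gl, const CharsetTable& gr) noexcept
{
    if (ch >= 0x20 && ch <= 0x7F)
    {
        return gl[static_cast<std::size_t>(ch - 0x20)];
    }
    if (ch >= 0xA0 && ch <= 0xFF)
    {
        return gr[static_cast<std::size_t>(ch - 0xA0)];
    }
    return ch; // C0/C1 controls and anything else pass through untouched
}
```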

If we also want to provide a way for the user to switch back to UTF-8 mode, we may want to consider supporting the ISO 2022 DOCS sequences (https://en.wikipedia.org/wiki/ISO/IEC_2022#Interaction_with_other_coding_systems). This would also allow the user to explicitly switch the codepage to 1252, making the default GR block Latin-1 rather than UTF-8, which would be more compatible with a standard VT terminal.
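Under ISO 2022, ESC % G selects UTF-8 and ESC % @ returns to the standard coding system. A minimal sketch of how those two sequences might map onto console codepages under this proposal is shown below. Treating ESC % @ as "codepage 1252" is an assumption taken from the paragraph above, and CodepageForDocs is a hypothetical name. The result would then be applied through whatever codepage-switching path the console already uses.

```cpp
#include <optional>

constexpr unsigned int CodepageUtf8 = 65001;
constexpr unsigned int CodepageWindows1252 = 1252;

// Hypothetical mapping of DOCS sequences (ESC % F) to codepages.
constexpr std::optional<unsigned int> CodepageForDocs(const char intermediate, const char finalChar) noexcept
{
    if (intermediate != '%')
    {
        return std::nullopt; // not a DOCS sequence
    }
    switch (finalChar)
    {
    case 'G': return CodepageUtf8;        // ESC % G: switch to UTF-8
    case '@': return CodepageWindows1252; // ESC % @: assumed to mean 1252 here
    default: return std::nullopt;         // other DOCS finals not handled
    }
}
```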

We don't need to do all of these things at once, but this is the long term plan I had in mind.

claunia added the Product-Conhost, Resolution-Fix-Committed, Issue-Task, and Area-VT labels 2026-01-30 23:55:00 +00:00

@ghost commented on GitHub (Jun 18, 2020):

:tada: This issue was addressed in #4496, which has now been successfully released as Windows Terminal Preview v1.1.1671.0. :tada:

Handy links:

  • Release Notes (https://github.com/microsoft/terminal/releases/tag/v1.1.1671.0)
  • Store Download (https://www.microsoft.com/store/apps/9n0dx20hk701?cid=storebadge&ocid=badge)
Reference: starred/terminal#4723