Complete the 8-bit interface architecture #21081

Closed
opened 2026-01-31 07:32:35 +00:00 by claunia · 2 comments

Originally created by @j4james on GitHub (Jan 9, 2024).

Description of the new feature/enhancement

The 8-bit interface architecture is a DEC extension that introduced the concept of 96-character graphic sets (ISO Latin-1 being the first of these), and allowed the DEC Supplemental set to be replaced with Latin-1. The replaceable set became known as the user-preferred supplemental set (UPSS), and later models allowed it to be replaced with other character sets as well.
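The practical difference between 94- and 96-character sets shows up when a set is invoked into GR. A 94-character set only covers bytes 0xA1 to 0xFE, while a 96-character set such as ISO Latin-1 also covers 0xA0 and 0xFF. The minimal sketch below illustrates this. It assumes a translation table is a plain `std::wstring_view` indexed from the first byte of its range. The function names are hypothetical and not taken from the Windows Terminal code base. For a 94-character set, the sketch simply reports 0xA0 and 0xFF as unmapped rather than reproducing any particular terminal's behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: translates a GR byte through a 94- or 96-entry table.
// A 94-character set maps 0xA1-0xFE; a 96-character set maps 0xA0-0xFF.
std::optional<wchar_t> TranslateGR(std::uint8_t byte, std::wstring_view table)
{
    assert(table.size() == 94 || table.size() == 96);
    if (byte < 0xA0)
    {
        return std::nullopt; // not a GR byte
    }
    if (table.size() == 96)
    {
        return table[byte - 0xA0];
    }
    if (byte == 0xA0 || byte == 0xFF)
    {
        return std::nullopt; // outside the range of a 94-character set
    }
    return table[byte - 0xA1];
}

// Builds a 96-entry ISO Latin-1 table: GR byte 0xA0 + n maps to U+00A0 + n.
std::wstring MakeLatin1Table()
{
    std::wstring table;
    for (unsigned cp = 0xA0; cp <= 0xFF; ++cp)
    {
        table.push_back(static_cast<wchar_t>(cp));
    }
    return table;
}
```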

We already support much of this functionality. What we're missing is the `DECAUPSS` sequence, which assigns the UPSS character set, and the `DECRQUPSS` sequence, which queries the active UPSS assignment. We also lack a couple of sequences that are used to initialize the G-set and GL/GR mappings for ANSI conformance.
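The following is a sketch of how the two missing sequences might be handled. It is based on the xterm control-sequence documentation as I understand it, so verify it against that reference before relying on it. In that description, `DECAUPSS` is `DCS Ps ! u D...D ST`, where `Ps = 0` selects a 94-character set, `Ps = 1` a 96-character set, and `D...D` is the designator. `DECRQUPSS` (`CSI & u`) is answered with the current assignment in `DECAUPSS` form. `UpssAssignment`, `IsValidDesignator`, `ParseDecaupss`, and `MakeDecrqupssReport` are hypothetical names, not existing Windows Terminal functions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of the current UPSS assignment (DCS Ps ! u D...D ST):
// Ps = 0 means a 94-character set, Ps = 1 a 96-character set.
struct UpssAssignment
{
    bool is96 = false;
    std::string designator; // e.g. "%5" (DEC Supplemental) or "A" (ISO Latin-1)
};

// A designator is zero or more intermediates (0x20-0x2F) plus a final (0x30-0x7E).
bool IsValidDesignator(std::string_view d)
{
    if (d.empty() || d.back() < 0x30 || d.back() > 0x7E)
    {
        return false;
    }
    for (std::size_t i = 0; i + 1 < d.size(); ++i)
    {
        if (d[i] < 0x20 || d[i] > 0x2F)
        {
            return false;
        }
    }
    return true;
}

// Parses the Ps parameter and data string of DECAUPSS; rejects anything else.
std::optional<UpssAssignment> ParseDecaupss(int ps, std::string_view data)
{
    if ((ps != 0 && ps != 1) || !IsValidDesignator(data))
    {
        return std::nullopt;
    }
    return UpssAssignment{ ps == 1, std::string{ data } };
}

// Builds the DECRQUPSS response in DECAUPSS form, using the 7-bit
// DCS introducer (ESC P) and the 7-bit ST terminator (ESC \).
std::string MakeDecrqupssReport(const UpssAssignment& upss)
{
    assert(IsValidDesignator(upss.designator));
    std::string report = "\x1bP";
    report += upss.is96 ? '1' : '0';
    report += "!u";
    report += upss.designator;
    report += "\x1b\\";
    return report;
}
```

Keeping the designator as a string rather than a single character allows for multi-byte designators such as `%5`.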

I suspect this functionality isn't widely used, so I'd understand if you don't want to include it, but it is a level 3 conformance requirement. XTerm also implements it, so adding it would improve our XTerm compatibility as well.

Proposed technical implementation details (optional)

In the `TerminalOutput` class, we'll need a new `wstring_view` field to track the active UPSS translation table. Then, whenever someone designates character set ID `<`, we'll use that table instead of `Latin1`.
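The proposal can be sketched as a trimmed-down class that stores the UPSS table as a `std::wstring_view` and consults it when the designator `<` is used. This is a hypothetical stand-in, not the real `TerminalOutput`. `TerminalOutputSketch`, `SetUpssTable`, `Designate`, and the placeholder tables are invented for illustration. The tables are truncated to a few entries, and the sketch ignores the 94/96-character distinction, even though `A` only means Latin-1 as a 96-character designator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical placeholder tables, truncated for brevity. Real tables would
// hold one code point for each position in a 94- or 96-character set.
inline constexpr std::wstring_view Latin1 = L"\u00A0\u00A1\u00A2";
inline constexpr std::wstring_view DecSupplemental = L"\u00A1\u00A2\u00A3";

// Hypothetical, trimmed-down stand-in for TerminalOutput: only the G-set
// table lookup and the new UPSS field are modelled.
class TerminalOutputSketch
{
public:
    // Would be called once DECAUPSS has resolved its designator to a table.
    void SetUpssTable(std::wstring_view table) noexcept
    {
        assert(!table.empty());
        _upssTable = table;
    }

    // Designates a set into G0-G3 (94/96 distinction ignored in this sketch).
    bool Designate(std::size_t gset, wchar_t designator) noexcept
    {
        const auto table = _Lookup(designator);
        if (gset >= _gsets.size() || table.empty())
        {
            return false;
        }
        _gsets[gset] = table;
        return true;
    }

    std::wstring_view GetGSet(std::size_t gset) const noexcept
    {
        return gset < _gsets.size() ? _gsets[gset] : std::wstring_view{};
    }

private:
    std::wstring_view _Lookup(wchar_t designator) const noexcept
    {
        switch (designator)
        {
        case L'A':
            return Latin1;
        case L'<':
            return _upssTable; // the UPSS field replaces a fixed Latin1 mapping
        default:
            return {};
        }
    }

    std::wstring_view _upssTable = Latin1; // assumed default: '<' maps to Latin1 as it does today
    std::array<std::wstring_view, 4> _gsets{};
};
```

Storing a `wstring_view` keeps the assignment cheap, but it assumes every table is static data that outlives the terminal instance.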

claunia added the Issue-Feature, Help Wanted, Product-Conhost, In-PR, Area-VT labels 2026-01-31 07:32:36 +00:00

@j4james commented on GitHub (Jan 13, 2024):

There's something else I wanted to address at the same time as this. It's related to the way we switch between UTF-8 and 8-bit mode with a `DOCS` sequence.

When I implemented `DOCS` support, I thought it would be a good idea to make the soft reset (`DECSTR`) switch the code page back to its initial state, because the character sets are somewhat dependent on it, and `DECSTR` is expected to reset the character sets. In retrospect, though, I think that was a mistake.

If you've used a `DOCS` sequence to switch to 8-bit mode, it's likely because you're trying to run a legacy app that produces 8-bit output. But if that app happens to emit a `DECSTR` sequence at some point (which is quite plausible), the terminal will unexpectedly switch back to UTF-8 (or some DOS code page), and the rest of the output will be corrupted.

There's a similar problem with the `DECSC`/`DECRC` sequences. They save and restore the character set designations, so I made them save and restore the code page as well. However, given the way things are implemented, this can leave the system in an inconsistent state, where it thinks 8-bit mode is enabled but the code page is actually UTF-8 (or vice versa).

So for both `DECSTR` and `DECRC`, I think it would be best if they had no effect on the code page at all, leaving a hard reset (`RIS`) as the only way to reset things after using a `DOCS` sequence.
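The proposed behavior can be illustrated with a heavily simplified state model. In this model, the code page is a single enum and the G-sets are stored as designator characters. `TerminalSketch`, `OutputState`, and `CodePage` are hypothetical names, not part of the Windows Terminal sources. The initial code page and the all-ASCII default designations are assumptions. The sketch only demonstrates the idea that `DECSTR` and `DECRC` touch the character-set designations alone, while `RIS` is the only reset that also restores the code page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the code page selected by DOCS.
enum class CodePage
{
    Utf8,
    EightBit,
};

// Hypothetical, heavily simplified state: G0-G3 designations plus the code
// page. The cursor position and renditions that DECSC also saves are omitted.
struct OutputState
{
    std::array<char, 4> gsets{ 'B', 'B', 'B', 'B' }; // ASCII everywhere (assumed default)
    CodePage codePage = CodePage::Utf8; // assumed initial state
};

class TerminalSketch
{
public:
    // DOCS: ESC % G selects UTF-8, ESC % @ selects the 8-bit ISO 2022 default.
    void Docs(char finalChar)
    {
        if (finalChar == 'G')
        {
            _state.codePage = CodePage::Utf8;
        }
        else if (finalChar == '@')
        {
            _state.codePage = CodePage::EightBit;
        }
    }

    void Designate(std::size_t gset, char designator)
    {
        assert(gset < _state.gsets.size());
        _state.gsets[gset] = designator;
    }

    // DECSC (ESC 7): saves the designations but not the code page.
    void SaveCursor() { _saved = _state.gsets; }

    // DECRC (ESC 8): restores the designations and leaves the code page alone,
    // so there is no saved copy of it to fall out of sync.
    void RestoreCursor() { _state.gsets = _saved; }

    // DECSTR (CSI ! p): resets the designations but keeps the code page.
    void SoftReset() { _state.gsets = OutputState{}.gsets; }

    // RIS (ESC c): the only reset that also restores the initial code page.
    void HardReset()
    {
        _state = OutputState{};
        _saved = _state.gsets;
    }

    const OutputState& State() const { return _state; }

private:
    OutputState _state;
    std::array<char, 4> _saved = OutputState{}.gsets;
};
```

For example, after `ESC % @`, a legacy app that emits `DECSTR` would get its character sets reset while staying in 8-bit mode. Only `RIS` would return the sketch to its initial code page.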

It's also worth mentioning that XTerm doesn't restore UTF-8 on `DECSTR` or `DECRC`, so this change would improve our XTerm compatibility.


@lhecker commented on GitHub (Jan 15, 2024):

Yeah, I think this is a good idea, particularly if it gets us closer to something as established as xterm.

Reference: starred/terminal#21081