Implement UTF-8 <--> UTF-16 conversion in user mode. #5742

Closed
opened 2026-01-31 00:20:26 +00:00 by claunia · 0 comments

Originally created by @german-one on GitHub (Dec 31, 2019).

Description of the new feature/enhancement

There are two separate UTF-8 parsers, one implemented in Utf8ToWideCharParser and one in UTF8OutputPipeReader. The latter makes it rather difficult to unify UTF-8 parsing because it combines reading from a pipe with the handling of partial code unit sequences.
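
To illustrate what a partial is: a read can end in the middle of a multi-byte sequence, for example after the first two bytes of € (E2 82 AC). The minimal sketch that follows uses a hypothetical IncompleteTailLength helper, not taken from the Windows Terminal sources, to report how many trailing bytes would have to be held back until the next read completes them. Nothing in it depends on where the bytes come from, which is the kind of separation from pipe reading that this issue asks for.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the Windows Terminal sources).
// Returns how many bytes at the end of `chunk` start a UTF-8 sequence whose
// continuation bytes have not arrived yet, i.e. the bytes to hold back.
// Returns 0 if the chunk ends on a sequence boundary; malformed data is
// left for the decoder to reject.
std::size_t IncompleteTailLength(std::string_view chunk)
{
    // A sequence has at most 4 bytes, so at most 3 trailing bytes can
    // belong to an unfinished one.
    const std::size_t maxLookBack = chunk.size() < 3 ? chunk.size() : 3;
    for (std::size_t back = 1; back <= maxLookBack; ++back)
    {
        const auto byte = static_cast<unsigned char>(chunk[chunk.size() - back]);
        if ((byte & 0xC0) == 0x80)
        {
            continue; // continuation byte: keep looking for the lead byte
        }
        std::size_t expected = 0; // sequence length announced by the lead byte
        if (byte >= 0xC2 && byte <= 0xDF) expected = 2;
        else if (byte >= 0xE0 && byte <= 0xEF) expected = 3;
        else if (byte >= 0xF0 && byte <= 0xF4) expected = 4;
        // Incomplete only if the lead byte announces more bytes than are present.
        return expected > back ? back : 0;
    }
    return 0; // no lead byte within reach: nothing worth holding back
}
```

A caller would forward chunk.substr(0, chunk.size() - n) to the decoder and prepend the remaining n bytes to the next chunk it receives.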

Proposed technical implementation details (optional)

Get rid of UTF8OutputPipeReader and move the pipe reading back into ConptyConnection::_OutputThread().
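
A minimal, platform-neutral sketch of the proposed split: the output thread only reads bytes and hands each chunk to a consumer, which owns all UTF-8 decoding and partials handling. ReadFn, SinkFn and OutputLoop are hypothetical stand-ins. In ConptyConnection::_OutputThread() the read would be a blocking read on the pipe handle (on Windows, typically ReadFile), which is not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical stand-in for a blocking pipe read: fills `buffer` with up to
// `capacity` bytes and returns the count, or 0 when the pipe is closed.
using ReadFn = std::function<std::size_t(char* buffer, std::size_t capacity)>;

// Hypothetical consumer: receives raw bytes and is solely responsible for
// UTF-8 decoding, including sequences split across chunk boundaries.
using SinkFn = std::function<void(std::string_view chunk)>;

// The read loop knows nothing about UTF-8; chunks are forwarded unchanged.
void OutputLoop(const ReadFn& read, const SinkFn& sink)
{
    char buffer[4096];
    for (;;)
    {
        const std::size_t count = read(buffer, sizeof(buffer));
        if (count == 0)
        {
            break; // pipe closed or read failed
        }
        sink(std::string_view{ buffer, count });
    }
}
```

Keeping the loop this thin means the same decoder can be fed from a pipe, a file, or a unit test without changes.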

Implement UTF-8 <--> UTF-16 conversion in user mode. Provide a way to toggle between ignoring invalid UTF-8 and replacing it with U+FFFD (see #3378).
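
A minimal sketch of the UTF-8 to UTF-16 direction with the requested toggle, assuming a hypothetical InvalidUtf8 enum and Utf8ToUtf16 function (neither is existing project API). It validates sequences against the well-formed byte ranges of the Unicode Standard, so overlongs, encoded surrogates and code points above U+10FFFF are rejected. In Replace mode each maximal ill-formed subsequence becomes one U+FFFD; in Discard mode it is dropped. std::u16string is used for portability; on Windows, std::wstring holds the same UTF-16 code units.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toggle for invalid input (see #3378): drop it or emit U+FFFD.
enum class InvalidUtf8 { Discard, Replace };

// Converts a complete UTF-8 buffer to UTF-16.
std::u16string Utf8ToUtf16(std::string_view in, InvalidUtf8 policy)
{
    std::u16string out;
    out.reserve(in.size()); // never more UTF-16 code units than UTF-8 bytes
    const auto invalid = [&] {
        if (policy == InvalidUtf8::Replace) out.push_back(u'\uFFFD');
    };

    std::size_t i = 0;
    while (i < in.size())
    {
        const auto lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80)
        {
            out.push_back(lead); // ASCII maps 1:1
            ++i;
            continue;
        }
        // Sequence length, payload bits of the lead byte, and the allowed
        // range of the second byte (narrowed for some lead bytes).
        std::size_t len = 0;
        char32_t cp = 0;
        unsigned char lo = 0x80, hi = 0xBF;
        if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF)
        {
            len = 3; cp = lead & 0x0F;
            if (lead == 0xE0) lo = 0xA0;      // reject overlongs
            else if (lead == 0xED) hi = 0x9F; // reject U+D800..U+DFFF
        }
        else if (lead >= 0xF0 && lead <= 0xF4)
        {
            len = 4; cp = lead & 0x07;
            if (lead == 0xF0) lo = 0x90;      // reject overlongs
            else if (lead == 0xF4) hi = 0x8F; // reject above U+10FFFF
        }
        else
        {
            invalid(); // stray continuation byte or invalid lead byte
            ++i;
            continue;
        }

        std::size_t n = 1;
        for (; n < len && i + n < in.size(); ++n)
        {
            const auto b = static_cast<unsigned char>(in[i + n]);
            if (b < lo || b > hi) break;
            cp = (cp << 6) | (b & 0x3F);
            lo = 0x80; hi = 0xBF; // only the second byte has a narrowed range
        }
        if (n < len)
        {
            invalid(); // truncated or broken: skip the bytes consumed so far
            i += n;
            continue;
        }
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000; // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Because this function treats a truncated sequence at the end of its input as invalid, a streaming caller should split off any trailing partial before calling it and prepend it to the next chunk.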

Implement reusable handling of partials, i.e. incomplete code unit sequences left at the end of a buffer.
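
A minimal sketch of partials handling that lives in the converter rather than in the pipe reader, shown here for the UTF-16 to UTF-8 direction. There the only possible partial is a high surrogate at the end of a chunk, which is cached until the next call supplies its low surrogate. The Utf16ToUtf8Encoder class and its Convert and Flush methods are hypothetical, and replacing lone surrogates with U+FFFD is an assumed policy, not something the issue specifies.

```cpp
#include <string>
#include <string_view>

// Hypothetical stateful encoder: UTF-16 chunks in, UTF-8 out. A high
// surrogate ending one chunk is kept until the next chunk arrives.
class Utf16ToUtf8Encoder
{
public:
    std::string Convert(std::u16string_view chunk)
    {
        std::string out;
        for (const char16_t unit : chunk)
        {
            if (_pendingHigh != 0)
            {
                const char16_t high = _pendingHigh;
                _pendingHigh = 0;
                if (unit >= 0xDC00 && unit <= 0xDFFF)
                {
                    // Complete the pair carried over from the previous chunk.
                    AppendUtf8(out, 0x10000 + ((char32_t(high) - 0xD800) << 10) + (char32_t(unit) - 0xDC00));
                    continue;
                }
                AppendUtf8(out, 0xFFFD); // high surrogate not followed by a low one
            }
            if (unit >= 0xD800 && unit <= 0xDBFF) _pendingHigh = unit; // partial: wait
            else if (unit >= 0xDC00 && unit <= 0xDFFF) AppendUtf8(out, 0xFFFD); // lone low
            else AppendUtf8(out, unit);
        }
        return out;
    }

    // At end of stream a cached high surrogate can no longer be completed.
    std::string Flush()
    {
        std::string out;
        if (_pendingHigh != 0)
        {
            _pendingHigh = 0;
            AppendUtf8(out, 0xFFFD);
        }
        return out;
    }

private:
    static void AppendUtf8(std::string& out, char32_t cp)
    {
        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }

    char16_t _pendingHigh = 0; // cached partial, 0 if none
};
```

Keeping the cached state inside the converter object means any reader, whether a pipe loop or a test harness, gets correct results at chunk boundaries without knowing about surrogates.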

Reference: starred/terminal#5742