[PR #1850] [MERGED] Fix for UTF-8 partials in function ConhostConnection::_OutputThread. #24679

Open
opened 2026-01-31 09:04:44 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/microsoft/terminal/pull/1850
Author: @german-one
Created: 7/6/2019
Status: Merged
Merged: 7/16/2019
Merged by: @miniksa

Base: master ← Head: master


📝 Commits (10+)

  • ea46579 Cache UTF-8 partials of ConhostConnection output pipe
  • 7c83c38 Revert "Cache UTF-8 partials of ConhostConnection output pipe"
  • 7a4a814 Fix for UTF-8 partials in functions ConhostConnection::_OutputThread and ApiRoutines::WriteConsoleOutputCharacterAImpl
  • a424a16 Revert "Fix for UTF-8 partials in functions ConhostConnection::_OutputThread and ApiRoutines::WriteConsoleOutputCharacterAImpl"
  • 85b5509 Fix for UTF-8 partials in function ConhostConnection::_OutputThread
  • 5780770 Fix for UTF-8 partials in function ConhostConnection::_OutputThread
  • d0e6e82 Fix for UTF-8 partials in function ConhostConnection::_OutputThread
  • d98c589 Fix for UTF-8 partials in function ConhostConnection::_OutputThread
  • 7df6609 Fix for UTF-8 partials in function ApiRoutines::WriteConsoleOutputCharacterAImpl
  • 507f526 Fix for UTF-8 partials in function ConhostConnection::_OutputThread

📊 Changes

7 files changed (+325 additions, -18 deletions)


📝 src/cascadia/TerminalConnection/ConhostConnection.cpp (+16 -18)
➕ src/types/UTF8OutPipeReader.cpp (+74 -0)
➕ src/types/inc/UTF8OutPipeReader.hpp (+69 -0)
📝 src/types/lib/types.vcxproj (+2 -0)
📝 src/types/lib/types.vcxproj.filters (+6 -0)
📝 src/types/ut_types/Types.Unit.Tests.vcxproj (+1 -0)
➕ src/types/ut_types/UTF8OutPipeReaderTests.cpp (+157 -0)

📄 Description

Summary of the Pull Request

ConhostConnection::_OutputThread must handle partial UTF-8 characters that arise when the stream read is buffered.

References

This PR may partially fix the occurrence of � characters as seen on screenshots in the following issues:
#386 #455 #666

PR Checklist

  • [ ] Closes #xxx
  • [x] CLA signed. If not, go over here (https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
  • [x] Tests added/passed
  • [ ] Requires documentation to be updated
  • [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #386

Detailed Description of the Pull Request / Additional comments

In UTF-8, each code point is represented by a sequence of 1 to 4 bytes. Whenever UTF-8 text is buffered, code points that consist of multiple bytes may get split at the buffer boundaries. The buffer is then converted into a string of wchar_t; the split sequences are invalid on their own, so MultiByteToWideChar replaces them with U+FFFD characters. The implementation needs to check whether the buffer ends with a partial character. If it does, it converts only the complete code points and saves the partial code units in a cache that gets prepended to the next chunk of text.
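
The caching idea described above can be sketched in portable C++ roughly as follows. This is only an illustration under stated assumptions, not the code added in UTF8OutPipeReader.cpp: the names CompleteUtf8Length and Utf8ChunkAssembler are hypothetical, the pipe read is replaced by plain strings, and malformed input is simply passed through for the converter to deal with.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: returns how many leading bytes of `chunk` end on a
// code point boundary. Anything after that is a trailing partial (1-3 bytes).
inline std::size_t CompleteUtf8Length(std::string_view chunk)
{
    const std::size_t size = chunk.size();
    std::size_t back = 0; // number of bytes inspected from the end

    // Inspect at most the last four bytes, looking for the nearest lead byte.
    while (back < size && back < 4)
    {
        const unsigned char byte = static_cast<unsigned char>(chunk[size - 1 - back]);
        ++back;

        if ((byte & 0xC0) != 0x80) // not a continuation byte (10xxxxxx)
        {
            std::size_t expected = 1; // ASCII or an invalid lead byte
            if ((byte & 0xE0) == 0xC0) expected = 2;      // 110xxxxx
            else if ((byte & 0xF0) == 0xE0) expected = 3; // 1110xxxx
            else if ((byte & 0xF8) == 0xF0) expected = 4; // 11110xxx

            // Fewer bytes present than the lead byte announces: cut before it.
            return back < expected ? size - back : size;
        }
    }

    // Empty input, or only continuation bytes were seen (malformed): pass through.
    return size;
}

// Hypothetical wrapper that keeps the trailing partial between chunks.
class Utf8ChunkAssembler
{
public:
    // Prepends the cached partial to `chunk`, returns the complete part,
    // and caches whatever partial sequence is left at the end.
    std::string Feed(std::string_view chunk)
    {
        std::string data = std::move(_partial);
        _partial.clear();
        data.append(chunk);

        const std::size_t complete = CompleteUtf8Length(data);
        assert(complete <= data.size());

        _partial.assign(data, complete, std::string::npos);
        data.resize(complete);
        return data;
    }

private:
    std::string _partial;
};
```

In a setup like the one the PR describes, each string returned by Feed would be the part handed to MultiByteToWideChar, so the conversion never sees a sequence that was cut at a buffer boundary.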

Remark (struck through in the original description):
The PR includes the removal of the UTF-8 Byte Order Mark if it is present at the beginning of the stream read.

Validation Steps Performed

Corresponding file UTF8OutPipeReaderTests.cpp added to the Types.Unit.Tests project.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-31 09:04:44 +00:00
Reference: starred/terminal#24679