[PR #10966] [MERGED] Refactor u8u16 and u16u8 conversion functions #28327

Open
opened 2026-01-31 09:27:49 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/microsoft/terminal/pull/10966
Author: @german-one
Created: 8/17/2021
Status: Merged
Merged: 8/24/2021
Merged by: @undefined

Base: main ← Head: dev/steffen/u8u16nobuffer


📝 Commits (7)

  • ce3b5e0 Get rid of buffering in the partials handling of u8u16 and u16u8
  • f1642fc draft feedback
  • 2480a36 PR feedback lhecker
  • 70c2120 pass UT, make initialization style consistent
  • 283ae94 meaning of codePointLen 0 and 1 is no lead byte
  • 8f50cb0 use CATCH_RETURN
  • 912eaa8 add notes and source of the branchless UTF-8 converter

📊 Changes

2 files changed (+174 additions, -303 deletions)

📝 .github/actions/spelling/allow/names.txt (+2 -0)
📝 src/inc/til/u8u16convert.h (+172 -303)

📄 Description

  • Perform the handling of partial code points in the u8u16 and u16u8
    conversion functions without preparation in a preliminary buffer.
  • Simplify partials handling in u8u16 (perf).
  • Declare the parameters for the incoming data as referenced
    string_views.
  • Simplify templatization.
  • Simplify exception handling.

We complete the partial code point in the 4-byte cache and convert it
separately. This frees the cache to capture the next partial before the
remaining string is converted. This way, we neither need to copy the
whole string into a buffer that contains only complete code points, nor
do we need to allocate an unnecessarily long buffer that lives as long
as the state class instance.
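
The flow described above can be sketched roughly as follows. This is not the code from u8u16convert.h: the names PartialsState, feed, leadLen and appendUtf16 are hypothetical, the decoder is a plain stand-in for the real converter, and the sketch assumes well-formed UTF-8 input.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: sequence length implied by a UTF-8 lead byte,
// 0 for a continuation byte (10xxxxxx).
inline std::size_t leadLen(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xC0) == 0x80) return 0;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

// Hypothetical stand-in for the real converter: decodes complete code points.
inline void appendUtf16(std::string_view in, std::u16string& out)
{
    for (std::size_t i = 0; i < in.size();)
    {
        const auto b = static_cast<unsigned char>(in[i]);
        const std::size_t len = leadLen(b);
        if (len == 0 || i + len > in.size())
        {
            out.push_back(u'\uFFFD'); // stray or truncated byte
            ++i;
            continue;
        }
        char32_t cp = len == 1 ? b : b & (0x7F >> len);
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        if (cp >= 0x10000)
        {
            cp -= 0x10000; // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        else
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += len;
    }
}

// Hypothetical state: at most one incomplete code point survives between calls.
struct PartialsState
{
    std::array<char, 4> cache{};
    std::size_t have = 0; // number of bytes currently cached
};

inline void feed(PartialsState& st, std::string_view in, std::u16string& out)
{
    // 1. Complete the cached partial with the first bytes of this chunk
    //    and convert it on its own.
    if (st.have != 0)
    {
        const std::size_t need = leadLen(static_cast<unsigned char>(st.cache[0]));
        while (st.have < need && !in.empty())
        {
            st.cache[st.have++] = in.front();
            in.remove_prefix(1);
        }
        if (st.have < need) return; // still incomplete, wait for more data
        appendUtf16({ st.cache.data(), st.have }, out);
        st.have = 0; // the cache is free again
    }
    // 2. Capture an incomplete code point at the end of the chunk.
    for (std::size_t back = 1; back <= in.size() && back <= 4; ++back)
    {
        const std::size_t pos = in.size() - back;
        const std::size_t len = leadLen(static_cast<unsigned char>(in[pos]));
        if (len == 0) continue; // continuation byte, keep looking for the lead
        if (len > back)
        {
            assert(back < st.cache.size());
            in.copy(st.cache.data(), back, pos);
            st.have = back;
            in.remove_suffix(back);
        }
        break;
    }
    // 3. Convert the remaining complete code points straight from the input.
    appendUtf16(in, out);
}
```

Only the bytes of a split code point are ever copied; the rest of each chunk is converted directly from the caller's string_view.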

Finding and capturing partials now follows a more linear code path that
evaluates the length of a code point.
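
For the UTF-16 direction, a length-based check might look like the sketch below. The names unitLen and trailingPartial are hypothetical and not taken from the PR. The only assumption is the standard surrogate layout: a high surrogate starts a two-unit code point, so a chunk that ends on one ends in a partial.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of UTF-16 code units of the code point that
// starts with `u`; 0 means `u` is a low surrogate and cannot start one.
constexpr std::size_t unitLen(char16_t u) noexcept
{
    if (u >= 0xD800 && u <= 0xDBFF) return 2; // high surrogate
    if (u >= 0xDC00 && u <= 0xDFFF) return 0; // low surrogate
    return 1;
}

// Number of trailing code units that would have to be cached (0 or 1).
constexpr std::size_t trailingPartial(std::u16string_view in) noexcept
{
    return (!in.empty() && unitLen(in.back()) == 2) ? 1 : 0;
}
```

A caller would trim that many units off the end of the chunk, keep them in the state, and convert the rest.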

The parameters for the incoming data are now explicitly declared to be
referenced string_views.

CATCH_RETURN is used to improve the readability of the code.
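
CATCH_RETURN comes from the Windows Implementation Library (WIL). It closes a try block and turns a caught exception into a returned HRESULT. The sketch below mimics that pattern without WIL so it stays portable. Status, SKETCH_CATCH_RETURN and appendChecked are hypothetical names, and the mapping from exception to return value is an assumption for illustration only.

```cpp
#include <new>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for HRESULT values.
enum class Status { Ok, OutOfMemory, Fail };

// Hypothetical portable analogue of a catch-and-return macro.
#define SKETCH_CATCH_RETURN()                                     \
    catch (const std::bad_alloc&) { return Status::OutOfMemory; } \
    catch (...) { return Status::Fail; }

// The happy path reads linearly; all error translation sits in one line.
inline Status appendChecked(std::string_view in, std::string& out) noexcept
{
    try
    {
        if (in.size() > 8)
        {
            throw std::length_error("demo limit exceeded");
        }
        out.append(in);
        return Status::Ok;
    }
    SKETCH_CATCH_RETURN()
}
```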

Validation Steps Performed

  • manually tested
  • unit tests passed

Closes #10946

Co-authored-by: Leonard Hecker <lhecker@microsoft.com>


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-31 09:27:49 +00:00
Reference: starred/terminal#28327