[PR #14417] [MERGED] Rewrite Utf16Parser #30088

Open
opened 2026-01-31 09:38:35 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/microsoft/terminal/pull/14417
Author: @lhecker
Created: 11/20/2022
Status: Merged
Merged: 11/23/2022
Merged by: @undefined

Base: main ← Head: dev/lhecker/utf16-parser-reparations


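📝 Commits (4)

  • 21e73e8 Rewrite Utf16Parser
  • f82f263 Address feedback
  • 2fef996 Address feedback
  • 4a0d16d Revert changes to Search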

📊 Changes

25 files changed (+275 additions, -398 deletions)


📝 src/buffer/out/OutputCellIterator.cpp (+3 -2)
📝 src/buffer/out/search.cpp (+7 -8)
📝 src/buffer/out/search.h (+2 -2)
📝 src/buffer/out/textBuffer.cpp (+3 -5)
📝 src/buffer/out/ut_textbuffer/ReflowTests.cpp (+0 -1)
📝 src/cascadia/TerminalControl/ControlCore.cpp (+0 -1)
📝 src/cascadia/TerminalControl/ControlInteractivity.cpp (+0 -1)
📝 src/cascadia/TerminalControl/TermControl.cpp (+0 -1)
📝 src/host/_output.cpp (+0 -1)
📝 src/host/conimeinfo.cpp (+4 -7)
📝 src/host/ut_host/Host.UnitTests.vcxproj (+0 -1)
📝 src/host/ut_host/Host.UnitTests.vcxproj.filters (+0 -3)
➖ src/host/ut_host/Utf16ParserTests.cpp (+0 -211)
📝 src/host/ut_host/sources (+0 -1)
➕ src/inc/til/unicode.h (+164 -0)
📝 src/terminal/input/terminalInput.cpp (+3 -4)
➕ src/til/ut_til/UnicodeTests.cpp (+82 -0)
📝 src/til/ut_til/sources (+1 -0)
📝 src/til/ut_til/til.unit.tests.vcxproj (+2 -0)
📝 src/til/ut_til/til.unit.tests.vcxproj.filters (+4 -0)

...and 5 more files

📄 Description

This commit replaces Utf16Parser with <til/unicode.h>, which includes:

  • til::utf16_iterator as a replacement for Utf16Parser::Parse
  • til::utf16_next as a replacement for Utf16Parser::ParseNext
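The PR page does not show the new API itself. As a minimal sketch of what a utf16_next-style step can do, assuming a function that takes a std::wstring_view and returns a view of its first code point, the hypothetical next_code_point below (not the real til::utf16_next) keeps valid surrogate pairs together and maps an unpaired surrogate to U+FFFD instead of dropping it.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketch of a "utf16_next"-style helper. The name and return
// convention are assumptions for illustration, not til::utf16_next.
//
// Returns a view of the first code point in `str`:
//  - a non-surrogate unit yields 1 code unit,
//  - a leading surrogate followed by a trailing surrogate yields 2 units,
//  - any other surrogate (unpaired or invalid) yields a view of U+FFFD.
// In every case result.size() equals the number of input units consumed,
// so a caller can advance with str.remove_prefix(result.size()).
inline std::wstring_view next_code_point(std::wstring_view str) noexcept
{
    static constexpr wchar_t replacement[] = { 0xFFFD, 0 };
    if (str.empty())
    {
        return {};
    }
    const wchar_t c = str[0];
    const bool is_surrogate = c >= 0xD800 && c <= 0xDFFF;
    if (!is_surrogate)
    {
        return { str.data(), 1 };
    }
    const bool is_leading = c <= 0xDBFF;
    if (is_leading && str.size() >= 2 && str[1] >= 0xDC00 && str[1] <= 0xDFFF)
    {
        return { str.data(), 2 };
    }
    // Invalid surrogate: replace it instead of silently dropping it.
    return { replacement, 1 };
}

// Usage example: walk 'a', a valid pair (U+1F600), a lone leading
// surrogate and 'b'.
inline void next_code_point_example()
{
    const wchar_t units[] = { L'a', 0xD83D, 0xDE00, 0xD800, L'b' };
    std::wstring_view rest{ units, 5 };

    auto cp = next_code_point(rest);
    assert(cp.size() == 1 && cp[0] == L'a');
    rest.remove_prefix(cp.size());

    cp = next_code_point(rest);
    assert(cp.size() == 2); // surrogate pair kept together
    rest.remove_prefix(cp.size());

    cp = next_code_point(rest);
    assert(cp.size() == 1 && cp[0] == 0xFFFD); // lone surrogate -> U+FFFD
    rest.remove_prefix(cp.size());

    cp = next_code_point(rest);
    assert(cp.size() == 1 && cp[0] == L'b');
    rest.remove_prefix(cp.size());

    assert(rest.empty() && next_code_point(rest).empty());
}
```

Because the replacement view is one unit long and stands in for exactly one input unit, the caller's loop never needs to distinguish the error case. Whether til follows this exact convention is an assumption of the sketch.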

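This fixes two bugs in Utf16Parser:

  • Invalid surrogate pairs were swallowed instead of being replaced with U+FFFD.
  • Poor performance: Utf16Parser::Parse returned a std::vector<std::vector<wchar_t>>, allocating a separate vector for every code point. The replacement is more than 12000% faster.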
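The performance bug comes from materializing every code point as its own heap-allocated std::vector<wchar_t>. A hypothetical alternative, sketched here under the assumption that callers only need read-only access, is an input iterator that yields std::wstring_view slices of the original string. code_point_range and count_code_points are illustrative names, not til::utf16_iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical allocation-free code point range; NOT til::utf16_iterator.
// Each element is a std::wstring_view into the input string, or into a
// static U+FFFD for an unpaired surrogate, so no per-code-point heap
// allocation occurs (unlike building std::vector<std::vector<wchar_t>>).
class code_point_range
{
public:
    explicit code_point_range(std::wstring_view str) noexcept : _str{ str } {}

    class iterator
    {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::wstring_view;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = std::wstring_view;

        explicit iterator(std::wstring_view rest) noexcept : _rest{ rest }, _cur{ step(rest) } {}

        std::wstring_view operator*() const noexcept { return _cur; }

        iterator& operator++() noexcept
        {
            // Each step consumes exactly _cur.size() units (1 or 2);
            // the U+FFFD case replaces a single unit.
            _rest.remove_prefix(_cur.size());
            _cur = step(_rest);
            return *this;
        }

        // Two iterators are equal when they point at the same remaining input.
        bool operator==(const iterator& other) const noexcept
        {
            return _rest.data() == other._rest.data() && _rest.size() == other._rest.size();
        }
        bool operator!=(const iterator& other) const noexcept { return !(*this == other); }

    private:
        static std::wstring_view step(std::wstring_view s) noexcept
        {
            static constexpr wchar_t replacement[] = { 0xFFFD, 0 };
            if (s.empty()) { return {}; }
            const wchar_t c = s[0];
            if (c < 0xD800 || c > 0xDFFF) { return { s.data(), 1 }; } // not a surrogate
            if (c <= 0xDBFF && s.size() >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            {
                return { s.data(), 2 }; // valid surrogate pair
            }
            return { replacement, 1 }; // unpaired surrogate
        }

        std::wstring_view _rest;
        std::wstring_view _cur;
    };

    iterator begin() const noexcept { return iterator{ _str }; }
    iterator end() const noexcept { return iterator{ _str.substr(_str.size()) }; }

private:
    std::wstring_view _str;
};

// Usage example: count code points without allocating.
inline std::size_t count_code_points(std::wstring_view str) noexcept
{
    std::size_t n = 0;
    for (const auto cp : code_point_range{ str })
    {
        assert(!cp.empty()); // every element covers 1 or 2 code units
        ++n;
    }
    return n;
}
```

Per step, this sketch only updates two string views. The >12000% figure above is the PR author's measurement of the actual change, not of this illustration.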

Validation Steps Performed

  • New unit tests pass
  • Searching for narrow/wide characters in conhost works

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-31 09:38:35 +00:00

Reference: starred/terminal#30088