[PR #13321] Introduce breaking changes to ReadConsoleOutput #29482

Open
opened 2026-01-31 09:35:10 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/microsoft/terminal/pull/13321

State: closed
Merged: Yes


#8000 will change the way we store text from a strict grid/matrix, where
one UTF-16 character or surrogate pair always equals 1 column (with the
possibility of joining exactly 2 columns into a wide character pair), to
a dynamic buffer where 1 or more characters can form 1 or more columns in
any arbitrary combination. Our long-term goal is to properly support both
complex grapheme clusters like emoji and complex ligatures that are wider
than 2 columns. This change requires us to break our API, as
ReadConsoleOutputA/W assumes the existence of exactly this grid/matrix
storage. Since in the future we will store wide characters like "い" as a
single codepoint that is simply marked as being 2 columns wide, we can no
longer reconstruct trailing DBCS characters that were written to the
buffer like we used to. On the other hand, this new behavior allows us to
implement better Unicode support and will most likely improve our
performance significantly.

Minor breaking changes

  • ReadConsoleOutputA will now always zero the high byte in
    (CHAR_INFO).Char.UnicodeChar, so only .AsciiChar carries data from
    then on. This prevents users from storing "additional" data in the
    terminal buffer.
  • ReadConsoleOutputA will now zero the .AsciiChar if it fails to
    convert the Unicode character into an appropriate DBCS.
    • Example: It's possible to write "い" into a narrow column despite
      it being a wide character. In these cases ReadConsoleOutputA will
      now return 0x00 instead of 0x44 (the low byte of い's code
      point 0x3044).
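
The two ReadConsoleOutputA changes above can be sketched as a small model. This is only an illustration under stated assumptions, not conhost's code: AnsiCell, ConvertedByte, readCellA and legacyFailedByte are hypothetical names, and the code-page conversion itself is left out and represented by an optional byte.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical view of the 16-bit Char slot of a CHAR_INFO as an ANSI
// caller sees it: asciiChar is the low byte, highByte is the other half.
struct AnsiCell {
    std::uint8_t asciiChar;
    std::uint8_t highByte;
};

// Hypothetical conversion result: the code-page byte for one cell, or
// std::nullopt when no appropriate DBCS byte exists for that cell.
using ConvertedByte = std::optional<std::uint8_t>;

// New behavior as described above: the high byte is always zero and a
// failed conversion yields 0x00 in AsciiChar.
AnsiCell readCellA(ConvertedByte converted) {
    return AnsiCell{converted.value_or(std::uint8_t{0}), 0};
}

// Old behavior for the failed-conversion case, per the example above:
// the low byte of the UTF-16 code unit leaked through.
std::uint8_t legacyFailedByte(char16_t stored) {
    return static_cast<std::uint8_t>(stored & 0xff);
}

// Walks through the "い in a narrow column" example.
void checkNarrowWideExample() {
    const char16_t hiraganaI = 0x3044;            // い
    assert(legacyFailedByte(hiraganaI) == 0x44);  // before this change
    const AnsiCell now = readCellA(std::nullopt); // conversion failed
    assert(now.asciiChar == 0x00 && now.highByte == 0x00);
}
```

Callers that only ever inspect .AsciiChar for successfully converted cells should see no difference in this model; only code that relied on the leaked low byte or on the high byte is affected.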

Major breaking changes

  • ReadConsoleOutputW will now return the leading Unicode character
    for both columns of a wide character and ignore the trailing one.
    • Example 1: Writing the pair 0x3044 0xabcd with
      WriteConsoleOutputW used to yield the same 0x3044 0xabcd if read
      back with ReadConsoleOutputW. This worked because conhost
      effectively ignored the trailing codepoint, allowing one to
      "smuggle" data. In the future, this trailing character will be
      discarded and the read will produce 0x3044 0x3044 instead.
    • Example 2: Writing い with WriteConsoleOutputA can be done with
      code page 932 (Shift-JIS) and the DBCS 0x82 0xa2. If read back
      with ReadConsoleOutputW this would previously yield the two
      Unicode characters 0x3044 0xffff. After this commit it'll yield
      0x3044 0x3044.
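
Both examples above follow from storing a wide character once together with its column count. A minimal sketch of that idea, assuming a simplified row of glyphs: Glyph, storeWidePair and readBackW are hypothetical names, not the actual buffer types, and the legacy 0xffff trailing marker is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one stored glyph in the future buffer: a single
// UTF-16 code unit plus the number of columns it occupies (1 or 2 here).
struct Glyph {
    char16_t ch;
    int columns;
};

// Sketch of storing a leading/trailing pair written through the grid API:
// only the leading character is kept, the trailing one is discarded.
Glyph storeWidePair(char16_t leading, char16_t /*trailing*/) {
    return Glyph{leading, 2};
}

// Sketch of the new read-back: every column of a wide glyph reports the
// leading character, so no separate trailing value can be recovered.
std::vector<char16_t> readBackW(const std::vector<Glyph>& row) {
    std::vector<char16_t> cells;
    for (const Glyph& g : row) {
        assert(g.columns == 1 || g.columns == 2);
        cells.insert(cells.end(), static_cast<std::size_t>(g.columns), g.ch);
    }
    return cells;
}
```

In this model, readBackW({storeWidePair(0x3044, 0xabcd)}) yields the two cells 0x3044 0x3044, matching Example 1. Example 2 ends up in the same stored state once the DBCS bytes have been converted to い, so it reads back identically.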

Alternative approaches

It's possible to "tag"/"mark" written data as originating from
WriteConsoleOutputA/W so that it can be reconstructed accurately later
on. However, this leads to implementation complexities that we're
actively trying to avoid in the new buffer implementation. Effectively
everything that touches the buffer's text would have to handle these
marks and either write or clear them. Given the most likely small number
of users who depend on the current quirky behavior, it'd be an
unwarranted maintenance and performance burden and would prevent Windows
Terminal from ever truly migrating to full Unicode support.

Validation Steps Performed

  • Adjusted feature tests complete successfully
claunia added the pull-request label 2026-01-31 09:35:10 +00:00

Reference: starred/terminal#29482