ReadFile/ReadConsoleA append U+FFFD to each emoji read in UTF-8 #23678

Open
opened 2026-01-31 08:49:01 +00:00 by claunia · 0 comments

Originally created by @ghost on GitHub (Oct 13, 2025).

Originally assigned to: @lhecker on GitHub.

Windows Terminal version

1.23.250825001

Windows build number

10.0.26100.0

Other Software

No response

Steps to reproduce

Test code:

```cpp
// /std:c++latest /utf-8

#include <exception>
#include <print>
#include <string>

#include <windows.h>

namespace my {
// Return `result` unless it equals one of the listed error values.
template <auto... Errors> auto check(auto result) {
  static_assert(sizeof...(Errors) != 0);
  if ((... && (result != Errors))) {
    return result;
  }
  std::terminate();
}
// Terminate unless both values compare equal.
void assert_equal(auto x, auto y) {
  if (x != y) {
    std::terminate();
  }
}
} // namespace my

int main() {
  // Use UTF-8 for both console input (ReadFile) and output (WriteFile).
  my::check<FALSE>(::SetConsoleCP(CP_UTF8));
  my::check<FALSE>(::SetConsoleOutputCP(CP_UTF8));
  const auto std_input = my::check<INVALID_HANDLE_VALUE, nullptr>(
      ::GetStdHandle(STD_INPUT_HANDLE));
  const auto std_output = my::check<INVALID_HANDLE_VALUE, nullptr>(
      ::GetStdHandle(STD_OUTPUT_HANDLE));
  char c = {};
  std::string s = {};
  ::DWORD number_of_bytes_read = {};
  ::DWORD number_of_bytes_written = {};
  while (true) {
    // Read the input one byte at a time.
    my::check<FALSE>(
        ::ReadFile(std_input, &c, 1, &number_of_bytes_read, nullptr));
    if (number_of_bytes_read == 0) {
      break; // end of input (the run below ends with ^Z)
    }
    // Dump each byte in hex. The cast keeps bytes >= 0x80 from printing as
    // negative values on library versions that format a signed char as int.
    std::print("{:02x}{}", static_cast<unsigned char>(c),
               c == '\n' ? '\n' : ' ');
    s += c;
    if (c == '\n') {
      // Echo the collected line back unchanged.
      const auto number_of_bytes_to_write = static_cast<::DWORD>(s.size());
      my::assert_equal(number_of_bytes_to_write, s.size());
      my::check<FALSE>(::WriteFile(std_output, s.data(),
                                   number_of_bytes_to_write,
                                   &number_of_bytes_written, nullptr));
      my::assert_equal(number_of_bytes_written, number_of_bytes_to_write);
      s = {};
    }
  }
}
```

Run the above code inside Windows Terminal (or in `conhost.exe`; the result is the same) and type in some emojis. A possible output is:

```
😀
f0 9f 98 80 ef bf bd 0d 0a
😀�
😀😀
f0 9f 98 80 ef bf bd f0 9f 98 80 ef bf bd 0d 0a
😀�😀�
^Z
```

Observe that each emoji read is followed by the ef bf bd sequence (the UTF-8 encoding of the replacement character).
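To double-check that reading of the hex dump, here is a minimal C++ sketch that decodes one line of the dump back into code points. The helpers `parse_hex_dump`, `decode_utf8`, and `format_code_points` are hypothetical names written for this illustration, not part of the repro or any Windows API; the decoder is deliberately simple and does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a space-separated hex dump such as
// "f0 9f 98 80" (the format printed by the repro) into raw bytes.
std::vector<unsigned char> parse_hex_dump(const std::string& dump) {
  std::vector<unsigned char> bytes;
  std::istringstream in(dump);
  unsigned int value = 0;
  while (in >> std::hex >> value) {
    bytes.push_back(static_cast<unsigned char>(value));
  }
  return bytes;
}

// Hypothetical helper: decode UTF-8 bytes into code points. An invalid lead
// byte or an incomplete sequence becomes U+FFFD; overlong forms and encoded
// surrogates are not rejected (a sketch, not a validator).
std::vector<char32_t> decode_utf8(const std::vector<unsigned char>& bytes) {
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = bytes[i];
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80) {
      len = 1;
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      out.push_back(0xFFFD);  // stray continuation byte or invalid lead byte
      ++i;
      continue;
    }
    // Consume continuation bytes (10xxxxxx) while they are present.
    std::size_t k = 1;
    while (k < len && i + k < bytes.size() && (bytes[i + k] & 0xC0) == 0x80) {
      cp = (cp << 6) | (bytes[i + k] & 0x3F);
      ++k;
    }
    if (k < len) {  // sequence cut short: replace it, resume at the next byte
      out.push_back(0xFFFD);
      i += k;
      continue;
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Hypothetical helper: render code points as "U+XXXX" separated by spaces.
std::string format_code_points(const std::vector<char32_t>& cps) {
  std::string text;
  char buf[16];
  for (char32_t cp : cps) {
    std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
    if (!text.empty()) text += ' ';
    text += buf;
  }
  return text;
}
```

Feeding it the first line of the dump, `format_code_points(decode_utf8(parse_hex_dump("f0 9f 98 80 ef bf bd 0d 0a")))` yields `U+1F600 U+FFFD U+000D U+000A`: the emoji, one replacement character, then CR LF.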

Expected Behavior

These replacement characters should not appear in the read byte stream.
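One way to state that expectation as a check is a hypothetical helper, `count_replacement_chars` (not part of the repro or any Windows API), that counts `EF BF BD` in a line collected by the repro's read loop. In well-formed UTF-8 that byte sequence can only encode U+FFFD, so a plain byte search is enough; note that it would also count a U+FFFD the user actually typed.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: count how many times the UTF-8 encoding of U+FFFD
// (EF BF BD) occurs in `bytes`, e.g. one line collected by the repro's loop.
std::size_t count_replacement_chars(std::string_view bytes) {
  constexpr std::string_view kReplacement = "\xEF\xBF\xBD";
  std::size_t count = 0;
  for (std::size_t pos = bytes.find(kReplacement);
       pos != std::string_view::npos;
       pos = bytes.find(kReplacement, pos + kReplacement.size())) {
    ++count;
  }
  return count;
}
```

For the bytes captured after typing 😀😀 it returns 2; with the expected behavior, the same input should produce a line for which it returns 0.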

Actual Behavior

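They do appear: in the bytes returned by `ReadFile`, every emoji is followed by `ef bf bd`, and the echoed line shows a `�` after each emoji. The cause is unknown to me. If there is a bug in the test code above, please let me know.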
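Some background that may help triage, offered as an illustration rather than a diagnosis: console input records carry UTF-16 code units (`KEY_EVENT_RECORD::uChar.UnicodeChar` is a `WCHAR`), and 😀 (U+1F600) lies outside the Basic Multilingual Plane, so in UTF-16 it is the surrogate pair `D83D DE00`. Converted as a pair it becomes `f0 9f 98 80`; a surrogate seen on its own has no valid UTF-8 encoding and is commonly replaced by U+FFFD. The sketch below uses hypothetical helpers `append_utf8` and `utf16_to_utf8`; it is not conhost's code, and whether conhost's conversion path behaves this way is not established here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: append the UTF-8 encoding of code point `cp` to `out`.
void append_utf8(std::string& out, char32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: convert UTF-16 code units to UTF-8. A high surrogate
// followed by a low surrogate is combined into one code point; any unpaired
// surrogate is replaced with U+FFFD, which encodes as EF BF BD.
std::string utf16_to_utf8(std::u16string_view units) {
  std::string out;
  for (std::size_t i = 0; i < units.size(); ++i) {
    const char16_t u = units[i];
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < units.size() &&
        units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
      const char32_t cp = 0x10000 + ((char32_t(u) - 0xD800) << 10) +
                          (char32_t(units[i + 1]) - 0xDC00);
      append_utf8(out, cp);
      ++i;  // skip the low surrogate that was just consumed
    } else if (u >= 0xD800 && u <= 0xDFFF) {
      append_utf8(out, 0xFFFD);  // unpaired surrogate
    } else {
      append_utf8(out, u);
    }
  }
  return out;
}
```

Converting `D83D DE00 DE00` (the pair plus a stray low surrogate) yields `f0 9f 98 80 ef bf bd`, the same byte pattern as in the dump above; that match is suggestive only.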

claunia added the Product-Conhost, Area-Output, Issue-Bug, Impact-Correctness, Priority-2 labels 2026-01-31 08:49:02 +00:00
Reference: starred/terminal#23678