COOKED_READ doesn't return UTF-8 on *A APIs in CP_UTF8 #6397

opened 2026-01-31 00:37:32 +00:00 by claunia · 0 comments

Originally created by @amyw-msft on GitHub (Feb 12, 2020).

Environment

Microsoft Windows [Version 10.0.18363.592]

Impact

This issue also affects reading console input through the Universal C Runtime (UCRT): _read, getchar, fread, scanf, etc. _cgets_s avoids the issue only because it uses ReadConsoleW instead of ReadFile. The problem has also been reported against the UCRT on Developer Community: "_read() cannot read UTF-8 but _cgets_s() can" (https://developercommunity.visualstudio.com/content/problem/910961/-read-cannot-read-utf-8-but-cgets-s-can.html).

Steps to reproduce

When ReadFile is used to read from a console input handle, UTF-8 input is not returned correctly. ReadFile on other handle types (files, pipes) reads UTF-8 without issue. SetConsoleCP and SetConsoleOutputCP do not appear to affect this behavior.

C:\Users\stwish\source\read_utf8>type win32_test.cpp
#include <Windows.h>
#include <stdio.h>

int main()
{
    SetConsoleCP(65001);
    SetConsoleOutputCP(65001);
    const HANDLE console_stdin = GetStdHandle(STD_INPUT_HANDLE);

    const size_t buf_count = 20;
    char buffer[buf_count]{};

    DWORD num_read;

    BOOL result = ReadFile(
        console_stdin,
        buffer,
        buf_count,
        &num_read,
        nullptr
        );

    printf("ReadFile returned '%d'\n", result);
    for (size_t i = 0; i < buf_count; i++)
    {
        printf("%02x ", (unsigned char)buffer[i]);
    }

    return 0;
}
C:\Users\stwish\source\read_utf8>cl /nologo /EHsc /MT win32_test.cpp /Zi
win32_test.cpp
C:\Users\stwish\source\read_utf8>win32_test.exe
我是中文字符
ReadFile returned '1'
00 00 00 00 00 00 0d 0a 00 00 00 00 00 00 00 00 00 00 00 00
C:\Users\stwish\source\read_utf8>echo 我是中文字符 | win32_test.exe
ReadFile returned '1'
e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 20 0d
C:\Users\stwish\source\read_utf8>type input.txt
我是中文字符

C:\Users\stwish\source\read_utf8>type input.txt | win32_test.exe
ReadFile returned '1'
e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 00 00

Expected behavior

Running win32_test.exe and typing '我是中文字符' at the console should fill the buffer with e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 0d 0a: the UTF-8 encoding of that string followed by CR LF.

Actual behavior

Running win32_test.exe and typing '我是中文字符' at the console fills the buffer with 6 null bytes followed by CR LF, yet ReadFile still reports success (it returns 1).

Reference: starred/terminal#6397