WriteConsoleA in CP_UTF8 cannot handle receiving lead, continuation and trail in separate operations #6723

Closed
opened 2026-01-31 00:45:41 +00:00 by claunia · 4 comments
Owner

Originally created by @DHowett-MSFT on GitHub (Mar 3, 2020).

In til::u8state, when we decrement backIter, we get a checked iterator exception (in debug mode). In release mode, we just throw away the partial because it fails the "if lead byte found" check.

Throw on --backiter:

2d707f102b/src/inc/til/u8u16convert.h (L88)

Here is a test that demonstrates the failing behavior:

void Utf8Utf16ConvertTests::TestU8ToU16OneByOne()
{
    const std::string u8String1_1{ '\xF0' }; // U+1F4F7 CAMERA (4 bytes)
    const std::string u8String1_2{ '\x9F' };
    const std::string u8String1_3{ '\x93' };
    const std::string u8String1_4{ '\xB7' };

    const std::wstring u16StringComp1{
        gsl::narrow_cast<wchar_t>(0xD83D), // U+1F4F7 CAMERA (surrogate pair)
        gsl::narrow_cast<wchar_t>(0xDCF7)
    };

    til::u8state state{};

    std::wstring u16Out1{};
    VERIFY_SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state));
    VERIFY_ARE_EQUAL(L"", u16Out1); // There should be no output for the first three bytes
    VERIFY_SUCCEEDED(til::u8u16(u8String1_2, u16Out1, state));
    VERIFY_ARE_EQUAL(L"", u16Out1);
    VERIFY_SUCCEEDED(til::u8u16(u8String1_3, u16Out1, state));
    VERIFY_ARE_EQUAL(L"", u16Out1);
    VERIFY_SUCCEEDED(til::u8u16(u8String1_4, u16Out1, state));
    VERIFY_ARE_EQUAL(u16StringComp1, u16Out1);
}

In debug builds:

StartGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne
Verify: SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state))
Verify: AreEqual(L"", u16Out1)
Error: TAEF: [HRESULT 0x800706BE] A failure occurred while running a test operation: 'Utf8Utf16ConvertTests::TestU8ToU16OneByOne'. (The process hosting the test code was unexpectedly terminated with exit code 0x00000003 while invoking a test operation.)
EndGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne [Failed]

In release builds:

StartGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne
Verify: SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state))
Verify: AreEqual(L"", u16Out1)
Verify: SUCCEEDED(til::u8u16(u8String1_2, u16Out1, state))
Error: Verify: AreEqual(L"", u16Out1) - Values (, �) [File: E:\src\OpenConsole\src\til\ut_til\u8u16convertTests.cpp, Function: Utf8Utf16ConvertTests::TestU8ToU16OneByOne, Line: 188]
EndGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne [Failed]

/cc @german-one

Originally created by @DHowett-MSFT on GitHub (Mar 3, 2020). In `til::u8state`, when we decrement backIter, we get a checked iterator exception (in debug mode). In release mode, we just throw away the partial because it fails the "if lead byte found" check. Throw on `--backiter`: https://github.com/microsoft/terminal/blob/2d707f102bf27d455e15dcc6001337a8da6869c2/src/inc/til/u8u16convert.h#L88 Here is a test that demonstrates the failing behavior: ```cpp void Utf8Utf16ConvertTests::TestU8ToU16OneByOne() { const std::string u8String1_1{ '\xF0' }; // U+1F4F7 CAMERA (4 bytes) const std::string u8String1_2{ '\x9F' }; const std::string u8String1_3{ '\x93' }; const std::string u8String1_4{ '\xB7' }; const std::wstring u16StringComp1{ gsl::narrow_cast<wchar_t>(0xD83D), // U+1F4F7 CAMERA (surrogate pair) gsl::narrow_cast<wchar_t>(0xDCF7) }; til::u8state state{}; std::wstring u16Out1{}; VERIFY_SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state)); VERIFY_ARE_EQUAL(L"", u16Out1); // There should be no output for the first three bytes VERIFY_SUCCEEDED(til::u8u16(u8String1_2, u16Out1, state)); VERIFY_ARE_EQUAL(L"", u16Out1); VERIFY_SUCCEEDED(til::u8u16(u8String1_3, u16Out1, state)); VERIFY_ARE_EQUAL(L"", u16Out1); VERIFY_SUCCEEDED(til::u8u16(u8String1_4, u16Out1, state)); VERIFY_ARE_EQUAL(u16StringComp1, u16Out1); } ``` In debug builds: ``` StartGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne Verify: SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state)) Verify: AreEqual(L"", u16Out1) Error: TAEF: [HRESULT 0x800706BE] A failure occurred while running a test operation: 'Utf8Utf16ConvertTests::TestU8ToU16OneByOne'. (The process hosting the test code was unexpectedly terminated with exit code 0x00000003 while invoking a test operation.) EndGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne [Failed] ``` In release builds: ``` StartGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne Verify: SUCCEEDED(til::u8u16(u8String1_1, u16Out1, state)) Verify: AreEqual(L"", u16Out1) Verify: SUCCEEDED(til::u8u16(u8String1_2, u16Out1, state)) Error: Verify: AreEqual(L"", u16Out1) - Values (, �) [File: E:\src\OpenConsole\src\til\ut_til\u8u16convertTests.cpp, Function: Utf8Utf16ConvertTests::TestU8ToU16OneByOne, Line: 188] EndGroup: Utf8Utf16ConvertTests::TestU8ToU16OneByOne [Failed] ``` /cc @german-one
Author
Owner

@DHowett-MSFT commented on GitHub (Mar 3, 2020):

Oh no, do we need to take _buffer into account when we're backing up over the string to find lead bytes? Because we have a lead from a previous call, and we might only be getting a partial.

We could move _buffer.append(in, 0u, remainingLength) up above the check, and then do all our work on _buffer instead of in. Then we chop the end off _buffer when we're done.

We'll still need to fix the bad iterator decrement, because somebody could still just send a continuation byte.

@DHowett-MSFT commented on GitHub (Mar 3, 2020): Oh no, do we need to take `_buffer` into account when we're backing up over the string to find lead bytes? Because we have a lead from a previous call, and we might only be getting a partial. We could move `_buffer.append(in, 0u, remainingLength)` up above the check, and then do all our work on `_buffer` instead of `in`. Then we chop the end off `_buffer` when we're done. We'll still need to fix the bad iterator decrement, because somebody _could_ still just send a continuation byte.
Author
Owner

@german-one commented on GitHub (Mar 3, 2020):

Oh, that's a nasty bug. I guess it will work with initializing auto backIter = in.end(); and --backiter at the beginning of the loop body. And yes I agree, moving the appending up is also necessary.

@german-one commented on GitHub (Mar 3, 2020): Oh, that's a nasty bug. I guess it will work with initializing `auto backIter = in.end();` and `--backiter` at the beginning of the loop body. And yes I agree, moving the appending up is also necessary.
Author
Owner

@german-one commented on GitHub (Mar 4, 2020):

@DHowett-MSFT Sorry I've stolen your unit test method 😉

@german-one commented on GitHub (Mar 4, 2020): @DHowett-MSFT Sorry I've stolen your unit test method 😉
Author
Owner

@DHowett-MSFT commented on GitHub (Mar 4, 2020):

I'm happy you've stolen it! :D

@DHowett-MSFT commented on GitHub (Mar 4, 2020): I'm happy you've stolen it! :D
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#6723