OpenConsole crash when pasting emoji #11893

Closed
opened 2026-01-31 03:00:22 +00:00 by claunia · 18 comments
Owner

Originally created by @amoldeshpande on GitHub (Dec 26, 2020).

Originally assigned to: @zadjii-msft on GitHub.

Environment

Windows build number:  10.0.19041.630]
Windows Terminal version (if applicable):

Any other software?
tcsh (native Windows version)

# Steps to reproduce

download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 

Launch tcsh.  Doesn't have to be in terminal.  Paste any emoji into it.  It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal

# Expected behavior

Should paste an emoji

# Actual behavior


Edited: repro steps with tcsh. Cannnot repro it with my test program anymore

Originally created by @amoldeshpande on GitHub (Dec 26, 2020). Originally assigned to: @zadjii-msft on GitHub. <!-- 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 I ACKNOWLEDGE THE FOLLOWING BEFORE PROCEEDING: 1. If I delete this entire template and go my own path, the core team may close my issue without further explanation or engagement. 2. If I list multiple bugs/concerns in this one issue, the core team may close my issue without further explanation or engagement. 3. If I write an issue that has many duplicates, the core team may close my issue without further explanation or engagement (and without necessarily spending time to find the exact duplicate ID number). 4. If I leave the title incomplete when filing the issue, the core team may close my issue without further explanation or engagement. 5. If I file something completely blank in the body, the core team may close my issue without further explanation or engagement. All good? Then proceed! --> <!-- This bug tracker is monitored by Windows Terminal development team and other technical folks. **Important: When reporting BSODs or security issues, DO NOT attach memory dumps, logs, or traces to Github issues**. Instead, send dumps/traces to secure@microsoft.com, referencing this GitHub issue. If this is an application crash, please also provide a Feedback Hub submission link so we can find your diagnostic data on the backend. Use the category "Apps > Windows Terminal (Preview)" and choose "Share My Feedback" after submission to get the link. Please use this form and describe your issue, concisely but precisely, with as much detail as possible. --> # Environment ```none Windows build number: 10.0.19041.630] Windows Terminal version (if applicable): Any other software? tcsh (native Windows version) # Steps to reproduce download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 Launch tcsh. Doesn't have to be in terminal. Paste any emoji into it. It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal # Expected behavior Should paste an emoji # Actual behavior Edited: repro steps with tcsh. Cannnot repro it with my test program anymore
Author
Owner

@j4james commented on GitHub (Dec 27, 2020):

Irrespective of whether emojis are expected to work in these conditions, it seems very wrong to me that an error condition here would cause the entire process to terminate. I'm convinced that half of the FAIL_FAST calls in the code are unintentional, and the original author was likely expecting an error to be returned, or an exception to be thrown (much like all the places we're mistakenly using gsl::narrow instead of gsl::narrow_cast).

@j4james commented on GitHub (Dec 27, 2020): Irrespective of whether emojis are expected to work in these conditions, it seems very wrong to me that an error condition here would cause the entire process to terminate. I'm convinced that half of the `FAIL_FAST` calls in the code are unintentional, and the original author was likely expecting an error to be returned, or an exception to be thrown (much like all the places we're mistakenly using `gsl::narrow` instead of `gsl::narrow_cast`).
Author
Owner

@KalleOlaviNiemitalo commented on GitHub (Dec 27, 2020):

I cannot reproduce the crash when right-click pasting 😀 (U+1F600) or 👨‍👩‍👧‍👦 (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466) to OpenConsole.exe copied from Windows Terminal 1.4.3243.0, with the demo program built for x86, on Windows 10 version 20H2 (19042.685), where the setting "Beta: Use Unicode UTF-8 for worldwide language support" is off.

I don't see why SetFileApisToANSI would even matter here; doesn't it only affect the encoding of file names, which the demo program does not use?

@KalleOlaviNiemitalo commented on GitHub (Dec 27, 2020): I cannot reproduce the crash when right-click pasting 😀 (U+1F600) or 👨‍👩‍👧‍👦 (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466) to OpenConsole.exe copied from Windows Terminal 1.4.3243.0, with the demo program built for x86, on Windows 10 version 20H2 (19042.685), where the setting "Beta: Use Unicode UTF-8 for worldwide language support" is off. I don't see why SetFileApisToANSI would even matter here; doesn't it only affect the encoding of file names, which the demo program does not use?
Author
Owner

@j4james commented on GitHub (Dec 27, 2020):

@KalleOlaviNiemitalo I think you need to paste more than 100 characters (the length of the INPUT_RECORD buffer that was passed to ReadConsoleInput) in order to trigger the error. I could reproduce it by pasting the following sequence:

😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣

@j4james commented on GitHub (Dec 27, 2020): @KalleOlaviNiemitalo I think you need to paste more than 100 characters (the length of the `INPUT_RECORD` buffer that was passed to `ReadConsoleInput`) in order to trigger the error. I could reproduce it by pasting the following sequence: 😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣
Author
Owner

@amoldeshpande commented on GitHub (Dec 27, 2020):

I have been using a single 😁 to crash it.

@amoldeshpande commented on GitHub (Dec 27, 2020): I have been using a single 😁 to crash it.
Author
Owner

@j4james commented on GitHub (Dec 27, 2020):

In that case I can't reproduce it either. What's the commit hash of the revision that you're testing with? And in case it makes any difference, what are the build details - x86 or x64, debug or release? Also, what keyboard locale are you using?

@j4james commented on GitHub (Dec 27, 2020): In that case I can't reproduce it either. What's the commit hash of the revision that you're testing with? And in case it makes any difference, what are the build details - x86 or x64, debug or release? Also, what keyboard locale are you using?
Author
Owner

@j4james commented on GitHub (Dec 27, 2020):

I just realised that the crash that I was seeing was actually in the DirectReadData::Notify method rather than the _DoGetConsoleInput function, but the code is essentially identical.

This is in DirectReadData::Notify:
d09fdd61cb/src/host/readDataDirect.cpp (L168-L184)

And this is in _DoGetConsoleInput:
d09fdd61cb/src/host/directio.cpp (L226-L242)

Either of those will crash if the number of readEvents is more than 1 greater than the _eventReadCount, and that is easy to achieve with a multibyte codepage. I can't see how the crash could be triggered with a single character, though, unless ReadConsoleInput was called with a much smaller record buffer.

@j4james commented on GitHub (Dec 27, 2020): I just realised that the crash that I was seeing was actually in the `DirectReadData::Notify` method rather than the `_DoGetConsoleInput` function, but the code is essentially identical. This is in `DirectReadData::Notify`: https://github.com/microsoft/terminal/blob/d09fdd61cbb11b7ef2ccdd4820349ffe898ad583/src/host/readDataDirect.cpp#L168-L184 And this is in `_DoGetConsoleInput`: https://github.com/microsoft/terminal/blob/d09fdd61cbb11b7ef2ccdd4820349ffe898ad583/src/host/directio.cpp#L226-L242 Either of those will crash if the number of `readEvents` is more than 1 greater than the `_eventReadCount`, and that is easy to achieve with a multibyte codepage. I can't see how the crash could be triggered with a single character, though, unless `ReadConsoleInput` was called with a much smaller record buffer.
Author
Owner

@KalleOlaviNiemitalo commented on GitHub (Dec 27, 2020):

If I understand correctly, InputBuffer::_ReadBuffer tries to predict how many UTF-16 input events fit in the specified number of OEM input events, but it assumes the OEM encoding uses at most two bytes per character: https://github.com/microsoft/terminal/blob/d09fdd61cbb11b7ef2ccdd4820349ffe898ad583/src/host/inputBuffer.cpp#L394-L405

According to https://github.com/microsoft/terminal/issues/7589#issuecomment-712492563, the estimated number of bytes per character even depends on the font.

Then, SplitToOem translates the UTF-16 input events to OEM input events. If the configured code page is UTF-8, then it uses CESU-8, which may need three bytes per UTF-16 code unit. _DoGetConsoleInput cannot return the excess events to its caller and does not know how to put them back to the input buffer, either.

If InputBuffer::_ReadBuffer supported UTF-8, then that would avoid the crash, I think. However, the resulting input events would still have CESU-8 rather than UTF-8, unless that is converted in some other layer that I cannot find now.

Also, I wonder if the StoreReadPartialByteSequence call should perhaps be skipped if IsPeek is true.

@KalleOlaviNiemitalo commented on GitHub (Dec 27, 2020): If I understand correctly, InputBuffer::_ReadBuffer tries to predict how many UTF-16 input events fit in the specified number of OEM input events, but it assumes the OEM encoding uses at most two bytes per character: <https://github.com/microsoft/terminal/blob/d09fdd61cbb11b7ef2ccdd4820349ffe898ad583/src/host/inputBuffer.cpp#L394-L405> According to <https://github.com/microsoft/terminal/issues/7589#issuecomment-712492563>, the estimated number of bytes per character even depends on the font. Then, SplitToOem translates the UTF-16 input events to OEM input events. If the configured code page is UTF-8, then it uses CESU-8, which may need three bytes per UTF-16 code unit. _DoGetConsoleInput cannot return the excess events to its caller and does not know how to put them back to the input buffer, either. If InputBuffer::_ReadBuffer supported UTF-8, then that would avoid the crash, I think. However, the resulting input events would still have CESU-8 rather than UTF-8, unless that is converted in some other layer that I cannot find now. Also, I wonder if the StoreReadPartialByteSequence call should perhaps be skipped if IsPeek is true.
Author
Owner

@amoldeshpande commented on GitHub (Dec 27, 2020):

dangit, I can't reproduce it any more with my simple piece of code.

The more involved repro consists of running code in this branch https://github.com/amoldeshpande/tcsh/tree/unicode under Windows Terminal or OpenConsole.exe and pasting the emoji. This is 100% still repro-able with font of "MesloLGM NF" . Have not tried any other fonts IIRC with this shell.

(Nerd Fonts from here https://github.com/ryanoasis/nerd-fonts/releases)

In answer to a previous question, I used an x64 debug build with standard US-EN keyboard layout.

I will keep trying to make my simple repro crash again meanwhile.

@amoldeshpande commented on GitHub (Dec 27, 2020): dangit, I can't reproduce it any more with my simple piece of code. The more involved repro consists of running code in this branch https://github.com/amoldeshpande/tcsh/tree/unicode under Windows Terminal or OpenConsole.exe and pasting the emoji. This is 100% still repro-able with font of "MesloLGM NF" . Have not tried any other fonts IIRC with this shell. (Nerd Fonts from here https://github.com/ryanoasis/nerd-fonts/releases) In answer to a previous question, I used an x64 debug build with standard US-EN keyboard layout. I will keep trying to make my simple repro crash again meanwhile.
Author
Owner

@j4james commented on GitHub (Dec 27, 2020):

My guess is that there was a mode change involved somewhere (i.e. SetConsoleMode) that persisted from another test case or app. Then when you restarted OpenConsole, the modes went back to the defaults, so the error couldn't be reproduced.

@j4james commented on GitHub (Dec 27, 2020): My guess is that there was a mode change involved somewhere (i.e. `SetConsoleMode`) that persisted from another test case or app. Then when you restarted OpenConsole, the modes went back to the defaults, so the error couldn't be reproduced.
Author
Owner

@amoldeshpande commented on GitHub (Dec 27, 2020):


Editing to update repro steps:

download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 

Launch tcsh.  Doesn't have to be in terminal.  Paste any emoji into it.  It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal
@amoldeshpande commented on GitHub (Dec 27, 2020): ~~~Here's a link to the shell executable https://1drv.ms/u/s!AopJdbqzqHFho-J1E-4rzg4DiYBqBg?e=g5xZEZ launching this in Windows Terminal and pasting an emoji should crash OpenConsole. Font can be default, I think.~~~ Editing to update repro steps: download tcsh-x64.exe from https://github.com/amoldeshpande/tcsh/releases/tag/20210103.10 Launch tcsh. Doesn't have to be in terminal. Paste any emoji into it. It will crash conhost.exe on Windows 10 2004 and OpenConsole.exe if you launch within Windows Terminal
Author
Owner

@zadjii-msft commented on GitHub (Feb 26, 2021):

Okay since it looks like there's a repro and a pretty well identified piece of code responsible, I'm yanking triage. Looks like we didn't end up getting to this in 1.7, so I'm just gonna throw in with 2.0. Thanks everyone!

@zadjii-msft commented on GitHub (Feb 26, 2021): Okay since it looks like there's a repro and a pretty well identified piece of code responsible, I'm yanking triage. Looks like we didn't end up getting to this in 1.7, so I'm just gonna throw in with 2.0. Thanks everyone!
Author
Owner

@zadjii-msft commented on GitHub (Aug 24, 2021):

Hey uh, is anyone still hitting this? I can't get this to repro anymore as of main, in Terminal or in conhost, in cmd or powershell, in codepage 437, 936 or 65001. Maybe this got fixed in the last 9 months?

@zadjii-msft commented on GitHub (Aug 24, 2021): Hey uh, is anyone still hitting this? I can't get this to repro anymore as of `main`, in Terminal or in conhost, in cmd or powershell, in codepage 437, 936 or 65001. Maybe this got fixed in the last 9 months?
Author
Owner

@amoldeshpande commented on GitHub (Aug 24, 2021):

I can't even compile main because of missing wil/nt_result_macros.h or something, so I have no idea. I have terminal 1.9.1942.0 on Windows 10 and it still crashes 🤷

@amoldeshpande commented on GitHub (Aug 24, 2021): I can't even compile main because of missing `wil/nt_result_macros.h` or something, so I have no idea. I have terminal `1.9.1942.0` on Windows 10 and it still crashes 🤷
Author
Owner

@DHowett commented on GitHub (Aug 24, 2021):

You need to update your submodules. git submodule update. 😄

@DHowett commented on GitHub (Aug 24, 2021): You need to update your submodules. `git submodule update`. :smile:
Author
Owner

@j4james commented on GitHub (Aug 24, 2021):

I can still reproduce a crash on a recentish build of OpenConsole (90ff261c35), but as I mentioned above, my crash is in DirectReadData::Notify, which isn't exactly the same as the OP.

To reproduce, I'm using this little test app in a conhost cmd shell:

#include <Windows.h>

int main(int argc, char** argv)
{
    HANDLE ih = GetStdHandle(STD_INPUT_HANDLE);

    DWORD mode;
    GetConsoleMode(ih, &mode);
    SetConsoleMode(ih, mode | ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT | ENABLE_PROCESSED_INPUT);

    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);
    SetFileApisToANSI();

    while (true)
    {
        INPUT_RECORD inputRecord[100];
        DWORD inputCount;
        ReadConsoleInputA(ih, inputRecord, 100, &inputCount);
    }
}

And once it's running I paste a bunch of emojis like this:
😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣😀💘💥👍🎅🦄🍄🌎🚖🚀💣

@j4james commented on GitHub (Aug 24, 2021): I can still reproduce a crash on a recentish build of OpenConsole (90ff261c35a1b3e50267576f5cbdc66733731101), but as I mentioned above, my crash is in `DirectReadData::Notify`, which isn't exactly the same as the OP. To reproduce, I'm using this little test app in a conhost cmd shell: ``` #include <Windows.h> int main(int argc, char** argv) { HANDLE ih = GetStdHandle(STD_INPUT_HANDLE); DWORD mode; GetConsoleMode(ih, &mode); SetConsoleMode(ih, mode | ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT | ENABLE_PROCESSED_INPUT); SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8); SetFileApisToANSI(); while (true) { INPUT_RECORD inputRecord[100]; DWORD inputCount; ReadConsoleInputA(ih, inputRecord, 100, &inputCount); } } ``` And once it's running I paste a bunch of emojis like this: 😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣😀💘💥👍🎅🦄🍄🌎🚖⚓🚀⌚💣
Author
Owner

@amoldeshpande commented on GitHub (Aug 24, 2021):

after an hour of compiling it now claims I don't have a .NET SDK installed. I'm tired of this crap. You can close the bug if you like. The repro I gave is pretty simple (you can use any current release of tcsh) and if it doesn't crash, consider me satisfied.

@amoldeshpande commented on GitHub (Aug 24, 2021): after an hour of compiling it now claims I don't have a .NET SDK installed. I'm tired of this crap. You can close the bug if you like. The repro I gave is pretty simple (you can use any current release of tcsh) and if it doesn't crash, consider me satisfied.
Author
Owner

@zadjii-msft commented on GitHub (Feb 2, 2022):

Hmmmmmm. So it sounds like we kinda need to do a SplitToOem in InputBuffer::Read, before we even move the records to the Notify call. Or something like a pre-emptive scan in Read, like checking how many events each event will be translated into. I wonder how easy this would be to write a test for...

@zadjii-msft commented on GitHub (Feb 2, 2022): Hmmmmmm. So it sounds like we kinda need to do a `SplitToOem` in `InputBuffer::Read`, before we even move the records to the `Notify` call. Or something like a pre-emptive scan in `Read`, like checking how many events each event will be translated into. I wonder how easy this would be to write a test for...
Author
Owner

@zadjii-msft commented on GitHub (Feb 2, 2022):

Okay so I think InputBuffer::Read is only called in two places:

  • directio.cpp:189
  • readDataDirect.cpp:135

Both do the same thing. If called with !unicode, then they will call SplitToOem on the events read from the buffer, to expand them out to chars. So I think it's okay to pre-emptively trim the buffer in the !unicode case inside of IB::Read, because we know the caller will be expanding them.

Wow those two functions are really basically the same code aren't they...

@zadjii-msft commented on GitHub (Feb 2, 2022): Okay so I think `InputBuffer::Read` is only called in two places: * `directio.cpp:189` * `readDataDirect.cpp:135` Both do the same thing. If called with `!unicode`, then they will call `SplitToOem` on the events read from the buffer, to expand them out to chars. So I think it's okay to pre-emptively trim the buffer in the `!unicode` case inside of `IB::Read`, because we know the caller will be expanding them. Wow those two functions are really basically the same code aren't they...
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#11893