Issue with Kanji #13772

Closed
opened 2026-01-31 03:51:41 +00:00 by claunia · 11 comments

Originally created by @LexPrima on GitHub (May 10, 2021).

Windows Terminal version (or Windows build number)

1.7.1033.0

Other Software

No response

Steps to reproduce

![Screenshot 2021-05-10 181248](https://user-images.githubusercontent.com/12571344/117690947-adee2c00-b1bb-11eb-9242-011b8e639933.jpg)

This file name, track 36, contains kanji. OneDrive shows an error but displays the kanji. Explorer doesn't show the kanji at all. cmd.exe (top left) shows placeholders, and Terminal (bottom left) does something really weird: it looks like it fuses the name with Track 35.

Expected Behavior

Showing the actual filename

Actual Behavior

Behaving weirdly; see the screenshot above.

@zadjii-msft commented on GitHub (May 11, 2021):

Can you copy and paste that actual character here into github?

How are you trying to output the character to the console/terminal? Presumably with `dir`?

What's the output of `chcp`?

@LexPrima commented on GitHub (May 11, 2021):

It's actually not kanji. It was a Japanese song, so I wrongly assumed it was kanji. The characters are U+0090 and U+008D.

And yes, the output is from `dir`.

chcp outputs: Aktive Codepage: 850.

@j4james commented on GitHub (May 11, 2021):

This looks like a variation of #4363, only in this case it's C1 control characters. If a control character finds its way into the text buffer then we need to convert it into a printable glyph before passing it over conpty. Otherwise it's going to end up being interpreted by the VT engine on the other end of the pipe, which is not what we want.
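
As a rough illustration of the "convert it into a printable glyph" idea, here is a minimal Python sketch. It is not the conhost/conpty implementation: the function name `make_printable`, the choice of U+FFFD as the placeholder, and the decision to treat both C0 and C1 ranges as controls are all assumptions made for the example.

```python
def make_printable(text, placeholder="\ufffd"):
    """Replace C0/C1 control characters with a printable placeholder glyph.

    Hypothetical helper for illustration only; the placeholder choice is an assumption.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        # C0 controls are U+0000..U+001F, DEL is U+007F, C1 controls are U+0080..U+009F.
        if cp < 0x20 or 0x7F <= cp <= 0x9F:
            out.append(placeholder)
        else:
            out.append(ch)
    return "".join(out)


# The filename from this report, with its two C1 controls made visible.
print(make_printable("\u0090Shinjitsu no Shi \u008d(TV-Size).flac"))
```

With a step like this applied before the text crosses the pipe, the receiving VT engine would only ever see ordinary printable characters.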

@DHowett commented on GitHub (May 13, 2021):

It's curious that we're not emitting this through conpty as a C1 control character and not in fully-realized UTF-8. 🤔

Can you capture the output to a file, and share that with us?

```
dir > blah.txt
type blah.txt
REM - ensure that it still reproduces in the copy^ :)
```

@DHowett commented on GitHub (May 13, 2021):

We're wondering whether the filenames are encoded in SHIFT-JIS (and that's why OneDrive's failing and has replaced them with an unexpected character?)

@LexPrima commented on GitHub (May 13, 2021):

[output.txt](https://github.com/microsoft/terminal/files/6474788/output.txt)

@j4james commented on GitHub (May 13, 2021):

It looks like those control characters have just been converted to question marks when piped to the file. However, this is really quite easy to reproduce. If you open a WSL bash shell, you can do something like this to create an equivalent file:

```
echo Test > $'\u0090Shinjitsu no Shi \u008D(TV-Size).flac'
```

Then go and view that file from a cmd shell in conhost and Windows Terminal.

In conhost, the C1 controls are displayed as error glyphs, and in Windows Terminal they are interpreted. The U+0090 is the start of a DCS sequence, so that eats all the following characters up to the U+008D. And U+008D is a Reverse Index control, which moves the cursor up to the previous line. You then end up overwriting the start of the previous filename with the remaining characters - (TV-Size).flac.
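
The overwrite described above can be reproduced with a toy model. The Python sketch below is a deliberately simplified stand-in for a VT parser, not Windows Terminal's actual state machine: it assumes that any C1 control ends a DCS string, that a newline acts as CR+LF, and it ignores every other control. The function name `render`, the `1234` size column and the `Track 35 (Full Version).flac` name are made up for the example.

```python
DCS = "\u0090"  # Device Control String introducer (C1)
RI = "\u008d"   # Reverse Index (C1): move the cursor up one line
ST = "\u009c"   # String Terminator (C1): normal end of a DCS string


def render(stream):
    """Toy VT model: return the screen rows after writing `stream`."""
    screen = [[]]
    row = col = 0
    in_dcs = False
    for ch in stream:
        if in_dcs:
            if not ("\u0080" <= ch <= "\u009f"):
                continue  # the payload of the control string is swallowed
            in_dcs = False  # assumption: any C1 control ends the string
            if ch == ST:
                continue
            # any other C1 control falls through and is executed below
        if ch == DCS:
            in_dcs = True
        elif ch == RI:
            row = max(row - 1, 0)
        elif ch == "\n":
            row, col = row + 1, 0  # treat newline as CR+LF
        else:
            # Grow the screen as needed, then write the character at the cursor.
            while len(screen) <= row:
                screen.append([])
            while len(screen[row]) <= col:
                screen[row].append(" ")
            screen[row][col] = ch
            col += 1
    return ["".join(r) for r in screen]


# A made-up two-line listing; the second name is the one from this thread.
listing = (
    "1234 Track 35 (Full Version).flac\n"
    "1234 \u0090Shinjitsu no Shi \u008d(TV-Size).flac\n"
)
for line in render(listing):
    print(line)
```

In this model the second row keeps only its size column, while the tail `(TV-Size).flac` lands on top of the start of the previous row's filename, which matches the "fused" appearance in the screenshot.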

@j4james commented on GitHub (May 13, 2021):

> It's curious that we're not emitting this through conpty as a C1 control character and not in fully-realized UTF-8. 🤔

I'm assuming it is converted to UTF-8, but that doesn't change its meaning. It probably starts out as a single byte value in code page 850 (e.g. 0x90), is converted to a two byte UCS-2 value (0x00 0x90) when read in by an ANSI API and stored in the text buffer, and finally converted to a UTF-8 sequence (0xC2 0x90) when written out to the conpty pipe. No matter what the encoding, though, it's still the same value. And on the other side of the pipe, that UTF-8 sequence is just going to be converted back to the UCS-2 code point and interpreted as the C1 control character U+0090.
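
The point that re-encoding does not change the code point can be checked with a few lines of Python. This sketch only covers the two-byte and UTF-8 forms of U+0090; it does not attempt to model the code page 850 step, and it uses big-endian UTF-16 purely to match the byte order written above.

```python
import unicodedata

c1 = "\u0090"  # the DCS control character from the filename

two_byte = c1.encode("utf-16-be")  # two-byte form, as held in a UCS-2/UTF-16 buffer
utf8 = c1.encode("utf-8")          # form written to a UTF-8 pipe

print(two_byte.hex(), utf8.hex())  # 0090 c290

# Decoding on the far side of the pipe yields the same code point,
# which is still classified as a control character (category Cc).
decoded = utf8.decode("utf-8")
print(decoded == c1, unicodedata.category(decoded))  # True Cc
```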

@zadjii-msft commented on GitHub (May 14, 2021):

Okay, great explanation! This sounds exactly like #4363 to me then!

/dup #4363

@ghost commented on GitHub (May 14, 2021):

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!

@ghost commented on GitHub (Feb 3, 2022):

🎉 This issue was addressed in #11690, which has now been successfully released as `Windows Terminal Preview v1.13.10336.0`. 🎉

Handy links:

* [Release Notes](https://github.com/microsoft/terminal/releases/tag/v1.13.10336.0)
* [Store Download](https://www.microsoft.com/store/apps/9n8g5rfz9xk3?cid=storebadge&ocid=badge)

Reference: starred/terminal#13772