Feature Request: Character Set translation #14787

Closed
opened 2026-01-31 04:19:25 +00:00 by claunia · 6 comments

Originally created by @doggy8088 on GitHub (Aug 4, 2021).

Description of the new feature/enhancement

In PuTTY, there is a convenient feature called Character set translation. See below:

image

Sometimes we have to work with files or programs that only handle non-Unicode encodings. Windows Terminal appears to handle only the UTF-8 character set, which leaves us unable to deal with these situations.

Proposed technical implementation details (optional)


@DHowett commented on GitHub (Aug 5, 2021):

I'm curious about your specific use case. Windows Terminal uses the Windows console host to handle all of its translation, and that supports all of the codepages and data encodings that the Windows console has always supported. For example, this does work:

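#include <windows.h>
int main() {
	// Switch the console output code page to 932 (Shift-JIS).
	SetConsoleOutputCP(932);
	// WriteFile requires a byte-count pointer when no OVERLAPPED structure is passed.
	DWORD written = 0;
	WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), "\xA6", 1, &written, nullptr);
	return 0;
}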

image

0xA6 is the Shift-JIS encoding (notably: not UTF-8!) for that glyph.
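
To make the point about 0xA6 concrete, here is a minimal Python sketch (not from the thread) that decodes the same single byte both ways. It assumes Python's standard `cp932` codec agrees with Windows code page 932 for this byte; the variable names are only illustrative.

```python
# The single byte written by the C++ sample above.
data = b"\xa6"

# Interpreted as code page 932 (Shift-JIS), the byte is one halfwidth katakana.
as_cp932 = data.decode("cp932")

# Interpreted as UTF-8, the same byte is a lone continuation byte and is invalid.
try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(f"cp932: U+{ord(as_cp932):04X}, valid UTF-8: {valid_utf8}")
```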


@DHowett commented on GitHub (Aug 5, 2021):

(Which application are you using that is not functioning properly without the ability to switch the encoding?)


@doggy8088 commented on GitHub (Aug 6, 2021):

@DHowett I can reproduce the problem I ran into.

  1. Login into any Linux

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 20.04.2 LTS
    Release:        20.04
    Codename:       focal
    
  2. Download a file encoded in Big5 charset.

    curl https://blog.miniasp.com/big5-example.txt -o big5-example.txt
    
  3. Simply cat the file. It doesn't display correctly.

    image

  4. If I change it to UTF-8, everything is okay!

    image

  5. I also tried changing LANG and LC_ALL, but it still doesn't work.

    image
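
The symptoms in steps 3 and 4 are what one would expect when Big5 bytes reach a terminal that interprets its input as UTF-8. The Python sketch below is illustrative only: it assumes the file holds the Big5 encoding of 測試 (the text that iconv prints later in this thread), and the variable names are made up for the example.

```python
# Assumption: the file content is the Big5 encoding of "測試".
big5_bytes = "測試".encode("big5")

# A UTF-8 decoder cannot make sense of these bytes; undecodable input
# is replaced with U+FFFD, which a terminal typically draws as a
# replacement glyph.
misread = big5_bytes.decode("utf-8", errors="replace")

# Transcoding Big5 -> UTF-8 first (step 4) gives bytes that decode cleanly.
utf8_bytes = big5_bytes.decode("big5").encode("utf-8")
readable = utf8_bytes.decode("utf-8")

print(repr(misread))
print(readable)
```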


@doggy8088 commented on GitHub (Aug 6, 2021):

I also tried using luit to translate Big5 to UTF-8, but the output looks wrong. The same command runs correctly from x-terminal in elementary OS, so this is probably a Windows Terminal issue.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal
$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
zh_HK.utf8
zh_TW
zh_TW.big5
zh_TW.utf8
$ cat big5-example.txt | iconv -f big5 -t utf8
測試
$ luit -encoding big5 cat big5-example.txt
4z8U
$ LC_ALL=zh_TW.big5 luit cat big5-example.txt
4z8U
$ LC_ALL=zh_TW.big5 luit -encoding big5 cat big5-example.txt
4z8U

image
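
One observation that may help with diagnosis: the string 4z8U is exactly what the Big5 bytes of 測試 become when the high bit of each byte is cleared. The Python sketch below only demonstrates that arithmetic. It assumes the file contains the Big5 encoding of 測試, and it does not establish which component in the chain strips the bit.

```python
# Assumption: big5-example.txt holds the Big5 encoding of "測試".
big5_bytes = "測試".encode("big5")

# Clear bit 7 of every byte, as a 7-bit-only data path would.
stripped = bytes(b & 0x7F for b in big5_bytes)

print(big5_bytes.hex(" "))       # the original bytes in hex
print(stripped.decode("ascii"))  # what remains after masking
```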


@zadjii-msft commented on GitHub (Aug 10, 2023):

Hey, I'm trying to reconcile this with #1802, #15678, et al. I feel like we may have missed an important note:

"Login into any Linux"

How are you logging in? WSL? ssh.exe? ssh in WSL? Something else?


@microsoft-github-policy-service[bot] commented on GitHub (Aug 27, 2023):

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

Reference: starred/terminal#14787