Pasting Unicode characters in Cygwin Bash loses the first element of the surrogate pair #11455

Open
opened 2026-01-31 02:48:07 +00:00 by claunia · 0 comments

Originally created by @Mariusz-W on GitHub (Nov 17, 2020).

GNU bash, version 4.4.12(3)-release (x86_64-unknown-cygwin)
LC_CTYPE=en_US.UTF-8

Windows Terminal
Version: 1.4.3141.0

Pasting a character from the Unicode Supplementary Planes into the Cygwin bash shell, with the right mouse button or with Ctrl-Shift-V, has the following effect.

Instead of the character itself, only the 2nd element of the surrogate pair that represents the character in the UTF-16BE encoding is pasted.

Example: attempting to paste 🀄 pastes � instead.

Character 🀄 has the Unicode code point U+1F004.
UTF-16BE encodes 🀄 as the surrogate pair D83C DC04.
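
For reference, the surrogate pair above can be derived from the code point with the standard UTF-16 arithmetic. This is only an illustrative sketch; the function name `to_surrogate_pair` is made up for this example and is not part of Windows Terminal or Cygwin.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> 1st (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> 2nd (low) surrogate
    return high, low


high, low = to_surrogate_pair(0x1F004)
print(f"{high:04X} {low:04X}")  # D83C DC04
```

The 2nd element, DC04, carries only the low 10 bits of the code point, so it cannot be turned back into 🀄 without D83C.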

I see the 2nd surrogate element directly when I paste the character into Cygwin ‘vim’.

When I paste the character into the Cygwin bash command line, the 2nd surrogate element arrives as a three-byte UTF-8 sequence. In the above example that is the sequence EF BF BD, which is the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER), the usual substitute for an unpaired surrogate.
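
The byte sequences involved can be reproduced outside the terminal. The sketch below uses Python's codecs purely as a stand-in. It assumes, without having checked Cygwin's actual conversion code, that the receiving side substitutes U+FFFD for an unpaired surrogate, which is what Python's `replace` error handler does.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in data)


# UTF-16BE bytes for the full pair and for the 2nd (low) surrogate alone.
full_pair = bytes([0xD8, 0x3C, 0xDC, 0x04])
lone_low = bytes([0xDC, 0x04])

# Both surrogates present: decodes to U+1F004 and re-encodes cleanly.
intact_utf8 = full_pair.decode("utf-16-be").encode("utf-8")

# Only the low surrogate present: the decoder substitutes U+FFFD
# (assumption: the real conversion behaves like errors="replace").
replaced_utf8 = lone_low.decode("utf-16-be", errors="replace").encode("utf-8")

# For comparison, the surrogate's own bit pattern forced into UTF-8 form;
# this is not valid UTF-8 and is not what was observed.
raw_utf8 = "\udc04".encode("utf-8", errors="surrogatepass")

print(hex_bytes(intact_utf8))    # F0 9F 80 84
print(hex_bytes(replaced_utf8))  # EF BF BD
print(hex_bytes(raw_utf8))       # ED B0 84
```

The observed EF BF BD matches the replacement-character case, which is consistent with only the 2nd surrogate reaching the conversion step.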

This suggests that the problem is caused by Windows Terminal not passing the 1st element of the surrogate pair to Cygwin Bash along with the 2nd element. If both elements were passed, the problem would likely go away.

claunia added the Resolution-External, Needs-Tag-Fix, and Needs-Attention labels 2026-01-31 02:48:07 +00:00
Reference: starred/terminal#11455