Special Unicode glyphs #23350

Open
opened 2026-01-31 08:39:48 +00:00 by claunia · 5 comments

Originally created by @Pepsiman-12 on GitHub (Jun 10, 2025).

![Screenshot_٢٠٢٥-٠٦-١٠-٢١-٤٥-٠٢-٧٩٠_com.whatsapp~2.jpg](https://github.com/user-attachments/assets/0a8f221f-1cd7-4f30-a895-fc518a62d14a)

What are the Unicode code points of these two odd-looking glyphs (both halves of the Arabic lam-alef ligature)?
Also, do these glyphs belong to a stylistic set?

claunia added the Area-Output, Issue-Bug, Product-Terminal, Priority-2 labels 2026-01-31 08:39:48 +00:00

@DHowett commented on GitHub (Jun 10, 2025):

It would be helpful if you could copy/paste them as actual text here, rather than a screenshot 🙂


@Pepsiman-12 commented on GitHub (Jun 11, 2025):

The text in the screenshot says لآ لإ لأ لا.

I think these glyphs are rendered in some specific stylistic set, right?
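For reference, the code points in the quoted text can be listed with a short Python sketch. It assumes the text is encoded as LAM followed by one of the precomposed alef letters (the `SAMPLE` string below is written with escapes under that assumption, and `describe` is a hypothetical helper name). Unicode also has dedicated compatibility code points for the ligature itself, such as U+FEFB, which normalize back to the two-letter sequence.

```python
import unicodedata

# Assumed encoding of the four lam-alef sequences from the comment above:
# lam + alef with madda, lam + alef with hamza below,
# lam + alef with hamza above, lam + plain alef.
SAMPLE = "\u0644\u0622 \u0644\u0625 \u0644\u0623 \u0644\u0627"


def describe(text):
    """Return (code point, Unicode name) pairs for every non-space character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if not ch.isspace()
    ]


for code, name in describe(SAMPLE):
    print(code, name)

# The presentation-form ligature U+FEFB is a compatibility character;
# NFKC normalization maps it back to lam + alef.
print(unicodedata.normalize("NFKC", "\uFEFB") == "\u0644\u0627")
```

Whether a font draws the pair as a joined ligature or as two separate halves is a shaping decision made by the font and the renderer; the underlying code points are the same in both cases.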


@lhecker commented on GitHub (Jun 11, 2025):

@DHowett can correct me on that, but this appears to be the Cascadia Code style for Arabic. The problem is that we assign 1 cell for each grapheme in the ل glyph cluster. It's supposed to overlap like this (a screenshot from Chromium):

![Image](https://github.com/user-attachments/assets/12dc8b80-3f38-41ea-8a7c-821d35e4e903)

If you compare this with Windows Terminal, you can see where the rendering is coming from:

![Image](https://github.com/user-attachments/assets/0b2f25eb-daa3-490b-be6d-a1b3438ef90d)

This is a difficult problem because there's no Unicode specification yet for how to handle widths of Arabic clusters. I checked the [Rust `unicode-width` crate](https://github.com/unicode-rs/unicode-width) from the unicode-rs project (maintained by Manishearth, who works on i18n at Google and is, I'm sure, way more knowledgeable than us), and found that it assigns 1 column to the glyph cluster.

As such I'll consider this a bug in our implementation. Let's see if I can do something about that...
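The difference between "1 cell per grapheme" and "1 column for the cluster" can be illustrated with a minimal Python sketch. This is not the Windows Terminal code and not the `unicode-width` algorithm; `cell_width`, `LAM`, and `ALEF_FORMS` are hypothetical names, and the sketch ignores wide (East Asian) characters entirely. It only shows the rule that an alef following a lam adds no extra column.

```python
import unicodedata

LAM = "\u0644"
# Alef letters assumed to ligate with a preceding lam.
ALEF_FORMS = {"\u0622", "\u0623", "\u0625", "\u0627"}


def cell_width(text: str) -> int:
    """Count terminal columns, giving a lam-alef pair a single column."""
    width = 0
    prev_base = ""
    for ch in text:
        # Combining marks and format characters take no column and do not
        # break up a lam-alef pair.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        if prev_base == LAM and ch in ALEF_FORMS:
            # The alef shares the column already counted for the lam.
            prev_base = ch
            continue
        width += 1
        prev_base = ch
    return width


print(cell_width("\u0644\u0627"))  # lam + alef
print(cell_width("\u0644\u0628"))  # lam + beh, no ligature
```

With a naive one-cell-per-letter rule the first call would report 2 columns, which corresponds to the split rendering seen in the Windows Terminal screenshot.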


@lhecker commented on GitHub (Jun 11, 2025):

Hmm this is not possible to fix right now. This is because Lam-Alef ligatures like this consist of 2 graphemes that occupy 1 column. If you backspace over such a ligature the cursor would (should?) currently move by 0 columns in the terminal, which is wholly unexpected.

Does this mean that the cursor should sometimes move by 2 grapheme clusters? But if we do that, how should e.g. "cooked read" in CMD behave? If you press left/right arrow in front of this ligature, should the cursor move by 1 grapheme and visually not move at all or always move by 2 graphemes?

Edit: The conclusion of our discussion is that, in the context of terminals, grapheme clusters have a minimum width of 1. As such, the Lam-Alef ligature should form 1 "grapheme cluster" from our point of view.
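One way to picture that conclusion is a segmentation step that merges a lam and a following alef into a single editing unit, so that backspace and arrow keys always step over a unit at least 1 column wide. The Python sketch below is only an illustration under that assumption: `terminal_clusters` and `backspace` are hypothetical names, and the sketch does not implement full Unicode grapheme segmentation (combining marks, emoji sequences, and so on are left as separate units).

```python
LAM = "\u0644"
# Alef letters assumed to ligate with a preceding lam.
ALEF_FORMS = frozenset("\u0622\u0623\u0625\u0627")


def terminal_clusters(text):
    """Split text into editing units, merging lam + alef into one unit."""
    clusters = []
    for ch in text:
        if clusters and clusters[-1] == LAM and ch in ALEF_FORMS:
            # Attach the alef to the lam so the pair moves as one unit.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def backspace(text):
    """Delete the last editing unit, as a line editor might on backspace."""
    return "".join(terminal_clusters(text)[:-1])


line = "ab\u0644\u0627"
print(len(terminal_clusters(line)))  # number of cursor stops in the line
print(backspace(line) == "ab")       # the whole ligature is removed at once
```

Under this model the cursor never lands between the two halves of the ligature, which avoids the "move by 0 columns" case described above.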


@Pepsiman-12 commented on GitHub (Jun 16, 2025):

> @DHowett can correct me on that, but this appears to be the Cascadia Code style for Arabic. The problem is that we assign 1 cell for each grapheme in the `ل` glyph cluster. It's supposed to overlap like this (a screenshot from Chromium):
>
> ![Image](https://github.com/user-attachments/assets/12dc8b80-3f38-41ea-8a7c-821d35e4e903)
>
> If you compare this with Windows Terminal, you can see where the rendering is coming from:
>
> ![Image](https://github.com/user-attachments/assets/0b2f25eb-daa3-490b-be6d-a1b3438ef90d)
>
> This is a difficult problem because there's no Unicode specification yet for how to handle widths of Arabic clusters. I checked the [Rust `unicode-rs` crate](https://github.com/unicode-rs/unicode-width) (maintained by Manishearth who works on i18n at Google (= way more knowledgeable than us, I'm sure)), and found that it assigns 1 column to the glyph cluster.
>
> As such I'll consider this a bug in our implementation. Let's see if I can do something about that...

Oh yeah... You're right


Reference: starred/terminal#23350