Some hyphen ligatures break #5383

Closed
opened 2026-01-31 00:12:09 +00:00 by claunia · 7 comments
Owner

Originally created by @MythreyaK on GitHub (Dec 4, 2019).

Environment

Windows build number: 10.0.18363.0
Windows Terminal version (if applicable): 0.7.3291.0

Steps to reproduce

  1. Use a monospace font with ligatures. I used Fira Code.
  2. Type these characters, mixing up the order -> --> <- <-- <!---
  3. I can't exactly figure out the pattern that breaks some ligatures, but the one sequence in the image should reproduce the problem 100% of the time (at least here on my system).
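    image (https://user-images.githubusercontent.com/26112391/70174931-d555e500-16fb-11ea-93c1-1dfca37ea3cf.png)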

Expected behavior

I think all the ligatures should be rendered the same way, since other ligature-based issues suggest that the prompt 'doesn't know' the meaning of the text displayed.

Actual behavior

Refer to the image above.

claunia added the Needs-Tag-Fix label 2026-01-31 00:12:09 +00:00
Author
Owner

@DHowett-MSFT commented on GitHub (Dec 4, 2019):

Oh, this is because the characters change colors in the middle of the ligated section. That’s on purpose!
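
To illustrate why a color change can split a ligature, here is a minimal Python sketch. It is not Terminal's code: it assumes each same-colored run of cells is shaped on its own, and the `LIGATURES` table, `shape_run`, and `shape_line` are hypothetical names made up for this example.

```python
from itertools import groupby

# Hypothetical ligature table; a real font such as Fira Code defines many more.
LIGATURES = {"-->": "⟶", "->": "→", "<-": "←"}


def shape_run(text):
    """Greedily replace the longest matching ligature at each position."""
    keys = sorted(LIGATURES, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(LIGATURES[key])
                i += len(key)
                break
        else:
            # No ligature starts here; keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def shape_line(cells):
    """cells is a list of (char, color); each same-colored run is shaped separately."""
    shaped = []
    for color, group in groupby(cells, key=lambda cell: cell[1]):
        run = "".join(ch for ch, _ in group)
        shaped.append((shape_run(run), color))
    return shaped


uniform = [(ch, "white") for ch in "-->"]
mixed = [("-", "white"), ("-", "white"), (">", "green")]
print(shape_line(uniform))  # one run, so the full ligature can form
print(shape_line(mixed))    # two runs, so "--" and ">" are shaped apart
```

In the second call the color boundary falls inside `-->`, so neither run contains a complete ligature sequence and the characters stay unligated.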

Author
Owner

@MythreyaK commented on GitHub (Dec 4, 2019):

Oh okay! My apologies, didn't know that!

Author
Owner

@MythreyaK commented on GitHub (Dec 14, 2019):

I have an unrelated noob question, if I may: where is the text shaped and rendered? That is, where is the raw text interpreted as UTF-8 (where required) and then drawn to the screen with ligature logic and VT-sequence-based foreground/background colors? After looking at the project structure description, I'm thinking all of this is handled entirely by the renderer projects.

Also, does DirectWrite internally handle the ligature logic, text shaping, and layout logic completely?

Author
Owner

@DHowett-MSFT commented on GitHub (Dec 21, 2019):

@MythreyaK sorry for waiting so long! So, the text is being entirely shaped and rendered by DirectX in the Terminal. What we're really doing is this:

  1. Text comes in as UTF-8 with VT sequences (consider \e[31mHello \e[32mWorld\e[1m!\e[m)
  2. We transform that UTF-8 to UTF-16 (to fit our internal machinery, which we inherited from the Windows Console)
  3. We parse the VT and dispatch a bunch of callbacks (SetGraphicsRendition(31), PrintString("Hello"), etc.)
  4. Those callbacks produce a text buffer filled with attributed text. This attributed text looks like...
Hello World!
|     |    |` Normal, length 108
|     |    |
|     |    ` Green + Bold, length 1
|     ` Green, length 5
`Red, length 6

or more concisely,

Hello World!
r6g5gb1n108
  5. That text buffer is used as the primary internal representation of the text. It's very fast to manipulate and doesn't use much memory because it's run-length encoded. (One downside, though, is that it is cell-based: we traditionally consider one character as something that fits in either one or two cells. See #1472 for more discussion there.)

  6. At some later point, the renderer accumulates damaged regions and queries their contents. It takes these region contents as a bunch of attributed text runs and accumulates them into "clusters".

  7. Those clusters are somewhat of an optimization: the attributed text is stored as rows of text and rows of attributes, disconnected from each other. The clusters, however, are packed differently: ("Hello ", red), ("World", green), ("!", green, bold) (normal until EOL is discarded or cleared, not rendered as spaces). They're harder to manipulate as character strings.

  8. Those clusters are passed to the various rendering engines.

    • The DX renderer takes them, analyzes them, shapes them and draws them.
    • The GDI renderer (legacy console) takes them, blats them into a PolyTextOut call and lets the kernel deal with it.
    • The VT renderer (which we use in the pseudoconsole host) takes them and undoes steps 7 through 2 in reverse order and turns them back into VT sequences.
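
The steps above can be sketched roughly in Python. This is a toy model, not Terminal's implementation: it handles only the SGR codes used in the example, skips the UTF-16 conversion, and assumes a 120-column row (which is what makes the trailing "Normal" run 108 cells long). The names `parse_vt`, `add_run`, and `to_clusters` are invented for this illustration.

```python
import re

# Matches SGR sequences such as ESC[31m or ESC[m.
SGR = re.compile(r"\x1b\[([0-9;]*)m")
COLORS = {31: "red", 32: "green"}  # only the colors used in the example


def add_run(runs, attr, length):
    """Append to the run-length encoded attribute list, merging equal neighbors."""
    if length <= 0:
        return
    if runs and runs[-1][0] == attr:
        runs[-1] = (attr, runs[-1][1] + length)
    else:
        runs.append((attr, length))


def parse_vt(data, width=120):
    """Return (text, runs); runs is a list of ((color, bold), length)."""
    text, runs = "", []
    color, bold = None, False
    pos = 0
    for match in SGR.finditer(data):
        chunk = data[pos:match.start()]
        text += chunk
        add_run(runs, (color, bold), len(chunk))
        for param in (match.group(1) or "0").split(";"):
            code = int(param or 0)
            if code == 0:
                color, bold = None, False
            elif code == 1:
                bold = True
            elif code in COLORS:
                color = COLORS[code]
        pos = match.end()
    chunk = data[pos:]
    text += chunk
    add_run(runs, (color, bold), len(chunk))
    # The rest of the row carries the default ("normal") attribute.
    add_run(runs, (None, False), width - len(text))
    return text, runs


def to_clusters(text, runs):
    """Pack text and attributes together, dropping the empty tail of the row."""
    clusters, pos = [], 0
    for attr, length in runs:
        chunk = text[pos:pos + length]
        if chunk:
            clusters.append((chunk, attr))
        pos += length
    return clusters


text, runs = parse_vt("\x1b[31mHello \x1b[32mWorld\x1b[1m!\x1b[m")
print(text)
print(runs)
print(to_clusters(text, runs))
```

Keeping the attributes as (attribute, length) pairs is what makes the buffer cheap to store, while the packed clusters are the form that is convenient to hand to a renderer.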

Eventually, clusters will be more fine-grained: a transition between double-width CJK characters and standard ASCII characters could mark a "complex" to "simple" boundary and let us exhibit finer control over DirectWrite. Cells that contain grapheme clusters composed of multiple characters will eventually be a single "Cluster" and therefore rendered to a single known quantity of cells.

Right now, a lot of our issues with combining characters come from the fact that the buffer doesn't know when to glue characters together and therefore the renderer doesn't know when they have been glued together.
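
A very simplified picture of what "gluing" could mean, in Python: attach each combining mark to the character before it. This is only an approximation under stated assumptions (real grapheme clustering follows the much more involved Unicode segmentation rules), and `glue_combining` is a made-up name for the sketch.

```python
import unicodedata


def glue_combining(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # A non-zero combining class marks a character that attaches to its base.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" + U+0301 COMBINING ACUTE ACCENT, then "a": three code points, two clusters.
print(len(glue_combining("e\u0301a")))
```

A cell-based buffer that stores the three code points separately has no record that the first two belong together, which is the gap described above.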

In addition, because we're applying shaping to the glyphs after DirectWrite has measured and generated sizing info for them, we occasionally mess up text rendering after ligatures. There's another work item lying around for that one. 😄

Hope that helps!

Author
Owner

@MythreyaK commented on GitHub (Dec 21, 2019):

It's okay, I understand it can get hectic at times!

Thank you very much for the detailed response! It really puts into perspective the organization of the codebase and how it works! I realized text rendering isn't as simple as I thought it was! Would this be the right place for any further queries I might have?

Author
Owner

@DHowett-MSFT commented on GitHub (Dec 21, 2019):

I’m happy to answer here, but responses may be slow on account of the holidays over here.

Author
Owner

@MythreyaK commented on GitHub (Dec 21, 2019):

That's okay! Thank you very much! Merry Christmas and a Happy New Year!


Reference: starred/terminal#5383