The text layout code is expecting the quantity of glyphs is equal to the cells allocated for each line #4957

Closed
opened 2026-01-31 00:01:32 +00:00 by claunia · 17 comments
Owner

Originally created by @reli-msft on GitHub (Nov 12, 2019).

The issue is found in CustomTextLayout:

Instead of carefully process the clusters input, CustomTextLayout drops nearly all the information from clusters, and only keeping how many columns a cluster occupy, and later it is expecting that the glyph quantity equals to the quantity of clusters. This may lead to incorrect rendering result for many cases (complex script, ligatures, etc.)

Bug #610, #696 may be caused by this.

Originally created by @reli-msft on GitHub (Nov 12, 2019). The issue is found in `CustomTextLayout`: Instead of carefully process the `clusters` input, `CustomTextLayout` drops nearly all the information from `clusters`, and only keeping how many columns a cluster occupy, and later it is expecting that the glyph quantity equals to the quantity of clusters. This may lead to incorrect rendering result for many cases (complex script, ligatures, etc.) Bug #610, #696 may be caused by this.
claunia added the Area-RenderingIssue-BugNeeds-Tag-FixProduct-TerminalPriority-1 labels 2026-01-31 00:01:32 +00:00
Author
Owner

@reli-msft commented on GitHub (Nov 13, 2019):

Running the following script in Node.JS shows a lot of problems in current text layout code:

console.log(`
Combining Diacritics    | [a\u0300\u0304]
Polytonic Greek         | [ᾊ] [Α\u0313\u0300\u0345]
Ideographic Variation   | [東京都葛\u{E0101}飾区] [奈良県葛\u{E0100}城市]
Composite Hangul        | [\u1100\u116A\u11B3]
Combining Dakuten       | [か\u309A]
Emoji vs Text           | [\u2602] [\u2602\uFE0E] [\u2602\uFE0F]
Emoji                   | [\u{1F469}]
Emoji with Skin Tone    | [\u{1F469}\u{1F3FB}]
Emoji Variation and ZWJ | [\u{1F469}\u{1F3FB}\u200D\u{1F4BB}]
`)

Terminal results:
image

Chrome Dev Tools results:
image

Edge F12 results:
image

Word results:
image

@reli-msft commented on GitHub (Nov 13, 2019): Running the following script in Node.JS shows _a lot of_ problems in current text layout code: ```javascript console.log(` Combining Diacritics | [a\u0300\u0304] Polytonic Greek | [ᾊ] [Α\u0313\u0300\u0345] Ideographic Variation | [東京都葛\u{E0101}飾区] [奈良県葛\u{E0100}城市] Composite Hangul | [\u1100\u116A\u11B3] Combining Dakuten | [か\u309A] Emoji vs Text | [\u2602] [\u2602\uFE0E] [\u2602\uFE0F] Emoji | [\u{1F469}] Emoji with Skin Tone | [\u{1F469}\u{1F3FB}] Emoji Variation and ZWJ | [\u{1F469}\u{1F3FB}\u200D\u{1F4BB}] `) ``` Terminal results: ![image](https://user-images.githubusercontent.com/50213618/68721821-8e475a80-0568-11ea-8fd3-55134cf860f6.png) Chrome Dev Tools results: ![image](https://user-images.githubusercontent.com/50213618/68721842-9c957680-0568-11ea-8676-d2725b9a6980.png) Edge F12 results: ![image](https://user-images.githubusercontent.com/50213618/68721864-a9b26580-0568-11ea-89e7-d554465ec848.png) Word results: ![image](https://user-images.githubusercontent.com/50213618/68722050-28a79e00-0569-11ea-86fa-57efb5c873b3.png)
Author
Owner

@skyline75489 commented on GitHub (Nov 13, 2019):

Possibly also #2191

@skyline75489 commented on GitHub (Nov 13, 2019): Possibly also #2191
Author
Owner

@skyline75489 commented on GitHub (Nov 13, 2019):

Let me add Firefox result here, so that people don't forget that there's a browser out there called "Firefox":

图片

@skyline75489 commented on GitHub (Nov 13, 2019): Let me add Firefox result here, so that people don't forget that there's a browser out there called "Firefox": ![图片](https://user-images.githubusercontent.com/4710575/68726167-a0bea400-05fb-11ea-9e66-29e9b9b1709c.png)
Author
Owner

@DHowett-MSFT commented on GitHub (Nov 13, 2019):

@skyline75489 my hero

@DHowett-MSFT commented on GitHub (Nov 13, 2019): @skyline75489 my hero
Author
Owner

@skyline75489 commented on GitHub (Nov 13, 2019):

@DHowett-MSFT my man. I'm seeing regression on current master 306e751639

图片

@skyline75489 commented on GitHub (Nov 13, 2019): @DHowett-MSFT my man. I'm seeing regression on current master https://github.com/microsoft/terminal/commit/306e75163929161b130872dcae890686da57790d ![图片](https://user-images.githubusercontent.com/4710575/68740302-ba74e100-0625-11ea-8ca4-c02fbf7cc099.png)
Author
Owner

@skyline75489 commented on GitHub (Nov 13, 2019):

I've used git bisect and found that ddcc06e911 is the first bad commit. I've got no idea about what #3474 even is..

@skyline75489 commented on GitHub (Nov 13, 2019): I've used `git bisect` and found that ddcc06e911a18250c8544c7ea83011451e8256a2 is the first bad commit. I've got no idea about what #3474 even is..
Author
Owner

@skyline75489 commented on GitHub (Nov 13, 2019):

I've tried this on my laptop and sadly the regression is the same:

图片

Emoji is noticeably broken. Japanese characters are different, too.

@skyline75489 commented on GitHub (Nov 13, 2019): I've tried this on my laptop and sadly the regression is the same: ![图片](https://user-images.githubusercontent.com/4710575/68770861-e615bc80-0661-11ea-840f-dab6528b9841.png) Emoji is noticeably broken. Japanese characters are different, too.
Author
Owner

@zadjii-msft commented on GitHub (Nov 13, 2019):

@skyline75489 Thanks for the report! I'm going to move that regression to it's own thread (#3553), considering you've narrowed the regression down to a particular commit. Broadly our text rendering needs help with M chars->N glyphs scenarios like this issue describes, but we probably shouldn't have things getting worse on us.

@zadjii-msft commented on GitHub (Nov 13, 2019): @skyline75489 Thanks for the report! I'm going to move that regression to it's own thread (#3553), considering you've narrowed the regression down to a particular commit. _Broadly_ our text rendering needs help with M chars->N glyphs scenarios like this issue describes, but we probably shouldn't have things getting worse on us.
Author
Owner

@reli-msft commented on GitHub (Nov 16, 2019):

This is heavily related to this issue: https://github.com/microsoft/terminal/issues/780
There's WriteCharsLegacy and Terminal::_WriteBuffer, and they should be unified into one function that writes char stream to the back buffer with proper itemization.

@reli-msft commented on GitHub (Nov 16, 2019): **This** is heavily related to this issue: https://github.com/microsoft/terminal/issues/780 There's `WriteCharsLegacy` and `Terminal::_WriteBuffer`, and they should be unified into one function that writes char stream to the back buffer with _proper_ itemization.
Author
Owner

@skyline75489 commented on GitHub (Nov 16, 2019):

Without #780 we can still improve the current implementation like #3578 . I'm not sure whether that's worth the effort, though. It's more like scratching the surface.

@skyline75489 commented on GitHub (Nov 16, 2019): Without #780 we can still improve the current implementation like #3578 . I'm not sure whether that's worth the effort, though. It's more like scratching the surface.
Author
Owner

@reli-msft commented on GitHub (Nov 16, 2019):

@skyline75489
We need to change the way we write buffers to perform correct line wrap and cursor movement.
For WT there are two buffers being used: one for CONHOST, which is the "Ground Truth", and one for WT, and the write subroutine is written differently.

So what's wrong with WriteCharsLegacy? Well...

  • It does not process surrogate pairs properly.
  • Combining characters are not written into the "previous" cell. They are treated separately and the cursor is moved.

And to provide a proper text layout, cells should...

  • Have some metadata to guide the client that whether they should be shaped together with adjacent cells.
  • If the real width of glyphs is different from the desired width, whar should we do.
@reli-msft commented on GitHub (Nov 16, 2019): @skyline75489 We need to change the way we write buffers to perform correct line wrap and cursor movement. For WT there are two buffers being used: one for `CONHOST`, which is the "Ground Truth", and one for WT, and the write subroutine is written differently. So what's wrong with `WriteCharsLegacy`? Well... * It does not process surrogate pairs properly. * Combining characters are not written into the "previous" cell. They are treated separately and the cursor is moved. And to provide a _proper_ text layout, cells should... * Have some metadata to guide the client that whether they should be shaped together with adjacent cells. * If the real width of glyphs is different from the desired width, whar should we do.
Author
Owner

@skyline75489 commented on GitHub (Nov 16, 2019):

@reli-msft Gotta admit, the whole Unicode experience is harder to fix than I initially anticipated. I'm not sure the exact plan Console team have. But I'm positive that we are heading to the right direction.

@skyline75489 commented on GitHub (Nov 16, 2019): @reli-msft Gotta admit, the whole Unicode experience is harder to fix than I initially anticipated. I'm not sure the exact plan Console team have. But I'm positive that we are heading to the right direction.
Author
Owner

@reli-msft commented on GitHub (Nov 16, 2019):

@skyline75489
Everybody are underestimating how complex text could be.

Let me count all the features that a proper rich text box should support:

  • Input methods, including...
    • Composing IME
    • Speech
    • Handwriting
    • Emoji picker
  • Complex text rendering, including...
    • Bidi
    • Shaping
    • Complex-script-aware editing
    • Complex-script-aware find, replace, etc.
  • Accessibility

And what they can choose to support:

  • Advanced typography, including...
    • About line breaking...
      • Hyphenation
      • Optimized line break (Knuth-plass, etc.)
    • About microtypography
      • OpenType features (ligatures, etc.)
      • OpenType variations
      • Multiple master fonts
      • Proper font fallback ← It is not a simple lookup-at-each-character process
    • Advanced Middle East features, including...
      • Kashida
    • Advanced Far East features, including...
      • Kinsoku Shori
      • Auto space insertion
      • Kumimoji
      • Warichu
      • Ruby
    • Proper vertical layout, including...
      • Yoko-in-Tate
  • Inline objects and paragraph-like objects, including...
    • Images
    • Hyperlinks
    • Math equations ← This is really hard.
@reli-msft commented on GitHub (Nov 16, 2019): @skyline75489 Everybody are underestimating how complex text could be. Let me count all the features that a proper rich text box should support: - Input methods, including... - Composing IME - Speech - Handwriting - Emoji picker - Complex text rendering, including... - Bidi - Shaping - Complex-script-aware editing - Complex-script-aware find, replace, etc. - Accessibility And what they can choose to support: - Advanced typography, including... - About line breaking... - Hyphenation - Optimized line break (Knuth-plass, etc.) - About microtypography - OpenType features (ligatures, etc.) - OpenType variations - Multiple master fonts - Proper font fallback ← It is not a simple lookup-at-each-character process - Advanced Middle East features, including... - Kashida - Advanced Far East features, including... - Kinsoku Shori - Auto space insertion - Kumimoji - Warichu - Ruby - Proper vertical layout, including... - Yoko-in-Tate - Inline objects and paragraph-like objects, including... - Images - Hyperlinks - Math equations ← This is really hard.
Author
Owner

@skyline75489 commented on GitHub (Nov 16, 2019):

@reli-msft I'd trust you with the whole Unicode thingy myself. I'm a real rookie on this. You can tell from my PR ... Anyway let @DHowett-MSFT take us to where we should eventually be.

@skyline75489 commented on GitHub (Nov 16, 2019): @reli-msft I'd trust you with the whole Unicode thingy myself. I'm a real rookie on this. You can tell from my PR ... Anyway let @DHowett-MSFT take us to where we should eventually be.
Author
Owner

@DHowett-MSFT commented on GitHub (Mar 3, 2020):

Latest results from master

image

@DHowett-MSFT commented on GitHub (Mar 3, 2020): Latest results from master ![image](https://user-images.githubusercontent.com/14316954/75734034-b4c20a80-5cab-11ea-89cb-e91e955623a4.png)
Author
Owner

@reli-msft commented on GitHub (Mar 3, 2020):

@DHowett-MSFT Looks better... Sort of.
Still needs to update our knowledge about Hangul Jamos' width

@reli-msft commented on GitHub (Mar 3, 2020): @DHowett-MSFT Looks better... Sort of. Still needs to update our knowledge about Hangul Jamos' width
Author
Owner

@DHowett commented on GitHub (Jun 27, 2024):

You know, with the completion of #16916 I believe we've finally cracked this nut.

Image

@DHowett commented on GitHub (Jun 27, 2024): You know, with the completion of #16916 I believe we've finally cracked this nut. ![Image](https://github.com/user-attachments/assets/46a4b77b-9c29-4d1d-82ee-6b573b2b2df5)
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#4957