Fine-grained DWrite text analysis based on text complexity #12573

Closed
opened 2026-01-31 03:19:09 +00:00 by claunia · 10 comments
Owner

Originally created by @skyline75489 on GitHub (Feb 14, 2021).

Description of the new feature/enhancement

Inspired by https://github.com/microsoft/cascadia-code/issues/411: depending on the font being used, certain ASCII characters sometimes break the simplicity of the entire text. The current implementation skips DWrite analysis only when the entire text is simple:

if (!_isEntireTextSimple)
{
    // Call each of the analyzers in sequence, recording their results.
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeLineBreakpoints(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeBidi(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeScript(this, 0, textLength, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeNumberSubstitution(this, 0, textLength, this));
    // Perform our custom font fallback analyzer that mimics the pattern of the real analyzers.
    RETURN_IF_FAILED(_AnalyzeFontFallback(this, 0, textLength));
}

With Fira Code, for example, the optimization in most cases only applies to lines of 120 spaces, which is not good.

Proposed technical implementation details (optional)

GetTextComplexity can provide a breakdown of the text, showing which specific ranges are simple. We should be able to utilize it like this:

// complexRanges is hypothetical: the ranges GetTextComplexity reported as not simple.
// Each range stands in for the (textPosition, textLength) pair the analyzers take.
for (auto range : complexRanges)
{
    // Call each of the analyzers in sequence, recording their results.
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeLineBreakpoints(this, range, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeBidi(this, range, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeScript(this, range, this));
    RETURN_IF_FAILED(_fontRenderData->Analyzer()->AnalyzeNumberSubstitution(this, range, this));
    // Perform our custom font fallback analyzer that mimics the pattern of the real analyzers.
    RETURN_IF_FAILED(_AnalyzeFontFallback(this, range));
}

See #6695 for the introduction of text complexity analysis.
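
A minimal sketch of how a list like complexRanges might be collected, assuming a probe that behaves like a GetTextComplexity-style query: given a start position, it reports whether the text there is simple and how many code units share that classification. TextRange, ComplexityProbe, CollectComplexRanges and MakeAsciiProbe are hypothetical names, not part of the renderer or DirectWrite. The toy probe treats ASCII as simple purely for illustration; as the issue notes, the real answer depends on the font.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical position/length pair, mirroring the (textPosition, textLength)
// arguments the analyzers take.
struct TextRange
{
    uint32_t position;
    uint32_t length;
};

// Hypothetical stand-in for a GetTextComplexity-style query: for the text
// starting at `position`, report whether it is simple and how many code
// units share that classification.
using ComplexityProbe = std::function<void(uint32_t position, bool& isSimple, uint32_t& lengthRead)>;

// Walk the text once and keep only the ranges reported as complex.
std::vector<TextRange> CollectComplexRanges(uint32_t textLength, const ComplexityProbe& probe)
{
    std::vector<TextRange> complexRanges;
    uint32_t position = 0;
    while (position < textLength)
    {
        bool isSimple = false;
        uint32_t lengthRead = 0;
        probe(position, isSimple, lengthRead);
        assert(lengthRead > 0); // a probe that reads nothing would never advance
        if (!isSimple)
        {
            complexRanges.push_back({ position, lengthRead });
        }
        position += lengthRead;
    }
    return complexRanges;
}

// Toy probe for demonstration only: ASCII counts as simple, everything else as complex.
ComplexityProbe MakeAsciiProbe(const std::u16string& text)
{
    // Capture a copy so the probe stays valid after the caller's string goes away.
    return [text](uint32_t position, bool& isSimple, uint32_t& lengthRead) {
        isSimple = text[position] < 0x80;
        uint32_t end = position + 1;
        while (end < text.size() && (text[end] < 0x80) == isSimple)
        {
            ++end;
        }
        lengthRead = end - position;
    };
}
```

With this shape, the analyzers would only need to run over the returned ranges, and a fully simple line yields an empty list.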


@skyline75489 commented on GitHub (Feb 14, 2021):

This should also help users in non-English locales, for example by avoiding analysis of the entire line for text like:

版权所有 (C) Microsoft Corporation。保留所有权利。


@skyline75489 commented on GitHub (Feb 14, 2021):

/cc @miniksa for both sanity & technical check


@skyline75489 commented on GitHub (Feb 15, 2021):

I've done some experiments and found that the text complexity ranges are not the same as the run splitting. For example, with the following text:

版权所有 (C) Microsoft Corporation。保留所有权利。

The text complexity analysis reports (each pair is position, length):

  • 0, 4: Complex
  • 4, 26: Simple
  • 30, 8: Complex
  • 38, 70: Simple

The run analysis split it into the following runs:

  • 0, 6
  • 6, 25
  • 31, 77

We might also need some sort of RLE implementation to find whether a run is entirely simple and then optimize the shaping process for that run.
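
A rough sketch of the "is this run entirely simple" check, using the ranges and runs listed above. ComplexityRange, RunRange and IsRunEntirelySimple are hypothetical names for illustration, and the sketch assumes the complexity ranges cover the whole text without gaps.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one complexity result: position, length, and classification.
struct ComplexityRange
{
    uint32_t position;
    uint32_t length;
    bool isSimple;
};

// Hypothetical position/length pair for a run.
struct RunRange
{
    uint32_t position;
    uint32_t length;
};

// A run is entirely simple only if no complex range overlaps it.
bool IsRunEntirelySimple(const RunRange& run, const std::vector<ComplexityRange>& complexity)
{
    const uint32_t runEnd = run.position + run.length;
    for (const auto& range : complexity)
    {
        const uint32_t rangeEnd = range.position + range.length;
        const bool overlaps = range.position < runEnd && run.position < rangeEnd;
        if (overlaps && !range.isSimple)
        {
            return false;
        }
    }
    return true;
}

// The complexity ranges reported for the example text.
std::vector<ComplexityRange> ExampleComplexity()
{
    return { { 0, 4, false }, { 4, 26, true }, { 30, 8, false }, { 38, 70, true } };
}

// The runs reported for the example text.
std::vector<RunRange> ExampleRuns()
{
    return { { 0, 6 }, { 6, 25 }, { 31, 77 } };
}
```

For this example every run overlaps at least one complex range, so none of the three runs would qualify for the fast path even though most of the text is simple.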


@miniksa commented on GitHub (Feb 16, 2021):

I agree that we should make use of the additional analysis information to improve performance in this way.

I do think that we could just further split the Runs and give each an additional simple-or-not flag (bool) during the initial _AnalyzeTextComplexity. _AnalyzeRuns would pick it up to decide between the full analysis and skipping it, and _ShapeGlyphRuns would pick it up again to decide between quick-mapping and slow-mapping to glyphs. Instead of the whole text being simple or not, each Run would be simple or not.

I'm not quite sure why your example maps as it does. Are some of those characters UTF-16 surrogate pairs?


@skyline75489 commented on GitHub (Feb 16, 2021):

Those are just normal Chinese characters. Originally I thought text complexity analysis would split the text the same way as run splitting. I just wanted to add an example to show that it doesn't.

“a Run would be simple or not”

This is likely undetermined. In the example above:

“版权所有 (”

This is a Run. But according to text complexity, the first 4 characters are complex and the last 2 characters are simple. This is what frustrates me: we can't easily know whether a Run is simple and optimize based on that.


@miniksa commented on GitHub (Feb 17, 2021):

Yeah but what I'm saying is that we can just call _SetCurrentRun and _SplitCurrentRun inside of _AnalyzeTextComplexity when we start listening to the length of the complexity and add the additional data.

So then you have a [0,4) complex run, a [4,6) simple run, a [6,30) simple run, etc.
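
A minimal sketch of that splitting idea, assuming the complexity results are available as a list of ranges. Run, ComplexityRange and SplitRunsByComplexity are hypothetical stand-ins and do not reproduce the renderer's _SetCurrentRun / _SplitCurrentRun logic. Each run is cut wherever a complexity range begins or ends, and every piece inherits that range's simple-or-not flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one complexity result.
struct ComplexityRange
{
    uint32_t position;
    uint32_t length;
    bool isSimple;
};

// Hypothetical run carrying the extra simple-or-not flag.
struct Run
{
    uint32_t position;
    uint32_t length;
    bool isSimple;
};

// Cut every run at complexity boundaries so each piece is uniformly simple or complex.
std::vector<Run> SplitRunsByComplexity(const std::vector<Run>& runs,
                                       const std::vector<ComplexityRange>& complexity)
{
    std::vector<Run> result;
    for (const auto& run : runs)
    {
        const uint32_t runEnd = run.position + run.length;
        for (const auto& range : complexity)
        {
            // Intersect the run with this complexity range.
            const uint32_t start = std::max(run.position, range.position);
            const uint32_t end = std::min(runEnd, range.position + range.length);
            if (start < end)
            {
                result.push_back({ start, end - start, range.isSimple });
            }
        }
    }
    return result;
}
```

Applied to the runs (0, 6), (6, 25), (31, 77) and the complexity ranges from the earlier example, this yields six pieces, which illustrates the extra fragmentation such a split introduces.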


@skyline75489 commented on GitHub (Feb 17, 2021):

Doesn’t that bring more fragmentation into the process? Will it affect the line breaking and script analysis result? I need to dig more into this...


@miniksa commented on GitHub (Feb 17, 2021):

To your questions: oh probably. It's worth a try though to see if it just works. Sometimes the simple answer is "good enough". If it turns out to not be, we can refine further from there. Feel free to try/dig!


@skyline75489 commented on GitHub (Jul 20, 2021):

Can we reopen this? #9202 was reverted.

#10036 is an unsuccessful attempt to patch #9202.


@lhecker commented on GitHub (Oct 12, 2022):

AtlasEngine does this! 💖

Reference: starred/terminal#12573