[PR #9202] Initial implementation of fine-grained text analysis #27468

opened 2026-01-31 09:22:08 +00:00 by claunia · 0 comments

Original Pull Request: https://github.com/microsoft/terminal/pull/9202

State: closed
Merged: Yes


This PR aims to optimize the text analysis process by breaking the text
into simple & complex runs according to the result of
GetTextComplexity. For simple runs, we can skip certain processing
steps to improve the analysis performance.

Before this PR, we relied on the results of AnalyzeBidi,
AnalyzeScript and AnalyzeNumberSubstitution both to break the text
into runs and to attach the corresponding
bidi/script/number_substitution information to each run. Thanks to
#6695, we can skip this expensive analysis when the entire text is
determined to be simple.

Inspired by https://github.com/microsoft/cascadia-code/issues/411 and
the discussion in #9156, I found that the "entire text is simple"
condition is often hard to meet. To make full use of the complexity
information, we first need to break the text into simple & complex
ranges. These ranges also serve as the initial runs for the
bidi/script/number_substitution analysis. This way we can skip that
analysis for simple runs and speed up the process.
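
The splitting step described above can be sketched roughly as follows.
This is a minimal, portable illustration and not the code in this PR:
QueryComplexity is a hypothetical stand-in for GetTextComplexity that
simply treats ASCII as "simple", and Run, Complexity and SplitIntoRuns
are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// One initial run: a contiguous range that is entirely simple or entirely complex.
struct Run
{
    std::size_t start;
    std::size_t length;
    bool isSimple;
};

// Result of a complexity query on the leading part of a string.
struct Complexity
{
    bool isSimple;
    std::size_t lengthRead;
};

// Hypothetical stand-in for a complexity query (assumption: ASCII == simple).
// It classifies the first character and reports how many leading characters
// share that classification.
Complexity QueryComplexity(std::u16string_view text)
{
    if (text.empty())
    {
        return { true, 0 };
    }
    const bool simple = text[0] < 0x80;
    std::size_t n = 1;
    while (n < text.size() && (text[n] < 0x80) == simple)
    {
        ++n;
    }
    return { simple, n };
}

// Break the text into alternating simple/complex runs by querying the
// remaining text repeatedly until everything is covered.
std::vector<Run> SplitIntoRuns(std::u16string_view text)
{
    std::vector<Run> runs;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const Complexity c = QueryComplexity(text.substr(pos));
        assert(c.lengthRead > 0); // the query must always make progress
        if (!runs.empty() && runs.back().isSimple == c.isSimple)
        {
            runs.back().length += c.lengthRead; // merge with the previous run
        }
        else
        {
            runs.push_back({ pos, c.lengthRead, c.isSimple });
        }
        pos += c.lengthRead;
    }
    return runs;
}
```

With runs like these, a caller would run the
bidi/script/number_substitution analysis only on runs where isSimple
is false and leave the simple runs untouched.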

VALIDATION
Built and ran cmatrix, cacafire, and cat big.txt with this change.

Initial simple run PR: #6695
Closes #9156

claunia added the pull-request label 2026-01-31 09:22:08 +00:00
Reference: starred/terminal#27468