Fundamental Flaws inherent in the design of AtlasEngine #20654

Closed
opened 2026-01-31 07:20:15 +00:00 by claunia · 5 comments

Originally created by @AffluentOwl on GitHub (Oct 11, 2023).

Windows Terminal version

No response

Windows build number

No response

Other Software

No response

Steps to reproduce

  1. Install an OpenType font using contextual alternates like Numderline. https://thume.ca/numderline/
  2. Set the terminal font to this font
  3. Type 123456 in the terminal

Expected Behavior

The underlines properly display under digits in the thousands places as configured by the font.

Actual Behavior

The text displays as 123456 with no underlines.

claunia added the Needs-Triage, Issue-Bug labels 2026-01-31 07:20:15 +00:00

@AffluentOwl commented on GitHub (Oct 11, 2023):

Correct me if something has changed in the implementation, but the general concept underlying AtlasEngine is that a series of Unicode code points (an extended grapheme cluster) is treated as the unit of rendering. The basic assumption inherent to the design is that Unicode code points far away cannot affect the rendering of other code points. However, this just isn't the way Unicode/OpenType text rendering is specified.

Some exceptions with some brief research:

  1. The Numderline font, for example, uses OpenType GSUB tables to give a digit in the thousands position an underline, while the exact same digit in the ones position gets none. In particular, this example shows that even standard ASCII characters cannot be special-cased along a fast rendering path.
  2. While this might be considered a stylistic enhancement in English, this feature is required in languages like Arabic. Because Arabic is a cursive script and entire words turn into a single grapheme, bugs might not show up easily in the current implementation. However, I suspect that with more careful investigation the underlying issue could be found in another language (or perhaps even in Arabic) where two graphemes depend upon each other across a grapheme boundary.
  3. In the general case, OpenType tables provide a Turing-complete language which can operate over strings of arbitrary length, so no code point in a string is safe to be pre-shaped or shaped in isolation.
  4. Consider that SVGs can be stored in OpenType fonts, including animations, which prevents static pre-rendering.
  5. Far-reaching Unicode code points like U+206E (national digit shapes) affect all glyphs which appear after them in a run. U+206E would override the nominal glyph for the digit 1 with the digits for that language, like the special digit glyphs in Arabic (and many other languages). These act a lot like ANSI colors in that they can appear anywhere in a document and affect arbitrary spans. The same goes for really any character in Unicode Chapter 4.12 Characters with Unusual Properties - Complex expression format control (scoped). Especially the Bidirectional Ordering Controls chapter and Stateful Format Controls 23.2 + 23.3.

Input

U+202E THIS IS A TEST 123 U+202C 789 

Becomes

‮ THIS IS A TEST 123 ‬ 789 

The crux of the issue is that the concept of AtlasEngine fundamentally violates the documented requirement of Uniscribe to operate on "entire paragraphs" of text at a time, as the smallest possible unit. Some of these issues likely can't be fixed without rewriting Uniscribe. Others might be worth fixing by rewriting small parts of Uniscribe outside of it (like range-tracking the RTL stack, as is done for colors now). And this will be an ongoing issue: each released version of Unicode will need to be reviewed for new exceptions the Terminal needs to implement, rather than being transparently taken care of by Uniscribe.
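As a rough illustration of what "range tracking the RTL stack" could mean, here is a minimal Python sketch. It is not Windows Terminal or Uniscribe code: the function name `rtl_override_spans` is hypothetical, and it only handles U+202E (RLO) and U+202C (PDF), ignoring embeddings, isolates, and paragraph boundaries that the full bidirectional algorithm covers.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def rtl_override_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) index ranges of text inside an RLO..PDF scope.

    Simplified assumption: only RLO opens a scope and only PDF closes it.
    An unterminated scope runs to the end of the text.
    """
    spans = []
    depth = 0      # how many RLO scopes are currently open
    start = None   # index of the first character inside the outermost scope
    for i, ch in enumerate(text):
        if ch == RLO:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i))
    if depth > 0:
        spans.append((start, len(text)))
    return spans


body = "THIS IS A TEST 123"
sample = RLO + body + PDF + " 789"
for begin, end in rtl_override_spans(sample):
    print(repr(sample[begin:end]))
```

The point of the sketch is that a control character can affect a span that ends arbitrarily far away, so the scope has to be tracked as state across the whole run rather than per cluster.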

However, if the performance benefits are deemed worth the non-conformance with OpenType / Uniscribe for certain scenarios -- it should be documented exactly what features are missing from the Windows Terminal's custom implementation of OpenType and Uniscribe so users can make an informed decision if AtlasEngine is best for their use case or if their use case demands higher correctness.

[1] https://unicode.org/reports/tr29/
[2] https://blog.janestreet.com/commas-in-big-numbers-everywhere/
[3] https://litherum.blogspot.com/2019/03/addition-font.html
[4] https://colorfonts.langustefonts.com/howto.html
[5] https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf
[6] https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf
[7] https://learn.microsoft.com/en-us/windows/win32/intl/displaying-text-with-uniscribe


@zadjii-msft commented on GitHub (Oct 11, 2023):

Huh, interesting...
image

Can you share what version of the Terminal you're using, and your settings.json file? We're pretty sure this is supposed to work 😄


@DHowett commented on GitHub (Oct 11, 2023):

(For the rest of your notes that don't pertain specifically to Numderline but to Unicode clustering, shaping, and our compliance as a whole, thanks for writing them up so concisely! We'll need to wait until @lhecker is back from his time off before we have a comprehensive response though.)


@AffluentOwl commented on GitHub (Oct 11, 2023):

Sorry, I wasn't able to come up with a good minimal repro yet of the purest form of what I wanted to demonstrate, as I was running into other bugs (design choices?). Playing with this more, I think the current implementation turns real lines from the file into pseudo-lines that break on N bytes of data instead of N glyphs of data (or some measured width). This breaks the rendering in the middle of glyphs (regardless of AtlasEngine).

1 - Line Break on Bytes

1..100 | ForEach-Object { Write-Host "a" -NoNewLine }; 1..10 | ForEach-Object { Write-Host "`u{0364}`u{0365}" -NoNewLine }

Actual

image

Expected

Notepad
image

Edge/Chrome/Harfbuzz
image

Word/Uniscribe (Clips to line extents, but renders vertically)
image

Breaking the combining glyph is definitely undesirable. The stacking of all the combining marks on top of each other, however, is due to some flag passed to Uniscribe, perhaps designed to constrain line height, and Word shows it could be changed to Chrome-style rendering with anti-aliased text + transparency.
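To make the difference concrete, here is a minimal Python sketch contrasting a wrap that counts code points with one that keeps combining marks attached to their base character. It is only an approximation under stated assumptions: the helper names are hypothetical, clusters are detected with `unicodedata.combining` instead of the full UAX #29 rules, every cluster is assumed to occupy one cell, and the width of 110 columns is chosen just so the break lands inside the marks.

```python
import unicodedata


def wrap_by_code_point(text: str, width: int) -> list[str]:
    """Naive wrap: cut every `width` code points, even inside a cluster."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def split_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


def wrap_by_cluster(text: str, width: int) -> list[str]:
    """Wrap on cluster boundaries, assuming one cell per cluster."""
    clusters = split_clusters(text)
    return ["".join(clusters[i:i + width])
            for i in range(0, len(clusters), width)]


# Same input as the repro: 100 "a" followed by 10 pairs of combining marks.
text = "a" * 100 + "\u0364\u0365" * 10
naive = wrap_by_code_point(text, 110)
clustered = wrap_by_cluster(text, 110)
print(len(naive), len(clustered))
```

With these inputs the naive wrap produces a second row that starts with orphaned combining marks, while the cluster-based wrap keeps all of them on the final "a".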

2 - No Unicode Line Breaking

The next related issue is that the line breaks do not use anything close to the Unicode Line Breaking rules. So this happens:

1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }; Write-Host " " -NoNewLine; 1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }

Actual

image

Expected

Notepad
image

I think this behavior is easier to justify, or it could perhaps be offered as an option so users can choose whether to use proper Unicode line breaking. But the main issue, which becomes obvious here, is that the underlines no longer fall under the expected sets of 3 digits.
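The shift in underlines can be sketched in a few lines of Python. This assumes a simplified stand-in for the font's behavior (every other group of three digits, counted from the right end of the run, is underlined) and not Numderline's actual GSUB lookups; the function names are hypothetical.

```python
def underlined_positions(digits: str) -> list[bool]:
    """True where a digit is underlined when the run is shaped as a whole.

    Assumed rule: groups of three are counted from the right, and every
    second group (thousands, billions, ...) is underlined.
    """
    n = len(digits)
    return [((n - 1 - i) // 3) % 2 == 1 for i in range(n)]


def shape_in_fragments(digits: str, split_at: int) -> list[bool]:
    """Shape the two halves independently, as a forced row break would."""
    return (underlined_positions(digits[:split_at])
            + underlined_positions(digits[split_at:]))


whole = underlined_positions("123456")
broken = shape_in_fragments("123456", 4)
print(whole)
print(broken)
print(whole == broken)
```

Because the grouping is counted from the end of the run, cutting the run at an arbitrary column changes which digits the first fragment considers to be in the thousands position.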

3 - Irreversible window resizes

Also, the way lines are recombined when resizing the window feels quite janky if the user has no scrollback buffer: as the user widens and narrows the window, they lose data, because the resize operation is not reversible. To me, the notion of a true logical line understood by the system would feel more natural. The user has no way to guarantee they can scroll back, since they might not be able to control whether one long line consumes their whole 1000 lines of scrollback buffer.

4 - Scoped Control Characters

I tested with the RTL override and it didn't seem to be supported at all by the terminal. But these characters seem like a pretty scary / open question.

Write-Host "`u{202E}ABC`u{202C}_`u{202E}" -NoNewLine; 1..100 | ForEach-Object { Write-Host "ABC" -NoNewLine }

Actual

image

Expected

Notepad
image

U+206E (National Digit Shapes) also seems to be ignored. Requires changing Control Panel -> Regional Format -> Arabic (Saudi Arabia).

Write-Host "1234567890 `u{206E}1234567890"

Actual

image

Expected

Notepad
image

5 - Wide Spanning OpenType lookup tables

So, regarding the first repro example I gave: it doesn't hold up, as the assumption I wrote about the engine doesn't seem to be true at the moment (but perhaps that's the next step in the works?). I think the terminal currently gets lucky that it has not yet implemented #1860 with support for infinitely wide lines, because that will open up the full extent of this bug, assuming whole lines need to be shaped all at once and will sometimes be too big to hold in memory or process at once.

But I think it is reasonable for users to expect paragraphs of text they output on the terminal to still support contextual alternates (like Numderline) and other shaping within their paragraph (long line in this case) without the terminal injecting its own formatting / breaking the user's formatting.

I'd argue this is quite a common occurrence, more so than the first-glance estimate suggests. At first glance, 1 of 80 characters falling on a forced line break makes it a rare 1.25% occurrence per line, but users with small, resized, or actively resizing windows will go through every size and hit all of those edge cases.


@lhecker commented on GitHub (Oct 24, 2023):

Sorry for responding late. I forgot to set myself a reminder for responding to this.
Allow me to respond to each of the 5 points above in order:

1 - Line Break on Bytes

That issue is fixed in the latest AtlasEngine version in Windows Terminal 1.18 and later:
image

2 - No Unicode Line Breaking

That is unfortunately something we do intentionally. Terminals traditionally do not adhere to many parts of the Unicode spec since they were designed before Unicode was a thing. For instance, vim has a built-in functionality to print Hebrew/Arabic text in reverse, because it expects that the hosting terminal doesn't support RTL overrides/detection. The same is true for line breaks and it's traditionally expected that proper word-wise line breaks don't exist. But we don't properly support grapheme clusters either and that's making the issue worse than it should be. We're tracking this with #8000 and I'm actively working on implementing grapheme cluster support right now (and have been for a while).
Additionally, a new Unicode working group is currently forming to discuss these issues and it's possible that in the future we might have a Unicode spec that specifies how this should be handled.
Once we have grapheme cluster support, we could consider adding support for TR14 line breaks, but I would not be in favor of implementing it right away, because I suspect it would not be widely used at all (most terminals don't implement anything like that either, after all) and thus would not be worth maintaining.

3 - Irreversible window resizes

We're tracking this at #15976. It'll unfortunately take a while to get this addressed.

4 - Scoped Control Characters

RTL overrides are tracked in #12711. I'll look into the U+206E support.

5 - Wide Spanning OpenType lookup tables

I'm not entirely sure I understand you there... Are you saying we should shape entire lines of text at a time, without the terminal breaking them into lines to fit them into the viewport width? (This might be difficult to achieve due to the previous "No Unicode Line Breaking" point.)


All in all, none of the above are related to AtlasEngine specifically yet, apart from the U+206E support. We could open smaller, more specific issues instead.

Reference: starred/terminal#20654