RTL text in conhost is no longer rendered correctly #16547

Closed
opened 2026-01-31 05:15:20 +00:00 by claunia · 24 comments

Originally created by @j4james on GitHub (Jan 30, 2022).

Originally assigned to: @lhecker on GitHub.

Windows Terminal version

Commit eb7559733d3cc9062c7de610f3b95d9143099ca1

Windows build number

10.0.19041.1415

Other Software

No response

Steps to reproduce

  1. Build a recent version of OpenConsole.
  2. Open a conhost bash shell.
  3. Execute the following command: printf "\u05ea\u05d7\u05d0\n"

Expected Behavior

RTL characters should be displayed in the exact order they were output, and not reversed. This is what it looks like in my inbox conhost (10.0.19041.1415):

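![image](https://user-images.githubusercontent.com/4181424/151705192-2d96e962-999b-40c6-8810-2cc9a570315a.png)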

This also matches the behaviour of XTerm.

Actual Behavior

In the current version of OpenConsole (I think since PR #10478), RTL characters are reversed, like this:

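![image](https://user-images.githubusercontent.com/4181424/151705205-a0692267-57a5-4c5f-93c7-bb4647f819ba.png)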

I realise that some people might consider this a good thing, since it gives the superficial appearance that it's rendering RTL languages correctly, but it is not compatible with the original conhost and breaks genuine RTL-aware applications (which rely on characters being displayed exactly where they've been positioned).
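
To illustrate what "RTL-aware" means here: the repro string U+05EA U+05D7 U+05D0 is the reverse of U+05D0 U+05D7 U+05EA, which suggests the characters are being sent in visual (left-to-right cell) order rather than logical (reading) order. The following is a minimal sketch of that application-side step, assuming a run that is known to contain only RTL letters. The name `toVisualOrder` is hypothetical, and this is not the Unicode Bidirectional Algorithm, which also handles mixed-direction text, digits and punctuation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: convert a purely RTL run from logical (reading) order
// to visual order, i.e. the order in which cells are filled left to right.
// Assumption: the run contains no LTR text, digits or combining marks.
std::u32string toVisualOrder(std::u32string logical)
{
    std::reverse(logical.begin(), logical.end());
    return logical;
}
```

An application that does this relies on the terminal leaving each character in the cell it was written to. A terminal that reorders the run again would undo the reversal.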


@j4james commented on GitHub (Jan 30, 2022):

I'm not suggesting we revert PR #10478, since I'd hate to lose the benefits we get from that, but I think the RTL behaviour could be fixed by inserting an additional step to calculate the glyph indexes with GetCharacterPlacementW, before calling ExtTextOutW with ETO_GLYPH_INDEX. As long as we don't set the GCP_REORDER flag, the characters should be displayed in the original buffer order.


@j4james commented on GitHub (Mar 1, 2022):

@DHowett Don't want to nag, but note that this is a regression in conhost, and I'm a little concerned that it hasn't been triaged and may have been overlooked.


@zadjii-msft commented on GitHub (Mar 7, 2022):

Sorry, yes this was overlooked. I think mentally I kinda go "yep, I'm sure that's a real bug" when I see your name as the filer 😋 I'll toss this in 1.14. We should fix this for the OS version of conhost.


@DHowett commented on GitHub (Mar 12, 2022):

Pulled triage. Sorry @j4james, I've been snowed in on e-mail as I had to leave to take care of some family stuff. Thanks for the first pass, Mike.
d


@DHowett commented on GitHub (Mar 15, 2022):

Yes. This is very important for us to fix. /cc @alabuzhev for thoughts on how ExtTextOut makes our lives more difficult here.


@j4james commented on GitHub (Mar 15, 2022):

FYI, my quick hack fix for this was to replace the ExtTextOutW call here:

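https://github.com/microsoft/terminal/blob/dacff61f8862fa7c28f0244e74555cb2658455ad/src/renderer/gdi/paint.cpp#L448

with something like this:

```cpp
// Scratch buffer for the glyph indexes (hack: fixed size, no bounds handling).
std::array<wchar_t, 1000> glyphs;
GCP_RESULTS results{};
results.lStructSize = sizeof(results);
results.lpGlyphs = glyphs.data();
results.nGlyphs = gsl::narrow_cast<UINT>(glyphs.size());
// The final dwFlags argument is 0, so GCP_REORDER is not set and the glyphs stay in buffer order.
GetCharacterPlacementW(_hdcMemoryContext, t.lpstr, t.n, GCP_MAXEXTENT, &results, 0);
// Draw by glyph index instead of by character.
if (!ExtTextOutW(_hdcMemoryContext, t.x, t.y, t.uiFlags | ETO_GLYPH_INDEX, &t.rcl, results.lpGlyphs, results.nGlyphs, t.pdx))
```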

Obviously not intended to be production code, but you get the idea.


@alabuzhev commented on GitHub (Mar 15, 2022):

> RTL characters should be displayed in the exact order they were output, and not reversed

I wish this was true. And also each character occupied exactly one cell. And no zero width. And no surrogates. And no clusters. And so on and so forth. Unfortunately, text processing is a PITA.

> but it is not compatible with the original conhost and breaks genuine RTL-aware applications (which rely on characters being displayed exactly where they've been positioned)

Then the new and shiny Windows Terminal is also not compatible with the original conhost and breaks such RTL-aware applications. Are there any complaints from their maintainers? Should it be fixed there too? And in conhost DX renderer? And in conemu, console2 and other similar frontends?

Overall, my experience here is extremely limited, I don't work with RTL and can't say how it should be. @trexinc, I remember somewhat related discussions eons ago on the forum about how the console should behave with RTL to make life less painful. Do you have any opinion about this?


@alabuzhev commented on GitHub (Mar 16, 2022):

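Speaking about compatibility with the original conhost: as mentioned [here](https://github.com/microsoft/terminal/issues/10472#issuecomment-865238372), font fallback used to work in pre-Windows 7 days, when NtGdiConsoleTextOut was used. I've just checked this on Windows XP and RTL is also reversed there:

![image](https://user-images.githubusercontent.com/11453922/158492603-46bcd026-5770-43a4-bd4d-2f80d4e30260.png)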


@j4james commented on GitHub (Mar 16, 2022):

> Then the new and shiny Windows Terminal is also not compatible with the original conhost and breaks such RTL-aware applications.

It's been fixed in the new atlas render.


@lhecker commented on GitHub (Mar 16, 2022):

@j4james I believe this doesn't work with font fallback. I think if you try to draw Japanese text for instance, it'll show just blank / whitespace glyphs.

As far as I can see the only way to resolve this issue, while having both, font fallback and broken RTL support, is to use ScriptItemize, then ScriptShape, ScriptPlace and finally ScriptTextOut. That way we can set fLogicalOrder in SCRIPT_ANALYSIS to TRUE, ensuring we skip glyph reordering (if I understand the docs correctly). I don't even see any undocumented escape hatches for ExtTextOutW internally unfortunately.

@alabuzhev Wait... Did you paste them exactly as \u05ea\u05d7\u05d0 into the XP console? I would be somewhat surprised if we had supported RTL reordering back then... But if it used to work, then I wonder what the actually correct path forward is.


@alabuzhev commented on GitHub (Mar 16, 2022):

> Did you paste them exactly as \u05ea\u05d7\u05d0 into the XP console?

Yes, this as is: תחא

Moreover:

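![image](https://user-images.githubusercontent.com/11453922/158494279-d7650261-9974-4964-88ef-7b39252149b7.png)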


@lhecker commented on GitHub (Mar 16, 2022):

> It's been fixed in the new atlas render.

While I would love to take that compliment as the author of the engine as is, I have to confess that this is unfortunately more like a side-effect from me not implementing RTL/BiDi support at all. 😕
Up until today I simply assumed that people are really really disappointed in Windows not properly supporting BiDi text and glyph reordering. Practically most popular UNIX terminals reorder their glyphs after all... Also I can't quite imagine how manually reordered Arabic glyphs would work... But I guess applications rely on this now?


@alabuzhev Wow! This is impressive! On one hand I think I now understand that we'll likely have to revert the ExtTextOutW benefits at least in parts, so that we don't break applications which rely on our broken behavior, but on the other hand... Just wow! This makes me at least personally somewhat conflicted about re-breaking glyph reordering. 😅


@alabuzhev commented on GitHub (Mar 16, 2022):

> so that we don't break applications which rely on our broken behavior

That's kinda my point - do you already have dozens of reports like "things are broken, the world is falling apart, do something now"?
Support for anything non-ASCII in Windows Console has always been like "it depends" - on the OS version, current font, system locale, console codepage, output method, the phase of the Moon etc. Personally I haven't seen any applications even trying to cover non-trivial cases, but YMMV of course.


@j4james commented on GitHub (Mar 16, 2022):

> @j4james I believe this doesn't work with font fallback. I think if you try to draw Japanese text for instance, it'll show just blank / whitespace glyphs.

Yeah, you're right. I've just tested and that's not working for me either. Oh well.

> I would be somewhat surprised if we had supported RTL reordering back then... But if it used to work, then I wonder what the actually correct path forward is.

Yeah, that's weird. It's definitely not reordering RTL characters for me in the legacy console.

> But I guess applications rely on this now?

That would be because it's almost impossible to write an RTL application on terminals that don't work this way. Give it a try. See if you can write some basic RTL applications on one of those terminals that reorders RTL characters. Like a simple RTL form entry system, or something that pops up a dialog or drop-down menu over existing RTL text. Maybe I'm just an idiot, but I can't see how you can make that work, but it's fairly straightforward on terminals that leave RTL characters exactly where you put them.


@lhecker commented on GitHub (Mar 16, 2022):

So I think we have 3 options here with various benefits:

  1. Keep ExtTextOutW - No work needed
    According to Wikipedia's web statistics I can guess that about 50% of Windows users don't use Latin characters for their primary language and about 10% use RTL scripts. The addition of font fallback has a very far reaching positive impact on our users, which so far were unable to use fonts like Consolas.
  2. Revert to PolyTextOutW - Minutes of work required
    Normally stability trumps anything else when it comes to conhost. Keeping the output of glyphs in their logical order, ensures we don't accidentally break Hebrew TUI applications.
  3. Use Uniscribe manually - Potentially days of work required
    This would fix the issue and show glyphs in their logical order.

Regarding 2. and 3.: I'm pretty sure that this will re-break Arabic scripts, since those heavily lean on ligatures and glyph reordering to render correctly. So conhost using logical order, with the TUI application writing the glyphs backwards manually, will in practice only work for Hebrew as far as I can see, since Hebrew is a bit like "Latin in RTL".
Allowing Hebrew to work with Bidi-aware TUI applications, but making it impossible (or very hard) to use Arabic correctly, despite the latter being 10x more common, leaves a bit of a bad aftertaste in my opinion. Personally I'm leaning towards breaking Bidi-aware TUI applications, but allowing Arabic users to read their language.

However I understand that we'd not want to take any chances in regressions and would thus consider opting for 2. @miniksa?


@miniksa commented on GitHub (Mar 16, 2022):

As far as I can discern, no one ever actually concerned themselves with Arabic nor Hebrew support in the console host. The targeted languages were basically LTR European type character sets + the CJK trio. Beyond that... it looks like anything else that worked or didn't was a happy accident.

Furthermore, when our localization team tells us what languages we can pay for in terms of translations for developer utilities today... they limit it to: German, English, Spanish, French, Italian, Japanese, Korean, Brazilian Portuguese, Russian, Simplified Chinese, and Traditional Chinese. I'd therefore have to believe to some degree that research was performed to determine that the appropriate balance between resources and developer market was to focus on those languages.

Therefore, my consideration here is happiness of those languages as primary goal with anything else being secondary.

Further, one of the most popular issues filed against conhost.exe is the lack of font fallback for Chinese, Japanese, and Korean languages. Switching to ExtTextOut (Option 1) to restore font fallback, therefore, dramatically reduced our inbound bug flow and solved an issue for four of the targeted languages.
An issue I've never seen filed in Feedback Hub, directly from our OEM customers, our business partners, or otherwise in the last 7-8 years of working on this is anything about Hebrew or Arabic. I know that's super scientific... to rely on my past experience.

But with the combination of those reasons, I would have to personally opt for Option 1.

I would offer to @lhecker, if he's interested, that next week is our organization's "Fix Hack Learn" week again. If he wants to spend a few days hacking Option 3 using Uniscribe to solve this problem and learn more about language processing... he would be free to do so. I think it would be better, though, long-term to focus efforts on supporting those languages fully in the Terminal and the Atlas renderer.

The discussion can continue, I'm not shutting it down. This is just my opinion on the situation.


@j4james commented on GitHub (Mar 16, 2022):

I have a suggestion for another possible solution which may keep everyone happy.

We carry on using ExtTextOut, but if we detect that the string contains RTL characters, we switch to a slower rendering branch that outputs one character at a time. That way the characters should all be displayed in the right place (by which I mean they won't be reordered).
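
A rough sketch of how such a split could look, in portable C++ rather than actual renderer code. It only models the decision of where to break a run and does not call GDI. `DrawSegment`, `isRtlUnit` and `splitForDrawing` are hypothetical names, the RTL test is a placeholder range, and the code assumes one UTF-16 unit per cell, which ignores wide glyphs and surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical description of one text-out call: the cell column where the
// slice starts, plus the slice itself. Assumes one UTF-16 unit per cell.
struct DrawSegment
{
    std::size_t column;
    std::u16string_view text;
};

// Placeholder RTL test (assumption): a single coarse range for brevity.
constexpr bool isRtlUnit(char16_t ch) noexcept
{
    return ch >= 0x0590 && ch <= 0x08FF;
}

// Split a run so that non-RTL stretches stay batched in one segment while
// every RTL unit becomes its own single-character segment.
std::vector<DrawSegment> splitForDrawing(std::u16string_view run)
{
    std::vector<DrawSegment> segments;
    std::size_t start = 0; // first unit of the pending non-RTL stretch
    for (std::size_t i = 0; i < run.size(); ++i)
    {
        if (!isRtlUnit(run[i]))
        {
            continue;
        }
        if (i > start)
        {
            // Flush the batched non-RTL text that precedes this unit.
            segments.push_back({ start, run.substr(start, i - start) });
        }
        // The RTL unit is drawn alone, at the column it was written to.
        segments.push_back({ i, run.substr(i, 1) });
        start = i + 1;
    }
    if (start < run.size())
    {
        segments.push_back({ start, run.substr(start) });
    }
    return segments;
}
```

For a run without RTL characters this returns a single segment, so the fast path would be unchanged; only runs containing RTL characters pay for the extra calls.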


@lhecker commented on GitHub (Mar 16, 2022):

How do you detect runs of RTL glyphs? If the answer is Uniscribe, I think we can just go all the way and use it for text drawing too...
Especially since we can't output them characterwise, as that would break ligatures, ZWJ, etc. and all the other fun Unicode stuff. Rendering glyphs in their logical order with Unicode support is only possible if we opt into Uniscribe 100% I think.


@j4james commented on GitHub (Mar 17, 2022):

> How do you detect runs of RTL glyphs?

I was thinking of something simple like a range check. Haven't looked at the unicode blocks in detail, but you could start with everything from U+0590 to U+08FF, and maybe another block covering supplementals. It doesn't really matter if we get false positives, because they're still going to render correctly - just a bit slower - and they ought to be rare.
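
A minimal sketch of that kind of range check, in portable C++ and not taken from the conhost source. The names `isLikelyRtl` and `containsRtl` are hypothetical. The ranges beyond U+0590 to U+08FF (the presentation forms and two supplementary-plane ranges) are an assumption about what "another block covering supplementals" might cover. They are deliberately coarse, so false positives such as U+FEFF are expected.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if the code point lies in a range that is
// predominantly right-to-left. The ranges are a coarse approximation.
constexpr bool isLikelyRtl(char32_t cp) noexcept
{
    return (cp >= 0x0590 && cp <= 0x08FF)     // Hebrew, Arabic, Syriac, Thaana, NKo, ...
        || (cp >= 0xFB1D && cp <= 0xFDFF)     // Hebrew and Arabic presentation forms
        || (cp >= 0xFE70 && cp <= 0xFEFF)     // Arabic presentation forms B
        || (cp >= 0x10800 && cp <= 0x10FFF)   // supplementary-plane RTL scripts
        || (cp >= 0x1E800 && cp <= 0x1EFFF);  // supplementary-plane RTL scripts
}

// Scan a UTF-16 run (the encoding used by wchar_t buffers on Windows) and
// report whether any code point passes the range check.
constexpr bool containsRtl(std::u16string_view text) noexcept
{
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        char32_t cp = text[i];
        // Combine a valid surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size())
        {
            const char32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (isLikelyRtl(cp))
        {
            return true;
        }
    }
    return false;
}
```

A renderer could call something like `containsRtl` once per run and take the slower path only when it returns true.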

> Especially since we can't output them characterwise, as that would break ligatures, ZWJ, etc. and all the other fun Unicode stuff.

OK ligatures I can see being a problem for Arabic text. I've only really dealt with Hebrew so I'm not sure how well that works. The other unicode stuff seems less of a problem. Does any of that stuff work now? Are we ever expecting it to work in the GDI renderer?


@lhecker commented on GitHub (Mar 17, 2022):

> Does any of that stuff work now? Are we ever expecting it to work in the GDI renderer?

Yeah it does! While Uniscribe doesn't seem to support "liga" in fonts, it does correctly handle ligatures in most languages. You gotta say, this is pretty nice to see in good old conhost, right?
(The output isn't perfect mind you, but this is still really good IMO...)

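![image](https://user-images.githubusercontent.com/2256941/158717532-9bf272b6-3b7c-4217-bc1c-a9da2bdb1495.png)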

I'd be quite sad if we lost that, but I'd understand that it'd be for a good cause.
I've already told @miniksa that I'll attempt to implement a solution for logical glyph ordering next week. I'm at least curious how much we'd "lose" if we disable glyph reordering (aka use "logical ordering"), since I'm not a Unicode expert and I can only guesstimate that it'd probably break Arabic without any possibility to fix it in a TUI application.

Given that most terminals apart from xterm seem to not draw glyphs in their logical order either, I do wonder however whether it's reasonable to merge my fix, even if I submit such a PR later.
I'd go with your opinion @j4james since you're vastly more experienced in this field than me, but I get the feeling that we'd be more consistent with other terminals if we'd actively not support TUI applications implementing their own BiDi (by reordering characters themselves) even if it breaks the TUI's layout...

BTW for my own curiosity: Do you happen to have a specific application at hand that positions Hebrew text manually? This would allow me to better test my Uniscribe- (or char-range-) experiments.


@j4james commented on GitHub (Mar 17, 2022):

> Yeah it does! While Uniscribe doesn't seem to support "liga" in fonts, it does correctly handle ligatures in most languages. You gotta say, this is pretty nice to see in good old conhost, right?

Yeah, I saw the ligatures were working. I meant the other things you were referring to when you said "ZWJ, etc. and all the other fun Unicode stuff".

And while it looks nice at first glance, it's not particularly useful as is as far I'm concerned. You've got no hope of editing the text - all it's really good for is displaying a single line of content at best.

> I'd be quite sad if we lost that, but I'd understand that it'd be for a good cause.

Yeah, ideally we'd have a solution that was realistically usable and also looked pretty, but I don't know how feasible that is for languages with ligatures. I thought with something like Arabic, an application might be able to output the appropriate form of each character manually, which might make up for the loss of ligature support, but I don't know enough about the subject to know if that's nonsense.
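To make the idea of an application emitting contextual forms itself a bit more concrete, here is a minimal sketch. It only handles a single letter, ARABIC LETTER BEH (U+0628), and picks between its four code points in the Arabic Presentation Forms-B block. The function name `shape_beh_only` is hypothetical, and a real shaper would need joining data for every letter, so treat this as an illustration of the principle rather than a workable solution.

```cpp
#include <cstddef>
#include <string>

// ARABIC LETTER BEH and its presentation forms (Arabic Presentation Forms-B).
constexpr char32_t kBeh = U'\u0628';
constexpr char32_t kBehIsolated = U'\uFE8F';
constexpr char32_t kBehFinal = U'\uFE90';
constexpr char32_t kBehInitial = U'\uFE91';
constexpr char32_t kBehMedial = U'\uFE92';

// Hypothetical helper: replaces each BEH with the form that matches its
// neighbours. Input is in logical order. Only BEH is treated as a joining
// letter here, which is a deliberate simplification.
std::u32string shape_beh_only(const std::u32string& logical)
{
    std::u32string out = logical;
    for (std::size_t i = 0; i < logical.size(); ++i)
    {
        if (logical[i] != kBeh)
        {
            continue;
        }
        const bool joins_prev = i > 0 && logical[i - 1] == kBeh;
        const bool joins_next = i + 1 < logical.size() && logical[i + 1] == kBeh;
        out[i] = joins_prev ? (joins_next ? kBehMedial : kBehFinal)
                            : (joins_next ? kBehInitial : kBehIsolated);
    }
    return out;
}
```

With three consecutive BEH letters the result is initial, medial, final (in logical order), and each form still occupies exactly one cell, which is the property a TUI application would rely on.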

> Given that most terminals apart from xterm seem to not draw glyphs in their logical order either

I wouldn't have said "most", but I haven't checked recently. And for those that do draw the glyphs in RTL order, there's not a standard of any sort that they're following - they all do things differently. Thankfully some of them at least have a way to turn that functionality off.

> if we'd actively not support TUI applications implementing their own BiDi

You realise that just means we're saying we don't support TUI applications full stop (at least for BiDi languages). If that's the route we want to take, that's fine - I seem to be in the minority in wanting support for RTL TUI apps. I'd just like to know for definite where we're going with this, so I can make my own plans accordingly.

> BTW for my own curiosity: Do you happen have a specific application at hand that positions Hebrew text manually?

Well there is the command line utility from fribidi (https://github.com/fribidi/fribidi), which can be used as a kind of bidi-aware version of cat (amongst other things). And there's also a Hebrew mode in vim (i.e. vim -H). The other applications I have are unfortunately not open source.


@lhecker commented on GitHub (Mar 17, 2022):

So instead of creating a Unicode standard at some point to standardize the (cell) width per grapheme cluster, applications started to straight up write "characters" in reverse. That's actually scary. 😨

Thanks for the tip with vim -H. I wasn't aware of that functionality.
You're also right that other terminals make this configurable.

In either case I'm convinced now and will make sure to build something that restores the previous behavior as soon as I can. I mean I already planned to do it, but now I'm doing it out of conviction. 😄
I don't think I'll go with your idea however (drawing text character-wise if RTL is detected), as I think that's categorically the wrong approach. At least from all I know, I'm pretty sure that character-wise drawing has significant flaws. The simplest example of that I can think of is ligatures again, where א and ל are two separate characters in Hebrew, but can form ﭏ if written next to each other. (I seriously wonder how vim -H supports that... I guess they just assume the font doesn't support this, since ﭏ is old Hebrew.)
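The ligature problem can be shown with plain code points. The sketch below uses the precomposed code point U+FB4F (HEBREW LIGATURE ALEF LAMED) as a stand-in for what a font would normally do through glyph substitution. The helper `compose_alef_lamed` is hypothetical and only exists to show that two logical characters collapse into one drawn unit, which a strictly one-character-per-cell renderer cannot express.

```cpp
#include <cstddef>
#include <string>

constexpr char32_t kAlef = U'\u05D0';              // HEBREW LETTER ALEF
constexpr char32_t kLamed = U'\u05DC';             // HEBREW LETTER LAMED
constexpr char32_t kAlefLamedLigature = U'\uFB4F'; // HEBREW LIGATURE ALEF LAMED

// Hypothetical helper: replaces every alef immediately followed by lamed
// (logical order) with the single precomposed ligature code point.
std::u32string compose_alef_lamed(const std::u32string& logical)
{
    std::u32string out;
    for (std::size_t i = 0; i < logical.size(); ++i)
    {
        const bool is_pair = logical[i] == kAlef &&
                             i + 1 < logical.size() &&
                             logical[i + 1] == kLamed;
        if (is_pair)
        {
            out.push_back(kAlefLamedLigature);
            ++i; // skip the lamed that was merged into the ligature
        }
        else
        {
            out.push_back(logical[i]);
        }
    }
    return out;
}
```

Two input characters become one output unit, so anything that assumes "one character, one cell" disagrees with the shaped result about where the following text starts.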
Uniscribe is generally a lot faster than DirectWrite and I don't think we'll run into any performance problems any time soon, even if we make full use of it here. This is especially so, since ExtTextOut is implemented in terms of Uniscribe anyways.

I know approximately how to draw Unicode with Uniscribe, but it'll probably take me ages to integrate that into the GdiEngine... Implementing the ligature and wide glyph support in AtlasEngine was simple since it was a new project after all (which is why it handles Emojis with ZWJs for instance).
But I'll manage... I hope. 😅


@j4james commented on GitHub (Mar 17, 2022):

> I don't think I'll go with your idea however (drawing text character-wise if RTL is detected), as I think that's categorically the wrong approach.

I agree with you there. I just thought it might be better than nothing, but if there's a way to do things properly with Uniscribe, I would be thrilled.

One thing you may need to watch out for when testing is support for horizontal scrolling in conhost (disable the "wrap text output on resize" option, and make the buffer size wider than the window size). When the viewport isn't at the left margin, it can start rendering halfway through the buffer, which breaks things completely in the current implementation when RTL text is reordered.

Hopefully it won't be a problem if we're going back to rendering in logical order again, but you may get some weird artifacts when ligatures are split across the viewport border. You'll likely have similar problems when selecting text, and cursoring over text (depending on the cursor type). But I don't think it's the end of the world if we don't have all the edge cases working perfectly to start with.

Also note that the DECDWL double-width sequence has an effect on the horizontal offsets in the viewport, so that needs to be accounted for too.
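As a rough illustration of why DECDWL matters for the offsets: on a double-width line every character cell is drawn two screen columns wide, so a horizontal viewport offset measured in screen columns corresponds to half as many buffer columns. The names `LineWidth` and `screen_to_buffer_column` below are hypothetical, and this is only a sketch of the arithmetic, not a claim about how the renderer is actually structured.

```cpp
#include <cstddef>

// Hypothetical type: whether a row uses normal or DECDWL double-width cells.
enum class LineWidth
{
    Single,
    Double
};

// Maps a screen column to the buffer column that is drawn there.
// On a double-width row two screen columns share one buffer column.
constexpr std::size_t screen_to_buffer_column(std::size_t screen_column, LineWidth width)
{
    return width == LineWidth::Double ? screen_column / 2 : screen_column;
}
```

So if the viewport is scrolled ten columns to the right, a single-width row starts drawing at buffer column 10, while a double-width row starts at buffer column 5.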

And one last tip for testing. If, like me, you don't speak any RTL languages, I've found it helpful to use nonsense Hebrew content that looks vaguely like English, so I can more easily tell when something has gone wrong.

For example, the phrase below looks a bit like "young puppy won't nip on jogging pony".

ץחסק פחופפסנ חס קוח ז׳חסש ץקקטק פחטסץ

And the equivalent reversed text (which should look correct when the renderer doesn't do RTL reordering):

ץסטחפ קטקקץ שסח׳ז חוק סח נספפוחפ קסחץ
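For anyone wanting to generate such reversed test strings from other content, a minimal sketch follows. It reverses by code point, which is fine for plain Hebrew letters like the sample above, but it is an assumption that the input has no combining marks or surrogate-dependent sequences (those would need grapheme-aware reversal). The function name `reverse_code_points` is made up for this example.

```cpp
#include <algorithm>
#include <string>

// Reverses a UTF-32 string code point by code point. This produces the
// "pre-reversed" form that looks right on a renderer which draws characters
// in the exact order they were written. Not safe for combining sequences.
std::u32string reverse_code_points(std::u32string text)
{
    std::reverse(text.begin(), text.end());
    return text;
}
```

Applied to the three letters from the original repro (U+05EA, U+05D7, U+05D0) this yields U+05D0, U+05D7, U+05EA, and applying it twice gives back the input.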

@lhecker commented on GitHub (Mar 18, 2022):

@j4james Damn that was almost too easy - took like 5 minutes: https://github.com/microsoft/terminal/compare/dev/lhecker/12294-bidi-override

diff --git a/src/renderer/gdi/paint.cpp b/src/renderer/gdi/paint.cpp
index 598ce3489..bc4c4e95a 100644
--- a/src/renderer/gdi/paint.cpp
+++ b/src/renderer/gdi/paint.cpp
@@ -445,9 +445,21 @@ using namespace Microsoft::Console::Render;
         for (size_t i = 0; i != _cPolyText; ++i)
         {
             const auto& t = _pPolyText[i];
-            if (!ExtTextOutW(_hdcMemoryContext, t.x, t.y, t.uiFlags, &t.rcl, t.lpstr, t.n, t.pdx))
+
+            SCRIPT_STATE ss{};
+            ss.fOverrideDirection = TRUE;
+
+            SCRIPT_STRING_ANALYSIS ssa;
+            hr = ScriptStringAnalyse(_hdcMemoryContext, t.lpstr, t.n, 0, -1, SSA_GLYPHS | SSA_FALLBACK | SSA_LINK, 0, nullptr, &ss, t.pdx, nullptr, nullptr, &ssa);
+            if (FAILED(hr))
+            {
+                break;
+            }
+
+            hr = ScriptStringOut(ssa, t.x, t.y, t.uiFlags, &t.rcl, 0, 0, FALSE);
+            ScriptStringFree(&ssa);
+            if (FAILED(hr))
             {
-                hr = E_FAIL;
                 break;
             }
         }

I think typing this message took longer than writing that code. 😄

ScriptStringAnalyse is a handy function that calls ScriptItemize, ScriptShape, ScriptPlace, and ScriptBreak for you.
Due to the lack of batching this approach is a lot slower than ExtTextOut though. My plan is to call those 4 functions myself (well 3, because we don't really need ScriptBreak) and call ScriptIsComplex. If it's false I can just straight up call ExtTextOut to ensure the expected performance in the general case.

If I can't make it for whatever reason though, I think this is what we could ship, since it works.
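The fast-path idea described above could look roughly like the following. To keep the sketch portable it does not call Uniscribe at all: `looks_complex` is a hypothetical stand-in for ScriptIsComplex that only checks the Hebrew through Arabic Extended-A range, whereas the real API knows about many more scripts. `DrawPath` and `choose_draw_path` are likewise made-up names, and the actual drawing calls are left out.

```cpp
#include <string>

// Which drawing routine a run of text should take.
enum class DrawPath
{
    FastExtTextOut, // plain text: batched ExtTextOut-style drawing
    ShapedUniscribe // complex text: itemize, shape and place explicitly
};

// Stand-in for ScriptIsComplex. Assumption: treating U+0590..U+08FF
// (Hebrew, Arabic, Syriac, Thaana, NKo and neighbouring RTL blocks) as
// "complex" is enough for illustration. The real function covers more.
bool looks_complex(const std::u16string& run)
{
    for (const char16_t unit : run)
    {
        if (unit >= 0x0590 && unit <= 0x08FF)
        {
            return true;
        }
    }
    return false;
}

// Picks the cheap path unless the run contains text that needs shaping.
DrawPath choose_draw_path(const std::u16string& run)
{
    return looks_complex(run) ? DrawPath::ShapedUniscribe : DrawPath::FastExtTextOut;
}
```

The point of the split is that typical ASCII-heavy console output never pays for shaping, and only runs that actually contain RTL or otherwise complex text go through the slower route.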
