Dedicated "paint" interface for VtRenderer #14441

Closed
opened 2026-01-31 04:10:26 +00:00 by claunia · 11 comments
Owner

Originally created by @skyline75489 on GitHub (Jul 9, 2021).

Description

As shown in #10563, ConPTY suffers from significant performance drop with too much scrolling request. A typical repro is:

time bash -c 'yes | head -n1000000'

This is a typical trace of OpenConsole.exe under the above workload:

image

Discussion

The main issue that caused this, I think, is that VtRenderer suffers from using the same interface as other graphic renderer. To break it down:

  1. Carriage return & linefeed can trigger graphic renderer to circling the buffer & force repaint. But for VtRenderer, it dosen't really need to do too much, other than print the newly added line. It doesn't really need to flush the buffer (EndPaint as shown above), because the actual text added to the buffer is only a single line (120 chars in a typical display).
  2. The _PaintBufferOutputHelper method insider renderer is obviously graphic-oriented. After all these years of development, I don't think it suits the need for VtRenderer anymore. An example is #10567, in which I try to reduce the cost of reassemble the line buffer. The reason why we need to reassemble the clusters, is because they are produced by splitting up the original buffer in _PaintBufferOutputHelper, which is heavily required by graphic renderers. If we have a dedicated interface, unnecessary operations like these can be eliminated.
  3. In VtEngine & XtermEngine, the actual number of methods that're both overridden and practically useful is rather small, comparing to the large amount of base methods in IRenderEngine. This indicates that the fundamental goal of VtEngine isn't really the same as the graphic renderers.

Proposed solution

I'm proposing a three-phase refactoring process for VtEngine to move away from IRenderEngine:

  1. Separate the methods in VtEngine that are heavily burdened by the current design, such as _PaintBufferOutputHelper. We add them as base methods to IRenderEngine. VtEngine is still a child class of IRenderEngine, but we add dedicated IO methods in it and test the performance gain.
  2. Separate the rest the methods in VtEngine same way as above.
  3. When we finished the first two steps, we should have a clear path towards adding a dedicated interface just for VtEngine, probably name it IVtEngine. Then we can create IVtEngine from IRenderEngine, and let IRenderEngine be what's left in it.

CC @DHowett @miniksa @zadjii-msft @lhecker @j4james for discussion.

(How can I just At everyone who's collaborator of this project. That would be helpful.)

Originally created by @skyline75489 on GitHub (Jul 9, 2021). ### Description As shown in #10563, ConPTY suffers from significant performance drop with too much scrolling request. A typical repro is: ```bash time bash -c 'yes | head -n1000000' ``` This is a typical trace of `OpenConsole.exe` under the above workload: ![image](https://user-images.githubusercontent.com/4710575/125062000-809cfc00-e0e0-11eb-9d0d-c9ec9e95884b.png) ### Discussion The main issue that caused this, I think, is that VtRenderer suffers from using the same interface as other graphic renderer. To break it down: 1. Carriage return & linefeed can trigger graphic renderer to circling the buffer & force repaint. But for VtRenderer, it dosen't really need to do too much, other than print the newly added line. It doesn't really need to flush the buffer (`EndPaint` as shown above), because the actual text added to the buffer is only a single line (120 chars in a typical display). 2. The `_PaintBufferOutputHelper` method insider `renderer` is obviously graphic-oriented. After all these years of development, I don't think it suits the need for VtRenderer anymore. An example is #10567, in which I try to reduce the cost of reassemble the line buffer. The reason why we need to reassemble the clusters, is because they are produced by splitting up the original buffer in `_PaintBufferOutputHelper`, which is heavily required by graphic renderers. If we have a dedicated interface, unnecessary operations like these can be eliminated. 3. In `VtEngine` & `XtermEngine`, the actual number of methods that're both overridden and practically useful is rather small, comparing to the large amount of base methods in `IRenderEngine`. This indicates that the fundamental goal of `VtEngine` isn't really the same as the graphic renderers. ### Proposed solution I'm proposing a three-phase refactoring process for VtEngine to move away from `IRenderEngine`: 1. Separate the methods in `VtEngine` that are heavily burdened by the current design, such as `_PaintBufferOutputHelper`. We add them as base methods to `IRenderEngine`. `VtEngine` is still a child class of `IRenderEngine`, but we add dedicated IO methods in it and test the performance gain. 1. Separate the rest the methods in `VtEngine` same way as above. 1. When we finished the first two steps, we should have a clear path towards adding a dedicated interface just for `VtEngine`, probably name it `IVtEngine`. Then we can create `IVtEngine` from `IRenderEngine`, and let `IRenderEngine` be what's left in it. CC @DHowett @miniksa @zadjii-msft @lhecker @j4james for discussion. (How can I just At everyone who's collaborator of this project. That would be helpful.)
claunia added the Issue-FeatureNeeds-TriageArea-VTProduct-Conpty labels 2026-01-31 04:10:27 +00:00
Author
Owner

@skyline75489 commented on GitHub (Jul 9, 2021):

Ideally I would really want this to be a smooth transition just as my previous DxFontRenderData related work. But for this particular issue, the difficulty is obvious. Personally I'm not very familiar with legacy conhost code, which is where I think most of the obstacles reside. I could really use some help from you guys on this. I hate to break legacy code and cause troubles for the team.

I'm expecting 50x-100x increment in throughput out of this task, which should be easily measurable after the first step is finished.

@skyline75489 commented on GitHub (Jul 9, 2021): Ideally I would really want this to be a smooth transition just as my previous `DxFontRenderData` related work. But for this particular issue, the difficulty is obvious. Personally I'm not very familiar with legacy conhost code, which is where I think most of the obstacles reside. I could really use some help from you guys on this. I hate to break legacy code and cause troubles for the team. I'm expecting 50x-100x increment in throughput out of this task, which should be easily measurable after the first step is finished.
Author
Owner

@miniksa commented on GitHub (Jul 9, 2021):

In your trace... can you expand EndPaint? Is it the WriteFile call that is making it take so long?

I don't necessarily want to start from the conclusion that we need to throw the whole architecture out when it could just be that we need to decouple the blocking write action from the rest of the processing that generates text.

@miniksa commented on GitHub (Jul 9, 2021): In your trace... can you expand `EndPaint`? Is it the `WriteFile` call that is making it take so long? I don't necessarily want to start from the conclusion that we need to throw the whole architecture out when it could just be that we need to decouple the blocking write action from the rest of the processing that generates text.
Author
Owner

@lhecker commented on GitHub (Jul 9, 2021):

An interim fix would indeed be to add a buffer like the til::spsc queue in between the renderer and WriteFile call.

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.
For instance printing a newline at the moment necessarily triggers a VT draw (as can be seen above), which is quite heavy, requires all sorts of class instantiations and flushes the output. Preventing this means adding more special case logic. Inversely the drawing-oriented renderer code is bound to also support the VT renderer and its quirks. We basically have to fit a single shoe to all feet at the moment.
If we didn't use the renderer infrastructure for VT, we could potentially reduce code complexity and quite likely improve performance rather effectively.

Basically what I'm saying is: If we add a buffer and decouple the renderer and WriteFile call, isn't this a "band aid fix", because the architecture doesn't support what we want to do? Solving this from the ground up with our 2021 knowledge how we need to do things, might allow us to drastically simplify our code.

@lhecker commented on GitHub (Jul 9, 2021): An interim fix would indeed be to add a buffer like the `til::spsc` queue in between the renderer and `WriteFile` call. In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore. For instance printing a newline at the moment necessarily triggers a VT draw (as can be seen above), which is quite heavy, requires all sorts of class instantiations and flushes the output. Preventing this means adding more special case logic. Inversely the drawing-oriented renderer code is bound to also support the VT renderer and its quirks. We basically have to fit a single shoe to all feet at the moment. If we didn't use the renderer infrastructure for VT, we could potentially reduce code complexity and quite likely improve performance rather effectively. Basically what I'm saying is: If we add a buffer and decouple the renderer and `WriteFile` call, isn't this a "band aid fix", because the architecture doesn't support what we want to do? Solving this from the ground up with our 2021 knowledge how we need to do things, might allow us to drastically simplify our code.
Author
Owner

@DHowett commented on GitHub (Jul 9, 2021):

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.

The Vt Rendering engine rather concerns itself with turning a console buffer--which can be written by applications using any of the Win32 console APIs--into a stream of VT that represents that buffer. Yes, we can probably add a path to each API implementation that does something different if conhost is in pty mode, but I don't expect that it's going to provide the correctness (or peri-correctness) that snapping the buffer and presenting it as though it is a screen does.

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

@DHowett commented on GitHub (Jul 9, 2021): > In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore. The Vt Rendering engine rather concerns itself with turning a console buffer--which can be written by applications using any of the Win32 console APIs--into a stream of VT that represents that buffer. Yes, we can probably add a path to each API implementation that does something different if conhost is in pty mode, but I don't expect that it's going to provide the correctness (or peri-correctness) that snapping the buffer and presenting it as though it is a screen does. conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.
Author
Owner

@miniksa commented on GitHub (Jul 9, 2021):

I just always thought the long term VT handling without the renderer framework was VT pass through mode... not rearchitecting the VT renderer.

This is a discussion though so I'm not ruling anything out.

@miniksa commented on GitHub (Jul 9, 2021): I just always thought the long term VT handling without the renderer framework was VT pass through mode... not rearchitecting the VT renderer. This is a discussion though so I'm not ruling anything out.
Author
Owner

@skyline75489 commented on GitHub (Jul 9, 2021):

To Dustin: I like the idea of VtEngine is subclass of IRenderEngine because I think it’s cool and also reasonable. The thing is, there’s also cons in this approach, which is what I listed above. I’m OK with still considering IVtEngine a similar interface as IRenderEngine, because this is the way to minimize the changeset and preserve correctness. I would love to see special handling for IVtEngine in renderer.cpp though, because it provides us with chances for all those optimization.

To Michael: yes WriteFile is one of the things that take long time. But the thing l hate the most (and takes double the time than WritrFile) is in PaintBufferLineHelper. For the example in the issue, if you print 1000000 new line, with 120 chars in a line, you having to call a bunch of code for 120000000 times in PaintBufferLineHelper, some of them are about fetching patternId, which provides no functionality in VtEngine. This is why I feel “there’s something wrong arch in the architecture”. I also consider ConPTY passthrough as the eventual plan. This proposal here is more of a low-hanging fruit comparing to ConPTY passthrough, which requires zero modification in the client side(for example Alacritty).

To Leonard: I was asking Michael about the spsc just yesterday. Michael has his own idea about that so I can’t speak for him.

To everyone: no I am not losing sleep because of the controversy. I am only having trouble with a mosquito or two that I really have to kill now.

获取 Outlook for iOShttps://aka.ms/o0ukef

@skyline75489 commented on GitHub (Jul 9, 2021): To Dustin: I like the idea of VtEngine is subclass of IRenderEngine because I think it’s cool and also reasonable. The thing is, there’s also cons in this approach, which is what I listed above. I’m OK with still considering IVtEngine a similar interface as IRenderEngine, because this is the way to minimize the changeset and preserve correctness. I would love to see special handling for IVtEngine in renderer.cpp though, because it provides us with chances for all those optimization. To Michael: yes WriteFile is one of the things that take long time. But the thing l hate the most (and takes double the time than WritrFile) is in PaintBufferLineHelper. For the example in the issue, if you print 1000000 new line, with 120 chars in a line, you having to call a bunch of code for 120000000 times in PaintBufferLineHelper, some of them are about fetching patternId, which provides no functionality in VtEngine. This is why I feel “there’s something wrong arch in the architecture”. I also consider ConPTY passthrough as the eventual plan. This proposal here is more of a low-hanging fruit comparing to ConPTY passthrough, which requires zero modification in the client side(for example Alacritty). To Leonard: I was asking Michael about the spsc just yesterday. Michael has his own idea about that so I can’t speak for him. To everyone: no I am not losing sleep because of the controversy. I am only having trouble with a mosquito or two that I really have to kill now. 获取 Outlook for iOS<https://aka.ms/o0ukef>
Author
Owner

@miniksa commented on GitHub (Jul 9, 2021):

There are other deficiencies in the architecture which we may be able to capture in a revision with something like flags or stages that each individual engine can opt into or out of to reduce the amount of work that is done in the parent while still maintaining the advantages of a unified interface.

For instance, the DirectX pipeline is having performance troubles as well because PaintBufferLineHelper divides up any time the colors change when it could happily accept a run with multiple color specifications instead of hopping in and out and in and out for every color change.

I just think we might be able to pull those requirements together with the ones for VT and made an adaptation that has the best of all worlds.

I do agree with you that the base was written 100% with GDI in mind as the only target. And it was just convenient to us that the BGFX and WDDM and then the VT and DX ones worked on the same interface, even if suboptimally. It could be time to refine it taking all of their unique considerations in mind.

@miniksa commented on GitHub (Jul 9, 2021): There are other deficiencies in the architecture which we may be able to capture in a revision with something like flags or stages that each individual engine can opt into or out of to reduce the amount of work that is done in the parent while still maintaining the advantages of a unified interface. For instance, the DirectX pipeline is having performance troubles as well because PaintBufferLineHelper divides up any time the colors change when it could happily accept a run with multiple color specifications instead of hopping in and out and in and out for every color change. I just think we might be able to pull those requirements together with the ones for VT and made an adaptation that has the best of all worlds. I do agree with you that the base was written 100% with GDI in mind as the only target. And it was just convenient to us that the BGFX and WDDM and then the VT and DX ones *worked* on the same interface, even if suboptimally. It could be time to refine it taking all of their unique considerations in mind.
Author
Owner

@j4james commented on GitHub (Jul 9, 2021):

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

Well there's DECCARA (Change Attributes in Rectangular Area), and DECCRA (Copy Rectangular Area), which more or less cover those operations. And while I haven't done an audit of all the console APIs, I'm optimistic that we should be able to reproduce most of them reasonably well with existing VT sequences.

It's certainly my long term plan to try and completely replace the current conpty layer with a pure pass-through implementation, where the legacy console APIs are translated directly to VT sequences without an intermediate buffer (there may still need to be a buffer for the buffer reading APIs, but the conpty layer would ideally not be rendering from that buffer).

Whether I can ever actually achieve that plan is another question, and of course you may not want to accept a PR of that sort, but that's the direction I'm heading (very very slowly I'll admit).

So from my point of view, the VtRenderer is a stop-gap solution which I hope we can drop one day. But realistically that may never happen, so I'm totally in favour of any improvements to the performance in the meantime.

@j4james commented on GitHub (Jul 9, 2021): > conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc. Well there's `DECCARA` (Change Attributes in Rectangular Area), and `DECCRA` (Copy Rectangular Area), which more or less cover those operations. And while I haven't done an audit of all the console APIs, I'm optimistic that we should be able to reproduce most of them reasonably well with existing VT sequences. It's certainly my long term plan to try and completely replace the current conpty layer with a pure pass-through implementation, where the legacy console APIs are translated directly to VT sequences without an intermediate buffer (there may still need to be a buffer for the buffer reading APIs, but the conpty layer would ideally not be rendering from that buffer). Whether I can ever actually achieve that plan is another question, and of course you may not want to accept a PR of that sort, but that's the direction I'm heading (very very slowly I'll admit). So from my point of view, the `VtRenderer` is a stop-gap solution which I hope we can drop one day. But realistically that may never happen, so I'm totally in favour of any improvements to the performance in the meantime.
Author
Owner

@skyline75489 commented on GitHub (Jul 10, 2021):

I managed to drop the time needed 1000000 “yes” from 8.6s down to 4.5s, with the method I described (delay the Flush and add optimized path for VtEngine circling). With a bit more fine-tuning and aggressive optimization, I can see a final 3.3s result.

I am honestly disappointed by this because I have to find the 47x boost somewhere else.

获取 Outlook for iOShttps://aka.ms/o0ukef

@skyline75489 commented on GitHub (Jul 10, 2021): I managed to drop the time needed 1000000 “yes” from 8.6s down to 4.5s, with the method I described (delay the Flush and add optimized path for VtEngine circling). With a bit more fine-tuning and aggressive optimization, I can see a final 3.3s result. I am honestly disappointed by this because I have to find the 47x boost somewhere else. 获取 Outlook for iOS<https://aka.ms/o0ukef>
Author
Owner

@skyline75489 commented on GitHub (Jul 11, 2021):

Structurally I think #10610 should be considered the prior task. After #10610 is done, I can add the optimization described in this PR without messing around the IRenderEngine interface any more. My current solution is very ugly.

For the record, what I did in my experimental branch is adding another method in IRenderEngine for renderer.cpp to determine which engine is used:

For example in VtEngine:

[[nodiscard]] RenderEngineKind Kind() noexcept
{
    return RenderEngineKind::VirturlTerminal;
}

Then in renderer we can do this:

if (pEngine->Kind() == RenderEngineKind::VirturlTerminal) {
    // ...
}
@skyline75489 commented on GitHub (Jul 11, 2021): Structurally I think #10610 should be considered the prior task. After #10610 is done, I can add the optimization described in this PR without messing around the `IRenderEngine` interface any more. My current solution is very ugly. For the record, what I did in my experimental branch is adding another method in IRenderEngine for `renderer.cpp` to determine which engine is used: For example in VtEngine: ```C++ [[nodiscard]] RenderEngineKind Kind() noexcept { return RenderEngineKind::VirturlTerminal; } ``` Then in renderer we can do this: ```C++ if (pEngine->Kind() == RenderEngineKind::VirturlTerminal) { // ... } ```
Author
Owner

@skyline75489 commented on GitHub (Jul 18, 2021):

Closed in favor of #10610 & #10615

@skyline75489 commented on GitHub (Jul 18, 2021): Closed in favor of #10610 & #10615
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#14441