Support renderer fast presents #9961

Closed
opened 2026-01-31 02:08:31 +00:00 by claunia · 12 comments
Owner

Originally created by @mrange on GitHub (Aug 1, 2020).

Description of the new feature/enhancement

The current renderer has a complex workflow in order to render a frame. If one are looking into providing animated backgrounds or terminal effects that require high FPS rates the process is too slow especially since terminal effects disables partial (faster) repaints.

In addition; in DxRenderer.cpp there is lot of copying back-n-forth between forward and back buffer to enable partial updates and terminal effects.

This proposal is to:

  1. Allow fast presents (60+ FPS) for renderers where it makes sense.
  2. Enable partial updates even if terminal effects are enabled.
  3. Simplify and speed up DxRenderer.

I realize this is probably not high on the list of "need to" stuff but if the retro terminal effects is a thing I think it makes sense to make it render fast enough to support other cool scenarios.

Proposed technical implementation details (optional)

Proposed design

Originally created by @mrange on GitHub (Aug 1, 2020). # Description of the new feature/enhancement The current renderer has a complex workflow in order to render a frame. If one are looking into providing animated backgrounds or terminal effects that require high FPS rates the process is too slow especially since terminal effects disables partial (faster) repaints. In addition; in `DxRenderer.cpp` there is lot of copying back-n-forth between forward and back buffer to enable partial updates and terminal effects. This proposal is to: 1. Allow fast presents (60+ FPS) for renderers where it makes sense. 2. Enable partial updates even if terminal effects are enabled. 3. Simplify and speed up `DxRenderer`. I realize this is probably not high on the list of "need to" stuff but if the retro terminal effects is a thing I think it makes sense to make it render fast enough to support other cool scenarios. # Proposed technical implementation details (optional) [Proposed design](https://github.com/mrange/terminal/blob/pr/mrange/spec/shaderificII/doc/specs/%237147%20-%20Support%20renderer%20fast%20presents.md)
Author
Owner

@zadjii-msft commented on GitHub (Aug 3, 2020):

Huh, I like the idea. I know that we've been tossing around ideas for making the retro effect work with differential drawing for a bit now, but this might be cleaner. I'm gonna summon @miniksa to get his thoughts on your spec (and fork), since he's our rendering expert

@zadjii-msft commented on GitHub (Aug 3, 2020): Huh, I like the idea. I know that we've been tossing around ideas for making the retro effect work with differential drawing for a bit now, but this might be cleaner. I'm gonna summon @miniksa to get his thoughts on your spec (and fork), since he's our rendering expert
Author
Owner

@mrange commented on GitHub (Aug 16, 2020):

Hi. Anyone had a chance to look at the proposal?

@mrange commented on GitHub (Aug 16, 2020): Hi. Anyone had a chance to look at the proposal?
Author
Owner

@miniksa commented on GitHub (Aug 17, 2020):

I'll read this today. Sorry for the delay.

@miniksa commented on GitHub (Aug 17, 2020): I'll read this today. Sorry for the delay.
Author
Owner

@miniksa commented on GitHub (Aug 17, 2020):

I like the initial proposal.

I'm not sure where the assertion that the existing slow workflow is too slow for 60fps comes from. Is that specifically in the context of when a shader is applied? I don't think I've seen the partial rendering stalling us below 60fps on my machine for simple text...

I don't mind the texture idea. It's not a problem if we use MORE of DX11 than before. The whole point of moving to DX was that we'd use more and more of it over time anyway. Also, having a sort of backing buffer that is drawn to and then composing it on the front is fine. I think I did something like that in the GDI renderer.

Regarding the proposed impl:

  1. I definitely don't want a full invalidate on scroll. That's a big deal for outputting big.txt.
  2. Pixelshader always applied. I agree with your theoretical proposal to skip the shader route completely if no shader is applied. NOP shader is a creative idea, but I'd rather save all the memory overhead of not instantiating those things if no one is using them.

For the fast present impl:
A. For the renderers that don't support it, we generally return S_FALSE not S_OK. S_FALSE is still a SUCCEEDED() value, but it says that nothing was done or nothing needed to be done.
B. I feel like in lieu of using the frame limiting number for the FastPresent function if this shouldn't just be tied up with the WaitUntilCanRender method which would allow a FastPresent to be called every time the Waitable swap chain says it's ready until another the rest of the terminal is ready for a repaint and regular Present. Additionally... I wonder if we can't just have FastPresent and Present be the same thing with each renderer realizing it either can re-present the previous frame or that it should just S_FALSE until someone paints again. That saves adding another interface method and might simplify the thread.cpp loop/conditions.

Finally, we should continue to use Present1() where possible. There are significant performance advantages realized in the rest of the system rendering pipeline (specifically DirectComposition, the DWM, and any aware application that can use the additional metadata to do partial presentation such as Remote Desktop) by passing that data. I don't mind using Present for shaders enabled, but the common high-performance case must continue to use Present1.

@miniksa commented on GitHub (Aug 17, 2020): I like the initial proposal. I'm not sure where the assertion that the existing slow workflow is too slow for 60fps comes from. Is that specifically in the context of when a shader is applied? I don't think I've seen the partial rendering stalling us below 60fps on my machine for simple text... I don't mind the texture idea. It's not a problem if we use MORE of DX11 than before. The whole point of moving to DX was that we'd use more and more of it over time anyway. Also, having a sort of backing buffer that is drawn to and then composing it on the front is fine. I think I did something like that in the GDI renderer. Regarding the proposed impl: 1. I definitely don't want a full invalidate on scroll. That's a big deal for outputting `big.txt`. 3. Pixelshader always applied. I agree with your theoretical proposal to skip the shader route completely if no shader is applied. NOP shader is a creative idea, but I'd rather save all the memory overhead of not instantiating those things if no one is using them. For the fast present impl: A. For the renderers that don't support it, we generally return `S_FALSE` not `S_OK`. `S_FALSE` is still a `SUCCEEDED()` value, but it says that nothing was done or nothing needed to be done. B. I feel like in lieu of using the frame limiting number for the `FastPresent` function if this shouldn't just be tied up with the `WaitUntilCanRender` method which would allow a `FastPresent` to be called every time the Waitable swap chain says it's ready until another the rest of the terminal is ready for a repaint and regular `Present`. Additionally... I wonder if we can't just have `FastPresent` and `Present` be the same thing with each renderer realizing it either can re-present the previous frame or that it should just S_FALSE until someone paints again. That saves adding another interface method and might simplify the `thread.cpp` loop/conditions. Finally, we should continue to use `Present1()` where possible. There are significant performance advantages realized in the rest of the system rendering pipeline (specifically DirectComposition, the DWM, and any aware application that can use the additional metadata to do partial presentation such as Remote Desktop) by passing that data. I don't mind using `Present` for shaders enabled, but the common high-performance case must continue to use `Present1`.
Author
Owner

@mrange commented on GitHub (Aug 17, 2020):

  1. my intention was to restore scroll partial invalidate but it was too much work for the prototype.
  2. Yes, the initial idea was the NOP shader but I am back tracking from that idea in favour of CopySubResource or something similar.

WRT to Present1. I don't think Present1 is compatible with full screen shader effects but it could be possible to retain the current workflow and then have a separate workflow when shaders enabled. However, the drawback would be increased code-complexity.

With fullscreen refresh needed for the current implementation of how shaders are enabled it feels that repainting the full frame and then apply a destructive shader effect on the full frame feels like it unnecessary wasteful compared to just applying a pixel shader on a pre existing texture.

With that said I only have Debug built on my machine (disk space issues) and Debug is kind of sluggish at least for me.

Retaining the existing workflow and having a separate flow for pixel shaders is a significant change and it will take me some time to produce a working prototype.

@mrange commented on GitHub (Aug 17, 2020): 1. my intention was to restore scroll partial invalidate but it was too much work for the prototype. 2. Yes, the initial idea was the NOP shader but I am back tracking from that idea in favour of CopySubResource or something similar. WRT to Present1. I don't think Present1 is compatible with full screen shader effects but it could be possible to retain the current workflow and then have a separate workflow when shaders enabled. However, the drawback would be increased code-complexity. With fullscreen refresh needed for the current implementation of how shaders are enabled it feels that repainting the full frame and then apply a destructive shader effect on the full frame feels like it unnecessary wasteful compared to just applying a pixel shader on a pre existing texture. With that said I only have Debug built on my machine (disk space issues) and Debug is kind of sluggish at least for me. Retaining the existing workflow and having a separate flow for pixel shaders is a significant change and it will take me some time to produce a working prototype.
Author
Owner

@miniksa commented on GitHub (Aug 18, 2020):

It's okay if Present1 isn't compatible with full screen shader effects. This is my rationale:

  1. When using Remote Desktop with a Terminal, you're not likely to be using effects because you are sensitive to latency and bandwidth needs. Applications that use Present1 can be further compressed by Remote Desktop as the composition system is only notified of changes, therefore Remote Desktop only needs to carry those deltas to the remote machine. If we only used Present, Remote Desktop wouldn't get those nice notifications and would have to carry through a lot more data.
  2. When using effects, you're likely using a beefy machine locally and you value it looking pretty/flashy over raw performance either text-to-screen or over a network. If composition has to do more work at some level because it's given less information with Present over Present1, it's localized between a strong CPU, GPU, and large capacity bus bandwidth. You also don't have to contend with the bandwidth limitations of a remote scenario.

So having it bifurcated to support both user personas is likely what we need.

Also, we do not consider a Debug build as a valid measure of performance characteristics of the Terminal. There are a TON of debug-only logging, tracing, and guard mechanisms that are stripped out on Release. Additionally, the optimizer does quite a bit of work on our codebase that just isn't present in Debug builds. I would very much recommend that you use Release for performance profiling and Debug just for diagnostics, debugging, and trace purposes.

Finally, I agree with you. The destructiveness of the shader effect isn't something I was ever in favor of, but it was a code contribution from outside our core team. And I wasn't willing to hold up going from nothing to something over those details at the time. This is much of why we classified the feature in the experimental namespace: because we knew it would have to be further refined by someone at some future point. And if that someone is YOU and the time is NOW, that's perfectly okay with us!

@miniksa commented on GitHub (Aug 18, 2020): It's okay if `Present1` isn't compatible with full screen shader effects. This is my rationale: 1. When using Remote Desktop with a Terminal, you're not likely to be using effects because you are sensitive to latency and bandwidth needs. Applications that use `Present1` can be further compressed by Remote Desktop as the composition system is only notified of changes, therefore Remote Desktop only needs to carry those deltas to the remote machine. If we only used `Present`, Remote Desktop wouldn't get those nice notifications and would have to carry through a lot more data. 2. When using effects, you're likely using a beefy machine locally and you value it looking pretty/flashy over raw performance either text-to-screen or over a network. If composition has to do more work at some level because it's given less information with `Present` over `Present1`, it's localized between a strong CPU, GPU, and large capacity bus bandwidth. You also don't have to contend with the bandwidth limitations of a remote scenario. So having it bifurcated to support both user personas is likely what we need. Also, we do not consider a Debug build as a valid measure of performance characteristics of the Terminal. There are a TON of debug-only logging, tracing, and guard mechanisms that are stripped out on Release. Additionally, the optimizer does quite a bit of work on our codebase that just isn't present in Debug builds. I would very much recommend that you use Release for performance profiling and Debug just for diagnostics, debugging, and trace purposes. Finally, I agree with you. The destructiveness of the shader effect isn't something I was ever in favor of, but it was a code contribution from outside our core team. And I wasn't willing to hold up going from nothing to something over those details at the time. This is much of why we classified the feature in the `experimental` namespace: because we knew it would have to be further refined by someone at some future point. And if that someone is YOU and the time is NOW, that's perfectly okay with us!
Author
Owner

@mrange commented on GitHub (Aug 18, 2020):

I aware that DEBUG in C++ is very sluggish compared to RELEASE and I agree it's not a good measurement for good reasons. However, as this is at a conceptual stage currently I haven't bothered to do proper measurement. I guess I shouldn't have mentioned the performance reason without facts.

@mrange commented on GitHub (Aug 18, 2020): I aware that DEBUG in C++ is very sluggish compared to RELEASE and I agree it's not a good measurement for good reasons. However, as this is at a conceptual stage currently I haven't bothered to do proper measurement. I guess I shouldn't have mentioned the performance reason without facts.
Author
Owner

@mrange commented on GitHub (Aug 18, 2020):

When it comes to Present1 the remote desktop is a good argument and a good test case where I can see the existing code flow being beneficial in terms over the approach in this proposal.

@mrange commented on GitHub (Aug 18, 2020): When it comes to `Present1` the remote desktop is a good argument and a good test case where I can see the existing code flow being beneficial in terms over the approach in this proposal.
Author
Owner

@zadjii-msft commented on GitHub (Feb 18, 2021):

We don't really have an issue open already for "re-enable differential drawing when shaders are enabled", do we? I suppose this is the closest one.

I probably should have read this thread closer before I tried messing with the shaders yesterday (dev/migrie/dol/messing-with-shaders-take-1). Whoops. Well, I guess I know what I'll be looking into next month 😉

@zadjii-msft commented on GitHub (Feb 18, 2021): We don't _really_ have an issue open already for "re-enable differential drawing when shaders are enabled", do we? I suppose this is the closest one. I probably should have read this thread closer before I tried messing with the shaders yesterday ([`dev/migrie/dol/messing-with-shaders-take-1`](https://github.com/microsoft/terminal/tree/dev/migrie/dol/messing-with-shaders-take-1)). Whoops. Well, I guess I know what I'll be looking into next month 😉
Author
Owner

@zadjii-msft commented on GitHub (Jul 27, 2021):

I'm trying this out this week, and it's really close to working. I'm getting caught up in the scrolling bit though - I definitely don't want to do the InvalidateAll thing - that's the whole point of the optimization here, to not do that.

@mrange do you have any pointers for how to do the

my intention was to restore scroll partial invalidate but it was too much work for the prototype.

thing you mentioned? Any pointers would be immensely helpful 😄

My WIP branch is dev/migrie/fhl-2021/differential-pixel-shading, which is branched from dev/migrie/fhl-2021/more-shader-variables

@zadjii-msft commented on GitHub (Jul 27, 2021): I'm trying this out this week, and it's really close to working. I'm getting caught up in the scrolling bit though - I definitely don't want to do the `InvalidateAll` thing - that's the whole point of the optimization here, to not do that. @mrange do you have any pointers for how to do the > my intention was to restore scroll partial invalidate but it was too much work for the prototype. thing you mentioned? Any pointers would be immensely helpful 😄 My WIP branch is `dev/migrie/fhl-2021/differential-pixel-shading`, which is branched from `dev/migrie/fhl-2021/more-shader-variables`
Author
Owner

@zadjii-msft commented on GitHub (Jul 28, 2021):

Nevermind, I figured it out. I made two incredibly embarrassing mistakes

  • I forgot to pass some ComPtrs by ref into a function. Instead I was copy-constructing new ones, then creating objects into those copies, then assuming the value would persist to the caller. 🤦
  • I actually wasn't swapping the variables correctly. Thanks std::swap 🤦

That's what I get for trying to write code with a head cold. It works now, and works well. I'm pretty confident you could leave shaders on all the time and not feel the pain with these changes. It even works acceptably to have cmatrix running with retro effects full screen on my slaptop (bless it's little heart)

So now dev/migrie/fhl-2021/differential-pixel-shading looks a lot better. I gotta untie those branches but #shipit

@zadjii-msft commented on GitHub (Jul 28, 2021): Nevermind, I figured it out. I made two incredibly embarrassing mistakes * I forgot to pass some `ComPtr`s by ref into a function. Instead I was copy-constructing new ones, then creating objects into those copies, then assuming the value would persist to the caller. 🤦 * I actually _wasn't swapping the variables correctly_. Thanks `std::swap` 🤦 That's what I get for trying to write code with a head cold. It works now, and works _well_. I'm pretty confident you could leave shaders on all the time and not feel the pain with these changes. It even works _acceptably_ to have `cmatrix` running with retro effects full screen on my slaptop (bless it's little heart) So now [`dev/migrie/fhl-2021/differential-pixel-shading`](https://github.com/microsoft/terminal/compare/dev/migrie/fhl-2021/more-shader-variables...dev/migrie/fhl-2021/differential-pixel-shading?expand=1) looks a lot better. I gotta untie those branches but #shipit
Author
Owner

@lhecker commented on GitHub (Sep 1, 2022):

With #13885 in place, our new text renderer can handle this situation now. Since shading on the GPU is fast, it simply redraws the entire screen on every frame. Sounds slow? With the retro shader enabled rendering my entire 4K monitor worth of content 144 times per second takes up about 2-5% of my GPU load.
It'll ship, enabled by default, in Windows Terminal Preview 1.16 sometime soon. 🙂

@lhecker commented on GitHub (Sep 1, 2022): With #13885 in place, our new text renderer can handle this situation now. Since shading on the GPU is fast, it simply redraws the entire screen on every frame. Sounds slow? With the retro shader enabled rendering my entire 4K monitor worth of content 144 times per second takes up about 2-5% of my GPU load. It'll ship, enabled by default, in Windows Terminal Preview 1.16 sometime soon. 🙂
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#9961