DxEngine::WaitUntilCanRender() should not have a hard Sleep() #17674

Closed
opened 2026-01-31 05:49:47 +00:00 by claunia · 3 comments
Owner

Originally created by @Corillian on GitHub (Jun 10, 2022).

https://github.com/microsoft/terminal/blob/c754f4d22da47d93ba6e1401496605eba45248d6/src/renderer/dx/DxRenderer.cpp#L1512

I work as an engine/graphics programmer in the games industry and seeing this causes some amount of stress. I've gone through a bit of the rendering code and I have a couple of observations, for whatever they're worth:

  1. Your DxEngine rendering code seems to be written based on assumptions that hold for traditional CPU-based software rasterizers - i.e. old GDI-based rendering. These assumptions are frequently *not* true when working with GPUs. For example, it can actually be faster to submit a single draw call that does a lot of work, like touching every pixel by redrawing the entire window, than to send a whole bunch of draw calls that each do a small amount of work, like updating multiple disparate dirty rects.
  2. There should not be any circumstance under which your rendering thread is causing throughput issues. The logical view of text should update, and the rendering thread should render this view as quickly as possible by requesting an immutable snapshot. Given that this application isn't a game and maxing FPS is entirely irrelevant, **you should wait on v-sync** instead of a hardcoded period of time, which could, ironically, still result in tearing if you aren't waiting on v-sync for final submission anyway. If you want to be extra high-speed about it, your rendering thread does not even have to be a dedicated OS thread that blocks: you can have an event set on v-sync notify a worker thread in a thread pool, so that your rendering thread is just a logical thread (see the sketch after this list).
  3. You seem to go through some effort to avoid clearing the background, and this doesn't make any sense to me. A solid color fill of a rectangle is pretty much the fastest thing a GPU can do, and your time seems better spent optimizing the rendering of the foreground. For example, you could cache command buffers with all of the state necessary to render a single line of text. I haven't ever used D2D, so I don't know how flexible it is in this regard, but given that I can render millions of objects significantly more complex than text at 60fps+, I would think that, worst case, you could brute-force the entire screen every frame without issue.
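
For concreteness, here is a minimal sketch of the "logical render thread" idea from point 2, driving rendering from a DXGI frame-latency waitable object and a Win32 thread-pool wait instead of a dedicated blocking thread. This is not the Terminal's code: `FrameScheduler` and `RenderOneFrame()` are hypothetical names, and it assumes the swap chain was created with `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT`. The waitable handle signals when DXGI can accept a new frame rather than on the raw vblank, but with a maximum frame latency of 1 it tracks the display cadence closely.

```cpp
// Sketch only, not the Terminal's actual renderer. Assumes `swapChain` was created
// with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT and is already populated.
#include <windows.h>
#include <dxgi1_3.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

struct FrameScheduler
{
    ComPtr<IDXGISwapChain2> swapChain;
    HANDLE frameLatencyWaitable = nullptr; // signaled when DXGI can accept a new frame
    PTP_WAIT threadpoolWait = nullptr;

    // Hypothetical paint pass: grab an immutable snapshot of the text buffer,
    // record the draw calls, Present(), and return without holding the console lock.
    void RenderOneFrame()
    {
    }

    static VOID CALLBACK OnFrameReady(PTP_CALLBACK_INSTANCE, PVOID context, PTP_WAIT wait, TP_WAIT_RESULT)
    {
        auto* self = static_cast<FrameScheduler*>(context);
        self->RenderOneFrame();
        // Re-arm the wait so the next present opportunity wakes a pool thread again.
        SetThreadpoolWait(wait, self->frameLatencyWaitable, nullptr);
    }

    void Start()
    {
        // Keep the present queue to one frame so the wait approximates v-sync pacing.
        swapChain->SetMaximumFrameLatency(1);
        frameLatencyWaitable = swapChain->GetFrameLatencyWaitableObject();

        threadpoolWait = CreateThreadpoolWait(&OnFrameReady, this, nullptr);
        SetThreadpoolWait(threadpoolWait, frameLatencyWaitable, nullptr);
    }

    void Stop()
    {
        if (threadpoolWait)
        {
            // Unregister, drain any in-flight callback, then release the wait object.
            SetThreadpoolWait(threadpoolWait, nullptr, nullptr);
            WaitForThreadpoolWaitCallbacks(threadpoolWait, TRUE);
            CloseThreadpoolWait(threadpoolWait);
            threadpoolWait = nullptr;
        }
        if (frameLatencyWaitable)
        {
            CloseHandle(frameLatencyWaitable);
            frameLatencyWaitable = nullptr;
        }
    }
};
```

Re-arming the wait from inside its own callback is the standard thread-pool pattern, so no OS thread ever sits blocked on behalf of the renderer; the "render thread" exists only as a chain of pool callbacks.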
Author
Owner

@zadjii-msft commented on GitHub (Jun 10, 2022):

There's a lot to unpack here and it's almost the weekend, so sorry for the delays in a longer, better response.

It's important to know that most of the stack here has evolved very organically over the course of the last decade. The GDI engine came from the collection of disparate draw calls all over the codebase. The DX engine came from the GDI engine and (some other internal dx renderer that worked well enough). The DX engine was hacked together for conhost long before Terminal even existed.

IIRC, the original sleep is descended from the GDI engine where, yea, the rendering needs to happen on the CPU. There was certainly a point in that engine where trying to repaint more often than every 1/60s did have a detrimental impact on throughput. The way it existed originally, there was only ever a Renderer and a GdiEngine, and the Sleep was originally in the Renderer. Many refactors later left that in the DxEngine, which we might be able to just get rid of now, yea.

I believe we once experimented with syncing the DxEngine's frames to the v-sync. That was for #649. The original experiment didn't pan out from what I remember, and we haven't really revisited it since. We could probably try that again here, in place of the Sleep().

Overall, I think a lot of what you've mentioned here is stuff that we're hoping to address with the new atlas renderer. Especially changing the renderer interface to give the engine a snapshot of the viewport's contents. That's one of the big things we're hoping to expose around the #8000 timeframe.

Author
Owner

@lhecker commented on GitHub (Jun 13, 2022):

> Overall, I think a lot of what you've mentioned here is stuff that we're hoping to address with the new atlas renderer.

FYI you can find it here: https://github.com/microsoft/terminal/tree/main/src/renderer/atlas
This engine doesn't make the mistake mentioned in your 1st point. The code isn't particularly beautiful or well documented yet, unfortunately. I'm currently working on rewriting our text buffer (#8000) instead, to support - among other things - fast buffer snapshots during rendering, so my time is a bit "stretched thin". Afterwards we can remove the `Sleep()` for sure. There's a lot of highly exciting performance optimization work left to do in this area! 🙂
[Here](https://devblogs.microsoft.com/commandline/windows-terminal-preview-1-13-release/#new-rendering-engine) are instructions on how to enable this engine in Terminal Preview.

If you or anyone else has any tips, advice, or just feedback for the newer AtlasEngine, that'd be absolutely amazing!

Author
Owner

@zadjii-msft commented on GitHub (Jun 14, 2022):

Alrighty so we discussed this as a team yesterday.

Turns out, if you totally remove that `Sleep` in the DxEngine, it does start to negatively impact throughput. Without it, the DX engine spends too much time holding the terminal lock and doing too much CPU-side work, which ultimately slows the Terminal down. This could theoretically be replaced with a vsync wait there - we'd want to carefully measure any throughput changes it might cause. However, our long-term strategy is to migrate to the atlas renderer entirely and sunset the DxEngine. As discussed above, the Atlas renderer doesn't have this sort of issue. Overall, that work is being tracked in #9999, so I'll direct follow-up there.

Thanks!
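
For illustration only, here is a minimal sketch of what a vsync-based `WaitUntilCanRender()` could look like, along the lines suggested above. It is not a patch against the real `DxEngine`: the class name, the `_dxgiOutput` member, and the 8 ms fallback are hypothetical stand-ins; `IDXGIOutput::WaitForVBlank()` is the DXGI call that blocks until the next vertical blank on that output.

```cpp
// Sketch only - illustrative names, not the engine's real members.
#include <windows.h>
#include <dxgi.h>
#include <wrl/client.h>

class SketchEngine
{
public:
    void WaitUntilCanRender() noexcept
    {
        // Block until the next vertical blank instead of sleeping a fixed interval,
        // so the render thread spends the rest of the frame off the terminal lock.
        if (!_dxgiOutput || FAILED(_dxgiOutput->WaitForVBlank()))
        {
            // Fall back to a fixed delay if the output is unavailable
            // (e.g. headless sessions or a lost adapter). The 8 ms here is illustrative.
            Sleep(8);
        }
    }

private:
    // Assumed to be resolved elsewhere from the adapter/output the window is on.
    Microsoft::WRL::ComPtr<IDXGIOutput> _dxgiOutput;
};
```

Any change like this would still need the throughput measurement mentioned above, since the wait now tracks the monitor's refresh rate rather than a fixed 60 Hz-ish cadence.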

Reference: starred/terminal#17674