Reduce virtual function usage to mitigate performance penalty due to CFG #14457

Open
opened 2026-01-31 04:10:54 +00:00 by claunia · 8 comments
Owner

Originally created by @skyline75489 on GitHub (Jul 11, 2021).

What is CFG?

Control Flow Guard (CFG) is MSVC/Win32 feature that can combat memory exploits & vulnerabilities. I found this accidentally when I was exploring the performance bottleneck in #10564. I found that even if I replace the entire GetPatternId with return 0, that call is still very expensive. Then I realized it's not about what's inside the call, it's about the call itself. For every virtual function call, the compiler will insert the following instruction before the call:

call __guard_dispatch_icall_fptr, [SomeMethod]

This will then leads to several extra instructions before calling the actual method.

Disabling CFG

CFG can be disabled in the system setting (restart required). With CFG disabled system-wide, the call __guard_dispatch_icall_fptr will simple return quickly without the extra instructions.

CFG can also be disabled in compiler option (/guard:cf). I failed to actually disable this in WT for some reason. I suppose with the option disabled, no __guard_dispatch_icall_fptr will be inserted any more.

Proposed solution

OK here's my refurbished plan;

  1. Remove all render related methods in IRenderEngine and replace them with a single method PaintFrame(IRenderData *pData) (#10615)
  2. Replace InvalidateSomething methods in IRenderEngine with TriggerSomething to give the actual engine more control & reduce the coupling with renderer.cpp.
  3. Let's see what we can do for the left methods.

CC @DHowett @miniksa @zadjii-msft @lhecker for discussion.

Originally created by @skyline75489 on GitHub (Jul 11, 2021). ### What is CFG? [Control Flow Guard (CFG)](https://docs.microsoft.com/en-us/windows/win32/secbp/control-flow-guard) is MSVC/Win32 feature that can combat memory exploits & vulnerabilities. I found this accidentally when I was exploring the performance bottleneck in #10564. I found that even if I replace the entire `GetPatternId` with `return 0`, that call is still very expensive. Then I realized it's not about what's inside the call, it's about the call *itself*. For every virtual function call, the compiler will insert the following instruction before the call: ```asm call __guard_dispatch_icall_fptr, [SomeMethod] ``` This will then leads to several extra instructions before calling the actual method. ### Disabling CFG CFG can be disabled in the system setting (restart required). With CFG disabled system-wide, the `call __guard_dispatch_icall_fptr` will simple return quickly without the extra instructions. CFG can also be disabled in compiler option `(/guard:cf)`. I failed to actually disable this in WT for some reason. I suppose with the option disabled, no `__guard_dispatch_icall_fptr` will be inserted any more. ### Proposed solution OK here's my refurbished plan; 1. Remove all render related methods in `IRenderEngine` and replace them with a single method `PaintFrame(IRenderData *pData)` (#10615) 1. Replace `InvalidateSomething` methods in `IRenderEngine` with `TriggerSomething` to give the actual engine more control & reduce the coupling with `renderer.cpp`. 1. Let's see what we can do for the left methods. CC @DHowett @miniksa @zadjii-msft @lhecker for discussion.
claunia added the Issue-FeatureProduct-TerminalProduct-ConptyArea-Performance labels 2026-01-31 04:10:55 +00:00
Author
Owner

@skyline75489 commented on GitHub (Jul 11, 2021):

As for the actual performance gain, I found 10% improvement in VT throughput in my experimental branch.

For the main branch right now, the performance gain isn't really obvious, because other code paths are too slow 😅 .

@skyline75489 commented on GitHub (Jul 11, 2021): As for the actual performance gain, I found 10% improvement in VT throughput in my experimental branch. For the main branch right now, the performance gain isn't really obvious, because other code paths are too slow 😅 .
Author
Owner

@DHowett commented on GitHub (Jul 11, 2021):

WDDM, BGFX and GDI are selected at runtime; there is no separate build of conhost for OneCore machines.

@DHowett commented on GitHub (Jul 11, 2021): WDDM, BGFX and GDI are selected at runtime; there is no separate build of conhost for OneCore machines.
Author
Owner

@skyline75489 commented on GitHub (Jul 11, 2021):

That's sad :( . Can we at least make VtRenderer a compile-time thing? And those renderer in WT.

@skyline75489 commented on GitHub (Jul 11, 2021): That's sad :( . Can we at least make VtRenderer a compile-time thing? And those renderer in WT.
Author
Owner

@DHowett commented on GitHub (Jul 11, 2021):

When the DX engine is in conhost it too is selected at runtime.

I suspect we will see more improvement from reducing how many times we have to call through the virtual functions and increasing the amount of data we exchange for each one than by simply using a static interface.

@DHowett commented on GitHub (Jul 11, 2021): When the DX engine is in conhost it too is selected at runtime. I suspect we will see more improvement from reducing how many times we have to call through the virtual functions and increasing the amount of data we exchange for each one than by simply using a static interface.
Author
Owner

@skyline75489 commented on GitHub (Jul 11, 2021):

reducing how many times we have to call through the virtual functions

Great idea!

If you're OK with what I did in https://github.com/microsoft/terminal/issues/10596#issuecomment-877730364 , we can still reduce a lot of unnecessary virtual function calls.

@skyline75489 commented on GitHub (Jul 11, 2021): >reducing how many times we have to call through the virtual functions Great idea! If you're OK with what I did in https://github.com/microsoft/terminal/issues/10596#issuecomment-877730364 , we can still reduce a *lot* of unnecessary virtual function calls.
Author
Owner

@DHowett commented on GitHub (Jul 11, 2021):

I'm not sure. Wouldn't better batching help every renderer at the same time?

For what it's worth, I did once propose having a "render feature flags" enum such that a render engine could specify what it needed (rather than what it is). I think that that is a better granularity than trying to teach the base renderer about the different kinds of render engine. After all, what you've done is reimplement RTTI but as an enum value returned from a method, and it would break encapsulation in the same exact way as using RTTI would.

@DHowett commented on GitHub (Jul 11, 2021): I'm not sure. Wouldn't better batching help *every* renderer at the same time? For what it's worth, I did once propose having a "render feature flags" enum such that a render engine could specify what it _needed_ (rather than what it _is_). I think that that is a better granularity than trying to teach the base renderer about the different kinds of render engine. After all, what you've done is reimplement RTTI but as an enum value returned from a method, and it would break encapsulation in the same exact way as using RTTI would.
Author
Owner

@skyline75489 commented on GitHub (Aug 18, 2021):

At first I think "render feature flag" sounds like a nice idea. However, after the experiment (#10615), I think that is not really needed, because most of the functionalities in the base class can simply be extracted into smaller helper classes. We might want to choose the "composition over inheritance" path.

@skyline75489 commented on GitHub (Aug 18, 2021): At first I think "render feature flag" sounds like a nice idea. However, after the experiment (#10615), I think that is not really needed, because most of the functionalities in the base class can simply be extracted into smaller helper classes. We might want to choose the "composition over inheritance" path.
Author
Owner

@sfiruch commented on GitHub (Jul 20, 2023):

Making the dispatch explicit would also solve any CFG-related performance issues:

if(flagIsFeature1Available)
   useFeature1();
else if(flagIsFeature2Available)
   useFeature2();
else
   useFallback();

This is very easy for branch predictors to handle, so could be used in hot code paths.

@sfiruch commented on GitHub (Jul 20, 2023): Making the dispatch explicit would also solve any CFG-related performance issues: ``` if(flagIsFeature1Available) useFeature1(); else if(flagIsFeature2Available) useFeature2(); else useFallback(); ``` This is very easy for branch predictors to handle, so could be used in hot code paths.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#14457