Apply PGO to Binaries #9674

Closed
opened 2026-01-31 02:00:41 +00:00 by claunia · 2 comments

Originally created by @miniksa on GitHub (Jul 17, 2020).

Originally assigned to: @miniksa on GitHub.

PGO, or Profile Guided Optimization, is a way to accelerate the most frequently used paths in our applications. We profile the applications with several test scenarios that represent especially strenuous or hot paths through them. While those scenarios run, instrumented binaries count how often functions, and the paths between them, are used. When the applications are then rebuilt, that collected profile data is given to the linker to guide it as to which functions are the most important. See also: https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=vs-2019
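To make the instrument, train, and rebuild cycle concrete, here is a minimal Python sketch that only assembles the MSVC command lines for each phase; it does not run them. It assumes the object files were already compiled with `cl /GL`. The file names (`conhost.exe`, `conhost.pgd`, the `.pgc` names) and the per-scenario weights are hypothetical. `/GENPROFILE`, `/USEPROFILE` and `pgomgr /merge:n` are the MSVC linker options and profile-merge tool covered by the documentation linked above; the exact flag set Microsoft UI XAML uses may differ.

```python
from typing import Dict, List, NamedTuple, Sequence


class PgoCommands(NamedTuple):
    train: List[str]        # link an instrumented "Training" binary
    merge: List[List[str]]  # fold each scenario's .pgc counts into the .pgd
    optimize: List[str]     # relink the "Optimized Release" binary


def pgo_commands(
    objs: Sequence[str],
    exe: str,
    pgd: str,
    pgc_weights: Dict[str, int],
) -> PgoCommands:
    """Build (but do not run) the command lines for one PGO cycle.

    Assumes every object file in `objs` was compiled with `cl /GL`, so the
    linker can perform link-time code generation (/LTCG).
    """
    # Phase 1: instrumented build. Running this binary through the training
    # scenarios writes usage counts to .pgc files.
    train = ["link.exe", "/LTCG", f"/GENPROFILE:PGD={pgd}", f"/OUT:{exe}", *objs]

    # Phase 2: merge each scenario's counts; /merge:n weights that scenario.
    merge = [
        ["pgomgr.exe", f"/merge:{weight}", pgc, pgd]
        for pgc, weight in pgc_weights.items()
    ]

    # Phase 3: optimized build guided by the merged profile database.
    optimize = ["link.exe", "/LTCG", f"/USEPROFILE:PGD={pgd}", f"/OUT:{exe}", *objs]

    return PgoCommands(train, merge, optimize)


cmds = pgo_commands(["main.obj"], "conhost.exe", "conhost.pgd",
                    {"conhost!1.pgc": 3, "conhost!2.pgc": 1})
for cmd in [cmds.train, *cmds.merge, cmds.optimize]:
    print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to check the flags in a unit test before wiring the sketch into a real pipeline.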

During my prototyping in `dev/miniksa/gotta_go_fast` in July 2020, I found that PGO cuts about 10-20% off the runtime of large text-processing operations such as `time cat big.txt`, a large list operation, or anything else that sends a massive stream of "WriteFile"-type operations to our system. Because our users tend to notice massive blocks of data transfer more, either as total runtime or as latency, than UI operations like splitting panes, I believe our profiling runs should focus on massive data transfer, adding other hot scenarios as necessary to squeeze the most performance out of our application.
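A figure like "10-20% less runtime" is the fractional drop in wall-clock time between a baseline build and a PGO build running the same scenario. A minimal sketch of that arithmetic, assuming several timed runs per build; taking the median of each set to damp run-to-run noise is an assumption of this sketch, not something the prototype is known to have done.

```python
from statistics import median
from typing import Sequence


def runtime_reduction(baseline_s: Sequence[float], optimized_s: Sequence[float]) -> float:
    """Return the fractional drop in median runtime (0.15 means 15% less)."""
    if not baseline_s or not optimized_s:
        raise ValueError("need at least one timing sample for each build")
    base = median(baseline_s)
    opt = median(optimized_s)
    return (base - opt) / base


# Illustrative numbers only, not measurements: a 10.0 s median baseline
# against an 8.5 s median PGO build.
print(f"{runtime_reduction([10.0, 10.2, 9.8], [8.5, 8.6, 8.4]):.0%}")  # 15%
```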

This feature-sized task covers setting up this system for our application.

The https://github.com/microsoft/microsoft-ui-xaml team has already done this for their entire stack. I attempted to roll my own lesser variation, but in doing so I realized it would be better overall to replicate their work for our application.

The following activities are what I imagine will be required to get PGO going:

  • Mimic the targets/properties from Microsoft UI XAML for the build properties and compiler and linker flags needed to produce both a "Training" binary that generates profile data and a final "Optimized Release" binary that consumes that profile data during linking.
  • Set up test automation using the Helix framework/labs so that we can run Input Injection and UI Automation tests in a lab as repeatable "training scenarios" against the training binaries.
    • BONUS: Also get the ol' UIA tests we have running in this lab
    • BONUS: Also get the TerminalApp tests that @zadjii-msft has long kept as "local only" running in this lab.
  • Mimic the YAML definitions from Microsoft UI XAML for dispatching these test runs and collecting data from the Helix lab
  • (Done) Convert our artifact storage to use a public NuGet feed as the place to store the large Profile Guided Databases (.PGD files) generated from the training run counts (.PGC files). See #6954.
  • Mimic the scripts from Microsoft UI XAML for uploading PGD artifacts to the public NuGet feed, as well as the ones that select the most relevant PGD to use when compiling (the one nearest in time to the current branch, since not every commit SHA is profiled).
  • Write tests for our scenarios to run in the Helix lab
    • `cat big.txt` --> output a massive amount of unformatted text
    • `cat ls.txt` --> output a massive amount of colorized/formatted text
    • Fan-favorite random cell drawing utilities like `cacafire` and `cmatrix`
    • Good ol' GIF-to-ASCII rendering with `chafa`
    • Search functionality through UIA tree that @carlos-zamora has had performance issues with when working with NVDA
    • Some sort of full stack launch test that can serve as both a canary and provide some weight for optimizing startup time
  • Pull it all together, weight and merge the profiles, and release profile optimized builds
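The "nearest time to the current branch" PGD selection from the list above can be sketched as a small function. This is a hypothetical model, not the Microsoft UI XAML script: `PgdPackage`, its `version` and `profiled_at` fields, and the tie-breaking rule (prefer the older profile when two are equally close) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class PgdPackage:
    """Hypothetical metadata for one PGD package on the NuGet feed."""
    version: str
    profiled_at: datetime  # when the training run that produced it happened


def select_pgd(packages: Sequence[PgdPackage], branch_time: datetime) -> Optional[PgdPackage]:
    """Pick the package whose training run is closest in time to branch_time.

    Not every commit is profiled, so an exact match is not expected.
    On a tie in distance, the older package (smaller profiled_at) wins.
    """
    if not packages:
        return None
    return min(packages, key=lambda p: (abs(p.profiled_at - branch_time), p.profiled_at))


feed = [
    PgdPackage("1.0.0", datetime(2021, 1, 1)),
    PgdPackage("1.0.1", datetime(2021, 1, 10)),
    PgdPackage("1.0.2", datetime(2021, 2, 1)),
]
print(select_pgd(feed, datetime(2021, 1, 12)).version)  # 1.0.1
```

A branch older than every package simply gets the oldest one. Restricting candidates to profiles taken before the branch time is a plausible variant if newer profiles turn out to describe code the branch does not have yet.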

If necessary, we could start smaller: I have noticed that profiling just the `conhost` binary in PTY mode provides some performance boost on its own, without also profiling the `WindowsTerminal` binary (and all its `Terminal*` DLLs). That would let us make incremental progress, but it is definitely best if we can get end-to-end profile guided optimization working.


@DHowett commented on GitHub (Jul 17, 2020):

Triaged into Terminal 2.0


@ghost commented on GitHub (May 25, 2021):

🎉 This issue was addressed in #10071, which has now been successfully released as `Windows Terminal Preview v1.9.1445.0`. 🎉

Handy links:

  • Release Notes: https://github.com/microsoft/terminal/releases/tag/v1.9.1445.0
  • Store Download: https://www.microsoft.com/store/apps/9n8g5rfz9xk3?cid=storebadge&ocid=badge