[PR #7619] Make til::color's COLORREF conversion more optimal #26958

Closed
opened 2026-01-31 09:19:07 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/microsoft/terminal/pull/7619

State: closed
Merged: Yes


Clang (10) has no trouble optimizing the COLORREF conversion operator to
a simple 32-bit load with mask (!) even though it's a series of bit
shifts across multiple struct members.

MSVC (19.24) doesn't make the same optimization decision, and it emits
three 8-bit loads and some shifting.

In any case, the optimization only applies at -O2 (clang) and above.

In this commit, we leverage the spec-legality of using unions for type
conversions and the overlap of four uint8_ts and a uint32_t to make the
conversion very obvious to both compilers.

x86_64 msvc O0 O1 O2
shifts 12 11 11 (fully inlined)
union 5 1 1 (fully inlined)
x86_64 clang O0 O1 O2 + O3
shifts 14 5 1 (fully inlined)
union 9 3 1 (fully inlined)

j4james brought up some concerns about til::color's minor wastefulness
in https://github.com/microsoft/terminal/pull/7578#discussion_r487355989.

This is a clear, simple transformation that saves us a few instructions
in a relatively common case, so I'm accepting a micro-optimization even
though we don't have data showing this to be a hot spot.

**Original Pull Request:** https://github.com/microsoft/terminal/pull/7619 **State:** closed **Merged:** Yes --- Clang (10) has no trouble optimizing the COLORREF conversion operator to a simple 32-bit load with mask (!) even though it's a series of bit shifts across multiple struct members. MSVC (19.24) doesn't make the same optimization decision, and it emits three 8-bit loads and some shifting. In any case, the optimization only applies at -O2 (clang) and above. In this commit, we leverage the spec-legality of using unions for type conversions and the overlap of four uint8_ts and a uint32_t to make the conversion very obvious to both compilers. x86_64 msvc | O0 | O1 | O2 ------------|----|----|-------------------- shifts | 12 | 11 | 11 (fully inlined) union | 5 | 1 | 1 (fully inlined) x86_64 clang | O0 | O1 | O2 + O3 -------------|----|----|-------------------- shifts | 14 | 5 | 1 (fully inlined) union | 9 | 3 | 1 (fully inlined) j4james brought up some concerns about til::color's minor wastefulness in https://github.com/microsoft/terminal/pull/7578#discussion_r487355989. This is a clear, simple transformation that saves us a few instructions in a relatively common case, so I'm accepting a micro-optimization even though we don't have data showing this to be a hot spot.
claunia added the pull-request label 2026-01-31 09:19:07 +00:00
Sign in to join this conversation.
No Label pull-request
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/terminal#26958