#13626 contains a small "regression" compared to #13321:
It now began to store trailers in the buffer wherever possible to allow
a region
of the buffer to be backed up and restored via Read/WriteConsoleOutput.
But we're unfortunately still ill-equipped to handle anything but UCS-2
via
WriteConsoleOutput, so it's best to again ignore trailers just like in
#13321.
## Validation Steps Performed
* Added unit test ✅
This is a rather trivial changeset. Now that these two are present in the
`std` namespace there's no reason for us to continue using the `gsl` ones.
Additionally this ensures future compatibility with other 3rd party libraries.
My long-term plan is to replace the `CodepointWidth` enum with a simple integer
return value that indicates the amount of columns a codepoint is wide.
This is necessary so that we can return 0 for ZWJs (zero width joiners).
This initial commit represents a cleanup effort around `CodepointWidthDetector`.
Since less code runs faster, this change has the nice side-effect of running
roughly 5-10% faster across the board. It also drops the binary size by ~1.2kB.
## Validation Steps Performed
* `CodepointWidthDetectorTests` passes ✅
* U+26bf (``"`u{26bf}"`` inside pwsh) is a wide glyph
in OpenConsole and narrow one in Windows Terminal ✅
This commit replaces `Utf16Parser` with `<til/unicode.h>` which includes:
* `til::utf16_iterator` as a replacement for `Utf16Parser::Parse`
* `til::utf16_next` as a replacement for `Utf16Parser::ParseNext`
This fixes 2 bugs with `Utf16Parser`:
* Swallowing invalid surrogate pairs instead of turning them into U+FFFD.
* `std::vector<std::vector<wchar_t>>`. It's now >12000% faster.
## Validation Steps Performed
* New unit tests pass ✅
* Searching for narrow/wide characters in conhost works ✅
This commit is a from-scratch rewrite of `ROW` with the primary goal to get
rid of the rather bodgy `UnicodeStorage` class and improve Unicode support.
Previously a 120x9001 terminal buffer would store a vector of 9001 `ROW`s
where each `ROW` stored exactly 120 `wchar_t`. Glyphs exceeding their
allocated space would be stored in the `UnicodeStorage` which was basically
a `hashmap<Coordinate, String>`. Iterating over the text in a `ROW` would
require us to check each glyph and fetch it from the map conditionally.
On newlines we'd have to invalidate all map entries that are now gone,
so for every invalidated `ROW` we'd iterate through all glyphs again and if
a single one was stored in `UnicodeStorage`, we'd then iterate through the
entire hashmap to remove all coordinates that were residing on that `ROW`.
All in all, this wasn't the most robust nor performant code.
The new implementation is simple (from a design perspective):
Store all text in a `ROW` in a regular string. Grow the string if needed.
The association between columns and text works by storing character offsets
in a column-wide array. This algorithm is <100 LOC and removes ~1000.
As an aside this PR does a few more things that go hand in hand:
* Remove most of `ROW` helper classes, which aren't needed anymore.
* Allocate backing memory in a single `VirtualAlloc` call.
* Rewrite `IsCursorDoubleWidth` to use `DbcsAttrAt` directly.
Improves overall performance by 10-20% and makes this implementation
faster than the previous NxM storage, despite the added complexity.
Part of #8000
## Validation Steps Performed
* Existing and new unit and feature tests complete ✅
* Printing Unicode completes without crashing ✅
* Resizing works without crashing ✅
Previously this project used a great variety of types to present text buffer
coordinates: `short`, `unsigned short`, `int`, `unsigned int`, `size_t`,
`ptrdiff_t`, `COORD`/`SMALL_RECT` (aka `short`), and more.
This massive commit migrates almost all use of those types over to the
centralized types `til::point`/`size`/`rect`/`inclusive_rect` and their
underlying type `til::CoordType` (aka `int32_t`).
Due to the size of the changeset and statistics I expect it to contain bugs.
The biggest risk I see is that some code potentially, maybe implicitly, expected
arithmetic to be mod 2^16 and that this code now allows it to be mod 2^32.
Any narrowing into `short` later on would then throw exceptions.
## PR Checklist
* [x] Closes#4015
* [x] I work here
* [x] Tests added/passed
## Validation Steps Performed
Casual usage of OpenConsole and Windows Terminal. ✅
#4015 requires sweeping changes in order to allow a migration of our buffer
coordinates from `int16_t` to `int32_t`. This commit reduces the size of
future commits by using type inference wherever possible, dropping the
need to manually adjust types throughout the project later.
As an added bonus this commit standardizes the alignment of cv qualifiers
to be always left of the type (e.g. `const T&` instead of `T const&`).
The migration to type inference with `auto` was mostly done
using JetBrains Resharper with some manual intervention and the
standardization of cv qualifier alignment using clang-format 14.
## References
This is preparation work for #4015.
## Validation Steps Performed
* Tests pass ✅
855e136 contains a regression which breaks buffer reflow if wide surrogate
characters are present. This happens because we made use of the
`TextBufferCellIterator` whose increment operator skips 2 cells for wide
characters. This created a "misalignment" in the reflow logic which was written
for cell-wise iteration. This commit fixes the issue, by reverting back to the
previous algorithm without iterators.
Closes#12837
Closes MSFT-38904421
## Validation Steps Performed
* Run ``pwsh -noprofile -command echo "`u{D83D}`u{DE43}"``
* Resizing conhost preserves all contents ✅
* Resizing Windows Terminal doesn't crash it ✅
* Added a test covering this issue ✅
The recent changes to use gsl::span everywhere added a few bounds checks
along codepaths where we were already checking bounds. Some of them may
be non-obvious to the optimizer, so we can now use til::at to help them
along.
To accomplish this, I've added a new overload of til::at that takes a
span and directly accesses its backing buffer.
We were using std::basic_string_view as a stand-in for std::span so that
we could change over all at once when C++20 dropped with full span
support. That day's not here yet, but as of 54a7fce3e we're using GSL 3,
whose span is C++20-compliant.
This commit replaces every instance of basic_string_view that was not
referring to an actual string with a span of the appropriate type.
I moved the `const` qualifier into span's `T` because while
`basic_string_view.at()` returns `const T&`, `span.at()` returns `T&`
(without the const). I wanted to maintain the invariant that members of
the span were immutable.
* Mechanical Changes
* `sv.at(x)` -> `gsl::at(sp, x)`
* `sv.c{begin,end}` -> `sp.{begin,end}` (span's iterators are const)
I had to replace a `std::basic_string<>` with a `std::vector<>` in
ConImeInfo, and I chose to replace a manual array walk in
ScreenInfoUiaProviderBase with a ranged-for. Please review those
specifically.
This will almost certainly cause a code size regression in Windows
because I'm blowing out all the PGO counts. Whoops.
Related: #3956, #975.
console: switch to /Zc:wchar_t (native wchar_t)
This matches what we use in OpenConsole and makes {fmt} play nice.
I've also removed the workaround we introduced into OutputCellIterator
to work around not using /Zc:wchar_t.
Fixes MSFT:27626309.
Fixes GH-2673.
Retrieved from https://microsoft.visualstudio.com os.2020 OS official/rs_onecore_dep_uxp 1508f7c232ec58bebc37fedfdec3eb8f9bff5502
This is an attempt to simplify the SGR (Select Graphic Rendition)
implementation in conhost, to cut down on the number of methods required
in the `ConGetSet` interface, and pave the way for future improvements
and bug fixes. It already fixes one bug that prevented SGR 0 from being
correctly applied when combined with meta attributes.
* This a first step towards fixing the conpty narrowing bugs in issue
#2661
* I'm hoping the simplification of `ConGetSet` will also help with
#3849.
* Some of the `TextAttribute` refactoring in this PR overlaps with
similar work in PR #1978.
## Detailed Description of the Pull Request / Additional comments
The main point of this PR was to simplify the
`AdaptDispatch::SetGraphicsRendition` implementation. So instead of
having it call a half a dozen methods in the `ConGetSet` API, depending
on what kinds of attributes needed to be set, there is now just one call
to get current attributes, and another call to set the new value. All
adjustments to the attributes are made in the `AdaptDispatch` class, in
a simple switch statement.
To help with this refactoring, I also made some change to the
`TextAttribute` class to make it easier to work with. This included
adding a set of methods for setting (and getting) the individual
attribute flags, instead of having the calling code being exposed to the
internal attribute structures and messing with bit manipulation. I've
tried to get rid of any methods that were directly setting legacy, meta,
and extended attributes.
Other than the fix to the `SGR 0` bug, the `AdaptDispatch` refactoring
mostly follows the behaviour of the original code. In particular, it
still maps the `SGR 38/48` indexed colors to RGB instead of retaining
the index, which is what we ultimately need it to do. Fixing that will
first require the color tables to be unified (issue #1223), which I'm
hoping to address in a followup PR.
But for now, mapping the indexed colors to RGB values required adding an
an additional `ConGetSet` API to lookup the color table entries. In the
future that won't be necessary, but the API will still be useful for
other color reporting operations that we may want to support. I've made
this API, and the existing setter, standardise on index values being in
the "Xterm" order, since that'll be essential for unifying the code with
the terminal adapter one day.
I should also point out one minor change to the `SGR 38/48` behavior,
which is that out-of-range RGB colors are now ignored rather than being
clamped, since that matches the way Xterm works.
## Validation Steps Performed
This refactoring has obviously required corresponding changes to the
unit tests, but most were just minor updates to use the new
`TextAttribute` methods without any real change in behavior. However,
the adapter tests did require significant changes to accommodate the new
`ConGetSet` API. The basic structure of the tests remain the same, but
the simpler API has meant fewer values needed to be checked in each test
case. I think they are all still covering the areas there were intended
to, though, and they are all still passing.
Other than getting the unit tests to work, I've also done a bunch of
manual testing of my own. I've made sure the color tests in Vttest all
still work as well as they used to. And I've confirmed that the test
case from issue #5341 is now working correctly.
Closes#5341
This commit introduces a github action to check our spelling and fixes
the following misspelled words so that we come up green.
It also renames TfEditSes to TfEditSession, because Ses is not a word.
currently, excerpt, fallthrough, identified, occurred, propagate,
provided, rendered, resetting, separate, succeeded, successfully,
terminal, transferred, adheres, breaks, combining, preceded,
architecture, populated, previous, setter, visible, window, within,
appxmanifest, hyphen, control, offset, powerpoint, suppress, parsing,
prioritized, aforementioned, check in, build, filling, indices, layout,
mapping, trying, scroll, terabyte, vetoes, viewport, whose
…. It will return a replacement character at that point if it was given bad data. #788
<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request
This modifies the parser used while inserting text into the underlying data buffer to never return an empty sequence. The empty sequence is invalid as you can't insert a "nothing" into the buffer. The buffer asserted this with a fail fast crash. Now we will instead insert U+FFFD (the Unicode replacement character) � to symbolize that something was invalid and has been replaced.
<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [x] Closes#788 and internal MSFT: 20990158
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed
* [x] Requires documentation to be updated
* [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #788
<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
The solution here isn't perfect and isn't going to solve all of our problems. I was basically trying to stop the crash while not getting in the way of the other things coming down the pipe for the input channels.
I considered the following:
1. Remove the fail fast assertion from the buffer
- I didn't want to do this because it really is invalid to get all the way to placing the text down into the buffer and then request a string of 0 length get inserted. I feel the fail fast is a good indication that something is terribly wrong elsewhere that should be corrected.
2. Update the UTF16 parser in order to stop returning empty strings
- This is what I ultimately did. If it would ever return just a lead, it returns �. If it would ever return just a trail, it returns �. Otherwise it will return them as a pair if they're both there, or it will return a single valid codepoint. I am now assuming that if the parse function is being called in an Output Iterator and doesn't contain a string with all pieces of the data that are needed, that someone at a higher level messed up the data, it is in valid, and it should be repaired into replacements.
- This then will move the philosophy up out of the buffer layer to make folks inserting into the buffer identify half a sequence (if they're sitting on a stream where this circumstance could happen... one `wchar_t` at a time) and hold onto it until the next bit arrives. This is because there can be many different routes into the buffer from many different streams/channels. So buffering it low, right near the insertion point, is bad as it might pair loose `wchar_t` across stream entrypoints.
3. Update the iterator, on creating views, to disallow/transform empty strings.
- I considered this solution as well, but it would have required, under some circumstances, a second parsing of the string to identify lead/trail status from outside the `Utf16Parser` class to realize when to use the � character. So I avoided the double-parse.
4. Change the cooked read classes to identify that they pulled the lead `wchar_t` from a sequence then try to pull another one.
- I was going to attempt this, but @adiviness said that he tried it and it made all sorts of other weirdness happen with the edit line.
- Additionally, @adiviness has an outstanding series of effort to make cooked read significantly less horrible and disgusting. I didn't want to get in the way here.
5. Change the `GetChar` method off of the input buffer queue to return a `char32_t`, a `wstring_view`, transform a standalone lead/trail, etc.
- The `GetChar` method is used by several different accessors and API calls to retrieve information off of the input queue, transforming the Key events into straight up characters. To change this at that level would change them all. Long-term, it is probably warranted to do so as all of those consumers likely need to become aware of handling UTF-16 surrogates before we can declare victory. But two problems.
1. This gets in the way of @adiviness work on cooked read data
2. This goes WAY beyond the scope of what I want to accomplish here as the immediate goal is to stop the crash, not fix the world.
I've validated this by:
1. Writing some additional tests against the Utf16Parser to simulate some of the theoretical sequences that could arrive and need to be corrected into replacement characters per a verbal discussion and whiteboarding with @adiviness.
2. Manually triggered the emoji panel and inserted a bunch of emoji. Then seeked around left and right, deleted assorted points with the backspace key, pressed enter to commit, and used the up-arrow history to recommit them to see what happened. There were no crashes. The behavior is still weird and not great... but outside the scope of no crashy crashy.