RendererDtorAndThread unit tests are sometimes hanging #7291

Closed
opened 2026-01-31 01:00:18 +00:00 by claunia · 7 comments

Originally created by @j4james on GitHub (Apr 5, 2020).

Environment

Windows build number: Version 10.0.18362.657
Windows Terminal version (if applicable): Commit 4f8acb4b9f

Steps to reproduce

In a cmd.exe shell, in the tools directory, run the RendererDtorAndThread tests repeatedly like this:

for /L %i in (1,1,1000) do runut /name:*RendererDtorAndThread*

Expected behavior

The tests should complete successfully every time.

Actual behavior

Invariably one of the runs will hang. It's not that big a deal when testing locally, because it doesn't happen that often, but it's annoying when it causes the CI build to timeout when submitting a PR (which I've had happen a couple of times now).

After reproducing the hang locally, I attached to the process in a debugger, and found that the RenderThread was stuck waiting for the _hPaintEnabledEvent. I also noticed that these tests used to have a Sleep which I think was added specifically to prevent this from happening. See here:

https://github.com/microsoft/terminal/blob/8476040481a7459d3b6deb94b498262c5cfeb5a4/src/host/ut_host/VtIoTests.cpp#L360-L365

Uncommenting those Sleep calls seems to fix the problem, although the tests do then take much longer to run.
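The shape of the race can be illustrated with a small, portable sketch. This is not the Terminal code: a std::condition_variable stands in for the Win32 event, and all names here are made up. A worker that blocks waiting for painting to be enabled will wait forever if teardown races ahead of it, unless its wait predicate also covers the exit flag:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative stand-in for a renderer that owns a paint thread.
struct Worker
{
    std::mutex m;
    std::condition_variable cv;
    bool paintEnabled = false;
    bool exiting = false;
    std::thread t; // declared last so the flags above exist before it starts

    Worker() : t([this] { run(); }) {}

    ~Worker()
    {
        {
            std::lock_guard<std::mutex> lk(m);
            exiting = true; // signal shutdown...
        }
        cv.notify_all(); // ...and wake the worker, whatever it's waiting for
        t.join();
    }

    void run()
    {
        std::unique_lock<std::mutex> lk(m);
        // The safe pattern: one wait whose predicate includes the exit flag.
        // Waiting only on `paintEnabled` here would hang if the destructor
        // runs before painting is ever enabled -- the race in this issue.
        cv.wait(lk, [this] { return paintEnabled || exiting; });
    }
};

// Construct and immediately destroy, like the stress test does.
// Returning at all means no run hung.
bool stressCreateDestroy(int n)
{
    for (int i = 0; i < n; ++i)
    {
        Worker w;
    }
    return true;
}
```

Dropping `|| exiting` from the predicate reproduces the hang: the destructor's notify arrives, but the worker rechecks a condition that is still false and goes back to sleep, so the join never returns.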


@zadjii-msft commented on GitHub (Apr 6, 2020):

Oh hey so this is the real cause behind #4344! Thanks for finding this!

I wonder what the smallest value for the sleep is that we could put there to make the tests always pass.

Or I mean, we could maybe patch the RenderThread to make sure that this bug isn't possible. Maybe we could add some test event that the RenderThread could wait on at the top of _ThreadProc to make sure the test does the setup correctly before actually waiting on _hPaintEnabledEvent. Locking and waiting is hard and I haven't had much ☕️ yet so maybe that doesn't make sense.
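A hedged sketch of that idea follows. The names (TestableThread, WaitUntilStarted) are illustrative, not the actual RenderThread API, and std::promise stands in for a Win32 event: the thread signals a one-shot "ready" event at the top of its thread proc, and the test blocks on it before continuing, so teardown can never race past the thread's initial wait.

```cpp
#include <atomic>
#include <future>
#include <thread>

class TestableThread
{
public:
    TestableThread() : _thread([this] { _threadProc(); }) {}

    ~TestableThread() { _thread.join(); }

    // Test-only hook: block until the thread has reached its steady state.
    void WaitUntilStarted() { _ready.wait(); }

    bool Started() const { return _started.load(); }

private:
    void _threadProc()
    {
        _started = true;
        _readyPromise.set_value(); // the proposed "setup is done" test event
        // ...the real thread proc would now wait on _hPaintEnabledEvent...
    }

    std::promise<void> _readyPromise;
    std::shared_future<void> _ready{_readyPromise.get_future()};
    std::atomic<bool> _started{false};
    std::thread _thread; // declared last: started only after the event exists
};
```

With a hook like this, the test calls WaitUntilStarted() after construction instead of sleeping for an arbitrary interval, which removes the race without slowing the test down.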


@j4james commented on GitHub (Apr 7, 2020):

Oh hey so this is the real cause behind #4344! Thanks for finding this!

I don't think it's the same thing as #4344. In those days I'm sure it was hanging somewhere else. Also, the Sleep that I mentioned above was only commented out in PR #4490, so if that's to blame, that's a more recent issue.

Or I mean, we could maybe patch the RenderThread to make sure that this bug isn't possible.

Yeah, I was thinking the same thing, but I wasn't really sure what was going on in that test, so I thought it best left to someone that knew what they were doing. 😉

@j4james commented on GitHub (Apr 7, 2020): > Oh hey so this is the real cause behind #4344! Thanks for finding this! I don't think it's the same thing as #4344. In those days I'm sure it was hanging somewhere else. Also, the `Sleep` that I mentioned above was only commented out in PR #4490, so if that's to blame, that's a more recent issue. > Or I mean, we could maybe patch the `RenderThread` to make sure that this bug isn't possible. Yeah, I was thinking the same thing, but I wasn't really sure what was going on in that test, so I thought it best left to someone that knew what they were doing. 😉
Author
Owner

@DHowett-MSFT commented on GitHub (Apr 8, 2020):

About the "initialize and destroy a hundred of these objects" tests: I'm wary, and wonder if they are of any use to us. Were they added due to a regression?


@zadjii-msft commented on GitHub (Apr 8, 2020):

Were they added due to a regression?

Not a regression per se, but they were added to catch this exact category of issue. At one point in the past, long before we were on GH, we had tests that would hang just like this seemingly at random. They'd only happen in CI, and only sporadically. These tests helped suss out those bugs, because we would just run the same test a bunch of times to make sure that if the bug existed, we'd definitely hit it in CI.

There was something crazy at the time where dtor'ing these objects in the Unit tests could cause a hard crash, not just a hang. Something about the class having different sizes when it was compiled with UNIT_TESTING=1 vs without, and that leading to us freeing basically random memory.

Both those issues happened at roughly the same time in the past, and I can't really be sure which was the actual origin of these tests, or if the two causes combined for these tests.

I suppose it's not hitting as reliably anymore. The switch off VsTest probably helped quite a bit with that. So we could maybe get rid of this one now. It seems like it's only really a problem in the Render thread tests, and I'm not sure those are really testing anything of value...


@zadjii-msft commented on GitHub (Jan 29, 2021):

Hey we haven't hit this or #4344 in a while now, have we?


@j4james commented on GitHub (Jan 29, 2021):

I haven't been paying much attention to the CI build failures, but I don't recall seeing this in ages, and I can't reproduce it locally with my test case, so I think it's probably OK to close.


@zadjii-msft commented on GitHub (Feb 1, 2021):

Good enough for me, thanks!
