Processing large data is extremely slow (in VMware) #4283

Closed
opened 2026-01-30 23:42:54 +00:00 by claunia · 6 comments

Originally created by @egmontkob on GitHub (Oct 5, 2019).

Environment

Windows build number: Win32NT 10.0.18362.0
Windows Terminal version (if applicable): 0.5.2762.0

inside VMware Workstation Player 15.5.0 on Ubuntu 19.10

Core i5-6200U @ 2.30GHz

Steps to reproduce / Actual behavior

cat'ing my favorite test file for speed measurement takes an extremely long time.

Inside VMware, in Windows Terminal it takes about 10 minutes of wall clock time.

On Linux (inside the same VMware) it takes from 2 to 22 seconds, measured in various terminal emulators, namely: Konsole (KDE), Pterm (PuTTY), St (suckless), Terminology (Enlightenment), Urxvt, VTE, Xterm.

The test file is ~42MiB large, contains ~667k lines. It's the output of a ls -lR --color=always / on Ubuntu. (I'm not attaching it since it could leak private stuff, plus it would be a pointless waste of storage space.)

cat is executed either locally in PowerShell, or remotely over ssh to my host computer, it doesn't matter.

On the previous version of WT which I installed about a week ago, it took about 9:22 (the same time twice) to cat this file at the terminal's default size. If the terminal was iconified, or I switched to another (idle) tab, the time dropped to 6:50-ish. Interestingly, in a tiny but visible terminal (approx. 30x4) the time increased to 11 minutes. In a giant terminal (maximized with pretty small font) the time hardly increased, to 9:37.

With the current WT version 0.5.2762.0 I'm now seeing even larger numbers: 10:39 at the default size (measured only once), 7:30-ish in a minimized window or when viewing another tab.

The exact times probably don't matter too much; we're talking about orders of magnitude here. It's ~100x slower than VTE, for example, which takes 5.2 seconds if it's in a good mood.
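As a quick check of the "~100x" figure, the wall-clock times quoted above can be converted to seconds and divided by VTE's 5.2 seconds. This is only arithmetic on the numbers reported in this issue; the helper name `mmss_to_seconds` is made up for the illustration.

```python
def mmss_to_seconds(text):
    """Convert a 'minutes:seconds' string such as '10:39' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


VTE_SECONDS = 5.2  # best VTE time reported above

# Windows Terminal times at the default window size, as reported above.
for label, wt_time in [("previous WT", "9:22"), ("WT 0.5.2762.0", "10:39")]:
    ratio = mmss_to_seconds(wt_time) / VTE_SECONDS
    print(f"{label}: about {ratio:.0f}x slower than VTE")
```

Both ratios come out a little above 100, which matches the rough estimate.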

The given example is surely an extreme one (why would anyone cat such a giant file?), but smaller files, such as /etc/services, already take a noticeable ~0.5 seconds, whereas on Linux terminals it's instantaneous. For verbose compilations of large projects, this could actually cause a noticeable productivity loss for developers.

I don't know whether running under VMware (e.g. no hardware graphics acceleration) is relevant, but again, the Linux numbers were also measured inside VMware, on a Fedora 30 guest.

Expected behavior

Windows Terminal should be comparably fast to most graphical terminals on *NIX.

Since I don't know the reason for the slowness (which needs to be investigated first), and I don't know whether it's specific to VMware, these are random guesses only and might completely miss the actual problem:

The terminal should read and parse as much data as possible, only stopping to update its UI according to the monitor refresh rate, typically 60 times per second (or maybe at a hardwired 60Hz if the refresh rate can't be determined or the concept doesn't exist; I don't know how it goes in VMware). If updating the UI takes so much time that there's hardly any time left for processing incoming data, it should start dropping frames (VTE counterpart, kind of: https://bugzilla.gnome.org/show_bug.cgi?id=730732). In iconified state or when another tab is selected, it shouldn't spend any time on drawing.
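A minimal sketch of the loop described above, in Python purely for illustration. It is not how Windows Terminal or VTE is implemented; `process_stream`, `parse`, `render` and `visible` are hypothetical names, and the 60 Hz interval is the assumed fallback mentioned in the text. Input is always parsed first, a repaint happens at most once per frame interval, and nothing is drawn while the terminal is hidden.

```python
import time

FRAME_INTERVAL = 1 / 60  # assumed 60 Hz fallback when the refresh rate is unknown


def process_stream(chunks, parse, render, visible=lambda: True, clock=time.monotonic):
    """Parse every chunk; repaint at most once per FRAME_INTERVAL while visible.

    Returns the number of frames that were actually painted.
    """
    last_frame = clock()
    dirty = False
    frames = 0
    for chunk in chunks:
        parse(chunk)  # consuming input always comes first
        dirty = True
        if visible() and clock() - last_frame >= FRAME_INTERVAL:
            render()  # intermediate states between frames are simply skipped
            frames += 1
            last_frame = clock()  # measure from the end of the paint
            dirty = False
    if dirty and visible():
        render()  # final repaint so the last state ends up on screen
        frames += 1
    return frames
```

Because `last_frame` is taken after `render()` returns, a slow paint automatically pushes the next one further out, which is one simple way to get the frame-dropping behavior.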


Note that I've checked other bug reports about slowness, e.g. #1064, but they don't seem to be about this kind of extreme slowness.


@DHowett-MSFT commented on GitHub (Apr 22, 2020):

Re-marking this one for post-1.0 instead of pre-1.0. We're making improvements here, such that we can cut down on a bunch of unnecessary rendering, but we're not quite to the point where we can just shuttle all the data through the console driver at a great enough speed to make this "instantaneous". We'll keep investigating after our launch. Thanks for the robust bug reports, as always, @egmontkob.


@miniksa commented on GitHub (Jun 30, 2020):

Commit 90a24b20b8 (https://github.com/microsoft/terminal/commit/90a24b20b8d27edbc8451936d215cf111cfe3164) is my attempt at experimenting to see if we can make this go even faster by breaking the locking that is occurring here.

For big.txt from #1064 (which is about 6MB), I go from

real    0m3.838s
user    0m0.000s
sys     0m0.150s

to

real    0m0.124s
user    0m0.000s
sys     0m0.113s

The graphical output still takes longer than that, but it's not backing up the actual I/O. Also, it's a super dumb and rough implementation to try to prove whether this is worth pursuing. It's nowhere near ready. But I think it proves that there's a good return on investment to be had in this area by breaking up the locking.
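To illustrate the general idea of keeping rendering from backing up I/O, here is a small Python sketch. It does not reflect what the commit above does; `ScreenBuffer` and `render_frame` are hypothetical names. The assumption is that the writer and the renderer share one lock, so the renderer holds it only long enough to copy the visible rows and does the slow painting after releasing it.

```python
import threading


class ScreenBuffer:
    """Text buffer shared by the I/O thread (writer) and the renderer (reader)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lines = []

    def write(self, line):
        # I/O side: holds the lock only for a cheap append.
        with self._lock:
            self._lines.append(line)

    def snapshot(self, rows):
        # Renderer side: copy the last `rows` lines, then release the lock.
        with self._lock:
            return list(self._lines[-rows:])


def render_frame(buffer, rows, paint):
    visible = buffer.snapshot(rows)  # short critical section
    paint(visible)                   # slow work happens without the lock held
```

With this split, a slow `paint` delays only the next frame, not the thread that is calling `write`.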


@NeKJ commented on GitHub (Jul 30, 2020):

Yes, IMO it's definitely worth it. The display/rendering should never block IO/CPU operations (as it happens now if I understood correctly).


@miniksa commented on GitHub (Jul 30, 2020):

> Yes, IMO it's definitely worth it. The display/rendering should never block IO/CPU operations (as it happens now if I understood correctly).

Well... I can't have it "never" block IO/CPU unless I consume an infinite amount of memory or otherwise optimize the entire pipeline to be balanced.
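The trade-off mentioned above can be sketched with a bounded queue. This is an illustrative Python example, not Windows Terminal code; `run_pipeline`, `consume` and `max_pending` are hypothetical names. With a finite `max_pending`, the producer has to block once the consumer falls behind; the only ways to avoid that are an unbounded queue (memory grows without limit) or a consumer that keeps up.

```python
import queue
import threading


def run_pipeline(chunks, consume, max_pending=4):
    """Feed chunks to a consumer thread through a bounded queue."""
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel that marks the end of the stream

    def consumer():
        while True:
            item = pending.get()
            if item is done:
                return
            consume(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for chunk in chunks:
        # Blocks whenever max_pending chunks are already waiting:
        # this is the backpressure that bounds memory use.
        pending.put(chunk)
    pending.put(done)
    worker.join()
```

Raising `max_pending` lets the producer run further ahead of a slow consumer at the cost of more buffered data.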


@Po-wei commented on GitHub (May 7, 2021):

@miniksa
Just a small finding.
Using the same big.txt from #1064 with my computer.
time cat big.txt took

real    11.26s
user    0.00s
sys     1.61s

But if I run ssh localhost first and then type the same command,
it took

real    3.38s
user    0.00s
sys     0.17s

As for the rendering, both look identical to me.
Not sure if ssh uses a larger buffer in the background or something.

win 10.0.19042.928
terminal 1.8.1032.0
ubuntu 20.04


@lhecker commented on GitHub (Oct 10, 2024):

I've had to modify ls -lR --color=always / to exclude /usr and /mnt in WSL2, because the former contains the WSL driver mount point (= slow), and the latter the NTFS mounts (same). The remaining output is ~10MB (measured with wc -c). It takes roughly 0.8s to produce the output (output redirected to /dev/null).

Printing the output in Windows 11 24H2 with conhost takes roughly 2.4s and about as long in Windows Terminal Preview 1.22. That's not quite as good as your best value of 2s, but that's because of the small WSL2 pipe buffer size of just 4KiB. I've meant to ask them to increase it to 128KiB which should double our performance down to <2s.
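For a rough sense of why the pipe buffer size matters, the sketch below counts the minimum number of reads needed to move the output through the pipe. It assumes "~10MB" means 10,000,000 bytes and that every read fills the buffer completely; it says nothing about actual timings, and `min_reads` is a made-up helper name.

```python
def min_reads(total_bytes, buffer_bytes):
    """Smallest number of reads needed if each read fills the buffer."""
    return -(-total_bytes // buffer_bytes)  # ceiling division


OUTPUT_BYTES = 10_000_000  # assumed size of the "~10MB" output

small = min_reads(OUTPUT_BYTES, 4 * 1024)    # current 4KiB pipe buffer
large = min_reads(OUTPUT_BYTES, 128 * 1024)  # proposed 128KiB pipe buffer
print(f"4KiB buffer: at least {small} reads")
print(f"128KiB buffer: at least {large} reads")
```

Under these assumptions the larger buffer needs roughly 32 times fewer round trips between the writer and the reader.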

I'll be closing this issue then. 😊

Reference: starred/terminal#4283