mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-03 21:23:38 +00:00
Memory is leaking when multithreading #673
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @VoidXH on GitHub (Feb 17, 2025).
For a ~2 GB 7z with ~40k files (this one specifically), running the following code will eat 64 GBs of memory in a minute:
Note that the opening of a new archive is needed in the try block, because multithreading a single handle to the archive would make
WriteToDirectorythrow a lot of exceptions.@adamhathcock commented on GitHub (Feb 18, 2025):
It's not leaking if the end of the using recovers it. That said, the 7zip extraction isn't the best. Most of the code should be gone over with stackallocations or pooling.
You're using a lot of duplicate things anyway by extracting an archive then extracting it again?
I would encourage not to use 7zip as a format and stick to zip or gzip or bzip2 or anything else really.
@VoidXH commented on GitHub (Feb 18, 2025):
The end of the using doesn't recover it, all handles are disposed just after use. If this runs single-threaded, the issue is not there. I assume this might be because of some static references.
No, it's just opening the archive again for each extracted file. They only get extracted once. This is needed because if you use the same SevenZipArchive object from two threads, one WriteToDirectory will throw an exception. It only works if you use one object for one write.
7z was chosen by the linked dependency.
@VoidXH commented on GitHub (Feb 18, 2025):
The leak is not caused by the disposal, but doing multithreaded export. The following code also leaks, even if it reuses the handles:
@Morilli commented on GitHub (Feb 19, 2025):
This "leak" might be caused by you not properly
Disposeing theSevenZipArchivecreated:You should either write
using SevenZipArchive handle = SevenZipArchive.Open(archive);or dispose the archive in a finally block, NOT in the try block, as that code will not get hit when an exception occurs.I can not reproduce any memory issues with either the first or second example you posted however. Note that each archive instance will allocate 64MB for the window, so if the parallel code uses 64 threads you're gonna get 64MB*64 = 4GB memory usage.
The most glarious issue here however is the fact that the archive you're trying to extract is a solid archive, meaning you should be using
ExtractAllEntrieson theSevenZipArchiveand iterate over the entries using the reader returned from that function, otherwise extraction performance will be abysmal.Something like this:
That said, there is still an issue where
SevenZipFilePart.GetCompressedStreamwill throw an exception when the file to be extracted is empty (#854), which will cause a lot of exceptions which may be the cause of the perceived memory leak.@VoidXH commented on GitHub (Feb 19, 2025):
This was fixed in the second snippet.
Thanks, I'll try with this one, and if it works with fine performance, I'll close this issue.
@VoidXH commented on GitHub (Feb 19, 2025):
Yup, performance is fine, thanks.
@adamhathcock commented on GitHub (Feb 20, 2025):
Thanks @Morilli