Performance degradation when decompressing a file from a 7z archive #559

Open
opened 2026-01-29 22:13:44 +00:00 by claunia · 13 comments

Originally created by @ghost on GitHub (Feb 20, 2023).

I have conducted some tests, and sharpcompress does suffer from performance degradation when reading some 7z files.

For example, with 100 files in a 7z archive, reading the earlier files, such as the first 10, is very fast, but the further into the archive a file is, the slower its decompression becomes.

There seems to be a pattern: the later a file is stored in the archive, the slower it is to decompress.

For example, with a 300 MB 7z archive containing 100 files, extracting the 10th file is very quick, but extracting only the 70th file is very slow, even if both files are the same size.

This problem does not occur in the zip format.
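
For illustration, a per-entry (random access) extraction like the following exhibits the described behaviour on solid 7z archives. This is a hypothetical sketch, not code from the reporter; the file names are made up and exact namespaces may vary between SharpCompress versions:

```cs
using System.Linq;
using SharpCompress.Archives;          // WriteToFile extension method
using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;

// Random access to a single entry: on a solid archive, everything stored
// earlier in the same solid block has to be decompressed and discarded first,
// so a "late" entry is much slower to extract than an early one of equal size.
using var archive = SevenZipArchive.Open(@"test.7z");
var entry = archive.Entries.First(e => !e.IsDirectory && e.Key == "file_070.bin");
entry.WriteToFile(@"C:\temp\file_070.bin", new ExtractionOptions { Overwrite = true });
```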

claunia added the enhancement and up for grabs labels 2026-01-29 22:13:44 +00:00

@adamhathcock commented on GitHub (Mar 1, 2023):

PRs are welcome. I've been away for personal reasons.


@Nanook commented on GitHub (Mar 1, 2023):

The speed degradation described above could be due to 7zip grouping files with "Solid Block Size" to achieve better compression. Files at the end of these blocks cannot be directly decompressed. Earlier files must be decompressed first.


@ghost commented on GitHub (Mar 2, 2023):

> The speed degradation described above could be due to 7zip grouping files with "Solid Block Size" to achieve better compression. Files at the end of these blocks cannot be directly decompressed. Earlier files must be decompressed first.

@Nanook Thank you. I used WinRAR to test the same files (i.e. using WinRAR to decompress the .7z files), and WinRAR works normally; it does not have this bug.


@Nanook commented on GitHub (Mar 2, 2023):

It's not necessarily a bug, but rather a trade-off between compression and flexibility. RAR also has a SOLID mode which makes the full archive non-seekable. 7zip's is at least configurable. Just for info :)


@ghost commented on GitHub (Mar 3, 2023):

@Nanook No, I meant using WinRAR to decompress .7z files. It has almost no slowdown problem; WinRAR is fast at unpacking both .zip and .7z files.


@Nanook commented on GitHub (Mar 3, 2023):

Thanks for the info. Perhaps the SharpCompress implementation can be improved. I'll take a look next time I'm in there.


@ghost commented on GitHub (Mar 3, 2023):

Thank you very much. It looks like this is an old bug; there is detailed data in this issue:

https://github.com/adamhathcock/sharpcompress/issues/399


@Erior commented on GitHub (Mar 12, 2023):

Looking at the cases, it seems to be `archive.ExtractAllEntries()` vs `archive.Entries.Where(entry => !entry.IsDirectory)`.

Performance might not be great, but the latter does a skip to find the entry you are trying to decompress. The more files in the 7z archive, the more data has to be re-decompressed to reach your file.

archive.ExtractAllEntries()
while (reader.MoveToNextEntry())
....

does move along a bit faster.
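
Spelled out, the sequential pattern described above looks roughly like this. It is a sketch only; option names may differ slightly between SharpCompress versions:

```cs
using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;
using SharpCompress.Readers;           // WriteEntryToDirectory extension method

using var archive = SevenZipArchive.Open(@"test.7z");

// ExtractAllEntries returns a forward-only reader, so each solid block is
// decompressed once, in order, instead of being re-decompressed per entry.
using var reader = archive.ExtractAllEntries();
while (reader.MoveToNextEntry())
{
    if (reader.Entry.IsDirectory)
        continue;

    reader.WriteEntryToDirectory(@"C:\temp",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
}
```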


@Lombra commented on GitHub (Mar 15, 2023):

Not sure whether this is related, but I've been using SevenZipSharp for my 7z needs, which has an issue of being extremely slow to process solid 7z archives on a file entry level. However, it also has an [ExtractArchive](https://github.com/squid-box/SevenZipSharp/blob/dev/SevenZip/SevenZipExtractor.cs#L1337) method which works as fast as you might expect.

SevenZipSharp is basically a wrapper around 7z.dll, so I don't know whether that's unhelpful. The DLL is not supported on Linux however, which is how I found out about this project.

This archive extracts in an instant using SevenZipSharp:

```cs
var archive = new SevenZipExtractor(@"mod.7z");
archive.ExtractArchive(@"C:\temp");
```

The same archive takes nine seconds to extract using SharpCompress:

```cs
var archive = SevenZipArchive.Open(@"mod.7z");
archive.WriteToDirectory(@"C:\temp");
```

@bodgit commented on GitHub (May 3, 2023):

I randomly stumbled on this issue. I wrote a similar Golang library for reading .7z archives, so I'm familiar with this particular phenomenon.

The most efficient way to extract a .7z archive is to iterate over the files in the order they're stored in the archive. If you offer some sort of random access API to the files and implement it naively then you will get this performance degradation. The problem comes from the fact that in order to read file `n` in a solid block, you have to read and discard all of the decompressed data for files `0` through `n-1`. As `n` increases, you have to read and discard more and more data. This is exacerbated if files near the beginning of the block are quite large and you're only interested in some files near the end. You also can't just seek forward into the compressed stream, as the state machine of the decompression routine(s) will be confused.

So if you implement your "extract everything in the archive" API in terms of your random access API, i.e. extracting file 0, then file 1, etc. it will have worse performance the larger the archive becomes. Whereas a dedicated "extract everything" API that iterates over the archive in one shot will be quick relative to the size of the archive.

I had the exact same problem and fixed this in my library by caching the reader of the decompressed data after reading file `n`, so that when reading any file `> n` it would use that cached reader instead of recreating it again from scratch. This means that iterating over the files in archive order always results in a cache hit, as there's always a cached reader positioned at the end of file `n-1`. If I were to sort the files in any way then that would potentially introduce performance degradation again; the worst scenario being to sort the files in reverse order to that of the archive.

Hope that helps.
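
A rough C# sketch of the caching idea described above, with hypothetical type and member names (this is neither the SharpCompress internals nor the Go implementation):

```cs
using System;
using System.IO;

// Keep the decompressed stream of the current solid block together with the
// index of the entry it is positioned at, so that reading entries in archive
// order never restarts decompression from the beginning of the block.
sealed class SolidBlockReaderCache
{
    private Stream _stream;   // decompressed data of the current block
    private int _nextEntry;   // index of the entry the stream is positioned at

    public Stream GetStreamFor(int entryIndex, Func<Stream> openBlock, long[] entrySizes)
    {
        // Cache miss: no stream yet, or the caller wants an entry we already passed.
        if (_stream == null || entryIndex < _nextEntry)
        {
            _stream?.Dispose();
            _stream = openBlock();   // restart decompression at the block start
            _nextEntry = 0;
        }

        // Decompress and discard the entries between the cached position and the
        // requested one; this is the unavoidable cost of a solid block.
        for (; _nextEntry < entryIndex; _nextEntry++)
            Skip(_stream, entrySizes[_nextEntry]);

        _nextEntry = entryIndex + 1;   // caller is expected to consume the entry fully
        return _stream;
    }

    private static void Skip(Stream stream, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            var read = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read <= 0) throw new EndOfStreamException();
            count -= read;
        }
    }
}
```

With this, iterating entries in archive order always hits the cache, and only jumping backwards forces the block to be decompressed again from the start.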


@adamhathcock commented on GitHub (May 3, 2023):

`CreateReaderForSolidExtraction` and `SevenZipReader` will access things sequentially in the archive.

7z has the worst of both worlds, and the `IArchive` interface might not be the best for it.


@FlsZen commented on GitHub (Jul 19, 2023):

I created a PR #750 with the extension method that supports my use case of extracting large `.7z` files to a new directory. It's super fast compared to `WriteToDirectory`, but might not be as feature-rich as needed.


@adamhathcock commented on GitHub (Jul 19, 2023):

Your PR uses a `Task.Run` which is just putting the thread on a different pool. If you want to do that, fine, but that's beyond the scope of this library. True Async needs to go all the way to the Stream.

If you want a different Extract to happen using the Reader, I'd be up for a PR for that.
