mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-03 21:23:38 +00:00
SevenZip / 7z IReader Implementation? #697
Originally created by @mitchcapper on GitHub (Sep 11, 2025).
I wanted to iterate over the contents and metadata of some archives, and SharpCompress has largely worked great for that.
At first I was confused when ReaderFactory.Open(stream) didn't work on 7z.
The official docs do cover this, but I missed it.
Then I saw that the docs also recommend, for performance, using ExtractAllEntries for 7z and RAR. Although it seemed primarily intended for writing the files out, it looked like a good option for performance, and it returned an IReader. It generally worked well, but on some archives the memory usage was huge, and I eventually traced it to the dictionary size (a 350MB dictionary could result in 800MB of memory usage). I also noticed that just accessing .Entries on the SevenZipArchive didn't carry this penalty. I then realized my error: ExtractAllEntries calls LoadEntries rather than GetEntries, which causes the full dictionary load.
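The difference between walking .Entries and calling ExtractAllEntries can be sketched with a toy Python model, assuming (as the post suggests) that listing entries only reads header metadata, while extraction builds a decoder whose window is as large as the dictionary. ToySevenZipArchive, EntryInfo, entries() and extract_all_entries() are hypothetical names, not SharpCompress internals, and the model counts only the dictionary, not any other memory.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class EntryInfo:
    # Per-file metadata as stored in the archive header (name and unpacked size).
    key: str
    size: int


class ToySevenZipArchive:
    """Hypothetical stand-in for a 7z archive; not SharpCompress's real class."""

    def __init__(self, entries: List[EntryInfo], dictionary_size: int):
        self._entries = entries
        self.dictionary_size = dictionary_size
        self.allocated_bytes = 0  # bytes reserved for decoder state

    def entries(self) -> Iterator[EntryInfo]:
        # Lazy metadata walk: only header records are visited, no decoder is built.
        yield from self._entries

    def extract_all_entries(self) -> Iterator[EntryInfo]:
        # Toy assumption: the decoder reserves a dictionary-sized window
        # before the first entry can be produced.
        self.allocated_bytes += self.dictionary_size
        for entry in self._entries:
            yield entry
```

Because extract_all_entries is a generator, the allocation happens when it is first advanced, not when it is called.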
Because I wanted to minimize the separate code path for 7z, I eventually settled on a custom IReader implementation. It worked well for me, but there may be some problem beyond performance that explains why 7z is specifically excluded.
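An adapter of the kind described can be sketched as follows. This is a Python model, not the poster's C# code: ForwardOnlyReader loosely stands in for IReader, and ArchiveEntriesReader, move_to_next_entry, entry and open_entry_stream are hypothetical names. It simply steps through an already-lazy entry list one item per call.

```python
from typing import Iterable, Iterator, Optional


class ForwardOnlyReader:
    """Hypothetical Python analogue of an IReader-style interface."""

    def move_to_next_entry(self) -> bool:
        raise NotImplementedError

    @property
    def entry(self):
        raise NotImplementedError

    def open_entry_stream(self):
        raise NotImplementedError


class ArchiveEntriesReader(ForwardOnlyReader):
    """Adapter that walks an archive's lazy entry list one entry at a time."""

    def __init__(self, entries: Iterable):
        self._iter: Iterator = iter(entries)
        self._current: Optional[object] = None

    def move_to_next_entry(self) -> bool:
        # Advance the underlying iterator; False signals the end of the archive.
        self._current = next(self._iter, None)
        return self._current is not None

    @property
    def entry(self):
        if self._current is None:
            raise RuntimeError("call move_to_next_entry() first")
        return self._current

    def open_entry_stream(self):
        # Left unimplemented, matching the limitation described in the thread
        # (the entry stream type cannot be constructed from outside the library).
        raise NotImplementedError("entry streams are not available in this adapter")
```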
I then added detection code ahead of my use of ReaderFactory, and only call ReaderFactory if my own IReader has not already been set.
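The detection step might look like this Python sketch, which checks for the six-byte signature that begins a standard 7z archive and otherwise falls back to a generic factory. is_seven_zip and open_any_reader are hypothetical helpers, and the two factory callables stand in for the custom 7z reader and for ReaderFactory; this is not the poster's actual code.

```python
import io
from typing import BinaryIO, Callable

# The six-byte signature at the start of a standard (non-SFX) 7z archive.
SEVEN_ZIP_SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def is_seven_zip(stream: BinaryIO) -> bool:
    """Peek at the next bytes of a seekable stream, then restore its position."""
    start = stream.tell()
    try:
        return stream.read(len(SEVEN_ZIP_SIGNATURE)) == SEVEN_ZIP_SIGNATURE
    finally:
        stream.seek(start)


def open_any_reader(
    stream: BinaryIO,
    open_seven_zip_reader: Callable[[BinaryIO], object],
    open_generic_reader: Callable[[BinaryIO], object],
) -> object:
    # Use the custom 7z reader when detected; otherwise defer to the generic factory.
    reader = None
    if stream.seekable() and is_seven_zip(stream):
        reader = open_seven_zip_reader(stream)
    if reader is None:
        reader = open_generic_reader(stream)
    return reader


if __name__ == "__main__":
    demo = io.BytesIO(SEVEN_ZIP_SIGNATURE + bytes(26))
    print(open_any_reader(demo, lambda s: "custom 7z reader", lambda s: "ReaderFactory"))
```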
Again, there is probably a reason IReader was avoided for 7z. Also, OpenEntryStream cannot be implemented in my version, because EntryStream's constructor is internal. I did look at deriving from AbstractReader, since it avoids that issue, but its constructor is also internal, so that was moot. I didn't need entry streams for my purposes, but in theory the approach could work apart from those access restrictions.
@adamhathcock commented on GitHub (Oct 21, 2025):
7Zip cannot have a forward-only reader, as the format usually requires random access. There can be multiple streams that are combined files (like a solid RAR), but there are no guarantees.
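One concrete reason 7z resists forward-only reading: per the 7z format description shipped with 7-Zip (7zFormat.txt), the fixed 32-byte signature header at the start only points to the main header, which is normally written after the packed data. A reader therefore has to seek ahead to learn the file list before it can interpret the packed streams. This Python sketch parses that signature header; read_start_header and header_position are hypothetical helpers, and CRCs are not checked.

```python
import struct
from typing import NamedTuple

SIGNATURE = b"7z\xbc\xaf\x27\x1c"
# Signature header layout (little-endian, 32 bytes): 6-byte signature,
# 2 version bytes, StartHeaderCRC (uint32), NextHeaderOffset (uint64),
# NextHeaderSize (uint64), NextHeaderCRC (uint32).
START_HEADER = struct.Struct("<6s2BIQQI")


class StartHeader(NamedTuple):
    next_header_offset: int
    next_header_size: int


def read_start_header(first_32_bytes: bytes) -> StartHeader:
    sig, _major, _minor, _crc, offset, size, _next_crc = START_HEADER.unpack(first_32_bytes)
    if sig != SIGNATURE:
        raise ValueError("not a 7z archive")
    return StartHeader(offset, size)


def header_position(h: StartHeader) -> int:
    # NextHeaderOffset is measured from the end of the 32-byte signature header,
    # so the file list usually sits after all packed data, near the end of the file.
    return START_HEADER.size + h.next_header_offset
```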
I've done https://github.com/adamhathcock/sharpcompress/pull/964 to throw errors on non-solid RARs and non-7Zip, but I'd want to have a better rule for 7Zip files.
@mitchcapper commented on GitHub (Oct 21, 2025):
I am confused. IReaderFactory.OpenReader takes a stream and returns an IReader, and SevenZipArchive.Open also takes a stream. I understand that OpenReader is meant to accept any forward-only stream, but if the stream isn't seekable, the SevenZipArchive.Open call would fail anyway, and I assume it would just show up like any other failed format. Is the problem that someone might pass a 7z archive stream that says it's seekable, but seeking does something they wouldn't want?
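The scenario in this question can be modeled as a chain of format probes, where a 7z probe that declines non-seekable input fails just like any unrecognized format. This Python sketch peeks the first bytes once and hands them, together with a seekable flag, to each probe; probe_seven_zip, probe_gzip and open_reader are hypothetical and do not reflect how SharpCompress actually detects formats.

```python
from typing import Callable, List, Optional

# A probe inspects the first bytes plus whether seeking is possible.
Prober = Callable[[bytes, bool], Optional[str]]


def probe_seven_zip(head: bytes, seekable: bool) -> Optional[str]:
    # 7z needs random access, so a forward-only stream is treated as "no match".
    if not seekable:
        return None
    return "7z" if head.startswith(b"7z\xbc\xaf\x27\x1c") else None


def probe_gzip(head: bytes, seekable: bool) -> Optional[str]:
    # gzip can be decoded front to back, so seekability does not matter.
    return "gzip" if head.startswith(b"\x1f\x8b") else None


def open_reader(head: bytes, seekable: bool, probers: List[Prober]) -> str:
    for probe in probers:
        kind = probe(head, seekable)
        if kind is not None:
            return kind
    # A non-seekable 7z stream ends up here, indistinguishable from an unknown format.
    raise ValueError("unrecognized archive format")
```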
@adamhathcock commented on GitHub (Oct 22, 2025):
Not really. The basic tenet I'm trying to follow is "Reader interface for forward-only and Archive interface for seekable." 7Zip has multiple streams and therefore almost needs ExtractAll with a reader, because the files are appended inside a compressed stream, yet it's still seekable.
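Why ExtractAll suits this layout can be put in numbers with a toy Python model of several files appended inside one compressed stream (a solid block, which the 7z format calls a folder). It assumes that each independently opened entry restarts decoding from the beginning of the stream, with no decoder state cached between requests; the function names are hypothetical.

```python
from typing import List


def bytes_to_decode(sizes: List[int], index: int) -> int:
    """Unpacked bytes that must be produced to finish the file at `index`
    when all files are concatenated in one solid compressed stream."""
    return sum(sizes[: index + 1])


def per_entry_cost(sizes: List[int], order: List[int]) -> int:
    # Opening entries one at a time: each request decodes from the stream start.
    return sum(bytes_to_decode(sizes, i) for i in order)


def sequential_cost(sizes: List[int]) -> int:
    # ExtractAll-style: a single pass over the stream emits every file in order.
    return sum(sizes)
```

With sizes [10, 20, 30], a single sequential pass decodes 60 bytes, while opening the three entries separately decodes 10 + 30 + 60 = 100, and the gap grows quadratically with the number of files.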
This kind of thing is why I don't like 7Zip as a format. It tries to be clever to save a few bytes.
@mitchcapper commented on GitHub (Oct 22, 2025):
Alright final thought:
Would it be worthwhile to add an interface like IIterable (I'm sure there is a better name) for archives that can be iterated? It could simply be the parent of the current IReader interface, and the only user-facing change would be that SevenZip could implement it (with fairly minor code changes to support it). This is a great library; I just figure the more formats you can use through a generic interface, the better.
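The proposed split can be sketched in Python with abstract base classes: a parent interface that only walks entries, a full reader that extends it with entry streams, and a 7z type that implements just the parent. EntryIterable, Reader, SevenZipEntries and list_names are hypothetical names chosen for illustration, not a proposed SharpCompress API.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class EntryIterable(ABC):
    """Hypothetical parent interface: walk entry metadata in order."""

    @abstractmethod
    def iter_entries(self) -> Iterator[str]: ...


class Reader(EntryIterable):
    """Hypothetical full reader: metadata walk plus per-entry data streams."""

    @abstractmethod
    def open_entry_stream(self) -> bytes: ...


class SevenZipEntries(EntryIterable):
    # Only the metadata walk is required, so 7z can join without faking streams.
    def __init__(self, names: List[str]):
        self._names = names

    def iter_entries(self) -> Iterator[str]:
        yield from self._names


def list_names(source: EntryIterable) -> List[str]:
    # Generic code targets the parent interface and accepts any format.
    return list(source.iter_entries())
```

Existing reader implementations would keep working unchanged, since anything that is a Reader is also an EntryIterable.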
If not, feel free to close this; I don't have any other ideas :)
@adamhathcock commented on GitHub (Oct 23, 2025):
The easiest thing might be to just break the rule for 7Zip and say that a random-access stream is required for 7Zip on the Reader interface, or something like that.
The Entries collection is already lazy. I can't remember whether it can be used for streams that contain multiple files.