SevenZip / 7z IReader Implementation? #697

Open
opened 2026-01-29 22:16:04 +00:00 by claunia · 5 comments

Originally created by @mitchcapper on GitHub (Sep 11, 2025).

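I wanted to iterate over the contents and metadata of some archives, and SharpCompress has largely worked great for that.

I was initially confused when [ReaderFactory.Open(stream)](https://github.com/adamhathcock/sharpcompress/blob/master/USAGE.md#use-readerfactory-to-autodetect-archive-type-and-open-the-entry-stream) didn't work on 7z.

The official docs do cover [this](https://github.com/adamhathcock/sharpcompress/blob/master/FORMATS.md#supported-format-table), but I missed it.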

I then saw that the docs also recommend [ExtractAllEntries](https://github.com/adamhathcock/sharpcompress/blob/master/USAGE.md#extract-all-files-from-a-rar-file-to-a-directory-using-rararchive) for 7z and rar for performance. Although it seemed intended primarily for writing files out, it looked like a good way to go for performance, and it returns an IReader. It generally worked well, but on some archives the memory usage was huge, which I eventually traced to the dictionary size (a 350MB dictionary may result in 800MB of memory usage). I also noticed that just accessing .Entries on the SevenZipArchive didn't have this penalty. I then realized my error: ExtractAllEntries calls LoadEntries rather than GetEntries, which causes the full dictionary load.
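For anyone who only needs names and sizes, the lighter path described above can be sketched roughly as follows. This is a minimal illustration, not the library's recommended pattern: the file name `example.7z` is a made-up placeholder, and it assumes the `SevenZipArchive.Open(stream)` overload and the `Key`, `Size`, and `IsDirectory` entry properties behave as in the SharpCompress version used in this thread.

```csharp
using System;
using System.IO;
using SharpCompress.Archives.SevenZip;

// Hypothetical path, for illustration only.
using var stream = File.OpenRead("example.7z");
using var archive = SevenZipArchive.Open(stream);

// Walk archive.Entries directly instead of calling ExtractAllEntries().
// Only entry metadata is used here; no entry contents are extracted.
foreach (var entry in archive.Entries)
{
    if (entry.IsDirectory)
    {
        continue; // skip directory records, list files only
    }

    Console.WriteLine($"{entry.Key}\t{entry.Size} bytes");
}
```

The trade-off is that this ties the calling code to the archive API rather than IReader, which is the alternate code path the rest of this post tries to avoid.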

As I wanted to minimize the alternate code path for 7z, I eventually settled on a custom IReader implementation. It worked well for me, but there may be some problem beyond performance that explains why 7z is specifically excluded.

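// Usings needed: System, System.Collections.Generic, System.IO, SharpCompress.Archives,
// SharpCompress.Archives.SevenZip, SharpCompress.Common, SharpCompress.Readers.
internal class Our7ZReader(SevenZipArchive archive) : IReader {
	public bool MoveToNextEntry() {
		if (!inited) {
			inited = true;
			// Enumerate archive.Entries lazily instead of calling ExtractAllEntries().
			enumerator = archive.Entries?.GetEnumerator();
		}
		// Guard against a null enumerator (no entries, or already disposed).
		return enumerator != null && enumerator.MoveNext();
	}
	// Does not compile outside SharpCompress: EntryStream's constructor is internal.
	public EntryStream OpenEntryStream() => new EntryStream(this, enumerator?.Current.OpenEntryStream());
	public void WriteEntryTo(Stream writableStream) => enumerator?.Current.WriteTo(writableStream);
	private bool inited = false;
	private IEnumerator<SevenZipArchiveEntry> enumerator;
	public ArchiveType ArchiveType => archive.Type;
	public IEntry Entry => enumerator?.Current;
	public bool Cancelled { get; private set; }

	// Required by IReader; this wrapper never raises them.
	public event EventHandler<ReaderExtractionEventArgs<IEntry>> EntryExtractionProgress;
	public event EventHandler<CompressedBytesReadEventArgs> CompressedBytesRead;
	public event EventHandler<FilePartExtractionBeginEventArgs> FilePartExtractionBegin;
	public void Cancel() => Cancelled = true;

	// Disposes only the enumerator; the archive is owned and disposed by the caller.
	public void Dispose() {
		enumerator?.Dispose();
		enumerator = null;
	}
}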

I then add detection code ahead of my use of ReaderFactory, and only call ReaderFactory if my IReader reader has not already been set:

} else if (archivePath.EndsWith(".7z", StringComparison.OrdinalIgnoreCase)) {
	// IsSevenZipFile reads from the stream, so rewind before opening the archive.
	if (SharpCompress.Archives.SevenZip.SevenZipArchive.IsSevenZipFile(stream)) {
		stream.Seek(0, SeekOrigin.Begin);
		var archive = SharpCompress.Archives.SevenZip.SevenZipArchive.Open(stream);
		toDispose.Add(archive);
		// Wrap the archive so the rest of the code can keep using IReader.
		var ourReader = new Our7ZReader(archive);
		reader = ourReader;
		toDispose.Add(ourReader);
	}
}

Again, there is probably a reason IReader was avoided. Also, OpenEntryStream above cannot be implemented as written because EntryStream's constructor is internal. I looked at deriving from AbstractReader to avoid that issue, but its constructor is also internal, so that was moot. I didn't need OpenEntryStream for my purposes, but in theory the above would work apart from the access restrictions.

claunia added the enhancement and up for grabs labels 2026-01-29 22:16:04 +00:00

@adamhathcock commented on GitHub (Oct 21, 2025):

7Zip cannot have a forward-only reader because the format usually requires random access. There can be multiple streams that each combine several files (like a SOLID rar), but there are no guarantees.

I've done https://github.com/adamhathcock/sharpcompress/pull/964 to throw errors on non-solid rars and non-7Zip but I'd want to have a better rule for 7Zip files.


@mitchcapper commented on GitHub (Oct 21, 2025):

I am confused. IReaderFactory.OpenReader takes a stream and returns an IReader. SevenZipArchive.Open also takes a stream. I understand that OpenReader is meant to accept any forward-only stream, but if the stream isn't seekable then the SevenZipArchive.Open call would fail anyway, and I assume it would just look like any other format that failed to match. Is the problem that someone might pass a 7z archive stream that says it is seekable, but seeking does something they wouldn't want?


@adamhathcock commented on GitHub (Oct 22, 2025):

> Is the problem the idea someone passes a 7z archive stream that says its a seekable but seeking does something they wouldn't want?

Not really. The basic tenet I'm trying to follow is "Reader interface for forward-only and Archive interface for seekable." 7Zip has multiple streams and therefore almost needs the ExtractAll with a reader, because the files are appended together inside a compressed stream, but the archive is still seekable.

This kind of thing is why I don't like 7Zip as a format. It tries to be clever to save a few bytes.


@mitchcapper commented on GitHub (Oct 22, 2025):

Alright final thought:

Would it be worthwhile to add an interface like IIterable (I'm sure there is a better name) for archives that can be iterated? It could simply be the parent of the current IReader interface, and the only user-facing change would be that 7Zip could implement it (with pretty minor code changes to support it). This is a great library; I just figure the more formats you can use through a generic interface, the better.
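To make the idea concrete, here is a rough sketch of what such a parent interface might look like. Everything in it is hypothetical: `IIterable` and `IterableExtensions` do not exist in SharpCompress, and the member list is simply the subset of IReader that the wrapper class earlier in this thread could implement without internal access. Only `IEntry` and `ArchiveType` are existing library types.

```csharp
using System;
using System.IO;
using SharpCompress.Common;

// Hypothetical interface: not part of SharpCompress. The idea is that IReader
// would inherit from it, and a 7z-backed type could implement only this part.
public interface IIterable : IDisposable
{
    ArchiveType ArchiveType { get; }

    // The entry the iterator is currently positioned on.
    IEntry Entry { get; }

    // Advances to the next entry; returns false when there are no more.
    bool MoveToNextEntry();

    // Writes the current entry's contents to the given stream.
    void WriteEntryTo(Stream writableStream);
}

// Hypothetical helper showing format-agnostic calling code.
public static class IterableExtensions
{
    public static void ListEntries(this IIterable source, TextWriter output)
    {
        while (source.MoveToNextEntry())
        {
            output.WriteLine($"{source.Entry.Key}\t{source.Entry.Size}");
        }
    }
}
```

With something like this, calling code could list or extract entries without knowing whether the source is a forward-only reader or a seekable 7z archive.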

If not, feel free to close this. I don't have any other ideas :)


@adamhathcock commented on GitHub (Oct 23, 2025):

The easiest thing might be just breaking the rule for 7Zip and saying a random access stream is required for 7Zip on the Reader interface or something.

The Entries collection is already lazy. I can't remember whether it can be used for streams that contain multiple files.

Reference: starred/sharpcompress#697