SevenZip / 7z IReader Implementation? #697

claunia · 2026-01-29T22:16:04Z

Originally created by @mitchcapper on GitHub (Sep 11, 2025).

I wanted to iterate over the contents and metadata of some archives, and SharpCompress has largely worked great for that.

At first I was confused when [ReaderFactory.Open(stream)](https://github.com/adamhathcock/sharpcompress/blob/master/USAGE.md#use-readerfactory-to-autodetect-archive-type-and-open-the-entry-stream) didn't work on 7z.

The official docs do cover [this](https://github.com/adamhathcock/sharpcompress/blob/master/FORMATS.md#supported-format-table), but I missed it.

I had also seen that the docs recommend [ExtractAllEntries](https://github.com/adamhathcock/sharpcompress/blob/master/USAGE.md#extract-all-files-from-a-rar-file-to-a-directory-using-rararchive) for 7z and RAR for performance. Although it seemed primarily meant for writing files out, it looked like a good route for performance, and it returned an IReader. It generally worked well, but on some archives memory usage was huge, which I eventually traced to the dictionary size (a 350MB dictionary may result in 800MB of memory usage). I also noticed that simply accessing .Entries on the SevenZipArchive didn't carry this penalty. I then realized my mistake: ExtractAllEntries calls LoadEntries rather than GetEntries, which causes the full dictionary load.

Since I wanted to minimize the alternate code path for 7z, I eventually settled on a custom IReader implementation. It works well for me, but there may be a reason beyond performance that 7z is specifically excluded from IReader.

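```csharp
using System;
using System.Collections.Generic;
using System.IO;
using SharpCompress.Archives;          // WriteTo extension for archive entries
using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;
using SharpCompress.Readers;

// Exposes SevenZipArchive.Entries (which doesn't trigger the full dictionary load)
// through the IReader interface.
internal class Our7ZReader(SevenZipArchive archive) : IReader {
	private bool inited = false;
	private IEnumerator<SevenZipArchiveEntry> enumerator;

	public bool MoveToNextEntry() {
		// Create the enumerator on the first call only.
		if (!inited) {
			inited = true;
			enumerator = archive.Entries?.GetEnumerator();
		}
		// Return false instead of throwing NullReferenceException if Entries was null.
		return enumerator?.MoveNext() ?? false;
	}

	// Intended: new EntryStream(this, enumerator?.Current.OpenEntryStream());
	// EntryStream's constructor is internal, so that can't be built outside SharpCompress.
	public EntryStream OpenEntryStream() =>
		throw new NotSupportedException("EntryStream cannot be constructed outside SharpCompress.");

	public void WriteEntryTo(Stream writableStream) => enumerator?.Current.WriteTo(writableStream);
	public ArchiveType ArchiveType => archive.Type;
	public IEntry Entry => enumerator?.Current;
	public bool Cancelled { get; private set; }

	// Required by IReader; this wrapper never raises them.
	public event EventHandler<ReaderExtractionEventArgs<IEntry>> EntryExtractionProgress;
	public event EventHandler<CompressedBytesReadEventArgs> CompressedBytesRead;
	public event EventHandler<FilePartExtractionBeginEventArgs> FilePartExtractionBegin;
	public void Cancel() => Cancelled = true;

	public void Dispose() {
		enumerator?.Dispose();
		enumerator = null;
	}
}
```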

I then added detection code ahead of my ReaderFactory call, and only call ReaderFactory if my IReader `reader` hasn't already been set:

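```csharp
// Fragment: continues an if / else-if chain over archive types.
} else if (archivePath.EndsWith(".7z", StringComparison.OrdinalIgnoreCase)) {
	// Confirm the signature rather than trusting the extension alone.
	if (SharpCompress.Archives.SevenZip.SevenZipArchive.IsSevenZipFile(stream)) {
		// IsSevenZipFile advances the stream, so rewind before opening.
		stream.Seek(0, SeekOrigin.Begin);
		var archive = SharpCompress.Archives.SevenZip.SevenZipArchive.Open(stream);
		toDispose.Add(archive);
		var ourReader = new Our7ZReader(archive);
		reader = ourReader; // a non-null reader means ReaderFactory is skipped
		toDispose.Add(ourReader);
	}
}
```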
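A content-based check is also possible without relying on the `.7z` extension. The sketch below compares the first six bytes against the 7z signature (`37 7A BC AF 27 1C`) and restores the stream position afterwards, so no separate `Seek` is needed before opening. `SevenZipSniffer.LooksLikeSevenZip` is a hypothetical helper for illustration, not SharpCompress's `IsSevenZipFile`.

```csharp
using System;
using System.IO;

public static class SevenZipSniffer
{
    // The 7z signature bytes: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
    private static readonly byte[] Signature = { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C };

    // Hypothetical helper: true if the stream starts with the 7z signature.
    // Requires a seekable stream and restores the original position before returning.
    public static bool LooksLikeSevenZip(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("This check needs a seekable stream.");

        long start = stream.Position;
        try
        {
            var buffer = new byte[Signature.Length];
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) return false; // too short to be a 7z archive
                read += n;
            }
            for (int i = 0; i < Signature.Length; i++)
            {
                if (buffer[i] != Signature[i]) return false;
            }
            return true;
        }
        finally
        {
            stream.Seek(start, SeekOrigin.Begin);
        }
    }
}
```

Checking content rather than the extension would also catch renamed archives; the seekable-stream requirement costs little here, since the archive path needs random access anyway.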

Again, there is probably a reason IReader was avoided. Also, OpenEntryStream can't be implemented this way from outside the library, because EntryStream's constructor is internal. I looked at deriving from AbstractReader, which would have avoided that, but its constructor is also internal, so that was moot. I didn't need OpenEntryStream for my purposes, but in theory the approach above would work apart from those access restrictions.

claunia added the enhancement and up for grabs labels 2026-01-29 22:16:04 +00:00

claunia commented

2026-01-29 22:16:05 +00:00

@adamhathcock commented on GitHub (Oct 21, 2025):

7Zip cannot have a forward-only reader because the format usually requires random access. An archive can contain multiple streams, each made of several files combined (like a solid RAR), but there are no guarantees.

I've made https://github.com/adamhathcock/sharpcompress/pull/964 to throw errors on non-solid RARs and non-7Zip archives, but I'd want a better rule for 7Zip files.
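To make the random-access point concrete: a 7z archive starts with a fixed 32-byte signature header whose `NextHeaderOffset` field points to the main header (the entry list), which 7-Zip normally writes after the packed data. The sketch below parses only that start header to show where the entry metadata lives. `SevenZipStartHeader` and `SevenZipLayout.ReadStartHeader` are hypothetical names, not SharpCompress APIs, and the CRC fields are read but not verified.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical type: the fields of the fixed 32-byte 7z signature header.
public readonly record struct SevenZipStartHeader(
    byte MajorVersion,
    byte MinorVersion,
    ulong NextHeaderOffset,
    ulong NextHeaderSize,
    uint NextHeaderCrc)
{
    // NextHeaderOffset is measured from the end of the 32-byte signature header.
    public ulong MainHeaderPosition => 32UL + NextHeaderOffset;
}

public static class SevenZipLayout
{
    public static SevenZipStartHeader ReadStartHeader(ReadOnlySpan<byte> first32)
    {
        if (first32.Length < 32)
            throw new ArgumentException("Need at least the first 32 bytes of the archive.", nameof(first32));

        ReadOnlySpan<byte> magic = new byte[] { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C };
        if (!first32.Slice(0, 6).SequenceEqual(magic))
            throw new InvalidDataException("Missing 7z signature.");

        // Little-endian layout: signature[6] | major, minor | StartHeaderCRC[4] |
        // NextHeaderOffset[8] | NextHeaderSize[8] | NextHeaderCRC[4]. CRCs are not verified here.
        return new SevenZipStartHeader(
            first32[6],
            first32[7],
            BinaryPrimitives.ReadUInt64LittleEndian(first32.Slice(12, 8)),
            BinaryPrimitives.ReadUInt64LittleEndian(first32.Slice(20, 8)),
            BinaryPrimitives.ReadUInt32LittleEndian(first32.Slice(28, 4)));
    }
}
```

Because the main header typically sits after the packed streams, a strictly forward-only reader would reach the file data before learning which entries exist.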

claunia commented

2026-01-29 22:16:05 +00:00

@mitchcapper commented on GitHub (Oct 21, 2025):

I'm confused: IReaderFactory.OpenReader takes a stream and returns an IReader, and SevenZipArchive.Open also takes a stream. I understand that OpenReader is meant to accept any forward-only stream, but if the stream isn't seekable, the SevenZipArchive.Open call would fail anyway and, I assume, just show up like any other failed format. Is the problem that someone might pass a 7z archive stream that claims to be seekable, but where seeking does something they wouldn't want?

claunia commented

2026-01-29 22:16:05 +00:00

@adamhathcock commented on GitHub (Oct 22, 2025):

> Is the problem the idea someone passes a 7z archive stream that says its a seekable but seeking does something they wouldn't want?

Not really. The basic tenet I'm trying to follow is "Reader interface for forward-only and Archive interface for seekable." 7Zip has multiple streams and therefore practically needs ExtractAll with a reader, because the files are appended within a compressed stream, yet the archive is still seekable.

This kind of thing is why I don't like 7Zip as a format. It tries to be clever to save a few bytes.
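The tenet above can be sketched as a dispatch on `Stream.CanSeek`. The `AccessStyle` enum, `AccessPolicy.ChooseAccessStyle`, and its `formatNeedsRandomAccess` flag are hypothetical, for illustration only; in SharpCompress the split shows up as `ReaderFactory` versus archive `Open` methods such as `SevenZipArchive.Open`.

```csharp
using System;
using System.IO;

// Hypothetical classification mirroring "Reader for forward-only, Archive for seekable".
public enum AccessStyle
{
    ForwardOnlyReader,   // entries are consumed strictly in stream order
    RandomAccessArchive  // the caller may seek, e.g. to a 7z header stored after the data
}

public static class AccessPolicy
{
    public static AccessStyle ChooseAccessStyle(Stream stream, bool formatNeedsRandomAccess)
    {
        if (!stream.CanRead)
            throw new ArgumentException("Stream must be readable.", nameof(stream));

        // Seekable input goes down the archive path.
        if (stream.CanSeek)
            return AccessStyle.RandomAccessArchive;

        // Non-seekable input can only be read forward. Formats whose metadata
        // usually lives past the packed data (like 7z) can't be served this way.
        if (formatNeedsRandomAccess)
            throw new NotSupportedException("This format needs a seekable stream.");

        return AccessStyle.ForwardOnlyReader;
    }
}
```

Under this policy a seekable 7z stream always lands on the archive path, which is exactly the gap this issue asks about: a reader-style API over a stream that happens to be seekable.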

claunia commented

2026-01-29 22:16:06 +00:00

@mitchcapper commented on GitHub (Oct 22, 2025):

Alright, final thought:

Would it be worthwhile to add an interface like IIterable (I'm sure there's a better name) for archives that can be iterated? It could simply be the parent of the current IReader interface, and the only user-facing change would be that SevenZip could implement it (with fairly minor code changes to support it). This is a great library; I just figure the more formats you can use through a generic interface, the better.

If not, feel free to close this; I don't have any other ideas. :)
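A minimal sketch of that idea, assuming a hypothetical parent interface (called `IEntryIterator` here, since the name was left open) that holds only the iteration members `IReader` already has, `MoveToNextEntry` and `Entry`. An adapter over any lazily enumerated entry sequence, such as `SevenZipArchive.Entries`, could then implement it without needing `EntryStream`. None of these types exist in SharpCompress.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parent interface: iteration only, no extraction or events.
public interface IEntryIterator<out TEntry> : IDisposable
{
    bool MoveToNextEntry();
    TEntry Entry { get; }
}

// Hypothetical adapter from any IEnumerable<TEntry> (e.g. a lazy Entries collection).
public sealed class EnumerableEntryIterator<TEntry> : IEntryIterator<TEntry>
{
    private readonly IEnumerator<TEntry> _enumerator;
    private bool _positioned; // true while the enumerator sits on a valid entry

    public EnumerableEntryIterator(IEnumerable<TEntry> entries) =>
        _enumerator = (entries ?? throw new ArgumentNullException(nameof(entries))).GetEnumerator();

    public bool MoveToNextEntry() => _positioned = _enumerator.MoveNext();

    public TEntry Entry => _positioned
        ? _enumerator.Current
        : throw new InvalidOperationException("Call MoveToNextEntry first.");

    public void Dispose() => _enumerator.Dispose();
}
```

If `IReader` were declared to extend `IEntryIterator<IEntry>`, existing readers would already provide both members, and metadata-only callers could depend on the narrower type.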

claunia commented

2026-01-29 22:16:06 +00:00

@adamhathcock commented on GitHub (Oct 23, 2025):

The easiest thing might be to just break the rule for 7Zip and say that a random-access stream is required for 7Zip on the Reader interface, or something like that.

The `Entries` collection is already lazy. I can't remember whether it can be used with streams that contain multiple files.
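As a generic illustration of what "lazy" means here (not SharpCompress's actual `Entries` implementation): with deferred execution, each entry's header is processed only when enumeration reaches it, so a caller that stops early never pays for the rest. `LazyEntryList` and its `HeadersParsed` counter are hypothetical stand-ins; the sketch does not answer the open question about multi-file streams.

```csharp
using System;
using System.Collections.Generic;

public sealed class LazyEntryList
{
    private readonly string[] _names;                // stands in for on-disk entry headers
    public int HeadersParsed { get; private set; }   // hypothetical work counter

    public LazyEntryList(params string[] names) => _names = names;

    // Iterator: each "header" is processed only when the enumerator reaches it.
    public IEnumerable<string> Entries
    {
        get
        {
            foreach (var name in _names)
            {
                HeadersParsed++;
                yield return name;
            }
        }
    }
}
```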

claunia referenced this issue

2026-01-29 22:19:20 +00:00

[PR #697] Added support for reading comment header for Rar v5 archives #1174

claunia referenced this issue

2026-01-29 22:19:32 +00:00

[PR #784] [MERGED] Dont crash on reading rar5 comment #783 #1221

claunia referenced this issue

2026-01-29 22:19:33 +00:00

[PR #784] Dont crash on reading rar5 comment #783 #1225

Reference: starred/sharpcompress#697