[PR #1143] Add opt-in multi-threading support for file-based archive extraction #1581

Open
opened 2026-01-29 22:21:14 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1143
Author: @Copilot
Created: 1/18/2026
Status: 🔄 Open

Base: master ← Head: copilot/support-multi-threading-path


📝 Commits (3)

  • e0a43e9 Initial plan
  • 3e23a6e Add multi-threading support for file-based archives - sync test passing
  • 4a6e523 Add opt-in multi-threading support with SupportsMultiThreadedExtraction flag

📊 Changes

9 files changed (+640 additions, -11 deletions)

📝 src/SharpCompress/Archives/AbstractArchive.cs (+13 -0)
📝 src/SharpCompress/Archives/IArchive.cs (+8 -0)
📝 src/SharpCompress/Archives/Rar/SeekableFilePart.cs (+71 -0)
📝 src/SharpCompress/Common/Tar/TarFilePart.cs (+39 -1)
📝 src/SharpCompress/Common/Zip/SeekableZipFilePart.cs (+170 -10)
📝 src/SharpCompress/IO/SourceStream.cs (+24 -0)
📝 src/SharpCompress/Readers/ReaderOptions.cs (+8 -0)
➕ tests/SharpCompress.Test/Tar/TarMultiThreadTests.cs (+115 -0)
➕ tests/SharpCompress.Test/Zip/ZipMultiThreadTests.cs (+192 -0)

📄 Description

When an archive is opened from a FileInfo or file path, multiple threads can now extract different entries concurrently when explicitly enabled. Previously, all extractions shared a single FileStream, causing position conflicts and corruption.

Changes

Core Infrastructure

  • SourceStream: Added CreateIndependentStream(volumeIndex) to create new FileStream instances from tracked FileInfo objects
  • Thread-safe header loading: Added synchronization (lock for sync, SemaphoreSlim for async) to prevent concurrent header load races
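
To illustrate the lock / SemaphoreSlim split described above, here is a minimal sketch of a lazily loaded header guarded on both the sync and async paths. LazyHeader<T> and its members are hypothetical names for illustration only and are not taken from the PR's code; the actual implementation in SeekableZipFilePart may be structured differently.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical holder for a lazily loaded header; not a SharpCompress type.
public sealed class LazyHeader<T> where T : class
{
    private readonly object _syncLock = new object();
    private readonly SemaphoreSlim _asyncLock = new SemaphoreSlim(1, 1);
    private T? _header;

    // Sync path: double-checked locking so only one thread runs the loader.
    public T Get(Func<T> load)
    {
        var header = Volatile.Read(ref _header);
        if (header != null) return header;
        lock (_syncLock)
        {
            if (_header == null)
            {
                Volatile.Write(ref _header, load());
            }
            return _header!;
        }
    }

    // Async path: 'lock' cannot span an await, so a SemaphoreSlim is used instead.
    public async Task<T> GetAsync(Func<Task<T>> loadAsync)
    {
        var header = Volatile.Read(ref _header);
        if (header != null) return header;
        await _asyncLock.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_header == null)
            {
                Volatile.Write(ref _header, await loadAsync().ConfigureAwait(false));
            }
            return _header!;
        }
        finally
        {
            _asyncLock.Release();
        }
    }
}
```

Note that this sketch uses two separate primitives, so a sync caller and an async caller are not coordinated with each other and could each run the loader once if they race.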

Opt-in Multi-threading Support

  • IArchive.SupportsMultiThreadedExtraction: Boolean property indicating if multi-threaded extraction is supported for the archive instance
    • Returns true when archive is opened from a file, multi-threading is enabled, and archive is not SOLID
  • ReaderOptions.EnableMultiThreadedExtraction: Explicit opt-in flag to enable multi-threaded extraction (defaults to false for backward compatibility)
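
As a rough model of the rule stated above, the property can be read as the conjunction of three conditions. ArchiveState and ExtractionSupport are hypothetical names used only for this sketch; the PR computes the value inside AbstractArchive, and the exact code there is not reproduced here.

```csharp
using System;

// Hypothetical model of the three conditions; not SharpCompress types.
public sealed record ArchiveState(bool OpenedFromFile, bool MultiThreadingEnabled, bool IsSolid);

public static class ExtractionSupport
{
    // True only when the archive is file-based, the caller opted in,
    // and the archive is not SOLID.
    public static bool SupportsMultiThreadedExtraction(ArchiveState s) =>
        s.OpenedFromFile && s.MultiThreadingEnabled && !s.IsSolid;
}
```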

Format Support

Modified entry extraction to create an independent stream per read when the archive is file-based and multi-threading is enabled:

  • SeekableZipFilePart: Both header loading and data extraction use independent streams when flag is enabled
  • TarFilePart: Data extraction uses independent streams when flag is enabled
  • SeekableFilePart (Rar): Data extraction uses independent streams when flag is enabled
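
The idea behind the independent streams can be sketched with plain BCL types: each read opens its own FileStream, so no two threads share a file position. IndependentReads.ReadRange is a hypothetical helper for illustration and is not the PR's CreateIndependentStream; the FileShare.Read setting is an assumption about how concurrent readers would be allowed.

```csharp
using System;
using System.IO;

public static class IndependentReads
{
    // Opens a new read-only FileStream per call, so the caller owns its own
    // file position and cannot disturb other readers.
    public static byte[] ReadRange(FileInfo file, long offset, int count)
    {
        using var stream = new FileStream(
            file.FullName, FileMode.Open, FileAccess.Read, FileShare.Read);
        stream.Seek(offset, SeekOrigin.Begin);

        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
        return buffer;
    }
}
```

Opening a stream per read costs an extra file handle for each concurrent reader, which is the price of avoiding a shared position.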

Fallback Behavior

  • Stream-based archives (opened from Stream instead of FileInfo) continue to use a shared stream: backward compatible, but without multi-threading support
  • File-based archives with multi-threading disabled (default) use existing single-stream behavior
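
On the caller side, the fallback can be mirrored by extracting in parallel only when support is reported and sequentially otherwise. ExtractionDispatch.ExtractAll is a hypothetical helper, written against plain delegates so it does not depend on any SharpCompress type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ExtractionDispatch
{
    // Runs 'extract' in parallel only when the caller reports support;
    // otherwise processes entries one at a time, in order.
    public static void ExtractAll<T>(
        IEnumerable<T> entries, bool supportsParallel, Action<T> extract)
    {
        if (supportsParallel)
        {
            Parallel.ForEach(entries, extract);
        }
        else
        {
            foreach (var entry in entries)
            {
                extract(entry);
            }
        }
    }
}
```

With the API described in this PR, a caller would presumably pass archive.SupportsMultiThreadedExtraction as the second argument, so that stream-based and SOLID archives still get extracted rather than skipped.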

Example Usage

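using System.IO;
using System.Linq;
using System.Threading.Tasks;
using SharpCompress.Archives.Zip;
using SharpCompress.Readers;

var options = new ReaderOptions { EnableMultiThreadedExtraction = true };
using var archive = ZipArchive.OpenArchive("large.zip", options);

// Check if multi-threading is supported
if (archive.SupportsMultiThreadedExtraction)
{
    // Extract multiple entries concurrently
    Parallel.ForEach(archive.Entries.Where(e => !e.IsDirectory), entry =>
    {
        // Entry keys may contain subdirectories, so create them before writing
        var path = Path.Combine("output", entry.Key);
        Directory.CreateDirectory(Path.GetDirectoryName(path)!);
        using var stream = entry.OpenEntryStream();
        using var output = File.Create(path);
        stream.CopyTo(output);
    });
}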

Known Limitations

  • Async concurrent operations have timing issues with header loading coordination
  • SOLID archives (some Rar, all 7Zip) should use sequential extraction for performance
  • Streaming compression formats (GZip, BZip2 wrapping Tar) cannot benefit from concurrent extraction
  • Multi-threading is opt-in and disabled by default for backward compatibility

Testing

Added ZipMultiThreadTests and TarMultiThreadTests demonstrating concurrent extraction patterns. Tests verify that:

  • Multi-threading is disabled by default
  • SupportsMultiThreadedExtraction property accurately reflects multi-threading availability
  • Extraction works correctly with and without multi-threading enabled

<issue_title>Support multi-threading when supplying a path/FileInfo</issue_title>
<issue_description>Since the ultimate source is a local file, there's no reason multi-threading won't work for some formats. What is required:

  • a format that supports header reading separate from bytes or one that can skip entries (e.g. Zip, Rar, Tar)
  • compressed entries can be read from separate parts simultaneously (probably most?)
  • known non-supporting as things are single compressed streams that usually wrap a Tar (Gzip, BZip2, etc)
  • unknown: "SOLID" support. SOLID Rars and 7ZIps compress headers with entry bytes. 7Zip might be able to work but differently. If the entire Rar is SOLID then it's like a Gzipped Tar which can't really work either.</issue_description>

<agent_instructions>make sure things are FileInfo and RandomAccess. Allow multiple independent streams to ensure isolation across threads. new streams should probably seek back to the file to get the stream of bytes</agent_instructions>

Comments on the Issue (you are @copilot in this section)


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Fixes adamhathcock/sharpcompress#1001
claunia added the pull-request label 2026-01-29 22:21:14 +00:00
Reference: starred/sharpcompress#1581