[PR #1114] [MERGED] Fix async decompression of .7z files by implementing Memory<byte> ReadAsync overload #1544

Open
opened 2026-01-29 22:21:04 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1114
Author: @Copilot
Created: 1/6/2026
Status: Merged
Merged: 1/7/2026
Merged by: @adamhathcock

Base: master ← Head: copilot/fix-7z-file-decompression-error


📝 Commits (6)

  • 2fde843 Initial plan
  • 7116c0d Add async support to BufferedSubStream for 7zip decompression
  • 0678318 Fix async decompression by implementing Memory ReadAsync overload
  • b9258ad use more ValueTask methods but types are still created because of state machine suspension
  • 833dd7b fix tests and fmt
  • fd968b3 Update src/SharpCompress/IO/ReadOnlySubStream.cs

📊 Changes

5 files changed (+354 additions, -5 deletions)

View changed files

📝 src/SharpCompress/Compressors/LZMA/LZ/LzOutWindow.cs (+25 -3)
📝 src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs (+2 -1)
📝 src/SharpCompress/Compressors/LZMA/LzmaStream.cs (+114 -1)
📝 src/SharpCompress/IO/BufferedSubStream.cs (+74 -0)
➕ tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs (+139 -0)

📄 Description

Async extraction of .7z files with LZMA/LZMA2 compression threw DataErrorException when using CopyToAsync(), while synchronous CopyTo() worked correctly.

Root Cause

In .NET 6+, ReadExactlyAsync calls ReadAsync(Memory<byte>, CancellationToken). BufferedSubStream only implemented the legacy byte[] overload, causing the base Stream class to fall back to synchronous reads. This corrupted cache state when LZMA's RangeCoder mixed sync ReadByte() calls with async operations.
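The fallback is easy to reproduce in isolation. Below is a minimal, self-contained demonstration (the ArrayOnlyStream class is hypothetical, not SharpCompress code): a stream that only overrides the legacy array Read still accepts ReadAsync(Memory<byte>, ...) calls, but the base Stream bridge services them by invoking the synchronous Read internally.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical ArrayOnlyStream: it only overrides the legacy array Read, so
// the base Stream bridge must service ReadAsync(Memory<byte>, ...) by calling
// that synchronous Read under the hood.
class ArrayOnlyStream : Stream
{
    public override int Read(byte[] buffer, int offset, int count)
    {
        Console.WriteLine($"sync Read invoked on thread {Environment.CurrentManagedThreadId}");
        return 0; // report end-of-stream immediately; enough for the demo
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => 0;
    public override long Position
    {
        get => 0;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

class Demo
{
    static async Task Main()
    {
        using var stream = new ArrayOnlyStream();
        // No Memory<byte> override exists, so this "async" call ends up
        // running the synchronous Read above.
        await stream.ReadAsync(new byte[16].AsMemory(), CancellationToken.None);
    }
}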

Changes

  • BufferedSubStream: Added ReadAsync(Memory<byte>, CancellationToken) and RefillCacheAsync() for true async I/O (see the sketch after this list)
  • Tests: Added async test coverage for LZMA, LZMA2, Solid, BZip2, and PPMd archives
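A minimal sketch of the fix pattern, assuming a simplified buffered stream (the real BufferedSubStream also tracks the sub-stream's position and length and refills its cache through the new RefillCacheAsync(); none of that bookkeeping is reproduced here):

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch, not the actual BufferedSubStream: the point is that the
// Memory<byte> overload keeps the cache refill asynchronous instead of letting
// the base class bridge back to the synchronous Read.
public sealed class BufferedReadStream : Stream
{
    private readonly Stream inner;
    private readonly byte[] cache = new byte[32 * 1024];
    private int cacheOffset;
    private int cacheLength;

    public BufferedReadStream(Stream inner) => this.inner = inner;

    // Legacy synchronous path, still used by plain Read callers.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (cacheLength == 0)
        {
            cacheOffset = 0;
            cacheLength = inner.Read(cache, 0, cache.Length);
        }
        var n = Math.Min(cacheLength, count);
        Array.Copy(cache, cacheOffset, buffer, offset, n);
        cacheOffset += n;
        cacheLength -= n;
        return n;
    }

    // The overload that ReadExactlyAsync/CopyToAsync actually call on .NET 6+.
    public override async ValueTask<int> ReadAsync(
        Memory<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        if (cacheLength == 0)
        {
            cacheOffset = 0;
            cacheLength = await inner
                .ReadAsync(cache.AsMemory(), cancellationToken)
                .ConfigureAwait(false);
        }
        var n = Math.Min(cacheLength, buffer.Length);
        cache.AsMemory(cacheOffset, n).CopyTo(buffer);
        cacheOffset += n;
        cacheLength -= n;
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => inner.Length;
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

Because both paths drain the same cache fields, mixing sync and async callers can no longer corrupt the buffer state the way the base-class fallback did.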

Example

// This now works correctly with async operations
using var archive = ArchiveFactory.Open(archivePath);
foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
{
    using var stream = await entry.OpenEntryStreamAsync(cancellationToken);
    await stream.CopyToAsync(outputStream, cancellationToken);  // Previously threw DataErrorException
}

The fix ensures async operations remain async throughout the decompression pipeline, preventing sync-over-async patterns.
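The new SevenZipArchiveAsyncTests cover this path end to end. The file's contents aren't shown on this page, but one such test plausibly looks like the following (hypothetical test body and archive path, assuming the xunit style of the existing test suite):

using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using SharpCompress.Archives;
using Xunit;

public class SevenZipAsyncSmokeTest
{
    [Fact]
    public async Task SevenZip_LZMA_CopyToAsync_RoundTrips()
    {
        // Hypothetical archive path; the real tests use the repo's test archives.
        using var archive = ArchiveFactory.Open("Archives/7Zip.LZMA.7z");
        foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
        {
            using var entryStream = await entry.OpenEntryStreamAsync(CancellationToken.None);
            using var buffer = new MemoryStream();
            // This is the call that previously threw DataErrorException.
            await entryStream.CopyToAsync(buffer, CancellationToken.None);
            Assert.Equal(entry.Size, buffer.Length); // decompressed size matches metadata
        }
    }
}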

Original prompt

This section details the original issue you should resolve

<issue_title>decompressing big .7z file throws error</issue_title>
<issue_description>lib version 0.42.1
under .net 10

code:

public class SharpCompressExtractor : IArchiveExtractor
{
    public async Task<IReadOnlyCollection<FileInfo>> ExtractAsync(
        string archivePath,
        string destinationDirectory,
        CancellationToken token)
    {
        if (!File.Exists(archivePath))
        {
            throw new FileNotFoundException($"Archive not found: {archivePath}");
        }

        var extractedFiles = new List<FileInfo>();

        using var archive = ArchiveFactory.Open(archivePath);

        foreach (var entry in archive.Entries)
        {
            if (entry.IsDirectory)
            {
                continue;
            }

            token.ThrowIfCancellationRequested();

            var targetPath = Path.Combine(destinationDirectory, entry.Key);

            var targetDir = Path.GetDirectoryName(targetPath);

            if (!string.IsNullOrEmpty(targetDir) && !Directory.Exists(targetDir))
            {
                Directory.CreateDirectory(targetDir);
            }

            using var sourceStream = await entry.OpenEntryStreamAsync(token);

            var fileOptions = new FileStreamOptions
            {
                Mode = FileMode.Create,
                Access = FileAccess.Write,
                Share = FileShare.None,
                BufferSize = 1_024 * 1_024,
                Options = FileOptions.Asynchronous
            };

            await using var targetStream = new FileStream(targetPath, fileOptions);

            await sourceStream.CopyToAsync(targetStream, token);

            var fileInfo = new FileInfo(targetPath);

            extractedFiles.Add(fileInfo);
        }

        return extractedFiles;
    }
}

usage

try
{
    var files = await fileExtractor.ExtractAsync(
        "C:\\whitelist\\20251223.7z",
        "c:/whitelist/out",
        CancellationToken.None);
}
catch (Exception ex)
{
    // exception swallowed in the original report
}

Data Error

at SharpCompress.Compressors.LZMA.LzmaStream.<DecodeChunkHeaderAsync>d__52.MoveNext()
at SharpCompress.Compressors.LZMA.LzmaStream.<ReadAsync>d__56.MoveNext()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at SharpCompress.IO.ReadOnlySubStream.<ReadAsync>d__24.MoveNext()
at System.IO.Stream.<<CopyToAsync>g__Core|30_0>d.MoveNext()
at SharpCompressExtractor.<ExtractAsync>d__0.MoveNext() in SharpCompressExtractor.cs

file: probably any from https://www.podatki.gov.pl/narzedzia/white-list/

but one definitely exists at this URL: https://plikplaski.mf.gov.pl/pliki//20251223.7z

this file decompresses perfectly under Win11 using the native extractor or the 7-Zip manager
</issue_description>

<agent_instructions>compare the synchronous code path to the new asynchronous code path to reproduce the issue</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@adamhathcock: Sounds like the fix I did for https://github.com/adamhathcock/sharpcompress/pull/1081

I can validate next week though.

@adamhathcock: You're right... testing this myself, the async path is broken. However, the sync path works.

If you change CopyToAsync to CopyTo then it works for me.

Gonna look at a real fix though.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Fixes adamhathcock/sharpcompress#1086
claunia added the pull-request label 2026-01-29 22:21:04 +00:00

Reference: starred/sharpcompress#1544