[PR #1114] [MERGED] Fix async decompression of .7z files by implementing Memory<byte> ReadAsync overload #1544

Open
opened 2026-01-29 22:21:04 +00:00 by claunia · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1114
Author: @Copilot
Created: 1/6/2026
Status: Merged
Merged: 1/7/2026
Merged by: @adamhathcock

Base: master ← Head: copilot/fix-7z-file-decompression-error


📝 Commits (6)

  • 2fde843 Initial plan
  • 7116c0d Add async support to BufferedSubStream for 7zip decompression
  • 0678318 Fix async decompression by implementing Memory ReadAsync overload
  • b9258ad use more ValueTask methods but types are still created because of state machine suspension
  • 833dd7b fix tests and fmt
  • fd968b3 Update src/SharpCompress/IO/ReadOnlySubStream.cs

📊 Changes

5 files changed (+354 additions, -5 deletions)

View changed files

📝 src/SharpCompress/Compressors/LZMA/LZ/LzOutWindow.cs (+25 -3)
📝 src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs (+2 -1)
📝 src/SharpCompress/Compressors/LZMA/LzmaStream.cs (+114 -1)
📝 src/SharpCompress/IO/BufferedSubStream.cs (+74 -0)
➕ tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs (+139 -0)

📄 Description

Async extraction of .7z files with LZMA/LZMA2 compression threw DataErrorException when using CopyToAsync(), while synchronous CopyTo() worked correctly.

Root Cause

In .NET 6+, ReadExactlyAsync calls ReadAsync(Memory<byte>, CancellationToken). BufferedSubStream only implemented the legacy byte[] overload, causing the base Stream class to fall back to synchronous reads. This corrupted cache state when LZMA's RangeCoder mixed sync ReadByte() calls with async operations.
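The fallback is easy to reproduce in isolation. Below is a minimal, self-contained demonstration (the ArrayOnlyStream class is hypothetical, not SharpCompress code): a stream that only overrides the legacy array Read still accepts ReadAsync(Memory<byte>, ...) calls, but the base Stream bridge services them by invoking the synchronous Read internally.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical ArrayOnlyStream: it only overrides the legacy array Read, so
// the base Stream bridge must service ReadAsync(Memory<byte>, ...) by calling
// that synchronous Read under the hood.
class ArrayOnlyStream : Stream
{
    public override int Read(byte[] buffer, int offset, int count)
    {
        Console.WriteLine($"sync Read invoked on thread {Environment.CurrentManagedThreadId}");
        return 0; // report end-of-stream immediately; enough for the demo
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => 0;
    public override long Position
    {
        get => 0;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

class Demo
{
    static async Task Main()
    {
        using var stream = new ArrayOnlyStream();
        // No Memory<byte> override exists, so this "async" call ends up
        // running the synchronous Read above.
        await stream.ReadAsync(new byte[16].AsMemory(), CancellationToken.None);
    }
}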

Changes

  • BufferedSubStream: Added ReadAsync(Memory<byte>, CancellationToken) and RefillCacheAsync() for true async I/O (see the sketch after this list)
  • Tests: Added async test coverage for LZMA, LZMA2, Solid, BZip2, and PPMd archives
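A minimal sketch of the fix pattern, assuming a simplified buffered stream (the real BufferedSubStream also tracks the sub-stream's position and length and refills its cache through the new RefillCacheAsync(); none of that bookkeeping is reproduced here):

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch, not the actual BufferedSubStream: the point is that the
// Memory<byte> overload keeps the cache refill asynchronous instead of letting
// the base class bridge back to the synchronous Read.
public sealed class BufferedReadStream : Stream
{
    private readonly Stream inner;
    private readonly byte[] cache = new byte[32 * 1024];
    private int cacheOffset;
    private int cacheLength;

    public BufferedReadStream(Stream inner) => this.inner = inner;

    // Legacy synchronous path, still used by plain Read callers.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (cacheLength == 0)
        {
            cacheOffset = 0;
            cacheLength = inner.Read(cache, 0, cache.Length);
        }
        var n = Math.Min(cacheLength, count);
        Array.Copy(cache, cacheOffset, buffer, offset, n);
        cacheOffset += n;
        cacheLength -= n;
        return n;
    }

    // The overload that ReadExactlyAsync/CopyToAsync actually call on .NET 6+.
    public override async ValueTask<int> ReadAsync(
        Memory<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        if (cacheLength == 0)
        {
            cacheOffset = 0;
            cacheLength = await inner
                .ReadAsync(cache.AsMemory(), cancellationToken)
                .ConfigureAwait(false);
        }
        var n = Math.Min(cacheLength, buffer.Length);
        cache.AsMemory(cacheOffset, n).CopyTo(buffer);
        cacheOffset += n;
        cacheLength -= n;
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => inner.Length;
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

Because both paths drain the same cache fields, mixing sync and async callers can no longer corrupt the buffer state the way the base-class fallback did.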

Example

// This now works correctly with async operations
using var archive = ArchiveFactory.Open(archivePath);
foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
{
    using var stream = await entry.OpenEntryStreamAsync(cancellationToken);
    await stream.CopyToAsync(outputStream, cancellationToken);  // Previously threw DataErrorException
}

The fix ensures async operations remain async throughout the decompression pipeline, preventing sync-over-async patterns.
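The new SevenZipArchiveAsyncTests cover this path end to end. The file's contents aren't shown on this page, but one such test plausibly looks like the following (hypothetical test body and archive path, assuming the xunit style of the existing test suite):

using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using SharpCompress.Archives;
using Xunit;

public class SevenZipAsyncSmokeTest
{
    [Fact]
    public async Task SevenZip_LZMA_CopyToAsync_RoundTrips()
    {
        // Hypothetical archive path; the real tests use the repo's test archives.
        using var archive = ArchiveFactory.Open("Archives/7Zip.LZMA.7z");
        foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
        {
            using var entryStream = await entry.OpenEntryStreamAsync(CancellationToken.None);
            using var buffer = new MemoryStream();
            // This is the call that previously threw DataErrorException.
            await entryStream.CopyToAsync(buffer, CancellationToken.None);
            Assert.Equal(entry.Size, buffer.Length); // decompressed size matches metadata
        }
    }
}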

Original prompt

This section details the original issue you should resolve

<issue_title>decompressing big .7z file throws error</issue_title>
<issue_description>lib version 0.42.1
under .net 10

code:

public class SharpCompressExtractor : IArchiveExtractor
{
    public async Task<IReadOnlyCollection<FileInfo>> ExtractAsync(
        string archivePath,
        string destinationDirectory,
        CancellationToken token)
    {
        if (!File.Exists(archivePath))
        {
            throw new FileNotFoundException($"Archive not found: {archivePath}");
        }

        var extractedFiles = new List<FileInfo>();

        using var archive = ArchiveFactory.Open(archivePath);

        foreach (var entry in archive.Entries)
        {
            if (entry.IsDirectory)
            {
                continue;
            }

            token.ThrowIfCancellationRequested();

            var targetPath = Path.Combine(destinationDirectory, entry.Key);

            var targetDir = Path.GetDirectoryName(targetPath);

            if (!string.IsNullOrEmpty(targetDir) && !Directory.Exists(targetDir))
            {
                Directory.CreateDirectory(targetDir);
            }

            using var sourceStream = await entry.OpenEntryStreamAsync(token);

            var fileOptions = new FileStreamOptions
            {
                Mode = FileMode.Create,
                Access = FileAccess.Write,
                Share = FileShare.None,
                BufferSize = 1_024 * 1_024,
                Options = FileOptions.Asynchronous
            };

            await using var targetStream = new FileStream(targetPath, fileOptions);

            await sourceStream.CopyToAsync(targetStream, token);

            var fileInfo = new FileInfo(targetPath);

            extractedFiles.Add(fileInfo);
        }

        return extractedFiles;
    }
}

usage

try
{
    var files = await fileExtractor.ExtractAsync(
        "C:\\whitelist\\20251223.7z",
        "c:/whitelist/out",
        CancellationToken.None);
}
catch (Exception ex)
{
    // exception swallowed in the original report
}

Data Error

at SharpCompress.Compressors.LZMA.LzmaStream.<DecodeChunkHeaderAsync>d__52.MoveNext()
at SharpCompress.Compressors.LZMA.LzmaStream.<ReadAsync>d__56.MoveNext()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at SharpCompress.IO.ReadOnlySubStream.<ReadAsync>d__24.MoveNext()
at System.IO.Stream.<<CopyToAsync>g__Core|30_0>d.MoveNext()
at SharpCompressExtractor.<ExtractAsync>d__0.MoveNext() in SharpCompressExtractor.cs

file: probably any from https://www.podatki.gov.pl/narzedzia/white-list/

but one definitely exists at this URL: https://plikplaski.mf.gov.pl/pliki//20251223.7z

this file decompresses perfectly under Win11 using the native extractor or the 7-Zip manager
</issue_description>

<agent_instructions>compare the synchronous code path to the new asynchronous code path to reproduce the issue</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@adamhathcock: Sounds like the fix I did for https://github.com/adamhathcock/sharpcompress/pull/1081

I can validate next week though.

@adamhathcock: You're right... testing this myself, the async path is broken. However, the sync path works.

If you change CopyToAsync to CopyTo then it works for me.

Gonna look at a real fix though.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Fixes adamhathcock/sharpcompress#1086
claunia added the pull-request label 2026-01-29 22:21:04 +00:00

Reference: starred/sharpcompress#1544