[PR #1087] [CLOSED] Fix async extraction of 7Zip archives with LZMA compression #1509

opened 2026-01-29 22:20:55 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1087
Author: @Copilot
Created: 12/26/2025
Status: Closed

Base: `master` ← Head: `copilot/fix-decompressing-big-archive`


📝 Commits (1)

  • b9b77e0 Initial plan

📄 Description

Async extraction of large 7z files fails with DataErrorException in LzmaStream.ReadAsync. The LZMA decoder has state corruption bugs in its async implementation (in LzmaStream.ReadAsync, Decoder.CodeAsync, and OutWindow operations).

Changes

  • Added SyncOnlyStream wrapper in SevenZipArchive.cs: Forces async operations to use synchronous equivalents, bypassing buggy LZMA async code paths while maintaining async API surface
  • Modified SevenZipReader.GetEntryStream(): Wraps each entry's decompression stream in SyncOnlyStream
  • Modified SevenZipReader.GetEntries(): Creates fresh streams per entry instead of sharing LZMA streams across multiple files
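The wrapper idea from the first bullet can be sketched as follows. This is an illustrative reimplementation, not the exact code merged in the PR: a `Stream` decorator whose `ReadAsync` overloads complete synchronously against the inner (LZMA) stream, so callers keep the async API surface while the decoder's buggy async code path is never entered.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch (names and details are assumptions, not the PR's
// exact code): async reads delegate to the inner stream's synchronous
// Read and return an already-completed task.
public sealed class SyncOnlyStream : Stream
{
    private readonly Stream _inner;

    public SyncOnlyStream(Stream inner) => _inner = inner;

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, count);

    // Key part: run the synchronous Read and wrap the result, bypassing
    // the inner stream's ReadAsync entirely.
    public override Task<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        return Task.FromResult(_inner.Read(buffer, offset, count));
    }

    public override ValueTask<int> ReadAsync(
        Memory<byte> buffer, CancellationToken token = default)
    {
        token.ThrowIfCancellationRequested();
        return new ValueTask<int>(_inner.Read(buffer.Span));
    }

    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) =>
        throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) =>
        throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Because `CopyToAsync` ultimately drives the stream through `ReadAsync`, wrapping the decompression stream this way is enough to keep the caller's `await` pattern working unchanged.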

Example

The following code now works correctly for large 7z files:

```csharp
using var archive = ArchiveFactory.Open(archivePath);

foreach (var entry in archive.Entries)
{
    if (entry.IsDirectory) continue;

    using var sourceStream = await entry.OpenEntryStreamAsync(token);
    await using var targetStream = new FileStream(targetPath, options);
    await sourceStream.CopyToAsync(targetStream, token);  // No longer throws DataErrorException
}
```

Technical Notes

This is a workaround. The proper fix is to repair the LZMA decoder's async state machine, which would require deep changes to the decoder implementation. The SyncOnlyStream wrapper ensures correctness at minimal performance cost for the 7Zip use case.

Original prompt

This section details the original issue you should resolve

<issue_title>decompressing big .7z file throws error</issue_title>
<issue_description>lib version 0.42.1
under .net 10

code:

```csharp
public class SharpCompressExtractor : IArchiveExtractor
{
    public async Task<IReadOnlyCollection<FileInfo>> ExtractAsync(
        string archivePath,
        string destinationDirectory,
        CancellationToken token)
    {
        if (!File.Exists(archivePath))
        {
            throw new FileNotFoundException($"Archive not found: {archivePath}");
        }

        var extractedFiles = new List<FileInfo>();

        using var archive = ArchiveFactory.Open(archivePath);

        foreach (var entry in archive.Entries)
        {
            if (entry.IsDirectory)
            {
                continue;
            }

            token.ThrowIfCancellationRequested();

            var targetPath = Path.Combine(destinationDirectory, entry.Key);

            var targetDir = Path.GetDirectoryName(targetPath);

            if (!string.IsNullOrEmpty(targetDir) && !Directory.Exists(targetDir))
            {
                Directory.CreateDirectory(targetDir);
            }

            using var sourceStream = await entry.OpenEntryStreamAsync(token);

            var fileOptions = new FileStreamOptions
            {
                Mode = FileMode.Create,
                Access = FileAccess.Write,
                Share = FileShare.None,
                BufferSize = 1_024 * 1_024,
                Options = FileOptions.Asynchronous
            };

            await using var targetStream = new FileStream(targetPath, fileOptions);

            await sourceStream.CopyToAsync(targetStream, token);

            var fileInfo = new FileInfo(targetPath);

            extractedFiles.Add(fileInfo);
        }

        return extractedFiles;
    }
}
```

usage

```csharp
try
{
    var files = await fileExtractor.ExtractAsync(
        "C:\\whitelist\\20251223.7z",
        "c:/whitelist/out",
        CancellationToken.None);
}
catch (Exception ex)
{

}
```

Data Error

at SharpCompress.Compressors.LZMA.LzmaStream.<DecodeChunkHeaderAsync>d__52.MoveNext()
at SharpCompress.Compressors.LZMA.LzmaStream.<ReadAsync>d__56.MoveNext()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at SharpCompress.IO.ReadOnlySubStream.<ReadAsync>d__24.MoveNext()
at System.IO.Stream.<<CopyToAsync>g__Core|30_0>d.MoveNext()
at SharpCompressExtractor.<ExtractAsync>d__0.MoveNext() in SharpCompressExtractor.cs

file : probably any from https://www.podatki.gov.pl/narzedzia/white-list/

but for sure exist for this url: https://plikplaski.mf.gov.pl/pliki//20251223.7z

this file uncompresses perfectly under Windows 11 using the native extractor or the 7-Zip manager
</issue_description>

Comments on the Issue (you are @copilot in this section)

@adamhathcock Sounds like the fix I did for https://github.com/adamhathcock/sharpcompress/pull/1081

I can validate next week though.



🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 22:20:55 +00:00
Reference: starred/sharpcompress#1509