[PR #1169] Fix ZIP parsing failure on non-seekable streams with short reads #1611

Closed
opened 2026-01-29 22:21:22 +00:00 by claunia · 0 comments
Owner

Original Pull Request: https://github.com/adamhathcock/sharpcompress/pull/1169

State: closed
Merged: Yes


Fix ZIP parsing regression with short reads on non-seekable streams

Completed:

  • Understand the issue: BinaryReader doesn't handle short reads properly on non-seekable streams
  • Create a test that reproduces the issue with different chunking patterns
  • Fix the issue by ensuring SharpCompressStream handles short reads correctly
  • Implement buffer fill methods following the ReadFully pattern
  • Update all Read methods (sync and async variants)
  • Verify all ZIP tests pass (424 tests passed)
  • Verify all Reader tests pass (205 tests passed)
  • Apply code formatting with CSharpier
  • Run code review (2 minor comments about parameter validation in private methods - not critical)
  • Run security checks (0 alerts found)
  • Refactor to use ReadFully pattern as requested

Fix Details:

The regression occurred because SharpCompressStream.Read() with buffering enabled would return short reads when the underlying stream returned less data than requested. This caused BinaryReader to receive incomplete data when parsing ZIP headers, leading to ZlibException failures depending on the chunking pattern.

Solution: Modified SharpCompressStream to implement buffer fill methods that follow the ReadFully/ReadFullyAsync pattern from Utility.cs. These methods read from the underlying stream in a loop until the internal buffer is filled or EOF is reached, but return the actual count read (rather than a boolean) to handle EOF gracefully.
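A minimal sketch of that loop (illustrative only; the real FillBuffer works against SharpCompressStream's internal buffer and fields, not a bare array passed in by the caller):

```csharp
using System;
using System.IO;

static class StreamFill
{
    // Read from `source` until `buffer` is full or EOF is reached, and return
    // the number of bytes actually read. Unlike a single Stream.Read call,
    // this never returns a short count before end of stream.
    public static int FillBuffer(Stream source, byte[] buffer)
    {
        var total = 0;
        while (total < buffer.Length)
        {
            var read = source.Read(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                break; // EOF: return what we have rather than throwing
            }
            total += read;
        }
        return total;
    }
}
```

Returning the count (instead of a boolean as Utility.ReadFully does) lets the caller distinguish a partially filled buffer at EOF from a full one.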

Changes made to SharpCompressStream.cs:

  • Modified FillBuffer() to follow the ReadFully pattern (same logic as Utility.ReadFully but returns count)
  • Modified FillBufferAsync() to follow the ReadFullyAsync pattern
  • Modified FillBufferMemoryAsync() to follow the ReadFullyAsync pattern
  • All methods use the same loop-until-full-or-EOF approach as the existing utility methods
  • Updated comments to reference the ReadFully pattern
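The async fill variants described above follow the same loop; a minimal sketch with illustrative names (not the actual SharpCompressStream internals):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class StreamFillAsync
{
    // Async counterpart of the fill loop: keep awaiting ReadAsync until the
    // buffer is full or EOF, and return the number of bytes actually read.
    public static async Task<int> FillBufferAsync(
        Stream source, byte[] buffer, CancellationToken cancellationToken = default)
    {
        var total = 0;
        while (total < buffer.Length)
        {
            var read = await source
                .ReadAsync(buffer.AsMemory(total, buffer.Length - total), cancellationToken)
                .ConfigureAwait(false);
            if (read == 0)
            {
                break; // EOF
            }
            total += read;
        }
        return total;
    }
}
```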

Test coverage:

  • Created ZipShortReadTests with 9 test cases covering various chunking patterns
  • Includes extreme case of 1 byte at a time which previously failed
  • All 424 ZIP tests pass
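The extreme 1-byte-at-a-time chunking pattern can be exercised with a tiny wrapper like the following (an illustrative sketch, not the actual ZipShortReadTests code):

```csharp
using System;
using System.IO;

// Returns at most one byte per Read call - a legal but extreme chunking
// pattern that previously broke ZIP header parsing.
sealed class OneByteStream : Stream
{
    private readonly MemoryStream inner;

    public OneByteStream(byte[] bytes) => inner = new MemoryStream(bytes, writable: false);

    public override int Read(byte[] buffer, int offset, int count) =>
        inner.Read(buffer, offset, Math.Min(count, 1));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With the fix, the internal buffer is filled in a loop, so even this pattern yields the same parse result as a MemoryStream.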

Summary

This fix resolves the reported issue where SharpCompress ZIP parsing became sensitive to Stream.Read chunking patterns on non-seekable streams. The solution follows the established ReadFully pattern used elsewhere in the codebase.

Original prompt

This section details the original issue you should resolve

<issue_title>Regression: ZIP parsing fails depending on Stream.Read chunking on non-seekable streams</issue_title>
<issue_description>Hi,

I’m hitting what looks like a stream chunking sensitivity regression when reading ZIP archives from a non-seekable stream. With the exact same ZIP bytes, SharpCompress will sometimes fail depending only on how the underlying Stream.Read() splits the data.


Regression note

The exact same code path works correctly with SharpCompress 0.40.0.
The failure (ZlibException in the chunked case) starts occurring in newer versions.


Context / real-world scenario

This happens in a real ASP.NET Core streaming pipeline (multipart/form-data):

  • Source stream: HttpRequest.Body
  • Read via MultipartReader (multipart/form-data)
  • Archive entries are processed sequentially using ReaderFactory.Open(...).MoveToNextEntry()
  • Entry streams are non-seekable by design

A seemingly unrelated change (for example changing a text field value from "my-value" to "my-valu") shifts the alignment of the ZIP part by 1 byte, which changes the short-read pattern seen by SharpCompress and triggers a failure.

To make this report independent of ASP.NET / multipart, the repro below uses a custom non-seekable stream that returns legal short reads.


Reproduction

This snippet (with stream.zip, attached to the original issue: https://github.com/user-attachments/files/24886895/stream.zip) reads the same ZIP bytes three ways:

  1. Baseline: MemoryStream
  2. Non-seekable stream with short reads (first = 3816)
  3. Non-seekable stream with short reads (first = 3815)

Only case (3) fails with a ZlibException. Wrapping the stream with a simple coalescing wrapper fixes the issue.

var bytes = System.IO.File.ReadAllBytes("stream.zip");

Console.WriteLine($"SharpCompress: {typeof(ReaderFactory).Assembly.FullName}");
Console.WriteLine($"Input bytes: {bytes.Length}");

//
// Case 1 - Baseline
// Read the ZIP from a file-like stream (MemoryStream).
// This always works and serves as the reference behavior.
//
Console.WriteLine("\n== Baseline (MemoryStream) ==");
Dump(ReadEntries(new MemoryStream(bytes, writable: false)));

//
// Case 2 - Chunked non-seekable stream (first read = 3816 bytes)
// This simulates a network/multipart stream with legal short reads.
// This case still works.
//
Console.WriteLine("\n== Chunked (first=3816, then=4096) ==");
Dump(ReadEntries(new PatternReadStream(bytes, first: 3816, chunk: 4096)));

try
{
	//
	// Case 3 - Chunked non-seekable stream (first read = 3815 bytes)
	// Exact same input bytes, only the first Read() returns 1 byte less.
	//
	// This case works correctly on SharpCompress 0.40.0,
	// but throws a ZlibException on newer versions.
	//
	Console.WriteLine("\n== Chunked (first=3815, then=4096) ==");
	Dump(ReadEntries(new PatternReadStream(bytes, first: 3815, chunk: 4096)));
}
catch (ZlibException)
{
	//
	// Case 4 - Workaround
	// Wrap the same failing stream with a coalescing wrapper that
	// fills short reads. This makes SharpCompress behave correctly again.
	//
	Console.WriteLine("\n== Workaround: FillReadStream over chunked(3815/4096) ==");
	using (var s = new PatternReadStream(bytes, first: 3815, chunk: 4096))
	using (var fill = new FillReadStream(s))
	{
		Dump(ReadEntries(fill));
	}
}
static List<string> ReadEntries(Stream s)
{
	var names = new List<string>();
	using var reader = ReaderFactory.Open(s, new ReaderOptions { LeaveStreamOpen = true });

	while (reader.MoveToNextEntry())
	{
		if (reader.Entry.IsDirectory)
		{
			continue;
		}

		names.Add(reader.Entry.Key);

		using var es = reader.OpenEntryStream();
		es.CopyTo(Stream.Null);
	}

	return names;
}

static void Dump(List<string> names)
{
	Console.WriteLine($"Count={names.Count}");

	foreach (var n in names.Take(10))
	{
		Console.WriteLine(" - " + n);
	}

	if (names.Count > 10)
	{
		Console.WriteLine(" - ...");
	}
}

sealed class PatternReadStream : Stream
{
    private readonly MemoryStream inner;
    private readonly int first;
    private readonly int chunk;
    private bool firstDone;

    public PatternReadStream(byte[] bytes, int first, int chunk)
    {
        inner = new MemoryStream(bytes, writable: false);
        this.first = first;
        this.chunk = chunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int limit = !firstDone ? first : chunk;
        firstDone = true;

        int toRead = Math.Min(count, limit);
        return inner.Read(buffer, offset, toRead);
    }

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => false;

    public override long Length => throw new NotSupportedException();

    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }

    public override void Flush() => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
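The FillReadStream coalescing wrapper used in the workaround case is not included in the snippet above; a minimal sketch of what such a wrapper looks like (assumed behavior, not the reporter's exact code):

```csharp
using System;
using System.IO;

// Coalesces short reads: keeps calling inner.Read until `count` bytes
// arrive or the inner stream reaches EOF.
sealed class FillReadStream : Stream
{
    private readonly Stream inner;

    public FillReadStream(Stream inner) => this.inner = inner;

    public override int Read(byte[] buffer, int offset, int count)
    {
        var total = 0;
        while (total < count)
        {
            var read = inner.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // EOF
            }
            total += read;
        }
        return total;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```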





- Fixes adamhathcock/sharpcompress#1168

claunia added the pull-request label 2026-01-29 22:21:22 +00:00
Reference: starred/sharpcompress#1611