Streams larger than uint.MaxValue cause broken zip files #157

Closed
opened 2026-01-29 22:07:30 +00:00 by claunia · 15 comments

Originally created by @kenkendk on GitHub (Mar 9, 2017).

I have tracked down an issue causing failures when attempting to read zip files with SharpCompress (files created AND read by SharpCompress).

The error message is `Unknown header {value}`, where `{value}` is some random bytes from the file. This is similar to issue #33, but they report it for a much smaller file (I have tested the file mentioned in the issue, and it does not appear to cause any errors).

The problem is that the Central Directory Entry stores `Size` and `HeaderOffset` as `uint` values.
SharpCompress never checks whether this limit is exceeded, so creating the archive succeeds but reading it back fails. In the example below this is triggered with a single 4 GB file, but it can equally be achieved with many smaller files, as long as the `HeaderOffset` value grows beyond `2^32`.
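
To see the truncation concretely (a quick sketch, not code from SharpCompress):

```csharp
using System;

// Narrowing a 64-bit size or offset into the 4-byte central-directory field
// silently drops the upper 32 bits: exactly 4 GB becomes 0.
long realSize = 4L * 1024 * 1024 * 1024;     // 4,294,967,296
uint storedSize = unchecked((uint)realSize); // 0
Console.WriteLine($"{realSize} is stored as {storedSize}");
```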

There is another issue in that the number of files is limited to `ushort`, but this appears to have no effect other than reporting the wrong number of files, which is not directly exposed.

A workaround for reading such a file is the forward-only interface, which does not read the Central Directory Entry and can read the file contents correctly, unless a single file is larger than `2^32` bytes (see `ReadBrokenZipForwardOnly` in the example below).

I can think of two solutions:

  1. Prevent creating files with offsets larger than `2^32`, simply throwing an exception if this is detected.

  2. Support zip64, which replaces the size and offset with `0xffffffff` and stores the 64-bit values in the extended information.

I think support for zip64 is the better choice here. Reading support for zip64 has already been added, but it requires that the correct zip64 records are written.
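
For reference, the zip64 extended-information record looks roughly like this (a sketch following the ZIP APPNOTE, section 4.5.3; the eventual SharpCompress implementation may differ):

```csharp
using System.IO;

// Zip64 "extended information" extra field (header ID 0x0001). The fixed
// header keeps 0xffffffff in its 4-byte fields, and the real 64-bit values
// live here. Per the spec, each field is only present when its 4-byte
// counterpart is 0xffffffff; this sketch simply writes all three.
static byte[] Zip64ExtraField(ulong uncompressedSize, ulong compressedSize, ulong localHeaderOffset)
{
    using (var ms = new MemoryStream())
    using (var bw = new BinaryWriter(ms))
    {
        bw.Write((ushort)0x0001);   // tag: Zip64 extended information
        bw.Write((ushort)24);       // size of the data that follows (3 x 8 bytes)
        bw.Write(uncompressedSize);
        bw.Write(compressedSize);
        bw.Write(localHeaderOffset);
        return ms.ToArray();
    }
}
```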

I will see if I can make a PR that adds zip64 support.

Example code that reproduces the issue:

```csharp
using System;
using System.IO;
using System.Linq;
using SharpCompress.Archives;
using SharpCompress.Common;
using SharpCompress.Writers;
using SharpCompress.Writers.Zip;
using SharpCompress.Readers;
using SharpCompress.Readers.Zip;

namespace BreakTester
{
	class MainClass
	{
		public static void Main(string[] args)
		{
			// Set up the parameters
			var files = 1;
			var filesize = 4 * 1024 * 1024 * 1024L; // 4 GB, one more than uint.MaxValue
			var write_chunk_size = 1024 * 1024;

			var filename = "broken.zip";
			if (!File.Exists(filename))
				CreateBrokenZip(filename, files, filesize, write_chunk_size);

			var res = ReadBrokenZip(filename);
			if (res.Item1 != files)
				throw new Exception($"Incorrect number of items reported: {res.Item1}, should have been {files}");
			if (res.Item2 != files * filesize)
				throw new Exception($"Incorrect total size reported: {res.Item2}, should have been {files * filesize}");
		}

		public static void CreateBrokenZip(string filename, long files, long filesize, long chunksize)
		{
			var data = new byte[chunksize];

			// We use the store method to make sure our data is not compressed,
			// and thus it is easier to make large files
			var opts = new ZipWriterOptions(CompressionType.None);

			using (var zip = File.OpenWrite(filename))
			using (var zipWriter = (ZipWriter)WriterFactory.Open(zip, ArchiveType.Zip, opts))
			{

				for (var i = 0; i < files; i++)
					using (var str = zipWriter.WriteToStream(i.ToString(), new ZipWriterEntryOptions() { DeflateCompressionLevel = SharpCompress.Compressors.Deflate.CompressionLevel.None }))
					{
						var left = filesize;
						while (left > 0)
						{
							var b = (int)Math.Min(left, data.Length);
							str.Write(data, 0, b);
							left -= b;
						}
					}
			}
		}

		public static Tuple<long, long> ReadBrokenZip(string filename)
		{
			using (var archive = ArchiveFactory.Open(filename))
			{
				return new Tuple<long, long>(
					archive.Entries.Count(),
					archive.Entries.Select(x => x.Size).Sum()
				);
			}
		}

		public static Tuple<long, long> ReadBrokenZipForwardOnly(string filename)
		{
			long count = 0;
			long size = 0;
			using (var fs = File.OpenRead(filename))
			using (var rd = ZipReader.Open(fs, new ReaderOptions() { LookForHeader = false }))
				while (rd.MoveToNextEntry())
				{
					count++;
					size += rd.Entry.Size;
				}

			return new Tuple<long, long>(count, size);
		}
	}
}
```


@adamhathcock commented on GitHub (Mar 9, 2017):

Thanks for this.

If you could implement zip64 writing, that would be great! I imagine both of the solutions you mentioned will be required (unless changing from zip to zip64 can be automatic).

I have to confess I haven't looked too deeply at the zip64 format at the moment.


@kenkendk commented on GitHub (Mar 9, 2017):

Urgh, I just found an error from this commit:
https://github.com/adamhathcock/sharpcompress/commit/6be6ef0b5c3ed6a4d40d00c8fb133518e75e4a6f

The values here are changed:
https://github.com/adamhathcock/sharpcompress/blob/6be6ef0b5c3ed6a4d40d00c8fb133518e75e4a6f/src/SharpCompress/Common/Zip/Headers/ZipFileEntry.cs#L60

But the write uses the type here:
https://github.com/adamhathcock/sharpcompress/blob/6be6ef0b5c3ed6a4d40d00c8fb133518e75e4a6f/src/SharpCompress/Common/Zip/Headers/LocalEntryHeader.cs#L45

So now it writes invalid local headers, as they have 8-byte sizes instead of 4-byte sizes.
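
For reference, the plain (non-zip64) local file header fixes those fields at 4 bytes each. A hypothetical writer following the ZIP APPNOTE (section 4.3.7), to show why 8-byte sizes shift everything after them, not the SharpCompress source:

```csharp
using System.IO;

// Fixed-width layout of a plain local file header (APPNOTE 4.3.7).
// Writing the two size fields as 8-byte longs shifts every later field,
// which is why the resulting archives are corrupt.
static void WriteLocalHeader(BinaryWriter bw, ushort versionNeeded, ushort flags,
    ushort method, ushort modTime, ushort modDate, uint crc32,
    uint compressedSize, uint uncompressedSize, byte[] name, byte[] extra)
{
    bw.Write(0x04034b50u);          // signature, 4 bytes
    bw.Write(versionNeeded);        // 2 bytes
    bw.Write(flags);                // 2 bytes
    bw.Write(method);               // 2 bytes
    bw.Write(modTime);              // 2 bytes
    bw.Write(modDate);              // 2 bytes
    bw.Write(crc32);                // 4 bytes
    bw.Write(compressedSize);       // 4 bytes -- the bug wrote 8 here
    bw.Write(uncompressedSize);     // 4 bytes -- and here
    bw.Write((ushort)name.Length);  // 2 bytes
    bw.Write((ushort)extra.Length); // 2 bytes
    bw.Write(name);
    bw.Write(extra);
}
```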


@kenkendk commented on GitHub (Mar 9, 2017):

Urgh, and in the central directory too!


@adamhathcock commented on GitHub (Mar 9, 2017):

Wow, that is a bad error. The writer needs to know whether it's zip64 or not and write the correct byte size.

I guess zip64 writing needs to be done ASAP.


@kenkendk commented on GitHub (Mar 9, 2017):

zip64 keeps the size fields, but sets them to `0xffffffff` and then writes an "extra" record with the sizes as 64-bit values.


@kenkendk commented on GitHub (Mar 9, 2017):

Thanks for the quick merge and update.

I will investigate some more, because there must be some mitigating factor; otherwise the filenames would be garbled in all produced zip files, which would break my unit tests, so I would have caught it there.


@adamhathcock commented on GitHub (Mar 9, 2017):

If you've got some tests that can be added to SharpCompress, I would appreciate it. Thanks for finding this.

Fixed by https://github.com/adamhathcock/sharpcompress/pull/210


@kenkendk commented on GitHub (Mar 9, 2017):

This issue is not fixed; #210 addressed the problem with writing invalid headers.
There is still an issue with creating files larger than 4 GB.


@kenkendk commented on GitHub (Mar 9, 2017):

Phew, found the mitigating factor:
https://github.com/adamhathcock/sharpcompress/blob/06e3486ec4c67377c4aa6c65b79d50d7e7925e56/src/SharpCompress/Writers/Zip/ZipCentralDirectoryEntry.cs#L42

This is still wrong, as it converts the value to 8 bytes and then takes only the first 4. Fortunately, little-endian byte order means the value is written correctly on any x86 or ARM machine.
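
To illustrate why this happens to work as long as the value stays below `2^32` (a sketch, not the actual SharpCompress code):

```csharp
using System;
using System.Linq;

// On a little-endian machine, the first 4 bytes of an 8-byte value are
// exactly its low 32 bits, so the truncation is harmless below 2^32.
long offset = 0x12345678;
byte[] first4 = BitConverter.GetBytes(offset).Take(4).ToArray();
uint written = BitConverter.ToUInt32(first4, 0); // 0x12345678 -- still correct
Console.WriteLine($"0x{written:x}");
```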


@kenkendk commented on GitHub (Mar 9, 2017):

Nope, wrong again; the actual entry used is:
https://github.com/adamhathcock/sharpcompress/blob/06e3486ec4c67377c4aa6c65b79d50d7e7925e56/src/SharpCompress/Writers/Zip/ZipCentralDirectoryEntry.cs#L17

which has the `uint` type. I do not know where the other serialization methods are called.


@adamhathcock commented on GitHub (Mar 9, 2017):

You can't zip files larger than 4 GB. The Zip64 support is currently read-only.


@kenkendk commented on GitHub (Mar 9, 2017):

Sure you can, try the example code in this issue.


@adamhathcock commented on GitHub (Mar 9, 2017):

If it works, then it's unintentional and you're finding the errors because of it.


@kenkendk commented on GitHub (Mar 9, 2017):

Yes, that is why I opened the issue. You can create a zip file larger than 4 GB without errors, and you only see the errors when you try to read the file.

I would expect some error when crossing the 4 GB limit; otherwise I will not discover the problem before I attempt to read the file.
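
Even a simple guard at write time would surface the problem immediately (a hypothetical sketch; `CheckedUInt32` is not part of SharpCompress):

```csharp
using System;

// Hypothetical guard: fail fast when a size or offset no longer fits the
// 4-byte field of a plain (non-zip64) zip entry.
static uint CheckedUInt32(long value, string field)
{
    if (value < 0 || value > uint.MaxValue)
        throw new NotSupportedException(
            $"{field} is {value}, which exceeds the plain zip limit; zip64 would be required");
    return (uint)value;
}
```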


@adamhathcock commented on GitHub (Mar 9, 2017):

Okay, sorry. I thought this was a continuation of the discussion about the writing error.

Yes, something should be done to detect files that are too large for plain zip, and/or zip64 should be implemented.
