ZIP archive file entries with a "data descriptor structure" will confuse ZipReader #57

claunia · 2026-01-29T22:05:56Z

Originally created by @elgonzo on GitHub (Sep 4, 2015).

Originally assigned to: @adamhathcock on GitHub.

When a ZIP archive file entry has a data descriptor structure following its compressed file data, ZipReader incorrectly reports the CRC and file size of this entry as zero. In itself, this is more an inconvenience than an error when "streaming" a ZIP archive. Note that it is still possible to obtain the decompressed file data of such a file entry by reading its EntryStream (ZipReader.OpenEntryStream()) until the end of the EntryStream.
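To illustrate why a streaming reader sees zeros, here is a minimal Python sketch (standard library only, not SharpCompress code). It assumes that Python's zipfile module, when given an output object without tell()/seek(), sets general purpose bit 3 and writes a data descriptor after each entry. The names Unseekable, build_streamed_zip and read_first_local_header are made up for this example.

```python
import io
import struct
import zipfile


class Unseekable:
    """Hypothetical write-only wrapper: no tell()/seek(), so zipfile must stream."""

    def __init__(self, raw):
        self._raw = raw

    def write(self, data):
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()


def build_streamed_zip():
    """Build a tiny ZIP whose entries are followed by data descriptors."""
    buf = io.BytesIO()
    with zipfile.ZipFile(Unseekable(buf), "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"hello " * 100)
        zf.writestr("b.txt", b"world " * 100)
    return buf.getvalue()


# Fixed 30-byte part of the local file header (APPNOTE 4.3.7), little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def read_first_local_header(data):
    """Decode the local file header at offset 0 into a dict."""
    (sig, _version, flags, method, _time, _date,
     crc, csize, usize, name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    return {
        "signature": sig,
        "bit3": bool(flags & 0x08),  # data descriptor follows the file data
        "method": method,
        "crc": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name": data[30:30 + name_len].decode("ascii"),
    }


header = read_first_local_header(build_streamed_zip())
print(header)
```

With bit 3 set, the CRC and both size fields in the local header are placeholders. The real values only appear in the data descriptor after the compressed data (and in the central directory), which a forward-only reader has not seen yet.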

However, calling ZipReader.MoveToNextEntry() without reading the EntryStream of such a file entry will upset the ZipReader and make it seek to some arbitrary position in the ZIP file. It will read 4 bytes at this position, expecting to find a local file header signature (I guess). Since those 4 bytes at this arbitrary file position will not be a valid signature, the ZipHeaderFactory.ReadHeader(...) method will throw a NotSupportedException with the message "Unknown header: <random number>".

I have seen a few reports of NotSupportedExceptions with the message "Unknown header: <some random number>". Although I cannot be sure what caused the exceptions in those cases, it is certainly possible that they were caused by the problem I describe here.

What I believe ZipReader should do:

ZipReader can check whether the "Crc-32" and "Compressed size" fields are zero. If they are, and this file entry is to be skipped (instead of being extracted), then ZipReader could (A) check the compression mode and/or whether a signature immediately follows this file entry, which would indicate a zero-byte file. If the entry has not been identified as a zero-byte file, then (B) ZipReader can attempt to decompress the file data in memory to get to the end of the compressed data, thus reaching the optional data descriptor of this entry or the local file header of the next archive entry.
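Step (B) can be sketched roughly as follows, again in Python with only the standard library rather than in SharpCompress code. The sketch assumes Deflate-compressed, non-ZIP64, unencrypted entries, and treats a leading 0x08074b50 as the optional descriptor signature. The names Unseekable, build_streamed_zip and walk_local_entries are hypothetical.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = 0x04034B50       # local file header signature
DESCRIPTOR_SIG = 0x08074B50  # optional data descriptor signature (4.3.9.3)
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part


class Unseekable:
    """Hypothetical write-only wrapper: no tell()/seek(), so zipfile must stream."""

    def __init__(self, raw):
        self._raw = raw

    def write(self, data):
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()


def build_streamed_zip():
    """Build a tiny ZIP whose entries are followed by data descriptors."""
    buf = io.BytesIO()
    with zipfile.ZipFile(Unseekable(buf), "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"hello " * 100)
        zf.writestr("b.txt", b"world " * 100)
    return buf.getvalue()


def walk_local_entries(data):
    """Skip over every entry using local headers only (no central directory)."""
    pos, found = 0, []
    while struct.unpack_from("<I", data, pos)[0] == LOCAL_SIG:
        (_sig, _ver, flags, method, _time, _date,
         crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        name = data[pos + 30:pos + 30 + name_len].decode("ascii")
        pos += 30 + name_len + extra_len
        if not flags & 0x08:
            pos += csize  # sizes in the header are trustworthy
        elif method == 8:
            # Inflate in memory just to find where the Deflate stream ends.
            inflater = zlib.decompressobj(-15)  # raw Deflate, no zlib header
            inflater.decompress(data[pos:])
            pos = len(data) - len(inflater.unused_data)
            if struct.unpack_from("<I", data, pos)[0] == DESCRIPTOR_SIG:
                pos += 4  # the signature is optional
            crc, csize, usize = struct.unpack_from("<III", data, pos)
            pos += 12
        else:
            raise NotImplementedError("cannot locate the end of this entry")
        found.append((name, crc, usize))
    return found


entries = walk_local_entries(build_streamed_zip())
print(entries)
```

Feeding the whole remainder of the buffer to the inflater keeps the sketch short; a real reader would feed the stream in chunks and stop once the inflater reports end of stream. Stored (uncompressed) entries with bit 3 set carry no end marker, so this approach does not cover them.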

Background info:

The ".ZIP File Format Specification" contains more information with regard to data descriptor structures. Especially the following chapters are worth a read:

4.3.7 Local file header
4.3.9 Data descriptor
4.4.4 General purpose bit flag, Bit 3
4.4.7 CRC-32
4.4.8 compressed size
4.4.9 uncompressed size

Link to the ".ZIP File Format Specification": https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Also pay attention to the paragraphs 4.3.5 (a data descriptor structure may be present even if general purpose bit 3 is not set), 4.3.9.3 (optional data descriptor signature 0x08074b50) and 4.3.9.6 (data descriptor and central directory encryption).

Remarks
ZipArchive and its ZipArchive.Entries enumeration do not seem to be affected by this issue.

The ZIP archive I found to have file entries as described above is about 300MB in size. This is obviously too large to upload as a sample. I will provide a small ZIP file with the described file entries as soon as I manage to produce one myself ;)

claunia added the bug label 2026-01-29 22:05:56 +00:00

Reference: starred/sharpcompress#57