ZIP archive file entries with a "data descriptor structure" will confuse ZipReader #57

Open
opened 2026-01-29 22:05:56 +00:00 by claunia · 0 comments
Owner

Originally created by @elgonzo on GitHub (Sep 4, 2015).

Originally assigned to: @adamhathcock on GitHub.

When a ZIP archive file entry has a data descriptor structure following its compressed file data, ZipReader incorrectly reports the CRC and file size of this entry as zero. In itself this is more an inconvenience than an error when "streaming" a ZIP archive. Note that it is still possible to obtain the decompressed file data of such a file entry by reading its EntryStream (ZipReader.OpenEntryStream()) to the end.

However, calling ZipReader.MoveToNextEntry() without reading the EntryStream of such a file entry will upset the ZipReader and make it seek to some arbitrary position in the ZIP file. It will read 4 bytes at this position, expecting to find a local file header signature (I guess). Since those 4 bytes at this arbitrary file position will not be a valid signature, the ZipHeaderFactory.ReadHeader(...) method throws a NotSupportedException with the message "Unknown header: <random number>".

I have seen a few reports of NotSupportedExceptions with the message "Unknown header: <some random number>". Although I cannot be sure what caused the exceptions in those cases, it is certainly possible that they were caused by the problem I describe here.

What I believe ZipReader should do:

ZipReader can check whether the "Crc-32" and "Compressed size" fields are zero. If they are, and this file entry should be skipped (instead of being extracted), then ZipReader could (A) check the compression mode and/or whether a signature immediately follows this file entry, which would indicate a zero-byte file. If the entry has not been identified as a zero-byte file, then (B) ZipReader can attempt to decompress the file data in memory to get to the end of the compressed data, thus reaching the optional data descriptor of this entry or the local file header of the next archive entry.
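To make approach (B) more concrete, here is a minimal sketch in Python rather than in SharpCompress's own C#. It is not the library's code. It assumes the entry is Deflate-compressed (method 8), that the data descriptor is not a ZIP64 one (4-byte size fields), and that the archive bytes are already in memory. The function name `skip_deflated_entry` is made up for illustration.

```python
import struct
import zlib

DD_SIGNATURE = b"PK\x07\x08"  # optional data descriptor signature 0x08074b50 as stored on disk


def skip_deflated_entry(buf, data_start):
    """Inflate an entry in memory to find where its compressed data ends.

    Returns (next_offset, crc32, compressed_size, uncompressed_size), where
    next_offset points just past the data descriptor.
    Hypothetical helper: Deflate only, non-ZIP64 descriptor assumed.
    """
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw Deflate, no zlib wrapper
    crc = 0
    size = 0
    pos = data_start
    while not inflater.eof:
        chunk = buf[pos:pos + 4096]
        if not chunk:
            raise ValueError("compressed data ended before the Deflate stream did")
        out = inflater.decompress(chunk)
        crc = zlib.crc32(out, crc)
        size += len(out)
        pos += len(chunk)
    # Bytes handed to the inflater after the end of the stream are left in unused_data.
    data_end = pos - len(inflater.unused_data)

    dd = data_end
    if buf[dd:dd + 4] == DD_SIGNATURE:  # the signature is optional
        dd += 4
    dd_crc, dd_csize, dd_usize = struct.unpack_from("<III", buf, dd)
    if (dd_crc, dd_csize, dd_usize) != (crc, data_end - data_start, size):
        raise ValueError("data descriptor does not match the inflated data")
    return dd + 12, dd_crc, dd_csize, dd_usize
```

This only works for compression methods whose stream marks its own end. A "stored" entry with a data descriptor has no such marker, so it would need a different strategy.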

Background info:

The ".ZIP File Format Specification" contains more information about data descriptor structures. The following sections are especially worth reading:

4.3.7 Local file header
4.3.9 Data descriptor
4.4.4 General purpose bit flag, Bit 3
4.4.7 CRC-32
4.4.8 compressed size
4.4.9 uncompressed size

Link to the ".ZIP File Format Specification": https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Also pay attention to the paragraphs 4.3.5 (a data descriptor structure may be present even if general purpose bit 3 is not set), 4.3.9.3 (optional data descriptor signature 0x08074b50) and 4.3.9.6 (data descriptor and central directory encryption).
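For readers who want to inspect an affected entry by hand, here is a small Python sketch that decodes the fixed 30-byte part of a local file header and checks general purpose bit 3. The field layout follows section 4.3.7 of the specification. The names `LocalHeader`, `read_local_header` and `uses_data_descriptor` are made up for this example and are not part of any library.

```python
import struct
from collections import namedtuple

# Fixed-size part of the local file header (APPNOTE 4.3.7), little-endian.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
LocalHeader = namedtuple(
    "LocalHeader",
    "signature version flags method mtime mdate "
    "crc32 compressed_size uncompressed_size name_len extra_len",
)


def read_local_header(buf, offset=0):
    """Decode the local file header found at `offset` in `buf`."""
    header = LocalHeader._make(LOCAL_HEADER.unpack_from(buf, offset))
    if header.signature != b"PK\x03\x04":
        raise ValueError("no local file header signature at offset %d" % offset)
    return header


def uses_data_descriptor(header):
    """True if general purpose bit 3 is set (sizes and CRC follow the data)."""
    return bool(header.flags & 0x0008)
```

When bit 3 is set, the `crc32`, `compressed_size` and `uncompressed_size` fields of the local header are expected to be zero, which matches the values ZipReader reports for such entries.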

Remarks
ZipArchive and its ZipArchive.Entries enumeration do not seem to be affected by this issue.

The ZIP archive in which I found file entries as described above is about 300MB in size. This is obviously too large to upload as a sample. I will provide a small ZIP file with such file entries as soon as I have managed to produce one myself ;)
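One possible way to produce a small sample, sketched in Python: the standard `zipfile` module sets bit 3 and appends a data descriptor when its output stream cannot seek. The wrapper class `WriteOnlyStream` and the function `build_sample_zip` are made up for illustration. Whether the resulting archive triggers the exact same ZipReader behaviour as the 300MB file is an assumption that still has to be verified.

```python
import io
import zipfile


class WriteOnlyStream:
    """Hypothetical wrapper exposing only write() and flush().

    Without tell() and seek(), zipfile cannot go back to patch the local
    header, so it writes a data descriptor after each entry instead.
    """

    def __init__(self, raw):
        self._raw = raw

    def write(self, data):
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()


def build_sample_zip(entries):
    """Return the bytes of a ZIP archive whose entries use data descriptors.

    `entries` maps entry names to their uncompressed bytes.
    """
    sink = io.BytesIO()
    with zipfile.ZipFile(WriteOnlyStream(sink), "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return sink.getvalue()
```

Writing the returned bytes to a file gives an archive of a few hundred bytes that can be attached to the issue and opened with ZipReader.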

claunia added the bug label 2026-01-29 22:05:56 +00:00

Reference: starred/sharpcompress#57