ZIP archive file entries with a "data descriptor structure" will confuse ZipReader #60

Closed
opened 2026-01-29 22:05:59 +00:00 by claunia · 7 comments

Originally created by @elgonzo on GitHub (Sep 4, 2015).

Originally assigned to: @adamhathcock on GitHub.

When a ZIP archive file entry has a data descriptor structure following its compressed file data, ZipReader will falsely report the CRC and file size for this entry as zero. This in itself is more an inconvenience than an error when considering "streaming" of a ZIP archive. Note that it is still possible to obtain the decompressed file data of such a file entry by reading its EntryStream (ZipReader.OpenEntryStream()) until the end of the EntryStream.

However, calling ZipReader.MoveToNextEntry() without reading the EntryStream of such a file entry will upset the ZipReader and make it seek to some arbitrary position in the ZIP file. It will read 4 bytes at this position, expecting to find a local file header signature (I guess). Since those 4 bytes at this arbitrary file position will not be a valid signature, the ZipHeaderFactory.ReadHeader(...) method will throw a NotSupportedException saying: "Unknown header: <random number>".

I have seen a few reports about NotSupportedExceptions saying "Unknown header: <some random number>". Although I cannot be sure what caused the NotSupportedExceptions in those cases, it is certainly possible that they were caused by the problem I explain here.

What I believe ZipReader should do:

ZipReader can check whether the "Crc-32" and "Compressed size" fields are zero. If that is the case and this file entry should be skipped (instead of being extracted), then ZipReader could (A) check the compression mode and/or whether a signature follows this file entry, which would indicate a zero-byte file. If the entry has not been identified as a zero-byte file, then (B) ZipReader can attempt decompressing the file data in memory to get to the end of the compressed data, thus reaching the optional data descriptor of this entry or the local file header of the next archive entry.
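
Option (B) can be sketched outside of SharpCompress. The following Python snippet is only an illustration under stated assumptions: the entry uses Deflate (method 8) and the archive bytes are already in memory. `deflate_stream_length` and `CHUNK` are hypothetical names, not part of any library. The idea is to inflate the data, discard the output, and use the decompressor's leftover input to work out where the compressed stream ended, which is where a data descriptor or the next header would begin.

```python
import zlib

CHUNK = 4096  # arbitrary read size chosen for this sketch


def deflate_stream_length(buf: bytes, start: int) -> int:
    """Return the byte length of the raw Deflate stream beginning at `start`.

    Hypothetical helper: the decompressed output is thrown away, only the
    position where the stream ends is of interest.
    """
    # Negative wbits selects raw Deflate (no zlib header), as stored in ZIP entries.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    pos = start
    while not inflater.eof:
        chunk = buf[pos:pos + CHUNK]
        if not chunk:
            raise ValueError("input ends before the Deflate stream does")
        inflater.decompress(chunk)
        pos += len(chunk)
    # Bytes that were fed in after the end of the stream come back in unused_data.
    return pos - len(inflater.unused_data) - start


# Usage: a raw Deflate stream followed by bytes that belong to something else.
deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
compressed = deflater.compress(b"FIRST " * 50) + deflater.flush()
trailer = b"PK\x07\x08" + b"\x00" * 12  # stand-in for a data descriptor
print(deflate_stream_length(compressed + trailer, 0) == len(compressed))
```

A real reader would pull chunks from the underlying stream instead of slicing a buffer, and would need an equivalent for every compression method it supports.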

Background info:

The ".ZIP File Format Specification" contains more information with regard to data descriptor structures. Especially the following chapters are worth a read:

4.3.7 Local file header
4.3.9 Data descriptor
4.4.4 General purpose bit flag, Bit 3
4.4.7 CRC-32
4.4.8 compressed size
4.4.9 uncompressed size

Link to the ".ZIP File Format Specification": https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Also pay attention to the paragraphs 4.3.5 (a data descriptor structure may be present even if general purpose bit 3 is not set), 4.3.9.3 (optional data descriptor signature 0x08074b50) and 4.3.9.6 (data descriptor and central directory encryption).
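
To make the optional signature from 4.3.9.3 concrete, here is a minimal Python sketch of reading a data descriptor. It assumes the non-ZIP64 layout (4-byte size fields) and an unencrypted archive; `read_data_descriptor` is a hypothetical helper name. Note the inherent ambiguity: a CRC-32 value that happens to equal 0x08074b50 would be misread as a signature, so this is a heuristic rather than a definitive parser.

```python
import struct

DATA_DESCRIPTOR_SIGNATURE = 0x08074B50  # optional marker, APPNOTE 4.3.9.3


def read_data_descriptor(buf: bytes, offset: int):
    """Parse a non-ZIP64 data descriptor located at `offset`.

    Returns (crc32, compressed_size, uncompressed_size, bytes_consumed).
    """
    (first,) = struct.unpack_from("<I", buf, offset)
    # The signature may or may not be present; skip it when it is.
    body = offset + 4 if first == DATA_DESCRIPTOR_SIGNATURE else offset
    crc32, compressed_size, uncompressed_size = struct.unpack_from("<III", buf, body)
    return crc32, compressed_size, uncompressed_size, (body - offset) + 12


# Usage: the same values with and without the optional signature.
with_signature = struct.pack("<IIII", DATA_DESCRIPTOR_SIGNATURE, 0xDEADBEEF, 40, 300)
without_signature = struct.pack("<III", 0xDEADBEEF, 40, 300)
print(read_data_descriptor(with_signature, 0))
print(read_data_descriptor(without_signature, 0))
```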

Remarks
ZipArchive and its ZipArchive.Entries enumeration do not seem to be affected by this issue.

The ZIP archive I found to have file entries as described above is about 300MB in size. This is obviously too large to upload as a sample. I will provide a small ZIP file with the described file entries as soon as I manage to produce one myself ;)

claunia added the bug label 2026-01-29 22:05:59 +00:00

@elgonzo commented on GitHub (Sep 5, 2015):

Okay, I managed to produce a small ZIP archive. It can be downloaded from:

URL: http://wikisend.com/download/589558/test_data_descriptor.zip
Password: adamhathcock

The ZIP file contains two files:

  • "-" (the file name is just a dash)
    A text file containing the word "FIRST" repeated a few times
  • "second.txt"
    Text file containing the word "SECOND" repeated a few times

(I used Info-ZIP's command line utility zip.exe 3.0 to create this file, which explains why the file name of one of the file entries is just a dash...)

Look at the local file header of the first file ("-"). It has the general purpose bit 3 set, and the fields "crc-32", "compressed size" and "uncompressed size" are zeroed. As required by the general purpose bit 3 being set, this file entry has a data descriptor following its file data.

Also interesting is the local file header of the second file entry "second.txt". It has the general purpose bit 3 set too and has a data descriptor as well, but notice that only the fields "crc-32" and "compressed size" are zeroed, whereas the field "uncompressed size" is not zero (it contains the actual correct uncompressed file size for this entry). If the ZIP file format specification is followed to the letter, then this local file header is actually violating the specification. One has to assume that Info-ZIP is not the only software which could create such local file headers...

Note that this small ZIP file will not produce a NotSupportedException as described in my report above, but rather an EndOfStreamException. I guess the arbitrary stream position the ZipReader wants to jump to after getting confused is beyond the end of the ZIP archive file, which would explain the different exception I observed when testing ZipReader with this small ZIP archive.

Some boring tidbits about how I created the ZIP file

There are basically two ways to create file entries with data descriptors using Info-ZIP's zip utility.

The first way is to use the "-fd" command line switch, which enforces data descriptors and sets the general purpose bit 3 on the affected archive entries. I used this switch to add "second.txt" to the archive. However, as I explained, the Info-ZIP zip utility forgets to set the "uncompressed size" field to zero. And I wanted to get an archive entry where "uncompressed size" is properly set to zero.

The other way is to provide the data to be compressed via stdin. In this case, Info-ZIP's zip utility will also use a data descriptor and set the general purpose bit 3 for the resulting archive entry. It will also properly zero out "crc-32", "compressed size" as well as "uncompressed size".

Hence, I used the following command line to create the small test ZIP file:

type first.txt | zip -fd -fz- test_data_descriptor.zip - second.txt
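
For anyone without Info-ZIP at hand, a comparable test archive can probably be produced with Python's standard zipfile module. This is a sketch that relies on an assumption about CPython's behaviour in recent 3.x releases: when the output object cannot seek, zipfile sets general purpose bit 3, zeroes the three fields in the local header and appends a data descriptor. `WriteOnlySink` and `build_streamed_zip` are hypothetical names made up for this example.

```python
import struct
import zipfile


class WriteOnlySink:
    """Minimal non-seekable output: offers write() and flush(), nothing else."""

    def __init__(self):
        self.data = bytearray()

    def write(self, chunk):
        self.data += chunk
        return len(chunk)

    def flush(self):
        pass


def build_streamed_zip(name: str, payload: bytes) -> bytes:
    """Write a one-entry archive to a sink that has no tell() or seek()."""
    sink = WriteOnlySink()
    with zipfile.ZipFile(sink, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return bytes(sink.data)


archive_bytes = build_streamed_zip("first.txt", b"FIRST " * 20)

# Local file header layout: signature, version, flags, method, time, date,
# crc-32, compressed size, uncompressed size, name length, extra length.
fields = struct.unpack_from("<IHHHHHIIIHH", archive_bytes, 0)
flags, crc32, compressed_size, uncompressed_size = fields[2], fields[6], fields[7], fields[8]
print(bool(flags & 0x08), crc32, compressed_size, uncompressed_size)
```

Unlike the "-fd" case described above, this route is expected to zero all three fields, so it resembles the stdin entry ("-") rather than "second.txt".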

**General purpose bit 3 and uncompressed file entries**

The ZIP file format specification says the following about the general purpose bit 3:

If this bit is set, the fields crc-32, compressed
size and uncompressed size are set to zero in the
local header. The correct values are put in the
data descriptor immediately following the compressed
data. (Note: PKZIP version 2.04g for DOS only
recognizes this bit for method 8 compression, newer
versions of PKZIP recognize this bit for any
compression method.)

The remark that the data descriptor has to follow the compressed data when the general purpose bit 3 is set means in consequence that setting general purpose bit 3 for uncompressed file entries is not allowed (as there would be no compressed data...).

This means that, on encountering a local file header where the "crc-32", "compressed size" and/or "uncompressed size" fields are zero, it should be sufficient to check the compression mode and the general purpose bit 3 to know whether this entry represents a zero-byte file or whether the entry's size and CRC values are to be found in a data descriptor following the compressed data...
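
The check described above might look roughly like this in Python. It is a sketch, not SharpCompress code: `classify_local_header` and its return labels are hypothetical, and it keys on bit 3 and the size fields only. The compression method is parsed but deliberately left out of the decision, which a real implementation might also inspect.

```python
import struct

LOCAL_HEADER_SIGNATURE = 0x04034B50
FLAG_DATA_DESCRIPTOR = 0x0008  # general purpose bit 3


def classify_local_header(buf: bytes, offset: int = 0) -> str:
    """Label a local file header as 'deferred', 'empty' or 'known'."""
    (signature, _version, flags, _method, _time, _date,
     _crc32, compressed_size, uncompressed_size,
     _name_len, _extra_len) = struct.unpack_from("<IHHHHHIIIHH", buf, offset)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError(f"not a local file header: {signature:#010x}")
    if flags & FLAG_DATA_DESCRIPTOR:
        # Sizes and CRC live in a data descriptor after the file data.
        return "deferred"
    if compressed_size == 0 and uncompressed_size == 0:
        # Bit 3 is clear, so zero sizes really mean a zero-byte entry.
        return "empty"
    return "known"


# Usage: three hand-built 30-byte headers (name and extra field lengths are 0).
def make_header(flags, method, crc32, csize, usize):
    return struct.pack("<IHHHHHIIIHH", LOCAL_HEADER_SIGNATURE, 20, flags,
                       method, 0, 0, crc32, csize, usize, 0, 0)


print(classify_local_header(make_header(0x08, 8, 0, 0, 0)))
print(classify_local_header(make_header(0x00, 0, 0, 0, 0)))
print(classify_local_header(make_header(0x00, 8, 0x1234, 40, 300)))
```

Checking the flag first also covers headers like the one for "second.txt", where bit 3 is set but "uncompressed size" is not zeroed.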


@adamhathcock commented on GitHub (Sep 5, 2015):

Thanks for this info. My brain is baby fried so I'll have to look at this a bit later. Just wanted you to know I'm not ignoring you.


@elgonzo commented on GitHub (Sep 6, 2015):

Don't worry. I am not expecting you to start rushing just because I wrote something. I am sure GitHub will be around for quite some time and so will the stuff I wrote... :)

In case you do not find time to look at the issue in the next few weeks, you might still want to grab and make a backup of that small ZIP file I mentioned in my second comment. The hosting site (Wikisend) will delete it after 90 days. (Sorry, I forgot to mention this earlier...)


@mewalig commented on GitHub (Apr 28, 2016):

could you repost the file?


@mewalig commented on GitHub (Apr 28, 2016):

nm, made my own thanks to your helpful notes. I'm on osx, used:

cat myfile.txt | zip -fz- target.zip -

@mewalig commented on GitHub (Apr 28, 2016):

Shucks, I was hoping that would create a zip file with General purpose bit 3 set, but it looks like it doesn't...


@elgonzo commented on GitHub (Apr 29, 2016):

Here is the ZIP file again: test_data_descriptor.zip (https://github.com/adamhathcock/sharpcompress/files/242365/test_data_descriptor.zip)

Just FYI, when creating the ZIP file I also used the command line parameter -fd, which enforces usage of data descriptors. Not sure whether the ZIP tool on OSX provides this parameter, but I noticed that you didn't use it when creating your ZIP file (which could explain why your ZIP tool did not choose to use data descriptors, for whatever reasons and circumstances).