mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-04 05:25:00 +00:00
ZIP archive file entries with a "data descriptor structure" will confuse ZipReader #57
Originally created by @elgonzo on GitHub (Sep 4, 2015).
Originally assigned to: @adamhathcock on GitHub.
When a ZIP archive file entry has a data descriptor structure following its compressed file data, ZipReader incorrectly reports the CRC and file size of this entry as zero. In itself this is more an inconvenience than an error when "streaming" a ZIP archive. Note that it is still possible to obtain the decompressed file data of such an entry by reading its EntryStream (ZipReader.OpenEntryStream()) to the end.
However, calling ZipReader.MoveToNextEntry() without reading the EntryStream of such a file entry will upset the ZipReader and make it seek to some arbitrary position in the ZIP file. It then reads 4 bytes at this position, expecting to find a local file header signature (I guess). Since those 4 bytes will not be a valid signature, the ZipHeaderFactory.ReadHeader(...) method throws a NotSupportedException with the message "Unknown header: <random number>".
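To make the failure mode concrete, here is a minimal Python sketch. It is not SharpCompress code, and I am only assuming that a reader which trusts the header's size fields behaves roughly like this. The helper names `build_streamed_entry` and `naive_next_signature` are made up for illustration. The sketch builds an entry whose local file header has general purpose bit 3 set and zeroed CRC/size fields, then skips "compressed size" bytes the way a naive reader would.

```python
import struct
import zlib

LOCAL_HEADER_SIG = 0x04034B50
DATA_DESCRIPTOR_SIG = 0x08074B50
LOCAL_HEADER_FMT = "<IHHHHHIIIHH"  # fixed 30-byte part of a local file header (4.3.7)
USE_DATA_DESCRIPTOR = 0x0008       # general purpose bit 3 (4.4.4)


def build_streamed_entry(name: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: one deflated entry whose sizes live only in the descriptor."""
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, as stored in ZIP
    compressed = deflater.compress(payload) + deflater.flush()
    header = struct.pack(
        LOCAL_HEADER_FMT,
        LOCAL_HEADER_SIG,
        20,                   # version needed to extract
        USE_DATA_DESCRIPTOR,  # flags: sizes follow the data
        8,                    # compression method: deflate
        0, 0,                 # modification time and date
        0, 0, 0,              # CRC-32, compressed size, uncompressed size: all zero
        len(name), 0,         # file name length, extra field length
    )
    descriptor = struct.pack(
        "<IIII", DATA_DESCRIPTOR_SIG, zlib.crc32(payload), len(compressed), len(payload)
    )
    return header + name + compressed + descriptor


def naive_next_signature(archive: bytes) -> int:
    """Skip the first entry using the header's compressed size, then read 4 bytes."""
    fields = struct.unpack_from(LOCAL_HEADER_FMT, archive, 0)
    compressed_size, name_len, extra_len = fields[7], fields[9], fields[10]
    next_pos = struct.calcsize(LOCAL_HEADER_FMT) + name_len + extra_len + compressed_size
    return struct.unpack_from("<I", archive, next_pos)[0]


archive = (build_streamed_entry(b"a.txt", b"hello world " * 20)
           + build_streamed_entry(b"b.txt", b"second entry"))
# The value read here comes from the middle of the first entry, not from a header.
print(hex(naive_next_signature(archive)), "expected", hex(LOCAL_HEADER_SIG))
```

In this sketch the 4 bytes read are the start of the compressed data, because the header claims a compressed size of zero. Whatever position the real ZipReader ends up at, the effect is the same: the value it reads is not a header signature.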
I have seen a few reports about NotSupportedExceptions with the message "Unknown header: <some random number>". Although I cannot be sure what caused the NotSupportedExceptions in those cases, it is certainly possible that they were caused by the problem I explain here.
What I believe ZipReader should do:
ZipReader can check whether the "CRC-32" and "compressed size" fields are zero. If they are, and this file entry is to be skipped (instead of extracted), then ZipReader could (A) check the compression mode and/or whether a signature directly follows this file entry, which would indicate a zero-byte file. If the entry has not been identified as a zero-byte file, then (B) ZipReader can attempt to decompress the file data in memory to reach the end of the compressed data and thus the optional data descriptor of this entry or the local file header of the next archive entry.
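Step (B) could look roughly like the following Python sketch. It rests on several assumptions: it only handles deflate entries, it uses general purpose bit 3 as the trigger rather than the zero-field check from step (A), it assumes non-Zip64 descriptors (12 bytes plus an optional 4-byte signature), and it ignores encryption. `make_streamed_entry` and `skip_streamed_entry` are hypothetical names, not SharpCompress APIs. The idea is that a raw deflate stream marks its own end, so inflating it and discarding the output reveals where the compressed data stops.

```python
import struct
import zlib

LOCAL_HEADER_SIG = 0x04034B50
DATA_DESCRIPTOR_SIG = 0x08074B50
LOCAL_HEADER_FMT = "<IHHHHHIIIHH"  # fixed 30-byte part of a local file header
HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FMT)
USE_DATA_DESCRIPTOR = 0x0008       # general purpose bit 3


def make_streamed_entry(name: bytes, payload: bytes, signed: bool = True) -> bytes:
    """Hypothetical test helper: a deflated entry followed by a data descriptor."""
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
    compressed = deflater.compress(payload) + deflater.flush()
    header = struct.pack(LOCAL_HEADER_FMT, LOCAL_HEADER_SIG, 20, USE_DATA_DESCRIPTOR,
                         8, 0, 0, 0, 0, 0, len(name), 0)
    descriptor = struct.pack("<III", zlib.crc32(payload), len(compressed), len(payload))
    if signed:
        descriptor = struct.pack("<I", DATA_DESCRIPTOR_SIG) + descriptor
    return header + name + compressed + descriptor


def skip_streamed_entry(archive: bytes, offset: int) -> int:
    """Return the offset just past the entry at `offset`, data descriptor included."""
    fields = struct.unpack_from(LOCAL_HEADER_FMT, archive, offset)
    if fields[0] != LOCAL_HEADER_SIG:
        raise ValueError("not a local file header")
    flags, method, compressed_size = fields[2], fields[3], fields[7]
    data_start = offset + HEADER_SIZE + fields[9] + fields[10]
    if not flags & USE_DATA_DESCRIPTOR:
        return data_start + compressed_size  # header sizes are usable as-is
    if method != 8:
        raise NotImplementedError("this sketch only handles deflate")

    inflater = zlib.decompressobj(-15)  # raw deflate
    pos = data_start
    while not inflater.eof:
        chunk = archive[pos:pos + 4096]
        if not chunk:
            raise ValueError("truncated deflate stream")
        inflater.decompress(chunk)      # output is discarded, we only want the end
        pos += len(chunk)
    pos -= len(inflater.unused_data)    # bytes read past the end of the stream

    # The descriptor signature is optional (4.3.9.3); skip it when present.
    if struct.unpack_from("<I", archive, pos)[0] == DATA_DESCRIPTOR_SIG:
        pos += 4
    return pos + 12                     # CRC-32, compressed size, uncompressed size


first = make_streamed_entry(b"a.txt", b"hello world " * 50)
archive = first + make_streamed_entry(b"b.txt", b"another entry", signed=False)
next_offset = skip_streamed_entry(archive, 0)
print(next_offset == len(first))
```

The cost is that a skipped entry still has to be inflated once. That is the price of not knowing the compressed size up front when the archive cannot be seeked.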
Background info:
The ".ZIP File Format Specification" contains more information with regard to data descriptor structures. Especially the following chapters are worth a read:
4.3.7 Local file header
4.3.9 Data descriptor
4.4.4 General purpose bit flag, Bit 3
4.4.7 CRC-32
4.4.8 compressed size
4.4.9 uncompressed size
Link to the ".ZIP File Format Specification": https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Also pay attention to the paragraphs 4.3.5 (a data descriptor structure may be present even if general purpose bit 3 is not set), 4.3.9.3 (optional data descriptor signature 0x08074b50) and 4.3.9.6 (data descriptor and central directory encryption).
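Because the descriptor signature is optional, a reader has to accept both layouts. The Python sketch below shows one possible heuristic, not something the specification prescribes: it assumes the reader already knows how many compressed bytes it consumed (for example after reading the EntryStream to the end) and uses that count to decide which layout fits. The function name `read_data_descriptor` is hypothetical, and Zip64 descriptors with 8-byte sizes are not handled.

```python
import struct

DATA_DESCRIPTOR_SIG = 0x08074B50


def read_data_descriptor(buf: bytes, pos: int, compressed_len: int):
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed).

    `compressed_len` is the number of compressed bytes the caller actually read;
    it disambiguates a leading signature from a CRC-32 that happens to look like one.
    """
    first = struct.unpack_from("<I", buf, pos)[0]
    if first == DATA_DESCRIPTOR_SIG and len(buf) >= pos + 16:
        # Try the signed layout: signature, CRC-32, compressed size, uncompressed size.
        crc, csize, usize = struct.unpack_from("<III", buf, pos + 4)
        if csize == compressed_len:
            return crc, csize, usize, 16
    # Fall back to the unsigned layout: CRC-32, compressed size, uncompressed size.
    crc, csize, usize = struct.unpack_from("<III", buf, pos)
    if csize != compressed_len:
        raise ValueError("data descriptor does not match the compressed length")
    return crc, csize, usize, 12


signed = struct.pack("<IIII", DATA_DESCRIPTOR_SIG, 0xDEADBEEF, 5, 9)
unsigned = struct.pack("<III", 0xDEADBEEF, 5, 9)
print(read_data_descriptor(signed, 0, 5))
print(read_data_descriptor(unsigned, 0, 5))
```

With the values from the descriptor, the CRC and sizes for the entry can be reported after the data has been read, instead of the zeros from the local file header.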
Remarks
ZipArchive and its ZipArchive.Entries enumeration do not seem to be affected by this issue.
The ZIP archive I found to have file entries as described above is about 300 MB in size, which is obviously too large to upload as a sample. I will provide a small ZIP file with the described file entries as soon as I manage to produce one myself ;)