Return bounded substreams when data descriptors are used in seekable zips #355

Closed
opened 2026-01-29 22:10:34 +00:00 by claunia · 1 comment
Owner

Originally created by @DannyBoyk on GitHub (Jun 3, 2019).

At my work, we have to process quite a few zip files whose generation we have very little control over. One of the zips we are receiving uses data descriptors (meaning the local file header lengths are zeroed out) and no compression.

SeekableZipFilePart::CreateBaseStream always returns the original BaseStream, which is the entire zip file. When compression is used, the decompression algorithms can detect the end of their compressed data and stop correctly. But when no compression is used and the local file header lengths are zeroed (the case we have to handle), nothing bounds the read, so the entire zip file is returned for the ZipFilePart. So, for every entry you ask for in the zip file, you essentially get everything in the zip file that follows the entry's local file header.
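To make the failure mode concrete, here is a rough reproduction sketched in Python rather than C#. It does not touch SharpCompress at all: it relies on Python's `zipfile` module writing data descriptors when the output is not seekable, and `WriteOnlySink` is a hypothetical helper that exists only to force that behaviour. The point is just to show that the local header sizes are zero while the central directory still knows the real size.

```python
import io
import struct
import zipfile


class WriteOnlySink(io.RawIOBase):
    """Hypothetical unseekable output; zipfile then emits data descriptors."""

    def __init__(self):
        super().__init__()
        self.buf = bytearray()

    def writable(self):
        return True

    def write(self, data):
        self.buf += data
        return len(data)


# Build a stored (uncompressed) archive with two entries.
sink = WriteOnlySink()
with zipfile.ZipFile(sink, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"first entry")
    zf.writestr("b.txt", b"second entry")
raw = bytes(sink.buf)

# Parse the fixed 30-byte local file header of the first entry.
(sig, _ver, flags, method, _time, _date,
 _crc, local_csize, local_usize, name_len, extra_len) = struct.unpack(
    "<4s5H3L2H", raw[:30])
data_start = 30 + name_len + extra_len

# The central directory still records the real size of the entry.
central = zipfile.ZipFile(io.BytesIO(raw)).getinfo("a.txt")

# Without a bound, a reader positioned at the entry's data sees the rest of the file.
unbounded = raw[data_start:]
# With the central directory size as a bound, it sees only the entry.
bounded = raw[data_start:data_start + central.compress_size]
```

Here bit 3 (`0x08`) of `flags` is set and both `local_csize` and `local_usize` are zero, so a reader that trusts only the local header has no way to know where `a.txt` ends; `unbounded` runs on through `b.txt` and the central directory.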

What I believe SeekableZipFilePart should be doing is returning a bounded substream that utilizes the central directory information when the data descriptors are being used. I have coded up an implementation that does just this. It is very narrowly focused on this exact scenario.
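The idea of a bounded substream can be sketched as follows. This is a minimal Python illustration under assumptions, not the code from the PR and not SharpCompress's API: `BoundedSubStream`, `start`, and `length` are hypothetical names. In the scenario above, `start` would be the offset of the entry's data and `length` the compressed size taken from the central directory.

```python
import io


class BoundedSubStream(io.RawIOBase):
    """Read-only window of `length` bytes starting at `start` in a seekable stream."""

    def __init__(self, base, start, length):
        super().__init__()
        self._base = base
        self._start = start
        self._length = length
        self._pos = 0  # position relative to the start of the window

    def readable(self):
        return True

    def readinto(self, b):
        remaining = self._length - self._pos
        if remaining <= 0:
            return 0  # end of the window, even if the base stream has more data
        # Reposition on every read so other users of the base stream cannot
        # move this window's position.
        self._base.seek(self._start + self._pos)
        chunk = self._base.read(min(len(b), remaining))
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


# Stand-in for "local header, entry data, rest of the archive".
base = io.BytesIO(b"HEADER" + b"payload" + b"NEXT-ENTRY-AND-CENTRAL-DIRECTORY")
part = BoundedSubStream(base, start=6, length=7)
first_read = part.read()
second_read = part.read()
```

The first `read()` returns only the seven bytes inside the window and the second returns an empty result, which is the end-of-stream signal an uncompressed entry otherwise never gets.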

PR will be put up momentarily.

Author
Owner

@adamhathcock commented on GitHub (Jun 4, 2019):

Makes sense to me I believe. I think this is just a use-case I didn't try.


Reference: starred/sharpcompress#355