[PR #211] Add zip64
Original Pull Request: https://github.com/adamhathcock/sharpcompress/pull/211
State: closed
Merged: Yes
This adds zip64 writing support.
Zip64 is implemented by appending a set of "extra" values to the header.
In the zip file there are two headers for each file: a local header (before the stream) and a central header (at the end of the file).
The central header is the simple case: most implementations rely on it and mostly ignore the local header, and by the time we write the central header we have all the information required, so we can just write it.
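For reference, the "extra" record here is the zip64 extended-information field from the ZIP APPNOTE (header ID 0x0001), appended to the local and/or central header as needed. A minimal sketch of what writing one looks like, assuming a plain BinaryWriter (the helper below is illustrative, not the actual code in this PR):

```csharp
using System.IO;

static class Zip64ExtraSketch
{
    // Zip64 extended-information extra field (APPNOTE 4.5.3):
    // header ID (2) + data size (2) + uncompressed size (8) + compressed size (8) = 20 bytes.
    public static void Write(BinaryWriter writer, ulong uncompressedSize, ulong compressedSize)
    {
        writer.Write((ushort)0x0001);   // zip64 extra field header ID
        writer.Write((ushort)16);       // size of the data that follows
        writer.Write(uncompressedSize); // real 64-bit uncompressed size
        writer.Write(compressedSize);   // real 64-bit compressed size
        // The 32-bit size fields in the fixed header are then set to 0xFFFFFFFF
        // so readers know to look for the real values in this extra field.
    }
}
```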
For the local header there is a trade-off. The "extra" bytes take up 2+2+8+8 = 20 bytes per entry. This header is only required if either stream size (compressed or uncompressed) exceeds uint.MaxValue, but we do not know the length of the stream before writing. The dilemma is: do we write it for all files, in case one is too long? Or do we leave it out and risk overflowing the values?
Since the header is "mostly ignored" we could live with this being broken. On the other hand, if we can use it we should.
I have added a hint value to ZipWriteEntryOptions that can force this header on or off. I have also added a flag in ZipWriterOptions that enables the extra fields for all entries by default. If the caller does not set any flags, as I assume most will not, I use a threshold of 2GiB to toggle the zip64 headers: if the underlying stream is already 2GiB or more when an entry starts, zip64 is automatically enabled in the local header. This is not a perfect solution, but I figure most users write smaller zip files and will never notice, and larger zip files are not really impacted by 20 bytes here and there. You can of course defeat the scheme by writing a 1.9GiB file followed by a +4GiB file, hitting the limitation before the automatic upgrade kicks in.
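To make the defaulting concrete, here is a rough sketch of the decision described above (the parameter names are mine; the actual properties on ZipWriterOptions / ZipWriteEntryOptions may be named differently):

```csharp
// Illustration of the defaulting scheme, not the PR's actual code.
static bool ShouldWriteLocalZip64(bool? entryHint, bool writerDefault, long currentStreamPosition)
{
    const long Threshold = 2L * 1024 * 1024 * 1024; // 2GiB

    if (entryHint.HasValue)
        return entryHint.Value;                // per-entry hint forces it on or off
    if (writerDefault)
        return true;                           // writer-wide opt-in for all entries
    return currentStreamPosition >= Threshold; // automatic upgrade once the archive passes 2GiB
}
```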
If the stream is non-seekable, we have another issue: the writer would normally set a flag and then write the Crc, uncompressed size, and compressed size in a special trailing header. This trailing header has not been updated to use zip64, so we cannot write the correct values in it. Nor can we use both the trailing header and the "extra" data. This was clarified by PKWare: https://blogs.oracle.com/xuemingshen/entry/is_zipinput_outputstream_handling_of
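For context, the classic trailing header (the data descriptor) only has 4-byte slots for the sizes, which is exactly why it cannot carry zip64 values. Its layout, per the APPNOTE, is sketched below (again illustrative, not this PR's code):

```csharp
using System.IO;

static class DataDescriptorSketch
{
    // Classic data descriptor written after the entry data when bit 3 of the
    // general purpose flag is set (sizes/CRC not known up front).
    public static void Write(BinaryWriter writer, uint crc32, uint compressedSize, uint uncompressedSize)
    {
        writer.Write(0x08074b50u);      // optional data descriptor signature
        writer.Write(crc32);            // CRC-32 of the uncompressed data
        writer.Write(compressedSize);   // only 4 bytes - overflows past uint.MaxValue
        writer.Write(uncompressedSize); // only 4 bytes - overflows past uint.MaxValue
    }
}
```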
In the streaming case, the local headers are written with the trailing data, which may overflow. The headers do contain the crc32 value, and the sizes are correct as long as both are less than uint.MaxValue. Again, the central directory header always has the correct values.

Not sure how to deal with testing, as it requires files of +4GiB to hit the limitations.
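One option (just an idea, not something this PR implements) would be to feed the writer a synthetic read-only stream larger than uint.MaxValue instead of keeping multi-gigabyte fixtures around:

```csharp
using System;
using System.IO;

// Read-only stream that yields `length` bytes of a cheap deterministic pattern
// without allocating them, so a test can push an entry past uint.MaxValue.
sealed class SyntheticStream : Stream
{
    private readonly long length;
    private long position;

    public SyntheticStream(long length) => this.length = length;

    public override int Read(byte[] buffer, int offset, int count)
    {
        var remaining = length - position;
        if (remaining <= 0) return 0;
        var toRead = (int)Math.Min(count, remaining);
        for (var i = 0; i < toRead; i++)
            buffer[offset + i] = (byte)(position + i);
        position += toRead;
        return toRead;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => length;
    public override long Position { get => position; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

// e.g. new SyntheticStream(5L * 1024 * 1024 * 1024) gives a 5GiB entry source.
```

Such a test would still take a while to run, but it avoids storing huge files in the repository.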