[PR #211] Add zip64 #903

opened 2026-01-29 22:18:08 +00:00 by claunia

Original Pull Request: https://github.com/adamhathcock/sharpcompress/pull/211

State: closed
Merged: Yes


This adds zip64 writing support.

Zip64 is implemented by appending a set of "extra" values to the header.
In a zip file there are two headers for each entry: the local header (written before the stream) and the central directory header (written at the end of the file).

The central header is simple enough, and most implementations simply use it and mostly ignore the local header. By the time we are writing the central header, we have all the information required, so we can just write it.
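For illustration, the central-header decision is roughly the following (a sketch only, not the actual writer code): a size that fits stays in the normal 32-bit field; a size that does not fit is replaced by the `0xFFFFFFFF` sentinel the zip spec defines, and the real 64-bit value is queued for the zip64 extra field.

```csharp
using System.Collections.Generic;

static class CentralHeaderSizes
{
    // Illustrative sketch: when the central directory is written, the real
    // sizes are already known, so the decision is purely local. A value that
    // fits goes into the 32-bit field; otherwise the field gets the
    // 0xFFFFFFFF sentinel and the 64-bit value moves to the zip64 extra field.
    public static uint Mask(ulong actual, List<ulong> zip64Payload)
    {
        if (actual < uint.MaxValue)
        {
            return (uint)actual;        // fits: write it directly
        }
        zip64Payload.Add(actual);       // overflows: queue for the extra field
        return uint.MaxValue;           // sentinel: readers consult the extra field
    }
}
```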

For the local header, there is a tradeoff. The "extra" bytes take up 2+2+8+8=20 bytes per entry. This header is only required if either stream size (compressed or uncompressed) exceeds `uint.MaxValue`, but we do not know the length of the stream before writing.
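The 20 bytes break down as a 2-byte header ID (`0x0001`, the zip64 extended information field), a 2-byte payload length, and two 8-byte sizes. Writing that record looks roughly like this (an illustrative sketch, not the PR's actual code; `BinaryWriter` is little-endian, which matches the zip format):

```csharp
using System.IO;

static class Zip64LocalExtra
{
    // Writes the 20-byte zip64 extra record used in the local header:
    // 2 bytes id + 2 bytes payload length + 8 + 8 bytes of sizes.
    public static void Write(BinaryWriter w, ulong uncompressedSize, ulong compressedSize)
    {
        w.Write((ushort)0x0001);   // zip64 extended information header id
        w.Write((ushort)16);       // payload length: two 8-byte sizes
        w.Write(uncompressedSize); // original size
        w.Write(compressedSize);   // compressed size
    }
}
```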

The dilemma is: do we write it for all files, in case one is too long? Or do we not write it and risk overflowing the values?

Since the header is "mostly ignored" we could live with this being broken. On the other hand, if we can use it we should.

I have added a hint value to the ZipWriteEntryOptions that can force this header on or off. I have also added a flag in ZipWriterOptions that enables the extra fields for all entries by default. If the caller does not set any flags, as I assume most will not, I use a threshold of 2GiB to toggle zip64 headers: if the underlying stream is already 2GiB or more, zip64 is automatically enabled in the local header.
This is not a perfect solution, but I figure most users write smaller zip files and will never notice. Larger zip files are not really impacted by 20 bytes here and there. You can of course defeat the scheme by writing a 1.9GiB file and then a 4GiB+ file, thus hitting the limitations before the automatic upgrade kicks in.
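Usage would look roughly like this. Treat the type and member names here (`ZipWriterEntryOptions`, `UseZip64`, `EnableZip64`) as indicative only; check the diff for the real API:

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Zip;

// Archive-wide default: force the zip64 extra field for every entry
// (flag name assumed; without it, the 2GiB heuristic applies).
var options = new ZipWriterOptions(CompressionType.Deflate)
{
    UseZip64 = true
};

using var destination = File.Create("big.zip");
using var writer = new ZipWriter(destination, options);
using var source = File.OpenRead("huge.bin");

// Per-entry hint overriding the archive-wide default (names assumed):
writer.Write("huge.bin", source, new ZipWriterEntryOptions { EnableZip64 = true });
```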

If the stream is non-seekable, we have another issue: the writer would normally set a flag and then write the Crc, uncompressed size, and compressed size in a special trailing header. This trailing header has not been updated to use zip64, so we cannot write the correct values in it. We also cannot use both trailing headers and "extra" data. This was clarified by PKWARE: https://blogs.oracle.com/xuemingshen/entry/is_zipinput_outputstream_handling_of
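For reference, the classic trailing header (the data descriptor) looks like this; the sketch shows why its 32-bit size fields silently truncate for 4GiB+ entries (illustrative layout per the zip spec, not the actual writer code):

```csharp
using System.IO;

static class DataDescriptor
{
    // The classic, non-zip64 trailing "data descriptor" written after the
    // stream data when the output is non-seekable.
    public static void Write(BinaryWriter w, uint crc32, ulong compressed, ulong uncompressed)
    {
        w.Write(0x08074b50u);        // optional descriptor signature
        w.Write(crc32);              // crc-32 is always correct
        w.Write((uint)compressed);   // truncates if the size is 4GiB or more
        w.Write((uint)uncompressed); // truncates if the size is 4GiB or more
    }
}
```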

In the case of streaming, the local headers are written with the trailing data, which may overflow. But the trailing headers do contain the crc32 value, and they contain correct sizes as long as both are less than `uint.MaxValue`. Again, the central directory header has the correct values.

I am not sure how to deal with testing, as it requires files larger than 4GiB to hit the limitations.
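One possible approach (just a sketch of an idea, not something included in this PR): feed the writer a synthetic stream, so no 4GiB fixture has to exist on disk. Zeros deflate to almost nothing, so the output archive stays small while the uncompressed size still crosses the 32-bit limit:

```csharp
using System;
using System.IO;

// A read-only stream of zeros with a chosen logical length, e.g.
// new ZeroStream(5L << 30) for 5GiB, usable as a zip entry source in tests.
sealed class ZeroStream : Stream
{
    private long _remaining;
    public ZeroStream(long length) => _remaining = length;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = (int)Math.Min(count, _remaining);
        Array.Clear(buffer, offset, n); // hand back zeros
        _remaining -= n;
        return n;                       // 0 signals end of stream
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

The test would still spend time pushing 4GiB+ through the compressor, but it avoids any large files on disk and exercises the overflow and automatic-upgrade paths deterministically.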

claunia added the pull-request label 2026-01-29 22:18:08 +00:00

Reference: starred/sharpcompress#903