[PR #1038] Add SOZip (Seek-Optimized ZIP) support #1461

Open
opened 2026-01-29 22:20:41 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1038
Author: @Copilot
Created: 11/26/2025
Status: 🔄 Open

Base: master ← Head: copilot/add-so-optimized-zip-support


📝 Commits (10+)

  • 0ac6b46 Initial plan
  • ac4bcd0 Add SOZip index data structure and basic tests
  • a350899 Add SOZip detection in ZipEntry and additional tests
  • d9c9612 Update documentation for SOZip support
  • 8c6d914 reader tests don't pass or make sense
  • 7339567 Fix SOZip tests to work correctly with ZipReader and ZipArchive
  • 9058645 sozip writing and validation
  • 0dc6322 Merge master branch and resolve FORMATS.md conflict
  • 130e169 Merge remote-tracking branch 'origin/master' into copilot/add-so-optimized-zip-support
  • b3ce90a Remove foo.zip and add Zip.sozip.zip test archive with tests

📊 Changes

13 files changed (+1354 additions, -7 deletions)


📝 FORMATS.md (+1 -1)
📝 src/SharpCompress/Common/Zip/Headers/LocalEntryHeaderExtraFactory.cs (+43 -0)
➕ src/SharpCompress/Common/Zip/SOZip/SOZipDeflateStream.cs (+150 -0)
➕ src/SharpCompress/Common/Zip/SOZip/SOZipIndex.cs (+367 -0)
📝 src/SharpCompress/Common/Zip/ZipEntry.cs (+22 -1)
📝 src/SharpCompress/Writers/Zip/ZipCentralDirectoryEntry.cs (+1 -0)
📝 src/SharpCompress/Writers/Zip/ZipWriter.cs (+104 -2)
📝 src/SharpCompress/Writers/Zip/ZipWriterEntryOptions.cs (+7 -0)
📝 src/SharpCompress/Writers/Zip/ZipWriterOptions.cs (+27 -0)
📝 tests/SharpCompress.Test/TestBase.cs (+17 -3)
➕ tests/SharpCompress.Test/Zip/SOZipReaderTests.cs (+257 -0)
➕ tests/SharpCompress.Test/Zip/SoZipWriterTests.cs (+358 -0)
➕ tests/TestArchives/Archives/Zip.sozip.zip (+0 -0)

📄 Description

Adds support for SOZip (Seek-Optimized ZIP), a ZIP profile that enables random access within DEFLATE-compressed files by storing sync flush point offsets in hidden index files.
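To make the flush-point idea concrete, here is a minimal Python sketch (not SharpCompress code; the PR's implementation is C#). It assumes a 32 KiB chunk size and assumes chunks are made independent with zlib's `Z_FULL_FLUSH`, which byte-aligns the output and also resets the compression history. A plain sync flush byte-aligns but keeps the history, so a decoder could not start there on its own. The helper names `compress_with_flush_points` and `read_at` are hypothetical. Recording the compressed offset at each chunk boundary is what lets a reader start inflating mid-stream:

```python
import zlib

CHUNK_SIZE = 32 * 1024  # assumed chunk size (uncompressed bytes per chunk)


def compress_with_flush_points(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Raw-Deflate `data`, forcing a full flush after every chunk.

    Returns (compressed, offsets), where offsets[i] is the byte offset in
    `compressed` at which uncompressed chunk i starts.
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # wbits=-15: raw Deflate, as stored in ZIP
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # Z_FULL_FLUSH emits all pending output, pads to a byte boundary and
        # resets the history window, so decoding can restart at the next byte.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # Z_FINISH: write the final block
    return bytes(out), offsets


def read_at(compressed: bytes, offsets, pos: int, length: int,
            chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` uncompressed bytes starting at `pos` without
    inflating anything before the chunk that contains `pos`."""
    chunk = pos // chunk_size
    skip = pos - chunk * chunk_size
    decomp = zlib.decompressobj(-15)
    # max_length caps the output: inflate only up to the requested range.
    out = decomp.decompress(compressed[offsets[chunk]:], skip + length)
    return out[skip:skip + length]


data = bytes(range(256)) * 1000  # 256,000 bytes -> 8 chunks
blob, offsets = compress_with_flush_points(data)
print(read_at(blob, offsets, 100_000, 8) == data[100_000:100_008])  # True
```

Because the flush points are ordinary Deflate blocks, a reader that ignores the offsets can still inflate the whole stream from the start (for example with `zlib.decompress(blob, -15)`). This is the backward-compatibility property SOZip relies on.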

Changes Made

Core SOZip Implementation

  • Added SOZipIndex class to read/write index files containing chunk offsets, sizes, and metadata
  • Added SOZipExtraField class to parse the SOZip extra field (0x564B) that links entries to their index files
  • Added SOZipDeflateStream for Deflate compression that inserts sync flush points at chunk boundaries
  • Added the SOZip extra data type (0x564B) to the ExtraDataType enum
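The `SOZipIndex` bullet describes an index holding chunk offsets, sizes and metadata. The Python sketch below serializes and parses such an index under an explicit assumption: a 32-byte little-endian header (version, skip_bytes, chunk_size, offset_size, uncompressed size, compressed size) followed by one uint64 offset per chunk after the first, which is how SOZip spec 0.5.0 is read here. Check the specification before relying on this layout. The PR's own `SOZipIndex` may differ (its tests mention validating magic bytes, which this layout has no field for), and `write_index`/`read_index` are hypothetical names.

```python
import struct

# Assumed header layout (verify against the SOZip specification):
# uint32 version, uint32 skip_bytes, uint32 chunk_size, uint32 offset_size,
# uint64 uncompressed_size, uint64 compressed_size; little-endian, 32 bytes.
HEADER = struct.Struct("<IIIIQQ")


def write_index(chunk_size: int, uncompressed_size: int,
                compressed_size: int, offsets: list) -> bytes:
    """Serialize an index. offsets[0] is always 0 and is implied, not stored."""
    header = HEADER.pack(1, 0, chunk_size, 8, uncompressed_size, compressed_size)
    return header + b"".join(struct.pack("<Q", off) for off in offsets[1:])


def read_index(blob: bytes):
    """Parse an index; returns (chunk_size, uncompressed_size,
    compressed_size, offsets) with the implied leading 0 restored."""
    version, skip_bytes, chunk_size, offset_size, usize, csize = HEADER.unpack_from(blob, 0)
    if version != 1 or offset_size != 8 or chunk_size == 0:
        raise ValueError("unsupported or corrupt index header")
    # One stored offset per chunk after the first.
    count = (usize - 1) // chunk_size if usize > 0 else 0
    stored = struct.unpack_from(f"<{count}Q", blob, HEADER.size + skip_bytes)
    return chunk_size, usize, csize, [0, *stored]


idx = write_index(32768, 100_000, 4_321, [0, 900, 1_850, 2_790])
print(len(idx), read_index(idx))
```

Leaving the first offset implicit keeps the index size proportional to the number of restart points that actually need recording.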

ZipEntry Extensions

  • Added IsSozip property to detect entries that carry the SOZip extra field
  • Added IsSozipIndexFile property to identify hidden index files by name pattern (.filename.sozip.idx)
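The following Python sketch shows the two checks these properties describe: scanning an entry's extra-field block for header ID 0x564B, and matching the hidden `.filename.sozip.idx` name stored next to the data file. The extra-field record layout (2-byte header ID, 2-byte data size, little-endian) comes from the ZIP APPNOTE. The helper names are hypothetical, and SharpCompress's `IsSozip`/`IsSozipIndexFile` may apply different or additional rules.

```python
import posixpath
import struct

SOZIP_EXTRA_ID = 0x564B  # extra-field header ID given in the PR


def has_sozip_extra_field(extra: bytes) -> bool:
    """Scan a ZIP extra-field block for the SOZip header ID.

    Each record is: 2-byte header ID, 2-byte data size, then `size` data
    bytes (all little-endian), per the ZIP APPNOTE.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        if header_id == SOZIP_EXTRA_ID:
            return True
        pos += 4 + size  # skip to the next record
    return False


def index_name_for(entry_name: str) -> str:
    """'dir/file.gpkg' -> 'dir/.file.gpkg.sozip.idx' (hidden file, same directory)."""
    directory, base = posixpath.split(entry_name)
    return posixpath.join(directory, f".{base}.sozip.idx")


def is_sozip_index_name(entry_name: str) -> bool:
    """True for names shaped like '.filename.sozip.idx' with a non-empty filename."""
    base = posixpath.basename(entry_name)
    suffix = ".sozip.idx"
    return base.startswith(".") and base.endswith(suffix) and len(base) > len(suffix) + 1
```

ZIP entry names always use forward slashes, which is why `posixpath` is used instead of `os.path`.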

Writer Options

  • Added EnableSOZip, SOZipChunkSize, and SOZipMinFileSize configuration options on ZipWriterOptions
  • Added per-entry EnableSOZip override on ZipWriterEntryOptions
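How the archive-wide and per-entry settings combine is not spelled out above. One plausible reading is sketched below in Python with hypothetical names and illustrative values (not SharpCompress's actual defaults). It assumes that a per-entry value, when set, overrides `EnableSOZip`, that SOZip applies only to Deflate entries, and that entries smaller than `SOZipMinFileSize` are written as plain ZIP entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SozipWriterSettings:
    """Hypothetical stand-in for the SOZip settings on ZipWriterOptions."""
    enable_sozip: bool = False               # EnableSOZip
    sozip_chunk_size: int = 32 * 1024        # SOZipChunkSize (illustrative; not used below)
    sozip_min_file_size: int = 1024 * 1024   # SOZipMinFileSize (illustrative value)


def should_write_sozip(settings: SozipWriterSettings,
                       entry_enable: Optional[bool],
                       entry_size: int,
                       is_deflate: bool) -> bool:
    # Assumption: a per-entry value (ZipWriterEntryOptions.EnableSOZip) wins
    # when present; otherwise fall back to the archive-wide setting.
    enabled = settings.enable_sozip if entry_enable is None else entry_enable
    # Assumption: only Deflate entries at or above the size threshold qualify.
    return enabled and is_deflate and entry_size >= settings.sozip_min_file_size
```

The size check assumes the writer knows the entry size before compressing. A streaming writer that does not know it would need a different rule.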

Test Archive and Tests

  • Created the Zip.sozip.zip test archive with a valid SOZip structure (a data file plus its index file containing chunk offsets)
  • Added tests for SOZip reading with both ZipReader and ZipArchive
  • Added tests for SOZipIndex serialization/deserialization
  • Added round-trip compression and decompression tests
  • Removed the old foo.zip test archive

Documentation

  • Updated FORMATS.md to document SOZip detection support for reading

Testing

  • All 19 SOZip tests passing
  • SOZip archive creation and reading validated with ZipReader and ZipArchive
  • Index file structure validated (magic bytes, chunk offsets, metadata)
  • Data integrity verified for compressed and decompressed content
  • Backward compatibility maintained (regular ZIP files work as expected)
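The index checks listed above can be approximated with a consistency test like the following Python sketch (a hypothetical helper, not the PR's test code). Assuming `offsets[i]` is the compressed offset of chunk i with `offsets[0] == 0`, it verifies the offset count, ordering and bounds. It then confirms that inflating from each offset reproduces exactly that chunk of the original data.

```python
import zlib


def validate_index(compressed: bytes, original: bytes,
                   chunk_size: int, offsets: list) -> None:
    """Raise ValueError if `offsets` are not valid restart points for `compressed`."""
    expected = -(-len(original) // chunk_size)  # ceiling division: number of chunks
    if len(offsets) != expected:
        raise ValueError(f"expected {expected} offsets, got {len(offsets)}")
    if offsets and offsets[0] != 0:
        raise ValueError("first chunk must start at offset 0")
    for prev, cur in zip(offsets, offsets[1:]):
        if not prev < cur < len(compressed):
            raise ValueError(f"offset {cur} is out of order or out of range")
    for i, off in enumerate(offsets):
        # Each offset must be a restart point: inflate one chunk from there.
        try:
            got = zlib.decompressobj(-15).decompress(compressed[off:], chunk_size)
        except zlib.error as exc:
            raise ValueError(f"chunk {i} does not decode at offset {off}") from exc
        if got != original[i * chunk_size:(i + 1) * chunk_size]:
            raise ValueError(f"chunk {i} does not decode at offset {off}")
```

Decoding each chunk in isolation is the property that matters to a seeking reader. Checking only the offset values would not catch an index that points at positions without a full flush.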

Notes

Reading and detection of SOZip archives are fully implemented. Writer configuration options are in place, and writing SOZip archives with sync flush compression is implemented and available for testing.

Original prompt

This section details the original issue to resolve.

Issue title: Seek Optimized Zip
Issue description: Is there plans to support reading and writing of SO SEEK OPTIMIZED ZIP ARCHIVES?

Comments on the issue (you are @copilot in this section)

@adamhathcock: What is this? A new Zip style/feature?

PRs are welcome.

@adamhathcock:

Version

  • Version: 0.5.0
  • Date: 2023-Jan-06

License

This specification document is (C) 2022-2023 Even Rouault and licensed under the CC-BY-4.0 terms.

Note: the scope of the copyrighted material does, of course, not extend onto any source or binary code derived from the specification.

What is SOZip?

A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or several Deflate-compressed files that are organized and annotated such that a SOZip-aware reader can perform very fast random access (seek) within a compressed file.

SOZip makes it possible to access large compressed files directly from a .zip file without prior decompression. It is not a new file format, but a profile of the existing ZIP format, done in a fully backward compatible way. ZIP readers that are not SOZip-aware can read a SOZip-enabled file normally and ignore the extended features that support efficient seek capability.

Use cases

This specification is intended to be general purpose / not domain specific.

SOZip was first developed to serve geospatial use cases, which commonly have large compressed files inside of ZIP archives. In particular, it makes it possible for users to read large Geographic Information Systems (GIS) files using the Shapefile, GeoPackage or FlatGeobuf formats (which have no native provision for compression) compressed in .zip files without prior decompression.

Efficient random access and selective decompression are a requirement to provide acceptable performance in many usage scenarios: spatial index filtering, access to a feature by its identifier, etc.

High-level specification

The SOZip optimization relies on two independent and combined mechanisms:

  • The first mechanism is the generation of a [Deflate](htt...



🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Fixes adamhathcock/sharpcompress#719
claunia added the pull-request label 2026-01-29 22:20:41 +00:00
1 participant
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/sharpcompress#1461