- Each 7Zip file now gets a fresh decompression stream instead of reusing shared streams - Added SyncOnlyStream wrapper to force async operations to use synchronous equivalents - This avoids LZMA decoder state corruption bugs in async operations - Performance trade-off: slower async extraction for 7Zip but correctness guaranteed - Documented behavior and performance implications in FORMATS.md All 14 ExtractAll tests now pass (both async and sync) Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
6.0 KiB
Formats
Accessing Archives
- Archive classes allow random access to a seekable stream.
- Reader classes allow forward-only reading on a stream.
- Writer classes allow forward-only Writing on a stream.
Supported Format Table
| Archive Format | Compression Format(s) | Compress/Decompress | Archive API | Reader API | Writer API |
|---|---|---|---|---|---|
| Rar | Rar | Decompress (1) | RarArchive | RarReader | N/A |
| Zip (2) | None, Shrink, Reduce, Implode, DEFLATE, Deflate64, BZip2, LZMA/LZMA2, PPMd | Both | ZipArchive | ZipReader | ZipWriter |
| Tar | None | Both | TarArchive | TarReader | TarWriter (3) |
| Tar.GZip | DEFLATE | Both | TarArchive | TarReader | TarWriter (3) |
| Tar.BZip2 | BZip2 | Both | TarArchive | TarReader | TarWriter (3) |
| Tar.Zstandard | ZStandard | Decompress | TarArchive | TarReader | N/A |
| Tar.LZip | LZMA | Both | TarArchive | TarReader | TarWriter (3) |
| Tar.XZ | LZMA2 | Decompress | TarArchive | TarReader | TarWriter (3) |
| GZip (single file) | DEFLATE | Both | GZipArchive | GZipReader | GZipWriter |
| 7Zip (4) | LZMA, LZMA2, BZip2, PPMd, BCJ, BCJ2, Deflate | Decompress | SevenZipArchive | N/A | N/A |
- SOLID Rars are only supported in the RarReader API.
- Zip format supports pkware and WinzipAES encryption. However, encrypted LZMA is not supported. Zip64 reading/writing is supported but only with seekable streams as the Zip spec doesn't support Zip64 data in post data descriptors. Deflate64 is only supported for reading. See Zip Format Notes for details on multi-volume archives and streaming behavior.
- The Tar format requires a file size in the header. If no size is specified to the TarWriter and the stream is not seekable, then an exception will be thrown.
- The 7Zip format doesn't allow for reading as a forward-only stream so 7Zip is only supported through the Archive API. See 7Zip Format Notes for details on async extraction behavior.
- LZip has no support for extra data like the file name or timestamp. There is a default filename used when looking at the entry Key on the archive.
Zip Format Notes
- Multi-volume/split ZIP archives require ZipArchive (seekable streams) as ZipReader cannot seek across volume files.
- ZipReader processes entries from LocalEntry headers (which include directory entries ending with
/) and intentionally skips DirectoryEntry headers from the central directory, as they are redundant in streaming mode - all entry data comes from LocalEntry headers which ZipReader has already processed.
7Zip Format Notes
-
Async Extraction Performance: When using async extraction methods (e.g.,
ExtractAllEntries()withMoveToNextEntryAsync()), each file creates its own decompression stream to avoid state corruption in the LZMA decoder. This is less efficient than synchronous extraction, which can reuse a single decompression stream for multiple files in the same folder.Performance Impact: For archives with many small files in the same compression folder, async extraction will be slower than synchronous extraction because it must:
- Create a new LZMA decoder for each file
- Skip through the decompressed data to reach each file's starting position
Recommendation: For best performance with 7Zip archives, use synchronous extraction methods (
MoveToNextEntry()andWriteEntryToDirectory()) when possible. Use async methods only when you need to avoid blocking the thread (e.g., in UI applications or async-only contexts).Technical Details: 7Zip archives group files into "folders" (compression units), where all files in a folder share one continuous LZMA-compressed stream. The LZMA decoder maintains internal state (dictionary window, decoder positions) that assumes sequential, non-interruptible processing. Async operations can yield control during awaits, which would corrupt this shared state. To avoid this, async extraction creates a fresh decoder stream for each file.
Compression Streams
For those who want to directly compress/decompress bits. The single file formats are represented here as well. However, BZip2, LZip and XZ have no metadata (GZip has a little) so using them without something like a Tar file makes little sense.
| Compressor | Compress/Decompress |
|---|---|
| BZip2Stream | Both |
| GZipStream | Both |
| DeflateStream | Both |
| Deflate64Stream | Decompress |
| LZMAStream | Both |
| PPMdStream | Both |
| ADCStream | Decompress |
| LZipStream | Both |
| XZStream | Decompress |
| ZStandardStream | Decompress |
Archive Formats vs Compression
Sometimes the terminology gets mixed.
Compression
DEFLATE, LZMA are pure compression algorithms
Formats
Formats like Zip, 7Zip, Rar are archive formats only. They use other compression methods (e.g. DEFLATE, LZMA, etc.) or propriatory (e.g RAR)
Overlap
GZip, BZip2 and LZip are single file archival formats. The overlap in the API happens because Tar uses the single file formats as "compression" methods and the API tries to hide this a bit.