[PR #1163] [WIP] Improve I/O operations while reading big solid archives #1609
Original Pull Request: https://github.com/adamhathcock/sharpcompress/pull/1163
State: closed
Merged: No
This PR is a tentative effort to improve extraction time for 7z archives, especially ones compressed as a single solid block with a 16MB dictionary.
The main ideas are to reduce the number of cache accesses (while doing Skip), improve performance when writing files to disk (use a 1MB buffer by default), and aggressively inline code paths that are reused across executions (avoiding jumps in the code in favor of more linear execution, at the expense of code size).
Feel free to test it and report back. In my own case, I can finally see the extraction of files from the archive mentioned in #1105.
Below is a summary generated by Claude Sonnet 4.5.
Performance Optimization Summary for 7Zip Solid Archive Extraction
Problem
7Zip extraction with large solid archives (16MB dictionaries, 1-block compression) was extremely slow in version 0.42.0+ compared to 0.41.0, taking hours instead of minutes on high-end hardware.
Root Causes Identified Through Profiling
Optimizations Implemented
1. Skip Operation Optimization
File: StreamExtensions.cs
- Changed skipping from ReadOnlySubStream + CopyTo(Stream.Null) (byte-by-byte) to buffered reading
- BufferedSubStream now skips via an internal method
File: BufferedSubStream.cs
- Added a SkipInternal() method with a 1MB buffer for efficient skipping
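For illustration, here is a minimal sketch of the buffered-skip idea, assuming a pooled 1MB buffer as described above; the names (SkipByReading, SkipBufferSize) are hypothetical and not the actual SharpCompress code:

```csharp
using System;
using System.Buffers;
using System.IO;

internal static class SkipSketch
{
    // Hypothetical illustration of the SkipInternal() approach described above:
    // instead of draining the stream byte by byte, read into a pooled 1MB buffer
    // until the requested number of bytes has been consumed.
    private const int SkipBufferSize = 1 << 20; // 1MB, as stated in the summary

    public static void SkipByReading(Stream source, long bytesToSkip)
    {
        if (source.CanSeek)
        {
            // Seekable streams can skip without reading at all.
            source.Seek(bytesToSkip, SeekOrigin.Current);
            return;
        }

        byte[] buffer = ArrayPool<byte>.Shared.Rent(SkipBufferSize);
        try
        {
            while (bytesToSkip > 0)
            {
                int chunk = (int)Math.Min(buffer.Length, bytesToSkip);
                int read = source.Read(buffer, 0, chunk);
                if (read == 0)
                {
                    break; // end of stream reached before the skip completed
                }
                bytesToSkip -= read;
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Each loop iteration consumes up to 1MB, so skipping a large solid block touches the underlying cache far less often than a byte-by-byte copy to Stream.Null.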
2. Increased I/O Buffer Sizes (80KB → 1MB)
Files Modified:
Rationale: Modern NVMe Gen 5 drives benefit significantly from larger sequential I/O operations
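As a hedged sketch of what the larger copy buffer looks like in practice (the exact call sites changed by the PR are not listed here; the helper below is illustrative only):

```csharp
using System.IO;

internal static class CopyBufferSketch
{
    // Sketch only: the PR raises I/O buffer sizes from roughly 80KB to 1MB.
    // Stream.CopyTo accepts an explicit buffer size, so the change amounts to
    // passing a larger value at the copy call sites.
    private const int DefaultCopyBufferSize = 81920; // ~80KB, the long-standing .NET default
    private const int LargeCopyBufferSize = 1 << 20; // 1MB, better suited to fast sequential I/O

    public static void CopyEntry(Stream decompressed, Stream destination) =>
        decompressed.CopyTo(destination, LargeCopyBufferSize);
}
```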
3. FileStream Optimization
File: IArchiveEntryExtensions.cs
- Replaced File.Open() with an explicit FileStream constructor
- useAsync: true for async operations to enable overlapped I/O on Windows
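A hedged sketch of the kind of FileStream construction described; the buffer size and share mode here are illustrative assumptions, not necessarily the exact values used in the PR:

```csharp
using System.IO;

internal static class OutputFileSketch
{
    // Illustration of replacing File.Open() with an explicit FileStream
    // constructor: a large write buffer plus useAsync: true so async writes
    // can use overlapped I/O on Windows. The 1MB buffer size is assumed.
    public static FileStream CreateOutputFile(string destinationPath) =>
        new FileStream(
            destinationPath,
            FileMode.Create,
            FileAccess.Write,
            FileShare.None,
            bufferSize: 1 << 20, // 1MB
            useAsync: true);
}
```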
4. Path Processing Optimization
File: ExtractionMethods.cs
- Cached entry.Key to avoid multiple property accesses
- Used Path.Combine(folder, file) instead of separate operations
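A small sketch of the path-handling idea, assuming the entry key is read once and split once; the helper and parameter names below are hypothetical:

```csharp
using System.IO;

internal static class ExtractionPathSketch
{
    // Illustration only: read the entry key once into a local (instead of
    // re-reading the entry.Key property), split it a single time, and build
    // the destination with one Path.Combine call.
    public static string GetDestinationPath(string destinationDirectory, string entryKey)
    {
        string key = entryKey; // stands in for the cached entry.Key
        string folder = Path.GetDirectoryName(key) ?? string.Empty;
        string file = Path.GetFileName(key);
        return Path.Combine(destinationDirectory, folder, file);
    }
}
```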
5. LZMA Decompression Micro-optimizations
Files Modified:
Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to hot-path methods:
- RangeCoder.Decoder: GetThreshold, Decode, DecodeBit, DecodeDirectBits
- BitDecoder.Decode (called millions of times)
- LzmaDecoder.LenDecoder.Decode
- LzmaDecoder.Decoder2: DecodeNormal, DecodeWithMatchByte
- LzmaDecoder.LiteralDecoder: GetState, DecodeNormal, DecodeWithMatchByte
Rationale: These methods are in tight decompression loops; inlining reduces call overhead and enables cross-method JIT optimizations.
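For context, this is what the attribute looks like on a hot-path method; the struct below is a generic stand-in, not the actual RangeCoder/LZMA decoder code:

```csharp
using System.Runtime.CompilerServices;

internal struct RangeDecoderSketch
{
    private uint _range;
    private uint _code;

    public RangeDecoderSketch(uint range, uint code)
    {
        _range = range;
        _code = code;
    }

    // A tiny method called in a tight loop is marked for aggressive inlining
    // so the JIT substitutes its body at each call site instead of emitting a
    // call. The arithmetic here is illustrative, not the real decoder logic.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public uint GetThresholdSketch(uint total) => _code / (_range / total);
}
```

Note that AggressiveInlining is a hint rather than a guarantee; the JIT can still decline, which is why the change targets small, tight-loop methods.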
Performance Results
Technical Notes
- Used ArrayPool&lt;byte&gt; to minimize GC pressure
Testing
Tested with a 700MB+ solid 7zip archive with a 16MB dictionary on an AMD 9800X3D with an NVMe Gen 5 drive. Extraction time improved from hours to minutes, matching the performance of version 0.41.0.