[PR #1163] [CLOSED] [WIP] Improve I/O operations while reading big solid archives #1604

Open
opened 2026-01-29 22:21:21 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1163
Author: @julianxhokaxhiu
Created: 1/27/2026
Status: Closed

Base: master ← Head: fix/lzma-slow


📝 Commits (1)

  • bae32c4 Improve skip call CPU hotpath

📊 Changes

8 files changed (+191 additions, -50 deletions)


📝 src/SharpCompress/Archives/IArchiveEntryExtensions.cs (+18 -3)
📝 src/SharpCompress/Common/ExtractionMethods.cs (+58 -42)
📝 src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs (+7 -0)
📝 src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoder.cs (+4 -0)
📝 src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBit.cs (+5 -0)
📝 src/SharpCompress/IO/BufferedSubStream.cs (+64 -0)
📝 src/SharpCompress/Polyfills/StreamExtensions.cs (+28 -2)
📝 src/SharpCompress/Readers/AbstractReader.cs (+7 -3)

📄 Description

This PR is a tentative effort to improve extraction time for 7z archives, especially ones compressed as a single solid block with a 16MB dictionary.

The main idea is to reduce the number of cache accesses (while doing Skip), improve performance when writing files to disk (using a 1MB buffer by default), and aggressively inline code paths that are reused across executions (avoiding jumps in favor of more linear execution, at the expense of code size).

Feel free to test it and report back. In my own case, I can finally see files being extracted from the archive mentioned in #1105.

Below is Claude Sonnet 4.5's summary.


Performance Optimization Summary for 7Zip Solid Archive Extraction

Problem

7Zip extraction with large solid archives (16MB dictionaries, 1-block compression) was extremely slow in version 0.42.0+ compared to 0.41.0, taking hours instead of minutes on high-end hardware.

Root Causes Identified Through Profiling

  1. StreamExtensions.Skip() consuming 94.54% CPU - byte-by-byte reading via CopyTo(Stream.Null)
  2. Excessive Path.GetFullPath() calls - Called 3+ times per extracted file
  3. Small I/O buffers - 80KB buffers insufficient for modern NVMe drives
  4. Method call overhead - Hot-path LZMA decompression methods not inlined

Optimizations Implemented

1. Skip Operation Optimization

File: StreamExtensions.cs

  • Changed from ReadOnlySubStream + CopyTo(Stream.Null) (byte-by-byte) to buffered reading
  • Increased buffer size from implicit default to 1MB using ArrayPool
  • Added fast path for BufferedSubStream to skip via internal method
  • Impact: Reduced Skip CPU usage from 94.54% → 82.24% → negligible

File: BufferedSubStream.cs

  • Added SkipInternal() method with 1MB buffer for efficient skipping
  • Skips cached data instantly, uses large buffered reads for remainder
  • Impact: Eliminates repeated RefillCache() calls when skipping large amounts of data

2. Increased I/O Buffer Sizes (80KB → 1MB)

Files Modified:

  • IArchiveEntryExtensions.cs - BufferSize constant
  • AbstractReader.cs - CopyTo/CopyToAsync calls

Rationale: Modern NVMe Gen 5 drives benefit significantly from larger sequential I/O operations

  • Impact: Better disk I/O throughput, reduced system calls

3. FileStream Optimization

File: IArchiveEntryExtensions.cs

  • Replaced File.Open() with explicit FileStream constructor
  • Specified 1MB buffer size explicitly
  • Added useAsync: true for async operations to enable overlapped I/O on Windows
  • Impact: Better async I/O performance, reduced context switching
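The FileStream change can be sketched as follows; `OutputStreamFactory` and the destination path are illustrative names, but the constructor parameters (explicit 1MB `bufferSize`, `useAsync: true`) are the ones the PR describes.

```csharp
using System.IO;

static class OutputStreamFactory
{
    private const int WriteBufferSize = 1 << 20; // 1MB, per the PR's default

    // Explicit FileStream construction instead of File.Open(), so the buffer
    // size and async mode are under our control.
    public static FileStream Create(string path) =>
        new FileStream(
            path,
            FileMode.Create,
            FileAccess.Write,
            FileShare.None,
            bufferSize: WriteBufferSize,
            useAsync: true); // enables overlapped I/O on Windows
}
```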

4. Path Processing Optimization

File: ExtractionMethods.cs

  • Cached entry.Key to avoid multiple property accesses
  • Reduced Path.GetFullPath() calls from 3 per file to 1 by consolidating path operations
  • Combined Path.Combine calls: Combine(folder, file) instead of separate operations
  • Moved security validation before filesystem calls to avoid unnecessary I/O
  • Impact: WriteEntryToDirectory CPU reduced from 84.95% → 31.74% (63% reduction)
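The consolidated path handling can be sketched like this; the helper name and the simplified traversal check are illustrative, not the actual `ExtractionMethods` code.

```csharp
using System;
using System.IO;

static class PathResolutionSketch
{
    // Resolves the destination once and validates it before any filesystem I/O.
    public static string ResolveDestination(string destinationDirectory, string entryKey)
    {
        var fullDestDir = Path.GetFullPath(destinationDirectory);

        // One Combine + one GetFullPath per file instead of three.
        var fullPath = Path.GetFullPath(Path.Combine(fullDestDir, entryKey));

        // Simplified path-traversal check (a production check should also
        // respect directory-separator boundaries), done before touching disk.
        if (!fullPath.StartsWith(fullDestDir, StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidOperationException(
                $"Entry '{entryKey}' would extract outside of '{fullDestDir}'");
        }
        return fullPath;
    }
}
```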

5. LZMA Decompression Micro-optimizations

Files Modified:

  • RangeCoder.cs
  • RangeCoderBit.cs
  • LzmaDecoder.cs

Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to hot-path methods:

  • RangeCoder.Decoder: GetThreshold, Decode, DecodeBit, DecodeDirectBits
  • BitDecoder.Decode (called millions of times)
  • LzmaDecoder.LenDecoder.Decode
  • LzmaDecoder.Decoder2: DecodeNormal, DecodeWithMatchByte
  • LzmaDecoder.LiteralDecoder: GetState, DecodeNormal, DecodeWithMatchByte

Rationale: These methods are in tight decompression loops; inlining reduces call overhead and enables cross-method JIT optimizations

  • Impact: Reduced LZMA decompression overhead, though fundamental decompression work remains CPU-intensive as expected
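The attribute usage above can be illustrated with a toy probability-update step in the style of the `BitDecoder` changes; this struct is a sketch, not the actual SharpCompress code.

```csharp
using System.Runtime.CompilerServices;

public struct ProbabilityModelSketch
{
    private uint _prob;

    public ProbabilityModelSketch(uint initial) => _prob = initial;

    public uint Value => _prob;

    // Asks the JIT to inline this even when its size heuristics would refuse,
    // trading code size for fewer calls inside a tight decode loop.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void UpdateTowardZero() => _prob -= _prob >> 5;
}
```

For a method called millions of times per archive, removing the call/return overhead and letting the JIT optimize across the inlined body is where the win comes from.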

Performance Results

  • Skip operation: No longer a bottleneck (was 94.54% CPU)
  • Path processing: 63% reduction in WriteEntryToDirectory overhead
  • Overall extraction: Significantly faster, especially noticeable with large solid archives
  • Hot path: Now dominated by actual LZMA decompression work (unavoidable)

Technical Notes

  • All buffer allocations use ArrayPool<byte> to minimize GC pressure
  • Changes maintain backward compatibility
  • Security validations (path traversal checks) preserved
  • Code formatted with CSharpier per project standards

Testing

Tested with a 700MB+ solid 7-Zip archive with a 16MB dictionary on an AMD 9800X3D with an NVMe Gen 5 drive. Extraction time improved from hours to minutes, matching the performance of version 0.41.0.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 22:21:21 +00:00

Reference: starred/sharpcompress#1604