[PR #1163] [CLOSED] [WIP] Improve I/O operations while reading big solid archives #1604

Open
opened 2026-01-29 22:21:21 +00:00 by claunia · 0 comments

📋 Pull Request Information

Original PR: https://github.com/adamhathcock/sharpcompress/pull/1163
Author: @julianxhokaxhiu
Created: 1/27/2026
Status: Closed

Base: master ← Head: fix/lzma-slow


📝 Commits (1)

  • bae32c4 Improve skip call CPU hotpath

📊 Changes

8 files changed (+191 additions, -50 deletions)


📝 src/SharpCompress/Archives/IArchiveEntryExtensions.cs (+18 -3)
📝 src/SharpCompress/Common/ExtractionMethods.cs (+58 -42)
📝 src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs (+7 -0)
📝 src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoder.cs (+4 -0)
📝 src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBit.cs (+5 -0)
📝 src/SharpCompress/IO/BufferedSubStream.cs (+64 -0)
📝 src/SharpCompress/Polyfills/StreamExtensions.cs (+28 -2)
📝 src/SharpCompress/Readers/AbstractReader.cs (+7 -3)

📄 Description

This PR is a tentative effort to improve extraction time for 7z archives, especially ones compressed as a single solid block with a 16MB dictionary.

The main idea is to reduce the number of cache accesses (while doing Skip), improve performance when writing files to disk (using a 1MB buffer by default), and aggressively inline code paths that are reused across executions (avoiding jumps in favor of more linear execution, at the expense of code size).

Feel free to test it and report back. In my own case, I can finally see files being extracted from the archive mentioned in #1105.

Below is Claude Sonnet 4.5's summary.


Performance Optimization Summary for 7Zip Solid Archive Extraction

Problem

7Zip extraction with large solid archives (16MB dictionaries, 1-block compression) was extremely slow in version 0.42.0+ compared to 0.41.0, taking hours instead of minutes on high-end hardware.

Root Causes Identified Through Profiling

  1. StreamExtensions.Skip() consuming 94.54% CPU - byte-by-byte reading via CopyTo(Stream.Null)
  2. Excessive Path.GetFullPath() calls - Called 3+ times per extracted file
  3. Small I/O buffers - 80KB buffers insufficient for modern NVMe drives
  4. Method call overhead - Hot-path LZMA decompression methods not inlined

Optimizations Implemented

1. Skip Operation Optimization

File: StreamExtensions.cs

  • Changed from ReadOnlySubStream + CopyTo(Stream.Null) (byte-by-byte) to buffered reading
  • Increased buffer size from implicit default to 1MB using ArrayPool
  • Added fast path for BufferedSubStream to skip via internal method
  • Impact: Reduced Skip CPU usage from 94.54% → 82.24% → negligible

File: BufferedSubStream.cs

  • Added SkipInternal() method with 1MB buffer for efficient skipping
  • Skips cached data instantly, uses large buffered reads for remainder
  • Impact: Eliminates repeated RefillCache() calls when skipping large amounts of data

2. Increased I/O Buffer Sizes (80KB → 1MB)

Files Modified:

  • IArchiveEntryExtensions.cs - BufferSize constant
  • AbstractReader.cs - CopyTo/CopyToAsync calls

Rationale: Modern NVMe Gen 5 drives benefit significantly from larger sequential I/O operations

  • Impact: Better disk I/O throughput, reduced system calls

3. FileStream Optimization

File: IArchiveEntryExtensions.cs

  • Replaced File.Open() with explicit FileStream constructor
  • Specified 1MB buffer size explicitly
  • Added useAsync: true for async operations to enable overlapped I/O on Windows
  • Impact: Better async I/O performance, reduced context switching
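The FileStream change can be sketched as follows; `OutputStreamFactory` and the destination path are illustrative names, but the constructor parameters (explicit 1MB `bufferSize`, `useAsync: true`) are the ones the PR describes.

```csharp
using System.IO;

static class OutputStreamFactory
{
    private const int WriteBufferSize = 1 << 20; // 1MB, per the PR's default

    // Explicit FileStream construction instead of File.Open(), so the buffer
    // size and async mode are under our control.
    public static FileStream Create(string path) =>
        new FileStream(
            path,
            FileMode.Create,
            FileAccess.Write,
            FileShare.None,
            bufferSize: WriteBufferSize,
            useAsync: true); // enables overlapped I/O on Windows
}
```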

4. Path Processing Optimization

File: ExtractionMethods.cs

  • Cached entry.Key to avoid multiple property accesses
  • Reduced Path.GetFullPath() calls from 3 per file to 1 by consolidating path operations
  • Combined Path.Combine calls: Combine(folder, file) instead of separate operations
  • Moved security validation before filesystem calls to avoid unnecessary I/O
  • Impact: WriteEntryToDirectory CPU reduced from 84.95% → 31.74% (63% reduction)
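The consolidated path handling can be sketched like this; the helper name and the simplified traversal check are illustrative, not the actual `ExtractionMethods` code.

```csharp
using System;
using System.IO;

static class PathResolutionSketch
{
    // Resolves the destination once and validates it before any filesystem I/O.
    public static string ResolveDestination(string destinationDirectory, string entryKey)
    {
        var fullDestDir = Path.GetFullPath(destinationDirectory);

        // One Combine + one GetFullPath per file instead of three.
        var fullPath = Path.GetFullPath(Path.Combine(fullDestDir, entryKey));

        // Simplified path-traversal check (a production check should also
        // respect directory-separator boundaries), done before touching disk.
        if (!fullPath.StartsWith(fullDestDir, StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidOperationException(
                $"Entry '{entryKey}' would extract outside of '{fullDestDir}'");
        }
        return fullPath;
    }
}
```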

5. LZMA Decompression Micro-optimizations

Files Modified:

  • RangeCoder.cs
  • RangeCoderBit.cs
  • LzmaDecoder.cs

Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to hot-path methods:

  • RangeCoder.Decoder: GetThreshold, Decode, DecodeBit, DecodeDirectBits
  • BitDecoder.Decode (called millions of times)
  • LzmaDecoder.LenDecoder.Decode
  • LzmaDecoder.Decoder2: DecodeNormal, DecodeWithMatchByte
  • LzmaDecoder.LiteralDecoder: GetState, DecodeNormal, DecodeWithMatchByte

Rationale: These methods are in tight decompression loops; inlining reduces call overhead and enables cross-method JIT optimizations

  • Impact: Reduced LZMA decompression overhead, though fundamental decompression work remains CPU-intensive as expected
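The attribute usage above can be illustrated with a toy probability-update step in the style of the `BitDecoder` changes; this struct is a sketch, not the actual SharpCompress code.

```csharp
using System.Runtime.CompilerServices;

public struct ProbabilityModelSketch
{
    private uint _prob;

    public ProbabilityModelSketch(uint initial) => _prob = initial;

    public uint Value => _prob;

    // Asks the JIT to inline this even when its size heuristics would refuse,
    // trading code size for fewer calls inside a tight decode loop.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void UpdateTowardZero() => _prob -= _prob >> 5;
}
```

For a method called millions of times per archive, removing the call/return overhead and letting the JIT optimize across the inlined body is where the win comes from.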

Performance Results

  • Skip operation: No longer a bottleneck (was 94.54% CPU)
  • Path processing: 63% reduction in WriteEntryToDirectory overhead
  • Overall extraction: Significantly faster, especially noticeable with large solid archives
  • Hot path: Now dominated by actual LZMA decompression work (unavoidable)

Technical Notes

  • All buffer allocations use ArrayPool<byte> to minimize GC pressure
  • Changes maintain backward compatibility
  • Security validations (path traversal checks) preserved
  • Code formatted with CSharpier per project standards

Testing

Tested with a 700MB+ solid 7-Zip archive with a 16MB dictionary on an AMD 9800X3D with an NVMe Gen 5 drive. Extraction time improved from hours to minutes, matching the performance of version 0.41.0.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

claunia added the pull-request label 2026-01-29 22:21:21 +00:00

Reference: starred/sharpcompress#1604