description: Guidelines for building SharpCompress - A C# compression library
applyTo: **/*.cs

SharpCompress Development

About SharpCompress

SharpCompress is a pure C# compression library supporting multiple archive formats (Zip, Tar, GZip, BZip2, 7Zip, Rar, LZip, XZ, ZStandard) for .NET Framework 4.6.2, .NET Standard 2.1, .NET 6.0, and .NET 8.0. The library provides both seekable Archive APIs and forward-only Reader/Writer APIs for streaming scenarios.

C# Instructions

  • Always use the latest C# version and its features (currently C# 13).
  • Write clear and concise comments for each function.
  • Follow the existing code style and patterns in the codebase.

General Instructions

  • Agents should NEVER commit to git - stage files and leave committing to the user. Only create commits when the user explicitly requests them.
  • Make only high confidence suggestions when reviewing code changes.
  • Write code with good maintainability practices, including comments on why certain design decisions were made.
  • Handle edge cases and write clear exception handling.
  • For libraries or external dependencies, mention their usage and purpose in comments.
  • Preserve backward compatibility when making changes to public APIs.

Naming Conventions

  • Follow PascalCase for component names, method names, and public members.
  • Use camelCase for private fields and local variables.
  • Prefix interface names with "I" (e.g., IUserService).

Code Formatting

Copilot agents: You MUST run the format task after making code changes to ensure consistency.

  • Use CSharpier for code formatting to ensure consistent style across the project
  • CSharpier is configured as a local tool in .config/dotnet-tools.json

Commands

  1. Restore tools (first time only):

    dotnet tool restore
    
  2. Check if files are formatted correctly (doesn't modify files):

    dotnet csharpier check .
    
    • Exit code 0: All files are properly formatted
    • Exit code 1: Some files need formatting (will show which files and differences)
  3. Format files (modifies files):

    dotnet csharpier format .
    
    • Formats all files in the project to match CSharpier style
    • Run from project root directory
  4. Configure your IDE to format on save using CSharpier for the best experience

Additional Notes

  • The project also uses .editorconfig for editor settings (indentation, encoding, etc.)
  • Let CSharpier handle code style while .editorconfig handles editor behavior
  • Always run dotnet csharpier check . before committing to verify formatting

Project Setup and Structure

  • The project targets multiple frameworks: .NET Framework 4.6.2, .NET Standard 2.1, .NET 6.0, and .NET 8.0
  • Main library is in src/SharpCompress/
  • Tests are in tests/SharpCompress.Test/
  • Performance tests are in tests/SharpCompress.Performance/
  • Test archives are in tests/TestArchives/
  • Build project is in build/
  • Use dotnet build to build the solution
  • Use dotnet test to run tests
  • Solution file: SharpCompress.sln

Directory Structure

src/SharpCompress/
  ├── Archives/        # IArchive implementations (Zip, Tar, Rar, 7Zip, GZip)
  ├── Readers/         # IReader implementations (forward-only)
  ├── Writers/         # IWriter implementations (forward-only)
  ├── Compressors/     # Low-level compression streams (BZip2, Deflate, LZMA, etc.)
  ├── Factories/       # Format detection and factory pattern
  ├── Common/          # Shared types (ArchiveType, Entry, Options)
  ├── Crypto/          # Encryption implementations
  └── IO/              # Stream utilities and wrappers

tests/SharpCompress.Test/
  ├── Zip/, Tar/, Rar/, SevenZip/, GZip/, BZip2/  # Format-specific tests
  ├── TestBase.cs      # Base test class with helper methods
  └── TestArchives/    # Test data (not checked into main test project)

Factory Pattern

All format types implement factory interfaces (IArchiveFactory, IReaderFactory, IWriterFactory) for auto-detection:

  • ReaderFactory.Open() - Auto-detects format by probing stream
  • WriterFactory.Open() - Creates writer for specified ArchiveType
  • Factories located in: src/SharpCompress/Factories/
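A minimal sketch of both factories (file and directory names are placeholders; the exact extension-method overloads may vary between versions):

using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;
using SharpCompress.Writers;

// Reading: let ReaderFactory probe the stream and pick the matching reader
using (var input = File.OpenRead("unknown.archive"))
using (var reader = ReaderFactory.Open(input))
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            reader.WriteEntryToDirectory(
                "output",
                new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
        }
    }
}

// Writing: ask WriterFactory for a writer for a specific ArchiveType
using (var output = File.Create("archive.zip"))
using (var writer = WriterFactory.Open(output, ArchiveType.Zip, new WriterOptions(CompressionType.Deflate)))
{
    writer.WriteAll("input-folder", "*", SearchOption.AllDirectories);
}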

Nullable Reference Types

  • Declare variables non-nullable, and check for null at entry points.
  • Always use is null or is not null instead of == null or != null.
  • Trust the C# null annotations and don't add null checks when the type system says a value cannot be null.
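For example (a hypothetical helper, shown only to illustrate the pattern):

using System;
using SharpCompress.Common;

public static class EntryGuard
{
    // Hypothetical helper: null is rejected at the entry point, then the value is trusted
    public static long GetCompressedSize(IEntry? entry)
    {
        if (entry is null)
        {
            throw new ArgumentNullException(nameof(entry));
        }
        return entry.CompressedSize; // no further null checks needed past this point
    }
}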

SharpCompress-Specific Guidelines

Supported Formats

SharpCompress supports multiple archive and compression formats:

  • Archive Formats: Zip, Tar, 7Zip, Rar (read-only)
  • Compression: DEFLATE, BZip2, LZMA/LZMA2, PPMd, ZStandard (decompress only), Deflate64 (decompress only)
  • Combined Formats: Tar.GZip, Tar.BZip2, Tar.LZip, Tar.XZ, Tar.ZStandard
  • See docs/FORMATS.md for complete format support matrix

Stream Handling Rules

  • Disposal: As of version 0.21, SharpCompress closes wrapped streams by default
  • Use ReaderOptions or WriterOptions with LeaveStreamOpen = true to control stream disposal (see the sketch after this list)
  • Use NonDisposingStream wrapper when working with compression streams directly to prevent disposal
  • Always dispose of readers, writers, and archives in using blocks
  • For forward-only operations, use Reader/Writer APIs; for random access, use Archive APIs
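A sketch of controlling disposal with LeaveStreamOpen (the file name is a placeholder):

using System.IO;
using SharpCompress.Readers;

using var stream = File.OpenRead("archive.tar.gz");
using (var reader = ReaderFactory.Open(stream, new ReaderOptions { LeaveStreamOpen = true }))
{
    while (reader.MoveToNextEntry())
    {
        // process entries here
    }
}
// stream has not been closed because LeaveStreamOpen = true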

Async/Await Patterns

  • All I/O operations support async/await with CancellationToken
  • Async methods follow the naming convention: MethodNameAsync
  • Key async methods:
    • WriteEntryToAsync - Extract entry asynchronously
    • WriteAllToDirectoryAsync - Extract all entries asynchronously
    • WriteAsync - Write entry asynchronously
    • WriteAllAsync - Write directory asynchronously
    • OpenEntryStreamAsync - Open entry stream asynchronously
  • Always provide CancellationToken parameter in async methods
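A sketch of asynchronous extraction; it assumes the async extension methods listed above, and the exact overloads may differ between versions:

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using SharpCompress.Common;
using SharpCompress.Readers;

public static async Task ExtractAllAsync(Stream archiveStream, string destination, CancellationToken cancellationToken)
{
    using var reader = ReaderFactory.Open(archiveStream);
    // Assumed overload: destination directory, extraction options, cancellation token
    await reader.WriteAllToDirectoryAsync(
        destination,
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
        cancellationToken);
}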

Archive APIs vs Reader/Writer APIs

  • Archive API: Use for random access with seekable streams (e.g., ZipArchive, TarArchive)
  • Reader API: Use for forward-only reading on non-seekable streams (e.g., ZipReader, TarReader)
  • Writer API: Use for forward-only writing on streams (e.g., ZipWriter, TarWriter)
  • 7Zip only supports Archive API due to format limitations
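A side-by-side sketch (archive names are placeholders):

using System.IO;
using System.Linq;
using SharpCompress.Archives.Zip;
using SharpCompress.Readers.Zip;

// Archive API: seekable stream, entries can be accessed in any order
using (var archive = ZipArchive.Open("data.zip"))
{
    var entry = archive.Entries.First(e => !e.IsDirectory);
    using var entryStream = entry.OpenEntryStream();
    // read entryStream...
}

// Reader API: forward-only, entries must be consumed in stream order
using (var stream = File.OpenRead("data.zip"))
using (var reader = ZipReader.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        // process reader.Entry in order...
    }
}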

Tar-Specific Considerations

  • Tar format requires file size in the header
  • If no size is specified to TarWriter and the stream is not seekable, an exception will be thrown (see the sketch after this list)
  • Tar combined with compression (GZip, BZip2, LZip, XZ) is supported
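When the destination stream cannot be seeked, pass the entry size up front (a sketch; file names are placeholders and the exact TarWriter.Write overload may differ between versions):

using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Tar;

using var output = File.Create("out.tar");
// Second argument: finalize the archive when the writer is closed
using var writer = new TarWriter(output, new TarWriterOptions(CompressionType.None, true));
using var source = File.OpenRead("data.bin");
// filename, source stream, modification time, size - the size lets the header be written without seeking back
writer.Write("data.bin", source, null, source.Length);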

Zip-Specific Considerations

  • Supports Zip64 for large files (seekable streams only)
  • Supports PKWare and WinZip AES encryption
  • Multiple compression methods: None, Shrink, Reduce, Implode, DEFLATE, Deflate64, BZip2, LZMA, PPMd
  • Encrypted LZMA is not supported
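Reading a password-protected zip (a sketch; the password and file name are placeholders):

using System.Linq;
using SharpCompress.Archives.Zip;
using SharpCompress.Readers;

using var archive = ZipArchive.Open("secret.zip", new ReaderOptions { Password = "p@ssw0rd" });
foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
{
    using var entryStream = entry.OpenEntryStream();
    // decryption happens transparently while reading entryStream
}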

Performance Considerations

  • For large files, use Reader/Writer APIs with non-seekable streams to avoid loading the entire file into memory
  • Leverage async I/O for better scalability
  • Consider compression level trade-offs (speed vs. size)
  • Use appropriate buffer sizes for stream operations
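For example, a hypothetical helper that copies an entry stream with an explicit buffer size and cancellation support:

using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static async Task CopyEntryAsync(Stream entryStream, string destinationPath, CancellationToken cancellationToken)
{
    using var destination = File.Create(destinationPath);
    // 81920 bytes is the BCL default copy buffer; tune for your workload
    await entryStream.CopyToAsync(destination, 81920, cancellationToken);
}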

Testing

  • Always include test cases for critical paths of the application.
  • Test with multiple archive formats when making changes to core functionality.
  • Include tests for both Archive and Reader/Writer APIs when applicable.
  • Test async operations with cancellation tokens.
  • Do not emit "Act", "Arrange" or "Assert" comments.
  • Copy existing style in nearby files for test method names and capitalization.
  • Use test archives from tests/TestArchives directory for consistency.
  • Test stream disposal and LeaveStreamOpen behavior.
  • Test edge cases: empty archives, large files, corrupted archives, encrypted archives.

Test Organization

  • Base class: TestBase - Provides TEST_ARCHIVES_PATH, SCRATCH_FILES_PATH, temp directory management (see the test sketch below)
  • Framework: xUnit with AwesomeAssertions
  • Test archives: tests/TestArchives/ - Use existing archives, don't create new ones unnecessarily
  • Match naming style of nearby test files
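An illustrative test sketch (the archive name, test name, and TestBase namespace are assumptions; match the naming and assertion style of nearby test files):

using System.IO;
using SharpCompress.Archives.Zip;
using SharpCompress.Test; // assumed namespace of TestBase
using Xunit;

public class ZipArchiveSmokeTests : TestBase
{
    [Fact]
    public void Zip_Archive_Lists_Entries()
    {
        // "Zip.zip" is a placeholder archive under tests/TestArchives
        using var archive = ZipArchive.Open(Path.Combine(TEST_ARCHIVES_PATH, "Zip.zip"));
        Assert.NotEmpty(archive.Entries);
    }
}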

Common Pitfalls

  1. Don't mix Archive and Reader APIs - Archive needs seekable stream, Reader doesn't
  2. Solid archives (Rar, 7Zip) - Use ExtractAllEntries() for best performance, not individual entry extraction (see the sketch below)
  3. Stream disposal - Always set LeaveStreamOpen explicitly when needed (default is to close)
  4. Tar + non-seekable stream - Must provide file size or it will throw
  5. Format detection - Use ReaderFactory.Open() for auto-detection, test with actual archive files
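For pitfall 2, a sketch of extracting a solid archive in one forward pass (the file name is a placeholder):

using SharpCompress.Archives.Rar;
using SharpCompress.Common;
using SharpCompress.Readers;

using var archive = RarArchive.Open("solid.rar");
using var reader = archive.ExtractAllEntries(); // one pass over the solid stream
while (reader.MoveToNextEntry())
{
    if (!reader.Entry.IsDirectory)
    {
        reader.WriteEntryToDirectory(
            "output",
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
    }
}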

Async Struct-Copy Bug in LZMA RangeCoder

When implementing async methods on mutable struct types (like BitEncoder and BitDecoder in the LZMA RangeCoder), be aware that the compiler-generated async state machine captures a copy of the struct ('this'). Mutations to struct fields made inside the async method therefore do not persist back to the original struct stored in arrays or fields.

The Bug:

// BAD: async method on a mutable struct - the state machine captures a copy of 'this'
public async ValueTask<uint> DecodeAsync(Decoder decoder, CancellationToken cancellationToken = default)
{
    var newBound = (decoder._range >> K_NUM_BIT_MODEL_TOTAL_BITS) * _prob;
    if (decoder._code < newBound)
    {
        decoder._range = newBound;
        _prob += (K_BIT_MODEL_TOTAL - _prob) >> K_NUM_MOVE_BITS;  // Mutates _prob on the state machine's copy
        await decoder.Normalize2Async(cancellationToken).ConfigureAwait(false);
        return 0;  // The _prob update is lost to the original struct stored in the caller's array/field
    }
    // ...
}

The Fix: Refactor so the mutable struct's method is not itself async (an async struct method always hands the state machine a copy of the struct): perform all struct mutations synchronously, then pass the pending ValueTask to a static helper method that performs the await:

// GOOD: struct mutations happen synchronously, await is conditional
public ValueTask<uint> DecodeAsync(Decoder decoder, CancellationToken cancellationToken = default)
{
    var newBound = (decoder._range >> K_NUM_BIT_MODEL_TOTAL_BITS) * _prob;
    if (decoder._code < newBound)
    {
        decoder._range = newBound;
        _prob += (K_BIT_MODEL_TOTAL - _prob) >> K_NUM_MOVE_BITS;  // All mutations complete
        return DecodeAsyncHelper(decoder.Normalize2Async(cancellationToken), 0);  // Await in helper
    }
    decoder._range -= newBound;
    decoder._code -= newBound;
    _prob -= (_prob) >> K_NUM_MOVE_BITS;  // All mutations complete
    return DecodeAsyncHelper(decoder.Normalize2Async(cancellationToken), 1);  // Await in helper
}

private static async ValueTask<uint> DecodeAsyncHelper(ValueTask normalizeTask, uint result)
{
    await normalizeTask.ConfigureAwait(false);
    return result;
}

Why This Matters: In LZMA, the BitEncoder and BitDecoder structs maintain adaptive probability models in their _prob field. When these structs are stored in arrays (e.g., _models[m]), the async state machine copy breaks the adaptive model, causing incorrect bit decoding and eventually DataErrorException exceptions.

Related Files:

  • src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBit.Async.cs - Fixed
  • src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBitTree.Async.cs - Uses readonly structs, so this pattern doesn't apply