SharpCompress AI Agent Instructions

Project Overview

SharpCompress is a pure C# compression library supporting multiple archive formats (Zip, Tar, GZip, BZip2, 7Zip, Rar, LZip, XZ, ZStandard) for .NET Framework 4.8, .NET 8.0, and .NET 10.0. The library provides both seekable Archive APIs and forward-only Reader/Writer APIs for streaming scenarios.

Architecture & Design Patterns

Three-Tier API Design

SharpCompress has three distinct API patterns for different use cases:

  1. Archive API (IArchive) - Random access on seekable streams

    • Use for: File-based archives where you can seek backward/forward
    • Example: ZipArchive.Open(), TarArchive.Open(), RarArchive.Open()
    • Located in: src/SharpCompress/Archives/
  2. Reader API (IReader) - Forward-only on non-seekable streams

    • Use for: Streaming scenarios (network, pipes) where seeking isn't possible
    • Example: ZipReader.Open(), TarReader.Open(), ReaderFactory.Open()
    • Located in: src/SharpCompress/Readers/
  3. Writer API (IWriter) - Forward-only writing

    • Use for: Creating archives in streaming fashion
    • Example: ZipWriter, TarWriter, WriterFactory.Open()
    • Located in: src/SharpCompress/Writers/

Important: 7Zip only supports Archive API due to format design limitations.
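
For orientation, here is a minimal sketch (not taken from the repo) of the same Zip file handled through the Archive and Reader APIs; data.zip and output are placeholder paths:

using System.IO;
using SharpCompress.Archives;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;
using SharpCompress.Readers;
using SharpCompress.Readers.Zip;

// Archive API: random access over a seekable stream.
using (var archive = ZipArchive.Open("data.zip"))
{
    foreach (var entry in archive.Entries)
    {
        if (!entry.IsDirectory)
        {
            entry.WriteToDirectory(
                "output",
                new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
        }
    }
}

// Reader API: forward-only, also works on non-seekable streams.
using (var stream = File.OpenRead("data.zip"))
using (var reader = ZipReader.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            reader.WriteEntryToDirectory(
                "output",
                new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
        }
    }
}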

Factory Pattern

All format types implement factory interfaces (IArchiveFactory, IReaderFactory, IWriterFactory) for auto-detection:

  • ReaderFactory.Open() - Auto-detects format by probing stream
  • WriterFactory.Open() - Creates writer for specified ArchiveType
  • Factories located in: src/SharpCompress/Factories/
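
A hedged sketch of both factories in use (file names are placeholders):

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;
using SharpCompress.Writers;

// Reading: the format is probed from the stream, no explicit type needed.
using (var input = File.OpenRead("unknown-archive.bin"))
using (var reader = ReaderFactory.Open(input))
{
    Console.WriteLine($"Detected {reader.ArchiveType}");
    while (reader.MoveToNextEntry())
    {
        Console.WriteLine(reader.Entry.Key);
    }
}

// Writing: the caller specifies the ArchiveType and compression explicitly.
using (var output = File.Create("out.zip"))
using (var writer = WriterFactory.Open(output, ArchiveType.Zip, new WriterOptions(CompressionType.Deflate)))
{
    writer.Write("file.txt", "file.txt"); // entry key, then source file path
}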

Stream Disposal Rules (Changed in v0.21)

Critical: SharpCompress closes wrapped streams by default to align with .NET Framework expectations.

  • Always use ReaderOptions or WriterOptions with LeaveStreamOpen = true to prevent disposal
  • Example: new ReaderOptions { LeaveStreamOpen = true }
  • When working with compression streams directly, wrap the underlying stream in NonDisposingStream (SharpCompress.IO) to keep it open
  • Always wrap operations in using blocks for proper disposal
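
A minimal sketch of the LeaveStreamOpen pattern:

using System.IO;
using SharpCompress.Readers;

using (var stream = File.OpenRead("data.zip"))
{
    using (var reader = ReaderFactory.Open(stream, new ReaderOptions { LeaveStreamOpen = true }))
    {
        while (reader.MoveToNextEntry())
        {
            // process entries...
        }
    } // the reader is disposed here, but the stream is not closed

    stream.Position = 0; // still usable because LeaveStreamOpen = true
}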

Development Workflow

Building & Testing

# Build entire solution
dotnet build SharpCompress.sln

# Build specific framework (library targets: net48, net481, netstandard2.0, net6.0, net8.0)
dotnet build src/SharpCompress/SharpCompress.csproj -f net8.0

# Run tests (targets: net10.0, net48)
dotnet test tests/SharpCompress.Test/SharpCompress.Test.csproj -f net10.0

# Custom build script (Bullseye-based)
dotnet run --project build/build.csproj -- test

Code Formatting (REQUIRED)

# Restore CSharpier tool
dotnet tool restore

# Format code (MUST run before committing)
dotnet csharpier .

# Check formatting
dotnet csharpier check .

Never commit without running dotnet csharpier . from project root.

VS Code Tasks

  • Build: Ctrl+Shift+B (Cmd+Shift+B on Mac)
  • Test: Use "test" task or F5 to debug tests
  • Format: "format" task runs CSharpier

Debugging Features

When building with the DEBUG_STREAMS constant defined (enabled for net10.0 Debug builds):

  • Stream operations emit debug information
  • Helps trace stream lifecycle and disposal issues
  • See #if DEBUG_STREAMS blocks in stream classes
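
A hypothetical illustration of the pattern (this type is not actual library code; the real stream classes use the same #if guard):

using System.Diagnostics;
using System.IO;

// Hypothetical wrapper, for illustration only.
internal sealed class TracingReadExample(Stream inner)
{
    public int Read(byte[] buffer, int offset, int count)
    {
        var read = inner.Read(buffer, offset, count);
#if DEBUG_STREAMS
        Debug.WriteLine($"Read {read} bytes from {inner.GetType().Name}");
#endif
        return read;
    }
}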

Code Conventions

Nullable Reference Types

  • Nullable reference types are enabled; references are non-nullable by default
  • Check for null only at entry points (public APIs)
  • Always use is null or is not null (never == null or != null)
  • Trust C# null annotations - don't add redundant null checks
  • Extension method: value.NotNull(nameof(value)) validates parameters
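
A short sketch of these conventions (hypothetical method; NotNull is the extension mentioned above, and its namespace is assumed to be in scope inside the library):

using SharpCompress.Archives;

public static class NullConventionExample
{
    // Hypothetical public entry point illustrating the rules above.
    public static void ExtractTo(IArchive archive, string? destination)
    {
        archive.NotNull(nameof(archive)); // validate at the public API boundary only

        if (destination is not null) // pattern matching, never != null
        {
            // ...
        }
    }
}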

Async/Await Patterns

  • All I/O operations support async/await with CancellationToken
  • Naming: Async methods end with Async suffix
  • Key async methods:
    • WriteEntryToAsync(stream, cancellationToken)
    • WriteAllToDirectoryAsync(path, options, cancellationToken)
    • OpenEntryStreamAsync(cancellationToken)
    • MoveToNextEntryAsync(cancellationToken)
  • Always provide CancellationToken parameter in new async methods
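
A sketch built from the method names listed above (the WriteAllToDirectoryAsync signature is taken from this list):

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using SharpCompress.Common;
using SharpCompress.Readers;

public static class AsyncExtractExample
{
    public static async Task ExtractAsync(Stream input, string destination, CancellationToken cancellationToken)
    {
        using var reader = ReaderFactory.Open(input);
        await reader.WriteAllToDirectoryAsync(
            destination,
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
            cancellationToken);
    }
}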

C# Style

  • Use latest C# features (currently C# 14)
  • File-scoped namespaces required (namespace SharpCompress.Archives;)
  • var for all local variables unless type clarity is critical
  • Expression-bodied members preferred for simple operations
  • Private fields use _camelCase prefix (enforced by .editorconfig)
  • Constants use CONSTANT_CASE (all caps with underscores)
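
A small hypothetical type illustrating the conventions above:

using System.Collections.Generic;

namespace SharpCompress.Examples;   // file-scoped namespace

internal sealed class EntryCounter
{
    private const int DEFAULT_CAPACITY = 16;                      // CONSTANT_CASE

    private readonly List<string> _keys = new(DEFAULT_CAPACITY);  // _camelCase field

    public int Count => _keys.Count;                              // expression-bodied member

    public void Add(string key)
    {
        var normalized = key.Trim();                              // var for locals
        _keys.Add(normalized);
    }
}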

Testing Patterns

Test Organization

  • Base class: TestBase - Provides TEST_ARCHIVES_PATH, SCRATCH_FILES_PATH, temp directory management
  • Framework: xUnit with AwesomeAssertions
  • Test archives: tests/TestArchives/ - Use existing archives, don't create new ones unnecessarily
  • Never emit "Arrange", "Act", "Assert" comments - code should be self-documenting
  • Match naming style of nearby test files

Common Test Patterns

public class MyFormatTests : TestBase
{
    [Fact]
    public void ExtractTest()
    {
        var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");
        using (var archive = ZipArchive.Open(testArchive))
        using (var reader = archive.ExtractAllEntries())
        {
            reader.WriteAllToDirectory(SCRATCH_FILES_PATH, 
                new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
        }
        VerifyFiles(); // Compares against ORIGINAL_FILES_PATH
    }
}

Critical Test Areas

  • Test both Archive and Reader APIs when format supports both
  • Test async operations with cancellation tokens
  • Test stream disposal behavior (LeaveStreamOpen)
  • Test with multiple target frameworks if behavior differs (net10.0 vs net48)
  • Edge cases: empty archives, large files, encrypted archives, multi-volume

Format-Specific Knowledge

Tar Considerations

  • Tar requires the file size in the entry header - if the source stream is non-seekable and no size is provided, TarWriter throws (see the sketch after this list)
  • Often combined with compression: .tar.gz, .tar.bz2, .tar.xz, .tar.lz
  • Long filenames handled via GNU longlink extension
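
A hedged sketch of the size requirement, assuming TarWriter exposes a Write overload that accepts an explicit entry size (the point of the first bullet); names and paths are placeholders:

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Tar;

public static class TarSizeExample
{
    // source is assumed non-seekable (e.g. a network stream); the caller must know its length.
    public static void WriteEntry(Stream source, long knownSizeInBytes)
    {
        using var output = File.Create("out.tar");
        using var writer = new TarWriter(output, new TarWriterOptions(CompressionType.None, true)); // no compression; finalize archive on close
        // entry key, source stream, modification time, explicit size
        writer.Write("payload.bin", source, DateTime.UtcNow, knownSizeInBytes);
    }
}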

Zip Considerations

  • Supports Zip64 for large files (seekable streams only)
  • Encryption: PKWare and WinZip AES supported (except encrypted LZMA)
  • Compression methods: DEFLATE (default), Deflate64 (read-only), BZip2, LZMA, PPMd, Shrink, Reduce, Implode
  • Multi-volume Zip requires ZipArchive (Reader can't seek across volumes)
  • ZipReader processes LocalEntry headers and intentionally skips DirectoryEntry headers (they're redundant in streaming mode)
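
A minimal sketch of choosing a non-default compression method when writing, assuming ZipWriterOptions takes the CompressionType in its constructor; file names are placeholders:

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers.Zip;

using (var output = File.Create("out.zip"))
using (var writer = new ZipWriter(output, new ZipWriterOptions(CompressionType.BZip2)))
using (var source = File.OpenRead("file.txt"))
{
    writer.Write("file.txt", source, DateTime.UtcNow); // entry key, source stream, modification time
}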

Rar Considerations

  • Read-only format (proprietary)
  • RAR5 decryption supported but CRC check incomplete
  • Solid archives must be extracted sequentially (e.g. via ExtractAllEntries()); extracting entries individually forces re-decompression of preceding data and is much slower - see the sketch below
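
A minimal sketch of sequential extraction of a solid archive (paths are placeholders):

using SharpCompress.Archives.Rar;
using SharpCompress.Common;
using SharpCompress.Readers;

using (var archive = RarArchive.Open("solid.rar"))
using (var reader = archive.ExtractAllEntries())
{
    reader.WriteAllToDirectory(
        "output",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
}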

7Zip Limitations

  • No Reader/Writer API support (format doesn't support streaming)
  • Archive API only - requires seekable stream

Project Structure

src/SharpCompress/
  ├── Archives/        # IArchive implementations (Zip, Tar, Rar, 7Zip, GZip)
  ├── Readers/         # IReader implementations (forward-only)
  ├── Writers/         # IWriter implementations (forward-only)
  ├── Compressors/     # Low-level compression streams (BZip2, Deflate, LZMA, etc.)
  ├── Factories/       # Format detection and factory pattern
  ├── Common/          # Shared types (ArchiveType, Entry, Options)
  ├── Crypto/          # Encryption implementations
  └── IO/              # Stream utilities and wrappers

tests/
  ├── SharpCompress.Test/
  │   ├── Zip/, Tar/, Rar/, SevenZip/, GZip/, BZip2/  # Format-specific tests
  │   └── TestBase.cs  # Base test class with helper methods
  └── TestArchives/    # Test data, kept alongside (not inside) the test project

Common Pitfalls

  1. Don't mix Archive and Reader APIs - Archive needs seekable stream, Reader doesn't
  2. Solid archives (Rar, 7Zip) - Use ExtractAllEntries() for best performance, not individual entry extraction
  3. Stream disposal - Always set LeaveStreamOpen explicitly when needed (default is to close)
  4. Tar + non-seekable stream - Must provide file size or it will throw
  5. Multi-framework differences - Some features differ between .NET Framework and modern .NET (e.g., Mono.Posix)
  6. Format detection - Use ReaderFactory.Open() for auto-detection, test with actual archive files

Performance Considerations

  • Use Reader/Writer APIs for large files to avoid loading the entire file into memory
  • Leverage async I/O for better scalability
  • For solid archives (Rar, 7Zip), sequential extraction is significantly faster
  • Consider compression level trade-offs when writing (speed vs size)

References