8.6 KiB
SharpCompress AI Agent Instructions
Project Overview
SharpCompress is a pure C# compression library supporting multiple archive formats (Zip, Tar, GZip, BZip2, 7Zip, Rar, LZip, XZ, ZStandard) for .NET Framework 4.8, .NET 8.0, and .NET 10.0. The library provides both seekable Archive APIs and forward-only Reader/Writer APIs for streaming scenarios.
Architecture & Design Patterns
Three-Tier API Design
SharpCompress has three distinct API patterns for different use cases:
-
Archive API (
IArchive) - Random access on seekable streams- Use for: File-based archives where you can seek backward/forward
- Example:
ZipArchive.Open(),TarArchive.Open(),RarArchive.Open() - Located in:
src/SharpCompress/Archives/
-
Reader API (
IReader) - Forward-only on non-seekable streams- Use for: Streaming scenarios (network, pipes) where seeking isn't possible
- Example:
ZipReader.Open(),TarReader.Open(),ReaderFactory.Open() - Located in:
src/SharpCompress/Readers/
-
Writer API (
IWriter) - Forward-only writing- Use for: Creating archives in streaming fashion
- Example:
ZipWriter,TarWriter,WriterFactory.Open() - Located in:
src/SharpCompress/Writers/
Important: 7Zip only supports Archive API due to format design limitations.
Factory Pattern
All format types implement factory interfaces (IArchiveFactory, IReaderFactory, IWriterFactory) for auto-detection:
ReaderFactory.Open()- Auto-detects format by probing streamWriterFactory.Open()- Creates writer for specifiedArchiveType- Factories located in:
src/SharpCompress/Factories/
Stream Disposal Rules (Changed in v0.21)
Critical: SharpCompress closes wrapped streams by default to align with .NET Framework expectations.
- Always use
ReaderOptionsorWriterOptionswithLeaveStreamOpen = trueto prevent disposal - Example:
new ReaderOptions { LeaveStreamOpen = true } - When working with compression streams directly, use
NonDisposingStreamwrapper (if it exists) - Always wrap operations in
usingblocks for proper disposal
Development Workflow
Building & Testing
# Build entire solution
dotnet build SharpCompress.sln
# Build specific framework (library targets: net48, net481, netstandard2.0, net6.0, net8.0)
dotnet build src/SharpCompress/SharpCompress.csproj -f net8.0
# Run tests (targets: net10.0, net48)
dotnet test tests/SharpCompress.Test/SharpCompress.Test.csproj -f net10.0
# Custom build script (Bullseye-based)
dotnet run --project build/build.csproj -- test
Code Formatting (REQUIRED)
# Restore CSharpier tool
dotnet tool restore
# Format code (MUST run before committing)
dotnet csharpier .
# Check formatting
dotnet csharpier check .
Never commit without running dotnet csharpier . from project root.
VS Code Tasks
- Build:
Ctrl+Shift+B(Cmd+Shift+B on Mac) - Test: Use "test" task or F5 to debug tests
- Format: "format" task runs CSharpier
Debugging Features
When building with DEBUG_STREAMS constant (enabled for net10.0 Debug builds):
- Stream operations emit debug information
- Helps trace stream lifecycle and disposal issues
- See
#if DEBUG_STREAMSblocks in stream classes
Code Conventions
Nullable Reference Types
- All variables are non-nullable by default
- Check for
nullonly at entry points (public APIs) - Always use
is nulloris not null(never== nullor!= null) - Trust C# null annotations - don't add redundant null checks
- Extension method:
value.NotNull(nameof(value))validates parameters
Async/Await Patterns
- All I/O operations support async/await with
CancellationToken - Naming: Async methods end with
Asyncsuffix - Key async methods:
WriteEntryToAsync(stream, cancellationToken)WriteAllToDirectoryAsync(path, options, cancellationToken)OpenEntryStreamAsync(cancellationToken)MoveToNextEntryAsync(cancellationToken)
- Always provide
CancellationTokenparameter in new async methods
C# Style
- Use latest C# features (currently C# 14)
- File-scoped namespaces required (
namespace SharpCompress.Archives;) varfor all local variables unless type clarity is critical- Expression-bodied members preferred for simple operations
- Private fields use
_camelCaseprefix (enforced by .editorconfig) - Constants use
CONSTANT_CASE(all caps with underscores)
Testing Patterns
Test Organization
- Base class:
TestBase- ProvidesTEST_ARCHIVES_PATH,SCRATCH_FILES_PATH, temp directory management - Framework: xUnit with AwesomeAssertions
- Test archives:
tests/TestArchives/- Use existing archives, don't create new ones unnecessarily - Never emit "Arrange", "Act", "Assert" comments - code should be self-documenting
- Match naming style of nearby test files
Common Test Patterns
public class MyFormatTests : TestBase
{
[Fact]
public void ExtractTest()
{
var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");
using (var archive = ZipArchive.Open(testArchive))
using (var reader = archive.ExtractAllEntries())
{
reader.WriteAllToDirectory(SCRATCH_FILES_PATH,
new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
}
VerifyFiles(); // Compares against ORIGINAL_FILES_PATH
}
}
Critical Test Areas
- Test both Archive and Reader APIs when format supports both
- Test async operations with cancellation tokens
- Test stream disposal behavior (
LeaveStreamOpen) - Test with multiple target frameworks if behavior differs (net10.0 vs net48)
- Edge cases: empty archives, large files, encrypted archives, multi-volume
Format-Specific Knowledge
Tar Considerations
- Tar requires file size in header - If stream is non-seekable and no size provided, TarWriter throws
- Often combined with compression:
.tar.gz,.tar.bz2,.tar.xz,.tar.lz - Long filenames handled via GNU longlink extension
Zip Considerations
- Supports Zip64 for large files (seekable streams only)
- Encryption: PKWare and WinZip AES supported (except encrypted LZMA)
- Compression methods: DEFLATE (default), Deflate64 (read-only), BZip2, LZMA, PPMd, Shrink, Reduce, Implode
- Multi-volume Zip requires
ZipArchive(Reader can't seek across volumes) ZipReaderprocesses LocalEntry headers and intentionally skips DirectoryEntry headers (they're redundant in streaming mode)
Rar Considerations
- Read-only format (proprietary)
- RAR5 decryption supported but CRC check incomplete
- SOLID archives require sequential extraction for performance
7Zip Limitations
- No Reader/Writer API support (format doesn't support streaming)
- Archive API only - requires seekable stream
Project Structure
src/SharpCompress/
├── Archives/ # IArchive implementations (Zip, Tar, Rar, 7Zip, GZip)
├── Readers/ # IReader implementations (forward-only)
├── Writers/ # IWriter implementations (forward-only)
├── Compressors/ # Low-level compression streams (BZip2, Deflate, LZMA, etc.)
├── Factories/ # Format detection and factory pattern
├── Common/ # Shared types (ArchiveType, Entry, Options)
├── Crypto/ # Encryption implementations
└── IO/ # Stream utilities and wrappers
tests/SharpCompress.Test/
├── Zip/, Tar/, Rar/, SevenZip/, GZip/, BZip2/ # Format-specific tests
├── TestBase.cs # Base test class with helper methods
└── TestArchives/ # Test data (not checked into main test project)
Common Pitfalls
- Don't mix Archive and Reader APIs - Archive needs seekable stream, Reader doesn't
- Solid archives (Rar, 7Zip) - Use
ExtractAllEntries()for best performance, not individual entry extraction - Stream disposal - Always set
LeaveStreamOpenexplicitly when needed (default is to close) - Tar + non-seekable stream - Must provide file size or it will throw
- Multi-framework differences - Some features differ between .NET Framework and modern .NET (e.g., Mono.Posix)
- Format detection - Use
ReaderFactory.Open()for auto-detection, test with actual archive files
Performance Considerations
- Use Reader/Writer APIs for large files to avoid loading entire file in memory
- Leverage async I/O for better scalability
- For solid archives (Rar, 7Zip), sequential extraction is significantly faster
- Consider compression level trade-offs when writing (speed vs size)
References
- FORMATS.md - Complete format support matrix
- USAGE.md - API usage examples
- AGENTS.md - Detailed coding conventions