SharpCompress Architecture Guide

This guide explains the internal architecture and design patterns of SharpCompress for contributors.

Overview

SharpCompress is organized into three main layers:

┌─────────────────────────────────────────┐
│     User-Facing APIs (Top Layer)        │
│  Archive, Reader, Writer Factories      │
├─────────────────────────────────────────┤
│     Format-Specific Implementations     │
│  ZipArchive, TarReader, GZipWriter,     │
│  RarArchive, SevenZipArchive, etc.      │
├─────────────────────────────────────────┤
│     Compression & Crypto (Bottom Layer) │
│  Deflate, LZMA, BZip2, AES, CRC32       │
└─────────────────────────────────────────┘

Directory Structure

src/SharpCompress/

Archives/ - Archive Implementations

Contains IArchive implementations for seekable, random-access APIs.

Key Files:

  • AbstractArchive.cs - Base class for all archives
  • IArchive.cs - Archive interface definition
  • ArchiveFactory.cs - Factory for opening archives
  • Format-specific: ZipArchive.cs, TarArchive.cs, RarArchive.cs, SevenZipArchive.cs, GZipArchive.cs

Use Archive API when:

  • Stream is seekable (file, memory)
  • Need random access to entries
  • Archive fits in memory
  • Simplicity is important
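
For example, with a seekable file stream the whole entry table is available immediately. A minimal sketch (OpenArchive and WriteToFile follow the naming used in this guide; the output path is illustrative):

using (var fileStream = File.OpenRead("archive.zip"))
using (var archive = ZipArchive.OpenArchive(fileStream))
{
    // Entries are indexed up front, so they can be visited in any order
    foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
    {
        entry.WriteToFile(Path.Combine(@"C:\output", entry.Key));
    }
}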

Readers/ - Reader Implementations

Contains IReader implementations for forward-only, non-seekable APIs.

Key Files:

  • AbstractReader.cs - Base reader class
  • IReader.cs - Reader interface
  • ReaderFactory.cs - Auto-detection factory
  • ReaderOptions.cs - Configuration for readers
  • Format-specific: ZipReader.cs, TarReader.cs, GZipReader.cs, RarReader.cs, etc.

Use Reader API when:

  • Stream is non-seekable (network, pipe, compressed)
  • Processing large files
  • Memory is limited
  • Forward-only processing is acceptable
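
A minimal sketch of forward-only extraction from a non-seekable source (OpenReader and WriteEntryToDirectory follow the naming used in this guide):

using (var stdin = Console.OpenStandardInput())   // non-seekable pipe
using (var reader = ReaderFactory.OpenReader(stdin))
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            // Forward-only: each entry is decompressed as it is reached
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }
}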

Writers/ - Writer Implementations

Contains IWriter implementations for forward-only writing.

Key Files:

  • AbstractWriter.cs - Base writer class
  • IWriter.cs - Writer interface
  • WriterFactory.cs - Factory for creating writers
  • WriterOptions.cs - Configuration for writers
  • Format-specific: ZipWriter.cs, TarWriter.cs, GZipWriter.cs
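
A minimal sketch of forward-only writing through the factory (the WriterFactory.Open overload and WriterOptions constructor shown here are assumptions based on the types listed above):

using (var output = File.OpenWrite("output.zip"))
using (var writer = WriterFactory.Open(output, ArchiveType.Zip, new WriterOptions(CompressionType.Deflate)))
using (var source = File.OpenRead("report.txt"))
{
    // Entries are written sequentially; the archive is never buffered whole in memory
    writer.Write("report.txt", source, DateTime.Now);
}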

Factories/ - Format Detection

Factory classes for auto-detecting archive format and creating appropriate readers/writers.

Key Files:

  • Factory.cs - Base factory class
  • IFactory.cs - Factory interface
  • Format-specific: ZipFactory.cs, TarFactory.cs, RarFactory.cs, etc.

How It Works:

  1. ReaderFactory.OpenReader(stream) probes stream signatures
  2. Identifies format by magic bytes
  3. Creates appropriate reader instance
  4. Returns generic IReader interface

Common/ - Shared Types

Common types, options, and enumerations used across formats.

Key Files:

  • IEntry.cs - Entry interface (file within archive)
  • Entry.cs - Entry implementation
  • ArchiveType.cs - Enum for archive formats
  • CompressionType.cs - Enum for compression methods
  • ArchiveEncoding.cs - Character encoding configuration
  • ExtractionOptions.cs - Extraction configuration
  • Format-specific headers: Zip/Headers/, Tar/Headers/, Rar/Headers/, etc.
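
These types mostly surface through the options objects. A sketch configuring entry-name decoding (the ArchiveEncoding.Default property is an assumption based on the file list above):

var options = new ReaderOptions
{
    ArchiveEncoding = new ArchiveEncoding
    {
        // Assumed property: encoding used for legacy (non-UTF-8) entry names
        Default = Encoding.Latin1
    }
};

using (var stream = File.OpenRead("legacy.zip"))
using (var reader = ReaderFactory.OpenReader(stream, options))
{
    // reader.Entry.Key is now decoded with the configured encoding
}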

Compressors/ - Compression Algorithms

Low-level compression streams implementing specific algorithms.

Algorithms:

  • Deflate/ - DEFLATE compression (Zip default)
  • BZip2/ - BZip2 compression
  • LZMA/ - LZMA compression (7Zip, XZ, LZip)
  • PPMd/ - Prediction by Partial Matching (Zip, 7Zip)
  • ZStandard/ - ZStandard compression (decompression only)
  • Xz/ - XZ format (decompression only)
  • Rar/ - RAR-specific unpacking
  • Arj/, Arc/, Ace/ - Legacy format decompression
  • Filters/ - BCJ/BCJ2 filters for executable compression

Each Compressor:

  • Implements a Stream subclass
  • Provides compression, decompression, or both
  • Several are decompression-only (e.g. ZStandard, XZ, RAR)
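
A compressor stream can also be used directly, outside any container format. A sketch decompressing a raw .bz2 file (the BZip2Stream constructor arguments are assumed; check the class in Compressors/BZip2/ for the exact signature):

using (var input = File.OpenRead("data.bz2"))
using (var bzip2 = new BZip2Stream(input, CompressionMode.Decompress, decompressConcatenated: false))
using (var output = File.Create("data.bin"))
{
    // The compressor is just a Stream, so standard copying works
    bzip2.CopyTo(output);
}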

Crypto/ - Encryption & Hashing

Cryptographic functions and stream wrappers.

Key Files:

  • Crc32Stream.cs - CRC32 calculation wrapper
  • BlockTransformer.cs - Block cipher transformations
  • AES, PKWare, WinZip encryption implementations
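
Callers rarely touch these classes directly; encryption is driven by options. A sketch of reading a password-protected archive (the Password property is assumed on ReaderOptions):

var options = new ReaderOptions { Password = "secret" };

using (var stream = File.OpenRead("encrypted.zip"))
using (var reader = ReaderFactory.OpenReader(stream, options))
{
    while (reader.MoveToNextEntry())
    {
        // AES / PKWare / WinZip decryption is applied transparently per entry
        reader.WriteEntryToDirectory(@"C:\output");
    }
}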

IO/ - Stream Utilities

Stream wrappers and utilities.

Key Classes:

  • SharpCompressStream - Base stream class
  • ProgressReportingStream - Progress tracking wrapper
  • MarkingBinaryReader - Binary reader with position marks
  • BufferedSubStream - Buffered read-only substream
  • ReadOnlySubStream - Read-only view of parent stream
  • NonDisposingStream - Prevents wrapped stream disposal

Design Patterns

1. Factory Pattern

Purpose: Auto-detect format and create appropriate reader/writer.

Example:

// User calls factory
using (var reader = ReaderFactory.OpenReader(stream))  // Returns IReader
{
    while (reader.MoveToNextEntry())
    {
        // Process entry
    }
}

// Behind the scenes:
// 1. Factory.Open() probes stream signatures
// 2. Detects format (Zip, Tar, Rar, etc.)
// 3. Creates appropriate reader (ZipReader, TarReader, etc.)
// 4. Returns as generic IReader interface

Files:

  • src/SharpCompress/Factories/ReaderFactory.cs
  • src/SharpCompress/Factories/WriterFactory.cs
  • src/SharpCompress/Factories/ArchiveFactory.cs

2. Strategy Pattern

Purpose: Encapsulate compression algorithms as swappable strategies.

Example:

// Different compression strategies
CompressionType.Deflate     // DEFLATE
CompressionType.BZip2       // BZip2
CompressionType.LZMA        // LZMA
CompressionType.PPMd        // PPMd

// Writer uses strategy pattern
var archive = ZipArchive.CreateArchive();
archive.SaveTo("deflate.zip", CompressionType.Deflate);  // Zip container, Deflate-compressed entries
archive.SaveTo("bzip2.zip", CompressionType.BZip2);      // Zip container, BZip2-compressed entries

Files:

  • src/SharpCompress/Compressors/ - Strategy implementations

3. Decorator Pattern

Purpose: Wrap streams with additional functionality.

Example:

// Progress reporting decorator
var progressStream = new ProgressReportingStream(baseStream, progressReporter);
progressStream.Read(buffer, 0, buffer.Length);  // Reports progress

// Non-disposing decorator
var nonDisposingStream = new NonDisposingStream(baseStream);
using (var compressor = new DeflateStream(nonDisposingStream))
{
    // baseStream won't be disposed when compressor is disposed
}

Files:

  • src/SharpCompress/IO/ProgressReportingStream.cs
  • src/SharpCompress/IO/NonDisposingStream.cs

4. Template Method Pattern

Purpose: Define algorithm skeleton in base class, let subclasses fill details.

Example:

// AbstractArchive defines common archive operations
public abstract class AbstractArchive : IArchive
{
    // Template methods
    public virtual void WriteToDirectory(string destinationDirectory, ExtractionOptions options)
    {
        // Common extraction logic
        foreach (var entry in Entries)
        {
            var destinationPath = Path.Combine(destinationDirectory, entry.Key);
            // Subclass/entry-specific extraction
            entry.WriteToFile(destinationPath, options);
        }
    }
    
    // Subclasses override format-specific details
    protected abstract Entry CreateEntry(EntryData data);
}

Files:

  • src/SharpCompress/Archives/AbstractArchive.cs
  • src/SharpCompress/Readers/AbstractReader.cs

5. Iterator Pattern

Purpose: Provide sequential access to entries.

Example:

// Archive API - provides collection
IEnumerable<IEntry> entries = archive.Entries;
foreach (var entry in entries)
{
    // Random access - entries already in memory
}

// Reader API - provides iterator
IReader reader = ReaderFactory.OpenReader(stream);
while (reader.MoveToNextEntry())
{
    // Forward-only iteration - one entry at a time
    var entry = reader.Entry;
}

Key Interfaces

IArchive - Random Access API

public interface IArchive : IDisposable
{
    IEnumerable<IEntry> Entries { get; }
    
    void WriteToDirectory(string destinationDirectory, 
                         ExtractionOptions options = null);
    
    IEntry FirstOrDefault(Func<IEntry, bool> predicate);
    
    // ... format-specific methods
}

Implementations: ZipArchive, TarArchive, RarArchive, SevenZipArchive, GZipArchive

IReader - Forward-Only API

public interface IReader : IDisposable
{
    IEntry Entry { get; }
    
    bool MoveToNextEntry();
    
    void WriteEntryToDirectory(string destinationDirectory,
                              ExtractionOptions options = null);
    
    Stream OpenEntryStream();
    
    // ... async variants
}

Implementations: ZipReader, TarReader, RarReader, GZipReader, etc.

IWriter - Writing API

public interface IWriter : IDisposable
{
    void Write(string entryPath, Stream source, 
              DateTime? modificationTime = null);
    
    void WriteAll(string sourceDirectory, string searchPattern,
                 SearchOption searchOption);
    
    // ... async variants
}

Implementations: ZipWriter, TarWriter, GZipWriter

IEntry - Archive Entry

public interface IEntry
{
    string Key { get; }
    long Size { get; }
    long CompressedSize { get; }
    bool IsDirectory { get; }
    DateTime? LastModifiedTime { get; }
    CompressionType CompressionType { get; }
    
    void WriteToFile(string fullPath, ExtractionOptions options = null);
    void WriteToStream(Stream destinationStream);
    Stream OpenEntryStream();
    
    // ... async variants
}
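
Entry metadata typically drives extraction decisions; a short sketch filtering on Key and Size before extracting (member names follow the interface above):

foreach (var entry in archive.Entries)
{
    // Skip directories and anything larger than 100 MB
    if (entry.IsDirectory || entry.Size > 100 * 1024 * 1024)
    {
        continue;
    }

    if (entry.Key.EndsWith(".log", StringComparison.OrdinalIgnoreCase))
    {
        entry.WriteToFile(Path.Combine(@"C:\output", entry.Key));
    }
}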

Adding Support for a New Format

Step 1: Understand the Format

  • Research format specification
  • Understand compression/encryption used
  • Study existing similar formats in codebase

Step 2: Create Format Structure Classes

Create: src/SharpCompress/Common/NewFormat/

// Headers and data structures
public class NewFormatHeader
{
    public uint Magic { get; set; }
    public ushort Version { get; set; }
    // ... other fields
    
    public static NewFormatHeader Read(BinaryReader reader)
    {
        // Deserialize from binary
    }
}

public class NewFormatEntry
{
    public string FileName { get; set; }
    public uint CompressedSize { get; set; }
    public uint UncompressedSize { get; set; }
    // ... other fields
    
    public static NewFormatEntry Read(BinaryReader reader)
    {
        // Deserialize the next entry record; return null at end of archive
    }
}

Step 3: Create Archive Implementation

Create: src/SharpCompress/Archives/NewFormat/NewFormatArchive.cs

public class NewFormatArchive : AbstractArchive
{
    private NewFormatHeader _header;
    private List<NewFormatEntry> _entries;
    
    public static NewFormatArchive OpenArchive(Stream stream)
    {
        var archive = new NewFormatArchive();
        archive._header = NewFormatHeader.Read(new BinaryReader(stream));
        archive.LoadEntries(stream);
        return archive;
    }
    
    public override IEnumerable<IEntry> Entries => _entries.Select(e => new Entry(e));
    
    protected override Stream OpenEntryStream(Entry entry)
    {
        // Return decompressed stream for entry
    }
    
    // ... other abstract method implementations
}

Step 4: Create Reader Implementation

Create: src/SharpCompress/Readers/NewFormat/NewFormatReader.cs

public class NewFormatReader : AbstractReader
{
    private NewFormatHeader _header;
    private BinaryReader _reader;
    
    public NewFormatReader(Stream stream)
    {
        _reader = new BinaryReader(stream);
        _header = NewFormatHeader.Read(_reader);
    }
    
    public override bool MoveToNextEntry()
    {
        // Read the next entry header; Read returns null at end of archive
        var entryData = NewFormatEntry.Read(_reader);
        if (entryData == null)
        {
            return false;
        }
        
        // ... set this.Entry from entryData
        return true;
    }
    
    // ... other abstract method implementations
}

Step 5: Create Factory

Create: src/SharpCompress/Factories/NewFormatFactory.cs

public class NewFormatFactory : Factory, IArchiveFactory, IReaderFactory
{
    // Archive format magic bytes (signature)
    private static readonly byte[] NewFormatSignature = new byte[] { 0x4E, 0x46 };  // "NF"
    
    public static NewFormatFactory Instance { get; } = new();
    
    public IArchive CreateArchive(Stream stream)
        => NewFormatArchive.OpenArchive(stream);
    
    public IReader CreateReader(Stream stream, ReaderOptions options)
        => new NewFormatReader(stream) { Options = options };
    
    public bool Matches(Stream stream, ReadOnlySpan<byte> signature) 
        => signature.StartsWith(NewFormatSignature);
}

Step 6: Register Factory

Update: src/SharpCompress/Factories/ArchiveFactory.cs

private static readonly IFactory[] Factories = 
{
    ZipFactory.Instance,
    TarFactory.Instance,
    RarFactory.Instance,
    SevenZipFactory.Instance,
    GZipFactory.Instance,
    NewFormatFactory.Instance,  // Add here
    // ... other factories
};

Step 7: Add Tests

Create: tests/SharpCompress.Test/NewFormat/NewFormatTests.cs

public class NewFormatTests : TestBase
{
    [Fact]
    public void NewFormat_Extracts_Successfully()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var archive = NewFormatArchive.OpenArchive(stream))
        {
            archive.WriteToDirectory(SCRATCH_FILES_PATH);
            // Assert extraction
        }
    }
    
    [Fact]
    public void NewFormat_Reader_Works()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var reader = new NewFormatReader(stream))
        {
            Assert.True(reader.MoveToNextEntry());
            Assert.NotNull(reader.Entry);
        }
    }
}

Step 8: Add Test Archives

Place test files in tests/TestArchives/Archives/NewFormat/ directory.

Step 9: Document

Update docs/FORMATS.md with format support information.


Compression Algorithm Implementation

Creating a New Compression Stream

Example: Creating CustomStream for a custom compression algorithm

public class CustomStream : Stream
{
    private readonly Stream _baseStream;
    private readonly bool _leaveOpen;
    
    public CustomStream(Stream baseStream, bool leaveOpen = false)
    {
        _baseStream = baseStream;
        _leaveOpen = leaveOpen;
    }
    
    public override int Read(byte[] buffer, int offset, int count)
    {
        // Decompress data from _baseStream into buffer
        // Return number of decompressed bytes
    }
    
    public override void Write(byte[] buffer, int offset, int count)
    {
        // Compress data from buffer into _baseStream
    }
    
    protected override void Dispose(bool disposing)
    {
        if (disposing && !_leaveOpen)
        {
            _baseStream?.Dispose();
        }
        base.Dispose(disposing);
    }
    
    // Remaining abstract Stream members (CanRead, CanWrite, CanSeek, Length,
    // Position, Flush, Seek, SetLength) omitted for brevity
}
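
A hypothetical round trip with the sketch above, assuming Read decompresses and Write compresses as described:

// Compress into a memory buffer, then read it back
var buffer = new MemoryStream();
using (var compressor = new CustomStream(buffer, leaveOpen: true))
{
    var payload = Encoding.UTF8.GetBytes("hello world");
    compressor.Write(payload, 0, payload.Length);
}

buffer.Position = 0;
using (var decompressor = new CustomStream(buffer))
{
    var roundTrip = new byte[11];
    var bytesRead = decompressor.Read(roundTrip, 0, roundTrip.Length);
}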

Stream Handling Best Practices

Disposal Pattern

// Correct: Nested using blocks
using (var fileStream = File.OpenRead("archive.zip"))
using (var archive = ZipArchive.OpenArchive(fileStream))
{
    archive.WriteToDirectory(@"C:\output");
}
// Both archive and fileStream properly disposed

// Correct: Using with options
var options = new ReaderOptions { LeaveStreamOpen = true };
var stream = File.OpenRead("archive.zip");
using (var archive = ZipArchive.OpenArchive(stream, options))
{
    archive.WriteToDirectory(@"C:\output");
}
stream.Dispose();  // Manually dispose if LeaveStreamOpen = true

NonDisposingStream Wrapper

// Prevent unwanted stream closure
var baseStream = File.OpenRead("data.bin");
var nonDisposing = new NonDisposingStream(baseStream);

using (var compressor = new DeflateStream(nonDisposing))
{
    // Compressor won't close baseStream when disposed
}

// baseStream still usable
baseStream.Position = 0;  // Works
baseStream.Dispose();  // Manual disposal

Performance Considerations

Memory Efficiency

  1. Avoid loading entire archive in memory - Use Reader API for large files
  2. Process entries sequentially - Especially for solid archives
  3. Use appropriate buffer sizes - Larger buffers for network I/O
  4. Dispose streams promptly - Free resources when done
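
When the source is a slow or network stream, putting a larger read buffer in front of the reader is often enough. A sketch using the standard BufferedStream (the 1 MB size is illustrative, and networkStream is a hypothetical placeholder):

// networkStream is any slow, non-seekable source
using (var buffered = new BufferedStream(networkStream, 1024 * 1024))  // 1 MB read buffer
using (var reader = ReaderFactory.OpenReader(buffered))
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(@"C:\output");
    }
}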

Algorithm Selection

  1. Archive API - Fast for small archives with random access
  2. Reader API - Efficient for large files or streaming
  3. Solid archives - Sequential extraction much faster
  4. Compression levels - Trade-off between speed and size

Testing Guidelines

Test Coverage

  1. Happy path - Normal extraction works
  2. Edge cases - Empty archives, single file, many files
  3. Corrupted data - Handle gracefully
  4. Error cases - Missing passwords, unsupported compression
  5. Async operations - Both sync and async code paths

Test Archives

  • Use tests/TestArchives/ for test data
  • Create format-specific subdirectories
  • Include encrypted, corrupted, and edge case archives
  • Don't recreate existing archives

Test Patterns

[Fact]
public void Archive_Extraction_Works()
{
    // Arrange
    var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");
    
    // Act
    using (var archive = ZipArchive.OpenArchive(testArchive))
    {
        archive.WriteToDirectory(SCRATCH_FILES_PATH);
    }
    
    // Assert
    Assert.True(File.Exists(Path.Combine(SCRATCH_FILES_PATH, "file.txt")));
}