# SharpCompress Architecture Guide

This guide explains the internal architecture and design patterns of SharpCompress for contributors.

## Overview

SharpCompress is organized into three main layers:

```
┌─────────────────────────────────────────┐
│  User-Facing APIs (Top Layer)           │
│  Archive, Reader, Writer Factories      │
├─────────────────────────────────────────┤
│  Format-Specific Implementations        │
│  ZipArchive, TarReader, GZipWriter,     │
│  RarArchive, SevenZipArchive, etc.      │
├─────────────────────────────────────────┤
│  Compression & Crypto (Bottom Layer)    │
│  Deflate, LZMA, BZip2, AES, CRC32       │
└─────────────────────────────────────────┘
```

---

## Directory Structure

### `src/SharpCompress/`

#### `Archives/` - Archive Implementations

Contains `IArchive` implementations for seekable, random-access APIs.

**Key Files:**
- `AbstractArchive.cs` - Base class for all archives
- `IArchive.cs` - Archive interface definition
- `ArchiveFactory.cs` - Factory for opening archives
- Format-specific: `ZipArchive.cs`, `TarArchive.cs`, `RarArchive.cs`, `SevenZipArchive.cs`, `GZipArchive.cs`

**Use Archive API when:**
- Stream is seekable (file, memory)
- Need random access to entries
- Archive fits in memory
- Simplicity is important

#### `Readers/` - Reader Implementations

Contains `IReader` implementations for forward-only, non-seekable APIs.

**Key Files:**
- `AbstractReader.cs` - Base reader class
- `IReader.cs` - Reader interface
- `ReaderFactory.cs` - Auto-detection factory
- `ReaderOptions.cs` - Configuration for readers
- Format-specific: `ZipReader.cs`, `TarReader.cs`, `GZipReader.cs`, `RarReader.cs`, etc.

**Use Reader API when:**
- Stream is non-seekable (network, pipe, compressed)
- Processing large files
- Memory is limited
- Forward-only processing is acceptable

#### `Writers/` - Writer Implementations

Contains `IWriter` implementations for forward-only writing.

**Key Files:**
- `AbstractWriter.cs` - Base writer class
- `IWriter.cs` - Writer interface
- `WriterFactory.cs` - Factory for creating writers
- `WriterOptions.cs` - Configuration for writers
- Format-specific: `ZipWriter.cs`, `TarWriter.cs`, `GZipWriter.cs`

#### `Factories/` - Format Detection

Factory classes for auto-detecting archive format and creating appropriate readers/writers.

**Key Files:**
- `Factory.cs` - Base factory class
- `IFactory.cs` - Factory interface
- Format-specific: `ZipFactory.cs`, `TarFactory.cs`, `RarFactory.cs`, etc.

**How It Works:**
1. `ReaderFactory.OpenReader(stream)` probes stream signatures
2. Identifies format by magic bytes
3. Creates appropriate reader instance
4. Returns generic `IReader` interface

#### `Common/` - Shared Types

Common types, options, and enumerations used across formats.

**Key Files:**
- `IEntry.cs` - Entry interface (file within archive)
- `Entry.cs` - Entry implementation
- `ArchiveType.cs` - Enum for archive formats
- `CompressionType.cs` - Enum for compression methods
- `ArchiveEncoding.cs` - Character encoding configuration
- `ExtractionOptions.cs` - Extraction configuration
- Format-specific headers: `Zip/Headers/`, `Tar/Headers/`, `Rar/Headers/`, etc.
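
The option types listed above are how callers configure both APIs. A minimal usage sketch, assuming the commonly documented members of `ReaderOptions`, `ExtractionOptions`, and `ArchiveEncoding` (`Password`, `ArchiveEncoding.Default`, `ExtractFullPath`, `Overwrite`) and an options overload of the `ReaderFactory.OpenReader` entry point named above:

```csharp
using System.IO;
using System.Text;
using SharpCompress.Common;
using SharpCompress.Readers;

// How the archive is read (assumed property names: Password, ArchiveEncoding)
var readerOptions = new ReaderOptions
{
    Password = "secret", // only needed for encrypted archives
    ArchiveEncoding = new ArchiveEncoding { Default = Encoding.UTF8 },
};

// How entries are written to disk (assumed property names: ExtractFullPath, Overwrite)
var extractionOptions = new ExtractionOptions
{
    ExtractFullPath = true, // recreate the directory structure
    Overwrite = true,       // replace existing files
};

using var stream = File.OpenRead("archive.zip");
using var reader = ReaderFactory.OpenReader(stream, readerOptions);
while (reader.MoveToNextEntry())
{
    if (!reader.Entry.IsDirectory)
    {
        reader.WriteEntryToDirectory(@"C:\output", extractionOptions);
    }
}
```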

#### `Compressors/` - Compression Algorithms

Low-level compression streams implementing specific algorithms.

**Algorithms:**
- `Deflate/` - DEFLATE compression (Zip default)
- `BZip2/` - BZip2 compression
- `LZMA/` - LZMA compression (7Zip, XZ, LZip)
- `PPMd/` - Prediction by Partial Matching (Zip, 7Zip)
- `ZStandard/` - ZStandard compression (decompression only)
- `Xz/` - XZ format (decompression only)
- `Rar/` - RAR-specific unpacking
- `Arj/`, `Arc/`, `Ace/` - Legacy format decompression
- `Filters/` - BCJ/BCJ2 filters for executable compression

**Each Compressor:**
- Implements a `Stream` subclass
- Provides both compression and decompression
- Some are read-only (decompression only)

#### `Crypto/` - Encryption & Hashing

Cryptographic functions and stream wrappers.

**Key Files:**
- `Crc32Stream.cs` - CRC32 calculation wrapper
- `BlockTransformer.cs` - Block cipher transformations
- AES, PKWare, WinZip encryption implementations

#### `IO/` - Stream Utilities

Stream wrappers and utilities.

**Key Classes:**
- `SharpCompressStream` - Base stream class
- `ProgressReportingStream` - Progress tracking wrapper
- `MarkingBinaryReader` - Binary reader with position marks
- `BufferedSubStream` - Buffered read-only substream
- `ReadOnlySubStream` - Read-only view of parent stream
- `NonDisposingStream` - Prevents wrapped stream disposal

---

## Design Patterns

### 1. Factory Pattern

**Purpose:** Auto-detect format and create appropriate reader/writer.

**Example:**

```csharp
// User calls factory
using (var reader = ReaderFactory.OpenReader(stream)) // Returns IReader
{
    while (reader.MoveToNextEntry())
    {
        // Process entry
    }
}

// Behind the scenes:
// 1. Factory.Open() probes stream signatures
// 2. Detects format (Zip, Tar, Rar, etc.)
// 3. Creates appropriate reader (ZipReader, TarReader, etc.)
// 4. Returns as generic IReader interface
```

**Files:**
- `src/SharpCompress/Readers/ReaderFactory.cs`
- `src/SharpCompress/Writers/WriterFactory.cs`
- `src/SharpCompress/Archives/ArchiveFactory.cs`

### 2. Strategy Pattern

**Purpose:** Encapsulate compression algorithms as swappable strategies.

**Example:**

```csharp
// Different compression strategies
CompressionType.Deflate  // DEFLATE
CompressionType.BZip2    // BZip2
CompressionType.LZMA     // LZMA
CompressionType.PPMd     // PPMd

// Writer uses strategy pattern
var archive = ZipArchive.CreateArchive();
archive.SaveTo("output.zip", CompressionType.Deflate); // Use Deflate
archive.SaveTo("output.bz2", CompressionType.BZip2);   // Use BZip2
```

**Files:**
- `src/SharpCompress/Compressors/` - Strategy implementations

### 3. Decorator Pattern

**Purpose:** Wrap streams with additional functionality.

**Example:**

```csharp
// Progress reporting decorator
var progressStream = new ProgressReportingStream(baseStream, progressReporter);
progressStream.Read(buffer, 0, buffer.Length); // Reports progress

// Non-disposing decorator
var nonDisposingStream = new NonDisposingStream(baseStream);
using (var compressor = new DeflateStream(nonDisposingStream, CompressionMode.Compress))
{
    // baseStream won't be disposed when compressor is disposed
}
```

**Files:**
- `src/SharpCompress/IO/ProgressReportingStream.cs`
- `src/SharpCompress/IO/NonDisposingStream.cs`
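
The `DeflateStream` used in this example is one of the compressor streams from `Compressors/`, and those streams can also be used on their own, outside the archive and reader layers. A minimal decompression sketch, assuming the `SharpCompress.Compressors.Deflate.DeflateStream` type and its `(Stream, CompressionMode)` constructor:

```csharp
using System.IO;
using SharpCompress.Compressors;
using SharpCompress.Compressors.Deflate;

// Wrap a raw DEFLATE stream directly, with no archive or reader layer involved
using var compressed = File.OpenRead("payload.deflate"); // hypothetical input file
using var decompressor = new DeflateStream(compressed, CompressionMode.Decompress);
using var output = File.Create("payload.bin");

decompressor.CopyTo(output); // reads compressed bytes, writes decompressed bytes
```

The same approach applies to the other compressor namespaces (`BZip2`, `LZMA`, etc.), which is what the Strategy pattern above relies on.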

### 4. Template Method Pattern

**Purpose:** Define the algorithm skeleton in a base class and let subclasses fill in the details.

**Example:**

```csharp
// AbstractArchive defines common archive operations
public abstract class AbstractArchive : IArchive
{
    // Template method
    public virtual void WriteToDirectory(string destinationDirectory, ExtractionOptions options)
    {
        // Common extraction logic
        foreach (var entry in Entries)
        {
            // Call subclass method
            entry.WriteToFile(Path.Combine(destinationDirectory, entry.Key), options);
        }
    }

    // Subclasses override format-specific details
    protected abstract Entry CreateEntry(EntryData data);
}
```

**Files:**
- `src/SharpCompress/Archives/AbstractArchive.cs`
- `src/SharpCompress/Readers/AbstractReader.cs`

### 5. Iterator Pattern

**Purpose:** Provide sequential access to entries.

**Example:**

```csharp
// Archive API - provides a collection
IEnumerable<IEntry> entries = archive.Entries;
foreach (var entry in entries)
{
    // Random access - entries already in memory
}

// Reader API - provides an iterator
IReader reader = ReaderFactory.OpenReader(stream);
while (reader.MoveToNextEntry())
{
    // Forward-only iteration - one entry at a time
    var entry = reader.Entry;
}
```

---

## Key Interfaces

### IArchive - Random Access API

```csharp
public interface IArchive : IDisposable
{
    IEnumerable<IEntry> Entries { get; }
    void WriteToDirectory(string destinationDirectory, ExtractionOptions options = null);
    IEntry FirstOrDefault(Func<IEntry, bool> predicate);
    // ... format-specific methods
}
```

**Implementations:** `ZipArchive`, `TarArchive`, `RarArchive`, `SevenZipArchive`, `GZipArchive`

### IReader - Forward-Only API

```csharp
public interface IReader : IDisposable
{
    IEntry Entry { get; }
    bool MoveToNextEntry();
    void WriteEntryToDirectory(string destinationDirectory, ExtractionOptions options = null);
    Stream OpenEntryStream();
    // ... async variants
}
```

**Implementations:** `ZipReader`, `TarReader`, `GZipReader`, `RarReader`, etc.

### IWriter - Writing API

```csharp
public interface IWriter : IDisposable
{
    void Write(string entryPath, Stream source, DateTime? modificationTime = null);
    void WriteAll(string sourceDirectory, string searchPattern, SearchOption searchOption);
    // ... async variants
}
```

**Implementations:** `ZipWriter`, `TarWriter`, `GZipWriter`

### IEntry - Archive Entry

```csharp
public interface IEntry
{
    string Key { get; }
    uint Size { get; }
    uint CompressedSize { get; }
    bool IsDirectory { get; }
    DateTime? LastModifiedTime { get; }
    CompressionType CompressionType { get; }
    void WriteToFile(string fullPath, ExtractionOptions options = null);
    void WriteToStream(Stream destinationStream);
    Stream OpenEntryStream();
    // ... async variants
}
```
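
Of the interfaces above, `IWriter` is the only one without a usage example elsewhere in this guide, so here is a minimal sketch. It assumes a `WriterFactory.Open(stream, archiveType, writerOptions)` signature and a `WriterOptions(CompressionType)` constructor; check the actual factory surface in `Writers/` before relying on it:

```csharp
using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers;

// Create a Zip writer over an output stream (assumed factory signature)
using var output = File.Create("output.zip");
using var writer = WriterFactory.Open(output, ArchiveType.Zip, new WriterOptions(CompressionType.Deflate));

// Add a single entry from a source stream
using var source = File.OpenRead("report.txt"); // hypothetical input file
writer.Write("docs/report.txt", source, DateTime.Now);
```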

---

## Adding Support for a New Format

### Step 1: Understand the Format

- Research the format specification
- Understand the compression/encryption used
- Study existing similar formats in the codebase

### Step 2: Create Format Structure Classes

**Create:** `src/SharpCompress/Common/NewFormat/`

```csharp
// Headers and data structures
public class NewFormatHeader
{
    public uint Magic { get; set; }
    public ushort Version { get; set; }
    // ... other fields

    public static NewFormatHeader Read(BinaryReader reader)
    {
        // Deserialize from binary
    }
}

public class NewFormatEntry
{
    public string FileName { get; set; }
    public uint CompressedSize { get; set; }
    public uint UncompressedSize { get; set; }
    // ... other fields
}
```

### Step 3: Create Archive Implementation

**Create:** `src/SharpCompress/Archives/NewFormat/NewFormatArchive.cs`

```csharp
public class NewFormatArchive : AbstractArchive
{
    private NewFormatHeader _header;
    private List<NewFormatEntry> _entries;

    public static NewFormatArchive OpenArchive(Stream stream)
    {
        var archive = new NewFormatArchive();
        archive._header = NewFormatHeader.Read(new BinaryReader(stream));
        archive.LoadEntries(stream);
        return archive;
    }

    public override IEnumerable<IEntry> Entries => _entries.Select(e => new Entry(e));

    protected override Stream OpenEntryStream(Entry entry)
    {
        // Return decompressed stream for entry
    }

    // ... other abstract method implementations
}
```

### Step 4: Create Reader Implementation

**Create:** `src/SharpCompress/Readers/NewFormat/NewFormatReader.cs`

```csharp
public class NewFormatReader : AbstractReader
{
    private NewFormatHeader _header;
    private BinaryReader _reader;

    public NewFormatReader(Stream stream)
    {
        _reader = new BinaryReader(stream);
        _header = NewFormatHeader.Read(_reader);
    }

    public override bool MoveToNextEntry()
    {
        // Read next entry header
        if (!_reader.BaseStream.CanRead)
            return false;

        var entryData = NewFormatEntry.Read(_reader);
        // ... set this.Entry
        return entryData != null;
    }

    // ... other abstract method implementations
}
```

### Step 5: Create Factory

**Create:** `src/SharpCompress/Factories/NewFormatFactory.cs`

```csharp
public class NewFormatFactory : Factory, IArchiveFactory, IReaderFactory
{
    // Archive format magic bytes (signature)
    private static readonly byte[] NewFormatSignature = new byte[] { 0x4E, 0x46 }; // "NF"

    public static NewFormatFactory Instance { get; } = new();

    public IArchive CreateArchive(Stream stream) =>
        NewFormatArchive.OpenArchive(stream);

    public IReader CreateReader(Stream stream, ReaderOptions options) =>
        new NewFormatReader(stream) { Options = options };

    public bool Matches(Stream stream, ReadOnlySpan<byte> signature) =>
        signature.StartsWith(NewFormatSignature);
}
```

### Step 6: Register Factory

**Update:** `src/SharpCompress/Factories/ArchiveFactory.cs`

```csharp
private static readonly IFactory[] Factories =
{
    ZipFactory.Instance,
    TarFactory.Instance,
    RarFactory.Instance,
    SevenZipFactory.Instance,
    GZipFactory.Instance,
    NewFormatFactory.Instance, // Add here
    // ... other factories
};
```

### Step 7: Add Tests

**Create:** `tests/SharpCompress.Test/NewFormat/NewFormatTests.cs`

```csharp
public class NewFormatTests : TestBase
{
    [Fact]
    public void NewFormat_Extracts_Successfully()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var archive = NewFormatArchive.OpenArchive(stream))
        {
            archive.WriteToDirectory(SCRATCH_FILES_PATH);
            // Assert extraction
        }
    }

    [Fact]
    public void NewFormat_Reader_Works()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var reader = new NewFormatReader(stream))
        {
            Assert.True(reader.MoveToNextEntry());
            Assert.NotNull(reader.Entry);
        }
    }
}
```

### Step 8: Add Test Archives

Place test files in the `tests/TestArchives/Archives/NewFormat/` directory.

### Step 9: Document

Update `docs/FORMATS.md` with format support information.
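
Once the factory is registered (Step 6), the new format should be reachable through the generic entry points without referencing `NewFormatReader` directly. A quick sanity-check sketch, reusing the hypothetical `NewFormat` names from the steps above and the `ReaderFactory.OpenReader` naming used in this guide:

```csharp
using System;
using System.IO;
using SharpCompress.Readers;

// Auto-detection should now route the new signature to NewFormatReader
using var stream = File.OpenRead("archive.newformat"); // hypothetical test file
using var reader = ReaderFactory.OpenReader(stream);

Console.WriteLine(reader.GetType().Name); // expected: NewFormatReader

while (reader.MoveToNextEntry())
{
    Console.WriteLine(reader.Entry.Key);
}
```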

---

## Compression Algorithm Implementation

### Creating a New Compression Stream

**Example:** Creating `CustomStream` for a custom compression algorithm

```csharp
public class CustomStream : Stream
{
    private readonly Stream _baseStream;
    private readonly bool _leaveOpen;

    public CustomStream(Stream baseStream, bool leaveOpen = false)
    {
        _baseStream = baseStream;
        _leaveOpen = leaveOpen;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Decompress data from _baseStream into buffer
        // Return number of decompressed bytes
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Compress data from buffer into _baseStream
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && !_leaveOpen)
        {
            _baseStream?.Dispose();
        }
        base.Dispose(disposing);
    }

    // Remaining Stream members (CanRead, CanWrite, CanSeek, Length, Position,
    // Flush, Seek, SetLength) must also be overridden; omitted here for brevity
}
```

---

## Stream Handling Best Practices

### Disposal Pattern

```csharp
// Correct: Nested using blocks
using (var fileStream = File.OpenRead("archive.zip"))
using (var archive = ZipArchive.OpenArchive(fileStream))
{
    archive.WriteToDirectory(@"C:\output");
}
// Both archive and fileStream properly disposed

// Correct: Using with options
var options = new ReaderOptions { LeaveStreamOpen = true };
var stream = File.OpenRead("archive.zip");
using (var archive = ZipArchive.OpenArchive(stream, options))
{
    archive.WriteToDirectory(@"C:\output");
}
stream.Dispose(); // Manually dispose when LeaveStreamOpen = true
```

### NonDisposingStream Wrapper

```csharp
// Prevent unwanted stream closure
var baseStream = File.OpenRead("data.bin");
var nonDisposing = new NonDisposingStream(baseStream);

using (var compressor = new DeflateStream(nonDisposing, CompressionMode.Compress))
{
    // Compressor won't close baseStream when disposed
}

// baseStream still usable
baseStream.Position = 0; // Works
baseStream.Dispose();    // Manual disposal
```

---

## Performance Considerations

### Memory Efficiency

1. **Avoid loading the entire archive into memory** - Use the Reader API for large files
2. **Process entries sequentially** - Especially important for solid archives
3. **Use appropriate buffer sizes** - Larger buffers for network I/O
4. **Dispose streams promptly** - Free resources when done

### Algorithm Selection

1. **Archive API** - Fast for small archives with random access
2. **Reader API** - Efficient for large files or streaming
3. **Solid archives** - Sequential extraction is much faster
4. **Compression levels** - Trade-off between speed and size

---

## Testing Guidelines

### Test Coverage

1. **Happy path** - Normal extraction works
2. **Edge cases** - Empty archives, single file, many files
3. **Corrupted data** - Handled gracefully
4. **Error cases** - Missing passwords, unsupported compression
5. **Async operations** - Both sync and async code paths

### Test Archives

- Use `tests/TestArchives/` for test data
- Create format-specific subdirectories
- Include encrypted, corrupted, and edge-case archives
- Don't recreate existing archives

### Test Patterns

```csharp
[Fact]
public void Archive_Extraction_Works()
{
    // Arrange
    var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");

    // Act
    using (var archive = ZipArchive.OpenArchive(testArchive))
    {
        archive.WriteToDirectory(SCRATCH_FILES_PATH);
    }

    // Assert
    Assert.True(File.Exists(Path.Combine(SCRATCH_FILES_PATH, "file.txt")));
}
```

---

## Related Documentation

- [AGENTS.md](../AGENTS.md) - Development guidelines
- [FORMATS.md](FORMATS.md) - Supported formats