# SharpCompress Architecture Guide
This guide explains the internal architecture and design patterns of SharpCompress for contributors.
## Overview
SharpCompress is organized into three main layers:
```
┌─────────────────────────────────────────┐
│   User-Facing APIs (Top Layer)          │
│   Archive, Reader, Writer Factories     │
├─────────────────────────────────────────┤
│   Format-Specific Implementations       │
│   ZipArchive, TarReader, GZipWriter,    │
│   RarArchive, SevenZipArchive, etc.     │
├─────────────────────────────────────────┤
│   Compression & Crypto (Bottom Layer)   │
│   Deflate, LZMA, BZip2, AES, CRC32      │
└─────────────────────────────────────────┘
```
## Directory Structure

All directories below live under `src/SharpCompress/`.
### Archives/ - Archive Implementations

Contains `IArchive` implementations for seekable, random-access APIs.

**Key Files:**
- `AbstractArchive.cs` - Base class for all archives
- `IArchive.cs` - Archive interface definition
- `ArchiveFactory.cs` - Factory for opening archives
- Format-specific: `ZipArchive.cs`, `TarArchive.cs`, `RarArchive.cs`, `SevenZipArchive.cs`, `GZipArchive.cs`

**Use the Archive API when:**
- The stream is seekable (file, memory)
- You need random access to entries
- The archive fits in memory
- Simplicity is important
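As a quick illustration of this mode, a minimal sketch (the archive file name is illustrative):

```csharp
using System;
using SharpCompress.Archives.Zip;

// Seekable file: the whole entry list is available for random access
using (var archive = ZipArchive.Open("archive.zip"))
{
    foreach (var entry in archive.Entries)
    {
        if (!entry.IsDirectory)
        {
            Console.WriteLine($"{entry.Key}: {entry.Size} bytes");
        }
    }
}
```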
### Readers/ - Reader Implementations

Contains `IReader` implementations for forward-only, non-seekable APIs.

**Key Files:**
- `AbstractReader.cs` - Base reader class
- `IReader.cs` - Reader interface
- `ReaderFactory.cs` - Auto-detection factory
- `ReaderOptions.cs` - Configuration for readers
- Format-specific: `ZipReader.cs`, `TarReader.cs`, `GZipReader.cs`, `RarReader.cs`, etc.

**Use the Reader API when:**
- The stream is non-seekable (network, pipe, compressed)
- Processing large files
- Memory is limited
- Forward-only processing is acceptable
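For contrast, a minimal Reader API sketch, assuming `inputStream` is any readable (possibly non-seekable) `Stream`:

```csharp
using SharpCompress.Readers;

// Non-seekable source (e.g. a network response stream); format is auto-detected
using (var reader = ReaderFactory.Open(inputStream))
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            // Entries must be consumed in order; there is no seeking back
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }
}
```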
### Writers/ - Writer Implementations

Contains `IWriter` implementations for forward-only writing.

**Key Files:**
- `AbstractWriter.cs` - Base writer class
- `IWriter.cs` - Writer interface
- `WriterFactory.cs` - Factory for creating writers
- `WriterOptions.cs` - Configuration for writers
- Format-specific: `ZipWriter.cs`, `TarWriter.cs`, `GZipWriter.cs`
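A minimal writing sketch using the factory (file names and the Tar+GZip combination are illustrative):

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers;

// Entries are written forward-only, in the order they are added
using (var output = File.OpenWrite("output.tar.gz"))
using (var writer = WriterFactory.Open(output, ArchiveType.Tar,
           new WriterOptions(CompressionType.GZip)))
using (var source = File.OpenRead("readme.txt"))
{
    writer.Write("docs/readme.txt", source);
}
```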
### Factories/ - Format Detection

Factory classes for auto-detecting archive format and creating appropriate readers/writers.

**Key Files:**
- `Factory.cs` - Base factory class
- `IFactory.cs` - Factory interface
- Format-specific: `ZipFactory.cs`, `TarFactory.cs`, `RarFactory.cs`, etc.

**How It Works:**
1. `ReaderFactory.Open(stream)` probes stream signatures
2. Identifies the format by magic bytes
3. Creates the appropriate reader instance
4. Returns the generic `IReader` interface
### Common/ - Shared Types

Common types, options, and enumerations used across formats.

**Key Files:**
- `IEntry.cs` - Entry interface (file within archive)
- `Entry.cs` - Entry implementation
- `ArchiveType.cs` - Enum for archive formats
- `CompressionType.cs` - Enum for compression methods
- `ArchiveEncoding.cs` - Character encoding configuration
- `ExtractionOptions.cs` - Extraction configuration
- Format-specific headers: `Zip/Headers/`, `Tar/Headers/`, `Rar/Headers/`, etc.
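The encoding configuration above is typically supplied through reader options; a hedged sketch (the encoding choice is illustrative):

```csharp
using System.IO;
using System.Text;
using SharpCompress.Common;
using SharpCompress.Readers;

// Decode entry names with an explicit encoding instead of the default
var options = new ReaderOptions
{
    ArchiveEncoding = new ArchiveEncoding { Default = Encoding.UTF8 }
};

using (var stream = File.OpenRead("archive.zip"))
using (var reader = ReaderFactory.Open(stream, options))
{
    // reader.Entry.Key values are decoded with the configured encoding
}
```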
### Compressors/ - Compression Algorithms

Low-level compression streams implementing specific algorithms.

**Algorithms:**
- `Deflate/` - DEFLATE compression (Zip default)
- `BZip2/` - BZip2 compression
- `LZMA/` - LZMA compression (7Zip, XZ, LZip)
- `PPMd/` - Prediction by Partial Matching (Zip, 7Zip)
- `ZStandard/` - ZStandard compression (decompression only)
- `Xz/` - XZ format (decompression only)
- `Rar/` - RAR-specific unpacking
- `Arj/`, `Arc/`, `Ace/` - Legacy format decompression
- `Filters/` - BCJ/BCJ2 filters for executable compression

**Each compressor:**
- Implements a `Stream` subclass
- Provides compression and/or decompression
- Some are read-only (decompression only)
### Crypto/ - Encryption & Hashing

Cryptographic functions and stream wrappers.

**Key Files:**
- `Crc32Stream.cs` - CRC32 calculation wrapper
- `BlockTransformer.cs` - Block cipher transformations
- AES, PKWare, and WinZip encryption implementations
### IO/ - Stream Utilities

Stream wrappers and utilities.

**Key Classes:**
- `SharpCompressStream` - Base stream class
- `ProgressReportingStream` - Progress tracking wrapper
- `MarkingBinaryReader` - Binary reader with position marks
- `BufferedSubStream` - Buffered read-only substream
- `ReadOnlySubStream` - Read-only view of a parent stream
- `NonDisposingStream` - Prevents wrapped stream disposal
## Design Patterns

### 1. Factory Pattern

**Purpose:** Auto-detect the format and create the appropriate reader/writer.

**Example:**

```csharp
// User calls factory
using (var reader = ReaderFactory.Open(stream)) // Returns IReader
{
    while (reader.MoveToNextEntry())
    {
        // Process entry
    }
}

// Behind the scenes:
// 1. Factory.Open() probes stream signatures
// 2. Detects format (Zip, Tar, Rar, etc.)
// 3. Creates appropriate reader (ZipReader, TarReader, etc.)
// 4. Returns as generic IReader interface
```

**Files:**
- `src/SharpCompress/Factories/ReaderFactory.cs`
- `src/SharpCompress/Factories/WriterFactory.cs`
- `src/SharpCompress/Factories/ArchiveFactory.cs`
### 2. Strategy Pattern

**Purpose:** Encapsulate compression algorithms as swappable strategies.

**Example:**

```csharp
// Different compression strategies
CompressionType.Deflate // DEFLATE
CompressionType.BZip2   // BZip2
CompressionType.LZMA    // LZMA
CompressionType.PPMd    // PPMd

// Writer uses the selected strategy
var archive = ZipArchive.Create();
archive.SaveTo("output.zip", CompressionType.Deflate); // Use Deflate
archive.SaveTo("output.bz2", CompressionType.BZip2);   // Use BZip2
```

**Files:**
- `src/SharpCompress/Compressors/` - Strategy implementations
### 3. Decorator Pattern

**Purpose:** Wrap streams with additional functionality.

**Example:**

```csharp
// Progress reporting decorator
var progressStream = new ProgressReportingStream(baseStream, progressReporter);
progressStream.Read(buffer, 0, buffer.Length); // Reports progress

// Non-disposing decorator
var nonDisposingStream = new NonDisposingStream(baseStream);
using (var compressor = new DeflateStream(nonDisposingStream, CompressionMode.Compress))
{
    // baseStream won't be disposed when compressor is disposed
}
```

**Files:**
- `src/SharpCompress/IO/ProgressReportingStream.cs`
- `src/SharpCompress/IO/NonDisposingStream.cs`
### 4. Template Method Pattern

**Purpose:** Define the algorithm skeleton in a base class and let subclasses fill in the details.

**Example:**

```csharp
// AbstractArchive defines common archive operations
public abstract class AbstractArchive : IArchive
{
    // Template method
    public virtual void WriteToDirectory(string destinationDirectory, ExtractionOptions options)
    {
        // Common extraction logic
        foreach (var entry in Entries)
        {
            var destinationPath = Path.Combine(destinationDirectory, entry.Key);
            // Call subclass-provided behavior
            entry.WriteToFile(destinationPath, options);
        }
    }

    // Subclasses override format-specific details
    protected abstract Entry CreateEntry(EntryData data);
}
```

**Files:**
- `src/SharpCompress/Archives/AbstractArchive.cs`
- `src/SharpCompress/Readers/AbstractReader.cs`
### 5. Iterator Pattern

**Purpose:** Provide sequential access to entries.

**Example:**

```csharp
// Archive API - provides a collection
IEnumerable<IEntry> entries = archive.Entries;
foreach (var entry in entries)
{
    // Random access - entries already in memory
}

// Reader API - provides an iterator
IReader reader = ReaderFactory.Open(stream);
while (reader.MoveToNextEntry())
{
    // Forward-only iteration - one entry at a time
    var entry = reader.Entry;
}
```
## Key Interfaces

### IArchive - Random Access API

```csharp
public interface IArchive : IDisposable
{
    IEnumerable<IEntry> Entries { get; }
    void WriteToDirectory(string destinationDirectory,
        ExtractionOptions options = null);
    IEntry FirstOrDefault(Func<IEntry, bool> predicate);
    // ... format-specific methods
}
```

**Implementations:** `ZipArchive`, `TarArchive`, `RarArchive`, `SevenZipArchive`, `GZipArchive`
### IReader - Forward-Only API

```csharp
public interface IReader : IDisposable
{
    IEntry Entry { get; }
    bool MoveToNextEntry();
    void WriteEntryToDirectory(string destinationDirectory,
        ExtractionOptions options = null);
    Stream OpenEntryStream();
    // ... async variants
}
```

**Implementations:** `ZipReader`, `TarReader`, `RarReader`, `GZipReader`, etc.
### IWriter - Writing API

```csharp
public interface IWriter : IDisposable
{
    void Write(string entryPath, Stream source,
        DateTime? modificationTime = null);
    void WriteAll(string sourceDirectory, string searchPattern,
        SearchOption searchOption);
    // ... async variants
}
```

**Implementations:** `ZipWriter`, `TarWriter`, `GZipWriter`
### IEntry - Archive Entry

```csharp
public interface IEntry
{
    string Key { get; }
    long Size { get; }
    long CompressedSize { get; }
    bool IsDirectory { get; }
    DateTime? LastModifiedTime { get; }
    CompressionType CompressionType { get; }
    void WriteToFile(string fullPath, ExtractionOptions options = null);
    void WriteToStream(Stream destinationStream);
    Stream OpenEntryStream();
    // ... async variants
}
```
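Entry metadata can be inspected before deciding what to extract; a small sketch combining the interfaces above (the size cap and output path are illustrative):

```csharp
using System;
using System.IO;
using SharpCompress.Archives;

// Skip directories and oversized entries; extract the rest
using (var archive = ArchiveFactory.Open("archive.zip"))
{
    foreach (var entry in archive.Entries)
    {
        if (entry.IsDirectory || entry.Size > 10 * 1024 * 1024) // 10 MiB cap
            continue;

        Console.WriteLine(
            $"{entry.Key}: {entry.CompressedSize} -> {entry.Size} bytes ({entry.CompressionType})");
        entry.WriteToFile(Path.Combine("output", entry.Key));
    }
}
```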
## Adding Support for a New Format

### Step 1: Understand the Format

- Research the format specification
- Understand the compression/encryption used
- Study similar existing formats in the codebase

### Step 2: Create Format Structure Classes

Create: `src/SharpCompress/Common/NewFormat/`
```csharp
// Headers and data structures
public class NewFormatHeader
{
    public uint Magic { get; set; }
    public ushort Version { get; set; }
    // ... other fields

    public static NewFormatHeader Read(BinaryReader reader)
    {
        // Deserialize from binary
    }
}

public class NewFormatEntry
{
    public string FileName { get; set; }
    public uint CompressedSize { get; set; }
    public uint UncompressedSize { get; set; }
    // ... other fields
}
```
### Step 3: Create Archive Implementation

Create: `src/SharpCompress/Archives/NewFormat/NewFormatArchive.cs`
```csharp
public class NewFormatArchive : AbstractArchive
{
    private NewFormatHeader _header;
    private List<NewFormatEntry> _entries;

    public static NewFormatArchive Open(Stream stream)
    {
        var archive = new NewFormatArchive();
        archive._header = NewFormatHeader.Read(new BinaryReader(stream));
        archive.LoadEntries(stream);
        return archive;
    }

    public override IEnumerable<IEntry> Entries => _entries.Select(e => new Entry(e));

    protected override Stream OpenEntryStream(Entry entry)
    {
        // Return decompressed stream for entry
    }

    // ... other abstract method implementations
}
```
### Step 4: Create Reader Implementation

Create: `src/SharpCompress/Readers/NewFormat/NewFormatReader.cs`
```csharp
public class NewFormatReader : AbstractReader
{
    private NewFormatHeader _header;
    private BinaryReader _reader;

    public NewFormatReader(Stream stream)
    {
        _reader = new BinaryReader(stream);
        _header = NewFormatHeader.Read(_reader);
    }

    public override bool MoveToNextEntry()
    {
        // Read the next entry header
        if (!_reader.BaseStream.CanRead) return false;
        var entryData = NewFormatEntry.Read(_reader);
        // ... set this.Entry
        return entryData != null;
    }

    // ... other abstract method implementations
}
```
### Step 5: Create Factory

Create: `src/SharpCompress/Factories/NewFormatFactory.cs`
```csharp
public class NewFormatFactory : Factory, IArchiveFactory, IReaderFactory
{
    // Archive format magic bytes (signature)
    private static readonly byte[] NewFormatSignature = new byte[] { 0x4E, 0x46 }; // "NF"

    public static NewFormatFactory Instance { get; } = new();

    public IArchive CreateArchive(Stream stream)
        => NewFormatArchive.Open(stream);

    public IReader CreateReader(Stream stream, ReaderOptions options)
        => new NewFormatReader(stream) { Options = options };

    public bool Matches(Stream stream, ReadOnlySpan<byte> signature)
        => signature.StartsWith(NewFormatSignature);
}
```
### Step 6: Register Factory

Update: `src/SharpCompress/Factories/ArchiveFactory.cs`
```csharp
private static readonly IFactory[] Factories =
{
    ZipFactory.Instance,
    TarFactory.Instance,
    RarFactory.Instance,
    SevenZipFactory.Instance,
    GZipFactory.Instance,
    NewFormatFactory.Instance, // Add here
    // ... other factories
};
```
### Step 7: Add Tests

Create: `tests/SharpCompress.Test/NewFormat/NewFormatTests.cs`
```csharp
public class NewFormatTests : TestBase
{
    [Fact]
    public void NewFormat_Extracts_Successfully()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var archive = NewFormatArchive.Open(stream))
        {
            archive.WriteToDirectory(SCRATCH_FILES_PATH);
            // Assert extraction
        }
    }

    [Fact]
    public void NewFormat_Reader_Works()
    {
        var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
        using (var stream = File.OpenRead(archivePath))
        using (var reader = new NewFormatReader(stream))
        {
            Assert.True(reader.MoveToNextEntry());
            Assert.NotNull(reader.Entry);
        }
    }
}
```
### Step 8: Add Test Archives

Place test files in the `tests/TestArchives/Archives/NewFormat/` directory.
### Step 9: Document

Update `docs/FORMATS.md` with format support information.
## Compression Algorithm Implementation

### Creating a New Compression Stream

Example: creating a `CustomStream` for a custom compression algorithm.
```csharp
public class CustomStream : Stream
{
    private readonly Stream _baseStream;
    private readonly bool _leaveOpen;

    public CustomStream(Stream baseStream, bool leaveOpen = false)
    {
        _baseStream = baseStream;
        _leaveOpen = leaveOpen;
    }

    // Stream requires these members; compression streams are typically forward-only
    public override bool CanRead => _baseStream.CanRead;
    public override bool CanWrite => _baseStream.CanWrite;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Decompress data from _baseStream into buffer
        // Return number of decompressed bytes
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Compress data from buffer into _baseStream
    }

    public override void Flush() => _baseStream.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing && !_leaveOpen)
        {
            _baseStream?.Dispose();
        }
        base.Dispose(disposing);
    }
}
```
## Stream Handling Best Practices

### Disposal Pattern
```csharp
// Correct: nested using blocks
using (var fileStream = File.OpenRead("archive.zip"))
using (var archive = ZipArchive.Open(fileStream))
{
    archive.WriteToDirectory(@"C:\output");
}
// Both archive and fileStream are properly disposed

// Correct: using with options
var options = new ReaderOptions { LeaveStreamOpen = true };
var stream = File.OpenRead("archive.zip");
using (var archive = ZipArchive.Open(stream, options))
{
    archive.WriteToDirectory(@"C:\output");
}
stream.Dispose(); // Manually dispose if LeaveStreamOpen = true
```
### NonDisposingStream Wrapper
```csharp
// Prevent unwanted stream closure
var baseStream = File.OpenRead("data.bin");
var nonDisposing = new NonDisposingStream(baseStream);
using (var deflate = new DeflateStream(nonDisposing, CompressionMode.Decompress))
{
    // deflate won't close baseStream when disposed
}

// baseStream is still usable
baseStream.Position = 0; // Works
baseStream.Dispose();    // Manual disposal
```
## Performance Considerations

### Memory Efficiency

- **Avoid loading the entire archive into memory** - Use the Reader API for large files
- **Process entries sequentially** - Especially for solid archives
- **Use appropriate buffer sizes** - Larger buffers for network I/O
- **Dispose streams promptly** - Free resources when done
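The points above combine naturally into streaming extraction that keeps only one fixed buffer in memory regardless of archive size; a sketch (file names and buffer size are illustrative, and entry keys are assumed to be flat file names):

```csharp
using System.IO;
using SharpCompress.Readers;

// One entry at a time, one reusable buffer
using (var stream = File.OpenRead("large-archive.tar.gz"))
using (var reader = ReaderFactory.Open(stream))
{
    var buffer = new byte[81920]; // larger buffers help high-latency I/O
    while (reader.MoveToNextEntry())
    {
        if (reader.Entry.IsDirectory)
            continue;

        using (var entryStream = reader.OpenEntryStream())
        using (var output = File.Create(Path.Combine("out", reader.Entry.Key)))
        {
            int read;
            while ((read = entryStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}
```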
### Algorithm Selection

- **Archive API** - Fast for small archives with random access
- **Reader API** - Efficient for large files or streaming
- **Solid archives** - Sequential extraction is much faster
- **Compression levels** - Trade-off between speed and size
## Testing Guidelines

### Test Coverage

- **Happy path** - Normal extraction works
- **Edge cases** - Empty archives, single file, many files
- **Corrupted data** - Handled gracefully
- **Error cases** - Missing passwords, unsupported compression
- **Async operations** - Both sync and async code paths
### Test Archives

- Use `tests/TestArchives/` for test data
- Create format-specific subdirectories
- Include encrypted, corrupted, and edge-case archives
- Don't recreate existing archives
### Test Patterns
```csharp
[Fact]
public void Archive_Extraction_Works()
{
    // Arrange
    var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");

    // Act
    using (var archive = ZipArchive.Open(testArchive))
    {
        archive.WriteToDirectory(SCRATCH_FILES_PATH);
    }

    // Assert
    Assert.True(File.Exists(Path.Combine(SCRATCH_FILES_PATH, "file.txt")));
}
```
## Related Documentation

- `AGENTS.md` - Development guidelines
- `FORMATS.md` - Supported formats