mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-03 21:23:38 +00:00
Add more documentation
@@ -110,7 +110,7 @@ SharpCompress supports multiple archive and compression formats:
|
||||
- **Archive Formats**: Zip, Tar, 7Zip, Rar (read-only)
|
||||
- **Compression**: DEFLATE, BZip2, LZMA/LZMA2, PPMd, ZStandard (decompress only), Deflate64 (decompress only)
|
||||
- **Combined Formats**: Tar.GZip, Tar.BZip2, Tar.LZip, Tar.XZ, Tar.ZStandard
|
||||
- See FORMATS.md for complete format support matrix
|
||||
- See [docs/FORMATS.md](docs/FORMATS.md) for complete format support matrix
|
||||
|
||||
### Stream Handling Rules
|
||||
- **Disposal**: As of version 0.21, SharpCompress closes wrapped streams by default
|
||||
|
||||
@@ -4,7 +4,7 @@ SharpCompress is a compression library in pure C# for .NET Framework 4.8, .NET 8
|
||||
|
||||
The major feature is support for non-seekable streams, so large files can be processed on the fly (e.g. a download stream).
|
||||
|
||||
**NEW:** All I/O operations now support async/await for improved performance and scalability. See the [USAGE.md](USAGE.md#async-examples) for examples.
|
||||
**NEW:** All I/O operations now support async/await for improved performance and scalability. See the [USAGE.md](docs/USAGE.md#async-examples) for examples.
|
||||
|
||||
GitHub Actions Build -
|
||||
[](https://github.com/adamhathcock/sharpcompress/actions/workflows/dotnetcore.yml)
|
||||
@@ -14,7 +14,7 @@ GitHub Actions Build -
|
||||
|
||||
Post issues on GitHub!
|
||||
|
||||
Check the [Supported Formats](FORMATS.md) and [Basic Usage.](USAGE.md)
|
||||
Check the [Supported Formats](docs/FORMATS.md) and [Basic Usage.](docs/USAGE.md)
|
||||
|
||||
## Recommended Formats
|
||||
|
||||
|
||||
490
docs/API.md
Normal file
@@ -0,0 +1,490 @@
|
||||
# API Quick Reference
|
||||
|
||||
Quick reference for commonly used SharpCompress APIs.
|
||||
|
||||
## Factory Methods
|
||||
|
||||
### Opening Archives
|
||||
|
||||
```csharp
|
||||
// Auto-detect format
|
||||
using (var reader = ReaderFactory.Open(stream))
|
||||
{
|
||||
// Works with Zip, Tar, GZip, Rar, 7Zip, etc.
|
||||
}
|
||||
|
||||
// Specific format - Archive API (alternatives; open the one that matches your file)
using var zip = ZipArchive.Open("file.zip");
using var tar = TarArchive.Open("file.tar");
using var rar = RarArchive.Open("file.rar");
using var sevenZip = SevenZipArchive.Open("file.7z");
using var gzip = GZipArchive.Open("file.gz");
|
||||
|
||||
// With options
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
Password = "password",
|
||||
LeaveStreamOpen = true,
|
||||
ArchiveEncoding = new ArchiveEncoding { Default = Encoding.GetEncoding(932) }
|
||||
};
|
||||
using (var archive = ZipArchive.Open("encrypted.zip", options))
|
||||
```
|
||||
|
||||
### Creating Archives
|
||||
|
||||
```csharp
|
||||
// Writer Factory
|
||||
using (var writer = WriterFactory.Open(stream, ArchiveType.Zip, CompressionType.Deflate))
|
||||
{
|
||||
// Write entries
|
||||
}
|
||||
|
||||
// Specific writer (alternatives; create the one you need)
using var zip = ZipArchive.Create();
using var tar = TarArchive.Create();
using var gzip = GZipArchive.Create();
|
||||
|
||||
// With options
|
||||
var options = new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 9,
|
||||
LeaveStreamOpen = false
|
||||
};
|
||||
using (var archive = ZipArchive.Create())
|
||||
{
|
||||
archive.SaveTo("output.zip", options);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Archive API Methods
|
||||
|
||||
### Reading/Extracting
|
||||
|
||||
```csharp
|
||||
using (var archive = ZipArchive.Open("file.zip"))
|
||||
{
|
||||
// Get all entries
|
||||
IEnumerable<IEntry> entries = archive.Entries;
|
||||
|
||||
// Find specific entry
|
||||
var entry = archive.Entries.FirstOrDefault(e => e.Key == "file.txt");
|
||||
|
||||
// Extract all
|
||||
archive.WriteToDirectory(@"C:\output", new ExtractionOptions
|
||||
{
|
||||
ExtractFullPath = true,
|
||||
Overwrite = true
|
||||
});
|
||||
|
||||
// Extract single entry
|
||||
var firstEntry = archive.Entries.First(); // distinct name: 'entry' is already declared above
firstEntry.WriteToFile(@"C:\output\file.txt");
firstEntry.WriteToFile(@"C:\output\file.txt", new ExtractionOptions { Overwrite = true });
|
||||
|
||||
// Get entry stream
|
||||
using (var stream = entry.OpenEntryStream())
|
||||
{
|
||||
stream.CopyTo(outputStream);
|
||||
}
|
||||
}
|
||||
|
||||
// Async variants
|
||||
await archive.WriteToDirectoryAsync(@"C:\output", options, cancellationToken);
|
||||
using (var stream = await entry.OpenEntryStreamAsync(cancellationToken))
|
||||
{
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
### Entry Properties
|
||||
|
||||
```csharp
|
||||
foreach (var entry in archive.Entries)
|
||||
{
|
||||
string name = entry.Key; // Entry name/path
|
||||
long size = entry.Size; // Uncompressed size
|
||||
long compressedSize = entry.CompressedSize;
|
||||
bool isDir = entry.IsDirectory;
|
||||
DateTime? modTime = entry.LastModifiedTime;
|
||||
CompressionType compression = entry.CompressionType;
|
||||
}
|
||||
```
|
||||
|
||||
### Creating Archives
|
||||
|
||||
```csharp
|
||||
using (var archive = ZipArchive.Create())
|
||||
{
|
||||
// Add file
|
||||
archive.AddEntry("file.txt", "C:\\source\\file.txt");
|
||||
|
||||
// Add multiple files
|
||||
archive.AddAllFromDirectory("C:\\source");
|
||||
archive.AddAllFromDirectory("C:\\source", "*.txt"); // Pattern
|
||||
|
||||
// Save to file
|
||||
archive.SaveTo("output.zip", CompressionType.Deflate);
|
||||
|
||||
// Save to stream
|
||||
archive.SaveTo(outputStream, new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 9,
|
||||
LeaveStreamOpen = true
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Reader API Methods
|
||||
|
||||
### Forward-Only Reading
|
||||
|
||||
```csharp
|
||||
using (var stream = File.OpenRead("file.zip"))
|
||||
using (var reader = ReaderFactory.Open(stream))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
IEntry entry = reader.Entry;
|
||||
|
||||
if (!entry.IsDirectory)
|
||||
{
|
||||
// Extract entry
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
reader.WriteEntryToFile(@"C:\output\file.txt");
|
||||
|
||||
// Or get stream
|
||||
using (var entryStream = reader.OpenEntryStream())
|
||||
{
|
||||
entryStream.CopyTo(outputStream);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Async variants
|
||||
while (await reader.MoveToNextEntryAsync())
|
||||
{
|
||||
await reader.WriteEntryToFileAsync(@"C:\output\" + reader.Entry.Key, cancellationToken);
|
||||
}
|
||||
|
||||
// Async extraction
|
||||
await reader.WriteAllToDirectoryAsync(@"C:\output",
|
||||
new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
|
||||
cancellationToken);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Writer API Methods
|
||||
|
||||
### Creating Archives (Streaming)
|
||||
|
||||
```csharp
|
||||
using (var stream = File.Create("output.zip"))
|
||||
using (var writer = WriterFactory.Open(stream, ArchiveType.Zip, CompressionType.Deflate))
|
||||
{
|
||||
// Write single file
|
||||
using (var fileStream = File.OpenRead("source.txt"))
|
||||
{
|
||||
writer.Write("entry.txt", fileStream, DateTime.Now);
|
||||
}
|
||||
|
||||
// Write directory
|
||||
writer.WriteAll("C:\\source", "*", SearchOption.AllDirectories);
|
||||
writer.WriteAll("C:\\source", "*.txt", SearchOption.TopDirectoryOnly);
|
||||
|
||||
// Async variants
|
||||
using (var fileStream = File.OpenRead("source.txt"))
|
||||
{
|
||||
await writer.WriteAsync("entry.txt", fileStream, DateTime.Now, cancellationToken);
|
||||
}
|
||||
|
||||
await writer.WriteAllAsync("C:\\source", "*", SearchOption.AllDirectories, cancellationToken);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Options
|
||||
|
||||
### ReaderOptions
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
Password = "password", // For encrypted archives
|
||||
LeaveStreamOpen = true, // Don't close wrapped stream
|
||||
ArchiveEncoding = new ArchiveEncoding // Custom character encoding
|
||||
{
|
||||
Default = Encoding.GetEncoding(932)
|
||||
}
|
||||
};
|
||||
using (var archive = ZipArchive.Open("file.zip", options))
|
||||
{
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
### WriterOptions
|
||||
|
||||
```csharp
|
||||
var options = new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 9, // 0-9 for Deflate
|
||||
LeaveStreamOpen = true, // Don't close stream
|
||||
};
|
||||
archive.SaveTo("output.zip", options);
|
||||
```
|
||||
|
||||
### ExtractionOptions
|
||||
|
||||
```csharp
|
||||
var options = new ExtractionOptions
|
||||
{
|
||||
ExtractFullPath = true, // Recreate directory structure
|
||||
Overwrite = true, // Overwrite existing files
|
||||
PreserveFileTime = true // Keep original timestamps
|
||||
};
|
||||
archive.WriteToDirectory(@"C:\output", options);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Compression Types
|
||||
|
||||
### Available Compressions
|
||||
|
||||
```csharp
|
||||
// For creating archives
|
||||
CompressionType.None // No compression (store)
|
||||
CompressionType.Deflate // DEFLATE (default for ZIP/GZip)
|
||||
CompressionType.BZip2 // BZip2
|
||||
CompressionType.LZMA // LZMA (for 7Zip, LZip, XZ)
|
||||
CompressionType.PPMd // PPMd (for ZIP)
|
||||
CompressionType.Rar // RAR compression (read-only)
|
||||
|
||||
// For Tar archives, pass the CompressionType when creating the writer
// (alternatives; gzStream and bz2Stream are separate output streams)
using var tarGz = WriterFactory.Open(gzStream, ArchiveType.Tar, CompressionType.GZip); // Tar.GZip
using var tarBz2 = WriterFactory.Open(bz2Stream, ArchiveType.Tar, CompressionType.BZip2); // Tar.BZip2
|
||||
```
|
||||
|
||||
### Archive Types
|
||||
|
||||
```csharp
|
||||
ArchiveType.Zip
|
||||
ArchiveType.Tar
|
||||
ArchiveType.GZip
|
||||
ArchiveType.BZip2
|
||||
ArchiveType.Rar
|
||||
ArchiveType.SevenZip
|
||||
ArchiveType.XZ
|
||||
ArchiveType.ZStandard
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Patterns & Examples
|
||||
|
||||
### Extract with Error Handling
|
||||
|
||||
```csharp
|
||||
try
|
||||
{
|
||||
using (var archive = ZipArchive.Open("archive.zip",
|
||||
new ReaderOptions { Password = "password" }))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output", new ExtractionOptions
|
||||
{
|
||||
ExtractFullPath = true,
|
||||
Overwrite = true
|
||||
});
|
||||
}
|
||||
}
|
||||
catch (PasswordRequiredException)
|
||||
{
|
||||
Console.WriteLine("Password required");
|
||||
}
|
||||
catch (InvalidArchiveException)
|
||||
{
|
||||
Console.WriteLine("Archive is invalid");
|
||||
}
|
||||
catch (SharpCompressException ex)
|
||||
{
|
||||
Console.WriteLine($"Error: {ex.Message}");
|
||||
}
|
||||
```
|
||||
|
||||
### Extract with Progress
|
||||
|
||||
```csharp
|
||||
var progress = new Progress<ProgressReport>(report =>
|
||||
{
|
||||
Console.WriteLine($"Extracting {report.EntryPath}: {report.PercentComplete}%");
|
||||
});
|
||||
|
||||
var options = new ReaderOptions { Progress = progress };
|
||||
using (var archive = ZipArchive.Open("archive.zip", options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
```
|
||||
|
||||
### Async Extract with Cancellation
|
||||
|
||||
```csharp
|
||||
var cts = new CancellationTokenSource();
|
||||
cts.CancelAfter(TimeSpan.FromMinutes(5));
|
||||
|
||||
try
|
||||
{
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
await archive.WriteToDirectoryAsync(@"C:\output",
|
||||
new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
|
||||
cts.Token);
|
||||
}
|
||||
}
|
||||
catch (OperationCanceledException)
|
||||
{
|
||||
Console.WriteLine("Extraction cancelled");
|
||||
}
|
||||
```
|
||||
|
||||
### Create with Custom Compression
|
||||
|
||||
```csharp
|
||||
using (var archive = ZipArchive.Create())
|
||||
{
|
||||
archive.AddAllFromDirectory(@"D:\source");
|
||||
|
||||
// Fastest
|
||||
archive.SaveTo("fast.zip", new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 1
|
||||
});
|
||||
|
||||
// Balanced (default)
|
||||
archive.SaveTo("normal.zip", CompressionType.Deflate);
|
||||
|
||||
// Best compression
|
||||
archive.SaveTo("best.zip", new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 9
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
### Stream Processing (No File I/O)
|
||||
|
||||
```csharp
|
||||
using (var outputStream = new MemoryStream())
using (var contentStream = new MemoryStream(Encoding.UTF8.GetBytes("Hello")))
using (var archive = ZipArchive.Create())
{
    // Add content from memory. The entry's source stream is read when the
    // archive is saved, so contentStream must stay open until SaveTo returns.
    archive.AddEntry("file.txt", contentStream);

    // Save to memory
    archive.SaveTo(outputStream, CompressionType.Deflate);

    // Get bytes
    byte[] archiveBytes = outputStream.ToArray();
}
|
||||
```
|
||||
|
||||
### Extract Specific Files
|
||||
|
||||
```csharp
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
var filesToExtract = new[] { "file1.txt", "file2.txt" };
|
||||
|
||||
foreach (var entry in archive.Entries.Where(e => filesToExtract.Contains(e.Key)))
|
||||
{
|
||||
entry.WriteToFile(@"C:\output\" + entry.Key);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### List Archive Contents
|
||||
|
||||
```csharp
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
foreach (var entry in archive.Entries)
|
||||
{
|
||||
if (entry.IsDirectory)
|
||||
Console.WriteLine($"[DIR] {entry.Key}");
|
||||
else
|
||||
Console.WriteLine($"[FILE] {entry.Key} ({entry.Size} bytes)");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
### ✗ Wrong - Stream not disposed
|
||||
|
||||
```csharp
|
||||
var stream = File.OpenRead("archive.zip");
|
||||
var archive = ZipArchive.Open(stream);
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
// stream not disposed - leaked resource
|
||||
```
|
||||
|
||||
### ✓ Correct - Using blocks
|
||||
|
||||
```csharp
|
||||
using (var stream = File.OpenRead("archive.zip"))
|
||||
using (var archive = ZipArchive.Open(stream))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
// Both properly disposed
|
||||
```
|
||||
|
||||
### ✗ Wrong - Mixing API styles
|
||||
|
||||
```csharp
|
||||
// Loading entire archive then iterating
|
||||
using (var archive = ZipArchive.Open("large.zip"))
|
||||
{
|
||||
var entries = archive.Entries.ToList(); // Loads all in memory
|
||||
foreach (var e in entries)
|
||||
{
|
||||
e.WriteToFile(...); // Then extracts each
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### ✓ Correct - Use Reader for large files
|
||||
|
||||
```csharp
|
||||
// Streaming iteration
|
||||
using (var stream = File.OpenRead("large.zip"))
|
||||
using (var reader = ReaderFactory.Open(stream))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [USAGE.md](USAGE.md) - Complete code examples
|
||||
- [FORMATS.md](FORMATS.md) - Supported formats
|
||||
- [PERFORMANCE.md](PERFORMANCE.md) - API selection guide
|
||||
- [ERRORS.md](ERRORS.md) - Exception handling
|
||||
660
docs/ARCHITECTURE.md
Normal file
@@ -0,0 +1,660 @@
|
||||
# SharpCompress Architecture Guide
|
||||
|
||||
This guide explains the internal architecture and design patterns of SharpCompress for contributors.
|
||||
|
||||
## Overview
|
||||
|
||||
SharpCompress is organized into three main layers:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ User-Facing APIs (Top Layer) │
|
||||
│ Archive, Reader, Writer Factories │
|
||||
├─────────────────────────────────────────┤
|
||||
│ Format-Specific Implementations │
|
||||
│ ZipArchive, TarReader, GZipWriter, │
|
||||
│ RarArchive, SevenZipArchive, etc. │
|
||||
├─────────────────────────────────────────┤
|
||||
│ Compression & Crypto (Bottom Layer) │
|
||||
│ Deflate, LZMA, BZip2, AES, CRC32 │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Directory Structure
|
||||
|
||||
### `src/SharpCompress/`
|
||||
|
||||
#### `Archives/` - Archive Implementations
|
||||
Contains `IArchive` implementations for seekable, random-access APIs.
|
||||
|
||||
**Key Files:**
|
||||
- `AbstractArchive.cs` - Base class for all archives
|
||||
- `IArchive.cs` - Archive interface definition
|
||||
- `ArchiveFactory.cs` - Factory for opening archives
|
||||
- Format-specific: `ZipArchive.cs`, `TarArchive.cs`, `RarArchive.cs`, `SevenZipArchive.cs`, `GZipArchive.cs`
|
||||
|
||||
**Use Archive API when:**
|
||||
- Stream is seekable (file, memory)
|
||||
- Need random access to entries
|
||||
- Archive fits in memory
|
||||
- Simplicity is important
|
||||
|
||||
#### `Readers/` - Reader Implementations
|
||||
Contains `IReader` implementations for forward-only, non-seekable APIs.
|
||||
|
||||
**Key Files:**
|
||||
- `AbstractReader.cs` - Base reader class
|
||||
- `IReader.cs` - Reader interface
|
||||
- `ReaderFactory.cs` - Auto-detection factory
|
||||
- `ReaderOptions.cs` - Configuration for readers
|
||||
- Format-specific: `ZipReader.cs`, `TarReader.cs`, `GZipReader.cs`, `RarReader.cs`, etc.
|
||||
|
||||
**Use Reader API when:**
|
||||
- Stream is non-seekable (network, pipe, compressed)
|
||||
- Processing large files
|
||||
- Memory is limited
|
||||
- Forward-only processing is acceptable
|
||||
|
||||
#### `Writers/` - Writer Implementations
|
||||
Contains `IWriter` implementations for forward-only writing.
|
||||
|
||||
**Key Files:**
|
||||
- `AbstractWriter.cs` - Base writer class
|
||||
- `IWriter.cs` - Writer interface
|
||||
- `WriterFactory.cs` - Factory for creating writers
|
||||
- `WriterOptions.cs` - Configuration for writers
|
||||
- Format-specific: `ZipWriter.cs`, `TarWriter.cs`, `GZipWriter.cs`
|
||||
|
||||
#### `Factories/` - Format Detection
|
||||
Factory classes for auto-detecting archive format and creating appropriate readers/writers.
|
||||
|
||||
**Key Files:**
|
||||
- `Factory.cs` - Base factory class
|
||||
- `IFactory.cs` - Factory interface
|
||||
- Format-specific: `ZipFactory.cs`, `TarFactory.cs`, `RarFactory.cs`, etc.
|
||||
|
||||
**How It Works:**
|
||||
1. `ReaderFactory.Open(stream)` probes stream signatures
|
||||
2. Identifies format by magic bytes
|
||||
3. Creates appropriate reader instance
|
||||
4. Returns generic `IReader` interface
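
The probing steps above can be sketched without the library. This is a minimal illustration, not the code behind `ReaderFactory`: the `SignatureProbe` class and its `Detect` method are hypothetical names, the magic bytes are the well-known leading signatures of each format, and the sketch assumes a seekable stream so it can rewind after peeking.

```csharp
using System;
using System.IO;

public static class SignatureProbe
{
    // Well-known leading magic bytes; longer signatures are listed first.
    private static readonly (string Name, byte[] Magic)[] Signatures =
    {
        ("7Zip",  new byte[] { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C }),
        ("XZ",    new byte[] { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 }),
        ("Rar",   new byte[] { 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07 }),
        ("Zip",   new byte[] { 0x50, 0x4B, 0x03, 0x04 }),
        ("BZip2", new byte[] { 0x42, 0x5A, 0x68 }),
        ("GZip",  new byte[] { 0x1F, 0x8B }),
    };

    // Returns the detected format name, or null when nothing matches.
    // Assumption: the stream is seekable, so the position can be restored.
    public static string Detect(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("This sketch needs a seekable stream.");

        long start = stream.Position;
        var header = new byte[8];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until full or EOF.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        stream.Position = start; // rewind so a real reader sees the whole stream

        foreach (var (name, magic) in Signatures)
        {
            if (StartsWith(header, read, magic))
                return name;
        }
        return null;
    }

    // True when the first 'length' valid bytes of 'data' begin with 'magic'.
    private static bool StartsWith(byte[] data, int length, byte[] magic)
    {
        if (length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Leading bytes are not enough for every format: Tar, for example, stores its `ustar` marker at offset 257, so a real detector has to look further into the stream for some formats.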
|
||||
|
||||
#### `Common/` - Shared Types
|
||||
Common types, options, and enumerations used across formats.
|
||||
|
||||
**Key Files:**
|
||||
- `IEntry.cs` - Entry interface (file within archive)
|
||||
- `Entry.cs` - Entry implementation
|
||||
- `ArchiveType.cs` - Enum for archive formats
|
||||
- `CompressionType.cs` - Enum for compression methods
|
||||
- `ArchiveEncoding.cs` - Character encoding configuration
|
||||
- `ExtractionOptions.cs` - Extraction configuration
|
||||
- Format-specific headers: `Zip/Headers/`, `Tar/Headers/`, `Rar/Headers/`, etc.
|
||||
|
||||
#### `Compressors/` - Compression Algorithms
|
||||
Low-level compression streams implementing specific algorithms.
|
||||
|
||||
**Algorithms:**
|
||||
- `Deflate/` - DEFLATE compression (Zip default)
|
||||
- `BZip2/` - BZip2 compression
|
||||
- `LZMA/` - LZMA compression (7Zip, XZ, LZip)
|
||||
- `PPMd/` - Prediction by Partial Matching (Zip, 7Zip)
|
||||
- `ZStandard/` - ZStandard compression (decompression only)
|
||||
- `Xz/` - XZ format (decompression only)
|
||||
- `Rar/` - RAR-specific unpacking
|
||||
- `Arj/`, `Arc/`, `Ace/` - Legacy format decompression
|
||||
- `Filters/` - BCJ/BCJ2 filters for executable compression
|
||||
|
||||
**Each Compressor:**
|
||||
- Implements a `Stream` subclass
|
||||
- Typically supports both compression and decompression
- Some are decompression-only (for example, ZStandard and XZ)
|
||||
|
||||
#### `Crypto/` - Encryption & Hashing
|
||||
Cryptographic functions and stream wrappers.
|
||||
|
||||
**Key Files:**
|
||||
- `Crc32Stream.cs` - CRC32 calculation wrapper
|
||||
- `BlockTransformer.cs` - Block cipher transformations
|
||||
- AES, PKWare, WinZip encryption implementations
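
To illustrate what a CRC32 calculation wrapper does, here is a minimal pass-through stream that accumulates a CRC-32 over everything read through it. It is a sketch under stated assumptions, not the library's `Crc32Stream`: the class name `Crc32ReadStream` and its `Crc` property are hypothetical, and it uses the standard IEEE CRC-32 (reflected polynomial `0xEDB88320`) that Zip and GZip use.

```csharp
using System;
using System.IO;

// Read-only wrapper that updates a running CRC-32 for every byte read.
// It does not dispose the wrapped stream; the caller keeps ownership.
public sealed class Crc32ReadStream : Stream
{
    private static readonly uint[] Table = BuildTable();
    private readonly Stream _inner;
    private uint _state = 0xFFFFFFFFu; // standard initial value

    public Crc32ReadStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    // CRC-32 of all bytes read so far (final XOR applied on access).
    public uint Crc => _state ^ 0xFFFFFFFFu;

    // Builds the 256-entry lookup table for the reflected polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        for (int i = 0; i < n; i++)
            _state = Table[(_state ^ buffer[offset + i]) & 0xFF] ^ (_state >> 8);
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Wrapping the entry stream this way lets a caller compare `Crc` against the checksum stored in the archive header once the entry has been fully read, without buffering the data twice.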
|
||||
|
||||
#### `IO/` - Stream Utilities
|
||||
Stream wrappers and utilities.
|
||||
|
||||
**Key Classes:**
|
||||
- `SharpCompressStream` - Base stream class
|
||||
- `ProgressReportingStream` - Progress tracking wrapper
|
||||
- `MarkingBinaryReader` - Binary reader with position marks
|
||||
- `BufferedSubStream` - Buffered read-only substream
|
||||
- `ReadOnlySubStream` - Read-only view of parent stream
|
||||
- `NonDisposingStream` - Prevents wrapped stream disposal
|
||||
|
||||
---
|
||||
|
||||
## Design Patterns
|
||||
|
||||
### 1. Factory Pattern
|
||||
|
||||
**Purpose:** Auto-detect format and create appropriate reader/writer.
|
||||
|
||||
**Example:**
|
||||
```csharp
|
||||
// User calls factory
|
||||
using (var reader = ReaderFactory.Open(stream)) // Returns IReader
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
// Process entry
|
||||
}
|
||||
}
|
||||
|
||||
// Behind the scenes:
|
||||
// 1. Factory.Open() probes stream signatures
|
||||
// 2. Detects format (Zip, Tar, Rar, etc.)
|
||||
// 3. Creates appropriate reader (ZipReader, TarReader, etc.)
|
||||
// 4. Returns as generic IReader interface
|
||||
```
|
||||
|
||||
**Files:**
|
||||
- `src/SharpCompress/Readers/ReaderFactory.cs`
- `src/SharpCompress/Writers/WriterFactory.cs`
- `src/SharpCompress/Archives/ArchiveFactory.cs`
|
||||
|
||||
### 2. Strategy Pattern
|
||||
|
||||
**Purpose:** Encapsulate compression algorithms as swappable strategies.
|
||||
|
||||
**Example:**
|
||||
```csharp
|
||||
// Different compression strategies
|
||||
CompressionType.Deflate // DEFLATE
|
||||
CompressionType.BZip2 // BZip2
|
||||
CompressionType.LZMA // LZMA
|
||||
CompressionType.PPMd // PPMd
|
||||
|
||||
// Writer uses strategy pattern
|
||||
var archive = ZipArchive.Create();
|
||||
archive.SaveTo("output.zip", CompressionType.Deflate); // Use Deflate
|
||||
archive.SaveTo("output-bzip2.zip", CompressionType.BZip2); // Same Zip container, BZip2 strategy
|
||||
```
|
||||
|
||||
**Files:**
|
||||
- `src/SharpCompress/Compressors/` - Strategy implementations
|
||||
|
||||
### 3. Decorator Pattern
|
||||
|
||||
**Purpose:** Wrap streams with additional functionality.
|
||||
|
||||
**Example:**
|
||||
```csharp
|
||||
// Progress reporting decorator
|
||||
var progressStream = new ProgressReportingStream(baseStream, progressReporter);
|
||||
progressStream.Read(buffer, 0, buffer.Length); // Reports progress
|
||||
|
||||
// Non-disposing decorator
|
||||
var nonDisposingStream = new NonDisposingStream(baseStream);
|
||||
using (var compressor = new DeflateStream(nonDisposingStream, CompressionMode.Compress))
|
||||
{
|
||||
// baseStream won't be disposed when compressor is disposed
|
||||
}
|
||||
```
|
||||
|
||||
**Files:**
|
||||
- `src/SharpCompress/IO/ProgressReportingStream.cs`
|
||||
- `src/SharpCompress/IO/NonDisposingStream.cs`
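
The non-disposing decorator idea can be shown with only the base class library. The sketch below is not the source of `NonDisposingStream`; `LeaveOpenStream` is a hypothetical name, and the only behavior it demonstrates is forwarding I/O while refusing to dispose the wrapped stream.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a "non-disposing" decorator: forwards all I/O to
// the wrapped stream but deliberately leaves it open on Dispose.
public sealed class LeaveOpenStream : Stream
{
    private readonly Stream _inner;

    public LeaveOpenStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);

    protected override void Dispose(bool disposing)
    {
        // Flush pending writes, but do not dispose _inner: the caller owns it.
        if (disposing && _inner.CanWrite) _inner.Flush();
        base.Dispose(disposing);
    }
}
```

The design choice is ownership: whoever created the base stream stays responsible for closing it, which is useful when several archives or compressors are written to the same stream in sequence.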
|
||||
|
||||
### 4. Template Method Pattern
|
||||
|
||||
**Purpose:** Define algorithm skeleton in base class, let subclasses fill details.
|
||||
|
||||
**Example:**
|
||||
```csharp
|
||||
// AbstractArchive defines common archive operations
|
||||
public abstract class AbstractArchive : IArchive
|
||||
{
|
||||
// Template methods
|
||||
public virtual void WriteToDirectory(string destinationDirectory, ExtractionOptions options)
|
||||
{
|
||||
// Common extraction logic
|
||||
foreach (var entry in Entries)
|
||||
{
|
||||
// Call subclass method
|
||||
entry.WriteToFile(destinationPath, options);
|
||||
}
|
||||
}
|
||||
|
||||
// Subclasses override format-specific details
|
||||
protected abstract Entry CreateEntry(EntryData data);
|
||||
}
|
||||
```
|
||||
|
||||
**Files:**
|
||||
- `src/SharpCompress/Archives/AbstractArchive.cs`
|
||||
- `src/SharpCompress/Readers/AbstractReader.cs`
|
||||
|
||||
### 5. Iterator Pattern
|
||||
|
||||
**Purpose:** Provide sequential access to entries.
|
||||
|
||||
**Example:**
|
||||
```csharp
|
||||
// Archive API - provides collection
|
||||
IEnumerable<IEntry> entries = archive.Entries;
|
||||
foreach (var entry in entries)
|
||||
{
|
||||
// Random access - entries already in memory
|
||||
}
|
||||
|
||||
// Reader API - provides iterator
|
||||
IReader reader = ReaderFactory.Open(stream);
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
// Forward-only iteration - one entry at a time
|
||||
var entry = reader.Entry;
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Interfaces
|
||||
|
||||
### IArchive - Random Access API
|
||||
|
||||
```csharp
|
||||
public interface IArchive : IDisposable
|
||||
{
|
||||
IEnumerable<IEntry> Entries { get; }
|
||||
|
||||
void WriteToDirectory(string destinationDirectory,
|
||||
ExtractionOptions options = null);
|
||||
|
||||
IEntry FirstOrDefault(Func<IEntry, bool> predicate);
|
||||
|
||||
// ... format-specific methods
|
||||
}
|
||||
```
|
||||
|
||||
**Implementations:** `ZipArchive`, `TarArchive`, `RarArchive`, `SevenZipArchive`, `GZipArchive`
|
||||
|
||||
### IReader - Forward-Only API
|
||||
|
||||
```csharp
|
||||
public interface IReader : IDisposable
|
||||
{
|
||||
IEntry Entry { get; }
|
||||
|
||||
bool MoveToNextEntry();
|
||||
|
||||
void WriteEntryToDirectory(string destinationDirectory,
|
||||
ExtractionOptions options = null);
|
||||
|
||||
Stream OpenEntryStream();
|
||||
|
||||
// ... async variants
|
||||
}
|
||||
```
|
||||
|
||||
**Implementations:** `ZipReader`, `TarReader`, `RarReader`, `GZipReader`, etc.
|
||||
|
||||
### IWriter - Writing API
|
||||
|
||||
```csharp
|
||||
public interface IWriter : IDisposable
|
||||
{
|
||||
void Write(string entryPath, Stream source,
|
||||
DateTime? modificationTime = null);
|
||||
|
||||
void WriteAll(string sourceDirectory, string searchPattern,
|
||||
SearchOption searchOption);
|
||||
|
||||
// ... async variants
|
||||
}
|
||||
```
|
||||
|
||||
**Implementations:** `ZipWriter`, `TarWriter`, `GZipWriter`
|
||||
|
||||
### IEntry - Archive Entry
|
||||
|
||||
```csharp
|
||||
public interface IEntry
|
||||
{
|
||||
string Key { get; }
|
||||
long Size { get; }
long CompressedSize { get; }
|
||||
bool IsDirectory { get; }
|
||||
DateTime? LastModifiedTime { get; }
|
||||
CompressionType CompressionType { get; }
|
||||
|
||||
void WriteToFile(string fullPath, ExtractionOptions options = null);
|
||||
void WriteToStream(Stream destinationStream);
|
||||
Stream OpenEntryStream();
|
||||
|
||||
// ... async variants
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Adding Support for a New Format
|
||||
|
||||
### Step 1: Understand the Format
|
||||
- Research format specification
|
||||
- Understand compression/encryption used
|
||||
- Study existing similar formats in codebase
|
||||
|
||||
### Step 2: Create Format Structure Classes
|
||||
|
||||
**Create:** `src/SharpCompress/Common/NewFormat/`
|
||||
|
||||
```csharp
|
||||
// Headers and data structures
|
||||
public class NewFormatHeader
|
||||
{
|
||||
public uint Magic { get; set; }
|
||||
public ushort Version { get; set; }
|
||||
// ... other fields
|
||||
|
||||
public static NewFormatHeader Read(BinaryReader reader)
|
||||
{
|
||||
// Deserialize from binary
|
||||
}
|
||||
}
|
||||
|
||||
public class NewFormatEntry
|
||||
{
|
||||
public string FileName { get; set; }
|
||||
public uint CompressedSize { get; set; }
|
||||
public uint UncompressedSize { get; set; }
|
||||
// ... other fields
|
||||
}
|
||||
```
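
The `Read` method above is left as a comment, so here is one way it might be filled in. This is a sketch only: the layout (a 4-byte magic followed by a 2-byte version, both little-endian) and the `SketchHeader` and `ExpectedMagic` names are assumptions made for illustration, since the placeholder format has no real specification.

```csharp
using System;
using System.IO;

public sealed class SketchHeader
{
    // Assumed magic: bytes 4E 46 00 00 ("NF" plus two zero bytes) read as a
    // little-endian 32-bit value.
    public const uint ExpectedMagic = 0x0000464E;

    public uint Magic { get; private set; }
    public ushort Version { get; private set; }

    // Deserializes the header. BinaryReader always reads little-endian and
    // throws EndOfStreamException if the stream is shorter than the header.
    public static SketchHeader Read(BinaryReader reader)
    {
        var header = new SketchHeader
        {
            Magic = reader.ReadUInt32(),
            Version = reader.ReadUInt16(),
        };

        if (header.Magic != ExpectedMagic)
            throw new InvalidDataException($"Unexpected magic 0x{header.Magic:X8}.");

        return header;
    }
}
```

Validating the magic inside `Read` means a wrong file fails early with a clear message instead of producing nonsense entries later.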
|
||||
|
||||
### Step 3: Create Archive Implementation
|
||||
|
||||
**Create:** `src/SharpCompress/Archives/NewFormat/NewFormatArchive.cs`
|
||||
|
||||
```csharp
|
||||
public class NewFormatArchive : AbstractArchive
|
||||
{
|
||||
private NewFormatHeader _header;
|
||||
private List<NewFormatEntry> _entries;
|
||||
|
||||
public static NewFormatArchive Open(Stream stream)
|
||||
{
|
||||
var archive = new NewFormatArchive();
|
||||
archive._header = NewFormatHeader.Read(new BinaryReader(stream));
|
||||
archive.LoadEntries(stream);
|
||||
return archive;
|
||||
}
|
||||
|
||||
public override IEnumerable<IEntry> Entries => _entries.Select(e => new Entry(e));
|
||||
|
||||
protected override Stream OpenEntryStream(Entry entry)
|
||||
{
|
||||
// Return decompressed stream for entry
|
||||
}
|
||||
|
||||
// ... other abstract method implementations
|
||||
}
|
||||
```
|
||||
|
||||
### Step 4: Create Reader Implementation
|
||||
|
||||
**Create:** `src/SharpCompress/Readers/NewFormat/NewFormatReader.cs`
|
||||
|
||||
```csharp
|
||||
public class NewFormatReader : AbstractReader
|
||||
{
|
||||
private NewFormatHeader _header;
|
||||
private BinaryReader _reader;
|
||||
|
||||
public NewFormatReader(Stream stream)
|
||||
{
|
||||
_reader = new BinaryReader(stream);
|
||||
_header = NewFormatHeader.Read(_reader);
|
||||
}
|
||||
|
||||
public override bool MoveToNextEntry()
|
||||
{
|
||||
// Read next entry header
|
||||
if (!_reader.BaseStream.CanRead) return false;
|
||||
|
||||
var entryData = NewFormatEntry.Read(_reader);
|
||||
// ... set this.Entry
|
||||
return entryData != null;
|
||||
}
|
||||
|
||||
// ... other abstract method implementations
|
||||
}
|
||||
```
|
||||
|
||||
### Step 5: Create Factory
|
||||
|
||||
**Create:** `src/SharpCompress/Factories/NewFormatFactory.cs`
|
||||
|
||||
```csharp
|
||||
public class NewFormatFactory : Factory, IArchiveFactory, IReaderFactory
|
||||
{
|
||||
// Archive format magic bytes (signature)
|
||||
private static readonly byte[] NewFormatSignature = new byte[] { 0x4E, 0x46 }; // "NF"
|
||||
|
||||
public static NewFormatFactory Instance { get; } = new();
|
||||
|
||||
public IArchive CreateArchive(Stream stream)
|
||||
=> NewFormatArchive.Open(stream);
|
||||
|
||||
public IReader CreateReader(Stream stream, ReaderOptions options)
|
||||
=> new NewFormatReader(stream) { Options = options };
|
||||
|
||||
public bool Matches(Stream stream, ReadOnlySpan<byte> signature)
|
||||
=> signature.StartsWith(NewFormatSignature);
|
||||
}
|
||||
```
|
||||
|
||||
### Step 6: Register Factory
|
||||
|
||||
**Update:** `src/SharpCompress/Archives/ArchiveFactory.cs`
|
||||
|
||||
```csharp
|
||||
private static readonly IFactory[] Factories =
|
||||
{
|
||||
ZipFactory.Instance,
|
||||
TarFactory.Instance,
|
||||
RarFactory.Instance,
|
||||
SevenZipFactory.Instance,
|
||||
GZipFactory.Instance,
|
||||
NewFormatFactory.Instance, // Add here
|
||||
// ... other factories
|
||||
};
|
||||
```
|
||||
|
||||
### Step 7: Add Tests
|
||||
|
||||
**Create:** `tests/SharpCompress.Test/NewFormat/NewFormatTests.cs`
|
||||
|
||||
```csharp
|
||||
public class NewFormatTests : TestBase
|
||||
{
|
||||
[Fact]
|
||||
public void NewFormat_Extracts_Successfully()
|
||||
{
|
||||
var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
|
||||
using (var archive = NewFormatArchive.Open(File.OpenRead(archivePath)))
|
||||
{
|
||||
archive.WriteToDirectory(SCRATCH_FILES_PATH);
|
||||
// Assert extraction
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void NewFormat_Reader_Works()
|
||||
{
|
||||
var archivePath = Path.Combine(TEST_ARCHIVES_PATH, "archive.newformat");
|
||||
using (var stream = File.OpenRead(archivePath))
|
||||
using (var reader = new NewFormatReader(stream))
|
||||
{
|
||||
Assert.True(reader.MoveToNextEntry());
|
||||
Assert.NotNull(reader.Entry);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 8: Add Test Archives
|
||||
|
||||
Place test files in `tests/TestArchives/Archives/NewFormat/` directory.
|
||||
|
||||
### Step 9: Document
|
||||
|
||||
Update `docs/FORMATS.md` with format support information.
|
||||
|
||||
---
|
||||
|
||||
## Compression Algorithm Implementation
|
||||
|
||||
### Creating a New Compression Stream
|
||||
|
||||
**Example:** Creating `CustomStream` for a custom compression algorithm
|
||||
|
||||
```csharp
|
||||
public class CustomStream : Stream
|
||||
{
|
||||
private readonly Stream _baseStream;
|
||||
private readonly bool _leaveOpen;
|
||||
|
||||
public CustomStream(Stream baseStream, bool leaveOpen = false)
|
||||
{
|
||||
_baseStream = baseStream;
|
||||
_leaveOpen = leaveOpen;
|
||||
}
|
||||
|
||||
public override int Read(byte[] buffer, int offset, int count)
|
||||
{
|
||||
// Decompress data from _baseStream into buffer
|
||||
// Return number of decompressed bytes
throw new NotImplementedException(); // placeholder until the algorithm is implemented
|
||||
}
|
||||
|
||||
public override void Write(byte[] buffer, int offset, int count)
|
||||
{
|
||||
// Compress data from buffer into _baseStream
|
||||
}
|
||||
|
||||
protected override void Dispose(bool disposing)
|
||||
{
|
||||
if (disposing && !_leaveOpen)
|
||||
{
|
||||
_baseStream?.Dispose();
|
||||
}
|
||||
base.Dispose(disposing);
|
||||
}
|
||||
}
|
||||
```
|
||||
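The skeleton above leaves out the abstract members that `Stream` requires (`CanRead`, `CanSeek`, `CanWrite`, `Length`, `Position`, `Flush`, `Seek`, `SetLength`). The following sketch shows one way to fill them in for a read-only, forward-only stream. The class name `XorReadStream` is hypothetical, and the XOR transform is only a stand-in for a real decompression routine.

```csharp
using System;
using System.IO;

// Hypothetical read-only stream; the XOR "codec" stands in for real decompression.
public sealed class XorReadStream : Stream
{
    private readonly Stream _baseStream;
    private readonly bool _leaveOpen;
    private readonly byte _key;

    public XorReadStream(Stream baseStream, byte key, bool leaveOpen = false)
    {
        _baseStream = baseStream ?? throw new ArgumentNullException(nameof(baseStream));
        _key = key;
        _leaveOpen = leaveOpen;
    }

    public override bool CanRead => _baseStream.CanRead;
    public override bool CanSeek => false;   // forward-only, like most decompressors
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Read raw bytes, then transform them in place.
        int read = _baseStream.Read(buffer, offset, count);
        for (int i = 0; i < read; i++)
        {
            buffer[offset + i] ^= _key;
        }
        return read;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        // Only close the wrapped stream when the caller did not ask to keep it open.
        if (disposing && !_leaveOpen)
        {
            _baseStream.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Reporting `CanSeek => false` and throwing from `Length`, `Position`, and `Seek` is the conventional way to signal a forward-only stream, so callers can check the capability instead of catching exceptions.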
|
||||
---
|
||||
|
||||
## Stream Handling Best Practices
|
||||
|
||||
### Disposal Pattern
|
||||
|
||||
```csharp
|
||||
// Correct: Nested using blocks
|
||||
using (var fileStream = File.OpenRead("archive.zip"))
|
||||
using (var archive = ZipArchive.Open(fileStream))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
// Both archive and fileStream properly disposed
|
||||
|
||||
// Correct: Using with options
|
||||
var options = new ReaderOptions { LeaveStreamOpen = true };
|
||||
var stream = File.OpenRead("archive.zip");
|
||||
using (var archive = ZipArchive.Open(stream, options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
stream.Dispose(); // Manually dispose if LeaveStreamOpen = true
|
||||
```
|
||||
|
||||
### NonDisposingStream Wrapper
|
||||
|
||||
```csharp
|
||||
// Prevent unwanted stream closure
|
||||
var baseStream = File.OpenRead("data.bin");
|
||||
var nonDisposing = new NonDisposingStream(baseStream);
|
||||
|
||||
using (var decompressor = new DeflateStream(nonDisposing, CompressionMode.Decompress))
{
// Disposing the DeflateStream won't close baseStream
|
||||
}
|
||||
|
||||
// baseStream still usable
|
||||
baseStream.Position = 0; // Works
|
||||
baseStream.Dispose(); // Manual disposal
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Memory Efficiency
|
||||
|
||||
1. **Avoid loading entire archive in memory** - Use Reader API for large files
|
||||
2. **Process entries sequentially** - Especially for solid archives
|
||||
3. **Use appropriate buffer sizes** - Larger buffers for network I/O
|
||||
4. **Dispose streams promptly** - Free resources when done
|
||||
|
||||
### Algorithm Selection
|
||||
|
||||
1. **Archive API** - Fast for small archives with random access
|
||||
2. **Reader API** - Efficient for large files or streaming
|
||||
3. **Solid archives** - Sequential extraction much faster
|
||||
4. **Compression levels** - Trade-off between speed and size
|
||||
|
||||
---
|
||||
|
||||
## Testing Guidelines
|
||||
|
||||
### Test Coverage
|
||||
|
||||
1. **Happy path** - Normal extraction works
|
||||
2. **Edge cases** - Empty archives, single file, many files
|
||||
3. **Corrupted data** - Handle gracefully
|
||||
4. **Error cases** - Missing passwords, unsupported compression
|
||||
5. **Async operations** - Both sync and async code paths
|
||||
|
||||
### Test Archives
|
||||
|
||||
- Use `tests/TestArchives/` for test data
|
||||
- Create format-specific subdirectories
|
||||
- Include encrypted, corrupted, and edge case archives
|
||||
- Don't recreate existing archives
|
||||
|
||||
### Test Patterns
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public void Archive_Extraction_Works()
|
||||
{
|
||||
// Arrange
|
||||
var testArchive = Path.Combine(TEST_ARCHIVES_PATH, "test.zip");
|
||||
|
||||
// Act
|
||||
using (var archive = ZipArchive.Open(testArchive))
|
||||
{
|
||||
archive.WriteToDirectory(SCRATCH_FILES_PATH);
|
||||
}
|
||||
|
||||
// Assert
|
||||
Assert.True(File.Exists(Path.Combine(SCRATCH_FILES_PATH, "file.txt")));
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [CONTRIBUTING.md](CONTRIBUTING.md) - How to contribute
|
||||
- [AGENTS.md](../AGENTS.md) - Development guidelines
|
||||
- [FORMATS.md](FORMATS.md) - Supported formats
|
||||
611
docs/ENCODING.md
Normal file
@@ -0,0 +1,611 @@
|
||||
# SharpCompress Character Encoding Guide
|
||||
|
||||
This guide explains how SharpCompress handles character encoding for archive entries (filenames, comments, etc.).
|
||||
|
||||
## Overview
|
||||
|
||||
Most archive formats store filenames and metadata as bytes. SharpCompress must convert these bytes to strings using the appropriate character encoding.
|
||||
|
||||
**Common Problem:** Archives created on systems that use a non-UTF-8 encoding (especially Japanese and Chinese systems) show corrupted filenames when they are extracted on systems that assume UTF-8.
|
||||
|
||||
---
|
||||
|
||||
## ArchiveEncoding Class
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```csharp
|
||||
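using System.Text;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;
using SharpCompress.Readers;

// On .NET Core / .NET 5+, register the code page provider once before requesting cp932
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Configure encoding before opening archive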
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding(932) // cp932 for Japanese
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("japanese.zip", options))
|
||||
{
|
||||
foreach (var entry in archive.Entries)
|
||||
{
|
||||
Console.WriteLine(entry.Key); // Now shows correct characters
|
||||
}
|
||||
}
|
||||
```
|
||||
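To see why the encoding choice matters, it helps to decode the same bytes twice. The sketch below uses only the .NET base class library: it encodes a sample name as cp932, then decodes it once with cp932 and once with UTF-8. The class and method names are illustrative. Note that on .NET Core and .NET 5+ legacy code pages such as 932 are only available after `CodePagesEncodingProvider.Instance` has been registered, while .NET Framework ships them by default.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Encodes sample text as cp932, then decodes it two ways to show the mismatch.
    public static (string Correct, string Garbled) RoundTrip(string text)
    {
        // Required on .NET Core / .NET 5+ before asking for code page 932.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp932 = Encoding.GetEncoding(932);
        byte[] raw = cp932.GetBytes(text);             // bytes as an archive would store them

        string correct = cp932.GetString(raw);         // decoded with the matching encoding
        string garbled = Encoding.UTF8.GetString(raw); // decoded with the wrong encoding
        return (correct, garbled);
    }
}
```

Calling `CodePageDemo.RoundTrip("テスト")` returns the original text in `Correct`, while `Garbled` contains U+FFFD replacement characters because the cp932 bytes are not valid UTF-8.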
|
||||
### ArchiveEncoding Properties
|
||||
|
||||
| Property | Purpose |
|
||||
|----------|---------|
|
||||
| `Default` | Default encoding for filenames (fallback) |
|
||||
| `CustomDecoder` | Custom decoding function for special cases |
|
||||
|
||||
### Setting for Different APIs
|
||||
|
||||
**Archive API:**
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding { Default = Encoding.GetEncoding(932) }
|
||||
};
|
||||
using (var archive = ZipArchive.Open("file.zip", options))
|
||||
{
|
||||
// Use archive with correct encoding
|
||||
}
|
||||
```
|
||||
|
||||
**Reader API:**
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding { Default = Encoding.GetEncoding(932) }
|
||||
};
|
||||
using (var stream = File.OpenRead("file.zip"))
|
||||
using (var reader = ReaderFactory.Open(stream, options))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
// Filenames decoded correctly
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Encodings
|
||||
|
||||
### Asian Encodings
|
||||
|
||||
#### cp932 (Japanese)
|
||||
```csharp
|
||||
// Windows-31J, Shift-JIS variant used on Japanese Windows
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding(932)
|
||||
}
|
||||
};
|
||||
using (var archive = ZipArchive.Open("japanese.zip", options))
|
||||
{
|
||||
// Correctly decodes Japanese filenames
|
||||
}
|
||||
```
|
||||
|
||||
**When to use:**
|
||||
- Archives from Japanese Windows systems
|
||||
- Files with Japanese characters in names
|
||||
|
||||
#### gb2312 (Simplified Chinese)
|
||||
```csharp
|
||||
// Simplified Chinese
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("gb2312")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### gbk (Extended Simplified Chinese)
|
||||
```csharp
|
||||
// Extended Simplified Chinese (more characters than gb2312)
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("gbk")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### big5 (Traditional Chinese)
|
||||
```csharp
|
||||
// Traditional Chinese (Taiwan, Hong Kong)
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("big5")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### euc-jp (Japanese, Unix)
|
||||
```csharp
|
||||
// Extended Unix Code for Japanese
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("euc-jp")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### euc-kr (Korean)
|
||||
```csharp
|
||||
// Extended Unix Code for Korean
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("euc-kr")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
### Western European Encodings
|
||||
|
||||
#### iso-8859-1 (Latin-1)
|
||||
```csharp
|
||||
// Western European (includes accented characters)
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("iso-8859-1")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
**When to use:**
|
||||
- Archives from French, German, Spanish systems
|
||||
- Files with accented characters (é, ñ, ü, etc.)
|
||||
|
||||
#### cp1252 (Windows-1252)
|
||||
```csharp
|
||||
// Windows Western European
|
||||
// Very similar to iso-8859-1 but with additional printable characters
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("windows-1252")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
**When to use:**
|
||||
- Archives from older Western European Windows systems
|
||||
- Files with smart quotes and other Windows-specific characters
|
||||
|
||||
#### iso-8859-15 (Latin-9)
|
||||
```csharp
|
||||
// Western European with Euro symbol support
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("iso-8859-15")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
### Cyrillic Encodings
|
||||
|
||||
#### cp1251 (Windows Cyrillic)
|
||||
```csharp
|
||||
// Russian, Serbian, Bulgarian, etc.
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("windows-1251")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### koi8-r (KOI8 Russian)
|
||||
```csharp
|
||||
// Russian (Unix standard)
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("koi8-r")
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
### UTF Encodings (Modern)
|
||||
|
||||
#### UTF-8 (Default)
|
||||
```csharp
|
||||
// Modern standard - usually correct for new archives
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.UTF8
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
#### UTF-16
|
||||
```csharp
|
||||
// Unicode - rarely used in archives
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.Unicode
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Encoding Auto-Detection
|
||||
|
||||
SharpCompress attempts to auto-detect encoding, but this isn't always reliable:
|
||||
|
||||
```csharp
|
||||
// Auto-detection (default)
|
||||
using (var archive = ZipArchive.Open("file.zip")) // Uses UTF8 by default
|
||||
{
|
||||
// May show corrupted characters if archive uses different encoding
|
||||
}
|
||||
|
||||
// Explicit encoding (more reliable)
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding { Default = Encoding.GetEncoding(932) }
|
||||
};
|
||||
using (var archive = ZipArchive.Open("file.zip", options))
|
||||
{
|
||||
// Correct characters displayed
|
||||
}
|
||||
```
|
||||
|
||||
### When Manual Override is Needed
|
||||
|
||||
| Situation | Solution |
|
||||
|-----------|----------|
|
||||
| Archive shows corrupted characters | Specify the encoding explicitly |
|
||||
| Archives from specific region | Use that region's encoding |
|
||||
| Mixed encodings in archive | Use CustomDecoder |
|
||||
| Testing with international files | Try different encodings |
|
||||
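"Try different encodings" can be done systematically by decoding the same raw name bytes with several candidates and comparing the results. The sketch below is one way to do that with the base class library only. The class name and the candidate list are assumptions, and obtaining the raw bytes from an archive is outside its scope. Numeric code pages are used so the lookup does not depend on which alias names the runtime recognizes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingProbe
{
    // Illustrative candidates: UTF-8, cp932, GBK, Big5, Windows Cyrillic, Windows Western.
    private static readonly int[] CandidateCodePages = { 65001, 932, 936, 950, 1251, 1252 };

    // Decodes the same raw bytes with every candidate so the results can be compared by eye.
    public static IReadOnlyDictionary<string, string> DecodeWithCandidates(byte[] rawName)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var results = new Dictionary<string, string>();
        foreach (int codePage in CandidateCodePages)
        {
            Encoding encoding = Encoding.GetEncoding(codePage);
            results[encoding.WebName] = encoding.GetString(rawName);
        }
        return results;
    }
}
```

Printing the dictionary for one problematic filename usually makes the right candidate obvious, since only one decoding produces readable text.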
|
||||
---
|
||||
|
||||
## Custom Decoder
|
||||
|
||||
For complex scenarios where a single encoding isn't sufficient:
|
||||
|
||||
### Basic Custom Decoder
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
CustomDecoder = (data, offset, length) =>
|
||||
{
|
||||
// Custom decoding logic
|
||||
var bytes = new byte[length];
|
||||
Array.Copy(data, offset, bytes, 0, length);
|
||||
|
||||
// Try UTF8 first
|
||||
try
|
||||
{
|
||||
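// Strict decoder: Encoding.UTF8 would substitute U+FFFD instead of throwing
return new UTF8Encoding(false, throwOnInvalidBytes: true).GetString(bytes);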
|
||||
}
|
||||
catch
|
||||
{
|
||||
// Fallback to cp932 if UTF8 fails
|
||||
return Encoding.GetEncoding(932).GetString(bytes);
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("mixed.zip", options))
|
||||
{
|
||||
foreach (var entry in archive.Entries)
|
||||
{
|
||||
Console.WriteLine(entry.Key); // Uses custom decoder
|
||||
}
|
||||
}
|
||||
```
|
||||
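One detail is easy to miss with a try/catch fallback: `Encoding.UTF8.GetString` does not throw on invalid input, it substitutes U+FFFD, so the `catch` branch is only reached when a strict decoder is used. The sketch below shows a decoder with the same `(byte[], int, int) -> string` shape as `CustomDecoder` that uses a strict `UTF8Encoding`. The class name and the choice of cp932 as the fallback are assumptions.

```csharp
using System;
using System.Text;

public static class FallbackDecoder
{
    // Strict UTF-8: throws DecoderFallbackException on invalid byte sequences.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string Decode(byte[] data, int offset, int length)
    {
        try
        {
            return StrictUtf8.GetString(data, offset, length);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: fall back to an assumed legacy code page (cp932 here).
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(932).GetString(data, offset, length);
        }
    }
}
```

Because the signature matches, the method group can be assigned directly, for example `CustomDecoder = FallbackDecoder.Decode`. Decoding with the offset and length overload also avoids copying the bytes into a temporary array.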
|
||||
### Advanced: Detect Encoding by Content
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
CustomDecoder = DetectAndDecode
|
||||
}
|
||||
};
|
||||
|
||||
private static string DetectAndDecode(byte[] data, int offset, int length)
|
||||
{
|
||||
var bytes = new byte[length];
|
||||
Array.Copy(data, offset, bytes, 0, length);
|
||||
|
||||
// Try UTF8 (most modern archives)
|
||||
try
|
||||
{
|
||||
var str = Encoding.UTF8.GetString(bytes);
|
||||
// Verify it decoded correctly (no replacement characters)
|
||||
if (!str.Contains('\uFFFD'))
|
||||
return str;
|
||||
}
|
||||
catch { }
|
||||
|
||||
// Try cp932 (Japanese)
|
||||
try
|
||||
{
|
||||
var str = Encoding.GetEncoding(932).GetString(bytes);
|
||||
if (!str.Contains('\uFFFD'))
|
||||
return str;
|
||||
}
|
||||
catch { }
|
||||
|
||||
// Fallback to iso-8859-1 (always succeeds)
|
||||
return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Extract Archive with Japanese Filenames
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding(932) // cp932
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("japanese_files.zip", options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output", new ExtractionOptions
|
||||
{
|
||||
ExtractFullPath = true,
|
||||
Overwrite = true
|
||||
});
|
||||
}
|
||||
// Files extracted with correct Japanese names
|
||||
```
|
||||
|
||||
### Extract Archive with Western European Filenames
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("iso-8859-1")
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("french_files.zip", options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
// Accented characters (é, è, ê, etc.) display correctly
|
||||
```
|
||||
|
||||
### Extract Archive with Chinese Filenames
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("gbk") // Simplified Chinese
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("chinese_files.zip", options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
```
|
||||
|
||||
### Extract Archive with Russian Filenames
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("windows-1251") // Windows Cyrillic
|
||||
}
|
||||
};
|
||||
|
||||
using (var archive = ZipArchive.Open("russian_files.zip", options))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
```
|
||||
|
||||
### Reader API with Encoding
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding(932)
|
||||
}
|
||||
};
|
||||
|
||||
using (var stream = File.OpenRead("japanese.zip"))
|
||||
using (var reader = ReaderFactory.Open(stream, options))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
if (!reader.Entry.IsDirectory)
|
||||
{
|
||||
Console.WriteLine(reader.Entry.Key); // Correct characters
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Creating Archives with Correct Encoding
|
||||
|
||||
When creating archives, SharpCompress uses UTF8 by default (recommended):
|
||||
|
||||
```csharp
|
||||
// Create with UTF8 (default, recommended)
|
||||
using (var archive = ZipArchive.Create())
|
||||
{
|
||||
archive.AddAllFromDirectory(@"D:\my_files");
|
||||
archive.SaveTo("output.zip", CompressionType.Deflate);
|
||||
// Archives created with UTF8 encoding
|
||||
}
|
||||
```
|
||||
|
||||
If you need to create archives for systems that expect specific encodings:
|
||||
|
||||
```csharp
|
||||
// Note: SharpCompress Writer API uses UTF8 for encoding
|
||||
// To create archives with other encodings, consider:
|
||||
// 1. Let users on those systems create archives
|
||||
// 2. Use system tools (7-Zip, WinRAR) with desired encoding
|
||||
// 3. Post-process archives if absolutely necessary
|
||||
|
||||
// For now, recommend modern UTF8-based archives
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting Encoding Issues
|
||||
|
||||
### Filenames Show Question Marks (?)
|
||||
|
||||
```
|
||||
✗ Wrong encoding detected
|
||||
test文件.txt → test???.txt
|
||||
```
|
||||
|
||||
**Solution:** Specify correct encoding explicitly
|
||||
|
||||
```csharp
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
Default = Encoding.GetEncoding("gbk") // Try different encodings
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
### Filenames Show Replacement Character (�)
|
||||
|
||||
```
|
||||
✗ Invalid bytes for selected encoding
|
||||
café.txt → caf�.txt
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
1. Try a different encoding (see the Encoding Reference Table)
|
||||
2. Use CustomDecoder with fallback encoding
|
||||
3. Archive might be corrupted
|
||||
|
||||
### Mixed Encodings in Single Archive
|
||||
|
||||
```csharp
|
||||
// Use CustomDecoder to handle mixed encodings
|
||||
var options = new ReaderOptions
|
||||
{
|
||||
ArchiveEncoding = new ArchiveEncoding
|
||||
{
|
||||
CustomDecoder = (data, offset, length) =>
|
||||
{
|
||||
// Try multiple encodings in priority order
|
||||
var bytes = new byte[length];
|
||||
Array.Copy(data, offset, bytes, 0, length);
|
||||
|
||||
foreach (var encoding in new[]
|
||||
{
|
||||
Encoding.UTF8,
|
||||
Encoding.GetEncoding(932),
|
||||
Encoding.GetEncoding("iso-8859-1")
|
||||
})
|
||||
{
|
||||
try
|
||||
{
|
||||
var str = encoding.GetString(bytes);
|
||||
if (!str.Contains('\uFFFD'))
|
||||
return str;
|
||||
}
|
||||
catch { }
|
||||
}
|
||||
|
||||
// Final fallback
|
||||
return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
|
||||
}
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Encoding Reference Table
|
||||
|
||||
| Encoding | Code | Use Case |
|
||||
|----------|------|----------|
|
||||
| UTF-8 | (default) | Modern archives, recommended |
|
||||
| cp932 | 932 | Japanese Windows |
|
||||
| gb2312 | "gb2312" | Simplified Chinese |
|
||||
| gbk | "gbk" | Extended Simplified Chinese |
|
||||
| big5 | "big5" | Traditional Chinese |
|
||||
| iso-8859-1 | "iso-8859-1" | Western European |
|
||||
| cp1252 | "windows-1252" | Windows Western European |
|
||||
| cp1251 | "windows-1251" | Russian/Cyrillic |
|
||||
| euc-jp | "euc-jp" | Japanese Unix |
|
||||
| euc-kr | "euc-kr" | Korean |
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use UTF-8 for new archives** - Most modern systems support it
|
||||
2. **Ask the archive creator** - When receiving archives with corrupted names
|
||||
3. **Provide encoding options** - If your app handles user archives
|
||||
4. **Document your assumption** - Tell users what encoding you're using
|
||||
5. **Test with international files** - Before releasing production code
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md#garbled-filenames) - Encoding troubleshooting
|
||||
- [USAGE.md](USAGE.md#extract-zip-which-has-non-utf8-encoded-filenamecp932) - Usage examples
|
||||
557
docs/PERFORMANCE.md
Normal file
@@ -0,0 +1,557 @@
|
||||
# SharpCompress Performance Guide
|
||||
|
||||
This guide helps you optimize SharpCompress for performance in various scenarios.
|
||||
|
||||
## API Selection Guide
|
||||
|
||||
### Archive API vs Reader API
|
||||
|
||||
Choose the right API based on your use case:
|
||||
|
||||
| Aspect | Archive API | Reader API |
|
||||
|--------|------------|-----------|
|
||||
| **Stream Type** | Seekable only | Non-seekable OK |
|
||||
| **Memory Usage** | All entries in memory | One entry at a time |
|
||||
| **Random Access** | ✓ Yes | ✗ No |
|
||||
| **Best For** | Small-to-medium archives | Large or streaming data |
|
||||
| **Performance** | Fast for random access | Better for large files |
|
||||
|
||||
### Archive API (Fast for Random Access)
|
||||
|
||||
```csharp
|
||||
// Use when:
|
||||
// - Archive fits in memory
|
||||
// - You need random access to entries
|
||||
// - Stream is seekable (file, MemoryStream)
|
||||
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
// Random access - all entries available
|
||||
var specific = archive.Entries.FirstOrDefault(e => e.Key == "file.txt");
|
||||
if (specific != null)
|
||||
{
|
||||
specific.WriteToFile(@"C:\output\file.txt");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Performance Characteristics:**
|
||||
- ✓ Instant entry lookup
|
||||
- ✓ Parallel extraction possible
|
||||
- ✗ Entire archive in memory
|
||||
- ✗ Can't process while downloading
|
||||
|
||||
### Reader API (Best for Large Files)
|
||||
|
||||
```csharp
|
||||
// Use when:
|
||||
// - Processing large archives (>100 MB)
|
||||
// - Streaming from network/pipe
|
||||
// - Memory is constrained
|
||||
// - Forward-only processing is acceptable
|
||||
|
||||
using (var stream = File.OpenRead("large.zip"))
|
||||
using (var reader = ReaderFactory.Open(stream))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
// Process one entry at a time
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Performance Characteristics:**
|
||||
- ✓ Minimal memory footprint
|
||||
- ✓ Works with non-seekable streams
|
||||
- ✓ Can process while downloading
|
||||
- ✗ Forward-only (no random access)
|
||||
- ✗ Entry lookup requires iteration
|
||||
|
||||
---
|
||||
|
||||
## Buffer Sizing
|
||||
|
||||
### Understanding Buffers
|
||||
|
||||
SharpCompress uses internal buffers for reading compressed data. Buffer size affects:
|
||||
- **Speed:** Larger buffers = fewer I/O operations = faster
|
||||
- **Memory:** Larger buffers = higher memory usage
|
||||
|
||||
### Recommended Buffer Sizes
|
||||
|
||||
| Scenario | Size | Notes |
|
||||
|----------|------|-------|
|
||||
| Embedded/IoT devices | 4-8 KB | Minimal memory usage |
|
||||
| Memory-constrained | 16-32 KB | Conservative default |
|
||||
| Standard use (default) | 64 KB | Recommended default |
|
||||
| Large file streaming | 256 KB | Better throughput |
|
||||
| High-speed SSD | 512 KB - 1 MB | Maximum throughput |
|
||||
|
||||
### How Buffer Size Affects Performance
|
||||
|
||||
```csharp
|
||||
// SharpCompress manages buffers internally
|
||||
// You can't directly set buffer size, but you can:
|
||||
|
||||
// 1. Use Stream.CopyTo with explicit buffer size
|
||||
using (var entryStream = reader.OpenEntryStream())
|
||||
using (var fileStream = File.Create(@"C:\output\file.txt"))
|
||||
{
|
||||
// Default buffer (81,920 bytes in .NET):
// entryStream.CopyTo(fileStream);

// Or specify a larger buffer for faster copy (call CopyTo only once per stream)
entryStream.CopyTo(fileStream, bufferSize: 262144); // 256 KB
|
||||
}
|
||||
|
||||
// 2. Use custom buffer for writing
|
||||
using (var entryStream = reader.OpenEntryStream())
|
||||
using (var fileStream = File.Create(@"C:\output\file.txt"))
|
||||
{
|
||||
byte[] buffer = new byte[262144]; // 256 KB
|
||||
int bytesRead;
|
||||
while ((bytesRead = entryStream.Read(buffer, 0, buffer.Length)) > 0)
|
||||
{
|
||||
fileStream.Write(buffer, 0, bytesRead);
|
||||
}
|
||||
}
|
||||
```
|
||||
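The recommended sizes are a starting point, and the best value depends on the storage and workload, so it is worth measuring. The following sketch times `Stream.CopyTo` with a few buffer sizes against a seekable source. The class and method names are illustrative, and an in-memory source will understate the differences seen on real disks or networks.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CopyBufferBenchmark
{
    // Copies the source to a throwaway sink with the given buffer size and returns the elapsed time.
    public static TimeSpan TimeCopy(Stream source, int bufferSize)
    {
        source.Position = 0; // needs a seekable source so every run starts at the beginning
        var stopwatch = Stopwatch.StartNew();
        source.CopyTo(Stream.Null, bufferSize);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Prints one timing line per buffer size.
    public static void Run(Stream source)
    {
        foreach (int size in new[] { 4 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024 })
        {
            double ms = TimeCopy(source, size).TotalMilliseconds;
            Console.WriteLine($"{size / 1024,5} KB buffer: {ms:F2} ms");
        }
    }
}
```

Run it several times and discard the first pass, since JIT compilation and file system caching distort the initial measurement.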
|
||||
---
|
||||
|
||||
## Streaming Large Files
|
||||
|
||||
### Non-Seekable Stream Patterns
|
||||
|
||||
For processing archives from downloads or pipes:
|
||||
|
||||
```csharp
|
||||
// Download stream (non-seekable)
|
||||
using (var httpStream = await httpClient.GetStreamAsync(url))
|
||||
using (var reader = ReaderFactory.Open(httpStream))
|
||||
{
|
||||
// Process entries as they arrive
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
if (!reader.Entry.IsDirectory)
|
||||
{
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Performance Tips:**
|
||||
- Don't try to buffer the entire stream
|
||||
- Process entries immediately
|
||||
- Use async APIs for better responsiveness
|
||||
|
||||
### Download-Then-Extract vs Streaming
|
||||
|
||||
Choose based on your constraints:
|
||||
|
||||
| Approach | When to Use |
|
||||
|----------|------------|
|
||||
| **Download then extract** | Moderate size, need random access |
|
||||
| **Stream during download** | Large files, bandwidth limited, memory constrained |
|
||||
|
||||
```csharp
|
||||
// Download then extract (requires disk space)
|
||||
var archivePath = await DownloadFile(url, @"C:\temp\archive.zip");
|
||||
using (var archive = ZipArchive.Open(archivePath))
|
||||
{
|
||||
archive.WriteToDirectory(@"C:\output");
|
||||
}
|
||||
|
||||
// Stream during download (on-the-fly extraction)
|
||||
using (var httpStream = await httpClient.GetStreamAsync(url))
|
||||
using (var reader = ReaderFactory.Open(httpStream))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
```
|
||||
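The `DownloadFile` helper used above is not defined in this guide. A possible shape for it, assuming .NET 5 or later and an `HttpClient` supplied by the caller, is sketched below. The class name, method name, and parameters are hypothetical.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class Downloader
{
    // Streams the response body to disk without buffering it all in memory, then returns the path.
    public static async Task<string> DownloadFileAsync(
        HttpClient client, string url, string destinationPath, CancellationToken ct = default)
    {
        // ResponseHeadersRead returns as soon as headers arrive, so the body is streamed.
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, ct);
        response.EnsureSuccessStatusCode();

        await using var file = File.Create(destinationPath);
        await response.Content.CopyToAsync(file, ct);
        return destinationPath;
    }
}
```

Passing the `HttpClient` in, rather than creating one per call, follows the usual guidance to share a single client instance and also makes the helper testable with a stub message handler.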
|
||||
---
|
||||
|
||||
## Solid Archive Optimization
|
||||
|
||||
### Why Solid Archives Are Slow
|
||||
|
||||
Solid archives (Rar, 7Zip) group files together in a single compressed stream:
|
||||
|
||||
```
|
||||
Solid Archive Layout:
|
||||
[Header] [Compressed Stream] [Footer]
|
||||
├─ File1 compressed data
|
||||
├─ File2 compressed data
|
||||
├─ File3 compressed data
|
||||
└─ File4 compressed data
|
||||
```
|
||||
|
||||
Extracting File3 requires decompressing File1 and File2 first.
|
||||
|
||||
### Sequential vs Random Extraction
|
||||
|
||||
**Random Extraction (Slow):**
|
||||
```csharp
|
||||
using (var archive = RarArchive.Open("solid.rar"))
|
||||
{
|
||||
foreach (var entry in archive.Entries)
|
||||
{
|
||||
entry.WriteToFile(@"C:\output\" + entry.Key); // ✗ Slow!
|
||||
// Each entry triggers full decompression from start
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Sequential Extraction (Fast):**
|
||||
```csharp
|
||||
using (var archive = RarArchive.Open("solid.rar"))
|
||||
{
|
||||
// Method 1: Use WriteToDirectory (recommended)
|
||||
archive.WriteToDirectory(@"C:\output", new ExtractionOptions
|
||||
{
|
||||
ExtractFullPath = true,
|
||||
Overwrite = true
|
||||
});
|
||||
|
||||
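// Method 2: Use ExtractAllEntries, which returns a forward-only reader
using (var entryReader = archive.ExtractAllEntries())
{
while (entryReader.MoveToNextEntry())
{
entryReader.WriteEntryToDirectory(@"C:\output");
}
}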
|
||||
|
||||
// Method 3: Use Reader API (also sequential)
|
||||
using (var reader = RarReader.Open(File.OpenRead("solid.rar")))
|
||||
{
|
||||
while (reader.MoveToNextEntry())
|
||||
{
|
||||
reader.WriteEntryToDirectory(@"C:\output");
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Performance Impact:**
|
||||
- Random extraction: O(n²) - very slow for many files
|
||||
- Sequential extraction: O(n) - 10-100x faster
|
||||
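The O(n²) figure follows from simple counting. If extracting entry k on its own means decompressing entries 1 through k, then extracting all n entries that way costs 1 + 2 + ... + n = n(n + 1)/2 entry decompressions, compared with n for a single forward pass. The sketch below encodes that model. It assumes equally sized entries and no caching between extractions, and the class name is illustrative.

```csharp
using System;

public static class SolidCostModel
{
    // Entry decompressions when each entry is extracted independently from the start.
    public static long RandomAccessCost(int entryCount) =>
        (long)entryCount * (entryCount + 1) / 2;

    // Entry decompressions for one forward pass over the archive.
    public static long SequentialCost(int entryCount) => entryCount;

    // Prints both costs side by side for a given archive size.
    public static void Print(int entryCount)
    {
        long random = RandomAccessCost(entryCount);
        long sequential = SequentialCost(entryCount);
        Console.WriteLine($"{entryCount} entries: {random} vs {sequential} decompressions");
    }
}
```

Under this model an archive with 100 entries needs 5,050 entry decompressions when extracted entry by entry versus 100 in one pass, and the gap widens linearly with the entry count.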
|
||||
### Best Practices for Solid Archives
|
||||
|
||||
1. **Always extract sequentially** when possible
|
||||
2. **Use Reader API** for large solid archives
|
||||
3. **Process entries in order** from the archive
|
||||
4. **Consider using 7Zip command-line** for scripted extractions
|
||||
|
||||
---
|
||||
|
||||
## Compression Level Trade-offs
|
||||
|
||||
### Deflate/GZip Levels
|
||||
|
||||
```csharp
|
||||
// Level 1 = Fastest, largest size
|
||||
// Level 6 = Default (balanced)
|
||||
// Level 9 = Slowest, best compression
|
||||
|
||||
// Write with different compression levels
|
||||
using (var archive = ZipArchive.Create())
|
||||
{
|
||||
archive.AddAllFromDirectory(@"D:\data");
|
||||
|
||||
// Fast compression (level 1)
|
||||
archive.SaveTo("fast.zip", new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 1
|
||||
});
|
||||
|
||||
// Default compression (level 6)
|
||||
archive.SaveTo("default.zip", CompressionType.Deflate);
|
||||
|
||||
// Best compression (level 9)
|
||||
archive.SaveTo("best.zip", new WriterOptions(CompressionType.Deflate)
|
||||
{
|
||||
CompressionLevel = 9
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**Speed vs Size:**
|
||||
| Level | Speed | Size | Use Case |
|
||||
|-------|-------|------|----------|
|
||||
| 1 | 10x | 90% | Network, streaming |
|
||||
| 6 | 1x | 75% | Default (good balance) |
|
||||
| 9 | 0.1x | 65% | Archival, static storage |
|
||||
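The figures in the table are indicative, and real ratios depend heavily on the data, so it is worth measuring on a representative sample. To stay dependency-free, the sketch below uses the base class library's `System.IO.Compression.GZipStream` and its `CompressionLevel` enum rather than SharpCompress's numeric levels. That substitution, and the class name, are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LevelComparison
{
    // Returns the compressed size in bytes of the data at the given level.
    public static long CompressedSize(byte[] data, CompressionLevel level)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, level, leaveOpen: true))
        {
            gzip.Write(data, 0, data.Length);
        } // disposing the GZipStream writes the final block and trailer
        return output.Length;
    }

    // Prints the compressed size for each level so they can be compared.
    public static void Run(byte[] data)
    {
        var levels = new[]
        {
            CompressionLevel.NoCompression,
            CompressionLevel.Fastest,
            CompressionLevel.Optimal
        };
        foreach (var level in levels)
        {
            long size = CompressedSize(data, level);
            Console.WriteLine($"{level,-14} {size,10} bytes ({100.0 * size / data.Length:F1}% of original)");
        }
    }
}
```

Wrapping `CompressedSize` in a `Stopwatch` gives the speed side of the trade-off for the same sample.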
|
||||
### BZip2 Block Size
|
||||
|
||||
```csharp
|
||||
// BZip2 block size affects memory and compression
|
||||
// 100K to 900K (default 900K)
|
||||
|
||||
// Smaller block size = lower memory, faster
|
||||
// Larger block size = better compression, slower
|
||||
|
||||
using (var archive = TarArchive.Create())
|
||||
{
|
||||
archive.AddAllFromDirectory(@"D:\data");
|
||||
|
||||
// These are preset in WriterOptions via CompressionLevel
|
||||
archive.SaveTo("archive.tar.bz2", CompressionType.BZip2);
|
||||
}
|
||||
```
|
||||
|
||||
### LZMA Settings
|
||||
|
||||
LZMA compression is very powerful but memory-intensive:
|
||||
|
||||
```csharp
|
||||
// LZMA (7Zip, .tar.lzma):
|
||||
// - Dictionary size: 16 KB to 1 GB (default 32 MB)
|
||||
// - Faster preset: smaller dictionary
|
||||
// - Better compression: larger dictionary
|
||||
|
||||
// Preset via CompressionType
|
||||
using (var archive = TarArchive.Create())
|
||||
{
|
||||
archive.AddAllFromDirectory(@"D:\data");
|
||||
archive.SaveTo("archive.tar.lz", CompressionType.LZip); // LZip container around LZMA, default settings
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Async Performance
|
||||
|
||||
### When Async Helps
|
||||
|
||||
Async is beneficial when:
|
||||
- **Long I/O operations** (network, slow disks)
|
||||
- **UI responsiveness** needed (Windows Forms, WPF, Blazor)
|
||||
- **Server applications** (ASP.NET, multiple concurrent operations)
|
||||
|
||||
```csharp
|
||||
// Async extraction (non-blocking)
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
await archive.WriteToDirectoryAsync(
|
||||
@"C:\output",
|
||||
new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
|
||||
cancellationToken
|
||||
);
|
||||
}
|
||||
// Thread can handle other work while I/O happens
|
||||
```
|
||||
|
||||
### When Async Doesn't Help
|
||||
|
||||
Async doesn't improve performance for:
|
||||
- **CPU-bound operations** (no I/O wait to overlap)
|
||||
- **Local SSD I/O** (I/O is fast enough)
|
||||
- **Single-threaded scenarios** (no parallelism benefit)
|
||||
|
||||
```csharp
|
||||
// Sync extraction (simpler, same performance on fast I/O)
|
||||
using (var archive = ZipArchive.Open("archive.zip"))
|
||||
{
|
||||
archive.WriteToDirectory(
|
||||
@"C:\output",
|
||||
new ExtractionOptions { ExtractFullPath = true, Overwrite = true }
|
||||
);
|
||||
}
|
||||
// Simple and fast - no async needed
|
||||
```
|
||||
|
### Cancellation Pattern

```csharp
var cts = new CancellationTokenSource();

// Cancel after 5 minutes
cts.CancelAfter(TimeSpan.FromMinutes(5));

try
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        await archive.WriteToDirectoryAsync(
            @"C:\output",
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
            cts.Token
        );
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Extraction cancelled");
    // Clean up partial extraction if needed
}
```
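
One way to handle the "clean up partial extraction" step is to extract into a staging directory and only move it into place on success. The sketch below uses only the .NET base class library and takes the extraction itself as a delegate, so it makes no assumptions about SharpCompress signatures. `SafeExtraction`, `RunAsync`, and the `.partial-` suffix are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an extraction into a staging directory and
// removes that directory if the operation is cancelled or fails.
public static class SafeExtraction
{
    public static async Task<bool> RunAsync(
        string finalDir,
        Func<string, CancellationToken, Task> extractAsync,
        CancellationToken cancellationToken)
    {
        // Stage next to the target so the final move stays on one volume.
        // Assumes finalDir has no trailing directory separator.
        var stagingDir = finalDir + ".partial-" + Guid.NewGuid().ToString("N");
        Directory.CreateDirectory(stagingDir);
        try
        {
            await extractAsync(stagingDir, cancellationToken);
            Directory.Move(stagingDir, finalDir); // throws if finalDir already exists
            return true;
        }
        catch (OperationCanceledException)
        {
            return false; // cancelled: nothing is left behind
        }
        finally
        {
            // The staging directory still exists only when the move did not happen.
            if (Directory.Exists(stagingDir))
                Directory.Delete(stagingDir, recursive: true);
        }
    }
}
```

The delegate would wrap the `WriteToDirectoryAsync` call from the pattern above, passing the staging path instead of the final output path. Because `Directory.Move` refuses to overwrite an existing directory, this sketch suits fresh targets; merging into an existing folder would need a different strategy.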

---

## Memory Efficiency

### Reducing Allocations

```csharp
// ✗ Wasteful - allocates a new options object each iteration
foreach (var archiveFile in archiveFiles)
{
    using (var archive = ZipArchive.Open(archiveFile))
    {
        archive.WriteToDirectory(outputDir, new ExtractionOptions
        {
            ExtractFullPath = true,
            Overwrite = true
        });
    }
}

// ✓ Better - reuse options object
var options = new ExtractionOptions
{
    ExtractFullPath = true,
    Overwrite = true
};
foreach (var archiveFile in archiveFiles)
{
    using (var archive = ZipArchive.Open(archiveFile))
    {
        archive.WriteToDirectory(outputDir, options);
    }
}
```

### Object Pooling for Repeated Operations

```csharp
using System.Buffers;
using System.Collections.Generic;
using System.IO;
using SharpCompress.Archives;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;

// For very high-throughput scenarios, consider pooling
public class ArchiveExtractionPool
{
    // Shared pool for buffers your own code needs; ExtractMany does not rent from it
    private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;

    public void ExtractMany(IEnumerable<string> archiveFiles, string outputDir)
    {
        // Allocate the options once and reuse them for every archive
        var options = new ExtractionOptions
        {
            ExtractFullPath = true,
            Overwrite = true
        };

        foreach (var archiveFile in archiveFiles)
        {
            using (var stream = File.OpenRead(archiveFile))
            using (var archive = ZipArchive.Open(stream))
            {
                archive.WriteToDirectory(outputDir, options);
            }
        }
    }
}
```
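
For code that copies entry data itself, renting the copy buffer from `ArrayPool<byte>.Shared` avoids allocating a fresh array per entry. The following is a minimal sketch using only the .NET base class library; `PooledCopy` is a hypothetical helper, and the 81920-byte default is an assumption chosen for illustration.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical helper: copies a stream using a buffer rented from the shared pool.
public static class PooledCopy
{
    public static long Copy(Stream source, Stream destination, int bufferSize = 81920)
    {
        // Rent may return an array larger than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        try
        {
            long total = 0;
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                total += read;
            }
            return total; // number of bytes copied
        }
        finally
        {
            // Always hand the buffer back, even if the copy throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

A helper like this could be applied to an entry stream and an output file stream when extracting entries one at a time. Whether it beats the library's built-in extraction depends on the workload, so measure before adopting it.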

---

## Practical Performance Tips

### 1. Choose the Right API

| Scenario | API | Why |
|----------|-----|-----|
| Small archives | Archive | Faster random access |
| Large archives | Reader | Lower memory |
| Streaming | Reader | Works on non-seekable streams |
| Download streams | Reader | Async extraction while downloading |
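
For the Reader rows in the table, a forward-only extraction might look like the sketch below. It follows the Reader pattern from the usage documentation (`ReaderFactory.Open`, `MoveToNextEntry`, `WriteEntryToDirectory`); check those names against the SharpCompress version you use. `ReaderExample` and `ExtractForwardOnly` are hypothetical names.

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;

public static class ReaderExample
{
    // Hypothetical helper: extracts every file entry from a forward-only stream.
    public static void ExtractForwardOnly(Stream source, string outputDir)
    {
        // Reuse one options object for all entries.
        var options = new ExtractionOptions { ExtractFullPath = true, Overwrite = true };

        using (var reader = ReaderFactory.Open(source))
        {
            // Entries are visited once, in archive order; no seeking is required.
            while (reader.MoveToNextEntry())
            {
                if (!reader.Entry.IsDirectory)
                {
                    reader.WriteEntryToDirectory(outputDir, options);
                }
            }
        }
    }
}
```

Because the reader never seeks, `source` can be a network or pipe stream as well as a file.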

### 2. Batch Operations

```csharp
// ✗ Slow - reopens the same archive and extracts it again for every file
foreach (var file in files)
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        archive.WriteToDirectory(@"C:\output");
    }
}

// ✓ Better - open the archive once and extract all entries in one pass
using (var archive = ZipArchive.Open("archive.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
```

### 3. Use Appropriate Compression

```csharp
// For distribution/storage: Best compression
archive.SaveTo("archive.zip", new WriterOptions(CompressionType.Deflate)
{
    CompressionLevel = 9
});

// For daily backups: Balanced compression
archive.SaveTo("backup.zip", CompressionType.Deflate); // Default level 6

// For temporary/streaming: Fast compression
archive.SaveTo("temp.zip", new WriterOptions(CompressionType.Deflate)
{
    CompressionLevel = 1
});
```

### 4. Profile Your Code

```csharp
using System.Diagnostics; // Stopwatch

var sw = Stopwatch.StartNew();
using (var archive = ZipArchive.Open("large.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
sw.Stop();

Console.WriteLine($"Extraction took {sw.ElapsedMilliseconds}ms");

// Measure retained managed memory before/after (passing true forces a full GC first)
var beforeMem = GC.GetTotalMemory(true);
// ... do work ...
var afterMem = GC.GetTotalMemory(true);
Console.WriteLine($"Memory used: {(afterMem - beforeMem) / 1024 / 1024}MB");
```
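
`GC.GetTotalMemory` reports what is still retained after a collection, which can hide short-lived allocations. As a complementary sketch, the helper below combines `Stopwatch` with `GC.GetAllocatedBytesForCurrentThread()` to report how much was allocated while an action ran. `Profiler` and `Measure` are hypothetical names; the counter covers only the calling thread, so work that hops threads (for example after an `await`) is not captured.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times an action and reports bytes allocated on the calling thread.
public static class Profiler
{
    public static (TimeSpan Elapsed, long AllocatedBytes) Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();

        action();

        sw.Stop();
        long after = GC.GetAllocatedBytesForCurrentThread();

        // Allocation count is cumulative, so the difference is what the action allocated.
        return (sw.Elapsed, after - before);
    }
}
```

Wrapping a synchronous extraction call in `Profiler.Measure(() => ...)` gives both numbers in one pass, without forcing a garbage collection.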

---

## Troubleshooting Performance

### Extraction is Slow

1. **Check if solid archive** → Use sequential extraction
2. **Check API** → Reader API might be faster for large files
3. **Check compression format** → BZip2 and LZMA decompress more slowly than Deflate; the level mainly affects compression time
4. **Check I/O** → Network drives are much slower than SSD
5. **Check buffer size** → May need larger buffers for network
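
For the buffer-size check, one option is to open the source file yourself with a larger buffer and pass the stream to the archive or reader. This is a minimal sketch using only `System.IO`; `BufferedOpen` is a hypothetical helper and the 1 MB default is an illustrative guess, not a measured recommendation.

```csharp
using System.IO;

public static class BufferedOpen
{
    // Hypothetical helper: opens a file for sequential reading with a caller-chosen buffer size.
    public static FileStream OpenSequential(string path, int bufferSize = 1024 * 1024)
    {
        return new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read,
            bufferSize,                   // larger buffer means fewer round trips on slow links
            FileOptions.SequentialScan);  // hint that the file is read front to back
    }
}
```

The caller owns the returned stream and should dispose it after the archive is closed. Try a few buffer sizes against the real storage, since the best value depends on latency and throughput.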

### High Memory Usage

1. **Use Reader API** instead of Archive API
2. **Process entries immediately** rather than buffering
3. **Reduce compression level** if writing
4. **Check for memory leaks** in your code

### CPU Usage at 100%

1. **Normal for compression** - especially with high compression levels
2. **Consider lower level** for faster processing
3. **Reduce parallelism** if processing multiple archives
4. **Check if awaiting properly** in async code
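
To cap parallelism when processing several archives, `Parallel.ForEach` accepts a `ParallelOptions.MaxDegreeOfParallelism` limit. The sketch below uses only the .NET base class library and takes the per-archive work as a delegate. `BoundedBatch` and `Run` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedBatch
{
    // Hypothetical helper: runs processOne for each path with at most maxParallel running at once.
    public static void Run(IEnumerable<string> archivePaths, Action<string> processOne, int maxParallel)
    {
        var options = new ParallelOptions
        {
            // Guard against zero or negative values, which ParallelOptions rejects (except -1).
            MaxDegreeOfParallelism = Math.Max(1, maxParallel)
        };

        Parallel.ForEach(archivePaths, options, processOne);
    }
}
```

A limit around half the core count is a reasonable starting point for CPU-heavy compression, but the right value depends on the format and the storage, so adjust it based on measurements.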

---

## Related Documentation

- [USAGE.md](USAGE.md) - Usage examples with performance considerations
- [FORMATS.md](FORMATS.md) - Format-specific performance notes
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Solving common issues

@@ -1,6 +1,6 @@
# SharpCompress Usage

## Async/Await Support
## Async/Await Support (Beta)

SharpCompress now provides full async/await support for all I/O operations. All `Read`, `Write`, and extraction operations have async equivalents ending in `Async` that accept an optional `CancellationToken`. This enables better performance and scalability for I/O-bound operations.

@@ -13,7 +13,7 @@ SharpCompress now provides full async/await support for all I/O operations. All

See [Async Examples](#async-examples) section below for usage patterns.

## Stream Rules (changed with 0.21)
## Stream Rules

When dealing with Streams, the rule should be that you don't close a stream you didn't create. This, in effect, should mean you should always put a Stream in a using block to dispose it.
