# SharpCompress Performance Guide

This guide helps you optimize SharpCompress for performance in various scenarios.

## API Selection Guide

### Archive API vs Reader API

Choose the right API based on your use case:

| Aspect | Archive API | Reader API |
|--------|------------|-----------|
| **Stream Type** | Seekable only | Non-seekable OK |
| **Memory Usage** | All entries in memory | One entry at a time |
| **Random Access** | ✓ Yes | ✗ No |
| **Best For** | Small-to-medium archives | Large or streaming data |
| **Performance** | Fast for random access | Better for large files |

### Archive API (Fast for Random Access)

```csharp
// Use when:
// - Archive fits in memory
// - You need random access to entries
// - Stream is seekable (file, MemoryStream)

using (var archive = ZipArchive.Open("archive.zip"))
{
    // Random access - all entries available
    var specific = archive.Entries.FirstOrDefault(e => e.Key == "file.txt");
    if (specific != null)
    {
        specific.WriteToFile(@"C:\output\file.txt");
    }
}
```

**Performance Characteristics:**
- ✓ Instant entry lookup
- ✓ Parallel extraction possible (see the sketch below)
- ✗ Entire archive in memory
- ✗ Can't process while downloading

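The "parallel extraction" point deserves a caveat: entries obtained from a single archive instance read from the same underlying stream, so the usual pattern is one archive instance per worker. A minimal sketch, assuming the output layout and degree of parallelism are illustrative rather than prescribed:

```csharp
// Snapshot the entry keys once, then let each worker open its own archive.
List<string> keys;
using (var archive = ZipArchive.Open("archive.zip"))
{
    keys = archive.Entries
        .Where(e => !e.IsDirectory)
        .Select(e => e.Key)
        .ToList();
}

Parallel.ForEach(keys, key =>
{
    // One archive instance per worker: a shared instance is not thread-safe.
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        var entry = archive.Entries.First(e => e.Key == key);
        // Flattens entry paths for brevity; build full paths as needed.
        entry.WriteToFile(Path.Combine(@"C:\output", Path.GetFileName(key)));
    }
});
```
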
### Reader API (Best for Large Files)

```csharp
// Use when:
// - Processing large archives (>100 MB)
// - Streaming from network/pipe
// - Memory is constrained
// - Forward-only processing is acceptable

using (var stream = File.OpenRead("large.zip"))
using (var reader = ReaderFactory.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        // Process one entry at a time
        reader.WriteEntryToDirectory(@"C:\output");
    }
}
```

**Performance Characteristics:**
- ✓ Minimal memory footprint
- ✓ Works with non-seekable streams
- ✓ Can process while downloading
- ✗ Forward-only (no random access)
- ✗ Entry lookup requires iteration

---

## Buffer Sizing

### Understanding Buffers

SharpCompress uses internal buffers for reading compressed data. Buffer size affects:
- **Speed:** Larger buffers = fewer I/O operations = faster
- **Memory:** Larger buffers = higher memory usage

### Recommended Buffer Sizes

| Scenario | Size | Notes |
|----------|------|-------|
| Embedded/IoT devices | 4-8 KB | Minimal memory usage |
| Memory-constrained | 16-32 KB | Conservative choice |
| Standard use (default) | 64 KB | Recommended default |
| Large file streaming | 256 KB | Better throughput |
| High-speed SSD | 512 KB - 1 MB | Maximum throughput |

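If you want to encode the table above in code, a small helper might look like the sketch below. The thresholds and the method name are illustrative only, not part of SharpCompress:

```csharp
// Illustrative only: maps the scenarios above to a CopyTo buffer size.
static int PickCopyBufferSize(long uncompressedSize, bool memoryConstrained)
{
    if (memoryConstrained)
        return 16 * 1024;                       // 16 KB for tight memory budgets
    if (uncompressedSize >= 100 * 1024 * 1024)
        return 256 * 1024;                      // 256 KB for large streamed files
    return 64 * 1024;                           // 64 KB general-purpose default
}

// Usage: entryStream.CopyTo(fileStream, PickCopyBufferSize(entry.Size, false));
```
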
### How Buffer Size Affects Performance

```csharp
// SharpCompress manages buffers internally
// You can't directly set buffer size, but you can:

// 1. Use Stream.CopyTo with explicit buffer size
using (var entryStream = reader.OpenEntryStream())
using (var fileStream = File.Create(@"C:\output\file.txt"))
{
    // Default CopyTo buffer (81920 bytes in .NET)
    entryStream.CopyTo(fileStream);

    // Or specify larger buffer for faster copy
    entryStream.CopyTo(fileStream, bufferSize: 262144); // 256 KB
}

// 2. Use custom buffer for writing
using (var entryStream = reader.OpenEntryStream())
using (var fileStream = File.Create(@"C:\output\file.txt"))
{
    byte[] buffer = new byte[262144]; // 256 KB
    int bytesRead;
    while ((bytesRead = entryStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        fileStream.Write(buffer, 0, bytesRead);
    }
}
```

---

## Streaming Large Files

### Non-Seekable Stream Patterns

For processing archives from downloads or pipes:

```csharp
// Download stream (non-seekable)
using (var httpStream = await httpClient.GetStreamAsync(url))
using (var reader = ReaderFactory.Open(httpStream))
{
    // Process entries as they arrive
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }
}
```

**Performance Tips:**
- Don't try to buffer the entire stream
- Process entries immediately
- Use async APIs for better responsiveness (see the sketch below)

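Whether your SharpCompress version exposes dedicated async write helpers varies, so the sketch below sticks to `OpenEntryStream()` plus the standard `Stream.CopyToAsync`:

```csharp
using (var httpStream = await httpClient.GetStreamAsync(url))
using (var reader = ReaderFactory.Open(httpStream))
{
    while (reader.MoveToNextEntry())
    {
        if (reader.Entry.IsDirectory)
        {
            continue;
        }

        // Flattens entry paths for brevity.
        var outputPath = Path.Combine(@"C:\output", Path.GetFileName(reader.Entry.Key));
        using (var entryStream = reader.OpenEntryStream())
        using (var fileStream = File.Create(outputPath))
        {
            // Async copy keeps the calling thread free during disk/network I/O.
            await entryStream.CopyToAsync(fileStream);
        }
    }
}
```
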
### Download-Then-Extract vs Streaming

Choose based on your constraints:

| Approach | When to Use |
|----------|------------|
| **Download then extract** | Moderate size, need random access |
| **Stream during download** | Large files, bandwidth limited, memory constrained |

```csharp
// Download then extract (requires disk space)
var archivePath = await DownloadFile(url, @"C:\temp\archive.zip");
using (var archive = ZipArchive.Open(archivePath))
{
    archive.WriteToDirectory(@"C:\output");
}

// Stream during download (on-the-fly extraction)
using (var httpStream = await httpClient.GetStreamAsync(url))
using (var reader = ReaderFactory.Open(httpStream))
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(@"C:\output");
    }
}
```

---

## Solid Archive Optimization

### Why Solid Archives Are Slow

Solid archives (Rar, 7Zip) group files together in a single compressed stream:

```
Solid Archive Layout:
[Header] [Compressed Stream] [Footer]
          ├─ File1 compressed data
          ├─ File2 compressed data
          ├─ File3 compressed data
          └─ File4 compressed data
```

Extracting File3 requires decompressing File1 and File2 first.

### Sequential vs Random Extraction

**Random Extraction (Slow):**
```csharp
using (var archive = RarArchive.Open("solid.rar"))
{
    foreach (var entry in archive.Entries)
    {
        entry.WriteToFile(@"C:\output\" + entry.Key); // ✗ Slow!
        // Each entry triggers full decompression from start
    }
}
```

**Sequential Extraction (Fast):**
```csharp
using (var archive = RarArchive.Open("solid.rar"))
{
    // Method 1: Use WriteToDirectory (recommended)
    archive.WriteToDirectory(@"C:\output", new ExtractionOptions
    {
        ExtractFullPath = true,
        Overwrite = true
    });

    // Method 2: Use ExtractAllEntries, which returns a forward-only reader
    using (var reader = archive.ExtractAllEntries())
    {
        while (reader.MoveToNextEntry())
        {
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }

    // Method 3: Use the Reader API directly (also sequential)
    using (var reader = RarReader.Open(File.OpenRead("solid.rar")))
    {
        while (reader.MoveToNextEntry())
        {
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }
}
```

**Performance Impact:**
- Random extraction: O(n²) - very slow for many files
- Sequential extraction: O(n) - 10-100x faster

### Best Practices for Solid Archives

1. **Always extract sequentially** when possible (see the helper sketch below)
2. **Use Reader API** for large solid archives
3. **Process entries in order** from the archive
4. **Consider the 7-Zip command-line tool** for scripted extractions

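As a sketch of point 1, a helper that always goes through the forward-only reader returned by `ExtractAllEntries()` works for both solid and non-solid archives; the method itself is just an illustration:

```csharp
static void ExtractSequentially(IArchive archive, string outputDirectory)
{
    // ExtractAllEntries returns a forward-only reader, so every entry is
    // decompressed exactly once, in archive order.
    using (var reader = archive.ExtractAllEntries())
    {
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                reader.WriteEntryToDirectory(outputDirectory, new ExtractionOptions
                {
                    ExtractFullPath = true,
                    Overwrite = true
                });
            }
        }
    }
}
```
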
---

## Compression Level Trade-offs

### Deflate/GZip Levels

```csharp
// Level 1 = Fastest, largest size
// Level 6 = Default (balanced)
// Level 9 = Slowest, best compression

// Write with different compression levels
using (var archive = ZipArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");

    // Fast compression (level 1)
    archive.SaveTo("fast.zip", new WriterOptions(CompressionType.Deflate)
    {
        CompressionLevel = 1
    });

    // Default compression (level 6)
    archive.SaveTo("default.zip", CompressionType.Deflate);

    // Best compression (level 9)
    archive.SaveTo("best.zip", new WriterOptions(CompressionType.Deflate)
    {
        CompressionLevel = 9
    });
}
```

**Speed vs Size:**

| Level | Speed (relative to level 6) | Size (% of original) | Use Case |
|-------|-----------------------------|----------------------|----------|
| 1 | 10x | 90% | Network, streaming |
| 6 | 1x | 75% | Default (good balance) |
| 9 | 0.1x | 65% | Archival, static storage |

### BZip2 Block Size

```csharp
// BZip2 block size affects memory and compression
// 100K to 900K (default 900K)

// Smaller block size = lower memory, faster
// Larger block size = better compression, slower

using (var archive = TarArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");

    // These are preset in WriterOptions via CompressionLevel
    archive.SaveTo("archive.tar.bz2", CompressionType.BZip2);
}
```

### LZMA Settings

LZMA compresses very well but is memory-intensive:

```csharp
// LZMA (7Zip, .tar.lzma):
// - Dictionary size: 16 KB to 1 GB (default 32 MB)
// - Faster preset: smaller dictionary
// - Better compression: larger dictionary

// Preset via CompressionType
using (var archive = TarArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");
    archive.SaveTo("archive.tar.lzma", CompressionType.LZMA); // Default settings
}
```

---

## Async Performance

### When Async Helps

Async is beneficial when:
- **Long I/O operations** (network, slow disks)
- **UI responsiveness** needed (Windows Forms, WPF, Blazor)
- **Server applications** (ASP.NET, multiple concurrent operations)

```csharp
// Async extraction (non-blocking)
using (var archive = ZipArchive.Open("archive.zip"))
{
    await archive.WriteToDirectoryAsync(
        @"C:\output",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
        cancellationToken
    );
}
// Thread can handle other work while I/O happens
```

### When Async Doesn't Help

Async doesn't improve performance for:
- **CPU-bound operations** (decompression costs the same CPU either way; see the `Task.Run` sketch below)
- **Local SSD I/O** (I/O is fast enough)
- **Single-threaded scenarios** (no parallelism benefit)

```csharp
// Sync extraction (simpler, same performance on fast I/O)
using (var archive = ZipArchive.Open("archive.zip"))
{
    archive.WriteToDirectory(
        @"C:\output",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true }
    );
}
// Simple and fast - no async needed
```

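If the work is CPU-bound but you still need to keep a UI thread responsive, the usual pattern is to push the synchronous extraction onto the thread pool. This is standard .NET, nothing SharpCompress-specific:

```csharp
// Offload CPU-bound extraction so the UI thread stays responsive.
await Task.Run(() =>
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        archive.WriteToDirectory(
            @"C:\output",
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true });
    }
});
```
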
### Cancellation Pattern

```csharp
var cts = new CancellationTokenSource();

// Cancel after 5 minutes
cts.CancelAfter(TimeSpan.FromMinutes(5));

try
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        await archive.WriteToDirectoryAsync(
            @"C:\output",
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
            cts.Token
        );
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Extraction cancelled");
    // Clean up partial extraction if needed
}
```

---

## Memory Efficiency

### Reducing Allocations

```csharp
// ✗ Wrong - creates new options object each iteration
foreach (var archiveFile in archiveFiles)
{
    using (var archive = ZipArchive.Open(archiveFile))
    {
        archive.WriteToDirectory(outputDir, new ExtractionOptions
        {
            ExtractFullPath = true,
            Overwrite = true
        });
    }
}

// ✓ Better - reuse options object
var options = new ExtractionOptions
{
    ExtractFullPath = true,
    Overwrite = true
};
foreach (var archiveFile in archiveFiles)
{
    using (var archive = ZipArchive.Open(archiveFile))
    {
        archive.WriteToDirectory(outputDir, options);
    }
}
```

### Object Pooling for Repeated Operations

```csharp
// For very high-throughput scenarios, consider pooling copy buffers
public class ArchiveExtractionPool
{
    private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;

    public void ExtractMany(IEnumerable<string> archiveFiles, string outputDir)
    {
        // Rent one buffer up front and reuse it for every entry copy.
        var buffer = _bufferPool.Rent(262144); // 256 KB
        try
        {
            foreach (var archiveFile in archiveFiles)
            {
                using (var stream = File.OpenRead(archiveFile))
                using (var archive = ZipArchive.Open(stream))
                {
                    foreach (var entry in archive.Entries)
                    {
                        if (entry.IsDirectory) continue;

                        // Flattens entry paths for brevity.
                        var outputPath = Path.Combine(outputDir, Path.GetFileName(entry.Key));
                        using (var entryStream = entry.OpenEntryStream())
                        using (var outputStream = File.Create(outputPath))
                        {
                            int bytesRead;
                            while ((bytesRead = entryStream.Read(buffer, 0, buffer.Length)) > 0)
                            {
                                outputStream.Write(buffer, 0, bytesRead);
                            }
                        }
                    }
                }
            }
        }
        finally
        {
            _bufferPool.Return(buffer);
        }
    }
}
```

---

## Practical Performance Tips

### 1. Choose the Right API

| Scenario | API | Why |
|----------|-----|-----|
| Small archives | Archive | Faster random access |
| Large archives | Reader | Lower memory |
| Streaming | Reader | Works on non-seekable streams |
| Download streams | Reader | Async extraction while downloading |

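When the input is just a `Stream`, its `CanSeek` flag is a reasonable switch between the two APIs. `ArchiveFactory` and `ReaderFactory` are real SharpCompress entry points; the wrapper method itself is only an illustration:

```csharp
static void Extract(Stream input, string outputDir)
{
    var options = new ExtractionOptions { ExtractFullPath = true, Overwrite = true };

    if (input.CanSeek)
    {
        // Seekable: Archive API gives random access to entries.
        using (var archive = ArchiveFactory.Open(input))
        {
            archive.WriteToDirectory(outputDir, options);
        }
    }
    else
    {
        // Non-seekable: Reader API processes entries forward-only.
        using (var reader = ReaderFactory.Open(input))
        {
            while (reader.MoveToNextEntry())
            {
                if (!reader.Entry.IsDirectory)
                {
                    reader.WriteEntryToDirectory(outputDir, options);
                }
            }
        }
    }
}
```
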
### 2. Batch Operations

```csharp
// ✗ Slow - re-opens the archive for every file you need
foreach (var file in files)
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        var entry = archive.Entries.First(e => e.Key == file);
        entry.WriteToFile(Path.Combine(@"C:\output", file));
    }
}

// ✓ Better - open once and extract everything in a single pass
using (var archive = ZipArchive.Open("archive.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
```

### 3. Use Appropriate Compression

```csharp
// For distribution/storage: Best compression
archive.SaveTo("archive.zip", new WriterOptions(CompressionType.Deflate)
{
    CompressionLevel = 9
});

// For daily backups: Balanced compression
archive.SaveTo("backup.zip", CompressionType.Deflate); // Default level 6

// For temporary/streaming: Fast compression
archive.SaveTo("temp.zip", new WriterOptions(CompressionType.Deflate)
{
    CompressionLevel = 1
});
```

### 4. Profile Your Code

```csharp
var sw = Stopwatch.StartNew();
using (var archive = ZipArchive.Open("large.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
sw.Stop();

Console.WriteLine($"Extraction took {sw.ElapsedMilliseconds}ms");

// Measure memory before/after
var beforeMem = GC.GetTotalMemory(true);
// ... do work ...
var afterMem = GC.GetTotalMemory(true);
Console.WriteLine($"Memory used: {(afterMem - beforeMem) / 1024 / 1024}MB");
```

---

## Troubleshooting Performance

### Extraction is Slow

1. **Check if solid archive** → Use sequential extraction
2. **Check API** → Reader API might be faster for large files
3. **Check compression level** → Higher levels are slower to decompress
4. **Check I/O** → Network drives are much slower than SSD
5. **Check buffer size** → May need larger buffers for network

### High Memory Usage

1. **Use Reader API** instead of Archive API
2. **Process entries immediately** rather than buffering them in memory (see the sketch below)
3. **Reduce compression level** if writing
4. **Check for memory leaks** in your code

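For example, an entry can be consumed straight from its stream instead of being copied into a `MemoryStream` first. A sketch; `System.Text.Json` and the file names are used purely for illustration:

```csharp
using (var stream = File.OpenRead("data.zip"))
using (var reader = ReaderFactory.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        if (reader.Entry.IsDirectory || !reader.Entry.Key.EndsWith(".json"))
        {
            continue;
        }

        using (var entryStream = reader.OpenEntryStream())
        using (var doc = JsonDocument.Parse(entryStream)) // parse directly, no MemoryStream copy
        {
            Console.WriteLine($"{reader.Entry.Key}: {doc.RootElement.ValueKind}");
        }
    }
}
```
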
### CPU Usage at 100%

1. **Normal for compression** - especially with high compression levels
2. **Consider lower level** for faster processing
3. **Reduce parallelism** if processing multiple archives (see the sketch below)
4. **Check if awaiting properly** in async code

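A quick way to cap CPU use when extracting many archives at once is to bound the degree of parallelism. Standard .NET, nothing SharpCompress-specific; the limit of two is illustrative:

```csharp
// Limit concurrent extractions to two worker threads.
Parallel.ForEach(
    archiveFiles,
    new ParallelOptions { MaxDegreeOfParallelism = 2 },
    archiveFile =>
    {
        using (var archive = ZipArchive.Open(archiveFile))
        {
            archive.WriteToDirectory(@"C:\output");
        }
    });
```
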
---

## Related Documentation

- [USAGE.md](USAGE.md) - Usage examples with performance considerations
- [FORMATS.md](FORMATS.md) - Format-specific performance notes
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Solving common issues