# SharpCompress Performance Guide

This guide helps you optimize SharpCompress for performance in various scenarios.

## API Selection Guide

### Archive API vs Reader API

Choose the right API based on your use case:

| Aspect | Archive API | Reader API |
|--------|-------------|------------|
| **Stream Type** | Seekable only | Non-seekable OK |
| **Memory Usage** | All entry metadata in memory | One entry at a time |
| **Random Access** | ✓ Yes | ✗ No |
| **Best For** | Small-to-medium archives | Large or streaming data |
| **Performance** | Fast for random access | Better for large files |

### Archive API (Fast for Random Access)

```csharp
// Use when:
// - Archive metadata fits comfortably in memory
// - You need random access to entries
// - Stream is seekable (file, MemoryStream)

using (var archive = ZipArchive.Open("archive.zip"))
{
    // Random access - all entries available
    var specific = archive.Entries.FirstOrDefault(e => e.Key == "file.txt");
    if (specific != null)
    {
        specific.WriteToFile(@"C:\output\file.txt");
    }
}
```

**Performance Characteristics:**

- ✓ Instant entry lookup
- ✓ Parallel extraction possible
- ✗ Loads metadata for every entry up front
- ✗ Can't process while downloading

### Reader API (Best for Large Files)

```csharp
// Use when:
// - Processing large archives (>100 MB)
// - Streaming from network/pipe
// - Memory is constrained
// - Forward-only processing is acceptable

using (var stream = File.OpenRead("large.zip"))
using (var reader = ReaderFactory.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        // Process one entry at a time
        reader.WriteEntryToDirectory(@"C:\output");
    }
}
```

**Performance Characteristics:**

- ✓ Minimal memory footprint
- ✓ Works with non-seekable streams
- ✓ Can process while downloading
- ✗ Forward-only (no random access)
- ✗ Entry lookup requires iteration

---

## Buffer Sizing

### Understanding Buffers

SharpCompress uses internal buffers for reading compressed data. Buffer size affects:

- **Speed:** Larger buffers = fewer I/O operations = faster
- **Memory:** Larger buffers = higher memory usage

### Recommended Buffer Sizes

| Scenario | Buffer Size | Notes |
|----------|-------------|-------|
| Embedded/IoT devices | 4-8 KB | Minimal memory usage |
| Memory-constrained | 16-32 KB | Conservative choice |
| Standard use | 64 KB | Good default for most workloads |
| Large file streaming | 256 KB | Better throughput |
| High-speed SSD | 512 KB - 1 MB | Maximum throughput |

### How Buffer Size Affects Performance

SharpCompress manages its decompression buffers internally. You can't set those directly, but you can control the buffer used when copying entry data to its destination.

Use `Stream.CopyTo` with an explicit buffer size:

```csharp
using (var entryStream = reader.OpenEntryStream())
using (var fileStream = File.Create(@"C:\output\file.txt"))
{
    // Default CopyTo buffer is 81,920 bytes (~80 KB):
    // entryStream.CopyTo(fileStream);

    // Specify a larger buffer for higher throughput
    entryStream.CopyTo(fileStream, bufferSize: 262144); // 256 KB
}
```
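On the writing side, the destination `FileStream`'s own buffer can be enlarged as well. This is plain .NET rather than a SharpCompress setting, and the 256 KB figure below is just an illustrative choice:

```csharp
// Create the output file with a larger internal buffer (the FileStream default is 4 KB),
// then have CopyTo read with a matching buffer size.
using (var entryStream = reader.OpenEntryStream())
using (var fileStream = new FileStream(
    @"C:\output\file.txt",
    FileMode.Create,
    FileAccess.Write,
    FileShare.None,
    bufferSize: 262144)) // 256 KB
{
    entryStream.CopyTo(fileStream, bufferSize: 262144);
}
```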
Or copy with a manual loop and your own buffer:

```csharp
using (var entryStream = reader.OpenEntryStream())
using (var fileStream = File.Create(@"C:\output\file.txt"))
{
    byte[] buffer = new byte[262144]; // 256 KB
    int bytesRead;
    while ((bytesRead = entryStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        fileStream.Write(buffer, 0, bytesRead);
    }
}
```

---

## Streaming Large Files

### Non-Seekable Stream Patterns

For processing archives from downloads or pipes:

```csharp
// Download stream (non-seekable)
using (var httpStream = await httpClient.GetStreamAsync(url))
using (var reader = ReaderFactory.Open(httpStream))
{
    // Process entries as they arrive
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            reader.WriteEntryToDirectory(@"C:\output");
        }
    }
}
```

**Performance Tips:**

- Don't try to buffer the entire stream
- Process entries immediately
- Use async APIs for better responsiveness

### Download-Then-Extract vs Streaming

Choose based on your constraints:

| Approach | When to Use |
|----------|-------------|
| **Download then extract** | Moderate size, need random access |
| **Stream during download** | Large files, bandwidth limited, memory constrained |

```csharp
// Download then extract (requires disk space)
var archivePath = await DownloadFile(url, @"C:\temp\archive.zip");
using (var archive = ZipArchive.Open(archivePath))
{
    archive.WriteToDirectory(@"C:\output");
}

// Stream during download (on-the-fly extraction)
using (var httpStream = await httpClient.GetStreamAsync(url))
using (var reader = ReaderFactory.Open(httpStream))
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(@"C:\output");
    }
}
```

---

## Solid Archive Optimization

### Why Solid Archives Are Slow

Solid archives (RAR, 7Zip) group files together in a single compressed stream:

```
Solid Archive Layout:
[Header] [Compressed Stream] [Footer]
          ├─ File1 compressed data
          ├─ File2 compressed data
          ├─ File3 compressed data
          └─ File4 compressed data
```

Extracting File3 requires decompressing File1 and File2 first.

### Sequential vs Random Extraction

**Random Extraction (Slow):**

```csharp
using (var archive = RarArchive.Open("solid.rar"))
{
    foreach (var entry in archive.Entries)
    {
        entry.WriteToFile(@"C:\output\" + entry.Key); // ✗ Slow!
        // Each entry triggers decompression from the start of the solid stream
    }
}
```

**Sequential Extraction (Fast):**

```csharp
// Method 1: Use WriteToDirectory (recommended)
using (var archive = RarArchive.Open("solid.rar"))
{
    archive.WriteToDirectory(@"C:\output", new ExtractionOptions
    {
        ExtractFullPath = true,
        Overwrite = true
    });
}

// Method 2: Use ExtractAllEntries, which returns a forward-only reader
using (var archive = RarArchive.Open("solid.rar"))
using (var reader = archive.ExtractAllEntries())
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(@"C:\output");
    }
}

// Method 3: Use the Reader API directly (also sequential)
using (var reader = RarReader.Open(File.OpenRead("solid.rar")))
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(@"C:\output");
    }
}
```

**Performance Impact:**

- Random extraction: O(n²) - very slow for many files
- Sequential extraction: O(n) - 10-100x faster

### Best Practices for Solid Archives

1. **Always extract sequentially** when possible
2. **Use the Reader API** for large solid archives (see the sketch after this list)
3. **Process entries in order** from the archive
4. **Consider using the 7-Zip command line** for scripted extractions
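If you only need a few files from a solid archive, a single forward pass that skips unwanted entries is still much cheaper than random per-entry lookups. A minimal sketch: the entry names in `wanted` and the output path are hypothetical.

```csharp
// Hypothetical set of entries to extract
var wanted = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "docs/readme.txt",
    "data/config.json"
};

using (var stream = File.OpenRead("solid.rar"))
using (var reader = RarReader.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        if (reader.Entry.IsDirectory || !wanted.Contains(reader.Entry.Key))
        {
            // Skipped entries still cost some decompression in a solid stream,
            // but the archive is only traversed once overall.
            continue;
        }

        reader.WriteEntryToDirectory(@"C:\output", new ExtractionOptions
        {
            ExtractFullPath = true,
            Overwrite = true
        });
    }
}
```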
---

## Compression Level Trade-offs

### Deflate/GZip Levels

```csharp
// Level 1 = fastest, largest output
// Level 6 = default (balanced)
// Level 9 = slowest, best compression

// Write with different compression levels
using (var archive = ZipArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");

    // Fast compression (level 1)
    archive.SaveTo("fast.zip", new WriterOptions(CompressionType.Deflate)
    {
        CompressionLevel = 1
    });

    // Default compression (level 6)
    archive.SaveTo("default.zip", CompressionType.Deflate);

    // Best compression (level 9)
    archive.SaveTo("best.zip", new WriterOptions(CompressionType.Deflate)
    {
        CompressionLevel = 9
    });
}
```

**Speed vs Size:**

| Level | Speed (relative) | Size (vs. uncompressed) | Use Case |
|-------|------------------|-------------------------|----------|
| 1 | 10x | ~90% | Network, streaming |
| 6 | 1x | ~75% | Default (good balance) |
| 9 | 0.1x | ~65% | Archival, static storage |

### BZip2 Block Size

BZip2's block size (100 KB to 900 KB, default 900 KB) affects memory use and compression: a smaller block uses less memory and is faster, while a larger block compresses better but is slower.

```csharp
using (var archive = TarArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");

    // SharpCompress uses preset BZip2 settings; select the codec via CompressionType
    archive.SaveTo("archive.tar.bz2", CompressionType.BZip2);
}
```

### LZMA Settings

LZMA compression is very powerful but memory-intensive:

```csharp
// LZMA (7Zip, .tar.lzma):
// - Dictionary size: 16 KB to 1 GB (default 32 MB)
// - Faster preset: smaller dictionary
// - Better compression: larger dictionary

// Preset via CompressionType
using (var archive = TarArchive.Create())
{
    archive.AddAllFromDirectory(@"D:\data");
    archive.SaveTo("archive.tar.lzma", CompressionType.LZMA); // default settings
}
```

---

## Async Performance

### When Async Helps

Async is beneficial when:

- **Long I/O operations** (network, slow disks)
- **UI responsiveness** is needed (Windows Forms, WPF, Blazor)
- **Server applications** (ASP.NET, multiple concurrent operations)

```csharp
// Async extraction (non-blocking)
using (var archive = ZipArchive.Open("archive.zip"))
{
    await archive.WriteToDirectoryAsync(
        @"C:\output",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
        cancellationToken
    );
}
// The calling thread can handle other work while I/O happens
```

### When Async Doesn't Help

Async doesn't improve performance for:

- **CPU-bound work** (decompression itself doesn't get faster)
- **Local SSD I/O** (the waits are too short to matter)
- **Single-threaded scenarios** (no parallelism benefit)

```csharp
// Sync extraction (simpler, same performance on fast I/O)
using (var archive = ZipArchive.Open("archive.zip"))
{
    archive.WriteToDirectory(
        @"C:\output",
        new ExtractionOptions { ExtractFullPath = true, Overwrite = true }
    );
}
// Simple and fast - no async needed
```

### Cancellation Pattern

```csharp
var cts = new CancellationTokenSource();

// Cancel after 5 minutes
cts.CancelAfter(TimeSpan.FromMinutes(5));

try
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        await archive.WriteToDirectoryAsync(
            @"C:\output",
            new ExtractionOptions { ExtractFullPath = true, Overwrite = true },
            cts.Token
        );
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Extraction cancelled");
    // Clean up partial extraction if needed
}
```
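The same cancellation behavior is available with the synchronous Reader API by checking the token between entries; wrapping the loop in `Task.Run` keeps a UI thread responsive. A sketch that reuses the `cts` from the example above:

```csharp
// Offload the blocking extraction to a worker thread and stop promptly on cancellation.
await Task.Run(() =>
{
    using (var stream = File.OpenRead("archive.zip"))
    using (var reader = ReaderFactory.Open(stream))
    {
        while (reader.MoveToNextEntry())
        {
            cts.Token.ThrowIfCancellationRequested(); // cooperative check between entries

            if (!reader.Entry.IsDirectory)
            {
                reader.WriteEntryToDirectory(@"C:\output", new ExtractionOptions
                {
                    ExtractFullPath = true,
                    Overwrite = true
                });
            }
        }
    }
}, cts.Token);
```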
---

## Practical Performance Tips

### 1. Choose the Right API

| Scenario | API | Why |
|----------|-----|-----|
| Small archives | Archive | Faster random access |
| Large archives | Reader | Lower memory |
| Streaming | Reader | Works on non-seekable streams |
| Download streams | Reader | Extraction while downloading |

### 2. Batch Operations

```csharp
// ✗ Slow - re-opens the archive for every file you need
foreach (var file in files)
{
    using (var archive = ZipArchive.Open("archive.zip"))
    {
        archive.Entries
            .First(e => e.Key == file)
            .WriteToFile(Path.Combine(@"C:\output", file));
    }
}

// ✓ Better - open once and extract everything in a single pass
using (var archive = ZipArchive.Open("archive.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
```

### 3. Profile Your Code

```csharp
var sw = Stopwatch.StartNew();
using (var archive = ZipArchive.Open("large.zip"))
{
    archive.WriteToDirectory(@"C:\output");
}
sw.Stop();
Console.WriteLine($"Extraction took {sw.ElapsedMilliseconds}ms");

// Measure memory before/after
var beforeMem = GC.GetTotalMemory(true);
// ... do work ...
var afterMem = GC.GetTotalMemory(true);
Console.WriteLine($"Memory used: {(afterMem - beforeMem) / 1024 / 1024}MB");
```

---

## Troubleshooting Performance

### Extraction is Slow

1. **Check if it's a solid archive** → use sequential extraction
2. **Check the API** → the Reader API is often faster for large files
3. **Check the compression level** → higher levels are slower to decompress
4. **Check I/O** → network drives are much slower than an SSD
5. **Check buffer size** → network streams may need larger copy buffers

### High Memory Usage

1. **Use the Reader API** instead of the Archive API
2. **Process entries immediately** rather than buffering them
3. **Reduce the compression level** if writing
4. **Check for memory leaks** in your own code

### CPU Usage at 100%

1. **Normal for compression** - especially at high compression levels
2. **Consider a lower level** for faster processing
3. **Reduce parallelism** if processing multiple archives at once
4. **Check that async code is awaited properly**

---

## Related Documentation

- [USAGE.md](USAGE.md) - Usage examples with performance considerations
- [FORMATS.md](FORMATS.md) - Format-specific performance notes