OS specific invalid characters are causing extraction to corrupt file #649

New Issue

claunia · 2026-01-29T22:15:17Z

claunia commented

2026-01-29 22:15:17 +00:00

Originally created by @DineshSolanki on GitHub (Jul 24, 2024).

https://www.deviantart.com/zenoasis/art/Japanese-TV-Dorama-folder-icon-pack-162-1077192465
Download zip from there

Extract using SharpCompress,
we'll see that the files REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png and あの子の子ども : My Girlfriend's Child.png are extracted as 0 byte file with following names REAL µüïµä¢µ«║Σ║║µì£µƒ╗τÅ¡ , πüéπü«σ¡Éπü«σ¡Éπü¿πéÖπéé

I also tried with IBM437 encoding but same result.

However when you extract using 7zip you can see that it extracts fine and 7zip makes some changes to file name - which seeems to be removing : character which might be either U+A789 or U+2236

filenames from 7zip extraction REAL 恋愛殺人捜査班 _ Real - Renai Satsujin Sosa Han.png, あの子の子ども _ My Girlfriend's Child.png

Originally created by @DineshSolanki on GitHub (Jul 24, 2024). https://www.deviantart.com/zenoasis/art/Japanese-TV-Dorama-folder-icon-pack-162-1077192465 Download zip from there ![image](https://github.com/user-attachments/assets/2193a4c4-ccc2-4251-b383-3f6577fbc908) Extract using SharpCompress, we'll see that the files `REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png` and `あの子の子ども : My Girlfriend's Child.png` are extracted as 0 byte file with following names `REAL µüïµä¢µ«║Σ║║µì£µƒ╗τÅ¡ ` , `πüéπü«σ¡Éπü«σ¡Éπü¿πéÖπéé` I also tried with IBM437 encoding but same result. However when you extract using 7zip you can see that it extracts fine and 7zip makes some changes to file name - which seeems to be removing `:` character which might be either `U+A789` or `U+2236` filenames from 7zip extraction `REAL 恋愛殺人捜査班 _ Real - Renai Satsujin Sosa Han.png`, `あの子の子ども _ My Girlfriend's Child.png`

claunia added the bug up for grabs labels 2026-01-29 22:15:17 +00:00

claunia closed this issue

2026-01-29 22:15:17 +00:00

claunia commented

2026-01-29 22:15:18 +00:00

@Morilli commented on GitHub (Jul 24, 2024):

There is no validation of the destination file name, so an attempt is made to write to a filestream with a filename containing :.
Interestingly this seems to somewhat work and no library function throws an exception (not even Path.GetFullPath even though it's clearly documented to do so), but the output is kinda garbage.

While I'm actually kind of interested in figuring out what actually happens when this is attempted, the fix here is most likely to sanitize the output path before attempting to open any output:

diff --git a/src/SharpCompress/Common/ExtractionMethods.cs b/src/SharpCompress/Common/ExtractionMethods.cs
index 27d4164..80b8e9d 100644
--- a/src/SharpCompress/Common/ExtractionMethods.cs
+++ b/src/SharpCompress/Common/ExtractionMethods.cs
@@ -37,6 +37,7 @@ internal static class ExtractionMethods
         options ??= new ExtractionOptions() { Overwrite = true };

         var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null");
+        file = string.Join("_", file.Split(Path.GetInvalidFileNameChars()));
         if (options.ExtractFullPath)
         {
             var folder = Path.GetDirectoryName(entry.Key.NotNull("Entry Key is null"))

@Morilli commented on GitHub (Jul 24, 2024): There is no validation of the destination file name, so an attempt is made to write to a filestream with a filename containing `:`. Interestingly this seems to somewhat work and no library function throws an exception (not even [Path.GetFullPath](https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getfullpath) even though it's clearly documented to do so), but the output is kinda garbage. While I'm actually kind of interested in figuring out what actually happens when this is attempted, the fix here is most likely to sanitize the output path before attempting to open any output: ```diff diff --git a/src/SharpCompress/Common/ExtractionMethods.cs b/src/SharpCompress/Common/ExtractionMethods.cs index 27d4164..80b8e9d 100644 --- a/src/SharpCompress/Common/ExtractionMethods.cs +++ b/src/SharpCompress/Common/ExtractionMethods.cs @@ -37,6 +37,7 @@ internal static class ExtractionMethods options ??= new ExtractionOptions() { Overwrite = true }; var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null"); + file = string.Join("_", file.Split(Path.GetInvalidFileNameChars())); if (options.ExtractFullPath) { var folder = Path.GetDirectoryName(entry.Key.NotNull("Entry Key is null")) ```

claunia commented

2026-01-29 22:15:18 +00:00

@DineshSolanki commented on GitHub (Jul 25, 2024):

Yes that seems to be the solution, I wonder if 7zip handles it in same way.
to answer on what happens when we try it, it seems to truncate the filename and stream is consumed somewhere else as its definitely not writing to that file with truncated name.
even windows explorer can't extract it

@DineshSolanki commented on GitHub (Jul 25, 2024): Yes that seems to be the solution, I wonder if 7zip handles it in same way. to answer on what happens when we try it, it seems to truncate the filename and stream is consumed somewhere else as its definitely not writing to that file with truncated name. even windows explorer can't extract it ![image](https://github.com/user-attachments/assets/5e169753-0c21-4c6a-b92e-cbed24c1d2b7)

claunia commented

2026-01-29 22:15:18 +00:00

@adamhathcock commented on GitHub (Jul 25, 2024):

Could it be as simple as put the string through the encoding?

@adamhathcock commented on GitHub (Jul 25, 2024): Could it be as simple as put the string through the encoding?

claunia commented

2026-01-29 22:15:18 +00:00

@DineshSolanki commented on GitHub (Jul 25, 2024):

Could it be as simple as put the string through the encoding?

didn't work, tried UTF 8 and IBM437

var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null");
var encoding = Encoding.UTF8;
 file = encoding.GetString(encoding.GetBytes(file)) ?? "";

However a 7zip discussion also mentions the encoding https://sourceforge.net/p/sevenzip/discussion/45798/thread/82ae0f9c/
Maybe I'm encoding in wrong manner?

@DineshSolanki commented on GitHub (Jul 25, 2024): > Could it be as simple as put the string through the encoding? didn't work, tried UTF 8 and IBM437 ``` var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null"); var encoding = Encoding.UTF8; file = encoding.GetString(encoding.GetBytes(file)) ?? ""; ``` However a 7zip discussion also mentions the encoding https://sourceforge.net/p/sevenzip/discussion/45798/thread/82ae0f9c/ Maybe I'm encoding in wrong manner?

claunia commented

2026-01-29 22:15:18 +00:00

@DineshSolanki commented on GitHub (Jul 25, 2024):

issue seems to be with invalid character itself instead of the encoding, I was wrong in saying that the colon was probably unicode U+A789.

the following code will run into same issue

File.WriteAllText("あの子の子ども : My Girlfriend's Child.txt", "test");

so we have to remove invalid characters as @Morilli suggested, I tried his solution and it works

7zip a7a1d4a241/CPP/7zip/UI/Common/ExtractingFilePath.cpp (L23)

and
the SevenZipSharp also seems to does the same d0bf5c4a3d/SevenZip/ArchiveExtractCallback.cs (L624C13-L641C14)

@DineshSolanki commented on GitHub (Jul 25, 2024): issue seems to be with invalid character itself instead of the encoding, I was wrong in saying that the colon was probably unicode `U+A789`. the following code will run into same issue ```csharp File.WriteAllText("あの子の子ども : My Girlfriend's Child.txt", "test"); ``` so we have to remove invalid characters as @Morilli suggested, I tried his solution and it works 7zip https://github.com/ip7z/7zip/blob/a7a1d4a241492e81f659a920f7379c193593ebc6/CPP/7zip/UI/Common/ExtractingFilePath.cpp#L23 and the [SevenZipSharp](https://github.com/StevenBonePgh/SevenZipSharp/tree/d0bf5c4a3d65ea5e85ffe7ef94312354475f5714/SevenZip) also seems to does the same https://github.com/StevenBonePgh/SevenZipSharp/blob/d0bf5c4a3d65ea5e85ffe7ef94312354475f5714/SevenZip/ArchiveExtractCallback.cs#L624C13-L641C14

claunia commented

2026-01-29 22:15:19 +00:00

@adamhathcock commented on GitHub (Jul 26, 2024):

Your fix names sense: we shouldn't put invalid path characters in....but seems like it wouldn't work for everything? I'm inclined to accept it as other things do it

@adamhathcock commented on GitHub (Jul 26, 2024): Your fix names sense: we shouldn't put invalid path characters in....but seems like it wouldn't work for everything? I'm inclined to accept it as other things do it

claunia commented

2026-01-29 22:15:19 +00:00

@DineshSolanki commented on GitHub (Jul 26, 2024):

@adamhathcock It surely work for the issue we are having, its the same logic that 7zip is using

@DineshSolanki commented on GitHub (Jul 26, 2024): @adamhathcock It surely work for the issue we are having, its the same logic that 7zip is using

Sign in to join this conversation.

Branches Tags

master

copilot/add-performance-benchmarks

adam/cleanup

copilot/fix-openentrystreamasync-memory-issue

release

adam/more-explode-async

copilot/fix-infinite-loop-rar-archive

adam/data-descriptor-fix

adam/fix-tests-with-proper-rewind

copilot/fix-data-descriptor-stream-bug

adam/lmza-investigation

adam/create-rar-async

adam/async-rar2

copilot/support-multi-threading-path

copilot/sub-pr-1132-again

adam/memory-perf

copilot/add-performance-benchmarking

copilot/sub-pr-1121

copilot/add-password-support-zip-files

copilot/add-so-optimized-zip-support

adam/rar-async-only

copilot/add-buffered-stream-async-read

copilot/sub-pr-1076

copilot/fix-decompression-exception

copilot/fix-archivefactory-issue

copilot/rationalize-sourcestream-volumes

adam/open-async

copilot/add-ace-archive-support

copilot/sub-pr-1040-again

adam/more-async-3

copilot/fix-tararchive-incomplete-iteration

adam/multi-threaded

copilot/sub-pr-1040

adam/awesome-copilot

copilot/fix-ziparchive-extraction-issue

copilot/fix-tararchive-open-crash

copilot/fix-tar-xz-file-reading-issue

copilot/setup-copilot-instructions

copilot/fix-decompression-performance-issue

copilot/convert-stream-access-to-async

adam/enable-agent

adam/async-deflate

adam/async-rar

adam/more-cleanup

adam/zstd

async-2

zstandard

net461-tests

dmg

async

build-netcore3

recycle-memory-stream

presentation

pax

netcore2

zip_encryption

dotnet-tool

tar_redux

native_zlib

Issue-197

system_buffers

TarNames

7zip_sfx

portable_crypto

WinRT

new_7zip

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: starred/sharpcompress#649