mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-05 21:23:57 +00:00
OS specific invalid characters are causing extraction to corrupt file #649
Originally created by @DineshSolanki on GitHub (Jul 24, 2024).
https://www.deviantart.com/zenoasis/art/Japanese-TV-Dorama-folder-icon-pack-162-1077192465

Download the zip from there.
Extract it using SharpCompress.
We'll see that the files `REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png` and `あの子の子ども : My Girlfriend's Child.png` are extracted as 0-byte files with the following names: `REAL µüïµä¢µ«║Σ║║µì£µƒ╗τÅ¡`, `πüéπü«σ¡Éπü«σ¡Éπü¿πéÖπéé`
I also tried with IBM437 encoding but got the same result.
However, when you extract using 7zip, you can see that it extracts fine and that 7zip makes some changes to the file name, which seem to be removing the `:` character, which might be either `U+A789` or `U+2236`.
File names from the 7zip extraction: `REAL 恋愛殺人捜査班 _ Real - Renai Satsujin Sosa Han.png`, `あの子の子ども _ My Girlfriend's Child.png`
@Morilli commented on GitHub (Jul 24, 2024):
There is no validation of the destination file name, so an attempt is made to write to a filestream with a filename containing `:`.
Interestingly, this seems to somewhat work and no library function throws an exception (not even `Path.GetFullPath`, even though it's clearly documented to do so), but the output is kinda garbage.
While I'm actually kind of interested in figuring out what actually happens when this is attempted, the fix here is most likely to sanitize the output path before attempting to open any output.
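A minimal sketch of that kind of sanitization, written in Python for illustration only. It is not the snippet from the comment above and not SharpCompress code; the names `WINDOWS_INVALID_CHARS`, `sanitize_file_name` and `sanitize_entry_path` are hypothetical, and the character set is an assumption based on what Windows rejects in file names (in .NET the equivalent set would come from `Path.GetInvalidFileNameChars()`). As a likely explanation for the garbage output: on NTFS a colon in a file name is normally treated as the separator for an alternate data stream, which would be consistent with the truncated, 0-byte files in the report, though nobody in this thread confirmed it.

```python
# Characters Windows rejects in a file name (assumed set for this sketch):
# the reserved punctuation plus the ASCII control characters 0-31.
WINDOWS_INVALID_CHARS = set('<>:"/\\|?*') | {chr(c) for c in range(32)}


def sanitize_file_name(name: str, replacement: str = "_") -> str:
    """Replace every character that is invalid in a Windows file name."""
    return "".join(replacement if ch in WINDOWS_INVALID_CHARS else ch for ch in name)


def sanitize_entry_path(entry_path: str, replacement: str = "_") -> str:
    """Sanitize each path component separately so directory separators survive."""
    parts = entry_path.replace("\\", "/").split("/")
    return "/".join(sanitize_file_name(part, replacement) for part in parts)
```

With `_` as the replacement, `sanitize_file_name("REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png")` yields the same name that the 7zip extraction in the report produced. Replacing rather than dropping the character keeps names distinguishable, but two entries that differ only in an invalid character would still collide.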
@DineshSolanki commented on GitHub (Jul 25, 2024):
Yes, that seems to be the solution. I wonder if 7zip handles it in the same way.
To answer what happens when we try it: it seems to truncate the file name, and the stream is consumed somewhere else, as it's definitely not writing to the file with the truncated name.
Even Windows Explorer can't extract it.
@adamhathcock commented on GitHub (Jul 25, 2024):
Could it be as simple as putting the string through the encoding?
@DineshSolanki commented on GitHub (Jul 25, 2024):
That didn't work; I tried UTF-8 and IBM437.
However, a 7zip discussion also mentions the encoding: https://sourceforge.net/p/sevenzip/discussion/45798/thread/82ae0f9c/
Maybe I'm encoding it in the wrong manner?
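For reference, the garbled names in the original report are consistent with UTF-8 bytes being decoded as IBM437 (code page 437). The sketch below is a Python illustration of that round trip, not SharpCompress code; `decode_as_ibm437` is a hypothetical helper name. It only accounts for how the characters are displayed, not for the 0-byte output or the truncation at the colon.

```python
def decode_as_ibm437(name: str) -> str:
    """Encode a name as UTF-8, then decode those same bytes as IBM437."""
    return name.encode("utf-8").decode("cp437")


# The part of the first file name that precedes the colon.
print(decode_as_ibm437("REAL 恋愛殺人捜査班"))
```

This prints `REAL µüïµä¢µ«║Σ║║µì£µƒ╗τÅ¡`, the same garbled name reported for the first file.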
@DineshSolanki commented on GitHub (Jul 25, 2024):
The issue seems to be with the invalid character itself rather than the encoding. I was wrong in saying that the colon was probably Unicode `U+A789`.
The following code will run into the same issue
So we have to remove invalid characters as @Morilli suggested. I tried his solution and it works.
7zip
a7a1d4a241/CPP/7zip/UI/Common/ExtractingFilePath.cpp (L23)
and SevenZipSharp also seems to do the same
d0bf5c4a3d/SevenZip/ArchiveExtractCallback.cs (L624C13-L641C14)
@adamhathcock commented on GitHub (Jul 26, 2024):
Your fix makes sense: we shouldn't put invalid path characters in... but it seems like it wouldn't work for everything? I'm inclined to accept it, as other things do it.
@DineshSolanki commented on GitHub (Jul 26, 2024):
@adamhathcock It surely works for the issue we are having; it's the same logic that 7zip is using.