mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-07 13:44:36 +00:00
Chinese simplified file compressed file name unzip garbled #104
Originally created by @qiufengilove on GitHub (Jul 9, 2016).
If a compressed file contains files with Simplified Chinese names, the file names come out garbled.
@adamhathcock commented on GitHub (Jul 10, 2016):
You'll need to be more specific. The filename and contents? What container format? Got a snippet of code?
@qiufengilove commented on GitHub (Jul 10, 2016):
When I change the default encoding, SharpCompress.Common.ArchiveEncoding.Default, to Encoding.GetEncoding("GB2312"), the Simplified Chinese file names come out correctly after unzipping, but I'm not sure whether this affects anything else. Thank you.
@qiufengilove commented on GitHub (Jul 10, 2016):
I'm sorry, my English is not good.
@qiufengilove commented on GitHub (Jul 10, 2016):
The compressed files I work with are in zip format.
@adamhathcock commented on GitHub (Sep 27, 2016):
Need a sample I guess
@wolfmanlyq commented on GitHub (Dec 24, 2017):
I encountered the same problem.
If the zip file being loaded contains files whose names include Chinese characters, the names come out garbled in SharpCompress, because the Unicode is treated as ANSI.
E.g. a file in the zip named "一个测试文档.txt" will be loaded as "����˵����.txt"
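A minimal sketch of how this kind of garbling arises, written in Python rather than against SharpCompress itself. It assumes the archiver stored the name as GB2312 bytes and that the reader decoded them with code page 437; the code pages actually involved in the report above are not known, so the garbled text will not match it exactly.

```python
# Illustration only: the codec names are assumptions, not taken from the report.
name = "一个测试文档.txt"

# Bytes as an archiver using GB2312 would write them into the zip entry header.
raw = name.encode("gb2312")

# Decoding the same bytes with a different single-byte code page gives mojibake.
garbled = raw.decode("cp437")

# cp437 assigns a character to every byte value, so no information is lost
# and the original name can be recovered by reversing the wrong decode.
repaired = garbled.encode("cp437").decode("gb2312")

print(repr(garbled))
print(repaired)
```

The round trip only works because the wrong decode was lossless. If the reader replaced undecodable bytes with a placeholder such as "�", the original bytes are gone and the name cannot be rebuilt from the garbled string.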
@CasualPokePlayer commented on GitHub (May 10, 2023):
I've encountered a similar problem. The issue is simply that the filename is encoded with GB2312 (or GB18030, either one works), which is not at all compatible with UTF8 or ANSI. In the zip I have, which is set to not be UTF8, it would seem there isn't an actual fix other than the user manually selecting an encoding for SharpCompress to use. Normally, zips only support UTF8 (general purpose bit 11 being set) or IBM Code Page 437 (aka IBM437). The zip standard does mention being able to specify a different encoding with the 0x0008 Extra Field, but the zip I have does not set any extra fields, and the zip standard states that the 0x0008 field is undefined, so it's not really useful even if the zip did specify it.
That said, there is a bug here: SharpCompress uses the ANSI encoding (Encoding.Default) when it is supposed to use IBM Code Page 437 (via Encoding.GetEncoding("IBM437")) when the UTF8 bit is not set. This produces a noticeable difference when you compare filenames against 7zip or the Windows File Explorer, which "properly" use IBM Code Page 437 in such a case.
These zips can in any case be considered unsupportable going by the zip standard (and there doesn't appear to be a way to detect them); the solution is for the user to manually specify the encoding or to fix up the zips to use UTF8 (which is what zips actually support, not the myriad of CJK encodings).
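The selection rule described in this comment could be sketched roughly as follows. This is Python for illustration, not SharpCompress code: `decode_entry_name` and its `override` parameter are hypothetical names, and the only facts taken from the thread are that general purpose bit 11 signals UTF8 and that code page 437 is the default otherwise.

```python
from typing import Optional

UTF8_FLAG = 0x0800  # general purpose bit 11 of the zip entry header


def decode_entry_name(raw: bytes, flag_bits: int,
                      override: Optional[str] = None) -> str:
    """Hypothetical helper: choose a codec for a raw zip entry name."""
    if flag_bits & UTF8_FLAG:
        # The archive declares the name as UTF-8.
        return raw.decode("utf-8")
    # No UTF-8 flag: fall back to cp437 unless the caller supplies
    # an encoding they know the archive was written with.
    return raw.decode(override or "cp437")


name = "一个测试文档.txt"
gb_bytes = name.encode("gb2312")

default_result = decode_entry_name(gb_bytes, 0)           # mojibake
user_result = decode_entry_name(gb_bytes, 0, "gb2312")    # caller picks GB2312
utf8_result = decode_entry_name(name.encode("utf-8"), UTF8_FLAG)
```

The override is deliberately something the caller must pass in: as noted above, nothing in such a zip identifies the encoding, so the sketch does not attempt to guess it.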