Simplified Chinese file names are garbled after unzipping a compressed file #104

Open
opened 2026-01-29 22:06:38 +00:00 by claunia · 7 comments
Owner

Originally created by @qiufengilove on GitHub (Jul 9, 2016).

If the compressed file contains a file with a Simplified Chinese name, the file name is garbled after extraction.


@adamhathcock commented on GitHub (Jul 10, 2016):

You'll need to be more specific. The filename and contents? What container format? Got a snippet of code?


@qiufengilove commented on GitHub (Jul 10, 2016):

When I change the default encoding, SharpCompress.Common.ArchiveEncoding.Default, to Encoding.GetEncoding("GB2312"), the Simplified Chinese file names come out correct after unzipping, but I'm not sure whether this change affects anything else. Thank you.


@qiufengilove commented on GitHub (Jul 10, 2016):

I'm sorry, my English is not good.


@qiufengilove commented on GitHub (Jul 10, 2016):

The compressed file format I am working with is zip.


@adamhathcock commented on GitHub (Sep 27, 2016):

Need a sample I guess


@wolfmanlyq commented on GitHub (Dec 24, 2017):

I encountered the same problem.
If the zip file being loaded contains files whose names contain Chinese characters, the file names are garbled in SharpCompress, because the Unicode is treated as ANSI.
For example, a file in the zip named "一个测试文档.txt" is loaded as "����˵����.txt".
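
This kind of garbling can be reproduced outside SharpCompress. Below is a minimal Python sketch (Python is used only for illustration; none of this is SharpCompress code), assuming the name was stored as GBK/GB2312 bytes and then decoded with a different code page. The helper `recover` is a hypothetical name introduced here.

```python
# Illustrative only: shows how GBK-encoded file name bytes turn into
# mojibake when they are decoded with the wrong code page.
name = "一个测试文档.txt"

# Bytes as a zip tool on a GBK-based system would store them.
raw = name.encode("gbk")

# Decoding as code page 437 never fails (every byte maps to a character),
# but the result is unreadable.
as_cp437 = raw.decode("cp437")

# Decoding as UTF-8 hits invalid sequences, which become U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")


def recover(garbled: str) -> str:
    """Undo a wrong cp437 decode by re-encoding, then decoding as GBK."""
    return garbled.encode("cp437").decode("gbk")


if __name__ == "__main__":
    print(as_cp437)
    print(as_utf8)
    print(recover(as_cp437))
```

Because the cp437 decode is lossless, the original name can be recovered from it. A decode that substitutes U+FFFD has already lost the original bytes, so a name garbled that way cannot be repaired after the fact.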


@CasualPokePlayer commented on GitHub (May 10, 2023):

I've encountered a similar problem. The issue is simply that the filename is encoded with GB2312 (or GB18030, either one works), which is compatible with neither UTF8 nor ANSI. In the zip I have, which is not flagged as UTF8, it would seem there isn't an actual fix other than the user manually selecting an encoding for SharpCompress to use. Normally, zips only support UTF8 (general purpose bit 11 being set) or IBM Code Page 437 (aka IBM437). The zip standard does mention being able to specify a different encoding with the 0x0008 Extra Field, but the zip I have does not set any extra fields, and the zip standard states that the 0x0008 field is undefined, so it would not really be useful even if the zip did specify it.
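
For illustration, general purpose bit 11 can be inspected per entry. The sketch below uses Python's standard `zipfile` module rather than SharpCompress; `UTF8_FLAG`, `name_is_utf8`, `build_sample` and `report` are hypothetical names introduced here. It relies on `zipfile` setting bit 11 itself when it writes a non-ASCII name.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def name_is_utf8(info: zipfile.ZipInfo) -> bool:
    """True when the entry declares its file name as UTF-8."""
    return bool(info.flag_bits & UTF8_FLAG)


def build_sample() -> bytes:
    """Create a small in-memory zip with one ASCII and one non-ASCII name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ascii.txt", b"a")
        zf.writestr("文档.txt", b"b")  # non-ASCII name: written as UTF-8 with bit 11
    return buf.getvalue()


def report(data: bytes) -> dict:
    """Map each entry name to whether its UTF-8 flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: name_is_utf8(info) for info in zf.infolist()}


if __name__ == "__main__":
    print(report(build_sample()))
```

An unset flag only says the name is not declared as UTF-8. It does not say which legacy code page was used, which is why a GB2312 name cannot be told apart from a Code Page 437 name by this check alone.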

That said, there is a bug here: SharpCompress uses ANSI / Encoding.Default when it is supposed to use IBM Code Page 437 (via Encoding.GetEncoding("IBM437")) when the UTF8 bit is not set. This shows a noticeable difference when you compare filenames against 7zip or the Windows File Explorer, which "properly" use IBM Code Page 437 in such a case.

In any case, these zips can be said to be unsupportable going by the zip standard (and there doesn't appear to be a way to detect them). The solution is for the user to manually specify the encoding, or to manually fix up the zips to use UTF8 (which is what zips actually support, not the myriad of CJK encodings).
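
The "user manually specifies the encoding" approach can be sketched as follows, again in Python with the standard `zipfile` module and not as SharpCompress code. `make_legacy_zip` and `entry_names` are hypothetical helpers. The first one exists only to fabricate a test archive with a GBK name and bit 11 unset, since `zipfile` will not write such a name itself.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def make_legacy_zip(name: str, encoding: str) -> bytes:
    """Build a test zip whose entry name is stored in `encoding`, bit 11 unset.

    An ASCII placeholder of the same byte length is written first and then
    patched in place in both the local header and the central directory.
    """
    raw = name.encode(encoding)
    placeholder = "_" * len(raw)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(placeholder, b"hello")
    return buf.getvalue().replace(placeholder.encode("ascii"), raw)


def entry_names(data: bytes, fallback_encoding: str) -> list:
    """List entry names, decoding non-UTF-8 names with a caller-chosen encoding."""
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # orig_filename is the name as decoded by CPython's zipfile,
            # before any path-separator cleanup.
            if info.flag_bits & UTF8_FLAG:
                names.append(info.orig_filename)
            else:
                # zipfile decoded the raw bytes as cp437; undo that and
                # apply the encoding the user selected instead.
                raw = info.orig_filename.encode("cp437")
                names.append(raw.decode(fallback_encoding))
    return names


if __name__ == "__main__":
    data = make_legacy_zip("文档.txt", "gbk")
    print(entry_names(data, "gbk"))
```

The encoding has to come from the caller because the archive carries no reliable hint about it. Passing the wrong one still decodes without error for many code pages, so the result should be shown to the user rather than trusted blindly.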

Reference: starred/sharpcompress#104