Mirror of https://github.com/adamhathcock/sharpcompress.git (synced 2026-02-04 05:25:00 +00:00)
Chinese characters garbled; Encoding option does not work #305
Originally created by @Wildcatii on GitHub (Jun 14, 2018).
I finally located the error in this section:
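```csharp
internal override void Read(BinaryReader reader)
{
    // Fixed-size fields of the ZIP local file header, read little-endian.
    Version = reader.ReadUInt16();
    Flags = (HeaderFlags)reader.ReadUInt16(); // general purpose bit flag
    CompressionMethod = (ZipCompressionMethod)reader.ReadUInt16();
    LastModifiedTime = reader.ReadUInt16();
    LastModifiedDate = reader.ReadUInt16();
    Crc = reader.ReadUInt32();
    CompressedSize = reader.ReadUInt32();
    UncompressedSize = reader.ReadUInt32();
    ushort nameLength = reader.ReadUInt16();
    ushort extraLength = reader.ReadUInt16();
    // Variable-length fields: the raw file name bytes, then the extra field.
    byte[] name = reader.ReadBytes(nameLength);
    byte[] extra = reader.ReadBytes(extraLength);
    // ... (the method continues with the name decoding quoted further down)
```

I set `ArchiveEncoding.Default = Encoding.GetEncoding("GBK")` for Chinese.

The Flags value is Bit1 (0x0002), so the name comes from `ArchiveEncoding.Decode437(name)` rather than `ArchiveEncoding.Decode(name)`, and that produces garbled characters. The name becomes:

`║═╞╜_3G_20180201_║═╞╜│╟╟°_124549_│ñ║⌠╙∩╥⌠_┴¬═¿_╓≈╜╨.rcu`

I opened the zip file in UltraEdit (UE); these lines contain the header. The raw bytes are between ':' and ';', and UE's text rendering follows the ';':

```
00000000h: 50 4B 03 04 14 00 02 00 08 00 BD 6D 42 4C 86 7A ; PK........絤BL唞
00000010h: 36 26 75 65 6D 01 37 5C 6E 01 37 00 11 00 BA CD ; 6&uem.7\n.7...和
00000020h: C6 BD 5F 33 47 5F 32 30 31 38 30 32 30 31 5F BA ; 平_3G_20180201_?
00000030h: CD C6 BD B3 C7 C7 F8 5F 31 32 34 35 34 39 5F B3 ; 推匠乔鴂124549_?
00000040h: A4 BA F4 D3 EF D2 F4 5F C1 AA CD A8 5F D6 F7 BD ; ず粲镆鬫联通_主?
00000050h: D0 2E 72 63 75 55 54 0D 00 07 16 FB 73 5A C5 2C ; ?rcuUT....鹲Z?
```

The Flags value comes from the first line: bytes 6 and 7 (after the signature `50 4B 03 04` and the version `14 00`) are `02 00`, i.e. 0x0002. Only bit 1 is set; for Deflate (method `08 00`) that bit only records the compression option, and bit 11 (0x0800) is clear.

Other unzip tools open it correctly and show the name as:

`和平_3G_20180201_和平城区_124549_长呼语音_联通_主叫.rcu`

SevenZipSharp also handles it correctly.

I checked http://www.pkware.com/documents/casestudies/APPNOTE.TXT for bit 11 (called `Efs` in this project):

> Bit 11: Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8. (see APPENDIX D)

So when bit 11 is set, the name must be UTF-8; when it is clear, the header says nothing about which code page was used. The code below does the opposite of what I expected: the `Efs` branch uses the configurable decoder, while the non-EFS branch, which is where this GBK name ends up, always uses CP437. So I think the problem is in this code:

```csharp
if (Flags.HasFlag(HeaderFlags.Efs))
{
    // Bit 11 set: APPNOTE requires the name to be UTF-8 here.
    Name = ArchiveEncoding.Decode(name);
}
else
{
    // Use IBM Code Page 437 (IBM PC character encoding set)
    // Bit 11 clear (this archive, flags = 0x0002): always CP437,
    // so the GBK encoding I configured is never applied.
    Name = ArchiveEncoding.Decode437(name);
}
```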
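To make the diagnosis concrete, here is a small standalone C# sketch (a top-level program, assuming .NET 5 or later for `Convert.FromHexString` and the built-in `CodePagesEncodingProvider`). It parses the local file header bytes copied from the UltraEdit dump with `BinaryReader`, in the same field order as `Read`, and shows that only bit 1 is set, so the EFS bit is clear and the name bytes decode correctly as GBK (code page 936). `IsEfs` and `DecodeName` are hypothetical helpers for illustration, not SharpCompress APIs, and the rule they implement (UTF-8 when EFS is set, otherwise a caller-chosen encoding) is one possible fix, not necessarily what the later pull request changed.

```csharp
using System;
using System.IO;
using System.Text;

// Code pages 936 (GBK) and 437 are not available by default on .NET Core/.NET 5+;
// they come from CodePagesEncodingProvider, which must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding gbk = Encoding.GetEncoding(936);
Encoding cp437 = Encoding.GetEncoding(437);

// Local file header bytes copied from the dump above:
// 30 fixed bytes followed by the 55-byte file name (extra field omitted).
byte[] bytes = Convert.FromHexString(
    "504B0304" + "1400" + "0200" + "0800" + "BD6D" + "424C" + "867A3626" +
    "75656D01" + "375C6E01" + "3700" + "1100" +
    "BACD" +
    "C6BD5F33475F32303138303230315FBA" +
    "CDC6BDB3C7C7F85F3132343534395FB3" +
    "A4BAF4D3EFD2F45FC1AACDA85FD6F7BD" +
    "D02E726375");

// ZIP header fields are little-endian, which matches BinaryReader.
using var reader = new BinaryReader(new MemoryStream(bytes));
uint signature = reader.ReadUInt32();     // 0x04034B50, "PK\x03\x04"
ushort version = reader.ReadUInt16();     // 0x0014 = 20 (version needed to extract)
ushort flags = reader.ReadUInt16();       // general purpose bit flag
ushort method = reader.ReadUInt16();      // 8 = Deflate
reader.ReadBytes(16);                     // skip time, date, CRC-32, both sizes (2+2+4+4+4)
ushort nameLength = reader.ReadUInt16();  // 0x0037 = 55
ushort extraLength = reader.ReadUInt16(); // 0x0011 = 17 (bytes not included above)
byte[] name = reader.ReadBytes(nameLength);

// Hypothetical helper: bit 11 (EFS, 0x0800) means the name MUST be UTF-8.
bool IsEfs(ushort f) => (f & 0x0800) != 0;

// Hypothetical decoding rule: UTF-8 when EFS is set, otherwise the
// encoding the caller chose for non-EFS names (CP437 is only a default).
string DecodeName(byte[] raw, ushort f, Encoding nonEfsEncoding) =>
    IsEfs(f) ? Encoding.UTF8.GetString(raw) : nonEfsEncoding.GetString(raw);

Console.WriteLine($"signature=0x{signature:X8} version={version} flags=0x{flags:X4} method={method}");
Console.WriteLine($"EFS set: {IsEfs(flags)}");
Console.WriteLine($"as CP437: {DecodeName(name, flags, cp437)}");
Console.WriteLine($"as GBK:   {DecodeName(name, flags, gbk)}");
```

The design point is that a non-EFS entry carries no encoding information at all, so CP437 can only be a default; archivers running on a Chinese-locale system commonly write names in the local code page, which is why the caller's configured encoding needs to take effect on that branch.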
@sciarium commented on GitHub (Jul 9, 2018):
My pull request for a similar issue was approved:
https://github.com/adamhathcock/sharpcompress/pull/385
I think it will solve your issue.
@Wildcatii commented on GitHub (Jul 10, 2018):
@sciarium Thank you. I downloaded the latest version and it works well; the bug has been fixed.