mirror of
https://github.com/adamhathcock/sharpcompress.git
synced 2026-02-06 13:34:58 +00:00
Deflate Issue (Somewhere....) #326
Originally created by @leezer3 on GitHub (Oct 4, 2018).
OK, so please forgive this rather headscratching issue.
I've been digging at this for a couple of days now, and have come to the conclusion that we've got a bug in the Deflate decompressor somewhere.
This bug also appears to exist in the Microsoft Deflate stream decompressor, but not in the current Zlib.
Some background:
I maintain openBVE (a game). One of its features is an importer for Microsoft DirectX formatted objects.
I've discovered one object which is in compressed DirectX binary format, which simply fails to import.
In simplified terms, the DirectX binary format consists of 26 bytes of header, followed by deflate compressed data.
Broken File:
treeSet2.zip
Decompressing the stream data in this one returns 32768 bytes of data.
If I dump the header and decompress with Zlib, I get 35712 bytes of data.
(n.b. The two streams are identical until the first chokes)
Reconstructing the file in textual format tells me that the decompressor chokes when it encounters the following 4-byte single: 0.858470
Anyone with thoughts would be most welcome.
@adamhathcock commented on GitHub (Oct 5, 2018):
Without doing any digging: this sounds like a gzip file?
I guess that doesn't make any real difference.
@adamhathcock commented on GitHub (Nov 16, 2018):
If you plugged in another Deflate implementation does it work?
Also, I don’t use the Microsoft deflate implementation that’s in the framework. This one came from DotNetZip.
@adamhathcock commented on GitHub (Nov 16, 2018):
Looking at the referenced issue from OpenBVE, it could be you’re right about the zlib implementations. If there’s a newer implementation in C# I’m happy to use it
@leezer3 commented on GitHub (Nov 18, 2018):
Hmm. From some further reading around and digging, I get the nasty suspicion that these may actually be MSZIP (cab) compression.
https://msdn.microsoft.com/library/bb417343.aspx#microsoftmszipdatacompressionformat
If I understand the implications from that documentation correctly, they're deflate, but flush all buffers other than the history buffer after each block.
Not sure that helps any, archive math is not an area I've ever had any major involvement in.
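If the MSZIP reading is right, it would also explain the 32768-byte result: each block is a complete deflate stream, so a plain inflater sees an end-of-stream marker after the first block and stops. Here is a rough Python sketch of that behaviour. The data is synthetic, `make_blocks` is a made-up helper, and the 2-byte `CK` signature before each block is taken from the MSZIP documentation rather than from the actual .x file, so treat the framing as an assumption.

```python
import zlib

BLOCK = 32768  # MSZIP limits each block to 32 KiB of uncompressed data


def make_blocks(data):
    """Hypothetical helper: build synthetic MSZIP-style blocks for the demo."""
    blocks, history = [], b""
    for start in range(0, len(data), BLOCK):
        chunk = data[start:start + BLOCK]
        # Raw deflate (wbits=-15); later blocks may refer back to the previous one.
        if history:
            comp = zlib.compressobj(wbits=-15, zdict=history)
        else:
            comp = zlib.compressobj(wbits=-15)
        blocks.append(b"CK" + comp.compress(chunk) + comp.flush())
        history = chunk
    return blocks


# Synthetic stand-in data, sized like the file in this issue.
data = bytes((i * 7 + i // 251) % 256 for i in range(35712))
blocks = make_blocks(data)

# A plain single-stream inflater: skip the first signature and feed everything.
inflater = zlib.decompressobj(wbits=-15)
out = inflater.decompress(blocks[0][2:] + b"".join(blocks[1:]))

print(len(out))                   # bytes produced before the inflater stopped
print(inflater.eof)               # whether an end-of-stream marker was reached
print(len(inflater.unused_data))  # input left over after the first block
```

The leftover input is the second block, signature included, which a single-stream decoder never looks at.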
@leezer3 commented on GitHub (Nov 19, 2018):
OK, so I've managed to get a working example using a slightly modified version of DotNetZip.
This is the main byte array extractor code, which plumbs into a very slightly modified DotNetZip Zlib implementation:
b6e937b79c/source/Plugins/Formats.DirectX/MsZip.cs
Essentially, for a MSZIP stream, we must read each compressed block (which must decompress to 32k or less data), and then unconditionally pass the decompressed data as the dictionary to our decoder for the next block.
This required a slight modification to the set dictionary method from DotNetZip, as in its default form it checks whether a dictionary is required and whether the Adler checksum of the dictionary is correct; both of these checks must be ignored.
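For anyone who wants to see the block loop without reading the C#, this is a minimal Python sketch of the same idea. It is not the openBVE code. Python's `zlib` accepts a preset dictionary for raw deflate directly, so no patched set-dictionary method is needed here. `mszip_decode` and `make_blocks` are hypothetical names, and the input is assumed to be already split into blocks that each start with the `CK` signature.

```python
import zlib

BLOCK = 32768  # each MSZIP block must decompress to 32 KiB or less


def mszip_decode(blocks):
    """Inflate blocks one by one, using each block's output as the next dictionary."""
    out, history = [], b""
    for block in blocks:
        if block[:2] != b"CK":
            raise ValueError("missing MSZIP 'CK' signature")
        # A fresh raw inflater per block; the dictionary is applied unconditionally.
        if history:
            inflater = zlib.decompressobj(wbits=-15, zdict=history)
        else:
            inflater = zlib.decompressobj(wbits=-15)
        chunk = inflater.decompress(block[2:])
        if not inflater.eof:
            raise ValueError("block ended before its end-of-stream marker")
        if len(chunk) > BLOCK:
            raise ValueError("block decompressed to more than 32 KiB")
        out.append(chunk)
        history = chunk
    return b"".join(out)


def make_blocks(data):
    """Hypothetical helper: build synthetic MSZIP-style blocks to test against."""
    blocks, history = [], b""
    for start in range(0, len(data), BLOCK):
        chunk = data[start:start + BLOCK]
        if history:
            comp = zlib.compressobj(wbits=-15, zdict=history)
        else:
            comp = zlib.compressobj(wbits=-15)
        blocks.append(b"CK" + comp.compress(chunk) + comp.flush())
        history = chunk
    return blocks


# Round trip on synthetic data that repeats across the block boundary.
data = bytes((i * 7 + i // 251) % 256 for i in range(70000))
restored = mszip_decode(make_blocks(data))
print(restored == data)
```

Only the previous block's output is kept as history, since a full 32k block already fills the deflate window.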
Porting these changes to SharpCompress wouldn't be too difficult, but it would need a new stream class, e.g. an MsZipStream, which then performed the appropriate length checks and so on internally, as SharpCompress doesn't expose the inflater by default; I presume this was a design choice?
Further, I don't see any easy way to do much error checking on this, other than block length checks & a check on the output against a known value, as supplying a 'wrong' dictionary in this manner appears to work, but produces garbled data.
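The "wrong dictionary still works" behaviour can be reproduced with a few lines of Python, again as a sketch on synthetic data rather than on the file from this issue. Raw deflate carries no checksum, so back-references into a wrong dictionary just copy the wrong bytes. `inflate_with` is a made-up helper.

```python
import random
import zlib

rng = random.Random(326)  # fixed seed so the demo is repeatable
pattern = bytes(rng.getrandbits(8) for _ in range(4096))
previous = pattern * 8    # 32768 bytes standing in for the previous block's output
current = pattern[:3000]  # next block; its content also appears in `previous`

# Compress the block as raw deflate with the previous output as dictionary.
comp = zlib.compressobj(wbits=-15, zdict=previous)
packed = comp.compress(current) + comp.flush()


def inflate_with(dictionary):
    """Hypothetical helper: inflate `packed` with the given preset dictionary."""
    inflater = zlib.decompressobj(wbits=-15, zdict=dictionary)
    return inflater.decompress(packed)


good = inflate_with(previous)
bad = inflate_with(bytes(len(previous)))  # wrong dictionary: all zero bytes

print(good == current)           # right dictionary restores the block
print(len(bad) == len(current))  # wrong dictionary gives the same length...
print(bad == current)            # ...but not the same bytes, and no exception
```

So a length check alone cannot catch this case; comparing the output against a known value (or a known total size plus a sanity check on the parsed content) is about all that is left.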
@adamhathcock commented on GitHub (Nov 19, 2018):
I’d probably rather have a custom Stream class like GZipStream instead of exposing the zlib inflater directly just for ease of use.
I think I have memories of using WinZip on cab files that are corrupted and didn’t know until too late. Seems like no error checking is a “feature.”
I haven’t looked at the code yet.
If you feel like making a PR for a MsZipStream then it would be appreciated :)